Tissue-specific MicroRNAs and compositions and uses thereof

Information

  • Patent Grant
  • 8586726
  • Patent Number
    8,586,726
  • Date Filed
    Friday, January 15, 2010
  • Date Issued
    Tuesday, November 19, 2013
Abstract
The invention provides for isolated nucleic acid sequences of newly discovered micro RNAs that have been identified to exist in normal Human B cells and/or in tumor-related Human B cells, using an integrated bioinformatics method and pipeline described herein.
Description

All patents, patent applications and publications cited herein are hereby incorporated by reference in their entirety. The disclosures of these publications in their entireties are hereby incorporated by reference into this application in order to more fully describe the state of the art as known to those skilled therein as of the date of the invention described and claimed herein.


This patent disclosure contains material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the U.S. Patent and Trademark Office patent file or records, but otherwise reserves any and all copyright rights.


SEQUENCE LISTING

The instant application contains a Sequence Listing which has been submitted via EFS-Web and is hereby incorporated by reference in its entirety. Said ASCII copy, created on Mar. 17, 2010, is named 19240US3.txt, and is 1,134,968 bytes in size.


LENGTHY TABLE

A lengthy table (for example, Table 11) is referenced in this application and has been filed as an Appendix to this invention. The specification of the application contains reference to the single table, Table 11, which consists of more than 51 pages, and is hereby incorporated by reference in its entirety. Table 11 contains information encompassing gene sequences pertaining to the analysis of cross-species conservation for miRNAs. The Table displays results for conservation of full-length mature miRNA sequences, and seed of the mature sequence.


BACKGROUND OF THE INVENTION

Various nucleic acid species are capable of modifying gene expression. These species include antisense RNA, siRNA, microRNA, RNA and DNA aptamers, antagomirs, and decoy RNAs. Each of these nucleic acid species can inhibit target nucleic acid activity, including gene expression.


MicroRNAs (miRNAs, miRs) are 20-23 nucleotide (nt) RNA molecules that are produced by cellular enzymes through the processing of larger enclosing stem-loop structures (>50 bp), called precursors. miRNAs are processed from hairpin precursors of about 70 nt (pre-miRNA), which are derived from primary transcripts (pri-miRNA) through sequential cleavage by the RNAse III enzymes Drosha and Dicer. miRNAs target the messenger RNA of other genes by binding to their 3′ UTR and interfering with their translation or causing degradation by enzymes that target double-stranded RNA. miRNAs are non-coding RNAs (ncRNAs) that exist in a variety of organisms, including mammals, and are conserved in evolution. Many miRNAs tend to be clustered and transcribed as polycistrons and often have similar spatial and temporal expression patterns. miRNAs have been implicated in various biological processes including developmental timing, differentiation, apoptosis, cell proliferation, organ development, and metabolism.


SUMMARY OF THE INVENTION

The invention is based, at least in part, on the discovery of newly-identified microRNAs from normal and tumor-related Human B cells. Accordingly, in one aspect, the invention features an isolated nucleic acid, wherein the nucleic acid: (a) consists of from about 14 to about 31 nucleotides in length; (b) exhibits expression in a human tissue; (c) has a nucleotide sequence not present in an exon; and (d) consists essentially of a nucleotide sequence selected from the group consisting of SEQ ID NOS: 1-130 and 1094, and a nucleotide sequence which is about 97%, about 98%, or about 99% identical to a nucleic acid sequence comprising any one of SEQ ID NOS: 1-130 and 1094. In one embodiment, the nucleic acid is single stranded. In another embodiment, the nucleic acid is double-stranded. In a further embodiment, the human tissue comprises a lymphocyte. In some embodiments, the human tissue is a B cell. In other embodiments, the B cell comprises a Naïve B cell, a centroblast, a memory B cell, or a Ramos Burkitt Lymphoma cell.


An aspect of the invention provides for an isolated nucleic acid, wherein the nucleic acid: (a) consists of from about 14 to about 31 nucleotides in length; (b) exhibits expression in a human tissue; (c) has a nucleotide sequence not present in an exon; and (d) consists essentially of a nucleotide sequence selected from the group consisting of SEQ ID NOS: 131-401, and a nucleotide sequence which is about 97%, about 98%, or about 99% identical to a nucleic acid sequence having a SEQ ID NO: 131-401. In one embodiment, the nucleic acid is single stranded. In another embodiment, the nucleic acid is double-stranded. In a further embodiment, the human tissue comprises a lymphocyte. In some embodiments, the human tissue is a B cell. In other embodiments, the B cell comprises a Naïve B cell, a centroblast, a memory B cell, or a Ramos Burkitt Lymphoma cell.


The invention provides for an isolated nucleic acid that is complementary to a nucleic acid described in the aspects herein. In one embodiment, the nucleic acid is single stranded. In another embodiment, the nucleic acid is double-stranded.


The invention provides for an isolated nucleic acid that is complementary to all but 1, 2, 3, 4, or 5 nucleotides of the nucleic acids described in the aspects herein. In one embodiment, the nucleic acid is complementary to at least 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, or 31 contiguous nucleotides of a nucleic acid described in the aspects herein consisting from about 14 to about 31 nucleotides in length. In one embodiment, the nucleic acid is single stranded. In another embodiment, the nucleic acid is double-stranded.


The invention provides for a composition comprising one or more nucleic acids described in the aspects herein, in any combination or permutation thereof. In one embodiment, the composition further comprises one or more carriers, excipients, solvents, bases, or a combination thereof.


The invention provides for a composition comprising one or more nucleic acids, wherein the one or more nucleic acids consist essentially of a nucleotide sequence of any one of SEQ ID NOS: 1-401 and 1094. In one embodiment, the composition further comprises one or more carriers, excipients, solvents, bases, or a combination thereof.


The invention provides for a method for modulating the activity of a target nucleic acid in a cell, wherein the method comprises contacting a cell with a nucleic acid described in the aspects herein. In one embodiment, the target nucleic acid is a mRNA, a mature miRNA, or a precursor to a mature miRNA. In another embodiment, the cell is a hematopoietic cell. In a further embodiment, the cell is a B cell. In some embodiments, the cell is in vitro or in vivo.





BRIEF DESCRIPTION OF THE FIGURES


FIG. 1 is a flow diagram of a computational pipeline.



FIG. 2 is a flow diagram of a computational pipeline.



FIG. 3 is a flow diagram of a computational pipeline.



FIG. 4 is a graph depicting frequency distributions.



FIG. 5 is a diagram for new miRNAs identified in CB, Memory, and Naïve cells.



FIG. 6 is a diagram for known miRNAs identified in CB, Memory, and Naïve cells.



FIG. 7 is a diagram for new miRNAs identified in CB and RA cells.



FIG. 8 is a diagram for known miRNAs identified in CB and RA cells.



FIG. 9 is a bar graph depicting computational predictions of precursors from cloned mature miRNA.



FIG. 10 is a schematic depicting the experimental procedure. The experimental approach includes two main steps: cloning and sequencing of short-RNA and computational analysis of sequences in order to identify potential miRNAs.



FIG. 11 is a diagram representing the Computational analysis of short-RNA libraries. Short-RNA sequences were grouped in three main categories: miRNAs, short-RNAs of unknown function and short-RNAs not matching the human genome. Short-RNA sequences were aligned to the human genome (hg18 assembly) and if a favorable match was identified the sequences were subject to computational identification of candidate miRNAs. Short-RNAs which aligned in the same genomic location were clustered and considered as supporting sequences for the same miRNA. Annotations were used to eliminate RNA aligning with mRNA, tRNA, rRNA and other non-coding RNA species. Overall, 401 unique candidate mature miRNA were identified and compared to the miRBase database to detect previously reported miRNA. Among the short-RNAs lacking miRNA features 30% were annotated and the remaining might represent a part of the transcriptome whose functions are still unknown. Short-RNAs which could be matched to the human genome only with 2 or more mismatches were considered as potential short-RNA or miRNA with unknown genomic locations.



FIG. 12 is a bar graph depicting a computational prediction of precursors and mature miRNA. The number of predicted precursor miRNAs (pre-miR) and mature miRNAs (mature-miR) are plotted independently for each library and overall. The sequences matching miRNAs deposited in the miRBase database (v.11.0) are defined as “known” and conversely the sequences not previously reported are named “new”.



FIG. 13A is a line graph depicting the analysis of miRNA identified in B cells short-RNA libraries. It shows the previously reported (known) miRNA and newly identified (new) miRNAs as occurring in naïve, centroblasts, memory and Ramos cells.



FIG. 13B are schematics depicting the number of miRNAs specifically or commonly identified in naïve, centroblast and memory B cell short-RNA libraries. A larger overlap is observed for known compared to new miRNA (42% versus 15%).



FIG. 13C is a bar graph depicting the conservation analysis for orthologous miRNAs, which was performed in 5 mammal species for all miRNA reported in the miRBase database (miRBase-all) and for known and new mature miRNA identified in B cell libraries. The frequency of conserved miRNAs in each species is displayed.



FIG. 14 are photographs depicting the detection of miRNA by Northern Blot. FIG. 14A shows the detection of newly identified mature-miRNA species by Northern Blot in Ramos cell line, centroblasts (CB), and naïve B cells isolated from human tonsils. The naming of miRNA is provisional. FIG. 14B shows images displaying both the mature (20-25 nt) and the precursor (60-80 nt) miRNA species. miRNA expression can be regulated at transcriptional level (top panel) or at the processing level (bottom panel) when intermediate forms (pre-miRNA) are generated but are not fully processed to mature miRNA.



FIG. 15 is an image of microarray-based miRNA expression profiling that distinguishes developmental stages of normal as well as malignant B cells. FIG. 15A represents unsupervised clustering performed using miRNA frequency values (≧0.16) calculated as the fraction of the total pool of cloned miRNAs represented by a given miRNA in a library. FIG. 15A(1) and FIG. 15A(2) are joined at the hatched line (A-A) to comprise FIG. 15A. FIG. 15B shows that unsupervised clustering of microarray-based miRNA expression profiles distinguishes centroblasts, naïve and memory B cells purified from tonsil tissue of six patients each. FIG. 15B(1) and FIG. 15B(2) are joined at the hatched line (A-A) to comprise FIG. 15B.



FIG. 16 is an image of a microarray-based miRNA expression profiling of GC-derived lymphomas. Unsupervised clustering of miRNA expression profiles of Burkitt lymphomas (BL), follicular lymphomas (FL) and diffuse large B cell lymphomas (DLBCL).



FIG. 17 are graphs demonstrating the complexity of libraries. The curves represent the estimation of the numbers of mature miRNA expressed in each library. Discarding outliers (extreme 5%), the lowest and highest miRNA counts observed per library sample are plotted. The current set of predicted mature miRNAs represents more than 80% of the estimated miRNA set expressed in the libraries. FIG. 17(1) and FIG. 17(2) are joined at the hatched line (A-A) to comprise FIG. 17.



FIG. 18 is a flow chart depicting an overview of computational analysis for the short RNA libraries.



FIG. 19 is a graph of the predicted precursor and mature miRNAs. The number of predicted precursor miRNAs (pre-miR) and mature miRNAs (mature-miR) are plotted independently for each library and overall. Throughout the figures, the sequences matching miRNAs deposited in the miRBase database (v.11.0) are defined as “known,” and the sequences that to our knowledge have not been previously reported are named “new.”



FIG. 20 shows graphs and charts pertaining to the abundance and evolutionary conservation of the B-cell miRNome. FIG. 20A is a graph of the frequencies of previously reported (known) and to our knowledge newly identified (new) miRNAs as occurring in centroblasts, memory and Ramos cells. Single-occurrence miRNAs are not included. FIG. 20B are Venn diagrams showing the number of miRNAs cloned multiple times and identified in naïve, centroblast and memory B cells. A larger overlap is observed for known compared to new miRNA (48% versus 38%). FIG. 20C are bar graphs that show the conservation analysis for orthologous miRNAs that was performed in 5 mammal species for all miRNA reported in the miRBase database (miRBase-all) and for known and new mature miRNA expressed in the B-cell libraries. The percentages of miRNAs having perfect conservation either for the entire mature miRNA (top panel) or for its seed (bottom panel) are displayed.



FIG. 21 are photographs showing the detection of previously unreported miRNAs by RT-PCR and RNA blot. FIG. 21A shows representative results of RT-PCR detection of miRNA in Ramos cell line and tonsil cells. miR-30c was used as loading control. FIG. 21B shows the detection of mature-miRNA species by RNA blot in Ramos cell line, centroblasts (CB) and naïve B cells isolated from human tonsils. RNU44 was used as loading control. FIG. 21C are RNA blot images displaying both the mature (20-25 nt) and the precursor (60-80 nt) miRNA species. miRNA expression can be regulated at transcriptional level (top panel) or at the processing level (bottom panel) when intermediate forms (pre-miRNA) are generated but are not fully processed to mature miRNA. The naming of miRNAs is provisional.



FIG. 22 is a schematic that shows that miRNA expression profiling distinguishes developmental stages of normal as well as malignant B cells. FIG. 22A depicts unsupervised clustering performed using miRNA frequency values (≧0.08) calculated as the fraction of the total pool of cloned miRNAs represented by a given miRNA in a library. FIG. 22A(1) and FIG. 22A(2) are joined at the hatched line (A-A), FIG. 22A(2) and FIG. 22A(3) are joined at the hatched line (B-B), and FIG. 22A(3) and FIG. 22A(4) are joined at the hatched line (C-C) to comprise FIG. 22A. FIG. 22B shows that unsupervised clustering of microarray-based miRNA expression profiles distinguishes centroblasts, naïve and memory B cells purified from tonsil tissue of six patients each. FIG. 22B(1) and FIG. 22B(2) are joined at the hatched line (A-A) to comprise FIG. 22B.



FIG. 23 are graphs that show the complexity of the libraries. The curves represent the estimation of the numbers of mature miRNA (including single occurrence candidate miRNA) expressed in each library. Discarding outliers (extreme 5%), the lowest and highest miRNA counts observed per library sample are plotted. The current set of predicted mature miRNAs represents more than 85% of the estimated miRNA set expressed in the libraries.



FIG. 24 is a schematic that shows the computational analysis of short-RNA libraries. Short-RNA sequences were grouped in three main categories: miRNAs, short-RNAs of unknown function and short-RNAs not matching the human genome.



FIG. 25 is a bar graph that shows new miRNAs are found in association with Ago2 protein complex. New miRNAs as well as known (miR-16) are enriched in Ago2 compared to control IgG immunoprecipitates (IP). The binding is specific since other RNA species (5s rRNA) are not enriched in the Ago2 immunoprecipitates. Bars represent the internally normalized average of two independent qPCR assays, each from two sets of three pooled immunoprecipitations. Error bars are the standard deviation of the measurements. The 4 new miRNAs found to be associated with the Ago2 complex are representative of miRNA cloned at higher level in non-GC B cells (CU-1254), in GC B cells (CU-1403; CU-1276) or aberrantly over-expressed in Ramos Burkitt lymphoma cells (CU-1137).



FIG. 26 is a bar graph that shows enrichment for predicted miRNA targets in genes down-regulated in GC compared to Naïve B cells. Target prediction was performed for 15 new miRNAs expressed at higher level in GC compared to naïve B cells (>3 fold) by miRanda v1.0 and RNA22. Targets predicted by both algorithms were tested for enrichment in the down-regulated genes of the GC transcriptome. Eleven out of 15 GC-over-expressed miRNAs showed an increase in their candidate target enrichment p-value in GC vs naïve down-regulated genes compared to control populations (memory vs naïve), two showed a decrease and two showed no differences. Enrichment p-values are reported in Table 13.



FIG. 27 is a plot showing the correlation measurement between cloning and miRNA expression array data. miRNA normalized clone counts and average expressions measured by miRNA expression arrays are represented in a scatter plot format. The plot includes data for miRNAs which have been cloned more than once and were represented on the Agilent Human miRNA Microarray. Overall, this analysis includes 89 miRNA sequences distributed as follows in the three libraries: 54 in naïve, 80 in centroblasts (CB) and 48 in memory. The Spearman correlation is 0.7, corresponding to a p-value <3.9e-28.



FIG. 28 is a bar graph showing the analysis of single nucleotide mismatches identified in cloned known miRNA. The plot represents the percentage of short-RNAs corresponding to known miRNA and displaying mismatches to the human genome. Nucleotides in positions 1 to 3 starting from both the 5′- and 3′-ends, as well as all remaining middle nucleotides (nt middle) of each sRNA, were analyzed for single substitutions.





DETAILED DESCRIPTION OF THE INVENTION

This invention provides for the discovery of a large number of new micro RNAs that have been identified to exist in normal Human B cells and/or in tumor-related Human B cells, using an integrated bioinformatics method and pipeline described herein.


Micro RNAs (miRNAs) are naturally-occurring 19 to 25 nucleotide transcripts found in over one hundred distinct organisms (such as nematodes, fruit flies, and humans). miRNAs can be processed from 60- to 70-nucleotide foldback RNA precursor structures, which are transcribed from the miRNA gene. The miRNA precursor processing reaction requires Dicer RNase III and Argonaute family members (Sasaki et al., 2003 Genomics 82, 323-330). The miRNA precursor or processed miRNA products are easily detected, and an alteration in the levels of these molecules within a cell can indicate a perturbation in the chromosomal region containing the miRNA gene.


At least 222 separate miRNA genes have been identified in the human genome. For example, 2 miRNA genes (miR15a and miR16a) have been localized to a homozygously deleted region on chromosome 13 that is correlated with chronic lymphocytic leukemia (Calin et al. (2002), Proc. Natl. Acad. Sci. USA 99:15524-29). However, the distribution of miRNA genes throughout the genome, and the relationship of the miRNA genes to diverse chromosomal features, has not been systematically studied. A further review of miRNAs is provided in U.S. Pat. No. 7,232,806, U.S. Patent Application Publication No. 2006/0105360, and in the references: Landgraf et al., 2007, Cell 129: 1401-1414; Mendell, J T, 2005 Cell Cycle 4(9):1179-84; Shivdasani R A, 2006 Blood 108(12):3646-53; Hwang and Mendell, 2006 Br J Cancer 94(6):776-80; Hammond S M, 2006 Curr Opin Genet Dev. 16(1):4-9; Osada and Takahashi, 2007 Carcinogenesis 28(1):2-12; and Zhang et al., 2007 Dev Biol. 302(1):1-12, all of which are hereby incorporated by reference in their entirety.


All nucleic acid sequences herein are given in the 5′ to 3′ direction, for example the mature miRNA sequences listed in Table 1 (SEQ ID NOS: 1-401).


The unprocessed miRNA gene transcript is called a miRNA precursor (pre-miRNA) and comprises an RNA transcript of about 70 nucleotides in length. The pre-miRNA can be processed by digestion with an RNAse (such as Dicer, Argonaut, or RNAse III, e.g., E. coli RNAse III) into an active 19-25 nucleotide RNA molecule. This active 19-25 nucleotide RNA molecule is also called the processed miRNA gene transcript.


The active 19-25 nucleotide RNA molecule can be obtained from the miRNA precursor through natural processing routes (for example, using intact cells or cell lysates) or by synthetic processing routes (for example, using isolated processing enzymes, such as isolated Dicer, Argonaut, or RNAse III). The active 19-25 nucleotide RNA molecule can also be produced directly by biological or chemical syntheses, without having been processed from the miRNA precursor.


The invention provides for an isolated nucleic acid that: (a) consists of from about 14 to about 31 nucleotides in length; (b) exhibits expression in a human tissue; and (c) has a nucleotide sequence not present in an exon. In one embodiment, the isolated nucleic acid consists essentially of a nucleotide sequence selected from the group consisting of SEQ ID NOS: 1-401 and 1094, and a nucleotide sequence which is about 97%, about 98%, or about 99% identical to a nucleic acid sequence comprising any one of SEQ ID NOS: 1-401 and 1094. In some embodiments, the human tissue comprises a lymphocyte (for example, a human B cell). In other embodiments, the B cell comprises a Naïve B cell, a centroblast, or a memory B cell.
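By way of a non-limiting illustration, the length and percent-identity criteria recited above can be sketched in Python. The specification does not state how percent identity is computed, so this sketch assumes an ungapped, position-by-position comparison of two equal-length sequences. The function names and the example sequences are hypothetical and are not taken from the Sequence Listing.

```python
def percent_identity(a: str, b: str) -> float:
    """Ungapped percent identity between two equal-length sequences.

    Assumption: identity is scored position by position with no gaps;
    the specification does not prescribe a particular method.
    """
    if len(a) != len(b):
        raise ValueError("this sketch compares equal-length sequences only")
    if not a:
        raise ValueError("sequences must be non-empty")
    matches = sum(1 for x, y in zip(a.upper(), b.upper()) if x == y)
    return 100.0 * matches / len(a)


def in_length_range(seq: str, min_len: int = 14, max_len: int = 31) -> bool:
    """True if the sequence length falls within the 14 to 31 nt range."""
    return min_len <= len(seq) <= max_len


# Hypothetical 20-nt sequences differing at the final position only.
reference = "ACGUACGUACGUACGUACGU"
variant = "ACGUACGUACGUACGUACGA"

print(in_length_range(reference))            # length check
print(percent_identity(reference, variant))  # 19 of 20 positions match
```

A gapped alignment would be needed to compare sequences of unequal length; that case is outside this sketch.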


For example, an isolated nucleic acid, such as a miRNA of the invention, can be synthesized, or altered, or removed from the natural state through human intervention. A synthetic miRNA, or a miRNA partially or completely separated from the coexisting materials of its natural state, is considered isolated. An isolated miRNA can exist in substantially purified form, or can exist in a cell into which the miRNA has been delivered.


An isolated nucleic acid, such as a miRNA of the invention, can be obtained using a number of standard techniques utilized in the art. For example, the miRNA gene products can be chemically synthesized or recombinantly produced using methods known in the art. For example, a miRNA can be chemically synthesized using appropriately protected ribonucleoside phosphoramidites and a conventional DNA/RNA synthesizer. Commercial suppliers of synthetic RNA molecules or synthesis reagents include, e.g., Proligo (Hamburg, Germany), Dharmacon Research (Lafayette, Colo., USA), Rosetta Genomics (North Brunswick, N.J.), Pierce Chemical (part of Perbio Science, Rockford, Ill., USA), Glen Research (Sterling, Va., USA), ChemGenes (Ashland, Mass., USA), Ambion (Foster City, Calif., USA), and Cruachem (Glasgow, UK).


miRNA can also be expressed from recombinant circular or linear DNA plasmids using any suitable promoter. Suitable promoters for expressing RNA from a plasmid include, e.g., the U6 or H1 RNA pol III promoter sequences, or the cytomegalovirus promoters. Selection of other suitable promoters is within the skill in the art. Recombinant plasmids can comprise inducible or regulatable promoters for expression of the miRNA in cancer cells (such as hematopoietic cells, i.e., B cells). For example, a miRNA or a precursor miRNA of the invention (such as a miRNA molecule comprising any one of SEQ ID NOS: 1-401 and 1094) can be placed under the control of the CMV immediate-early promoter, whereby the nucleic acid sequences encoding the miRNA molecule are located 3′ of the promoter, so that the promoter can initiate transcription of the miRNA gene product coding sequences.


miRNAs expressed from recombinant plasmids can be isolated from cultured cell expression systems by standard techniques. miRNAs which are expressed from recombinant plasmids can also be delivered to, and expressed directly in, the cancer cells. A miRNA can be expressed as an RNA precursor molecule from a single plasmid, and the precursor molecules are subsequently processed into functional miRNAs by a suitable processing system, including the processing systems naturally existing within a cell. Other suitable processing systems include, e.g., the in vitro Drosophila cell lysate system as described in U.S. Application Publication No. 2002/0086356 to Tuschl et al. and the E. coli RNAse III system described in U.S. Application Publication No. 2004/0014113 to Yang et al., which are herein incorporated by reference in their entireties.


Plasmids suitable for expressing a miRNA of the invention, methods for inserting nucleic acid sequences into the plasmid to express the miRNA of interest, and methods of delivering the recombinant plasmid to cells of interest are well-established and practiced in the art. See, for example, Zeng et al. (2002), Molecular Cell 9:1327-1333; Tuschl (2002), Nat. Biotechnol, 20:446-448; Brummelkamp et al. (2002), Science 296:550-553; Miyagishi et al. (2002), Nat. Biotechnol. 20:497-500; Paddison et al. (2002), Genes Dev. 16:948-958; Lee et al. (2002), Nat. Biotechnol. 20:500-505; and Paul et al. (2002), Nat. Biotechnol. 20:505-508, the entire disclosures of which are herein incorporated by reference.


miRNA molecules of the invention can also be expressed from recombinant viral vectors. The RNA expressed from the recombinant viral vectors can either be isolated from cultured cell expression systems by standard techniques, or can be expressed directly in cancer cells (such as hematopoietic cells, i.e., B cells). For example, the recombinant viral vectors can comprise sequences that encode the miRNA molecule of interest and any suitable promoter for expressing the RNA sequences. Vectors can also comprise inducible or regulatable promoters for expression of the miRNA molecule in cells, such as cancer cells. As discussed previously, non-limiting examples of suitable promoters include the U6 or H1 RNA pol III promoter sequences, or the cytomegalovirus promoters. Selection of other suitable promoters is practiced by those of ordinary skill in the art.


Any viral vector that can harbor the nucleotide sequences for the miRNA molecules of the invention can be used. Non-limiting examples of such vectors include: vectors derived from adenovirus (AV); adeno-associated virus (AAV); retroviruses (e.g., lentiviruses (LV), Rhabdoviruses, murine leukemia virus); herpes virus, and the like. The tropism of the viral vectors can be modified by pseudotyping the vectors with envelope proteins or other surface antigens from other viruses, or by substituting different viral capsid proteins, as appropriate. For example, lentiviral vectors can be pseudotyped with surface proteins from vesicular stomatitis virus (VSV), rabies, Ebola, Mokola, and the like. For example, AAV vectors can be made to target different cells by engineering the vectors to express different capsid protein serotypes. An AAV vector expressing a serotype 2 capsid on a serotype 2 genome is called AAV 2/2. This serotype 2 capsid gene in the AAV 2/2 vector can be replaced by a serotype 5 capsid gene to produce an AAV 2/5 vector. Techniques for constructing AAV vectors which express different capsid protein serotypes are within the skill in the art; see, e.g., Rabinowitz J. E. et al. (2002), J Virol 76:791-801, the entire disclosure of which is herein incorporated by reference.


Recombinant viral vectors suitable for expressing miRNA molecules of the invention, methods for inserting nucleic acid sequences for expressing RNA in the vector, methods of delivering the viral vector to cells of interest, and recovery of the expressed RNA molecules are within the skill in the art. See, for example, Dornburg (1995), Gene Therap. 2:301-310; Eglitis (1988), Biotechniques 6:608-614; Miller (1990), Hum. Gene Therap. 1:5-14; and Anderson (1998), Nature 392:25-30, the entire disclosures of which are herein incorporated by reference. Useful viral vectors can be those derived from AV and AAV. A suitable AV vector for expressing a miRNA molecule of the invention, a method for constructing the recombinant AV vector, and a method for delivering the vector into target cells, are described in Xia et al. (2002), Nat. Biotech. 20:1006-1010, the entire disclosure of which is herein incorporated by reference. Suitable AAV vectors for expressing a miRNA molecule having a sequence shown in Table 1 (i.e., any one of SEQ ID NOS: 1-401), methods for constructing the recombinant AAV vector, and methods for delivering the vectors into target cells are described in Samulski et al. (1987), J. Virol. 61:3096-3101; Fisher et al. (1996), J. Virol., 70:520-532; Samulski et al. (1989), J. Virol. 63:3822-3826; U.S. Pat. No. 5,252,479; U.S. Pat. No. 5,139,941; International Patent Application No. WO 94/13788; and International Patent Application No. WO 93/24641, the entire disclosures of which are herein incorporated by reference.


Inhibition of RNA can effectively inhibit expression of a gene from which the RNA is transcribed. Inhibitors are selected from the group comprising: siRNA; interfering RNA or RNAi; dsRNA; RNA Polymerase III transcribed DNAs; ribozymes; and antisense nucleic acid, which can be RNA, DNA, or artificial nucleic acid. Also within the scope of the present invention are oligonucleotide sequences that include antisense oligonucleotides, antagomirs (also referred to as miRNA inhibitory nucleic acids), aptamers, and ribozymes that function to inhibit miRNA expression via purportedly binding to or degrading a miRNA molecule comprising any one of SEQ ID NOS: 1-401 and 1094.


The invention provides for a nucleic acid molecule that is substantially complementary to an isolated nucleic acid of the invention described above. “Substantially complementary” means that two sequences are sufficiently complementary that a duplex can be formed between them. The duplex can have one or more mismatches, but the region of duplex formation is sufficient to down-regulate expression of the target nucleic acid. The region of substantial complementarity can be perfectly paired. In one embodiment, there can be nucleotide mismatches in the region of substantial complementarity. In one embodiment, the region of substantial complementarity will have no more than 1, 2, 3, 4, or 5 mismatches.
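As a non-limiting illustration of the mismatch criterion described above, the following Python sketch counts mismatches between a candidate strand and the perfect reverse complement of a target. It assumes strict Watson-Crick pairing (G-U wobble pairs are counted as mismatches), an ungapped antiparallel duplex, and strands of equal length. The function names, the default threshold of 5, and the example sequences are hypothetical.

```python
# Watson-Crick pairing for RNA; G-U wobble is deliberately not treated
# as a match in this sketch (an assumption, not a statement of the text).
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence, written 5' to 3'."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(rna.upper()))


def count_mismatches(target: str, candidate: str) -> int:
    """Count positions where candidate fails to pair with target."""
    if len(target) != len(candidate):
        raise ValueError("this sketch compares equal-length strands only")
    perfect = reverse_complement(target)
    return sum(1 for p, c in zip(perfect, candidate.upper()) if p != c)


def is_substantially_complementary(target: str, candidate: str,
                                   max_mismatches: int = 5) -> bool:
    """True if the duplex has no more than max_mismatches mismatches."""
    return count_mismatches(target, candidate) <= max_mismatches


# Hypothetical 14-nt target and two candidate strands.
target = "AAGGCCUUAGCUAG"
perfect_partner = reverse_complement(target)   # fully paired
two_mismatch_partner = "CUAGCUAAGGCCAA"        # last two bases altered

print(count_mismatches(target, perfect_partner))
print(count_mismatches(target, two_mismatch_partner))
print(is_substantially_complementary(target, two_mismatch_partner,
                                     max_mismatches=1))
```

Whether a given number of mismatches still permits down-regulation of a target is an experimental question; the threshold here only mirrors the numeric limits recited in the text.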


For example, a nucleic acid molecule (such as an antagomir, an antisense RNA, a small interfering RNA (siRNA), a short hairpin RNA (shRNA), and the like) can be complementary to the guide strand of a miRNA having a nucleotide sequence shown in Table 1, positioned in the RNA silencing complex. This nucleic acid molecule can be single stranded or can be double stranded, and can inhibit the expression or activity of a miRNA molecule of the invention. In one embodiment, the nucleic acid molecule that inhibits a miRNA molecule of the invention (such as those described above) can complement at least 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, or 31 contiguous nucleotides of a miRNA having a nucleic acid sequence selected from the group consisting of SEQ ID NOS: 1-401 and 1094, and a nucleotide sequence which is about 97%, about 98%, or about 99% identical to a nucleic acid sequence comprising any one of SEQ ID NOS: 1-401 and 1094.


The invention also provides a method for modulating a target nucleic acid in a cell (for example, a miRNA molecule having a nucleotide sequence comprising any one of SEQ ID NOS: 1-401 and 1094) via contacting the cell with a nucleic acid of the invention (for example, those described above). For example, the nucleic acid can be substantially complementary to at least 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, or 31 contiguous nucleotides of a nucleic acid sequence selected from the group consisting of SEQ ID NOS: 1-401 and 1094, and a nucleotide sequence which is about 97%, about 98%, or about 99% identical to a nucleic acid sequence comprising any one of SEQ ID NOS: 1-401 and 1094.
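
As a non-limiting sketch, the "at least 14 contiguous nucleotides" criterion can be checked by finding the longest stretch of a candidate nucleic acid that is perfectly complementary to a contiguous stretch of the miRNA. The function names are hypothetical, only Watson-Crick pairs are considered, and the example sequences used for testing are arbitrary and are not taken from the Sequence Listing.

```python
# Illustrative sketch only: names are assumptions, not part of the disclosure.
RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the Watson-Crick reverse complement of an RNA sequence (5'->3')."""
    return "".join(RNA_PAIR[base] for base in reversed(rna.upper()))


def longest_complementary_run(mirna: str, inhibitor: str) -> int:
    """Length of the longest contiguous, perfectly complementary stretch.

    Computed as the longest common substring between the inhibitor and the
    reverse complement of the miRNA, using row-by-row dynamic programming.
    """
    rc = reverse_complement(mirna)
    inhibitor = inhibitor.upper()
    best = 0
    prev = [0] * (len(inhibitor) + 1)
    for i in range(1, len(rc) + 1):
        cur = [0] * (len(inhibitor) + 1)
        for j in range(1, len(inhibitor) + 1):
            if rc[i - 1] == inhibitor[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def meets_contiguity_threshold(mirna: str, inhibitor: str,
                               min_run: int = 14) -> bool:
    """True if at least `min_run` contiguous nucleotides are complementary."""
    return longest_complementary_run(mirna, inhibitor) >= min_run
```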


Expression of a miRNA molecule of the invention can be inhibited by an antisense oligonucleotide. Antisense oligonucleotides can comprise antisense DNA, RNA, and DNA/RNA molecules and act via altering the activity of the target RNA by binding to a target nucleic acid (such as a miRNA of interest) by means of RNA-RNA, RNA-DNA or RNA-PNA (peptide nucleic acid) interactions (for a review, see Stein and Cheng, 1993 Science 261, 1004 and Woolf et al., U.S. Pat. No. 5,849,902). Antisense oligonucleotides suitable for use in the present methods are single-stranded nucleic acids (e.g., RNA, DNA, RNA-DNA chimeras, PNA) that generally comprise a nucleic acid sequence complementary to a contiguous nucleic acid sequence in a miRNA molecule. For example, the antisense oligonucleotide comprises a nucleic acid sequence that is 50-100% complementary, 75-100% complementary, or 95-100% complementary to a contiguous nucleic acid sequence in a miRNA molecule of the invention having a nucleic acid sequence of SEQ ID NO: 1-401, shown in Table 1. However, in some instances, an antisense molecule can form a loop and bind to a substrate nucleic acid which forms a loop. Thus, an antisense molecule can be complementary to two (or more) non-contiguous substrate sequences, or two (or more) non-contiguous sequence portions of an antisense molecule can be complementary to a target sequence, or both. For a review of current antisense strategies, see Schmajuk et al., 1999, J. Biol. Chem., 274, 21783-21789; Delihas et al., 1997, Nature, 15, 751-753; Stein et al., 1997, Antisense Nucleic Acid Drug Dev., 7, 151; Crooke, 2000, Methods Enzymol., 313, 3-45; Crooke, 1998, Biotech. Genet. Eng. Rev., 15, 121-157; Crooke, 1997, Adv. Pharmacol., 40, 1-49.


Antisense DNA can also be used to target nucleic acid by means of DNA-RNA interactions, thereby activating RNase H, which digests the target nucleic acid in the duplex. The antisense oligonucleotides can comprise one or more RNase H activating regions, each of which is capable of activating RNase H to cleave a target nucleic acid. Antisense DNA can be synthesized chemically or expressed via the use of a single stranded DNA expression vector or equivalent thereof. An RNase H activating region refers to a region (generally greater than or equal to 4-25 nucleotides in length, for example, from 5-11 nucleotides in length) of a nucleic acid compound capable of binding to a target nucleic acid to form a non-covalent complex that is recognized by cellular RNase H enzyme (see for example Arrow et al., U.S. Pat. No. 5,849,902; Arrow et al., U.S. Pat. No. 5,989,912). The RNase H enzyme binds to a nucleic acid compound-target nucleic acid complex and cleaves the target nucleic acid sequence.


Antisense nucleic acids can be produced chemically or biologically, or can be expressed from a recombinant plasmid or viral vector, as described above for the isolated miRNA molecules. For example, antisense oligonucleotides of at least about 15 bases and complementary to unique regions of the miRNA molecules of the invention can be synthesized, e.g., by conventional phosphodiester techniques (Dallas et al., (2006) Med. Sci. Monit. 12(4):RA67-74; Kalota et al., (2006) Handb. Exp. Pharmacol. 173:173-96; Lutzelburger et al., (2006) Handb. Exp. Pharmacol. 173:243-59). Exemplary methods for producing and testing are within the skill in the art; see, e.g., Stein and Cheng (1993), Science 261:1004 and U.S. Pat. No. 5,849,902 to Woolf et al., the entire disclosures of which are herein incorporated by reference.


Antisense polynucleotides include, but are not limited to: morpholinos, 2′-O-methyl polynucleotides, DNA, RNA and the like. RNA polymerase III transcribed DNAs contain promoters, such as the U6 promoter. These DNAs can be transcribed to produce small hairpin RNAs in the cell that can function as siRNA or linear RNAs that can function as antisense RNA. The inhibitor can be polymerized in vitro, recombinant RNA, contain chimeric sequences, or derivatives of these groups. The inhibitor can contain ribonucleotides, deoxyribonucleotides, synthetic nucleotides, or any suitable combination such that the target RNA and/or gene is inhibited. In addition, these forms of nucleic acid can be single, double, triple, or quadruple stranded (see for example Bass (2001) Nature, 411, 428-429; Elbashir et al., (2001) Nature, 411, 494-498; and PCT Publication Nos. WO 00/44895, WO 01/36646, WO 99/32619, WO 00/01846, WO 01/29058, WO 99/07409, WO 00/44914).


siRNA comprises a double stranded structure that can contain 15 to 50 base pairs, or 21 to 25 base pairs, and has a nucleotide sequence identical or nearly identical to an expressed target gene or RNA within the cell. The siRNA comprises a sense RNA strand and a complementary antisense RNA strand annealed together by standard Watson-Crick base-pairing interactions. The sense strand comprises a nucleic acid sequence which is substantially identical to a nucleic acid sequence contained within the target miRNA molecule. “Substantially identical” to a target sequence contained within the target mRNA refers to a nucleic acid sequence that is identical to the target sequence, or that differs from the target sequence by one or two nucleotides. The sense and antisense strands of the siRNA can comprise two complementary, single-stranded RNA molecules, or can comprise a single molecule in which two complementary portions are base-paired and are covalently linked by a single-stranded “hairpin” area.


The siRNA can also be altered RNA that differs from naturally-occurring RNA by the addition, deletion, substitution and/or alteration of one or more nucleotides. Such alterations can include addition of non-nucleotide material, such as to the end(s) of the siRNA or to one or more internal nucleotides of the siRNA, or modifications that make the siRNA resistant to nuclease digestion, or the substitution of one or more nucleotides in the siRNA with deoxyribonucleotides. One or both strands of the siRNA can also comprise a 3′ overhang. As used herein, a 3′ overhang refers to at least one unpaired nucleotide extending from the 3′-end of a duplexed RNA strand. For example, the siRNA can comprise at least one 3′ overhang of from 1 to about 6 nucleotides (which includes ribonucleotides or deoxyribonucleotides) in length, or from 1 to about 5 nucleotides in length, or from 1 to about 4 nucleotides in length, or from about 2 to about 4 nucleotides in length. For example, each strand of the siRNA can comprise 3′ overhangs of dithymidylic acid (“TT”) or diuridylic acid (“uu”).
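
The duplex geometry described above (a base-paired core with a 3′ overhang on each strand) can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the length checks simply mirror the 15 to 50 base pair and 1 to about 6 nucleotide ranges recited above, and chemical modifications are not modeled.

```python
# Illustrative sketch only: names and validation rules are assumptions.
RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the Watson-Crick reverse complement of an RNA sequence (5'->3')."""
    return "".join(RNA_PAIR[base] for base in reversed(rna.upper()))


def design_sirna(target_rna: str, overhang: str = "UU") -> dict:
    """Build sense and antisense strands (both 5'->3') for a target region.

    The sense core is identical to the target region, the antisense core is
    its reverse complement, and the same 3' overhang is appended to each.
    """
    core = target_rna.upper()
    if not 15 <= len(core) <= 50:
        raise ValueError("duplex core must be 15 to 50 nucleotides long")
    if not 1 <= len(overhang) <= 6:
        raise ValueError("3' overhang must be 1 to 6 nucleotides long")
    return {
        "sense": core + overhang,
        "antisense": reverse_complement(core) + overhang,
    }
```

For a 21-nucleotide target region and the default "UU" overhang, each returned strand is 23 nucleotides long.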


siRNA can be produced chemically or biologically, or can be expressed from a recombinant plasmid or viral vector, as described above for the miRNA molecules of the invention having a sequence shown in Table 1. Exemplary methods for producing and testing dsRNA or siRNA molecules are described in U.S. Patent Application Publication No. 2002/0173478 to Gewirtz, U.S. Patent Application Publication No. 2007/0072204 to Hannon et al., and in U.S. Patent Application Publication No. 2004/0018176 to Reich et al., the entire disclosures of which are herein incorporated by reference.


Expression of a miRNA molecule of the invention can also be inhibited by a short hairpin RNA (shRNA). The hairpin RNAs can be synthesized exogenously or can be formed by transcribing from RNA polymerase III promoters in vivo. Examples of making and using such hairpin RNAs for gene silencing in mammalian cells are described in, for example, Paddison et al., 2002, Genes Dev, 16:948-58; McCaffrey et al., 2002, Nature, 418:38-9; McManus et al., 2002, RNA, 8:842-50; Yu et al., 2002, Proc Natl Acad Sci USA, 99:6047-52. Such hairpin RNAs are engineered in cells or in an animal to ensure continuous and stable suppression of a desired gene. It is known in the art that siRNAs can be produced by processing a hairpin RNA in the cell.


Expression of a miRNA molecule of the invention can also be inhibited by a ribozyme. Ribozymes are enzymatic RNA molecules capable of catalyzing the specific cleavage of RNA. The mechanism of ribozyme action involves sequence specific hybridization of the ribozyme molecule to complementary target RNA sequences (for example those shown in Table 1), followed by endonucleolytic cleavage. Engineered hammerhead motif ribozyme molecules that specifically and efficiently catalyze endonucleolytic cleavage of a miRNA sequence shown in Table 1, are also within the scope of the present invention. Scanning the target molecule for ribozyme cleavage sites that include the following sequences, GUA, GUU, and GUC initially identifies specific ribozyme cleavage sites within any potential RNA target. Once identified, short RNA sequences of between about 15 and 20 ribonucleotides corresponding to the region of the target gene containing the cleavage site can be evaluated for predicted structural features such as secondary structure that can render the oligonucleotide sequence unsuitable.
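
The scanning step described above can be sketched in a few lines of Python. The function names and the default flank width are assumptions for illustration; the sketch only locates GUA, GUU, and GUC triplets and extracts a surrounding window, and does not perform the secondary-structure evaluation mentioned above.

```python
import re

# Zero-width lookahead so that overlapping triplets would also be reported.
CLEAVAGE_SITE = re.compile(r"(?=(GU[AUC]))")


def find_cleavage_sites(rna: str) -> list:
    """Return (0-based position, triplet) for each GUA, GUU, or GUC site."""
    rna = rna.upper().replace("T", "U")  # tolerate DNA-alphabet input
    return [(m.start(), m.group(1)) for m in CLEAVAGE_SITE.finditer(rna)]


def site_window(rna: str, position: int, flank: int = 7) -> str:
    """Return the triplet at `position` plus up to `flank` bases on each side.

    With the assumed default flank of 7, a full window is 17 nucleotides,
    which falls within the 15 to 20 ribonucleotide range recited above.
    """
    start = max(0, position - flank)
    return rna[start:position + 3 + flank]
```

For the arbitrary input "AAGUCGGGUAUU", the scan reports a GUC site at position 2 and a GUA site at position 7.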


The suitability of candidate targets can also be evaluated by testing their accessibility to hybridization with complementary oligonucleotides using, e.g., ribonuclease protection assays (see Romkes et al., 2005, Methods Mol. Biol.; 291:387-98; Dvorak et al., 2003, Biomed Pap Med Fac Univ Palacky Olomouc Czech Repub 147(2):131-5). The construction and production of hammerhead ribozymes is well known in the art and is described more fully in Haseloff and Gerlach, 1988, Nature, 334:585-591. Ribozymes can also include RNA endoribonucleases such as the one which occurs naturally in Tetrahymena thermophila (known as the IVS or L-19 IVS RNA) and which has been described (see, e.g., Zaug, et al., 1984, Science, 224:574-578; Zaug and Cech, 1986, Science, 231:470-475; Zaug, et al., 1986, Nature, 324:429-433; published International patent application No. WO88/04300 by University Patents Inc.; Been and Cech, 1986, Cell, 47:207-216).


Both the antisense oligonucleotides and ribozymes of the present invention can be prepared by known methods. These include techniques for chemical synthesis such as, e.g., solid phase phosphoramidite chemical synthesis. Alternatively, antisense RNA molecules can be generated by in vitro or in vivo transcription of DNA sequences encoding the RNA molecule. Such DNA sequences can be incorporated into a wide variety of vectors that incorporate suitable RNA polymerase promoters such as the T7 or SP6 polymerase promoters.


Alternatively, expression of a miRNA molecule of the invention can be inhibited by an antagomir. An antagomir is a single-stranded, double stranded, partially double stranded or hairpin structured chemically modified oligonucleotide agent that comprises at least 12 or more contiguous nucleotides substantially complementary to an endogenous miRNA or agents that include 12 or more contiguous nucleotides substantially complementary to a target sequence of an miRNA or pre-miRNA nucleotide sequence. The antagomir can be RNA, DNA, or a combination of RNA and DNA, and is antisense with respect to its target nucleotide sequence. An antagomir can target RNA, e.g., an endogenous pre-miRNA or miRNA of the subject. For example, the antagomir can target a miRNA having a nucleic acid sequence shown in Table 1. Exemplary methods for producing and testing antagomirs are discussed in U.S. Patent Application Publication No. 2007/0123482 and U.S. Patent Application Publication No. 005/0182005, in addition to Mattes et al., 2007 Am J Resp Cell Mol Biol 36: 8-12; Krützfeldt et al., 2007 Nuc Acid Res 35(9): 2885-2892, which are all incorporated by reference in their entireties.


Various modifications to the nucleic acid molecules of the present invention can be introduced as a means of increasing intracellular stability and half-life. Some modifications include but are not limited to the addition of flanking sequences of ribonucleotides or deoxyribonucleotides to the 5′ and/or 3′ ends of the molecule, or the use of phosphorothioate or 2′-O-methyl rather than phosphodiester linkages within the oligonucleotide backbone.


Expression of a miRNA molecule of the invention can be inhibited by an aptamer. Aptamer nucleic acid sequences are readily made that bind to a wide variety of target molecules. The aptamer nucleic acid sequences of the invention can be comprised entirely of RNA or partially of RNA, or entirely or partially of DNA and/or other nucleotide analogs. A nucleic acid aptamer is a nucleic acid or a nucleic acid-like molecule that is capable of binding to a specific molecule of interest with high affinity and specificity. A nucleic acid aptamer also can be a nucleic acid molecule that mimics the three dimensional structure of active portions of miRNAs. A nucleic acid-aptamer can be between about 9 and about 300 nucleotides or the like in length. More commonly, an aptamer is between about 30 and about 100 nucleotides or the like in length.


Aptamers are developed to bind specific ligands by employing known in vivo or in vitro selection techniques known as SELEX (Systematic Evolution of Ligands by Exponential Enrichment). Nucleic acid-aptamers can be prepared by any known method, including synthetic, recombinant, and purification methods. Such methods are described in, for example, Ellington and Szostak (1990) Nature 346:818, Tuerk and Gold (1990) Science 249:505, James W., (2001) Current Opinion in Pharmacology, 1:540-546, Colas et al., (1996) Nature 380:548-550, U.S. Pat. No. 5,582,981; PCT Publication No. WO 00/20040; U.S. Pat. No. 5,270,163; Lorsch and Szostak (1994) Biochem. 33:973; Mannironi et al., (1997) Biochem. 36:9726; Blind (1999) Proc. Nat'l. Acad. Sci. USA 96:3606-3610; Huizenga and Szostak (1995) Biochem. 34:656-665; PCT Publication Nos. WO 99/54506, WO 99/27133, and WO 97/42317; and U.S. Pat. No. 5,756,291, all of which are incorporated by reference in their entireties.


Expression of a given miRNA molecule can be inhibited or decreased by inducing RNA interference of the miRNA molecule with an isolated double-stranded or single-stranded RNA molecule. For example, the miRNA inhibitor molecule can be one of those molecules discussed above (such as an antagomir, an antisense RNA, a small interfering RNA (siRNA), a short hairpin RNA (shRNA), and the like) which has at least about 75%, 80%, 90%, 95%, 98%, 99% or 100% sequence homology to a portion of a miRNA molecule having a sequence shown in Table 1. For further discussion of modulation of miRNA expression, see: Lu et al., (2005) Adv Genet. 54:117-42; Leung and Whittaker (2005) Pharmacol Ther. 107(2):222-39; Takeshita and Ochiya (2006) Cancer Sci. 97(8):689-96; and Alexander et al., (2007) Arch Immunol Ther Exp (Warsz). 2007 May-June; 55(3):139-49.


A subject in need thereof, according to the invention, can refer to any animal classified as a mammal, including humans, domestic and farm animals, and zoo, sports, or pet animals. In some embodiments, the subject can be a mouse, a rat, a bird, a dog, a cat, a cow, a horse, a sheep, or a pig. In exemplary embodiments, a mammal is a human.


Methods for determining RNA expression levels in cells from a biological sample are within the level of skill in the art. For example, a tissue sample can be removed from a subject suspected of having cancer associated with a cancer-associated chromosomal feature by conventional biopsy techniques. In another example, a blood sample can be removed from the subject, and white blood cells isolated for DNA extraction by standard techniques. The blood or tissue sample should be obtained from the subject prior to initiation of radiotherapy, chemotherapy or other therapeutic treatment. A corresponding control tissue or blood sample can be obtained from unaffected tissues of the subject, from a normal human individual or population of normal individuals, or from cultured cells corresponding to the majority of cells in the subject's sample. The control tissue or blood sample is then processed along with the sample from the subject, so that the levels of miRNA molecules in cells from the subject's sample can be compared to the corresponding miRNA molecule levels from cells of the control sample. For example, the relative miRNA expression in the control and normal samples can be conveniently determined with respect to one or more RNA expression standards. The standards can comprise, for example, a zero miRNA expression level, the miRNA expression level in a standard cell line, or the average level of miRNA expression previously obtained for a population of normal human controls.


Suitable techniques for determining the level of RNA transcripts of a gene of interest in cells are within the skill in the art. According to one such method, total cellular RNA can be purified from cells by homogenization in the presence of nucleic acid extraction buffer, followed by centrifugation. Nucleic acids are precipitated, and DNA is removed by treatment with DNase and precipitation. The RNA molecules are then separated by gel electrophoresis on agarose gels according to standard techniques, and transferred to nitrocellulose filters by, e.g., the so-called “Northern” blotting technique. The RNA is then immobilized on the filters by heating. Detection and quantification of specific RNA is accomplished using appropriately labeled DNA or RNA probes complementary to the RNA in question. See, for example, Molecular Cloning: A Laboratory Manual, J. Sambrook et al., eds., 2nd edition, Cold Spring Harbor Laboratory Press, 1989, Chapter 7, the entire disclosure of which is incorporated by reference.


Suitable probes for Northern blot hybridization of a given miRNA molecule can be produced from the nucleic acid sequences provided in Table 1. Methods for preparation of labeled DNA and RNA probes, and the conditions for hybridization thereof to target nucleotide sequences, are described in Molecular Cloning: A Laboratory Manual, J. Sambrook et al., eds., 2nd edition, Cold Spring Harbor Laboratory Press, 1989, Chapters 10 and 11, the disclosures of which are herein incorporated by reference. For example, the nucleic acid probe can be labeled with, e.g., a radionuclide such as 3H, 32P, 33P, 14C, or 35S; a heavy metal; or a ligand capable of functioning as a specific binding pair member for a labeled ligand (e.g., biotin, avidin or an antibody), a fluorescent molecule, a chemiluminescent molecule, an enzyme or the like.


Probes can be labeled to high specific activity by either the nick translation method of Rigby et al. (1977), J. Mol. Biol. 113:237-251 or by the random priming method of Feinberg et al. (1983), Anal. Biochem. 132:6-13, the entire disclosures of which are herein incorporated by reference. Feinberg et al. provides a useful method for synthesizing 32P-labeled probes of high specific activity from single-stranded DNA or from RNA templates. For example, by replacing preexisting nucleotides with highly radioactive nucleotides according to the nick translation method, 32P-labeled nucleic acid probes can be prepared with a specific activity well in excess of 10^8 cpm/microgram. Autoradiographic detection of hybridization can then be performed by exposing hybridized filters to photographic film. Densitometric scanning of the photographic films exposed by the hybridized filters provides an accurate measurement of miRNA molecule levels. Using another approach, miRNA molecule levels can be quantified by computerized imaging systems, such as the Molecular Dynamics 400-B 2D Phosphorimager available from Amersham Biosciences, Piscataway, N.J.


Where radionuclide labeling of DNA or RNA probes is not practical, the random-primer method can be used to incorporate an analogue, for example, the dTTP analogue 5-(N-(N-biotinyl-epsilon-aminocaproyl)-3-aminoallyl)deoxyuridine triphosphate, into the probe molecule. The biotinylated probe oligonucleotide can be detected by reaction with biotin-binding proteins, such as avidin, streptavidin, and antibodies (e.g., anti-biotin antibodies) coupled to fluorescent dyes or enzymes that produce color reactions.


In addition to Northern and other RNA blotting hybridization techniques, determining the levels of RNA transcripts can be accomplished using the technique of in situ hybridization. This technique requires fewer cells than the Northern blotting technique, and involves depositing whole cells onto a microscope cover slip and probing the nucleic acid content of the cell with a solution containing radioactive or otherwise labeled nucleic acid (e.g., cDNA or RNA) probes. This technique is well-suited for analyzing tissue biopsy samples from subjects. The practice of the in situ hybridization technique is described in more detail in U.S. Pat. No. 5,427,916, the entire disclosure of which is incorporated herein by reference. Suitable probes for in situ hybridization of a given miRNA molecule can be produced from the nucleic acid sequences provided in Table 1, as described above.


The relative number of miRNA transcripts in cells can also be determined by reverse transcription of miRNA transcripts, followed by amplification of the reverse-transcribed transcripts by polymerase chain reaction (RT-PCR). The levels of miRNA gene transcripts can be quantified in comparison with an internal standard, for example, the level of mRNA from a housekeeping gene present in the same sample, such as myosin or glyceraldehyde-3-phosphate dehydrogenase (G3PDH). The methods for quantitative RT-PCR and variations thereof are within the skill in the art.
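
One common way to express quantification against an internal standard is the comparative threshold-cycle calculation, sketched below. The passage above does not specify a calculation; the 2^(-ΔΔCt) formula, the assumption of approximately 100% amplification efficiency, and all function and parameter names are assumptions introduced for illustration.

```python
def relative_expression(ct_mirna_sample: float, ct_ref_sample: float,
                        ct_mirna_control: float, ct_ref_control: float) -> float:
    """Fold change of a miRNA in a sample relative to a control.

    Each Ct is a threshold cycle from quantitative RT-PCR. The reference
    values come from the internal standard (e.g., a housekeeping gene)
    measured in the same sample. Assumes the template doubles every cycle.
    """
    delta_sample = ct_mirna_sample - ct_ref_sample      # normalize the sample
    delta_control = ct_mirna_control - ct_ref_control   # normalize the control
    return 2.0 ** -(delta_sample - delta_control)
```

For example, with hypothetical threshold cycles of 24 (miRNA) and 18 (standard) in the sample and 26 and 18 in the control, the normalized difference is two cycles, corresponding to a four-fold higher level in the sample.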


It is desirable to simultaneously determine the expression level of a plurality of different miRNA molecules in a sample, for example to determine the expression level of the transcripts of known miRNAs correlated with cancer or other cell division disorders (for example, a hematopoietic cell division disorder, such as a B cell lymphoma). Since examining cancer-specific expression levels for hundreds of miRNA molecules is time consuming, requires a large amount of total RNA (at least 20 μg for each Northern blot) and utilizes autoradiographic techniques that require radioactive isotopes, an oligolibrary in microchip format can be constructed containing a set of probe oligonucleotides specific for a set of miRNA molecules (for example, miRNA molecules having any one nucleic acid sequence of SEQ ID NOS: 1-401, shown in Table 1, or any one miRNA molecule of Table 7, Table 9, or Table 10).


A nucleic acid microchip array is a plurality of probe elements, each probe element comprising one or more nucleic acid molecules immobilized on one or more solid surfaces to which sample nucleic acids can be hybridized. Microarrays are known in the art and comprise a surface to which probes that correspond in sequence to gene products (e.g., cDNAs, mRNAs, cRNAs, polypeptides, and fragments thereof), can be specifically hybridized or bound at a known position. The microarray can be an array (i.e., a matrix) in which each position represents a discrete binding site for an RNA, and in which binding sites are present for products of most of the genes in the organism's genome, or a specific tissue or cellular subset of the organism. Here, the binding site can be a nucleic acid or nucleic acid analogue to which a cognate cDNA or RNA, such as a miRNA molecule of the invention, can specifically hybridize. The nucleic acid or analogue of the binding site can be, e.g., a synthetic miRNA oligomer.


The nucleic acid or analogue is attached to a solid support, which can be made from glass, plastic (e.g., polypropylene, nylon), polyacrylamide, nitrocellulose, or other materials. A useful method for attaching the nucleic acids to a surface is by printing on glass plates, as is described generally by Schena et al., 1995. See also DeRisi et al., 1996; Shalon et al., 1996; Schena et al., 1996. Each of these articles is incorporated by reference in its entirety.


The microchip is prepared from gene-specific oligonucleotide probes generated from known miRNAs. A nucleic acid array can contain two different oligonucleotide probes for each miRNA, one containing the active sequence and the other being specific for the precursor of the miRNA (for example, see Table 1). The array can also contain controls such as one or more mouse sequences differing from human orthologs by only a few bases, which can serve as controls for hybridization stringency conditions. tRNAs from both species can also be printed on the microchip, providing an internal, relatively stable positive control for specific hybridization. One or more appropriate controls for non-specific hybridization can also be included on the microchip. For this purpose, sequences are selected based upon the absence of any homology with any known miRNAs. For example, the array can also contain miRNA sequences found to be specific for human B cells. Non-limiting examples of such miRNAs include: mir-15, mir-17, mir-14, mir-124a-3, mir-99b, mir-167a, mir-167b, mir-129-1, mir-30c-2, mir-143, mir-27b, mir-125b-1, mir-128a, mir-140, mir-142, mir-191, mir-125b-2, mir-127, mir-129-2, mir-146a, mir-154, mir-185, mir-186, mir-322, mir-124a-1, mir-124a-2, mir-30c-1, mir-302a, and mir-99b.


The microchip can be fabricated by techniques known in the art. For example, probe oligonucleotides of an appropriate length, e.g., 40 nucleotides, are 5′-amine modified at position C6 and printed using commercially available microarray systems, e.g., the GeneMachine OmniGrid™ 100 Microarrayer and Amersham CodeLink™ activated slides. Labeled cDNA oligomer corresponding to the target RNAs is prepared by reverse transcribing the target RNA with labeled primer. Following first strand synthesis, the RNA/DNA hybrids are denatured to degrade the RNA templates. The labeled target cDNAs thus prepared are then hybridized to the microarray chip under hybridizing conditions, e.g. 6×SSPE/30% formamide at 25° C. for 18 hours, followed by washing in 0.75×TNT at 37° C. for 40 minutes. At positions on the array where the immobilized probe DNA recognizes a complementary target cDNA in the sample, hybridization occurs. The labeled target cDNA marks the exact position on the array where binding occurs, allowing automatic detection and quantification. The output consists of a list of hybridization events, indicating the relative abundance of specific cDNA sequences, and therefore the relative abundance of the corresponding complementary miRs, in the patient sample. According to one embodiment, the labeled cDNA oligomer is a biotin-labeled cDNA, prepared from a biotin-labeled primer. The microarray is then processed by direct detection of the biotin-containing transcripts using, e.g., Streptavidin-Alexa647 conjugate, and scanned utilizing conventional scanning methods. Image intensities of each spot on the array are proportional to the abundance of the corresponding miR in the patient sample.


Other methods for making microarrays (see U.S. Patent Application Publication No. 2006/0051771, which is incorporated by reference in its entirety), e.g., by masking (Fodor et al., 1991; Maskos and Southern, 1992), can also be used. In principle, any type of array, for example, dot blots on a nylon hybridization membrane (see Sambrook et al., 1989, which is incorporated in its entirety for all purposes), can be used, although, as will be recognized by those of skill in the art, very small arrays are useful because hybridization volumes will be smaller.


Labeled cDNA can be prepared from mRNA by oligo dT-primed or random-primed reverse transcription, both of which are well known in the art. Reverse transcription can be carried out in the presence of a dNTP conjugated to a detectable label, for example, a fluorescently labeled dNTP. Alternatively, isolated mRNA can be converted to labeled antisense RNA synthesized by in vitro transcription of double-stranded cDNA in the presence of labeled dNTPs (Lockhart et al., 1996, which is incorporated by reference in its entirety for all purposes). In alternative embodiments, the cDNA or aRNA probe can be synthesized in the absence of detectable label and can be labeled subsequently, e.g., by incorporating biotinylated dNTPs or rNTP, or some similar means (e.g., photo-cross-linking a psoralen derivative of biotin to RNAs), followed by addition of labeled streptavidin (e.g., phycoerythrin-conjugated streptavidin) or the equivalent. Alternatively, cDNA or aRNA can be labeled indirectly by incorporation of 5-(3-aminoallyl) dNTPs or rNTPs to provide an amine-reactive group for subsequent addition of label with any moiety bearing an N-Hydroxysuccinimide (NHS) ester.


Fluorescently labeled probes can be used, including suitable fluorophores such as fluorescein, lissamine, phycoerythrin, rhodamine (Perkin Elmer Cetus), Cy2, Cy3, Cy3.5, Cy5, Cy5.5, Cy7, Fluor X (Amersham) and others (see, e.g., Kricka, 1992, Nonisotopic DNA Probe Techniques, Academic Press San Diego, Calif.). It will be appreciated that pairs of fluorophores are chosen that have distinct emission spectra so that they can be easily distinguished. In another embodiment, a label other than a fluorescent label is used. For example, a radioactive label, or a pair of radioactive labels with distinct emission spectra, can be used (see Zhao et al., 1995; Pietu et al., 1996).


The analysis of microarray data can be accomplished using methods of statistical analysis known to those skilled in the art. For example, clustering analysis is commonly used for interpretation of microarray data. It provides both a visual representation of complex data and a method for measuring similarity between experiments. Some widely used methods for clustering microarray data include: hierarchical, K-means, and self-organizing map.
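
As one hedged illustration of the hierarchical approach mentioned above, the following sketch groups miRNA expression profiles by average-linkage agglomerative clustering. The choice of Euclidean distance and average linkage, the function names, and the miRNA labels and values used for testing are all assumptions; a practical analysis would more likely rely on an established statistics package.

```python
import math


def euclidean(a, b) -> float:
    """Euclidean distance between two equal-length expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hierarchical_clusters(profiles: dict, n_clusters: int) -> list:
    """Average-linkage agglomerative clustering.

    `profiles` maps a label to a list of expression values across samples.
    Clusters are merged pairwise, closest first, until `n_clusters` remain.
    """
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    clusters = [[label] for label in profiles]

    def linkage(c1, c2) -> float:
        # Mean pairwise distance between members of the two clusters.
        dists = [euclidean(profiles[a], profiles[b]) for a in c1 for b in c2]
        return sum(dists) / len(dists)

    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]
```

Profiles with similar values across samples end up in the same cluster, which is the similarity measure between experiments referred to above.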


Southern blot hybridization techniques are also within the skill in the art. For example, genomic DNA isolated from a subject's sample can be digested with restriction endonucleases. This digestion generates restriction fragments of the genomic DNA that can be separated by electrophoresis, for example, on an agarose gel. The restriction fragments are then blotted onto a hybridization membrane (e.g., nitrocellulose or nylon), and hybridized with labeled probes specific for a given miRNA molecule(s). A deletion or mutation of these genes is indicated by an alteration of the restriction fragment patterns on the hybridization membrane, as compared to DNA from a control sample that has been treated identically to the DNA from the subject's sample. Probe labeling and hybridization conditions suitable for detecting alterations in gene structure or sequence can be readily determined by one of ordinary skill in the art. The miRNA nucleic acid probes for Southern blot hybridization can be designed based upon the nucleic acid sequences having SEQ ID NOS: 1-401, provided in Table 1, or any one miRNA molecule of Table 7, Table 9, or Table 10. Nucleic acid probe hybridization can then be detected by exposing hybridized filters to photographic film, or by employing computerized imaging systems, such as the Molecular Dynamics 400-B 2D Phosphorimager available from Amersham Biosciences, Piscataway, N.J.
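
The comparison of restriction fragment patterns can be illustrated with a simplified in silico digest. This sketch treats the DNA as a single linear strand, uses EcoRI (recognition sequence GAATTC, cut after the first G) purely as an example enzyme, and compares all fragment lengths rather than only those bound by a probe; the function names and test sequences are hypothetical.

```python
def digest(dna: str, site: str = "GAATTC", cut_offset: int = 1) -> list:
    """Return fragment lengths (longest first) after cutting at every site.

    `cut_offset` is the position within the recognition site at which the
    strand is cut (1 for EcoRI, which cuts between G and AATTC).
    """
    dna = dna.upper()
    fragments = []
    start = 0
    pos = dna.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(cut - start)
        start = cut
        pos = dna.find(site, pos + 1)
    fragments.append(len(dna) - start)  # remainder after the last cut
    return sorted(fragments, reverse=True)


def pattern_altered(subject_dna: str, control_dna: str) -> bool:
    """True if subject and control give different fragment-length patterns."""
    return digest(subject_dna) != digest(control_dna)
```

A point mutation that destroys one recognition site merges two fragments into one, which would appear as an altered banding pattern relative to the control.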


Human miRNAs are associated with different classes of chromosomal features that are subsequently associated with cancer (Xu and Li (2007) Chin Med J (Engl). 120(11):996-9; Bandres et al., (2007) DNA Cell Biol. 26(5):273-82). These cancers are purportedly partly caused by perturbing the chromosome or genomic DNA caused by the cancer-associated chromosomal feature, which can affect expression of oncogenes or tumor-suppressor genes located near the site of perturbation. A given cancer can be treated by restoring the level of miRNA expression associated with that cancer to normal. For example, if the level of miRNA expression is down-regulated in cancer cells of a subject, then the cancer can be treated by increasing the miRNA expression level. Alternatively, if the miRNA expression level is up-regulated in cancer cells of a subject, then the cancer can be treated by decreasing the miRNA expression level.


For example, the level of a miRNA in a cancerous or neoplastic cell of a subject (for example a hematopoietic malignancy or a hematopoietic neoplasm) is first determined relative to control cells. Techniques suitable for determining the relative level of a miRNA molecule in cells have been described above. If miRNA expression is down-regulated in the cancer or neoplastic cell relative to control cells, then the cancer or neoplastic cells are treated with an effective amount of a composition comprising an isolated miRNA molecule which is down-regulated (for example, a miRNA of the invention comprising any one of SEQ ID NOS: 1-401, as shown in Table 1, or any one miRNA molecule of Table 7, Table 9, or Table 10). If miRNA expression is up-regulated in cancer or neoplastic cells relative to control cells, then the cancer or neoplastic cells are treated with an effective amount of a composition that inhibits miRNA expression (for example, a miRNA of the invention comprising any one of SEQ ID NOS: 1-401, as shown in Table 1, or any one miRNA molecule of Table 7, Table 9, or Table 10).


One skilled in the art can also readily determine an appropriate dosage for the administration of an isolated miRNA molecule to a given subject. For example, a miRNA molecule can be administered to the subject once (e.g., as a single injection or deposition). Alternatively, a miRNA molecule of the invention can be administered once or twice daily to a subject for a period of from about two to about twenty-eight days, for example, from about seven to about ten days. Furthermore, the miRNA molecule of the invention can be co-administered with another therapeutic, such as a chemotherapy drug. Where a dosage regimen comprises multiple administrations, the effective amount of the miRNA molecule administered to the subject can comprise the total amount of gene product administered over the entire dosage regimen.


The miRNA molecules of the invention comprising any one of SEQ ID NOS: 1-401, as shown in Table 1, or any one miRNA molecule of Table 7, Table 9, or Table 10, can be administered to a subject by any means suitable for delivering the miRNA molecules to cells of the subject, such as hematopoietic cells (either cancerous or neoplastic). For example, miRNA molecules can be administered by methods suitable to transfect cells of the subject with the miRNA molecules of the invention. Transfection methods for eukaryotic cells (such as hematopoietic malignant cells or hematopoietic neoplastic cells) are well known in the art, and include direct injection of the nucleic acid into the nucleus or pronucleus of a cell; electroporation; liposome transfer or transfer mediated by lipophilic materials; receptor-mediated nucleic acid delivery; bioballistic or particle acceleration; calcium phosphate precipitation; and transfection mediated by viral vectors.


The compositions of this invention can be formulated and administered to inhibit a variety of disease states by any means that produces contact of the active ingredient with the agent's site of action in the body of a mammal. They can be administered by any conventional means available for use in conjunction with pharmaceuticals, either as individual therapeutic active ingredients or in a combination of therapeutic active ingredients. They can be administered alone, but are generally administered with a pharmaceutical carrier selected on the basis of the chosen route of administration and standard pharmaceutical practice.


Pharmaceutical compositions for use in accordance with the invention can be formulated in conventional manner using one or more physiologically acceptable carriers or excipients. The therapeutic compositions of the invention can be formulated for a variety of routes of administration, including systemic and topical or localized administration. Techniques and formulations generally can be found in Remington's Pharmaceutical Sciences, Mack Publishing Co., Easton, Pa. (1985), the entire disclosure of which is herein incorporated by reference. For systemic administration, injection is useful, including intramuscular, intravenous, intraperitoneal, and subcutaneous. For injection, the therapeutic compositions of the invention can be formulated in liquid solutions, for example, in physiologically compatible buffers such as Hank's solution or Ringer's solution. In addition, the therapeutic compositions can be formulated in solid form and redissolved or suspended immediately prior to use. Lyophilized forms are also included. Pharmaceutical compositions of the present invention are characterized as being at least sterile and pyrogen-free. These pharmaceutical formulations include formulations for human and veterinary use.


The present pharmaceutical formulations comprise the miRNA molecules of the invention comprising any one of SEQ ID NOS: 1-401, as shown in Table 1, or any one miRNA molecule of Table 7, Table 9, or Table 10 (e.g., 0.1 to 90% by weight), or a physiologically acceptable salt thereof, mixed with a pharmaceutically-acceptable carrier. The pharmaceutical formulations of the invention can also comprise the miRNA molecules of the invention comprising any one of SEQ ID NOS: 1-401, as shown in Table 1, or any one miRNA molecule of Table 7, Table 9, or Table 10, which are encapsulated by liposomes and a pharmaceutically-acceptable carrier. Useful pharmaceutically-acceptable carriers are water, buffered water, normal saline, 0.4% saline, 0.3% glycine, hyaluronic acid, and the like.


Pharmaceutical compositions of the invention can also comprise conventional pharmaceutical excipients and/or additives. Suitable pharmaceutical excipients include stabilizers, antioxidants, osmolality adjusting agents, buffers, and pH adjusting agents. Suitable additives include physiologically biocompatible buffers (e.g., tromethamine hydrochloride), additions of chelants (such as, for example, DTPA or DTPA-bisamide) or calcium chelate complexes (as for example calcium DTPA, CaNaDTPA-bisamide), or, optionally, additions of calcium or sodium salts (for example, calcium chloride, calcium ascorbate, calcium gluconate or calcium lactate). Pharmaceutical compositions of the invention can be packaged for use in liquid form, or can be lyophilized.


For solid pharmaceutical compositions of the invention, conventional nontoxic solid pharmaceutically-acceptable carriers can be used; for example, pharmaceutical grades of mannitol, lactose, starch, magnesium stearate, sodium saccharin, talcum, cellulose, glucose, sucrose, magnesium carbonate, and the like.


For oral administration, the therapeutic compositions can take the form of, for example, tablets or capsules prepared by conventional means with pharmaceutically acceptable excipients such as binding agents (e.g., pregelatinised maize starch, polyvinylpyrrolidone or hydroxypropyl methylcellulose); fillers (e.g., lactose, microcrystalline cellulose or calcium hydrogen phosphate); lubricants (e.g., magnesium stearate, talc or silica); disintegrants (e.g., potato starch or sodium starch glycolate); or wetting agents (e.g., sodium lauryl sulphate). The tablets can be coated by methods well known in the art. Liquid preparations for oral administration can take the form of, for example, solutions, syrups or suspensions, or they can be presented as a dry product for constitution with water or other suitable vehicle before use. Such liquid preparations can be prepared by conventional means with pharmaceutically acceptable additives such as suspending agents (e.g., sorbitol syrup, cellulose derivatives or hydrogenated edible fats); emulsifying agents (e.g., lecithin or acacia); non-aqueous vehicles (e.g., almond oil, oily esters, ethyl alcohol or fractionated vegetable oils); and preservatives (e.g., methyl or propyl-p-hydroxybenzoates or sorbic acid). The preparations can also contain buffer salts, flavoring, coloring and sweetening agents as appropriate.


Preparations for oral administration can be suitably formulated to give controlled release of the active agent. For buccal administration the therapeutic compositions can take the form of tablets or lozenges formulated in a conventional manner. For administration by inhalation, the compositions for use according to the present invention are conveniently delivered in the form of an aerosol spray presentation from pressurized packs or a nebuliser, with the use of a suitable propellant, e.g., dichlorodifluoromethane, trichlorofluoromethane, dichlorotetrafluoroethane, carbon dioxide or other suitable gas. In the case of a pressurized aerosol the dosage unit can be determined by providing a valve to deliver a metered amount. Capsules and cartridges of, e.g., gelatin for use in an inhaler or insufflator can be formulated containing a powder mix of the therapeutic agents and a suitable powder base such as lactose or starch.


The therapeutic compositions can be formulated for parenteral administration by injection, e.g., by bolus injection or continuous infusion. Formulations for injection can be presented in unit dosage form, e.g., in ampoules or in multi-dose containers, with an added preservative. The compositions can take such forms as suspensions, solutions or emulsions in oily or aqueous vehicles, and can contain formulatory agents such as suspending, stabilizing and/or dispersing agents. Alternatively, the active ingredient can be in powder form for constitution with a suitable vehicle, e.g., sterile pyrogen-free water, before use.


Suitable enteral administration routes for the present methods include oral, rectal, or intranasal delivery. Suitable parenteral administration routes include intravascular administration (e.g. intravenous bolus injection, intravenous infusion, intra-arterial bolus injection, intra-arterial infusion and catheter instillation into the vasculature); peri- and intra-tissue injection (e.g., peri-tumoral and intra-tumoral injection, intra-retinal injection, or subretinal injection); subcutaneous injection or deposition including subcutaneous infusion (such as by osmotic pumps); direct application to the tissue of interest, for example by a catheter or other placement device (e.g., a retinal pellet or a suppository or an implant comprising a porous, non-porous, or gelatinous material); and inhalation. The miRNA molecules of the invention are administered by injection or infusion.


In addition to the formulations described previously, the therapeutic compositions can also be formulated as a depot preparation. Such long acting formulations can be administered by implantation (for example subcutaneously or intramuscularly) or by intramuscular injection. Thus, for example, the therapeutic compositions can be formulated with suitable polymeric or hydrophobic materials (for example as an emulsion in an acceptable oil) or ion exchange resins, or as sparingly soluble derivatives, for example, as a sparingly soluble salt.


Systemic administration can also be by transmucosal or transdermal means. For transmucosal or transdermal administration, penetrants appropriate to the barrier to be permeated are used in the formulation. Such penetrants are generally known in the art, and include, for example, for transmucosal administration bile salts and fusidic acid derivatives. In addition, detergents can be used to facilitate permeation. Transmucosal administration can be through nasal sprays or using suppositories. For topical administration, the compositions of the invention are formulated into ointments, salves, gels, or creams as generally known in the art. A wash solution can be used locally to treat an injury or inflammation to accelerate healing. For oral administration, the therapeutic compositions are formulated into conventional oral administration forms such as capsules, tablets, and tonics.


A composition of the present invention can also be formulated as a sustained and/or timed release formulation. Such sustained and/or timed release formulations can be made by sustained release means or delivery devices that are well known to those of ordinary skill in the art, such as those described in U.S. Pat. Nos. 3,845,770; 3,916,899; 3,536,809; 3,598,123; 4,008,719; 4,710,384; 5,674,533; 5,059,595; 5,591,767; 5,120,548; 5,073,543; 5,639,476; 5,354,556; and 5,733,566, the disclosures of which are each incorporated herein by reference. The pharmaceutical compositions of the present invention can be used to provide slow or sustained release of one or more of the active ingredients using, for example, hydroxypropylmethyl cellulose, other polymer matrices, gels, permeable membranes, osmotic systems, multilayer coatings, microparticles, liposomes, microspheres, or the like, or a combination thereof to provide the desired release profile in varying proportions. Suitable sustained release formulations known to those of ordinary skill in the art, including those described herein, can be readily selected for use with the pharmaceutical compositions of the invention. Thus, single unit dosage forms suitable for oral administration, such as, but not limited to, tablets, capsules, gelcaps, caplets, powders, and the like, that are adapted for sustained release are encompassed by the present invention.


In the present methods, the miRNA molecules of the current invention comprising any one of SEQ ID NOS: 1-401, as shown in Table 1, or any one miRNA molecule of Table 7, Table 9, or Table 10, can be administered to the subject either as naked RNA, in conjunction with a delivery reagent, or as a nucleic acid (e.g., a recombinant plasmid or viral vector) comprising sequences which express the gene product. Suitable delivery reagents for administration of the miRNA molecules include the Mirus Transit TKO lipophilic reagent; lipofectin; lipofectamine; cellfectin; polycations (e.g., polylysine); or liposomes.


The dosage administered will be a therapeutically effective amount of the composition sufficient to result in amelioration of symptoms of B cell lymphoma disease and can vary depending upon known factors such as the pharmacodynamic characteristics of the particular active ingredient and its mode and route of administration; age, sex, health and weight of the recipient; nature and extent of symptoms; kind of concurrent treatment, frequency of treatment and the effect desired.


Toxicity and therapeutic efficacy of therapeutic compositions of the present invention can be determined by standard pharmaceutical procedures in cell cultures or experimental animals, e.g., for determining the LD50 (the dose lethal to 50% of the population) and the ED50 (the dose therapeutically effective in 50% of the population). The dose ratio between toxic and therapeutic effects is the therapeutic index and it can be expressed as the ratio LD50/ED50. Therapeutic agents which exhibit large therapeutic indices are useful. Therapeutic compositions that exhibit some toxic side effects can be used.
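By way of non-limiting illustration, the therapeutic index defined above can be computed as in the following minimal Python sketch. The function name and the example doses are hypothetical and are chosen only to show the arithmetic; they are not data from this disclosure.

```python
def therapeutic_index(ld50: float, ed50: float) -> float:
    """Return the ratio LD50/ED50; both doses must use the same units."""
    if ld50 <= 0 or ed50 <= 0:
        raise ValueError("LD50 and ED50 must be positive")
    return ld50 / ed50


# Hypothetical doses in mg/kg, for illustration only.
example_index = therapeutic_index(ld50=200.0, ed50=5.0)
print(f"therapeutic index = {example_index:.1f}")
```

A larger value indicates a wider margin between the dose that is therapeutically effective and the dose that is lethal in 50% of the population.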


Appropriate doses of small molecule agents depend upon a number of factors known to those of ordinary skill in the art, e.g., a physician. The dose(s) of the small molecule will vary, for example, depending upon the identity, size, and condition of the subject or sample being treated, further depending upon the route by which the composition is to be administered, if applicable, and the effect which the practitioner desires the small molecule to have upon the nucleic acid or polypeptide of the invention. Exemplary doses include milligram or microgram amounts of the small molecule per kilogram of subject or sample weight (e.g., about 1 microgram per kilogram to about 500 milligrams per kilogram, about 100 micrograms per kilogram to about 5 milligrams per kilogram, or about 1 microgram per kilogram to about 50 micrograms per kilogram).


These methods described herein are by no means all-inclusive, and further methods to suit the specific application will be apparent to the ordinary skilled artisan. Moreover, the effective amount of the compositions can be further approximated through analogy to compounds known to exert the desired effect.


The practice of aspects of the present invention can employ, unless otherwise indicated, conventional techniques of cell biology, cell culture, molecular biology, transgenic biology, microbiology, recombinant DNA, and immunology, which are within the skill of the art. Such techniques are explained fully in the literature. See, for example, Molecular Cloning A Laboratory Manual, 2nd Ed., ed. by Sambrook, Fritsch and Maniatis (Cold Spring Harbor Laboratory Press: 1989); DNA Cloning, Volumes I and II (D. N. Glover ed., 1985); Oligonucleotide Synthesis (M. J. Gait ed., 1984); Mullis et al. U.S. Pat. No. 4,683,195; Nucleic Acid Hybridization (B. D. Hames & S. J. Higgins eds. 1984); Transcription And Translation (B. D. Hames & S. J. Higgins eds. 1984); Culture Of Animal Cells (R. I. Freshney, Alan R. Liss, Inc., 1987); Immobilized Cells And Enzymes (IRL Press, 1986); B. Perbal, A Practical Guide To Molecular Cloning (1984); the treatise, Methods In Enzymology (Academic Press, Inc., N.Y.); Gene Transfer Vectors For Mammalian Cells (J. H. Miller and M. P. Calos eds., 1987, Cold Spring Harbor Laboratory); Methods In Enzymology, Vols. 154 and 155 (Wu et al. eds), Immunochemical Methods In Cell And Molecular Biology (Mayer and Walker, eds., Academic Press, London, 1987); Handbook Of Experimental Immunology, Volumes I-IV (D. M. Weir and C. C. Blackwell, eds., 1986); Manipulating the Mouse Embryo, (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1986). All patents, patent applications and references cited herein are incorporated in their entirety by reference.


EXAMPLES

A number of Examples are provided below to facilitate a more complete understanding of the present invention. The following examples illustrate the exemplary modes of making and practicing the present invention. However, the scope of the invention is not limited to specific embodiments disclosed in these Examples, which are for purposes of illustration only, since alternative methods can be utilized to obtain similar results.


Example 1

Experimental Analysis:


We have used an established process for the identification of miRNA candidates by sequencing size-fractionated cDNA libraries from three normal B cell populations, including Naïve B cells, Centroblasts and Memory B cells, and from a Burkitt Lymphoma cell line (Ramos). Briefly, short RNAs (sRNA) extracted from the cells were linked to adaptor oligonucleotides, reverse transcribed, PCR amplified, cloned into an expression vector, and sequenced. Individual sequences, corresponding to short RNAs, were then analyzed bioinformatically to determine whether they would constitute appropriate miRNA candidates.


Bioinformatics Analysis:


The computational analysis starts with the identification of short RNA (sRNA) sequences from cloned cDNA sequences using the adaptor oligonucleotides as oriented markers. Then, each sRNA was matched to the human genome (ncbi36, October 2006) by using Megablast (NCBI).
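By way of non-limiting illustration, the use of adaptor oligonucleotides as oriented markers can be sketched in Python as follows. The adaptor sequences, the function names, and the example reads are hypothetical placeholders and are not the oligonucleotides used in the Examples; the 18-26 nt size window follows the size selection described in Example 2. The sketch scans a clone read on both strands and keeps each insert flanked by a 5′ adaptor and a 3′ adaptor.

```python
import re

# Hypothetical adaptor sequences (placeholders, not the oligos of the Examples).
FIVE_PRIME_ADAPTOR = "ACGGAATTC"
THREE_PRIME_ADAPTOR = "CTGTAGGCA"

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_inserts(read, five=FIVE_PRIME_ADAPTOR, three=THREE_PRIME_ADAPTOR,
                    min_len=18, max_len=26):
    """Return sRNA inserts found between a 5' and a 3' adaptor.

    Both strands of the read are scanned, so an insert cloned in the
    reverse orientation is reported in its sRNA orientation.
    """
    pattern = re.compile(re.escape(five) + "([ACGTN]+?)" + re.escape(three))
    read = read.upper()
    inserts = []
    for strand in (read, reverse_complement(read)):
        for hit in pattern.finditer(strand):
            insert = hit.group(1)
            if min_len <= len(insert) <= max_len:
                inserts.append(insert)
    return inserts


# Two illustrative 22-nt inserts joined in one concatemer read.
insert_a = "TGAGGTAGTAGGTTGTATAGTT"
insert_b = "TAGCTTATCAGACTGATGTTGA"
concatemer = (FIVE_PRIME_ADAPTOR + insert_a + THREE_PRIME_ADAPTOR
              + FIVE_PRIME_ADAPTOR + insert_b + THREE_PRIME_ADAPTOR)
print(extract_inserts(concatemer))
```

Because the adaptor pair is asymmetric, the strand on which the pattern matches indicates the orientation of the insert within the clone.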


A Bayesian evidence integration scheme was used to identify candidate sRNAs from partial matches to the human genome. Single and multiple base mismatches can occur for several reasons, including polymorphisms, sequencing errors, PCR errors that are more frequent in the 3′ and 5′ regions of the miRNA cDNA, and RNA editing enzymes.


To identify sequences that correspond to real sRNAs, we used the following Bayesian approach to compute the posterior probability that a sequence is a candidate miRNA genomic match given specific base pair substitutions m1, m2, . . . , mn, under the assumption that individual mutations are uncorrelated (Naïve Bayes). The individual p(mi|match) and p(mi) can be measured using matches to the miRNAs deposited in the miRNABase database. Sequences with p(match|m1, m2, . . . , mn)>0.5 were considered bona-fide matches to the human genome sequences and further processed together with the exact matching ones:


$$
p(\mathrm{match} \mid m_1, m_2, \ldots, m_n)
= \frac{p(m_1, m_2, \ldots, m_n \mid \mathrm{match})\; p(\mathrm{match})}{p(m_1, m_2, \ldots, m_n)}
= p(\mathrm{match}) \prod_{i=1}^{n} \frac{p(m_i \mid \mathrm{match})}{p(m_i)}
$$
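By way of non-limiting illustration, the Naïve Bayes posterior described above can be evaluated as in the following Python sketch. The function names and all probability values are hypothetical; in practice p(mi|match) and p(mi) would be estimated from matches to database miRNAs as stated above.

```python
from math import prod


def match_posterior(p_match, p_m_given_match, p_m):
    """Evaluate p(match) * prod_i p(m_i | match) / p(m_i).

    p_m_given_match[i] and p_m[i] refer to the same substitution m_i.
    Because the factorization is an approximation, the returned value
    is not guaranteed to stay below 1.
    """
    if len(p_m_given_match) != len(p_m):
        raise ValueError("one p(m_i | match) is required per p(m_i)")
    ratio = prod(a / b for a, b in zip(p_m_given_match, p_m))
    return p_match * ratio


def is_bona_fide_match(posterior, threshold=0.5):
    """Apply the posterior > 0.5 cutoff stated in the text."""
    return posterior > threshold


# Hypothetical probabilities for a partial match with two substitutions.
posterior = match_posterior(0.5, [0.4, 0.3], [0.2, 0.5])
print(posterior, is_bona_fide_match(posterior))
```

An exact genomic match corresponds to an empty list of substitutions, in which case the product is 1 and the function returns the prior p(match).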












The resulting sRNA genome locations, for both exact and partial matches, were analyzed and merged, if necessary, to remove overlapping sequences. For each sRNA hit, a sequence from 80 b.p. upstream to 80 b.p. downstream of the sRNA match was selected, leading to a set of candidate microRNA genes, including the full-length precursors.
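By way of non-limiting illustration, the merging of overlapping hits and the selection of an 80 b.p. flanking window can be sketched in Python as follows. The coordinate convention (0-based, half-open intervals), the function names, and the example coordinates are assumptions made for the sketch; strand is ignored for simplicity.

```python
def merge_overlapping(hits):
    """Merge overlapping (chromosome, start, end) hits on each chromosome."""
    merged = []
    for chrom, start, end in sorted(hits):
        if merged and merged[-1][0] == chrom and start < merged[-1][2]:
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def candidate_window(hit, flank=80, chrom_length=None):
    """Extend a hit by `flank` bases on each side, clipped to the chromosome."""
    chrom, start, end = hit
    window_start = max(0, start - flank)
    window_end = end + flank
    if chrom_length is not None:
        window_end = min(window_end, chrom_length)
    return (chrom, window_start, window_end)


# Hypothetical hit coordinates, for illustration only.
hits = [("chr1", 110, 131), ("chr1", 100, 122), ("chr1", 500, 521), ("chr2", 105, 127)]
for merged_hit in merge_overlapping(hits):
    print(merged_hit, "->", candidate_window(merged_hit))
```

Each resulting window is the stretch of genomic sequence that would subsequently be tested for its ability to fold into a precursor hairpin.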


These longer sequences were then analyzed to determine whether they can lead to the formation of precursor miRNA structures, including favorable energetics to form the established stem-loop secondary RNA structure required for further miRNA processing. Using a set of established mammalian miRNAs and data from the scientific literature, the following criteria were established:

    • 1. The mature sequence, which is defined by the matching sRNAs, should occupy only one arm of the hairpin (not the loop).
    • 2. The loop structure cannot be shorter than 3 b.p. or longer than 20 b.p.
    • 3. The hairpin stem should not host additional secondary RNA structures with more than 10 b.p.
    • 4. The ratio between the number of complementary base pairs and the total number of base pairs comprising the hairpin arms should be greater than 0.55.
    • 5. The free energy should not be greater than −20 kcal/mole.
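By way of non-limiting illustration, the five criteria listed above can be applied as a single filter, as in the following Python sketch. The class, its field names, and the example values are hypothetical; the sketch assumes that the structural features have already been derived from a predicted fold, and it interprets criterion 4 as the fraction of arm bases that are paired.

```python
from dataclasses import dataclass


@dataclass
class HairpinCandidate:
    mature_start: int               # 0-based start of mature sequence in the precursor
    mature_end: int                 # end of mature sequence (exclusive)
    loop_start: int                 # first base of the terminal loop
    loop_end: int                   # end of the terminal loop (exclusive)
    paired_arm_bases: int           # arm bases that take part in a base pair
    total_arm_bases: int            # all bases in both arms
    largest_side_structure_bp: int  # largest extra structure hosted by the stem
    free_energy: float              # folding free energy in kcal/mole


def passes_hairpin_criteria(c: HairpinCandidate) -> bool:
    """Return True when a candidate satisfies criteria 1 through 5."""
    loop_length = c.loop_end - c.loop_start
    on_one_arm = c.mature_end <= c.loop_start or c.mature_start >= c.loop_end
    loop_ok = 3 <= loop_length <= 20
    stem_ok = c.largest_side_structure_bp <= 10
    pairing_ok = (c.total_arm_bases > 0
                  and c.paired_arm_bases / c.total_arm_bases > 0.55)
    energy_ok = c.free_energy <= -20.0
    return on_one_arm and loop_ok and stem_ok and pairing_ok and energy_ok


# Hypothetical candidate, for illustration only.
candidate = HairpinCandidate(10, 32, 40, 48, 60, 80, 0, -35.2)
print(passes_hairpin_criteria(candidate))
```

Keeping each criterion as a separate named condition makes it straightforward to report which criterion rejected a given candidate.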


We do not apply the widely used conservation criteria because of:

    • the significant incompleteness of many genome assemblies, especially in intergenic regions;
    • the risk of losing unique or diverged miRNAs (this invention is directed to obtaining an exhaustive search and outcome);
    • the possibility of further experimental verification of candidates.


Since the exact sites of transcription initiation and termination for miRNA genes are not known, and since concentration and temperature can vary, it is important to consider not just the optimal folding variant but an entire distribution of variants near the optimal one. We used the Vienna RNA package to perform this type of analysis, with appropriate modifications to allow the analysis of suboptimal folding variants.


Candidate miRNA genes were filtered against a non-coding RNA database and a repeats database (repbase). The database of human non-coding RNAs was manually compiled and contains genes such as ribosomal RNAs, snRNAs, tRNAs, Y-RNAs and others. The comparison was performed using the MEGABLAST program (NCBI).


Finally, candidate miRNAs emerging from this analysis were compared to the miRNABase database to identify previously known miRNAs. Candidate miRNAs that were not identified in the miRNABase were considered newly identified miRNAs. Diagrams illustrating these steps are included as FIGS. 1-3.


Validation:


Some of the newly identified miRNAs were tested by Northern blot in the corresponding population from which the sRNA was isolated.


Newly Identified miRNAs


The following have been identified as new miRNAs in one or more of the four B cell populations from which the size-fractionated cDNA libraries were isolated.


Example 2
Protocol for microRNA Cloning

Preparation and Labeling of Decade Marker (Ambion #7778)


Decade marker is radiolabeled using [γ-32P]ATP according to the manufacturer's instructions.


Labeling of the RNA Carrier


20 pmol of #909 carrier RNA oligo is radiolabeled using [γ-32P]ATP and T4 Polynucleotide kinase (NEB).


Purification of 18-26mers from Total RNA


A 15% denaturing polyacrylamide gel was prepared using the Sequagel System (National Diagnostics) following the manufacturer's instructions. A metal plate was placed on the front of the gel, which was pre-run at 50 W for 30 minutes using a running buffer of 0.5×TBE. Each RNA sample [long of total RNA] was subsequently spiked with 1 pmol (2.5 μl) of [γ-32P]-labeled #909 carrier, and an equal volume of 2×RNA loading buffer (Ambion) was added. For total RNA, extraction procedures such as Trizol Reagent (Invitrogen) that purify all sizes of RNA are recommended. Samples were then boiled for 5 minutes and were loaded alongside the decade marker. After the samples were loaded onto the gel, it was run for 3-4 hours at 50 W. After the gel was run, the apparatus was disassembled and one of the glass plates was removed. The hot area of the gel was cut out and placed on the side of the lane for alignment with the markers. The gel was subsequently covered with plastic wrap and was exposed to a phosphorimager screen for about 30 minutes to 1 hour. The gel image was printed at 100% magnification and placed under the glass. It was then aligned with the hot spot that had been previously cut. The RNA band was cut approximately from 18mers up to 26mers using the marker and the 1 nt ladder below the carrier as reference. RNA was eluted from the gel using 2 ml of 0.3M NaCl in DEPC H2O, and the eluted sample was subsequently rotated overnight at 4° C. The supernatant was then recovered and 450 μl (max) was distributed into separate tubes. A 2× volume of 100% EtOH and 10 ng glycogen were added to each tube, and the tubes were then incubated at −20° C. for at least 2 hours. Samples were then spun for 30 minutes at 14,000 rpm, and were subsequently washed with 75% EtOH. The pellets were then air dried and dissolved in 10 μl DEPC H2O.


3′-Adaptor Ligation and Purification


The following were first mixed in a tube and were then incubated at 25° C. for 6 hours:

    1.5 μl    10X 3′ Ligation Buffer*
    7.5 μl    purified small RNA
      2 μl    App.17.91 [100 pmol/μl] (adenylated)
    1.5 μl    T4 RNA Ligase [30 U/μl] (Amersham)
    2.5 μl    H2O

*500 mM Tris-HCl (pH 7.5), 100 mM MgCl2, 100 mM DTT, 600 μg/ml BSA. Store at −20° C.
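By way of non-limiting illustration, the following Python sketch checks the arithmetic of the 3′-adaptor ligation mix using only the volumes and stock concentrations listed above. The variable and function names are hypothetical.

```python
# Volumes in microliters, as listed for the 3'-adaptor ligation.
ligation_mix_ul = {
    "10X 3' Ligation Buffer": 1.5,
    "purified small RNA": 7.5,
    "App.17.91 adaptor (100 pmol/ul)": 2.0,
    "T4 RNA Ligase (30 U/ul)": 1.5,
    "H2O": 2.5,
}


def final_concentration(stock, volume_ul, total_ul):
    """Dilute a stock concentration into the total reaction volume."""
    return stock * volume_ul / total_ul


total_ul = sum(ligation_mix_ul.values())
buffer_fold = final_concentration(10.0, 1.5, total_ul)    # 10X stock
tris_mM = final_concentration(500.0, 1.5, total_ul)       # 500 mM Tris-HCl in 10X buffer
adaptor_pmol = 2.0 * 100.0                                # volume x pmol/ul
ligase_units = 1.5 * 30.0                                 # volume x U/ul

print(f"total volume: {total_ul} ul")
print(f"buffer: {buffer_fold}X, Tris-HCl: {tris_mM} mM")
print(f"adaptor: {adaptor_pmol} pmol, ligase: {ligase_units} U")
```

The check confirms that the listed volumes bring the 10X buffer to 1X in the final reaction volume.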






15 μl of 2× Loading Buffer (Ambion) was added to the samples, which were boiled for 5 minutes and then loaded onto a 12% denaturing polyacrylamide gel that was prepared using the Sequagel System (National Diagnostics) following the manufacturer's instructions. A metal plate was placed on the front of the gel, which was pre-run at 50 W for 30 minutes using a running buffer of 0.5×TBE. The decade RNA marker and 1 μl of [γ-32P]-labeled #909 carrier were then loaded into separate lanes and run for 3-4 hours at 50 W. After the gel was run, the apparatus was disassembled and the gel was exposed to a phosphorimager screen for 2-4 hours. The gel image was printed at 100% magnification and placed under the glass. The ligation product ranging approximately from 35 up to 42-45 nt was cut and RNA was eluted from the gel using 2 ml of 0.3M NaCl in DEPC H2O. The eluted sample was subsequently rotated overnight at 4° C. The supernatant was then recovered and 450 μl (max) was distributed into separate tubes. A 2× volume of 100% EtOH and 10 mg glycogen were added to each tube, and the tubes were then incubated at −20° C. for at least 2 hours. Samples were then spun for 30 minutes at 14,000 rpm, and were subsequently washed with 75% EtOH. The pellets were then air dried and dissolved in 12 μl DEPC H2O.


5′-Adaptor Ligation and Purification


The following were first mixed in a tube and were then incubated at 25° C. for 6 hours:

    1 μl    10X Ligation Buffer (Amersham)
    6 μl    purified small RNA
    2 μl    17.93 oligo [100 pmol/μl]
    1 μl    T4 RNA Ligase [30 U/μl] (Amersham)









Samples can be stored at 4° C. after the 6 hour incubation until further processed. 10 μl of 2× Loading Buffer (Ambion) was added to the samples, which were boiled for 5 minutes and then loaded onto a 10% denaturing polyacrylamide gel that was prepared using the Sequagel System (National Diagnostics) following the manufacturer's instructions. A metal plate was placed on the front of the gel, which was pre-run at 50 W for 30 minutes using a running buffer of 0.5×TBE. The decade RNA marker and 1 μl of [γ-32P]-labeled #909 carrier were then loaded into separate lanes and run for 3 hours at 50 W. After the gel was run, the apparatus was disassembled and the gel was exposed to a phosphorimager screen for 12-18 hours. The gel image was printed at 100% magnification and placed under the glass. The ligation product ranging approximately from 35 up to 48-62 nt was cut and RNA was eluted from the gel using 2 ml of 0.3M NaCl in DEPC H2O. The eluted sample was subsequently rotated overnight at 4° C. The supernatant was then recovered and 450 μl (max) was distributed into separate tubes. A 2× volume of 100% EtOH and 10 ng glycogen were added to each tube, and the tubes were then incubated at −20° C. for at least 2 hours. Samples were then spun for 30 minutes at 14,000 rpm, and were subsequently washed with 75% EtOH. The pellets were then air dried and dissolved in 20 μl DEPC H2O.


Reverse Transcription


The following were first mixed in a PCR tube and were then incubated at 80° C. for 2 minutes:

    10 μl    ligated RNA
     3 μl    #918 primer [100 μM]









The sample was subsequently spun down to cool. Using the First Strand Synthesis System kit (Invitrogen), the following was added to the sample, which was then incubated at 48° C. for 2 minutes:

    5 μl    5X First Strand Buffer
    7 μl    dNTPs
    3 μl    0.1M DTT









3 μl of SuperScript II (Invitrogen) was then added to the sample, which was further incubated at 48° C. for 1 hour. Samples can then be stored at −20° C. until further processing.


PCR Amplification


The PCR reaction was prepared as follows:

    • 5% Reverse Transcription product
    • 1×PCR buffer
    • 1.5 mM MgCl2
    • 0.8 mM dNTP
    • 2 μM #913 primer
    • 2 μM #914 primer
    • 2 U Taq polymerase


The sample was then amplified according to the following protocol:

    • 2 min at 94° C.
    • 30 cycles of: 30 sec at 94° C., 30 sec at 52° C., 30 sec at 72° C.
    • 10 min at 72° C.




The sample was then extracted using Phenol/CIA and was precipitated with sodium acetate. The sample was then spun for 30 minutes at 14,000 rpm, and was subsequently washed with 75% EtOH. The pellet was then air dried and dissolved in 45 μl DEPC H2O.


PacI Digestion


The digestion was prepared as follows and was subsequently incubated at 37° C. for 90 minutes:

    • 1×NEB Buffer 1
    • 1×BSA
    • 1,000 U/μl PacI
    • 42.5 μl DNA


The sample was then extracted using Phenol/CIA and was precipitated with sodium acetate. The sample was then spun for 30 minutes at 14,000 rpm, and was subsequently washed with 75% EtOH. The pellet was then air dried and dissolved in 15 μl DEPC H2O. Non-denaturing Loading Buffer was added to the sample, and the sample was subsequently loaded onto a 12% non-denaturing acrylamide gel using pUC19/Sau3AI as a marker (Ambion #7760), which was run at 13 W for approximately 1 hr. The gel was then stained with 1:10,000 SybrGold (Molecular Probes #S-11494) in 0.5×TBE for 30 to 60 minutes. The smear of digested samples was then cut between 46 and 75 nt according to the marker size. Cut out samples were then eluted with 500 μl of 0.3M NaCl in 1.5 ml screw top tubes, rotating overnight at 4° C. Samples were spun briefly, and about 450 μl of eluted volume was recovered for each sample. The samples were then extracted using Phenol/CIA and were precipitated with sodium acetate. The samples were then spun for 30 minutes at 14,000 rpm, and were subsequently washed with 75% EtOH. The pellet was then air dried and dissolved in 10 μl DEPC H2O. If the amount of digested product on the gel appeared to be low, a second PCR amplification was required.


Second PCR Amplification


The PCR reaction was prepared as follows:

    • 2% Reverse Transcription product
    • 1×PCR buffer
    • 1.5 mM MgCl2
    • 0.8 mM dNTP
    • 2 μM #913 primer
    • 2 μM #914 primer
    • 2 U Taq polymerase


The sample was then amplified according to the following protocol:

    • 2 min at 94° C.
    • 20 cycles of: 30 sec at 94° C., 30 sec at 52° C., 30 sec at 72° C.
    • 10 min at 72° C.




The amount of product generated was examined by adding non-denaturing Loading Buffer to 5 μl of sample, which was then loaded onto a 12% non-denaturing acrylamide gel and subsequently run at 13 W for approximately 1 hr. Samples were cut out and eluted with 500 μl of 0.3M NaCl in 1.5 ml screw top tubes, rotating overnight at 4° C. Samples were spun briefly, and about 450 μl of eluted volume was recovered for each sample. The samples were then extracted using Phenol/CIA and were precipitated with sodium acetate. The samples were then spun for 30 minutes at 14,000 rpm, and were subsequently washed with 75% EtOH. The pellet was then air dried and dissolved in 45 μl DEPC H2O. Samples were then quantified; the sample concentration needed to be within the range of 200-500 ng/μl.


BanI Digestion


The following digestion was prepared for each sample and was subsequently incubated at 37° C. for 2.5 hours:

    • 1×NEB Buffer 4
    • 1×BSA
    • 1000 U/μl BanI


Samples were then extracted using Phenol/CIA and were precipitated with sodium acetate. The samples were spun for 30 minutes at 14,000 rpm, and were subsequently washed with 75% EtOH. The pellet was then air dried and dissolved in 26 μl DEPC H2O.


Concatemerization


The following components were mixed in a 0.5 or 0.2 ml tube and incubated at 25° C. for 2 hours:

















    • 26 μl Digested DNA
    • 3 μl 10X Ligation Buffer (Roche)
    • 1 μl T4 DNA Ligase [5 U/μl] (Roche)









1 μl of dNTP as well as 1 μl of DNA Polymerase I was subsequently added, and the sample was then incubated at 16° C. for 1 hour. The sample was run on a 2% LM agarose-EtBr gel in 0.5×TBE with 100 bp Ladder loaded in the first lane. The concatemers were then cut starting from about 500 bp and above. Samples were then gel purified with QIAquick gel extraction kit (Qiagen cat. #28706), using the manufacturer's instructions.


Vector Digestion


The pCR2.1-TOPO (Invitrogen, 3.9 Kb) vector was digested according to the manufacturer's instructions for the EcoRV enzyme (NEB). The sample was then subjected to gel electrophoresis using a 1% agarose gel. The sample was then gel purified with QIAquick gel extraction kit (Qiagen cat. #28706), using the manufacturer's instructions.


Ligation


The ligation reaction was set up using the Rapid Ligation Kit (Roche cat. #11635379001) according to the procedure recommended by the manufacturer. 10 μl of the ligation product was used for transformation in high efficiency bacteria (One Shot OmniMAX, Invitrogen cat. #C8540-03). X-gal/IPTG Ampicillin agarose plates were prepared fresh and a specified volume of the transformation reaction was spread onto the plates. Plated samples were then incubated overnight at 37° C. Colonies were then obtained the next day and grown in 30 μl of LB medium at 37° C. for 1 hour. Once DNA was obtained from the amplified bacteria, 2.5 μl was used as template for the colony PCR reaction.


Colony PCR


The PCR reaction was prepared as follows:

    • 2.5 μl template
    • 1×PCR buffer
    • 1.5 mM MgCl2
    • 0.8 mM dNTP
    • 0.2 μM #913 primer
    • 0.2 μM #914 primer
    • 1 U Taq polymerase


The sample was then amplified according to the following protocol:











    • 5 min at 94° C.
    • 30 cycles of: 30 s at 94° C.; 30 s at 52° C.; 30 s at 72° C.
    • 10 min at 72° C.
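The two amplification programs in this protocol can be written down as a small data structure, which makes it easy to compare them and to estimate run time. The following is only a sketch: it assumes the initial and final holds are in minutes and the cycled steps are in seconds, and the `Step` and `Program` names are hypothetical helpers, not part of the protocol. Ramp times between temperatures are ignored.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Step:
    temp_c: int   # block temperature in degrees Celsius
    seconds: int  # hold time in seconds (time units are an assumption)


@dataclass(frozen=True)
class Program:
    initial: Step
    cycle: List[Step]
    n_cycles: int
    final: Step

    def hold_seconds(self) -> int:
        """Sum of programmed hold times; ramping between steps is ignored."""
        per_cycle = sum(step.seconds for step in self.cycle)
        return self.initial.seconds + self.n_cycles * per_cycle + self.final.seconds


# Denaturation, annealing and extension steps shared by both programs.
CYCLE = [Step(94, 30), Step(52, 30), Step(72, 30)]

second_pcr = Program(Step(94, 2 * 60), CYCLE, 20, Step(72, 10 * 60))
colony_pcr = Program(Step(94, 5 * 60), CYCLE, 30, Step(72, 10 * 60))

for name, program in (("second PCR", second_pcr), ("colony PCR", colony_pcr)):
    print(f"{name}: {program.hold_seconds() / 60:.0f} min of programmed holds")
```

Under these assumptions the colony PCR differs from the second PCR amplification only in its longer initial denaturation and in running 30 cycles instead of 20.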




1 μl of PCR product was examined on a 1.5% agarose gel and clones were selected carrying an insert longer than 350 bp. Samples were then confirmed by sequencing.


Oligonucleotides










    • 909 carrier RNA oligo: 5′-(P)-UGUCAGUUUGUUAAUUAACCCAA-3′ [SEQ ID NO: 517]. 5′ phosphate; 3′ none; includes PACI restriction site.
    • App. 17.91 (3′ end Donor oligo): 5′-rAppCTGTAGGCACCATCAAT/3ddC-3′ [SEQ ID NO: 518]. 5′ adenylated containing a pyrophosphate; 3′ modified terminal dideoxy-C (ddC) (available from IDT Inc. as the “miRNA cloning linker”); includes BanI restriction site.
    • 17.93 (5′ end Acceptor oligo): 5′-ATCGTAGGCACCTGAAA-3′ [SEQ ID NO: 519]. It includes BanI restriction site.
    • #918 RT primer oligo: 5′-ATTGATGGTGCCTAC-3′ [SEQ ID NO: 520]. It includes BanI restriction site.
    • #913 (5′ PCR primer oligo): 5′-ATCGTAGGCACCTGAAA-3′ [SEQ ID NO: 521]. It includes BanI restriction site.
    • #914 (3′ PCR primer oligo): 5′-ATTGATGGTGCCTACAG-3′ [SEQ ID NO: 522]. It includes BanI restriction site.


Screening/Sequencing Primers:

    • M13F 5′-GTAAAACGACGGCCAG-3′ [SEQ ID NO: 523]
    • M13R 5′-CAGGAAACAGCTATGAC-3′ [SEQ ID NO: 524]
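The restriction sites noted for the cloning oligonucleotides can be checked with a short script. The sketch below assumes the standard recognition sequences TTAATTAA for PacI and GGYRCC for BanI (Y = C or T, R = A or G); the function and dictionary names are illustrative only. RNA sequences are converted to DNA letters before matching.

```python
import re

# IUPAC codes needed for the two recognition sequences.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]"}

# Recognition sequences (assumed standard values for these enzymes).
SITES = {"PacI": "TTAATTAA", "BanI": "GGYRCC"}

# Oligonucleotide sequences as listed above, without end modifications.
OLIGOS = {
    "909 carrier": "UGUCAGUUUGUUAAUUAACCCAA",
    "App. 17.91": "CTGTAGGCACCATCAAT",
    "17.93": "ATCGTAGGCACCTGAAA",
    "#918": "ATTGATGGTGCCTAC",
    "#913": "ATCGTAGGCACCTGAAA",
    "#914": "ATTGATGGTGCCTACAG",
}


def site_regex(site):
    """Translate an IUPAC recognition sequence into a compiled regex."""
    return re.compile("".join(IUPAC[base] for base in site))


def find_sites(seq):
    """Return 0-based start positions of each enzyme site in seq."""
    dna = seq.upper().replace("U", "T")
    return {
        enzyme: [m.start() for m in site_regex(site).finditer(dna)]
        for enzyme, site in SITES.items()
    }


for name, seq in OLIGOS.items():
    print(name, find_sites(seq))
```

With these sequences, the carrier oligo is the only one carrying a PacI site, and each linker and primer carries one BanI site.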






REFERENCES



  • Lau et al., Science (2001) 294:858-62; Chen et al., Science (2004) 303: 83-6



Example 3
Identification of the Human Mature B Cells miRNome

The discovery of microRNAs (miRNAs) has added a new dimension to the mechanisms that regulate gene expression in normal cell development. Initial evidence also shows that structural or functional alterations of miRNAs are associated with tumorigenesis. The full set of microRNAs (miRNAs) in the human genome is not known. Since presently known miRNAs have been identified by virtue of their abundant expression in a few cell types, tissue-specific miRNAs may remain unrevealed. To understand the role of miRNAs in B-cell function and lymphomagenesis, short-RNA libraries were generated from normal human B cells at different stages of development (naïve, germinal-center, memory) and from a Burkitt lymphoma cell-line. The combination of cloning and computational analysis identified 401 miRNA (miRNome) expressed during normal B-cell development and/or in transformed B-cells. Most notably, this analysis identified 272 new miRNAs that were not previously reported. Numerous miRNAs are expressed in a stage-specific as well as transformation-specific fashion, suggesting specific functional roles. These results significantly increase the number of presently known miRNAs and provide a resource for the study of their role in B-cell development, immune function, and lymphomagenesis.


A new level of post-transcriptional regulation has been revealed with the discovery of microRNAs (miRNAs), a class of short-RNAs that impair translation or induce mRNA degradation by binding to the 3′ untranslated region of target mRNA1,2. The most recent release of the miRBase database (v.11.0)3,4 reports 839 human miRNAs, but the discovery of miRNAs is still an ongoing process, with predictions of the total number of miRNAs expressed in mammalian cells ranging from one thousand to several thousand5,6. The reported miRNAs have been identified from a limited number of cell types or from tissues whose cellular heterogeneity may favor the identification of ubiquitous and abundant miRNA. In fact, a recent report aimed at identifying the miRNA expression profiles of 26 different mammalian tissues and cell types led to the discovery of only 12 new human miRNAs7. These findings led to the conclusion that most miRNAs are known and that most of them are ubiquitously expressed. Nonetheless, additional analyses of purified cell populations may lead to the identification of tissue- and stage of differentiation-specific miRNAs, as has been the case for messenger RNAs.


The role of specific miRNAs in B-cell immunity and malignancy has only just begun to be elucidated. Using mouse models, miR-155 has been demonstrated to affect regulation of germinal center response through modulation of cytokine production8,9. Recently, miR-150 has been shown to target c-Myb, a critical transcription factor involved in the control of B cell differentiation10. In B cell lymphomas, 13q31 amplification has been associated with the over-expression of the miR-17-92 cluster and its enforced expression in a murine B cell lymphoma model showed a role in accelerating tumor development11. Furthermore, miR-15a and miR-16 have been implicated in the pathogenesis of B cell chronic lymphocytic leukemia (CLL)12,13.


To further explore the role of miRNAs in B cell function and lymphomagenesis, this study was aimed at identifying the miRNAs expressed (miRNome) in the human mature B-cell compartment, including naïve, germinal centers (GCs), and memory B cells. These B cell subpopulations are relevant for the development of antibody-mediated immunity as well as for tumorigenesis, since common human B-cell malignancies originate from the malignant transformation of GC B cells (most B-cell non-Hodgkin lymphomas, B-NHL), or naïve and memory B cells (mantle cell lymphoma and chronic lymphocytic leukemia)14,15. Using a combination of cloning and computational analysis, we report the identification of 401 miRNA representing the mature B cell miRNome, including 272 new miRNAs, and illustrate their pattern of expression during B cell differentiation and transformation.


Construction of Short-RNA Libraries from Human B Cell Sub Populations


Short-RNA libraries were generated by cloning RNA fractions of 18-30 nt from human centroblasts, naïve and memory B cells purified from tonsils as well as from the Burkitt lymphoma cell line Ramos, which is representative of malignant transformation of GC B cells (FIG. 10). Approximately 3,500 sequences were analyzed from each library, corresponding to 13,991 total short-RNAs (2,661 non-redundant sequences). Using a bootstrap approach16,17, we estimated the expected number of miRNAs that could be predicted using our computational pipeline from short-RNA libraries of various sizes. The results suggested that at the current sequencing depth, 80% of the possible predictions have been identified (FIG. 17).
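The bootstrap estimate of discovery versus library size can be illustrated with a simple resampling sketch. This is not the authors' pipeline: it assumes each read already carries a species label, and it counts distinct labels in place of running the full precursor prediction. The function name, the toy library and the chosen depths are all hypothetical.

```python
import random


def bootstrap_discovery_curve(reads, depths, n_boot=200, seed=0):
    """Mean number of distinct species seen when resampling `reads`
    with replacement at each library size in `depths`."""
    rng = random.Random(seed)
    curve = {}
    for depth in depths:
        total = 0
        for _ in range(n_boot):
            sample = rng.choices(reads, k=depth)
            total += len(set(sample))
        curve[depth] = total / n_boot
    return curve


# Toy library: 50 species with skewed abundances (species i occurs i + 1 times).
species = [f"miR-toy-{i}" for i in range(50)]
library = [name for i, name in enumerate(species) for _ in range(i + 1)]

curve = bootstrap_discovery_curve(library, depths=[10, 100, 1000])
for depth, mean_found in curve.items():
    print(depth, round(mean_found, 1))
```

A curve that flattens as depth grows suggests that most detectable species have been sampled, which is the kind of evidence behind a saturation estimate like the 80% figure above.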


The cloned sequences were matched to the human genome assembly from March 2006 (hg18) to retrieve the genomic regions from which the short-RNAs originated. One or more genomic locations were identified for approximately 80% of the cloned sequences considering both perfect matches and single mismatches. Consistent with previous observations, 3′-end mismatches were the most common and showed a clear preference for A in the last position18. Approximately 546 short-RNA sequences did not align to the human genome according to the above criteria and are likely due to PCR errors introduced during the cloning procedure (FIG. 11). Nevertheless, a small subset of these short-RNAs lacking a corresponding genomic region in Homo sapiens have been cloned with high frequencies in multiple libraries and showed differential expression during B cell differentiation, suggesting they may represent bona fide short-RNA species. However, given the difficulty of assigning genomic coordinates to these sequences, they were omitted from further analyses.
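Matching with perfect matches and single mismatches can be sketched as a brute-force scan over both strands. This is a minimal illustration on a toy reference string, not the alignment method used in the study; a real genome would require an indexed aligner. The function names and the toy reference are assumptions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def count_mismatches(a, b):
    """Number of differing positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_hits(read, genome, max_mismatches=1):
    """Return (start, strand, mismatches) for every window of `genome`
    that matches `read` on either strand with at most `max_mismatches`."""
    hits = []
    n = len(read)
    for strand, query in (("+", read), ("-", reverse_complement(read))):
        for start in range(len(genome) - n + 1):
            mm = count_mismatches(query, genome[start:start + n])
            if mm <= max_mismatches:
                hits.append((start, strand, mm))
    return hits


# Toy reference: the let-7a sequence from Table 1 with short flanks.
LET7A = "TGAGGTAGTAGGTTGTATAGTT"
toy_genome = "ACGT" + LET7A + "CCA"

# A clone whose last base was read as A, a common 3'-end mismatch.
variant = LET7A[:-1] + "A"
print(find_hits(LET7A, toy_genome))
print(find_hits(variant, toy_genome))
```

Reads with two or more substitutions relative to every window return no hit under the default setting and would fall into the unaligned group.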


Computational Prediction of miRNA Precursors


In order to identify candidate miRNAs among the cloned sequences, we developed a computational pipeline aimed at identifying potential miRNA precursors based on their genomic location and folding characteristics (FIG. 18 and Supplementary Methods). Briefly, short-RNA sequences were mapped to the human genome, and candidate genomic precursors (+/−90 nt) were then retrieved and analyzed for secondary structure, size and energy of the loop, and number of complementary base pairs in the stem of the loop (Supplementary Methods). The prediction was performed on the full set of non-redundant short-RNAs (2,115 sequences) for which one or more genomic locations could be identified (FIG. 11). The analysis led to the identification of candidate precursors for 1,667 short-RNA sequences, which were then clustered to account for the variability observed at the miRNA 3′-ends (and less dramatically at the 5′-ends), including nucleotide substitutions and deletions. Moreover, editing of miRNA has been previously reported19,20, and a few cases compatible with an editing process have been observed in the libraries described here. Since most clusters of short-RNA are affected by these modifications, we applied the following criteria in order to define mature miRNA sequences: i) each nucleotide must occur in more than 50% of the cloned sequences; ii) sequences supported by a short-RNA set that is fully contained in a larger set were eliminated, while matching clusters with partial containment were merged (Supplementary Methods). After annotating each candidate mature miRNA, those that showed evidence of originating from mRNA, rRNA, tRNA and other ncRNA (yRNA, sn/snoRNA) and occurred once were eliminated. Overall, the computational analysis identified 401 mature miRNA (Table 1).
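The two criteria used to define mature sequences can be sketched in a few lines. This is one possible reading of the text, not the authors' implementation: reads in a cluster are assumed to be placed on a shared precursor coordinate, a position is kept when one nucleotide is present there in more than 50% of the cluster's reads, and candidates are compared by the sets of reads supporting them. The merging of partially overlapping clusters is not shown. All names are hypothetical.

```python
from collections import Counter


def consensus_mature(aligned_reads):
    """aligned_reads: list of (start, sequence) on one precursor coordinate.
    Keeps each position whose most frequent base occurs in more than 50%
    of all reads in the cluster, then joins the kept bases in order
    (assumes the kept positions form one contiguous block)."""
    n_reads = len(aligned_reads)
    columns = {}
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            columns.setdefault(start + offset, Counter())[base] += 1
    kept = []
    for pos in sorted(columns):
        base, count = columns[pos].most_common(1)[0]
        if count / n_reads > 0.5:
            kept.append(base)
    return "".join(kept)


def drop_contained(support_sets):
    """support_sets: dict candidate -> set of supporting read ids.
    Removes candidates whose support is a proper subset of another's."""
    return {
        name: reads
        for name, reads in support_sets.items()
        if not any(reads < other
                   for other_name, other in support_sets.items()
                   if other_name != name)
    }


# Toy cluster built from the miR-16 sequence in Table 1, with 3'-end variants.
MIR16 = "TAGCAGCACGTAAATATTGGCG"
cluster = [(10, MIR16)] * 3 + [(10, MIR16 + "T"), (10, MIR16[:20])]
print(consensus_mature(cluster))

candidates = {"a": {1, 2, 3}, "b": {1, 2}, "c": {3, 4}}
print(sorted(drop_contained(candidates)))
```

In the toy cluster the extra 3′ T is seen in only one of five reads and is dropped, while the last two bases of the full-length form are seen in four of five reads and are kept.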


The human miRNAs deposited in the miRBase database (v.11.0) were identified only at the end of the analysis and any lack of prediction was checked by matching the starting set with the final predictions. Overall, previously reported miRNA represented 32% of cloned and computationally validated mature miRNA. In addition, our analysis identified 146 previously reported precursors as well as 761 genomic locations containing precursors potentially coding for 272 new mature miRNA and 19 new precursors for 8 mature miRNAs deposited in the miRBase database (FIG. 12 and Table 1).
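The counts reported in this example can be cross-checked with simple arithmetic. The snippet below only recombines numbers stated in the text; the variable names are illustrative.

```python
# Counts taken from the text of this example.
nonredundant_short_rnas = 2661
unaligned_short_rnas = 546
mature_total = 401
mature_new = 272

# Non-redundant sequences with at least one genomic location.
aligned_short_rnas = nonredundant_short_rnas - unaligned_short_rnas
aligned_fraction = aligned_short_rnas / nonredundant_short_rnas

# Mature miRNAs that were already reported in miRBase.
mature_known = mature_total - mature_new
known_fraction = mature_known / mature_total

print(aligned_short_rnas, f"{aligned_fraction:.0%}")
print(mature_known, f"{known_fraction:.0%}")
```

The results agree with the 2,115 mapped non-redundant sequences, the roughly 80% of sequences with a genomic location, and the 32% share of previously reported miRNAs.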









TABLE 1A







List of known and newly identified mature miRNAs.











ID | SEQ ID NO. | Mature miRNA sequence | Annotations
CU-1026 | 1 | TGTAGTGTTTCCTACTTTATGGA | Mature:hsa-miR-142-3p:MIMAT0000434
CU-1064 | 2 | TAGCTTATCAGACTGATGTTGA | Mature:hsa-miR-21:MIMAT0000076
CU-1061 | 3 | TAAAGTGCTTATAGTGCAGGTAG | Mature:hsa-miR-20a:MIMAT0000075
CU-1035 | 4 | TAGCAGCACATCATGGTTTACA | Mature:hsa-miR-15b:MIMAT0000417
CU-1037 | 5 | TAGCAGCACGTAAATATTGGCG | Mature:hsa-miR-16:MIMAT0000069
CU-1001 | 6 | TGAGGTAGTAGGTTGTATAGTT | Mature:hsa-let-7a:MIMAT0000062
CU-1116 | 7 | TATTGCACTTGTCCCGGCCTGT | Mature:hsa-miR-92a:MIMAT0000092
CU-1018 | 8 | TCCCACCGCTGCCACCA | Mature:hsa-miR-1280:MIMAT0005946
CU-1006 | 9 | TGAGGTAGTAGATTGTATAGTT | Mature:hsa-let-7f:MIMAT0000067
CU-1079 | 10 | TAGCACCATCTGAAATCGGTTA | Mature:hsa-miR-29a:MIMAT0000086
CU-1033 | 11 | TAGCAGCACATAATGGTTTGT | Mature:hsa-miR-15a:MIMAT0000068
CU-1124 | 12 | CCCATAAAGTAGAAAGCACTA | Mature:hsa-miR-142-5p:MIMAT0000433
CU-1007 | 13 | TGAGGTAGTAGTTTGTACAGTT | Mature:hsa-let-7g:MIMAT0000414
CU-1008 | 14 | TGAGGTAGTAGTTTGTGCTGTT | Mature:hsa-let-7i:MIMAT0000415
CU-1082 | 15 | TAGCACCATTTGAAATCGGTTA | Mature:hsa-miR-29c:MIMAT0000681
CU-1085 | 16 | TGTAAACATCCTACACTCTCAGC | Mature:hsa-miR-30c:MIMAT0000244
CU-1039 | 17 | CAAAGTGCTTACAGTGCAGGTAG | Mature:hsa-miR-17:MIMAT0000070
CU-1071 | 18 | CATTGCACTTGTCTCGGTCTGA | Mature:hsa-miR-25:MIMAT0000081
CU-1046 | 19 | CAACGGAATCCCAAAAGCAGCTG | Mature:hsa-miR-191:MIMAT0000440
CU-1057 | 20 | TGTGCAAATCCATGCAAAACTGA | Mature:hsa-miR-19b:MIMAT0000074
CU-1024 | 21 | TACCACAGGGTAGAACCACGGA | Mature:hsa-miR-140-3p:MIMAT0004597
CU-1084 | 22 | TGTAAACATCCTACACTCAGCT | Mature:hsa-miR-30b:MIMAT0000420
CU-1003 | 23 | TGAGGTAGTAGGTTGTGTGGTT | Mature:hsa-let-7b:MIMAT0000063
CU-1080 | 24 | TAGCACCATTTGAAATCAGTGTT | Mature:hsa-miR-29b:MIMAT0000100
CU-1012 | 25 | TAAAGTGCTGACAGTGCAGAT | Mature:hsa-miR-106b:MIMAT0000680
CU-1092 | 26 | TCCCTGTCCTCCAGGAGCTC | Mature:hsa-miR-339-5p:MIMAT0000764
CU-1072 | 27 | TTCAAGTAATCCAGGATAGGCT | Mature:hsa-miR-26a:MIMAT0000082
CU-1118 | 28 | CAAAGTGCTGTTCGTGCAGGTAG | Mature:hsa-miR-93:MIMAT0000093
CU-1067 | 29 | TGTCAGTTTGTCAAATACCCCA | Mature:hsa-miR-223:MIMAT0000280
CU-1027 | 30 | TGAGAACTGAATTCCATGGGTT | Mature:hsa-miR-146a:MIMAT0000449
CU-1029 | 31 | TCTCCCAACCCTTGTACCAGT | Mature:hsa-miR-150:MIMAT0000451
CU-1015 | 32 | TCCCTGAGACCCTAACTTGTGA | Mature:hsa-miR-125b:MIMAT0000423
CU-1093 | 33 | TCTCACACAGAAATCGCACCCGTC | Mature:hsa-miR-342-3p:MIMAT0000753
CU-1016 | 34 | GTCCCTGTTCGGGCGCCA | Mature:hsa-miR-1274b:MIMAT0005938
CU-1056 | 35 | TGTGCAAATCTATGCAAAACTGA | Mature:hsa-miR-19a:MIMAT0000073
CU-1086 | 36 | TGTAAACATCCCCGACTGGAAG | Mature:hsa-miR-30d:MIMAT0000245
CU-1065 | 37 | AGCTACATTGTCTGCTGGGTT | Mature:hsa-miR-221:MIMAT0000278
CU-1004 | 38 | AGAGGTAGTAGGTTGCATAGTT | Mature:hsa-let-7d:MIMAT0000065
CU-1011 | 39 | CCGCACTGTGGGTACTTGCT | Star:hsa-miR-106b*:MIMAT0004672
CU-1010 | 40 | AGCAGCATTGTACAGGGCTATGA | Mature:hsa-miR-103:MIMAT0000101
CU-1050 | 41 | AACTGGCCCTCAAAGTCCCGCT | Mature:hsa-miR-193b:MIMAT0002819
CU-1091 | 42 | GCCCCTGGGCCTATCCTAGAA | Mature:hsa-miR-331-3p:MIMAT0000760
CU-1023 | 43 | AGCTGGTGTTGTGAATCAGGCCGT | Mature:hsa-miR-138:MIMAT0000430
CU-1101 | 44 | TGAGGGGCAGAGAGCGAGACTT | Mature:hsa-miR-423-5p:MIMAT0004748
CU-1066 | 45 | AGCTACATCTGGCTACTGGGTCT | Mature:hsa-miR-222:MIMAT0000279
CU-1017 | 46 | GTGGGGGAGAGGCTGTA | Mature:hsa-miR-1275:MIMAT0005929
CU-5001 | 47 | CTATACGACCTGCTGCCTTTC | Star:hsa-let-7d*:MIMAT0004484
CU-1032 | 48 | TTAATGCTAATCGTGATAGGGGT | Mature:hsa-miR-155:MIMAT0000646
CU-1108 | 49 | AGGGGGAAAGTTCTATAGTC | Mature:hsa-miR-625:MIMAT0003294
CU-1055 | 50 | ACAGTAGTCTGCACATTGGTT | Mature:hsa-miR-199b-3p:MIMAT0004563
CU-1042 | 51 | AACATTCAACGCTGTCGGTGAGTT | Mature:hsa-miR-181a:MIMAT0000256
CU-1113 | 52 | TGGAAGACTAGTGATTTTGTTGT | Mature:hsa-miR-7:MIMAT0000252
CU-1098 | 53 | TAATGCCCCTAAAAATCCTTAT | Mature:hsa-miR-365:MIMAT0000710
CU-1052 | 54 | TAGCAGCACAGAAATATTGGCA | Mature:hsa-miR-195:MIMAT0000461
CU-1568 | 55 | TGAGGTAGTAGGTTGTAT | Mature:hsa-let-7c:MIMAT0000064
CU-1103 | 56 | TCCTGTACTGAGCTGCCCCGAG | Mature:hsa-miR-486-5p:MIMAT0002177
CU-1014 | 57 | TCCCTGAGACCCTTTAACCTGTGA | Mature:hsa-miR-125a-5p:MIMAT0000443
CU-1068 | 58 | ATCACATTGCCAGGGATTTCCA | Mature:hsa-miR-23a:MIMAT0000078
CU-1019 | 59 | TCACAGTGAACCGGTCTCTTT | Mature:hsa-miR-128:MIMAT0000424
CU-1076 | 60 | CACTAGATTGTGAGCTCCTGGA | Mature:hsa-miR-28-3p:MIMAT0004502
CU-1111 | 61 | CAACAAATCACAGTCTGCCAT | Star:hsa-miR-7-1*:MIMAT0004553
CU-1062 | 62 | CAAAGTGCTTATAGTGCAGGTAG | Mature:hsa-miR-20b-mm:MIMAT0001413
CU-1115 | 63 | AGGTTGGGATCGGTTGCAATGCT | Star:hsa-miR-92a-1*:MIMAT0004507
CU-1126 | 64 | TCATTCATTGCTGTCGGTGGGTT | Mature:hsa-mir-181b-1:MI0000270
CU-1096 | 65 | TCCCCCAGGTGTGATTCTGATT | Mature:hsa-miR-361-3p:MIMAT0004682
CU-1054 | 66 | CCCAGTGTTCAGACTACCTGTTC | Mature:hsa-miR-199a-5p:MIMAT0000231
CU-1125 | 67 | ACCAATATTACTGTGCTGCTT | Star:hsa-miR-16-2*:MIMAT0004518
CU-1087 | 68 | TGTAAACATCCTTGACTGGAAGCT | Mature:hsa-miR-30e:MIMAT0000692
CU-1045 | 69 | TAAGGTGCATCTAGTGCAGATA | Mature:hsa-miR-18a:MIMAT0000072
CU-1069 | 70 | ATCACATTGCCAGGGATTACCA | Mature:hsa-miR-23b:MIMAT0000418
CU-1044 | 71 | ACTGCCCTAAGTGCTCCTTCTG | Star:hsa-miR-18a*:MIMAT0002891
CU-1083 | 72 | TGTAAACATCCTCGACTGGA | Mature:hsa-miR-30a:MIMAT0000087
CU-1009 | 73 | TACAGTACTGTGATAACTGAAG | Mature:hsa-miR-101:MIMAT0000099
CU-1030 | 74 | CTAGACTGAAGCTCCTTGAGG | Mature:hsa-miR-151-3p:MIMAT0000757
CU-1095 | 75 | TGGCAGTGTCTTAGCTGGTTGTT | Mature:hsa-miR-34a:MIMAT0000255
CU-1119 | 76 | TGAGGTAGTAAGTTGTATTGTT | Mature:hsa-miR-98:MIMAT0000096
CU-1028 | 77 | TGAGAACTGAATTCCATAGGCTGT | Mature:hsa-miR-146b-5p:MIMAT0002809
CU-1031 | 78 | TCGAGGAGCTCACAGTCTAGTA | Mature:hsa-miR-151-5p:MIMAT0004697
CU-1100 | 79 | AGCTCGGTCTGAGGCCCCTCAG | Mature:hsa-miR-423-3p:MIMAT0001340
CU-1038 | 80 | ACTGCAGTGAAGGCACTTGTAG | Star:hsa-miR-17*:MIMAT0000071
CU-1040 | 81 | ACCATCGACCGTTGATTGTA | Star:hsa-miR-181a*:MIMAT0000270
CU-1053 | 82 | TCACCACCTTCTCCACCCAG | Mature:hsa-miR-197:MIMAT0000227
CU-1075 | 83 | TCACAGTGGCTAAGTTCTG | Mature:hsa-miR-27b:MIMAT0000419
CU-1073 | 84 | TCAAGTAATTCAGGATAGGTT | Mature:hsa-miR-26b:MIMAT0000083
CU-1100 | 85 | GGGTTTACGTTGGGAGAACT | Mature:hsa-miR-629:MIMAT0004810
CU-1088 | 86 | TGGGTTGAGAGGGCGA | Mature:hsa-miR-320a:MIMAT0000510
CU-1005 | 87 | TGAGGTAGGAGGTTGTATAGTT | Mature:hsa-let-7e:MIMAT0000066
CU-1081 | 88 | TGACCGATTTCTCCTGGTGTT | Star:hsa-miR-29c*:MIMAT0004673
CU-1117 | 89 | TATTGCACTCGTCCCGGCC | Mature:hsa-miR-92b:MIMAT0003218
CU-1094 | 90 | GGGGTGCTATCTGTGATTGA | Mature:hsa-miR-342-5p:MIMAT0004694
CU-1021 | 91 | GCATGGGTGGTTCAGTGGTAGAA | Mature:hsa-miR-1308:MIMAT0005947
CU-1089 | 92 | CTGGCCCTCTCTGCCCTT | Mature:hsa-miR-328:MIMAT0000752
CU-1047 | 93 | CTGACCTATGAATTGACAGC | Mature:hsa-miR-192:MIMAT0000222
CU-1099 | 94 | CTCCTGACTCCAGGTCCTGTG | Star:hsa-miR-378*:MIMAT0000731
CU-1105 | 95 | CGTCAACACTTGCTGGTT | Mature:hsa-miR-505:MIMAT0002876
CU-1034 | 96 | CGAATCATTATTTGCTGCTCT | Star:hsa-miR-15b*:MIMAT0004586
CU-5002 | 97 | CATCGGGAATGTCGTGTCCGCC | Star:hsa-mir-425*:MI0001448
CU-1025 | 98 | CAGTGGTTTTACCCTATGGTA | Mature:hsa-miR-140-5p:MIMAT0000431
CU-1022 | 99 | CAGTGCAATGATGAAAGGGCAT | Mature:hsa-miR-130b:MIMAT0000691
CU-1104 | 100 | CAGCAGCACACTGTGGTTTGT | Mature:hsa-miR-497:MIMAT0002820
CU-1106 | 101 | CACGCTCATGCACACACCCAC | Mature:hsa-miR-574-3p:MIMAT0003239
CU-1077 | 102 | AAGGAGCTCACAGTCTATTGAG | Mature:hsa-miR-28-5p:MIMAT0000085
CU-1123 | 103 | TTGGTCCCCTTCAACCAGCTGT | Mature:hsa-miR-133a:MIMAT0000427
CU-1074 | 104 | TTCACAGTGGCTAAGTTCCGA | Mature:hsa-miR-27a:MIMAT0000084
CU-1097 | 105 | TTATCAGAATCTCCAGGGGTAA | Mature:hsa-miR-361-5p:MIMAT0000703
CU-1043 | 106 | TGGAGAGAAAGGCAGTTCCTGAT | Mature:hsa-miR-185:MIMAT0000455
CU-1112 | 107 | TGAGACCTCTGGGTTCTGAGCT | Mature:hsa-miR-769-5p:MIMAT0003886
CU-1122 | 108 | TCTTTGGTTATCTAGCTGTATGA | Mature:hsa-miR-9:MIMAT0000441
CU-1109 | 109 | TCTAGTAAGAGTGGCAGTCGA | Mature:hsa-miR-628-3p:MIMAT0003297
CU-1090 | 110 | TATTGCACATTACTAAGTTGA | Mature:hsa-miR-32:MIMAT0000090
CU-1013 | 111 | TAAGGCACGCGGTGAATGCCA | Mature:hsa-miR-124:MIMAT0000422
CU-1058 | 112 | TAACACTGTCTGGTAACGATGTT | Mature:hsa-miR-200a:MIMAT0000682
CU-1059 | 113 | GTGAAATGTTTAGGACCACTAG | Mature:hsa-miR-203:MIMAT0000264
CU-1102 | 114 | GCAGTCCATGGGCATATACACA | Mature:hsa-miR-455-3p:MIMAT0004784
CU-1107 | 115 | GAGCTTATTCATAAAAGTGCAG | Mature:hsa-miR-590-5p:MIMAT00032
CU-1114 | 116 | CTGCCCTGGCCCGAGGGACCGA | Mature:hsa-miR-874:MIMAT0004911
CU-1002 | 117 | CTATACAACCTACTGCCTTC | Star:hsa-let-7b*:MIMAT00044
CU-1049 | 118 | CGGGGTTTTGAGGGCGAGATGA | Star:hsa-miR-193b*:MIMAT0004767
CU-1051 | 119 | CCAGTGGGGCTGCTGTTATCTG | Star:hsa-miR-194*:MIMAT0004671
CU-1036 | 120 | CCAGTATTAACTGTGCTGCTGA | Star:hsa-miR-16-1*:MIMAT0004489
CU-1121 | 121 | CACCCGTAGAACCGACCTTGCG | Mature:hsa-miR-99b:MIMAT0000689
CU-1120 | 122 | CAAGCTCGTGTCTGTGGGTCCG | Star:hsa-miR-99b*:MIMAT0004678
CU-1063 | 123 | CAACACCAGTCGATGGGCTGTA | Star:hsa-miR-21*:MIMAT0004494
CU-1070 | 124 | AGGCGGAGACTTGGGCAATT | Star:hsa-miR-25*:MIMAT0004498
CU-1060 | 125 | ACTGCATTATGAGCACTTAAAGT | Star:hsa-miR-20a*:MIMAT0004493
CU-1078 | 126 | ACTGATTTCTTTTGGTGTTCA | Star:hsa-miR-29a*:MIMAT0004503
CU-1020 | 127 | ACTCGGCGTGGCGTCGGTCGTGG | Mature:hsa-miR-1307:MIMAT0005951
CU-1041 | 128 | ACCACTGACCGTTGACTGTAC | Star:hsa-miR-181a-2*:MIMAT0004558
CU-1048 | 129 | AACTGGCCTACAAAGTCCCAGT | Mature:hsa-miR-193a-3p:MIMAT0000459
CU-1127 | 130 | TGTCTGAGCGTCGCT | preCursor:hsa-mir-1826:MI0008194





CU-1132 | 131 | GCCGGGTACTTTCGTATTTT | NEW
CU-1137 | 132 | GCTAAGGAAGTCCTGTGCTCAGTTTT | NEW
CU-1130 | 133 | CCCGGGTTTCGGCACCA | NEW
CU-1136 | 134 | TCGGGCGGGAGTGGTGGCTTT | NEW
CU-1383 | 135 | TAGAGGCACCGCCTGCCCA | NEW
CU-1131 | 136 | CGGGGCGCGGCCTCGCTG | NEW
CU-1135 | 137 | CCCACGGGGGTCTCCGGGCGAG | NEW
CU-1392 | 138 | CCCACGGGAAACAGCA | NEW
CU-1133 | 139 | CAGCCCGGCCTGGCTCCTCCAT | NEW
CU-1134 | 140 | CACGGAAGGTGGCCCGG | NEW
CU-1170 | 141 | CTGTAGGCACCTGAAA | NEW
CU-1153 | 142 | CCCCCCACTGCTAAATTTGACTGGCTT | NEW
CU-1191 | 143 | GCCCGCATCCTCCACCA | NEW
CU-1140 | 144 | CCCGGCCAACGCACCA | NEW
CU-1173 | 145 | ATCCCACTCCTGACACCA | NEW
CU-1149 | 146 | CCGGGCGGAAACACCA | NEW
CU-1159 | 147 | TGTCAGTTTGTTAATTA | NEW
CU-1178 | 148 | AGGGTGTGCGTGTTTTT | NEW
CU-1142 | 149 | TCGATTCCCGGCCCATGCACCA | NEW
CU-1164 | 150 | TGAGAGCGCTCGGTTTTT | NEW
CU-1148 | 151 | TGGTGTGGTCTGTTGTTTT | NEW
CU-1221 | 152 | TGTGCTCCGGAGTTACCTCGTTT | NEW
CU-1186 | 153 | TCCCCGACACCTCCACCA | NEW
CU-1224 | 154 | CTGTAGGCATCATCAAT | NEW
CU-1180 | 155 | AACCGAGCGTCCAAGCTCTTTCCATTTT | NEW
CU-1155 | 156 | TCCCCGCACCTCCACCA | NEW
CU-1212 | 157 | TCCCCGGCACTTCCACCA | NEW
CU-1213 | 158 | TCACCCCATAAACACCA | NEW
CU-1193 | 159 | CTGTAGGCACCATCATAA | NEW
CU-1202 | 160 | CCCACCAGAGTCGCCA | NEW
CU-1220 | 161 | TTCCCCGACGGGGAGCCA | NEW
CU-1175 | 162 | GGCGTGATTCATACCTTTT | NEW
CU-1194 | 163 | GCGGGCGGACCTTTT | NEW
CU-1205 | 164 | CGGCTCGAAGGACCA | NEW
CU-1187 | 165 | CCCCGGCCCCGCGTA | NEW
CU-1206 | 166 | CCCACCTCTGACACCA | NEW
CU-1210 | 167 | CCACGAGGTCGGCCGG | NEW
CU-1156 | 168 | CAGGATCGGCCCACT | NEW
CU-1197 | 169 | ATGTGGTGGCTTACTTTT | NEW
CU-1183 | 170 | ATCCCGGACGAGCCCA | NEW
CU-1570 | 171 | ATCCCCAGCATCTCCACCA | NEW
CU-1146 | 172 | AGAAAGGCCGAATTTTA | NEW
CU-1165 | 173 | TGTCAGTTTTTACCCAA | NEW
CU-1160 | 174 | TGTCAGTTTGAACCCAA | NEW
CU-1189 | 175 | TGTAGTGTTTCTTACTTTA | NEW
CU-1219 | 176 | TGGCGAAGGTCGGCCGCG | NEW
CU-1203 | 177 | TGCAGGGCCGGCGGGGAGG | NEW
CU-1211 | 178 | TCGGGCGGCGGGCGT | NEW
CU-1190 | 179 | TCGGCTTTCCCTGCTAACTGGGCTTTTT | NEW
CU-1144 | 180 | TCAGAGCGCGGGCCGACCCC | NEW
CU-1376 | 181 | TCAACACCCACTCCCTC | NEW
CU-1138 | 182 | TATCAATGATGCTTCTGAGA | NEW
CU-1384 | 183 | TAACCCCAGGGTTGGTCA | NEW
CU-1154 | 184 | GGGGTCCCCGGTAGA | NEW
CU-1171 | 185 | GGGCGTGGGTGTGATGATTC | NEW
CU-1199 | 186 | GGGAGGTGAGTAGGTCTG | NEW
CU-1226 | 187 | GGAGACGTGGCCGAGAG | NEW
CU-1572 | 188 | GCGGAATACCACGGGGA | NEW
CU-1151 | 189 | GCAGGCGGGGGATTAGCTA | NEW
CU-1227 | 190 | GCAGCGGAACGTCGGCGCGC | NEW
CU-1200 | 191 | GACGTCACCCTCCTCA | NEW
CU-1152 | 192 | CTTGGACTAACCTGGTGTA | NEW
CU-1158 | 193 | CTGTAGGCCACCATCCA | NEW
CU-1216 | 194 | CTGTAGGCACCACCA | NEW
CU-1188 | 195 | CTGGTAGGCACCTGAAA | NEW
CU-1157 | 196 | CTGATGTTGATGCATATGATGACA | NEW
CU-1207 | 197 | CGGTGGAACCTGCATTGGTTT | NEW
CU-1181 | 198 | CGGGGCCGGGGCTAGGGT | NEW
CU-1185 | 199 | CGGGCCGCCCCCGCCCACCG | NEW
CU-1163 | 200 | CGGGCCCCGGGGCTCG | NEW
CU-1366 | 201 | CGGCCTATCCGGAATGCCCC | NEW
CU-1225 | 202 | CGGACCTCCCTGGCCC | NEW
CU-1145 | 203 | CGCGGCCAGTGTCCCCTTGTA | NEW
CU-1201 | 204 | CGACACACGGCCCGTGGCGC | NEW
CU-1141 | 205 | CCTCATAAATACCGG | NEW
CU-1172 | 206 | CCTCACTGGGGGCTCCA | NEW
CU-1209 | 207 | CCTCACCTGGAGCACCA | NEW
CU-1147 | 208 | CCGTACTGGCCACCA | NEW
CU-1223 | 209 | CCGCCGCCCCCCCCT | NEW
CU-1217 | 210 | CCGCCCCGACCTTAGCTA | NEW
CU-1176 | 211 | CCCGTCCGCTGCGCCA | NEW
CU-1139 | 212 | CCCGTCCACTCCGCCA | NEW
CU-1166 | 213 | CCCCGGCCCATGCACCA | NEW
CU-1177 | 214 | CCCCGGCATCTCCATCA | NEW
CU-1214 | 215 | CCCCAGTACCTCCACCA | NEW
CU-1184 | 216 | CCCAGCGGTGCCTCCA | NEW
CU-1574 | 217 | CCACGCTCTGCTACCA | NEW
CU-1360 | 218 | CCACCCTGGAGCCTCCGT | NEW
CU-1150 | 219 | ATGGTAGGCACCTGAAA | NEW
CU-1162 | 220 | ATGGGCGGTCCTCGTT | NEW
CU-1179 | 221 | ATGGCCTGGACCCCACTCCT | NEW
CU-1161 | 222 | ATGGCCGCATATATTTT | NEW
CU-1218 | 223 | ATCCTGTTCGTGACGCCA | NEW
CU-1204 | 224 | ATCCTGCTCACAGCCCCA | NEW
CU-1168 | 225 | AGCGAGGGTTCCGCCGGCC | NEW
CU-1195 | 226 | ACTGGGGAGGGGGAGGAGCCTCGAGG | NEW
CU-1215 | 227 | ACCCCGAGGGGACGGGCG | NEW
CU-1208 | 228 | ACAGCGCTGTGTTCCCGT | NEW
CU-1192 | 229 | ACAAAAAAAAAAGCCCAACCCT | NEW
CU-1373 | 230 | AACTAAAACCCCTACGCA | NEW
CU-1196 | 231 | AAAGGAGCCGAATCTTT | NEW





CU-1251
232
CCCACCCAGGGACGCCA
refseqGeneIntron-annotate





CU-1254
233
CCCCGGCACCTCCACCA
refseqGeneIntron-annotate





CU-1298
234
ATCCCGGACGAGCCCCCA
refseqGeneIntron-annotate





CU-1229
235
CCCACGTTGGGCGCCA
refseqGeneIntron-annotate





CU-1276
236
TCGATTCCCGGCCAATGCACCA
refseqGeneIntron-annotate





CU-1303
237
TCCCACTTCTGACACCA
refseqGeneIntron-annotate





CU-1270
238
TCGTAGGCACCTGAAA
refseqGeneIntron-annotate





CU-1242
239
TCCCCGTACGGGCCACCA
refseqGeneIntron-annotate





CU-1273
240
TTGACCGCTCTGACCA
refseqGeneIntron-annotate





CU-1328
241
CCCAGCGGGGCCTCCA
refseqGeneIntron-annotate





CU-1257
242
CAGGAACGGTGCACCA
mRNAaLL-annotate;refseqGeneIntron-annotate





CU-1241
243
AGTCCCATCTGGGTCGCCA
refseqGeneIntron-annotate





CU-1575
244
CCCCCCACTGCTAAATTTGACTGGA
refseqGeneIntron-annotate;rnaGene-annotate





CU-1274
245
GTTTGTTAATTAACCCAA
refseqGeneIntron-annotate





CU-1243
246
GTCCCTTCGTGGTCGCCA
refseqGeneIntron-annotate





CU-1284
247
CTGTAGCACCTGAAA
mRNAall-annotate;





refseqGeneIntron-annotate;rnaGene-annotate





CU-1300
248
TCCTCACACGGGGCACCA
refseqGeneIntron-annotate





CU-1278
249
TAACGGCCGCGGTACCC
refseqGeneIntron-annotate





CU-1264
250
GAGGGGGACCAAAAAAAA
refseqGeneIntron-annotate





CU-1275
251
CCCGCATTCTCCACCA
refseqGeneIntron-annotate





CU-1246
252
GGGGGGTAAAAAAAAA
refseqGeneIntron-annotate





CU-1315
253
TCCACCGCTGCCACCA
refseqGeneIntron-annotate





CU-1277
254
GAGCCATGATGATACCACTGAGC
refseqGeneIntron-annotate





CU-1288
255
CGTCCATGATGTTCCGCAA
mRNAall-annotate;snoRNA-annotate;piRNA-annotate;





wgRNA-annotate;refseqGeneIntron-annotate





CU-1234
256
CATCCTCTGCTACCA
mRNAall-annotate;refseqGeneIntron-annotate;





exEID-annotate





CU-1345
257
AGAACACTACGAGCCACA
mRNA-annotate;refseqGeneIntron-annotate





CU-1352
258
ACCCCACTTCTGGTACCA
refseqGeneIntron-annotate





CU-1323
259
TGTATTGTGAGACATTC
mRNAall-annotate;refseqGeneIntron-annotate;





wgRNA-annotate;rnaGene-annotate





CU-1324
260
TCTCGGTGGAACCTCCA
refseqGeneIntron-annotate





CU-1302
261
TCCCCGGCACCTCCAA
refseqGeneIntron-annotate





CU-1269
262
TACCGAGCCTGGTGATAGC
refseqGeneIntron-annotate





CU-1281
263
GCAGCGCCAGCCTCCCGCCCTAC
refseqGeneIntron-annotate





CU-1292
264
CCGCCTGGGGAGTAC
refseqGeneIntron-annotate





CU-1339
265
ATCCCCAGCACCTCCACCA
refseqGeneIntron-annotate





CU-1293
266
AGCAGTGATGTCCTGAAAATTCTGAAG
refseqGeneIntron-annotate





CU-1307
267
ACCCCACTATGCTTAGCCCT
mRNA-annotate;refseqGeneIntron-annotate





CU-1294
268
AAAGGACCTGGCGGTGCTTC
mRNA-annotate;refseqGeneIntron-annotate





CU-1325
269
TTGCCACACTGCAACACCTT
refseqGeneIntron-annotate





CU-1333
270
TTCCTTGGATGTCTGAGTGAC
refseqGeneIntron-annotate





CU-1310
271
TTAACCACCAAGATCGCTGATGCAC
refseqGeneIntron-annotate





CU-1299
272
TGTTCGCCGACCGTTGA
refseqGeneIntron-annotate





CU-1265
273
TGGGGTCTGGGAGGGA
refseqGeneIntron-annotate





CU-1322
274
TGGGAGAGCAGGGTATTGT
refseqGeneIntron-annotate





CU-1279
275
TGCAGATGATGTAAAAGA
snoRNA-annotate;refseqGeneIntron-annotate;





wgRNA-annotate;rnaGene-annotate





CU-1267
276
TCGCTATGATGATGGATTCCAAAA
mRNAall-annotate;refseqGeneIntron-annotate;





rnaGene-annotate





CU-1308
277
TCCGAAAGGCCTCCCGCACCG
refseqGeneIntron-annotate





CU-1331
278
TCCCGCACCTCCACCA
refseqGeneIntron-annotate





CU-1297
279
TAGATGAATAGGTAAAGAG
refseqGeneIntron-annotate





CU-1235
280
GTGTATGATGACCTCATGTAGCCTGAAC
refseqGeneIntron-annotate





CU-1253
281
GTGAAGCGTTCCATATTTTT
mRNAall-annotate;refseqGeneIntron-annotate;





rnaGene-annotate





CU-1348
282
GGGGGGGGGTTTGGAA
refseqGeneIntron-annotate





CU-1337
283
GGGGGGAGGGAAGGCAA
refseqGeneIntron-annotate





CU-1316
284
GGGGGCTGGGCTGGGTA
refseqGeneIntron-annotate





CU-1343
285
GGGGCCGCCGCCTGTGT
refseqGeneIntron-annotate





CU-1326
286
GGGAGTCCGCGGCGAGC
refseqGeneIntron-annotate





CU-1329
287
GGGACCTGGGGACCA
refseqGeneIntron-annotate





CU-1286
288
GGCTTGGTCTAGGGGTA
refseqGeneIntron-annotate





CU-1332
289
GGCTGGGACCCTGGACAC
refseqGeneIntron-annotate





CU-1262
290
GGCGACCTGCGACTCCTT
refseqGeneIntron-annotate





CU-1236
291
GGAGGGGGGAAACAAA
refseqGeneIntron-annotate





CU-1317
292
GGAGGGGGGAAAAAAAAAA
computGene-annotate;refseqGeneIntron-annotate





CU-1327
293
GGAAGACCTGCACCACTGTC
mRNAall-annotate;computGene-annotate;





refseqGeneIntron-annotate;exeID-annotate





CU-1239
294
GCGGGTGTCAGGCCT
refseqGeneIntron-annotate





CU-1266
295
GCCGGGCGTGGTGGTCTG
refseqGeneIntron-annotate





CU-1261
296
GCCGCCGAGACCCCAGGACCC
refseqGeneIntron-annotate





CU-1260
297
GCAGCCGTGCTTTTA
refseqGeneIntron-annotate





CU-1259
298
GCAAATGATGCCCTCTGATC
refseqGeneIntron-annotate





CU-1349
299
GAGGGGGGTCAAAAAAA
refseqGeneIntron-annotate





CU-1272
300
CTTGATGATGAGCAGGATCTGAGT
refseqGeneIntron-annotate





CU-1341
301
CTGTAGGCACTGAAA
refseqGeneIntron-annotate





CU-1231
302
CTGTAGGCACCATTAA
refseqGeneIntron-annotate





CU-1313
303
CTGCTTAAGTCCTGACCAG
refseqGeneIntron-annotate





CU-1296
304
CTGAGCACCTTTCCCTTCC
refseqGeneIntron-annotate





CU-1291
305
CTAGCCCCAAACCCA
piRNA-annotate;refseqGeneIntron-annotate





CU-1245
306
CGGTCACACGATTAACCCA
mRNA-annotate;refseqGeneIntron-annotate





CU-1338
307
CGGGGGGAGGAAAAAA
refseqGeneIntron-annotate





CU-1268
308
CGGGGGGAAAAAAAAA
refseqGeneIntron-annotate





CU-1290
309
CGGGGCCGCACGCGC
refseqGeneIntron-annotate





CU-1319
310
CGGGAGTGGGGTGGCGCCCAG
refseqGeneIntron-annotate





CU-1318
311
CGGGAGCCCCGGGTT
refseqGeneIntron-annotate





CU-1569
312
CGGACCTGATAAATTCCCAC
refseqGeneIntron-annotate





CU-1320
313
CGCGGCTCTTGCGGT
refseqGeneIntron-annotate





CU-1249
314
CGCCTGAGTCAGAAC
refseqGeneIntron-annotate





CU-1240
315
CGCCGCCGCCCCCCCC
mRNAall-annotate;refseqGeneIntron-annotate;exEID-annotate





CU-1351
316
CCTTCCTTGGATGTCTGAGTGAG
mRNAall-annotate;refseqGeneIntron-annotate;wgRNA-annotate;rnaGene-annotate





CU-1354
317
CCTCGCTGGGGCCTCCA
refseqGeneIntron-annotate





CU-1233
318
CCTCACAGGGACGCCA
refseqGeneIntron-annotate





CU-1289
319
CCTAGGAGTGCGACAATT
mRNAall-annotate;refseqGeneIntron-annotate;wgRNA-annotate;rnaGene-annotate





CU-1283
320
CCGCTCTGAGACCTA
refseqGeneIntron-annotate





CU-1228
321
CCGCCCGTCACCCTCCTCAAGTA
mRNA-annotate;refseqGeneIntron-annotate





CU-1344
322
CCCGGGCGGCACACCA
refseqGeneIntron-annotate





CU-1271
323
CCCGCGGGCTTGCTGGGCGTCCC
refseqGeneIntron-annotate





CU-1321
324
CCCCTGCGATTTCCCCA
refseqGeneIntron-annotate;rnaGene-annotate





CU-1285
325
CCCCGGCATCTCCACTA
refseqGeneIntron-annotate





CU-1571
326
CCCCAGTGAGTGCCCTCTTCC
refseqGeneIntron-annotate





CU-1353
327
CCCAGAGACGCCGTCCTCGA
refseqGeneIntron-annotate





CU-1355
328
CCCACCGAGGATGCCA
refseqGeneIntron-annotate





CU-1238
329
CCATCACTACCCACCA
refseqGeneIntron-annotate





CU-1347
330
CCACTCCAGCCTAGCCCC
refseqGeneIntron-annotate





CU-1295
331
CAGTACAGGCACACCTC
refseqGeneIntron-annotate





CU-1256
332
CACGTCGGGGTCCCCA
refseqGeneIntron-annotate





CU-1250
333
CACGATTAACCCAAGTC
mRNA-annotate;refseqGeneIntron-annotate





CU-1305
334
CACCACACCCGGGCCA
refseqGeneIntron-annotate





CU-1287
335
CAACACAGGCATGCT
refseqGeneIntron-annotate





CU-1314
336
ATAGGGTTTACGACCTCGATGTTGGATCA
refseqGeneIntron-annotate





CU-1311
337
ATACCATGATGAACAATAGCTGAGA
refseqGeneIntron-annotate





CU-1282
338
AGGGTTCAGCTGTCTC
refseqGeneIntron-annotate





CU-1350
339
AGGCTGTGATGGACCTGGCTGAGCCTG
refseqGeneIntron-annotate





CU-1252
340
AGAGAGTAGGGGGAGGT
refseqGeneIntron-annotate





CU-1334
341
ACTGTCCCTGTCTACTA
refseqGeneIntron-annotate





CU-1340
342
ACCGCATCTGGCCTATTTTT
refseqGeneIntron-annotate





CU-1342
343
ACCAGACCTCCTGTGCGAAG
refseqGeneIntron-annotate





CU-1304
344
ACAGCCCGGATCCCAGCCCACTTA
refseqGeneIntron-annotate





CU-1230
345
ACACTGAGCCACAACCCA
refseqGeneIntron-annotate





CU-1312
346
AAGGGCTTGGCTTAATTA
refseqGeneIntron-annotate





CU-1255
347
AACCCGGAAGGCGGAGGTTGCGG
computGene-annotate;refseqGeneIntron-annotate





CU-1336
348
AACCCCACACCAACC
refseqGeneIntron-annotate





CU-1346
349
AACAAGCTTCTTTGACGTCCCATCCAC
refseqGeneIntron-annotate





CU-1369
350
TCCCCGGCATCTCCACCA
computGene-annotate





CU-1370
351
CTGATTGCTCCTGTCTGATT
mRNAall-annotate;exEID-annotate;rnaGene-annotate





CU-1371
352
TCTAGAGGAGCCTGTTCTGTA
mRNA-annotate





CU-1381
353
TCGATTCCCGGTCAGGGAACCA
repeats-annotate





CU-1380
354
ATAGGTTTGGTCCTAGCCTTTCT
piRNA-annotate





CU-1363
355
CGTTCGCGCTTTCCCCTG
rnaGene-annotate





CU-1396
356
TAAGTGTTTGTGGGTTA
rnaGene-annotate





CU-1361
357
GGCGGCGGGAGACCCA
computGene-annotate





CU-1359
358
CCCCGGCAGGTTTGA
rnaGene-annotate





CU-1573
359
TGCCGTGATCGTATAGTGGTTA
piRNA-annotate





CU-1169
360
TCAGACTACTCTCCTCCGCCCATT
mRNAall-annotate





CU-1167
361
GGACACAGAGGCTTCG
mRNAall-annotate





CU-1395
362
CTGACAGCCGGGGTTTTGGA
computGene-annotate





CU-1365
363
CGGCGGGGCCTGGAGTCTG
mRNAall-annotate;computGene-annotate;exEID-annotate





CU-1375
364
CCTGGCTCGCTGCGCCA
computGene-annotate





CU-1182
365
CCGCCCCACCCCGCGCGC
mRNAall-annotate;exEID-annotate





CU-1174
366
CCCGAACGCTGCCAACCC
exEID-annotate





CU-1385
367
AGACCCGCGGGCGCTCTCCAGTC
rnaGene-annotate





CU-1524
368
CCCCCACAACCGCGCTTGACTAGC
mRNAall-annotate;yRNA-eliminate;rnaGene-annotate





CU-1453
369
CCCTGCTCGCTGCGCCA
refseqGeneExon-eliminate





CU-1477
370
CTCCCACTGCTTCACTTGACTAGC
yRNA-eliminate;refseqGeneIntron-annotate;rnaGene-annotate





CU-1466
371
CCCATCCTCGTCGCCA
refseqGeneExon-eliminate





CU-1222
372
TCACGTCGGGGTCACCA
Morozov-eliminate





CU-1388
373
TCCCTGGTGGTCTAGTGGTTAGGATTCG
tRNAcomputational-annotate;rnaGene-annotate;HStRNA-eliminate;piRNA-annotate





CU-1428
374
GGTAGCGTGGCCGAG
tRNAcomputational-annotate;tRNA-eliminate;HStRNA-eliminate;rnaGene-annotate





CU-1488
375
TCCTGCCGCGGTCGCCA
refseqGeneExon-eliminate





CU-1557
376
GGAGAGAACGCGGTCTGAGTGGT
snoRNA-eliminate;wgRNA-annotate;rnaGene-annotate





CU-1379
377
TCGGGTGCGAGAGGTCCCGGGT
tRNAcomputational-annotate;HStRNA-eliminate;rnaGene-annotate





CU-1542
378
GGCTGGTCCGATGGTAGTGGGTT
mRNAall-annotate;yRNA-eliminate;refseqGeneIntron-annotate;rnaGene-annotate





CU-1550
379
CGGAAGCGTGCTGGGCCC
tRNAcomputational-annotate;tRNA-eliminate;rnaGene-annotate;HStRNA-eliminate;piRNA-annotate





CU-1232
380
CCCGGGCGGCGCACCA
Morozov-eliminate;refseqGeneIntron-annotate





CU-1513
381
GCGGGTGATGCGAACTGGAGTCTGAGC
computGene-annotate;snoRNA-annotate;snoRNA-eliminate;wgRNA-annotate;rnaGene-annotate





CU-1368
382
GACGAGGTGGCCGAGTGG
tRNAcomputational-annotate;rnaGene-annotate;HStRNA-eliminate;piRNA-annotate





CU-1474
383
GGGGGTGTAGCTCAG
tRNAcomputational-annotate;tRNA-eliminate;rnaGene-annotate;HStRNA-eliminate;piRNA-annotate





CU-1470
384
CTCCTGGCTGGCTCGCCA
mRNAall-annotate;computGene-annotate;refseqGeneExon-eliminate;exEID-annotate





CU-1471
385
CGGGAGGCCCGGGTT
rnaGene-annotate;tRNAcomputational-annotate;piRNA-annotate;tRNA-eliminate;refseqGeneIntron-annotate;mRNA-annotate;HStRNA-eliminate





CU-1538
386
GGCTGGTCCGAGTGCAGTGGTGTTTA
yRNA-eliminate;refseqGeneIntron-annotate;rnaGene-annotate





CU-1486
387
CTGCTGTGATGACATTC
computGene-annotate;snoRNA-annotate;snoRNA-eliminate;wgRNA-annotate;rnaGene-annotate





CU-1386
388
GTCACGCGGGAGACC
tRNAcomputational-annotate;mRNAall-annotate;rnaGene-annotate;HStRNA-eliminate;piRNA-annotate





CU-1382
389
CCTCGTTAGTATAGTGGTGAGTATCCC
tRNAcomputational-annotate;rnaGene-annotate;HStRNA-eliminate;piRNA-annotate





CU-1433
390
GGCCGGTTAGCTCAG
mRNAall-annotate;exEID-annotate;rnaGene-annotate;tRNAcomputational-annotate;piRNA-annotate;refseqGeneIntron-annotate;refseqGeneExon-eliminate;HStRNA-eliminate





CU-1403
391
GCATTGGTGGTTCAGTGGTAGA
rnaGene-annotate;tRNAcomputational-annotate;piRNA-annotate;tRNA-eliminate;refseqGeneIntron-annotate;HStRNA-eliminate





CU-1362
392
CTGTCACGCGGGAGA
tRNAcomputational-annotate;mRNAall-annotate;rnaGene-annotate;HStRNA-eliminate;piRNA-annotate





CU-1490
393
CTACGGGGATGATTTT
mRNAall-annotate;snoRNA-annotate;snoRNA-eliminate;wgRNA-annotate;rnaGene-annotate





CU-1469
394
CCAGGGGCTGAGGGCA
snoRNA-eliminate;refseqGeneIntron-annotate;wgRNA-annotate





CU-1457
395
TCTCACTACTGCACTTGACTA
mRNAall-annotate;yRNA-eliminate;refseqGeneIntron-annotate;exEID-annotate;rnaGene-annotate





CU-1440
396
GGTTATCACGTTCGCC
tRNAcomputational-annotate;tRNA-eliminate;rnaGene-annotate;HStRNA-eliminate;piRNA-annotate





CU-1528
397
AGGGGTATGATTCTCGCT
tRNAcomputational-annotate;tRNA-eliminate;HStRNA-eliminate;rnaGene-annotate





CU-1545
398
CCACGAGGAAGAGAGGTAGC
snoRNA-eliminate;wgRNA-annotate;snoRNA-annotate





CU-1244
399
GTCAGGATGGCCGAGCGGTCT
tRNAcomputational-annotate;rnaGene-annotate;HStRNA-eliminate;refseqGeneIntron-annotate





CU-1390
400
GGGGATGTAGCTCAG
tRNAcomputational-annotate;rnaGene-annotate;HStRNA-eliminate;piRNA-annotate





CU-1377
401
GCAGCGATGGCCGAG
tRNAcomputational-annotate;HStRNA-eliminate;rnaGene-annotate
















TABLE 1B







List of known and newly identified mature miRNAs including information on frequencies.












Mature miRNA sequence
SEQ ID NO.
Corrected count, Naïve (N)
Corrected count, Memory (M)
Corrected count, Centroblasts (CB)
Corrected count, Ramos (RA)
Frequency, Naïve (N)
Frequency, Memory (M)
Frequency, Centroblasts (CB)
Frequency, Ramos (RA)



















TGTAGTGTTTCCTACTTTATGGA
1
1329
592
635
391
38.5
19.83
24.93
16.02





TAGCTTATCAGACTGATGTTGA
2
196
353
144
13
5.68
11.83
5.65
0.53





TAAAGTGCTTATAGTGCAGGTAG
3
54
19
49.82
257.89
1.56
0.64
1.96
10.57





TAGCAGCACATCATGGTTTACA
4
38
61
176.84
105
1.1
2.04
6.94
4.3





TAGCAGCACGTAAATATTGGCG
5
131
97
53
35
3.79
3.25
2.08
1.43





TGAGGTAGTAGGTTGTATAGTT
6
62.84
78.99
92.19
63.25
1.82
2.65
3.62
2.59





TATTGCACTTGTCCCGGCCTGT
7
17
21
46
207
0.49
0.7
1.81
8.48





TCCCACCGCTGCCACCA
8
68
97
25
28
1.97
3.25
0.98
1.15





TGAGGTAGTAGATTGTATAGTT
9
41.28
44
64
51.38
1.2
1.47
2.51
2.11





TAGCACCATCTGAAATCGGTTA
10
78
60
42
22
2.26
2.01
1.65
0.9





TAGCAGCACATAATGGTTTGT
11
90
39
32.16
8
2.61
1.31
1.26
0.33





CCCATAAAGTAGAAAGCACTA
12
88
53
7
10
2.55
1.78
0.27
0.41





TGAGGTAGTAGTTTGTACAGTT
13
41.28
47
30.77
21.16
1.2
1.57
1.21
0.87





TGAGGTAGTAGTTTGTGCTGTT
14
23
24
32
42
0.67
0.8
1.26
1.72





TAGCACCATTTGAAATCGGTTA
15
44
41
16
1
1.27
1.37
0.63
0.04





TGTAAACATCCTACACTCTCAGC
16
27
25
26
20
0.78
0.84
1.02
0.82





CAAAGTGCTTACAGTGCAGGTAG
17
9
6
10.18
65.04
0.26
0.2
0.4
2.67





CATTGCACTTGTCTCGGTCTGA
18
11
9
34
39
0.32
0.3
1.33
1.6





CAACGGAATCCCAAAAGCAGCTG
19
17
21
36
18
0.49
0.7
1.41
0.74





TGTGCAAATCCATGCAAAACTGA
20
0
1
25
65
0
0.03
0.98
2.66





TACCACAGGGTAGAACCACGGA
21
31
22
17
21
0.9
0.74
0.67
0.86





TGTAAACATCCTACACTCAGCT
22
31
11
27
16
0.9
0.37
1.06
0.66





TGAGGTAGTAGGTTGTGTGGTT
23
19.48
19
29
5.08
0.56
0.64
1.14
0.21





TAGCACCATTTGAAATCAGTGTT
24
22
14
12
4
0.64
0.47
0.47
0.16





TAAAGTGCTGACAGTGCAGAT
25
7
6
13
26
0.2
0.2
0.51
1.07





TCCCTGTCCTCCAGGAGCTC
26
6
3
3
32
0.17
0.1
0.12
1.31





TTCAAGTAATCCAGGATAGGCT
27
2
8
13
16
0.06
0.27
0.51
0.66





CAAAGTGCTGTTCGTGCAGGTAG
28
9
2
13
14
0.26
0.07
0.51
0.57





TGTCAGTTTGTCAAATACCCCA
29
25
10
1
0
0.72
0.34
0.04
0





TGAGAACTGAATTCCATGGGTT
30
4
7
21
4
0.12
0.23
0.82
0.16





TCTCCCAACCCTTGTACCAGT
31
12
18
2
0
0.35
0.6
0.08
0





TCCCTGAGACCCTAACTTGTGA
32
0
1
28
2
0
0.03
1.1
0.08





TCTCACACAGAAATCGCACCCGTC
33
10
8
8
3
0.29
0.27
0.31
0.12





GTCCCTGTTCGGGCGCCA
34
12
10
6
1
0.35
0.34
0.24
0.04





TGTGCAAATCTATGCAAAACTGA
35
0
0
9
19
0
0
0.35
0.78





TGTAAACATCCCCGACTGGAAG
36
7
3
14
3
0.2
0.1
0.55
0.12





AGCTACATTGTCTGCTGGGTT
37
17
6
4
0
0.49
0.2
0.16
0





AGAGGTAGTAGGTTGCATAGTT
38
2
4
10
10
0.06
0.13
0.39
0.41





CCGCACTGTGGGTACTTGCT
39
8
6
2
8
0.23
0.2
0.08
0.33





AGCAGCATTGTACAGGGCTATGA
40
1
1
10
11
0.03
0.03
0.39
0.45





AACTGGCCCTCAAAGTCCCGCT
41
0
0
2
21
0
0
0.08
0.86





GCCCCTGGGCCTATCCTAGAA
42
1
0
10
10
0.03
0
0.39
0.41





AGCTGGTGTTGTGAATCAGGCCGT
43
0
0
15
5
0
0
0.59
0.2





TGAGGGGCAGAGAGCGAGACTT
44
5
1
7
4
0.14
0.03
0.27
0.16





AGCTACATCTGGCTACTGGGTCT
45
6
6
5
0
0.17
0.2
0.2
0





GTGGGGGAGAGGCTGTA
46
2
6
3
5
0.06
0.2
0.12
0.2





CTATACGACCTGCTGCCTTTC
47
6
3
4
1
0.17
0.1
0.16
0.04





TTAATGCTAATCGTGATAGGGGT
48
3
4
5
1
0.09
0.13
0.2
0.04





AGGGGGAAAGTTCTATAGTC
49
0
2
0
11
0
0.07
0
0.45





ACAGTAGTCTGCACATTGGTT
50
0
0
13
0
0
0
0.51
0





AACATTCAACGCTGTCGGTGAGTT
51
0
0
7
6
0
0
0.27
0.25





TGGAAGACTAGTGATTTTGTTGT
52
1
1
1
8
0.03
0.03
0.04
0.33





TAATGCCCCTAAAAATCCTTAT
53
0
0
6
4
0
0
0.24
0.16





TAGCAGCACAGAAATATTGGCA
54
4
0
5
0
0.12
0
0.2
0





TGAGGTAGTAGGTTGTAT
55
0.11
0.01
0.01
0.13
0
0
0
0.01





TCCTGTACTGAGCTGCCCCGAG
56
0
0
7
1
0
0
0.27
0.04





TCCCTGAGACCCTTTAACCTGTG
57
0
0
8
0
0
0
0.31
0





ATCACATTGCCAGGGATTTCCA
58
0
0.5
7
0
0
0.02
0.27
0





TCACAGTGAACCGGTCTCTTT
59
1
0
0
6
0.03
0
0
0.25





CACTAGATTGTGAGCTCCTGGA
60
2
0
4
1
0.06
0
0.16
0.04





CAACAAATCACAGTCTGCCAT
61
3
0
1
3
0.09
0
0.04
0.12





CAAAGTGCTTATAGTGCAGGTAG
62
0
1
1
0.08
0
0.03
0.04
0





AGGTTGGGATCGGTTGCAATGCT
63
0
0
0
7
0
0
0
0.29





ACATTCATTGCTGTCGGTGGGTT
64
0
0
1
6
0
0
0.04
0.25





TCCCCCAGGTGTGATTCTGATT
65
4
1
0
1
0.12
0.03
0
0.04





CCCAGTGTTCAGACTACCTGTTC
66
0
0
6
0
0
0
0.24
0





ACCAATATTACTGTGCTGCTT
67
1
1
2
2
0.03
0.03
0.08
0.08





TGTAAACATCCTTGACTGGAAGCT
68
2
0
3
0
0.06
0
0.12
0





TAAGGTGCATCTAGTGCAGATA
69
0
0
1
4
0
0
0.04
0.16





ATCACATTGCCAGGGATTACCA
70
0
0.5
3
1
0
0.02
0.12
0.04





ACTGCCCTAAGTGCTCCTTCTG
71
0
0
0
5
0
0
0
0.2





TGTAAACATCCTCGACTGGA
72
1
0
3
0
0.03
0
0.12
0





TACAGTACTGTGATAACTGAAG
73
1
0
0
3
0.03
0
0
0.12





CTAGACTGAAGCTCCTTGAGG
74
2
1
1
0
0.06
0.03
0.04
0





TGGCAGTGTCTTAGCTGGTTGTT
75
0
1
2
0
0
0.03
0.08
0





TGAGGTAGTAAGTTGTATTGTT
76
0
1
1
1
0
0.03
0.04
0.04





TGAGAACTGAATTCCATAGGCTGT
77
1
0
2
0
0.03
0
0.08
0





TCGAGGAGCTCACAGTCTAGTA
78
1
0
1
1
0.03
0
0.04
0.04





AGCTCGGTCTGAGGCCCCTCAG
79
0
0
2
1
0
0
0.08
0.04





ACTGCAGTGAAGGCACTTGTAG
80
0
0
0
3
0
0
0
0.12





ACCATCGACCGTTGATTGTA
81
0
1
0
2
0
0.03
0
0.08





TTCACCACCTTCTCCACCCAG
82
0
0
0
2
0
0
0
0.08





TTCACAGTGGCTAAGTTCTG
83
0
0
2
0
0
0
0.08
0





TTCAAGTAATTCAGGATAGGTT
84
0
0
1
1
0
0
0.04
0.04





TGGGTTTACGTTGGGAGAACT
85
0
0
0
2
0
0
0
0.08





TGGGTTGAGAGGGCGA
86
1
0
1
0
0.03
0
0.04
0





TGAGGTAGGAGGTTGTATAGTT
87
0
0
1.02
0
0
0
0.04
0





TGACCGATTTCTCCTGGTGTT
88
2
0
0
0
0.06
0
0
0





TATTGCACTCGTCCCGGCC
89
0
0
1
1
0
0
0.04
0.04





GGGGTGCTATCTGTGATTGA
90
2
0
0
0
0.06
0
0
0





GCATGGGTGGTTCAGTGGTAGA
91
0
0
2
0
0
0
0.08
0





CTGGCCCTCTCTGCCCTT
92
0
0
1
1
0
0
0.04
0.04





CTGACCTATGAATTGACAGC
93
0
0
0
2
0
0
0
0.08





CTCCTGACTCCAGGTCCTGTG
94
0
0
0
2
0
0
0
0.08





CGTCAACACTTGCTGGTT
95
0
0
1
1
0
0
0.04
0.04





CGAATCATTATTTGCTGCTCT
96
0
0
1
1
0
0
0.04
0.04





CATCGGGAATGTCGTGTCCGCC
97
0
2
0
0
0
0.07
0
0





CAGTGGTTTTACCCTATGGTA
98
0
0
1
1
0
0
0.04
0.04





CAGTGCAATGATGAAAGGGCAT
99
0
0
2
0
0
0
0.08
0





CAGCAGCACACTGTGGTTTGT
100
0
0
2
0
0
0
0.08
0





CACGCTCATGCACACACCCAC
101
0
0
2
0
0
0
0.08
0





AAGGAGCTCACAGTCTATTGAG
102
0
0
2
0
0
0
0.08
0





TTGGTCCCCTTCAACCAGCTGT
103
0
0
1
0
0
0
0.04
0





TTCACAGTGGCTAAGTTCCGA
104
0
1
0
0
0
0.03
0
0





TTATCAGAATCTCCAGGGGTAA
105
1
0
0
0
0.03
0
0
0





TGGAGAGAAAGGCAGTTCCTGAT
106
0
0
1
0
0
0
0.04
0





TGAGACCTCTGGGTTCTGAGCT
107
0
0
0
1
0
0
0
0.04





TCTTTGGTTATCTAGCTGTATGA
108
0
0
0
1
0
0
0
0.04





TCTAGTAAGAGTGGCAGTCGA
109
0
0
0
1
0
0
0
0.04





TATTGCACATTACTAAGTTGA
110
1
0
0
0
0.03
0
0
0





TAAGGCACGCGGTGAATGCCA
111
1
0
0
0
0.03
0
0
0





TAACACTGTCTGGTAACGATGTT
112
0
0
0
1
0
0
0
0.04





GTGAAATGTTTAGGACCACTAG
113
0
0
1
0
0
0
0.04
0





GCAGTCCATGGGCATATACACA
114
0
0
1
0
0
0
0.04
0





GAGCTTATTCATAAAAGTGCAG
115
0
0
1
0
0
0
0.04
0





CTGCCCTGGCCCGAGGGACCG
116
0
0
0
1
0
0
0
0.04





CTATACAACCTACTGCCTTC
117
0
0
1
0
0
0
0.04
0





CGGGGTTTTGAGGGCGAGATGA
118
0
0
0
1
0
0
0
0.04





CCAGTGGGGCTGCTGTTATCTG
119
0
0
1
0
0
0
0.04
0





CCAGTATTAACTGTGCTGCTGA
120
0
0
0
1
0
0
0
0.04





CACCCGTAGAACCGACCTTGCG
121
0
0
1
0
0
0
0.04
0





CAAGCTCGTGTCTGTGGGTCCG
122
0
0
1
0
0
0
0.04
0





CAACACCAGTCGATGGGCTGTA
123
0
0
0
1
0
0
0
0.04





AGGCGGAGACTTGGGCAATT
124
0
0
0
1
0
0
0
0.04





ACTGCATTATGAGCACTTAAAGT
125
0
0
0
1
0
0
0
0.04





ACTGATTTCTTTTGGTGTTCA
126
0
0
0
1
0
0
0
0.04





ACTCGGCGTGGCGTCGGTCGTGG
127
0
0
0
1
0
0
0
0.04





ACCACTGACCGTTGACTGTAC
128
0
1
0
0
0
0.03
0
0





AACTGGCCTACAAAGTCCCAGT
129
0
0
1
0
0
0
0.04
0





TGTCTGAGCGTCGCT
130
0
0
4
0
0
0
0.16
0





GCCGGGTACTTTCGTATTTT
131
3
3
0
34
0.09
0.1
0
1.39





GCTAAGGAAGTCCTGTGCTCAGTTTT
132
0
0
1
19
0
0
0.04
0.78





CCCGGGTTTCGGCACCA
133
0
3
0
1
0
0.1
0
0.04





TCGGGCGGGAGTGGTGGCTTT
134
0
0
0
1
0
0
0
0.04





TAGAGGCACCGCCTGCCCA
135
0
1
0
0
0
0.03
0
0





CGGGGCGCGGCCTCGCTG
136
1
0
0
0
0.03
0
0
0





CCCACGGGGGTCTCCGGGCGAG
137
1
0
0
0
0.03
0
0
0





CCCACGGGAAACAGCA
138
0
1
0
0
0
0.03
0
0





CAGCCCGGCCTGGCTCCTCCAT
139
0
1
0
0
0
0.03
0
0





CACGGAAGGTGGCCCGG
140
0
1
0
0
0
0.03
0
0





CTGTAGGCACCTGAAA
141
1
0
0
148.06
0.03
0
0
6.07





CCCCCCACTGCTAAATTTGACTGGCTT
142
18
8
61
22
0.52
0.27
2.39
0.9





GCCCGCATCCTCCACCA
143
38
61
2
4
1.1
2.04
0.08
0.16





CCCGGCCAACGCACCA
144
28.76
36.71
4.12
4
0.83
1.23
0.16
0.16





ATCCCACTCCTGACACCA
145
7
13
11.31
3
0.2
0.44
0.44
0.12





CCGGGCGGAAACACCA
146
9
9
6
0
0.26
0.3
0.24
0





TGTCAGTTTGTTAATTA
147
1
1
3
16
0.03
0.03
0.12
0.66





AGGGTGTGCGTGTTTTT
148
0
0
0
20
0
0
0
0.82





TCGATTCCCGGCCCATGCACCA
149
1
2
10
4
0.03
0.07
0.39
0.16





GAGAGCGCTCGGTTTTT
150
0
0
1
9
0
0
0.04
0.37





TGGTGTGGTCTGTTGTTTT
151
0
0
0
9
0
0
0
0.37





TGTGCTCCGGAGTTACCTCGTTT
152
0
0
0
8
0
0
0
0.33





TCCCCGACACCTCCACCA
153
2
2
2
1
0.06
0.07
0.08
0.04





CTGTAGGCATCATCAAT
154
0
0
1
3.57
0
0
0.04
0.15





CCGAGCGTCCAAGCTCTTTCCATTTTT
155
0
0
0
5
0
0
0
0.2





TCCCCGCACCTCCACCA
156
0
2
1
1
0
0.07
0.04
0.04





TCCCCGGCACTTCCACCA
157
0
3
0
0
0
0.1
0
0





TCACCCCATAAACACCA
158
2
1
0
0
0.06
0.03
0
0





CTGTAGGCACCATCATAA
159
0
0
0
2.43
0
0
0
0.1





CCCACCAGAGTCGCCA
160
1
2
0
0
0.03
0.07
0
0





TTCCCCGACGGGGAGCCA
161
1
0
0
1
0.03
0
0
0.04





GGCGTGATTCATACCTTTT
162
0
0
0
2
0
0
0
0.08





GCGGGCGGACCTTTT
163
1
1
0
0
0.03
0.03
0
0





CGGCTCGAAGGACCA
164
0
2
0
0
0
0.07
0
0





CCCCGGCCCCGCGTA
165
0
2
0
0
0
0.07
0
0





CCCACCTCTGACACCA
166
0
1
1
0
0
0.03
0.04
0





CCACGAGGTCGGCCGG
167
0
2
0
0
0
0.07
0
0





CAGGATCGGCCCACT
168
2
0
0
0
0.06
0
0
0





ATGTGGTGGCTTACTTTT
169
0
0
0
2
0
0
0
0.08





ATCCCGGACGAGCCCA
170
0
2
0
0
0
0.07
0
0





ATCCCCAGCATCTCCACCA
171
0
0
2
0
0
0
0.08
0





AGAAAGGCCGAATTTTA
172
0
0
1
1
0
0
0.04
0.04





TGTCAGTTTTTACCCAA
173
0
0
0
1
0
0
0
0.04





TGTCAGTTTGAACCCAA
174
0
0
0
1
0
0
0
0.04





TGTAGTGTTTCTTACTTTA
175
1
0
0
0
0.03
0
0
0





TGGCGAAGGTCGGCCGCG
176
0
1
0
0
0
0.03
0
0





TGCAGGGCCGGCGGGGAGG
177
0
1
0
0
0
0.03
0
0





TCGGGCGGCGGGCGT
178
1
0
0
0
0.03
0
0
0





TCGGCTTTCCCTGCTAACTGGGCTTTTT
179
0
0
0
1
0
0
0
0.04





TCAGAGCGCGGGCCGACCCC
180
1
0
0
0
0.03
0
0
0





TCAACACCCACTCCCTC
181
0
1
0
0
0
0.03
0
0





TATCAATGATGCTTCTGAGA
182
0
0
0
1
0
0
0
0.04





TAACCCCAGGGTTGGTCA
183
0
1
0
0
0
0.03
0
0





GGGGTCCCCGGTAGA
184
0
1
0
0
0
0.03
0
0





GGGCGTGGGTGTGATGATTC
185
0
0
0
1
0
0
0
0.04





GGGAGGTGAGTAGGTCTG
186
0
1
0
0
0
0.03
0
0





GGAGACGTGGCCGAGAG
187
0
1
0
0
0
0.03
0
0





GCGGAATACCACGGGGA
188
0
1
0
0
0
0.03
0
0





GCAGGCGGGGGATTAGCTA
189
1
0
0
0
0.03
0
0
0





GCAGCGGAACGTCGGCGCGC
190
0
1
0
0
0
0.03
0
0





GACGTCACCCTCCTCA
191
0
1
0
0
0
0.03
0
0





CTTGGACTAACCTGGTGTA
192
0
0
1
0
0
0
0.04
0





CTGTAGGCCACCATCCA
193
0
0
0
1
0
0
0
0.04





CTGTAGGCACCACCA
194
1
0
0
0
0.03
0
0
0





CTGGTAGGCACCTGAAA
195
0
0
0
1
0
0
0
0.04





CTGATGTTGATGCATATGATGACA
196
0
0
0
1
0
0
0
0.04





CGGTGGAACCTGCATTGGTTT
197
0
0
1
0
0
0
0.04
0





CGGGGCCGGGGCTAGGGT
198
0
1
0
0
0
0.03
0
0





CGGGCCGCCCCCGCCCACCG
199
0
1
0
0
0
0.03
0
0





CGGGCCCCGGGGCTCG
200
0
1
0
0
0
0.03
0
0





CGGCCTATCCGGAATGCCCC
201
0
1
0
0
0
0.03
0
0





CGGACCTCCCTGGCCC
202
1
0
0
0
0.03
0
0
0





CGCGGCCAGTGTCCCCTTGTA
203
1
0
0
0
0.03
0
0
0





CGACACACGGCCCGTGGCGC
204
1
0
0
0
0.03
0
0
0





CCTCATAAATACCGG
205
0
1
0
0
0
0.03
0
0





CCTCACTGGGGGCTCCA
206
1
0
0
0
0.03
0
0
0





CCTCACCTGGAGCACCA
207
0
0
1
0
0
0
0.04
0





CCGTACTGGCCACCA
208
0
1
0
0
0
0.03
0
0





CCGCCGCCCCCCCCT
209
0
1
0
0
0
0.03
0
0





CCGCCCCGACCTTAGCTA
210
0
1
0
0
0
0.03
0
0





CCCGTCCGCTGCGCCA
211
0
1
0
0
0
0.03
0
0





CCCGTCCACTCCGCCA
212
0
1
0
0
0
0.03
0
0





CCCCGGCCCATGCACCA
213
0
1
0
0
0
0.03
0
0





CCCCGGCATCTCCATCA
214
1
0
0
0
0.03
0
0
0





CCCCAGTACCTCCACCA
215
0
1
0
0
0
0.03
0
0





CCCAGCGGTGCCTCCA
216
0
1
0
0
0
0.03
0
0





CCACGCTCTGCTACCA
217
1
0
0
0
0.03
0
0
0





CCACCCTGGAGCCTCCGT
218
0
1
0
0
0
0.03
0
0





ATGGTAGGCACCTGAAA
219
0
0
0
1
0
0
0
0.04





ATGGGCGGTCCTCGTT
220
0
1
0
0
0
0.03
0
0





ATGGCCTGGACCCCACTCCT
221
0
0
0
1
0
0
0
0.04





ATGGCCGCATATATTTT
222
0
0
0
1
0
0
0
0.04





ATCCTGTTCGTGACGCCA
223
0
0
1
0
0
0
0.04
0





ATCCTGCTCACAGCCCCA
224
0
1
0
0
0
0.03
0
0





AGCGAGGGTTCCGCCGGCC
225
0
0
0
1
0
0
0
0.04





ACTGGGGAGGGGGAGGAGCCTCGAGG
226
0
0
0
1
0
0
0
0.04





ACCCCGAGGGGACGGGCG
227
0
1
0
0
0
0.03
0
0





ACAGCGCTGTGTTCCCGT
228
0
0
1
0
0
0
0.04
0





ACAAAAAAAAAAGCCCAACCCT
229
0
0
0
1
0
0
0
0.04





AACTAAAACCCCTACGCA
230
0
0
1
0
0
0
0.04
0





AAAGGAGCCGAATCTTT
231
0
0
1
0
0
0
0.04
0





CCCACCCAGGGACGCCA
232
223
218
6
2
6.46
7.3
0.24
0.08





TCCCCGGCACCTCCACCA
233
60.47
101.82
40.28
34
1.75
3.41
1.58
1.39





ATCCCGGACGAGCCCCCA
234
48
60
80
45
1.39
2.01
3.14
1.84





CCCACGTTGGGCGCCA
235
37
50
1
0
1.07
1.68
0.04
0





TCGATTCCCGGCCAATGCACCA
236
2.24
15.29
35.88
4
0.06
0.51
1.41
0.16





ATCCCACTTCTGACACCA
237
11
9
26.69
14
0.32
0.3
1.05
0.57





TCGTAGGCACCTGAAA
238
0
0
0
7.94
0
0
0
0.33





TCCCCGTACGGGCCACCA
239
11
6
3
2
0.32
0.2
0.12
0.08





TTGACCGCTCTGACCA
240
4
9
2
5
0.12
0.3
0.08
0.2





CCCAGCGGGGCCTCCA
241
11
8
1
0
0.32
0.27
0.04
0





CAGGAACGGTGCACCA
242
6
10
2
0
0.17
0.34
0.08
0





AGTCCCATCTGGGTCGCCA
243
4
2
3
6
0.12
0.07
0.12
0.25





CCCCCCACTGCTAAATTTGACTGGA
244
1
1
6
2
0.03
0.03
0.24
0.08





GTTTGTTAATTAACCCAA
245
0
0
1
5
0
0
0.04
0.2





GTCCCTTCGTGGTCGCCA
246
1
2
1
2
0.03
0.07
0.04
0.08





CTGTAGCACCTGAAA
247
0
0
0
6
0
0
0
0.25





TCCTCACACGGGGCACCA
248
2
1
2
0
0.06
0.03
0.08
0





TAACGGCCGCGGTACCC
249
0
3
1
0
0
0.1
0.04
0





GAGGGGGACCAAAAAAAA
250
0
0
0
4
0
0
0
0.16





CCCGCATTCTCCACCA
251
3
0
1
0
0.09
0
0.04
0





AGGGGGGTAAAAAAAAA
252
0
0
0
4
0
0
0
0.16





TCCACCGCTGCCACCA
253
0
3
0
0
0
0.1
0
0





GAGCCATGATGATACCACTGAGC
254
0
1
0
2
0
0.03
0
0.08





CGTCCATGATGTTCCGCAA
255
1
0
2
0
0.03
0
0.08
0





CATCCTCTGCTACCA
256
3
0
0
0
0.09
0
0
0





AGAACACTACGAGCCACA
257
3
0
0
0
0.09
0
0
0





ACCCCACTTCTGGTACCA
258
0
0
1
2
0
0
0.04
0.08





TGTATTGTGAGACATTC
259
0
1
1
0
0
0.03
0.04
0





TCTCGGTGGAACCTCCA
260
0
0
1
1
0
0
0.04
0.04





TCCCCGGCACCTCCAA
261
0
1.01
0
0
0
0.03
0
0





TACCGAGCCTGGTGATAGC
262
0
1
1
0
0
0.03
0.04
0





GCAGCGCCAGCCTCCCGCCCTAC
263
2
0
0
0
0.06
0
0
0





CCGCCTGGGGAGTAC
264
0
2
0
0
0
0.07
0
0





ATCCCCAGCACCTCCACCA
265
0
0
0
2
0
0
0
0.08





AGCAGTGATGTCCTGAAAATTCTGAAG
266
0
0
0
2
0
0
0
0.08





ACCCCACTATGCTTAGCCCT
267
0
0
2
0
0
0
0.08
0





AAAGGACCTGGCGGTGCTTC
268
1
0
1
0
0.03
0
0.04
0





TTTGCCACACTGCAACACCTT
269
0
0
1
0
0
0
0.04
0





TTCCTTGGATGTCTGAGTGAC
270
0
0
1
0
0
0
0.04
0





TTAAACCACCAAGATCGCTGATGCAC
271
0
0
1
0
0
0
0.04
0





TGTTCGCCGACCGTTGA
272
0
0
1
0
0
0
0.04
0





TGGGGTCTGGGAGGGA
273
0
1
0
0
0
0.03
0
0





TGGGAGAGCAGGGTATTGT
274
1
0
0
0
0.03
0
0
0





TGCAGATGATGTAAAAGA
275
0
0
1
0
0
0
0.04
0





TCGCTATGATGATGGATTCCAAAA
276
0
0
1
0
0
0
0.04
0





TCCGAAAGGCCTCCCGCACCG
277
0
0
1
0
0
0
0.04
0





TCCCGCACCTCCACCA
278
0
0
1
0
0
0
0.04
0





TAGATGAATAGGTAAAGAG
279
0
0
1
0
0
0
0.04
0





GTGTATGATGACCTCATGTAGCCTGAAC
280
0
0
0
1
0
0
0
0.04





GTGAAGCGTTCCATATTTTT
281
0
0
1
0
0
0
0.04
0





GGGGGGGGGTTTGGAA
282
0
0
0
1
0
0
0
0.04





GGGGGGAGGGAAGGCAA
283
0
0
1
0
0
0
0.04
0





GGGGGCTGGGCTGGGTA
284
0
1
0
0
0
0.03
0
0





GGGGCCGCCGCCTGTGT
285
1
0
0
0
0.03
0
0
0





GGGAGTCCGCGGCGAGC
286
0
0
0
1
0
0
0
0.04





GGGACCTGGGGACCA
287
1
0
0
0
0.03
0
0
0





GGCTTGGTCTAGGGGTA
288
0
0
0
1
0
0
0
0.04





GGCTGGGACCCTGGACAC
289
0
0
0
1
0
0
0
0.04





GGCGACCTGCGACTCCTT
290
0
1
0
0
0
0.03
0
0





GGAGGGGGGAAACAAA
291
0
0
0
1
0
0
0
0.04





GGAGGGGGGAAAAAAAAAA
292
0
0
0
1
0
0
0
0.04





GGAAGACCTGCACCACTGTC
293
0
1
0
0
0
0.03
0
0





GCGGGTGTCAGGCCT
294
1
0
0
0
0.03
0
0
0





GCCGGGCGTGGTGGTCTG
295
0
1
0
0
0
0.03
0
0





GCCGCCGAGACCCCAGGACCC
296
1
0
0
0
0.03
0
0
0





GCAGCCGTGCTTTTA
297
0
1
0
0
0
0.03
0
0





GCAAATGATGCCCTCTGATC
298
0
0
0
1
0
0
0
0.04





GAGGGGGGTCAAAAAAA
299
0
0
0
1
0
0
0
0.04





CTTGATGATGAGCAGGATCTGAGT
300
0
0
0
1
0
0
0
0.04





CTGTAGGCACTGAAA
301
0
0
0
1
0
0
0
0.04





CTGTAGGCACCATTAA
302
0
0
0
1
0
0
0
0.04





CTGCTTAAGTCCTGACCAG
303
0
0
1
0
0
0
0.04
0





CTGAGCACCTTTCCCTTCC
304
0
1
0
0
0
0.03
0
0





CTAGCCCCAAACCCA
305
1
0
0
0
0.03
0
0
0





CGGTCACACGATTAACCCA
306
0
0
1
0
0
0
0.04
0





CGGGGGGAGGAAAAAA
307
0
0
0
1
0
0
0
0.04





CGGGGGGAAAAAAAAA
308
0
0
0
1
0
0
0
0.04





CGGGGCCGCACGCGC
309
1
0
0
0
0.03
0
0
0





CGGGAGTGGGGTGGCGCCCAG
310
1
0
0
0
0.03
0
0
0





CGGGAGCCCCGGGTT
311
1
0
0
0
0.03
0
0
0





CGGACCTGATAAATTCCCAC
312
0
0
1
0
0
0
0.04
0





CGCGGCTCTTGCGGT
313
1
0
0
0
0.03
0
0
0





CGCCTGAGTCAGAAC
314
1
0
0
0
0.03
0
0
0





CGCCGCCGCCCCCCCC
315
0
1
0
0
0
0.03
0
0





CCTTCCTTGGATGTCTGAGTGAG
316
0
0
1
0
0
0
0.04
0





CCTCGCTGGGGCCTCCA
317
1
0
0
0
0.03
0
0
0





CCTCACAGGGACGCCA
318
1
0
0
0
0.03
0
0
0





CCTAGGAGTGCGACAATT
319
0
0
0
1
0
0
0
0.04





CCGCTCTGAGACCTA
320
0
1
0
0
0
0.03
0
0





CCGCCCGTCACCCTCCTCAAGTA
321
0
0
1
0
0
0
0.04
0





CCCGGGCGGCACACCA
322
0
1
0
0
0
0.03
0
0





CCCGCGGGCTTGCTGGGCGTCCC
323
0
1
0
0
0
0.03
0
0





CCCCTGCGATTTCCCCA
324
0
1
0
0
0
0.03
0
0





CCCCGGCATCTCCACTA
325
1
0
0
0
0.03
0
0
0





CCCCAGTGAGTGCCCTCTTCC
326
0
1
0
0
0
0.03
0
0





CCCAGAGACGCCGTCCTCGA
327
1
0
0
0
0.03
0
0
0





CCCACCGAGGATGCCA
328
1
0
0
0
0.03
0
0
0





CCATCACTACCCACCA
329
0
1
0
0
0
0.03
0
0





CCACTCCAGCCTAGCCCC
330
0
1
0
0
0
0.03
0
0





CAGTACAGGCACACCTC
331
0
1
0
0
0
0.03
0
0





CACGTCGGGGTCCCCA
332
1
0
0
0
0.03
0
0
0





CACGATTAACCCAAGTC
333
0
0
1
0
0
0
0.04
0





CACCACACCCGGGCCA
334
1
0
0
0
0.03
0
0
0





CAACACAGGCATGCT
335
0
1
0
0
0
0.03
0
0





ATAGGGTTTACGACCTCGATGTTGGATCA
336
0
0
1
0
0
0
0.04
0





ATACCATGATGAACAATAGCTGAGA
337
0
0
0
1
0
0
0
0.04





AGGGTTCAGCTGTCTC
338
1
0
0
0
0.03
0
0
0





AGGCTGTGATGGACCTGGCTGAGCCTG
339
0
0
0
1
0
0
0
0.04





AGAGAGTAGGGGGAGGT
340
0
1
0
0
0
0.03
0
0





ACTGTCCCTGTCTACTA
341
0
0
0
1
0
0
0
0.04





ACCGCATCTGGCCTATTTTT
342
0
0
0
1
0
0
0
0.04





ACCAGACCTCCTGTGCGAAG
343
0
0
0
1
0
0
0
0.04





ACAGCCCGGATCCCAGCCCACTTA
344
0
0
0
1
0
0
0
0.04





ACACTGAGCCACAACCCA
345
0
0
1
0
0
0
0.04
0





AAGGGCTTGGCTTAATTA
346
0
0
1
0
0
0
0.04
0





AACCCGGAAGGCGGAGGTTGCGG
347
0
0
0
1
0
0
0
0.04





AACCCCACACCAACC
348
0
0
0
1
0
0
0
0.04





AACAAGCTTCTTTGACGTCCCATCCAC
349
0
0
0
1
0
0
0
0.04





TCCCCGGCATCTCCACCA
350
116.53
275.18
104.72
59
3.38
9.22
4.11
2.42





CTGATTGCTCCTGTCTGATT
351
0
0
6
1
0
0
0.24
0.04





TCTAGAGGAGCCTGTTCTGTA
352
0
1
3
0
0
0.03
0.12
0





TCGATTCCCGGTCAGGGAACCA
353
0
0
0
4
0
0
0
0.16





ATAGGTTTGGTCCTAGCCTTTCT
354
0
0
3
1
0
0
0.12
0.04





CGTTCGCGCTTTCCCCTG
355
0
1
2
0
0
0.03
0.08
0





TAAGTGTTTGTGGGTTA
356
1
1
0
0
0.03
0.03
0
0





GGCGGCGGGAGACCCA
357
1
1
0
0
0.03
0.03
0
0





CCCCGGCAGGTTTGA
358
0
2
0
0
0
0.07
0
0





TGCCGTGATCGTATAGTGGTTA
359
0
0
1
0
0
0
0.04
0





TCAGACTACTCTCCTCCGCCCATT
360
0
0
1
0
0
0
0.04
0





GGACACAGAGGCTTCG
361
0
1
0
0
0
0.03
0
0





CTGACAGCCGGGGTTTTGGA
362
0
0
0
1
0
0
0
0.04





CGGCGGGGCCTGGAGTCTG
363
1
0
0
0
0.03
0
0
0





CCTGGCTCGCTGCGCCA
364
1
0
0
0
0.03
0
0
0





CCGCCCCACCCCGCGCGC
365
0
1
0
0
0
0.03
0
0





CCCGAACGCTGCCAACCC
366
0
1
0
0
0
0.03
0
0





AGACCCGCGGGCGCTCTCCAGTC
367
0
0
1
0
0
0
0.04
0





CCCCCACAACCGCGCTTGACTAGC
368
12
11
7
9
0.35
0.37
0.27
0.37





CCCTGCTCGCTGCGCCA
369
7
20
5
1
0.2
0.67
0.2
0.04





CTCCCACTGCTTCACTTGACTAGC
370
2
2
18
9
0.06
0.07
0.71
0.37





CCCATCCTCGTCGCCA
371
16
11
1
1
0.46
0.37
0.04
0.04





TCACGTCGGGGTCACCA
372
16
4
5
1
0.46
0.13
0.2
0.04





TCCCTGGTGGTCTAGTGGTTAGGATTCG
373
0
1
10
6
0
0.03
0.39
0.25





GGTAGCGTGGCCGAG
374
10
6
0
0
0.29
0.2
0
0





TCCTGCCGCGGTCGCCA
375
6
8
0
1
0.17
0.27
0
0.04





GGAGAGAACGCGGTCTGAGTGGT
376
3
7
1
0
0.09
0.23
0.04
0





TCGGGTGCGAGAGGTCCCGGGT
377
0
0
0
10
0
0
0
0.41





GGCTGGTCCGATGGTAGTGGGTT
378
4
3
3
0
0.12
0.1
0.12
0





CGGAAGCGTGCTGGGCCC
379
1
5
0
4
0.03
0.17
0
0.16





CCCGGGCGGCGCACCA
380
5
4
0
0
0.14
0.13
0
0





GCGGGTGATGCGAACTGGAGTCTGAGC
381
0
0
6
1
0
0
0.24
0.04





GACGAGGTGGCCGAGTGG
382
2
3
2
0
0.06
0.1
0.08
0





GGGGGTGTAGCTCAG
383
4
2
0
0
0.12
0.07
0
0





CTCCTGGCTGGCTCGCCA
384
0
0
3
3
0
0
0.12
0.12





CGGGAGGCCCGGGTT
385
3
3
0
0
0.09
0.1
0
0





GGCTGGTCCGAGTGCAGTGGTGTTTA
386
0
1
4
0
0
0.03
0.16
0





CTGCTGTGATGACATTC
387
1
2
2
0
0.03
0.07
0.08
0





TGTCACGCGGGAGACC
388
0.5
0
1
1.4
0.01
0
0.04
0.06





TCCTCGTTAGTATAGTGGTGAGTATCCC
389
0
1
3
0
0
0.03
0.12
0





GGCCGGTTAGCTCAG
390
2
2
0
0
0.06
0.07
0
0





GCATTGGTGGTTCAGTGGTAGA
391
0
0
3
1
0
0
0.12
0.04





CTGTCACGCGGGAGA
392
0.5
0
0
2.6
0.01
0
0
0.11





CTACGGGGATGATTTT
393
3
1
0
0
0.09
0.03
0
0





CCAGGGGCTGAGGGCA
394
1
3
0
0
0.03
0.1
0
0





TTCTCACTACTGCACTTGACTA
395
0
0
2
1
0
0
0.08
0.04





TGGTTATCACGTTCGCC
396
0
2
0
1
0
0.07
0
0.04





TAGGGGTATGATTCTCGCT
397
1
0
0
2
0.03
0
0
0.08





CCACGAGGAAGAGAGGTAGC
398
2
1
0
0
0.06
0.03
0
0





GTCAGGATGGCCGAGCGGTCT
399
0
1
1
0
0
0.03
0.04
0





GGGGATGTAGCTCAG
400
1
1
0
0
0.03
0.03
0
0





GCAGCGATGGCCGAG
401
0
2
0
0
0
0.07
0
0
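
The frequency columns of Table 1B appear to be the corrected counts expressed as a percentage of all reads in the corresponding library. This section states neither the formula nor the library totals, so the Python sketch below treats both as assumptions: the function name `frequency_percent` is hypothetical, and the naïve-library total of 3452 is back-calculated from the first row (1329 reads, 38.5) rather than taken from the source.

```python
def frequency_percent(corrected_count, library_total):
    """Return a count as a percentage of the library total, to 2 decimals.

    Assumption: Table 1B frequencies are percentages of each library's
    total corrected count; the source does not state this explicitly.
    """
    if library_total <= 0:
        raise ValueError("library_total must be positive")
    return round(100.0 * corrected_count / library_total, 2)


# Hypothetical total for the naive library, back-calculated from SEQ ID NO. 1.
ASSUMED_NAIVE_TOTAL = 3452

# Corrected counts for the naive library, taken from Table 1B.
naive_counts = {1: 1329, 2: 196, 5: 131}

naive_frequencies = {
    seq_id: frequency_percent(count, ASSUMED_NAIVE_TOTAL)
    for seq_id, count in naive_counts.items()
}
print(naive_frequencies)
```

Under this assumed total, the computed values agree with the naïve frequencies tabulated for SEQ ID NOs. 1, 2, and 5, which is consistent with (but does not prove) the percentage interpretation.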









Three hundred and thirty-five sequences aligned to genomic regions that did not fulfill the criteria for miRNA precursors (FIG. 11). About 30% of these non-miRNA sequences were annotated and may represent degradation products originating from other RNA species (FIG. 11 and Table 2).









TABLE 2

Characterization of short-RNA libraries. Number of non-redundant short-RNAs cloned in each library (naïve, memory, and centroblast B cells, and Ramos cell line) and overall (Total).

RNA species             Naïve  Memory  Centroblasts  Ramos  Total
Total (non-redundant)     683     710           744    765   2115
miRNA                     498     485           584    590   1453
miRNA other*                5       3             7      4     19
tRNA                       27      33            32     29    108
rRNA                       61      99            34     16    174
mRNA                       76      72            25     34    176
yRNA                       11      11            31     21     53
piRNA                      46      54            70     62    148
Repeats                     1       1             0      1      2
Mitochondrial genome       12      36            54     11    101
Human viruses               1       4             0      0      5
E. coli                     5       4             0      0      7
Not Annotated              66      64            74    113    262

*miRNA other includes fragments of miRNA precursors, not mature






Each short-RNA is annotated according to the listed RNA species. Results shown in Table 2 refer only to short-RNAs with good-quality matches to the human genome. The same short-RNA may match multiple databases, and therefore the overall sum does not correspond to the total number of short-RNAs. The databases used in the analysis depicted in Table 2 are detailed in the Supplementary Methods section.
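
Because one short-RNA can match several databases, per-database tallies can add up to more than the number of sequences. The Python sketch below illustrates this with three annotation fields in the semicolon-separated `database-action` form used in the annotation columns of the tables above. The helper name `parse_annotations` is hypothetical, and reading the suffix after the last hyphen (`annotate` or `eliminate`) as an action label is an assumption, since this section does not define it.

```python
from collections import Counter


def parse_annotations(field):
    """Split 'db-action;db-action' into a list of (database, action) pairs.

    Assumption: the text after the last hyphen is an action label.
    """
    pairs = []
    for token in field.split(";"):
        token = token.strip()
        if not token:
            continue  # tolerate a trailing semicolon
        database, _, action = token.rpartition("-")
        pairs.append((database, action))
    return pairs


# Three annotation fields in the format used by the tables above.
records = {
    "CU-1235": "refseqGeneIntron-annotate",
    "CU-1253": "mRNAall-annotate;refseqGeneIntron-annotate;rnaGene-annotate",
    "CU-1291": "piRNA-annotate;refseqGeneIntron-annotate",
}

per_database = Counter()
for field in records.values():
    for database, _action in parse_annotations(field):
        per_database[database] += 1

# Number of sequences versus the sum of per-database matches.
print(len(records), sum(per_database.values()))
```

Here three sequences yield six database matches, which mirrors why the rows of Table 2 need not sum to its first row.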


The remaining 236 sequences, however, mapped to genomic regions that lack annotations and may therefore represent a part of the transcriptome whose functions are unknown (Table 3A-3B and Table 4A-4B).
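
As a quick consistency check on the figures quoted above (335 non-miRNA sequences, of which 236 lack annotations and about 30% are annotated), the arithmetic can be written out in Python. The variable names are illustrative only.

```python
# Figures quoted in the text.
non_mirna_total = 335   # sequences not meeting miRNA precursor criteria
unannotated = 236       # sequences mapping to regions without annotations

# Sequences with at least one annotation, and their share of the total.
annotated = non_mirna_total - unannotated
annotated_fraction = annotated / non_mirna_total

print(f"{annotated} annotated sequences ({annotated_fraction:.1%} of {non_mirna_total})")
```

The difference of 99 sequences is just under 30% of 335, in line with the "about 30%" stated in the text.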









TABLE 3A







List of short-RNAs lacking genomic locations with appropriate RNA secondary structures to be defined as miRNAs.











ID
SEQ ID NO:
Short-RNA sequence
Annotations













CU-5016
402
AATGACACGATCACTCCCGTTGAG
Mature:hsa-miR-425:MIMAT0003393





CU-5019
403
GGAGGGGGGGTAAAAAAAA
NEW





CU-5020
404
CCCCGGCATCTCCACC
NEW





CU-5004
405
GAAGCGGGTGCTCTTATTTT
NEW





CU-5021
406
ACCGGGCGGAAACACCA
NEW





CU-5022
407
TCCCGGGTTCAAATCCCGGACGAGCCCCCA
NEW





CU-5008
408
GTGTAAGCAGGGTCGTTTT
NEW





CU-6003
409
ATCCCACCGCTGCTACCA
NEW





CU-5023
410
GGGAAGGTGACCTGAC
NEW





CU-5007
411
CTCCCGCCTTTTTTCCC
NEW





CU-5024
412
CGGAGCAAGAGCGT
NEW





CU-5025
413
CCCCGTACTGGCCACCA
NEW





CU-5026
414
CCCCCGGCACCATCAATA
NEW





CU-5027
415
CAGCCTAGCCCCTACCC
NEW





CU-5005
416
CAGAAGGTCTCACTTTT
NEW





CU-5006
417
AGTATTCTCTGTGGCTTT
NEW





CU-5028
418
TGGAGTGACTATATGGATGCCCCC
NEW





CU-5029
419
TCTGATAGCTTACTTT
NEW





CU-5030
420
TCGAGCCCCAGTGGAACCAC
NEW





CU-5031
421
TCGAATCCTGTTCGTGACGCCA
NEW





CU-5032
422
TCCTCCCCACACTCATCGCCCTTACCA
NEW





CU-5033
423
TATACTACAAGGACACCA
NEW





CU-5034
424
TAGTGGGTGAAAAAAAAAAAA
NEW





CU-5035
425
TACCACACATTCGAAGAACCCGTA
NEW





CU-5036
426
TACAAAACCCACCCCATTCCTCCCCA
NEW





CU-5037
427
GCCCTCCTAATGACCTCC
NEW





CU-5038
428
CTTCCCTCTACACTTATCATC
NEW





CU-5039
429
CGGGCGGCCTGCGCTCTCA
NEW





CU-5040
430
CCCGAGGCCGTGTGCAAATGCAT
NEW





CU-5041
431
CCCCCAGTACCTCCACCA
NEW





CU-5042
432
CCCCCACTGCTAAACTTGACTGGCTTT
NEW





CU-5043
433
COCACTOCACOTTACTACCA
NEW





CU-5044
434
CCCAAGAACAGGGTGACCA
NEW





CU-5045
435
CCAGTCGCGGCCAAATCA
NEW





CU-5046
436
CCAGCTTCACCAAGGTATTGGTTA
NEW





CU-5047
437
CCAGAAAAAACAGGCCTC
NEW





CU-5048
438
CATCATAATCGGAGGCTTTGGCAAC
NEW





CU-5049
439
CAGCAGGGGTAATAAGTGAAATCAAA
NEW





CU-5050
440
CAATGGTGCAGCCGCTATTAAAGGTTCA
NEW





CU-5051
441
CAACTCCTACATACTTCCCCC
NEW





CU-5052
442
ATTCAAAAAAGAGTACCA
NEW





CU-5053
443
ATGCATCTCATATGCGAATAGGAATGC
NEW





CU-5054
444
ATCCCACTTCTGTACCA
NEW





CU-5055
445
ATAACACTAGAAAGTTGGGGCAGATTGC
NEW





CU-5056
446
ACGTGGGCACATTACCCGTCTGACCTGA
NEW





CU-5057
447
ACCCCTTATTAACCCA
NEW





CU-5058
448
ACAAGGCACACCTACACCCCTTATCCC
NEW





CU-5059
449
AAAAGACACCCCCCCACCA
NEW





CU-5060
450
AAAACCCCTACGCATTTATAT
NEW





CU-5061
451
AAAAAGACACCCCCCACCA
NEW





CU-5003
452
ACCCCACTCCTGGTACCA
refseqGeneIntron-annotate





CU-5009
453
TGCCCCCATGTCTAACAACATGGCTA
refseqGeneIntron-annotate;rnaGene-annotate





CU-5013
454
GGCCGGTGATGAGAACT
mRNAall-annotate;refseqGeneIntron-annotate;wgRNA-annotate;snoRNA-annotate





CU-5062
455
CCCCGCCTGTTTACC
refseqGeneIntron-annotate





CU-5063
456
CCCACTTCTGACACCA
computGene-annotate;refseqGeneIntron-annotate;exEID-annotate





CU-5064
457
CACCACCTCTTGCTCAGCC
mRNA-annotate;refseqGeneIntron-annotate





CU-5014
458
CTGGAAAGTGCACTTGGACGAACA
refseqGeneIntron-annotate





CU-5065
459
TGACCGCTCTGACCAC
refseqGeneIntron-annotate





CU-5066
460
TGAAGTCCCTTTGCTTTGTT
refseqGeneIntron-annotate





CU-5067
461
TGAACACACAATAGCTAAGACCC
mRNA-annotate;refseqGeneIntron-annotate





CU-5068
462
TCGCCTTACCCCCCACTA
refseqGeneIntron-annotate





CU-5069
463
TCGATAAACCCCGATCAACCT
mRNA-annotate;refseqGeneIntron-annotate





CU-5070
464
TCCCCGTCACCTCCACCA
refseqGeneIntron-annotate





CU-5071
465
TCCCCGGCACTCCACCA
refseqGeneIntron-annotate





CU-5072
466
TCCCCCCGCTGCCACCA
refseqGeneIntron-annotate





CU-5073
467
TCCCCCCCATCTCCACCA
refseqGeneIntron-annotate





CU-5074
468
TACACACCGCCCGTCACCC
mRNA-annotate;refseqGeneIntron-annotate





CU-5075
469
GGCCGGTGATGAGAACTTCTCCC
mRNAall-annotate;refseqGeneIntron-annotate;wgRNA-annotate;snoRNA-annotate





CU-5076
470
GCTTAGCCTAGCCACACCCCCACG
mRNA-annotate;refseqGeneIntron-annotate





CU-5077
471
GCTCGCCAGAACACTACGA
mRNA-annotate;refseqGeneIntron-annotate





CU-5078
472
GCCGGGGGGCGGGCGCA
refseqGeneIntron-annotate





CU-5079
473
GAACCGGGCGGGAACACCA
refseqGeneIntron-annotate





CU-5080
474
CGCCGCAGTACTGATCATTC
refseqGeneIntron-annotate





CU-5081
475
CCGCACCAATAGGATCCTCC
refseqGeneIntron-annotate





CU-5082
476
CCCGGCCGACGCACCA
refseqGeneIntron-annotate





CU-5083
477
CCACCCCATCATACTCTTTC
refseqGeneIntron-annotate





CU-5084
478
CACCCCCCAGCTCCTCCTTT
refseqGeneIntron-annotate





CU-5085
479
ATAAGTAACATGAAAACATTCTCCTC
refseqGeneIntron-annotate





CU-5086
480
ACTGCTCGCCAGAACAC
mRNA-annotate;refseqGeneIntron-annotate





CU-5087
481
ACCCTGGTGTGGGATCTGCCCGATC
refseqGeneIntron-annotate





CU-5088
482
AACCTCACCACCTCTTTCT
refseqGeneIntron-annotate





CU-5089
483
AAAAGACACCCCCCACACCA
refseqGeneIntron-annotate





CU-5011
484
GCTAAACCTAGCCCCAAACCC
piRNA-annotate





CU-5010
485
GGCCGTGATCGTATA
piRNA-annotate





CU-5090
486
TGGGATGCGAGAGGTCCCGGGT
rnaGene-annotate





CU-5091
487
CTGAACTCCTCACACCC
piRNA-annotate





CU-5092
488
ATTAATCCCCTGGCCCAACCCG
computGene-annotate





CU-5093
489
AGCCCCAAACCCACTCCAC
piRNA-annotate





CU-5094
490
CGCGACCTCAGATCAGAC
rRNA-eliminate;piRNA-annotate;refseqGeneIntron-annotate





CU-5015
491
TCAAGTGATGTCATCTTACTACTGAGA
mRNAall-annotate;snoRNA-annotate;snoRNA-eliminate;wgRNA-annotate;rnaGene-annotate





CU-5095
492
TTGGGTGCGAGAGGTCCCGGGT
tRNAcomputational-annotate;tRNA-eliminate;HStRNA-eliminate;rnaGene-annotate





CU-5096
493
TCTCGGTGGGACCTCCA
refseqGeneExon-eliminate





CU-5097
494
CCGCCCCCCGTTCCCCC
rRNA-eliminate





CU-5098
495
CCCACTGCTAAATTTGACTGGCTT
mRNAall-annotate;yRNA-eliminate;refseqGeneIntron-annotate;rnaGene-annotate





CU-5099
496
ACAGACCAAGAGCCTTC
tRNA-eliminate;rnaGene-annotate





CU-5100
497
TGTAGTAGTCAATTAATGGATATTA
refseqGeneExon-eliminate





CU-5101
498
TGGTTATCACGTTCGCCTCACACGCGA
tRNAcomputational-annotate;tRNA-eliminate;HStRNA-eliminate;rnaGene-annotate





CU-5102
499
TGGGAATACCGGGTG
rRNA-eliminate;rnaGene-annotate;piRNA-annotate;refseqGeneIntron-annotate





CU-5103
500
TGGCGGCCAAGCGTTCATAGCGACGTC
rRNA-eliminate;refseqGeneIntron-annotate;rnaGene-annotate





CU-5104
501
TCGTCATCCAGCTAAGGGCTCAGA
mRNAall-annotate;refseqGeneExon-eliminate;exEID-annotate





CU-5105
502
TCGCCTGCCACGCGGGAGGCCCGGGT
rnaGene-annotate;tRNAcomputational-annotate;tRNA-eliminate;refseqGeneIntron-annotate;mRNA-annotate;HStRNA-eliminate





CU-5106
503
TCCCACTGCTTCACTTGA
yRNA-eliminate;refseqGeneIntron-annotate;rnaGene-annotate





CU-5107
504
GTTTAGACGGGCTCACATCACCCCA
tRNA-eliminate;piRNA-annotate;refseqGeneIntron-annotate





CU-5108
505
GCTAACTCATGCCCCCATGTC
tRNA-eliminate;refseqGeneIntron-annotate;rnaGene-annotate





CU-5109
506
GACTGTGGTGGTTGAATATA
mRNAall-annotate;computGene-annotate;refseqGeneExon-eliminate;exEID-annotate





CU-5110
507
CGCGACCTCAGATCAGACGTGGCGACC
rRNA-eliminate;piRNA-annotate;refseqGeneIntron-annotate





CU-5111
508
CGCCGCCGCCCCCCC
mRNAall-annotate;refseqGeneExon-eliminate;refseqGeneIntron-annotate;exEID-annotate





CU-5112
509
CGCCCGACTACCACCACATCCA
mRNAall-annotate;computGene-annotate;refseqGeneExon-eliminate;exEID-annotate





CU-5113
510
CCCCCCTCCACGCGCCC
rRNA-eliminate;refseqGeneIntron-annotate





CU-5114
511
CCCCACCCCGCGCCCTC
rRNA-eliminate;refseqGeneIntron-annotate





CU-5115
512
CAGAGTGTAGCTTAACACAAAGCACCCAA
tRNA-eliminate;piRNA-annotate;rnaGene-annotate





CU-5116
513
CAATCTTGGCATGTTGGTCTGGTCACCCA
mRNAall-annotate;refseqGeneExon-eliminate;exEID-annotate





CU-5117
514
CAAAGCATCGCGAAGGCCC
mRNAall-annotate;rRNA-eliminate;piRNA-annotate;rnaGene-annotate





CU-5118
515
AACACCCTGATTGCTCCTGTCTGAT
mRNAall-annotate;exEID-annotate;snoRNA-annotate;refseqGeneExon-eliminate;rnaGene-annotate;snoRNA-eliminate;wgRNA-annotate





CU-5119
516
AAAAAGGGCCTAAAGAAGATGCA
mRNAall-annotate;computGene-annotate;refseqGeneExon-eliminate;refseqGeneIntron-annotate;exEID-annotate
















TABLE 3B







List of short-RNAs lacking genomic locations with appropriate RNA secondary structures to be defined as miRNAs, including information on frequencies.










SEQ ID NO:
Short-RNA sequence
Corrected counts, Naïve (N)
Corrected counts, Memory (M)
Corrected counts, Centroblasts (CB)
Corrected counts, Ramos (RA)
Frequency, Naïve (N)
Frequency, Memory (M)
Frequency, Centroblasts (CB)
Frequency, Ramos (RA)



















525
AATGACACGATCACTCCCGTTGAG
0
0
7
0
0
0
3.98
0













526
GGAGGGGGGGTAAAAAAAA
0
0
1
0
0
0
0.57
0





527
CCCCGGCATCTCCACC
1
0
0
0
1.72
0
0
0





528
GAAGCGGGTGCTCTTATTTT
5
23
25
224
8.62
20.35
14.2
65.31





529
ACCGGGCGGAAACACCA
9
14
60
20
15.52
12.39
34.09
5.83





530
TCCCGGGTTCAAATCCCGGACGAGCCCCCA
0
0
4
37
0
0
2.27
10.79













531
GTGTAAGCAGGGTCGTTTT
0
0
0
7
0
0
0
2.04





532
ATCCCACCGCTGCTACCA
0
1
0
2
0
0.88
0
0.58





533
GGGAAGGTGACCTGAC
2
0
0
0
3.45
0
0
0





534
CTCCCGCCTTTTTTCCC
0
2
0
0
0
1.77
0
0





535
CGGAGCAAGAGCGT
2
0
0
0
3.45
0
0
0





536
CCCCGTACTGGCCACCA
2
0
0
0
3.45
0
0
0





537
CCCCCGGCACCATCAATA
0
0
1
1
0
0
0.57
0.29





538
CAGCCTAGCCCCTACCC
0
2
0
0
0
1.77
0
0





539
CAGAAGGTCTCACTTTT
0
1
0
1
0
0.88
0
0.29





540
AGTATTCTCTGTGGCTTT
0
0
0
2
0
0
0
0.58





541
TGGAGTGACTATATGGATGCCCCC
0
0
1
0
0
0
0.57
0













542
TCTGATAGCTTACTTT
0
1
0
0
0
0.88
0
0





543
TCGAGCCCCAGTGGAACCAC
0
0
1
0
0
0
0.57
0





544
TCGAATCCTGTTCGTGACGCCA
0
0
0
1
0
0
0
0.29





545
TCCTCCCCACACTCATCGCCCTTACCA
0
0
1
0
0
0
0.57
0













546
TATACTACAAGGACACCA
0
0
0
1
0
0
0
0.29





547
TAGTGGGTGAAAAAAAAAAAA
0
0
0
1
0
0
0
0.29





548
TACCACACATTCGAAGAACCCGTA
0
0
1
0
0
0
0.57
0













549
TACAAAACCCACCCCATTCCTCCCCA
0
1
0
0
0
0.88
0
0













550
GCCCTCCTAATGACCTCC
0
0
1
0
0
0
0.57
0





551
CTTCCCTCTACACTTATCATC
0
0
1
0
0
0
0.57
0





552
CGGGCGGCCTGCGCTCTCA
1
0
0
0
1.72
0
0
0





553
CCCGAGGCCGTGTGCAAATGCAT
0
0
1
0
0
0
0.57
0













554
CCCCCAGTACCTCCACCA
0
1
0
0
0
0.88
0
0





555
CCCCCACTGCTAAACTTGACTGGCTTT
0
0
1
0
0
0
0.57
0













556
CCCACTCCACCTTACTACCA
0
0
0
1
0
0
0
0.29





557
CCCAAGAACAGGGTGACCA
0
0
0
1
0
0
0
0.29





558
CCAGTCGCGGCCAAATCA
0
1
0
0
0
0.88
0
0





559
CCAGCTTCACCAAGGTATTGGTTA
0
0
1
0
0
0
0.57
0













560
CCAGAAAAAACAGGCCTC
0
0
0
1
0
0
0
0.29





561
CATCATAATCGGAGGCTTTGGCAAC
0
0
1
0
0
0
0.57
0













562
CAGCAGGGGTAATAAGTGAAATCAAA
0
0
1
0
0
0
0.57
0













563
CAATGGTGCAGCCGCTATTAAAGGTTCA
0
0
0
1
0
0
0
0.29













564
CAACTCCTACATACTTCCCCC
1
0
0
0
1.72
0
0
0





565
ATTCAAAAAAGAGTACCA
0
0
1
0
0
0
0.57
0





566
ATGCATCTCATATGCGAATAGGAATGC
0
0
1
0
0
0
0.57
0













567
ATCCCACTTCTGTACCA
0
1
0
0
0
0.88
0
0





568
ATAACACTAGAAAGTTGGGGCAGATTGC
0
0
1
0
0
0
0.57
0













569
ACGTGGGCACATTACCCGTCTGACCTGA
0
0
0
1
0
0
0
0.29













570
ACCCCTTATTAACCCA
0
1
0
0
0
0.88
0
0





571
ACAAGGCACACCTACACCCCTTATCCC
0
0
1
0
0
0
0.57
0













572
AAAAGACACCCCCCCACCA
0
0
0
1
0
0
0
0.29





573
AAAACCCCTACGCATTTATAT
0
0
1
0
0
0
0.57
0





574
AAAAAGACACCCCCCACCA
0
0
0
1
0
0
0
0.29





575
ACCCCACTCCTGGTACCA
1
11
5
6
1.72
9.73
2.84
1.75





576
TGCCCCCATGTCTAACAACATGGCTA
7
4
1
1
12.07
3.54
0.57
0.29













577
GGCCGGTGATGAGAACT
4
3
0
0
6.9
2.65
0
0





578
CCCCGCCTGTTTACC
0
5
2
0
0
4.42
1.14
0





579
CCCACTTCTGACACCA
3
4
0
0
5.17
3.54
0
0





580
CACCACCTCTTGCTCAGCC
1
3
0
0
1.72
2.65
0
0





581
CTGGAAAGTGCACTTGGACGAACA
0
2
0
0
0
1.77
0
0













582
TGACCGCTCTGACCAC
0
1
0
0
0
0.88
0
0





583
TGAAGTCCCTTTGCTTTGTT
1
0
0
0
1.72
0
0
0





584
TGAACACACAATAGCTAAGACCC
0
0
1
0
0
0
0.57
0





585
TCGCCTTACCCCCCACTA
0
1
0
0
0
0.88
0
0





586
TCGATAAACCCCGATCAACCT
0
0
1
0
0
0
0.57
0





587
TCCCCGTCACCTCCACCA
0
0
1
0
0
0
0.57
0





588
TCCCCGGCACTCCACCA
0
0
1
0
0
0
0.57
0





589
TCCCCCCGCTGCCACCA
1
0
0
0
1.72
0
0
0





590
TCCCCCCCATCTCCACCA
0
0
1
0
0
0
0.57
0





591
TACACACCGCCCGTCACCC
0
0
1
0
0
0
0.57
0





592
GGCCGGTGATGAGAACTTCTCCC
1
0
0
0
1.72
0
0
0





593
GCTTAGCCTAGCCACACCCCCACG
0
0
1
0
0
0
0.57
0













594
GCTCGCCAGAACACTACGA
0
0
1
0
0
0
0.57
0





595
GCCGGGGGGCGGGCGCA
0
1
0
0
0
0.88
0
0





596
GAACCGGGCGGGAACACCA
0
0
0
1
0
0
0
0.29





597
CGCCGCAGTACTGATCATTC
0
0
1
0
0
0
0.57
0





598
CCGCACCAATAGGATCCTCC
0
1
0
0
0
0.88
0
0





599
CCCGGCCGACGCACCA
1
0
0
0
1.72
0
0
0





600
CCACCCCATCATACTCTTTC
0
0
1
0
0
0
0.57
0





601
CACCCCCCAGCTCCTCCTTT
1
0
0
0
1.72
0
0
0





602
ATAAGTAACATGAAAACATTCTCCTC
0
0
1
0
0
0
0.57
0













603
ACTGCTCGCCAGAACAC
0
0
1
0
0
0
0.57
0





604
ACCCTGGTGTGGGATCTGCCCGATC
0
0
1
0
0
0
0.57
0













605
AACCTCACCACCTCTTTCT
0
0
1
0
0
0
0.57
0





606
AAAAGACACCCCCCACACCA
0
0
0
1
0
0
0
0.29





607
GCTAAACCTAGCCCCAAACCC
9
16
13
18
15.52
14.16
7.39
5.25





608
GGCCGTGATCGTATA
2
0
0
0
3.45
0
0
0





609
TGGGATGCGAGAGGTCCCGGGT
0
0
0
1
0
0
0
0.29





610
CTGAACTCCTCACACCC
0
1
0
0
0
0.88
0
0





611
ATTAATCCCCTGGCCCAACCCG
0
0
0
1
0
0
0
0.29





612
AGCCCCAAACCCACTCCAC
0
0
1
0
0
0
0.57
0





613
CGCGACCTCAGATCAGAC
1
5
8
1
1.72
4.42
4.55
0.29





614
TCAAGTGATGTCATCTTACTACTGAGA
0
0
3
1
0
0
1.7
0.29













615
TTGGGTGCGAGAGGTCCCGGGT
0
0
0
3
0
0
0
0.87





616
TCTCGGTGGGACCTCCA
0
2
0
0
0
1.77
0
0





617
CCGCCCCCCGTTCCCCC
1
1
0
0
1.72
0.88
0
0





618
CCCACTGCTAAATTTGACTGGCTT
0
0
1
1
0
0
0.57
0.29













619
ACAGACCAAGAGCCTTC
0
0
2
0
0
0
1.14
0





620
TGTAGTAGTCAATTAATGGATATTA
0
0
1
0
0
0
0.57
0













621
TGGTTATCACGTTCGCCTCACACGCGA
0
0
0
1
0
0
0
0.29













622
TGGGAATACCGGGTG
0
0
1
0
0
0
0.57
0





623
TGGCGGCCAAGCGTTCATAGCGACGTC
0
0
0
1
0
0
0
0.29













624
TCGTCATCCAGCTAAGGGCTCAGA
0
0
1
0
0
0
0.57
0













625
TCGCCTGCCACGCGGGAGGCCCGGGT
0
0
1
0
0
0
0.57
0













626
TCCCACTGCTTCACTTGA
0
0
0
1
0
0
0
0.29





627
GTTTAGACGGGCTCACATCACCCCA
0
0
1
0
0
0
0.57
0













628
GCTAACTCATGCCCCCATGTC
0
0
1
0
0
0
0.57
0





629
GACTGTGGTGGTTGAATATA
0
0
0
1
0
0
0
0.29





630
CGCGACCTCAGATCAGACGTGGCGACC
0
0
1
0
0
0
0.57
0













631
CGCCGCCGCCCCCCC
0
1
0
0
0
0.88
0
0





632
CGCCCGACTACCACCACATCCA
1
0
0
0
1.72
0
0
0





633
CCCCCCTCCACGCGCCC
0
1
0
0
0
0.88
0
0





634
CCCCACCCCGCGCCCTC
0
1
0
0
0
0.88
0
0





635
CAGAGTGTAGCTTAACACAAAGCACCCAA
0
0
1
0
0
0
0.57
0













636
CAATCTTGGCATGTTGGTCTGGTCACCCA
0
0
1
0
0
0
0.57
0













637
CAAAGCATCGCGAAGGCCC
0
0
1
0
0
0
0.57
0





638
AACACCCTGATTGCTCCTGTCTGAT
0
0
1
0
0
0
0.57
0













639
AAAAAGGGCCTAAAGAAGATGCA
0
0
1
0
0
0
0.57
0
















TABLE 4A







List of short-RNA consensus with maximum 1 mismatch to the human genome.











ID
SEQ ID NO:
Short-RNA sequence
Annotations





CU-6232
640
TGGCTCAGTTCAGCAGGAACAGT
Mature:hsa-miR-24:MIMAT0000080





CU-6180
641
GTGGGGGAGAGGCTGTCGA
Mature:hsa-miR-1275:MIMAT0005929





CU-6130
642
CGGGGCAGCTCAGTACAGGATT
Mature:hsa-miR-486-3p:MIMAT0004762





CU-6044
643
AATTGCACGGTATCCATCTGTAT
Mature:hsa-miR-363:MIMAT0000707





CU-6133
644
CGGGGGAGCGCCGCGTA
NEW





CU-6215
645
TCGATCCCGGGTTTCGGCACCA
NEW





CU-6072
646
ATCGTATCCCACTTCTGACACCA
NEW





CU-6030
647
ATCCTGCCGACTACGCCA
NEW





CU-6210
648
TCGAATCCCACTCCTGACACCA
NEW





CU-6069
649
ATCCCATCCTCGTCGCCA
NEW





CU-6216
650
TCGATTCCCCGACGGGGAGCCA
NEW





CU-6071
651
ATCCGGGTGCCCCCTCCA
NEW





CU-6202
652
TCCCGGGCGGCGCACCA
NEW





CU-6066
653
ATCCCACCAGAGTCGCCA
NEW





CU-6192
654
TCAAATCACGTCGGGGTCACCA
NEW





CU-6239
655
TGTCAGTTTGTTAATTGACCCAA
NEW





CU-6214
656
TCGATCCCCGTACGGGCCACCA
NEW





CU-6213
657
TCGAGCCTCACCTGGAGCACCA
NEW





CU-6206
658
TCCGGCTCGAAGGACCA
NEW





CU-6006
659
GGCAATACGAGCACCCTG
NEW





CU-6004
660
CCGGGGCGTCTCGTAC
NEW





CU-6056
661
AGCGGCTGTGCACAAA
NEW





CU-6242
662
TGTCAGTTTGTTTAATCCAA
NEW





CU-6241
663
TGTCAGTTTGTTATTACCAA
NEW





CU-6237
664
TGTCAGGCACCATCAATAA
NEW





CU-6225
665
TGATCTTGACACTTAAAGCC
NEW





CU-6219
666
TCGTAGGCACCATCAAT
NEW





CU-6211
667
TCGACTCCCGGTATGGGAACCA
NEW





CU-6187
668
TAGGGAGGTTATGATTAACTTTT
NEW





CU-6183
669
TAAAGTGCTTAGTGCAGGTA
NEW





CU-6181
670
GTTTATGTTGCTTACCTCC
NEW





CU-6176
671
GTAGATAAAATATTGGCG
NEW





CU-6163
672
GGCGGGGACGACGTCAG
NEW





CU-6162
673
GGCGGCGTCGCGGCGGGTC
NEW





CU-6161
674
GGAGGGGGTGAACAAAAAGAAAAA
NEW





CU-6159
675
GCTAAACCTAGCCCCAAACCCACTCCACA
NEW





CU-6142
676
CTGGATAGCGCACTTCGTT
NEW





CU-6129
677
CGGGCGAGGGGCGGACGTTCG
NEW





CU-6123
678
CGGACCTATACCGGA
NEW





CU-6096
679
CCCCGGGTTCAATCCCCGGCACCTCCACCA
NEW






CU-6088
680
CCCCCCACAACCGCGAA
NEW





CU-6087
681
CCCAGCATCTCCTGTGTTTA
NEW





CU-6086
682
CCCACGTTGGGACGCCA
NEW





CU-6064
683
ATCACGTCCGTGCCTCCA
NEW





CU-6063
684
ATAGCAATGTCAGCAGTACCT
NEW





CU-6051
685
ACCCTGCTCGCTGCGCCA
refseqGeneIntron-annotate





CU-6198
686
TCCCACCCAGGGACGCCA
refseqGeneIntron-annotate





CU-6218
687
TCGTAGGCACATCAATA
refseqGeneIntron-annotate





CU-6007
688
CCCCCACAACCGCGTA
refseqGeneIntron-annotate





CU-6001
689
ACCCCGTCCGTGCCTCCA
refseqGeneIntron-annotate





CU-6039
690
AAAAAAGACACCCCCCACA
refseqGeneIntron-annotate





CU-6005
691
TGTCAGTTTGTTAACCCAA
refseqGeneIntron-annotate





CU-6204
692
TCCCTGTGGTCTAGTGGTTAGG
refseqGeneIntron-annotate





CU-6172
693
GGGGGGGTAAAAAAA
refseqGeneIntron-annotate





CU-6171
694
GGGGGGGGAAAAAAAA
refseqGeneIntron-annotate





CU-6128
695
CGGGCCCGGGTCTTCCC
refseqGeneIntron-annotate





CU-6002
696
CCGCCCCCCGTTCCCCCCA
refseqGeneIntron-annotate





CU-6050
697
ACCCCCGGCTCCTCCACCA
refseqGeneIntron-annotate





CU-6244
698
TTTGGTGGAAATTTTTTGA
refseqGeneIntron-annotate





CU-6240
699
TGTCAGTTTGTTATACCAA
refseqGeneIntron-annotate





CU-6238
700
TGTCAGTTTGTAATTATCCCAA
refseqGeneIntron-annotate





CU-6236
701
TGTCAATTTTTAACCCAA
refseqGeneIntron-annotate





CU-6227
702
TGCTAGGGTAAAAAAAAAA
refseqGeneIntron-annotate





CU-6226
703
TGCAACTCCAAATAAAAGTACCA
refseqGeneIntron-annotate





CU-6224
704
TGAGGTAACGGGGAATTA
refseqGeneIntron-annotate





CU-6209
705
TCCTCGGCATCTCCACCA
refseqGeneIntron-annotate





CU-6197
706
TCATATGAAGTCACCCTAGCCATC
mitochondrion-annotate;refseqGeneIntron-annotate





CU-6196
707
TCAGTTTGTTTATTAACCCAA
refseqGeneIntron-annotate





CU-6195
708
TCAGCGTGTCTTTGCCCT
refseqGeneIntron-annotate





CU-6194
709
TCACTGGTGGTCTAGTGGT
refseqGeneIntron-annotate;rnaGene-annotate





CU-6193
710
TCACAATGCTGCCACCA
refseqGeneIntron-annotate





CU-6189
711
TAGTTGTTAATTAACCCAA
refseqGeneIntron-annotate





CU-6188
712
TAGTCCTCATCGCCCTCC
mitochondrion-annotate;refseqGeneIntron-annotate





CU-6184
713
TAAAGTGCTTATAGTGCGGGTAA
refseqGeneIntron-annotate





CU-6179
714
GTCCCACCAGAGTCGCCA
refseqGeneIntron-annotate





CU-6170
715
GGGGGAGGGGCCAAAAAAA
refseqGeneIntron-annotate





CU-6167
716
GGGACGCCGCGGTGTCG
refseqGeneIntron-annotate





CU-6166
717
GGGAATACCGGGTGCTTTAGGCTT
refseqGeneIntron-annotate;rnaGene-annotate





CU-6160
718
GGAAGAAGGTGGTGGTATA
refseqGeneIntron-annotate





CU-6156
719
GCGGTGAAATGCGTA
computGene-annotate;Ecoli-annotate;refseqGeneIntron-annotate





CU-6154
720
GCGGGGAAGGTGGCAAA
refseqGeneIntron-annotate





CU-6152
721
GCGACGACCTCGCGCCCACCTGGTCA
refseqGeneIntron-annotate





CU-6151
722
GCCACCCGATACTGCTGT
refseqGeneIntron-annotate





CU-6150
723
GATGTATGCTTTGTTTCTGTT
refseqGeneIntron-annotate





CU-6148
724
GAGGGGGATTTAGAAAAAAA
refseqGeneIntron-annotate





CU-6147
725
GAAGGAAAGTTCTATAGT
refseqGeneIntron-annotate





CU-6146
726
GAAGCGGCTCTCTTATTT
refseqGeneIntron-annotate





CU-6145
727
GAACGAGACTCTGGCATGCTGA
refseqGeneIntron-annotate;rnaGene-annotate





CU-6143
728
CTGGTAGGCCCATCAAT
refseqGeneIntron-annotate





CU-6132
729
CGGGGCCGATCGCGCGC
computGene-annotate;refseqGeneIntron-annotate





CU-6125
730
CGGCCCCGGGTTCCTCCC
computGene-annotate;refseqGeneIntron-annotate





CU-6118
731
CGAGCCCGGTTAGTA
refseqGeneIntron-annotate;rnaGene-annotate





CU-6117
732
CGACTCTTAGCGGTGGA
piRNA-annotate;refseqGeneIntron-annotate





CU-6116
733
CGAATCCCACTTCTGACACCA
refseqGeneIntron-annotate





CU-6113
734
CGAAAGGGAATCGGGTC
refseqGeneIntron-annotate





CU-6112
735
CCTTAGGTCGCTGGTAAA
refseqGeneIntron-annotate





CU-6108
736
CCGTGCGAGAATACCA
refseqGeneIntron-annotate





CU-6107
737
CCGGTCTCTCAAGCGGCC
refseqGeneIntron-annotate





CU-6099
738
CCCGGCCCTCGCGCGTCC
computGene-annotate;refseqGeneIntron-annotate





CU-6094
739
CCCCGGCATTTCCACCA
computGene-annotate;refseqGeneIntron-annotate





CU-6090
740
CCCCCCCGGCTCCTCCACCA
refseqGeneIntron-annotate





CU-6089
741
CCCCCCACAACCGCTA
refseqGeneIntron-annotate





CU-6085
742
CCCAAGTATTGACTCACCC
mitochondrion-annotate;refseqGeneIntron-annotate





CU-6084
743
CCAGTAAGCGCGAGTC
refseqGeneIntron-annotate





CU-6082
744
CCAAAGAAAGCACGTAGAG
refseqGeneIntron-annotate





CU-6081
745
CATGTTTAACGGCCGCGGT
mitochondrion-annotate;refseqGeneIntron-annotate





CU-6080
746
CAGTTTGTAATTAACCCAA
refseqGeneIntron-annotate





CU-6079
747
CAGGAACGGCGCACCA
computGene-annotate;refseqGeneIntron-annotate





CU-6078
748
CAGAACCCTCTAAATCCCC
mitochondrion-annotate;refseqGeneIntron-annotate





CU-6076
749
CACCCGGCTGTGTGCACATGTGT
miRBASE-annotate;computGene-annotate;refseqGeneIntron-annotate;wgRNA-annotate





CU-6075
750
CAATTGGACCAATCTATC
mitochondrion-annotate;refseqGeneIntron-annotate





CU-6074
751
ATTCCTGTACTGCGATA
refseqGeneIntron-annotate





CU-6070
752
ATCCCTGCGGCGTCTCCA
refseqGeneIntron-annotate





CU-6067
753
ATCCCACCGCTGCCATCA
refseqGeneIntron-annotate





CU-6062
754
AGTCAATAGAAGCCGGCGTA
mitochondrion-annotate;refseqGeneIntron-annotate





CU-6061
755
AGGTTCGTTTGTAAAAA
refseqGeneIntron-annotate





CU-6060
756
AGGTCCTGGGTTTAAGTGT
computGene-annotate;refseqGeneIntron-annotate





CU-6058
757
AGGGGGAAGTTCTATAGTC
refseqGeneIntron-annotate





CU-6057
758
AGGCTGTGATGCTCTCNTGAGCCCT
refseqGeneIntron-annotate





CU-6055
759
AGCCCCTCTCCGGCCCTTA
refseqGeneIntron-annotate





CU-6054
760
ACTACCACCTACCTCCC
mitochondrion-annotate;refseqGeneIntron-annotate





CU-6052
761
ACGCCCTTCCCCCCCTTCTTT
miRBASE-annotate;refseqGeneIntron-annotate





CU-6049
762
ACCCCACTCCTGGTGCAC
refseqGeneIntron-annotate





CU-6048
763
ACCACCTGATCCCTTCCC
refseqGeneIntron-annotate





CU-6047
764
ACAGCTAAGCACCCACCA
refseqGeneIntron-annotate





CU-6045
765
ACACATGTTTAACGGCC
mitochondrion-annotate;refseqGeneIntron-annotate





CU-6043
766
AATTAGGGACCTGTATG
refseqGeneIntron-annotate





CU-6042
767
AATGGCCCATTTGGGCAAACA
computGene-annotate;refseqGeneIntron-annotate





CU-6041
768
AAAGCGGCTGTGCAAACA
refseqGeneIntron-annotate





CU-6212
769
TCGACTCCTGGCTGGCTCGCCA
wgRNA-annotate





CU-6200
770
TCCCCGGCATCTCCACCAA
computGene-annotate





CU-6157
771
GCGGTGGATCACTCGGCTCGTGCGT
rnaGene-annotate





CU-6105
772
CCGGGTGTTGTAGA
mRNAall-annotate;exEID-annotate





CU-6235
773
TGTAGCGTGGCCGAGCGGT
rnaGene-annotate





CU-6234
774
TGGGGCGACCTCGGAGCAG
mitochondrion-annotate





CU-6230
775
TGGCGTCCTAAGCCAGGGATTGTGGGT
rnaGene-annotate





CU-6229
776
TGGCAGGGGAGATACCATGATTT
rnaGene-annotate





CU-6222
777
TCTGATCAGGGTGAGCATC
mitochondrion-annotate





CU-6220
778
TCGTAGGCACCATCCAT
computGene-annotate





CU-6165
779
GGGAAACGGGGCGCGGCTG
rnaGene-annotate





CU-6137
780
CTACTCCTGCTCGCATCTGCTATA
mitochondrion-annotate





CU-6135
781
CGGGTGGGTTTTTACCGG
computGene-annotate





CU-6120
782
CGAGGAATTCCCAGTAAG
rnaGene-annotate





CU-6115
783
CGAACGCACTTGCGGCCCC
rnaGene-annotate





CU-6093
784
CCCCGCGCGGGTTCGAATC
rnaGene-annotate





CU-6059
785
AGGGGTATGATTCCCGCTT
rnaGene-annotate





CU-6131
786
CGGGGCCACGCGCGCGTC
mRNA-annotate;rRNA-eliminate





CU-6032
787
TGGCGCTGCGGGATGAAC
rRNA-eliminate;refseqGeneIntron-annotate;rnaGene-annotate





CU-1153
788
CCCCCCACTGCTAAATTTGACTGGCTT
yRNA-eliminate;refseqGeneIntron-annotate;rnaGene-annotate





CU-6182
789
TAAAGGTTCGTTTGTAAAA
computGene-annotate;refseqGeneExon-eliminate





CU-6033
790
CGGGGCCGAGGGAGCGA
rRNA-eliminate;refseqGeneExon-eliminate;refseqGeneIntron-annotate





CU-6174
791
GGGTTAGGCCTCTTTT
tRNA-eliminate;rnaGene-annotate





CU-6141
792
CTGCGGAAGGATCATTA
rRNA-eliminate;rnaGene-annotate





CU-6101
793
CCCTACCCCCCCGG
rRNA-eliminate;refseqGeneIntron-annotate





CU-6034
794
CCCGCCGGGTCCGCCC
computGene-annotate;rRNA-eliminate;refseqGeneExon-eliminate;refseqGeneIntron-annotate;rnaGene-annotate





CU-6035
795
CCCCGCGCCCTCTCTCTCTC
rRNA-eliminate;refseqGeneIntron-annotate





CU-6028
796
CAGGCCTCCCTGGAATC
computGene-annotate;refseqGeneExon-eliminate;refseqGeneIntron-annotate





CU-6029
797
AGTCCCACCCGGGGTACCA
computGene-annotate;refseqGeneExon-eliminate





CU-6243
798
TTGACACGCCCCAGTGCCCTGT
refseqGeneExon-eliminate





CU-6233
799
TGGGAGCGGGCGGGCGGTC
rRNA-eliminate;rnaGene-annotate





CU-6231
800
TGGCGTGGAGCCGGGCGT
rRNA-eliminate;refseqGeneIntron-annotate





CU-6228
801
TGGAGGTCCGTAGCGGT
rRNA-eliminate;mRNA-annotate;refseqGeneIntron-annotate;rnaGene-annotate





CU-6223
802
TGAAGAAGGTCTCGAACA
computGene-annotate;refseqGeneExon-eliminate





CU-6221
803
TCTCGCCGGGGCTTCCA
computGene-annotate;refseqGeneExon-eliminate;rnaGene-annotate





CU-6217
804
TCGTAGCACCATCAATAA
computGene-annotate;refseqGeneExon-eliminate





CU-6208
805
TCCGGGTCCCCCCTCCA
computGene-annotate;refseqGeneExon-eliminate;refseqGeneIntron-annotate





CU-6207
806
TCCGGGGCTGCACGCGCGCT
rRNA-eliminate;rnaGene-annotate





CU-6205
807
TCCGGCCGTGTCGGT
computGene-annotate;refseqGeneExon-eliminate;refseqGeneIntron-annotate





CU-6203
808
TCCCTGTCCTCCAGGAGT
miRBASE-annotate;computGene-annotate;refseqGeneExon-eliminate;refseqGeneIntron-annotate;wgRNA-annotate





CU-6201
809
TCCCCTCCTCGTCGCCA
refseqGeneExon-eliminate;refseqGeneIntron-annotate





CU-6199
810
TCCCAGGTAGTCTAGTGGT
refseqGeneExon-eliminate;refseqGeneIntron-annotate;rnaGene-annotate





CU-6191
811
TATTCATTTATCCCCAGCCTAT
miRBASE-annotate;snoRNA-eliminate;refseqGeneIntron-annotate;wgRNA-annotate;rnaGene-annotate





CU-6190
812
TAGTTGTTATAACCCAA
refseqGeneExon-eliminate;refseqGeneIntron-annotate





CU-6186
813
TAGATCACCCCCTCCCC
mitochondrion-annotate;refseqGeneExon-eliminate;refseqGeneIntron-annotate





CU-6185
814
TACCGGCACCTGGCGCC
computGene-annotate;refseqGeneExon-eliminate;refseqGeneIntron-annotate





CU-6178
815
GTATAGGGGCGAAAGAC
rRNA-eliminate;mRNA-annotate;refseqGeneIntron-annotate;rnaGene-annotate





CU-6177
816
GTAGCTGGTTCCCTCCGAA
rRNA-eliminate;mRNA-annotate;refseqGeneIntron-annotate;rnaGene-annotate





CU-6175
817
GGTAAGAAGCCCGGCTC
computGene-annotate;rRNA-eliminate;refseqGeneExon-eliminate;refseqGeneIntron-annotate;rnaGene-annotate





CU-6173
818
GGGGGGGTTTAAAAAAAAA
refseqGeneExon-eliminate;refseqGeneIntron-annotate





CU-6169
819
GGGGCGCACTACCGGCC
refseqGeneExon-eliminate





CU-6168
820
GGGAGAGGCTGTCGCTGCG
computGene-annotate;refseqGeneExon-eliminate





CU-6164
821
GGCGGGTGAAGCGGCG
computGene-annotate;refseqGeneExon-eliminate;refseqGeneIntron-annotate





CU-6158
822
GCGGTTCCGGCGGCGTC
rRNA-eliminate;refseqGeneExon-eliminate;refseqGeneIntron-annotate





CU-6155
823
GCGGGGCGCCTAGGCCTGGTTTGT
refseqGeneExon-eliminate





CU-6153
824
GCGGCGGTCGGCGGGCGGCGGG
rRNA-eliminate;refseqGeneExon-eliminate;refseqGeneIntron-annotate





CU-6149
825
GAGGGGGGGGGTGGGGGGGGA
refseqGeneExon-eliminate;refseqGeneIntron-annotate





CU-6144
826
CTGTCGGCCACCATCAT
computGene-annotate;refseqGeneExon-eliminate;refseqGeneIntron-annotate





CU-6140
827
CTGCAACTCGACCCCA
computGene-annotate;refseqGeneExon-eliminate;refseqGeneIntron-annotate





CU-6139
828
CTCCTCTCCCCGCCCGCCG
refseqGeneExon-eliminate;refseqGeneIntron-annotate





CU-6138
829
CTCAAAGATTAAGCCATGCATGTCTA
rRNA-eliminate;rnaGene-annotate





CU-6136
830
CTACGCCGCGACGAG
computGene-annotate;rRNA-eliminate





CU-6134
831
CGGGTGACGGGGAATCAGGGTT
rRNA-eliminate;rnaGene-annotate





CU-6127
832
CGGGCAGCTTCCGGGA
computGene-annotate;rRNA-eliminate;refseqGeneExon-eliminate;refseqGeneIntron-annotate





CU-6126
833
CGGGAGGCCCGGGTCCTG
refseqGeneExon-eliminate;refseqGeneIntron-annotate





CU-6124
834
CGGCCCCGCATCCTCCC
computGene-annotate;refseqGeneExon-eliminate;refseqGeneIntron-annotate





CU-6122
835
CGCGGGTAAACGGCGGGAGTAACTAT
mRNAall-annotate;rRNA-eliminate;refseqGeneIntron-annotate;rnaGene-annotate





CU-6121
836
CGCCCCCCGTTCCCCCCTCC
rRNA-eliminate





CU-6119
837
CGAGCGGAAACACCA
computGene-annotate;refseqGeneExon-eliminate;refseqGeneIntron-annotate





CU-6114
838
CGAACCCGGCACCGC
computGene-annotate;refseqGeneExon-eliminate;refseqGeneIntron-annotate





CU-6111
839
CCTCGGGCCGATCGCAC
rRNA-eliminate;rnaGene-annotate





CU-6110
840
CCTATATATCTTACCA
computGene-annotate;refseqGeneExon-eliminate;refseqGeneIntron-annotate





CU-6109
841
CCGTGGCGGCGACGACC
computGene-annotate;rRNA-eliminate;refseqGeneExon-eliminate





CU-6106
842
CCGGGTTCCGGCACCA
computGene-annotate;refseqGeneExon-eliminate;refseqGeneIntron-annotate





CU-6104
843
CCGCGAGGGGGGCCCG
computGene-annotate;refseqGeneExon-eliminate;refseqGeneIntron-annotate





CU-6103
844
CCGCCTCACGGGACCA
computGene-annotate;refseqGeneExon-eliminate;refseqGeneIntron-annotate





CU-6102
845
CCGCCCGTCCCCGCCCCTTG
rRNA-eliminate;refseqGeneIntron-annotate;rnaGene-annotate





CU-6100
846
CCCGGGGCCGCGGTTCCG
computGene-annotate;rRNA-eliminate;refseqGeneIntron-annotate





CU-6098
847
CCCGAGCCGCCTGGAT
computGene-annotate;rRNA-eliminate;refseqGeneExon-eliminate;refseqGeneIntron-annotate;rnaGene-annotate





CU-6097
848
CCCGACGGCCGAACT
computGene-annotate;refseqGeneExon-eliminate;refseqGeneIntron-annotate





CU-6095
849
CCCCGGGGAGCCCGGCGGG
computGene-annotate;refseqGeneExon-eliminate;refseqGeneIntron-annotate





CU-6092
850
CCCCCTCGCGGCCCTCCCC
rRNA-eliminate;refseqGeneIntron-annotate





CU-6091
851
CCCCCCGTGGCGGCGAC
rRNA-eliminate;refseqGeneIntron-annotate





CU-6083
852
CCACCCAGGGCACGCCA
refseqGeneExon-eliminate;refseqGeneIntron-annotate





CU-6077
853
CACGGGTGACGGGGAA
computGene-annotate;rnaGene-annotate;refseqGeneIntron-annotate;rRNA-eliminate;refseqGeneExon-eliminate;piRNA-annotate





CU-6073
854
ATGGGGAGGAAAAAAAAAAAAAA
refseqGeneExon-eliminate;refseqGeneIntron-annotate





CU-6068
855
ATCCCACCGCTGCCCCCA
computGene-annotate;refseqGeneExon-eliminate;refseqGeneIntron-annotate





CU-6065
856
ATCACGTCGGTCACCA
computGene-annotate;refseqGeneExon-eliminate;refseqGeneIntron-annotate





CU-6053
857
ACGGGAAACCTCACCCGGCCCGG
rRNA-eliminate;piRNA-annotate;rnaGene-annotate





CU-6046
858
ACAGAGGCTTACGACCCCTTATTT
mitochondrion-annotate;tRNA-eliminate;refseqGeneIntron-annotate;rnaGene-annotate





CU-6040
859
AAAAAGGCATAATTAAACTT
mitochondrion-annotate;refseqGeneExon-eliminate;refseqGeneIntron-annotate
















TABLE 4B







List of short-RNA consensus with maximum 1 mismatch to the human genome. Table includes information on genomic frequencies.










SEQ ID NO:
Short-RNA sequence
Corrected counts, Naïve (N)
Corrected counts, Memory (M)
Corrected counts, Centroblasts (CB)
Corrected counts, Ramos (RA)
Frequency, Naïve (N)
Frequency, Memory (M)
Frequency, Centroblasts (CB)
Frequency, Ramos (RA)



















860
TGGCTCAGTTCAGCAGGAACAGT
0
0
1
0
0
0
1.05
0





861
GTGGGGGAGAGGCTGTCGA
0
0
0
1
0
0
0
0.81





862
CGGGGCAGCTCAGTACAGGATT
0
0
1
0
0
0
1.05
0





863
AATTGCACGGTATCCATCTGTAT
0
0
1
0
0
0
1.05
0





864
CGGGGGAGCGCCGCGTA
2
0
0
0
2.04
0
0
0





865
TCGATCCCGGGTTTCGGCACCA
0
0
1
0
0
0
1.05
0





866
ATCGTATCCCACTTCTGACACCA
0
0
0
1
0
0
0
0.81





867
ATCCTGCCGACTACGCCA
13
15
13
6
13.27
13.76
13.68
4.88






868
TCGAATCCCACTCCTGACACCA
1
2
7
7
1.02
1.83
7.37
5.69






869
ATCCCATCCTCGTCGCCA
0
0
10
3
0
0
10.53
2.44





870
TCGATTCCCCGACGGGGAGCCA
1
1
1
9
1.02
0.92
1.05
7.32





871
ATCCGGGTGCCCCCTCCA
2
4
0
1
2.04
3.67
0
0.81





872
TCCCGGGCGGCGCACCA
2
2
1
0
2.04
1.83
1.05
0





873
ATCCCACCAGAGTCGCCA
0
0
2
3
0
0
2.11
2.44





874
TCAAATCACGTCGGGGTCACCA
0
1
2
0
0
0.92
2.11
0





875
TGTCAGTTTGTTAATTGACCCAA
0
0
1
1
0
0
1.05
0.81





876
TCGATCCCCGTACGGGCCACCA
0
0
1
1
0
0
1.05
0.81





877
TCGAGCCTCACCTGGAGCACCA
0
0
2
0
0
0
2.11
0





878
TCCGGCTCGAAGGACCA
0
0
2
0
0
0
2.11
0





879
GGCAATACGAGCACCCTG
2
0
0
0
2.04
0
0
0





880
CCGGGGCGTCTCGTAC
2
0
0
0
2.04
0
0
0





881
AGCGGCTGTGCACAAA
0
0
0
2
0
0
0
1.63





882
TGTCAGTTTGTTTAATCCAA
0
0
0
1
0
0
0
0.81





883
TGTCAGTTTGTTATTACCAA
0
0
0
1
0
0
0
0.81





884
TGTCAGGCACCATCAATAA
0
0
0
1
0
0
0
0.81





885
TGATCTTGACACTTAAAGCC
0
0
0
1
0
0
0
0.81





886
TCGTAGGCACCATCAAT
0
0
0
1
0
0
0
0.81





887
TCGACTCCCGGTATGGGAACCA
0
0
0
1
0
0
0
0.81





888
TAGGGAGGTTATGATTAACTTTT
0
0
0
1
0
0
0
0.81





889
TAAAGTGCTTAGTGCAGGTA
0
0
0
1
0
0
0
0.81





890
GTTTATGTTGCTTACCTCC
0
0
1
0
0
0
1.05
0





891
GTAGATAAAATATTGGCG
1
0
0
0
1.02
0
0
0





892
GGCGGGGACGACGTCAG
0
0
0
1
0
0
0
0.81





893
GGCGGCGTCGCGGCGGGTC
0
1
0
0
0
0.92
0
0





894
GGAGGGGGTGAACAAAAAGAAAAA
0
0
0
1
0
0
0
0.81





895
GCTAAACCTAGCCCCAAACCCACTCCACA
0
0
0
1
0
0
0
0.81













896
CTGGATAGCGCACTTCGTT
0
0
0
1
0
0
0
0.81





897
CGGGCGAGGGGCGGACGTTCG
0
0
1
0
0
0
1.05
0





898
CGGACCTATACCGGA
1
0
0
0
1.02
0
0
0





899
CCCCGGGTTCAATCCCCGGCACCTCCACCA
0
0
1
0
0
0
1.05
0













900
CCCCCCACAACCGCGAA
0
1
0
0
0
0.92
0
0





901
CCCAGCATCTCCTGTGTTTA
0
1
0
0
0
0.92
0
0





902
CCCACGTTGGGACGCCA
1
0
0
0
1.02
0
0
0





903
ATCACGTCCGTGCCTCCA
0
1
0
0
0
0.92
0
0





904
ATAGCAATGTCAGCAGTACCT
0
0
1
0
0
0
1.05
0





905
ACCCTGCTCGCTGCGCCA
9
17
4
7
9.18
15.6
4.21
5.69





906
TCCCACCCAGGGACGCCA
8
2
1
0
8.16
1.83
1.05
0





907
TCGTAGGCACATCAATA
0
0
0
4
0
0
0
3.25





908
CCCCCACAACCGCGTA
0
4
0
0
0
3.67
0
0





909
ACCCCGTCCGTGCCTCCA
2
1
1
0
2.04
0.92
1.05
0





910
AAAAAAGACACCCCCCACA
0
0
0
3
0
0
0
2.44





911
TGTCAGTTTGTTAACCCAA
0
0
0
2
0
0
0
1.63





912
TCCCTGTGGTCTAGTGGTTAGG
0
0
1
1
0
0
1.05
0.81





913
GGGGGGGTAAAAAAA
0
0
0
1
0
0
0
0.81





914
GGGGGGGGAAAAAAAA
0
0
0
1
0
0
0
0.81





915
CGGGCCCGGGTCTTCCC
1
1
0
0
1.02
0.92
0
0





916
CCGCCCCCCGTTCCCCCCA
0
2
0
0
0
1.83
0
0





917
ACCCCCGGCTCCTCCACCA
0
1
0
1
0
0.92
0
0.81





918
TTTGGTGGAAATTTTTTGA
0
0
0
1
0
0
0
0.81





919
TGTCAGTTTGTTATACCAA
0
0
0
1
0
0
0
0.81





920
TGTCAGTTTGTAATTATCCCAA
0
0
0
1
0
0
0
0.81





921
TGTCAATTTTTAACCCAA
0
0
0
1
0
0
0
0.81





922
TGCTAGGGTAAAAAAAAAA
0
0
0
1
0
0
0
0.81





923
TGCAACTCCAAATAAAAGTACCA
0
0
0
1
0
0
0
0.81





924
TGAGGTAACGGGGAATTA
0
0
0
1
0
0
0
0.81





925
TCCTCGGCATCTCCACCA
0
0
1
0
0
0
1.05
0





926
TCATATGAAGTCACCCTAGCCATC
0
0
1
0
0
0
1.05
0





927
TCAGTTTGTTTATTAACCCAA
0
0
0
1
0
0
0
0.81





928
TCAGCGTGTCTTTGCCCT
1
0
0
0
1.02
0
0
0





929
TCACTGGTGGTCTAGTGGT
0
1
0
0
0
0.92
0
0





930
TCACAATGCTGCCACCA
1
0
0
0
1.02
0
0
0





931
TAGTTGTTAATTAACCCAA
0
0
0
1
0
0
0
0.81





932
TAGTCCTCATCGCCCTCC
0
1
0
0
0
0.92
0
0





933
TAAAGTGCTTATAGTGCGGGTAA
0
0
0
1
0
0
0
0.81





934
GTCCCACCAGAGTCGCCA
0
0
1
0
0
0
1.05
0





935
GGGGGAGGGGCCAAAAAAA
0
0
0
1
0
0
0
0.81





936
GGGACGCCGCGGTGTCG
1
0
0
0
1.02
0
0
0





937
GGGAATACCGGGTGCTTTAGGCTT
0
1
0
0
0
0.92
0
0





938
GGAAGAAGGTGGTGGTATA
0
0
0
1
0
0
0
0.81





939
GCGGTGAAATGCGTA
1
0
0
0
1.02
0
0
0





940
GCGGGGAAGGTGGCAAA
0
0
0
1
0
0
0
0.81





941
GCGACGACCTCGCGCCCACCTGGTCA
0
1
0
0
0
0.92
0
0





942
GCCACCCGATACTGCTGT
0
1
0
0
0
0.92
0
0





943
GATGTATGCTTTGTTTCTGTT
0
0
1
0
0
0
1.05
0





944
GAGGGGGATTTAGAAAAAAA
0
0
0
1
0
0
0
0.81





945
GAAGGAAAGTTCTATAGT
0
0
0
1
0
0
0
0.81





946
GAAGCGGCTCTCTTATTT
0
0
0
1
0
0
0
0.81





947
GAACGAGACTCTGGCATGCTGA
0
0
1
0
0
0
1.05
0





948
CTGGTAGGCCCATCAAT
0
0
0
1
0
0
0
0.81





949
CGGGGCCGATCGCGCGC
0
1
0
0
0
0.92
0
0





950
CGGCCCCGGGTTCCTCCC
1
0
0
0
1.02
0
0
0





951
CGAGCCCGGTTAGTA
1
0
0
0
1.02
0
0
0





952
CGACTCTTAGCGGTGGA
0
0
1
0
0
0
1.05
0





953
CGAATCCCACTTCTGACACCA
0
0
0
1
0
0
0
0.81





954
CGAAAGGGAATCGGGTC
1
0
0
0
1.02
0
0
0





955
CCTTAGGTCGCTGGTAAA
0
0
1
0
0
0
1.05
0





956
CCGTGCGAGAATACCA
0
1
0
0
0
0.92
0
0





957
CCGGTCTCTCAAGCGGCC
1
0
0
0
1.02
0
0
0





958
CCCGGCCCTCGCGCGTCC
0
1
0
0
0
0.92
0
0





959
CCCCGGCATTTCCACCA
0
0
1
0
0
0
1.05
0





960
CCCCCCCGGCTCCTCCACCA
0
0
0
1
0
0
0
0.81





961
CCCCCCACAACCGCTA
0
1
0
0
0
0.92
0
0





962
CCCAAGTATTGACTCACCC
0
1
0
0
0
0.92
0
0





963
CCAGTAAGCGCGAGTC
1
0
0
0
1.02
0
0
0





964
CCAAAGAAAGCACGTAGAG
0
0
0
1
0
0
0
0.81





965
CATGTTTAACGGCCGCGGT
0
0
1
0
0
0
1.05
0





966
CAGTTTGTAATTAACCCAA
0
0
0
1
0
0
0
0.81





967
CAGGAACGGCGCACCA
0
0
1
0
0
0
1.05
0





968
CAGAACCCTCTAAATCCCC
0
0
1
0
0
0
1.05
0





969
CACCCGGCTGTGTGCACATGTGT
1
0
0
0
1.02
0
0
0





970
CAATTGGACCAATCTATC
0
0
1
0
0
0
1.05
0





971
ATTCCTGTACTGCGATA
0
0
0
1
0
0
0
0.81





972
ATCCCTGCGGCGTCTCCA
0
0
0
1
0
0
0
0.81





973
ATCCCACCGCTGCCATCA
0
1
0
0
0
0.92
0
0





974
AGTCAATAGAAGCCGGCGTA
0
0
1
0
0
0
1.05
0





975
AGGTTCGTTTGTAAAAA
0
0
0
1
0
0
0
0.81





976
AGGTCCTGGGTTTAAGTGT
0
0
0
1
0
0
0
0.81





977
AGGGGGAAGTTCTATAGTC
0
0
0
1
0
0
0
0.81





978
AGGCTGTGATGCTCTCNTGAGCCCT
0
0
1
0
0
0
1.05
0





979
AGCCCCTCTCCGGCCCTTA
0
1
0
0
0
0.92
0
0





980
ACTACCACCTACCTCCC
1
0
0
0
1.02
0
0
0





981
ACGCCCTTCCCCCCCTTCTTT
0
0
0
1
0
0
0
0.81





982
ACCCCACTCCTGGTGCAC
1
0
0
0
1.02
0
0
0





983
ACCACCTGATCCCTTCCC
1
0
0
0
1.02
0
0
0





984
ACAGCTAAGCACCCACCA
0
0
1
0
0
0
1.05
0





985
ACACATGTTTAACGGCC
1
0
0
0
1.02
0
0
0





986
AATTAGGGACCTGTATG
0
0
1
0
0
0
1.05
0





987
AATGGCCCATTTGGGCAAACA
0
0
0
1
0
0
0
0.81





988
AAAGCGGCTGTGCAAACA
0
0
0
1
0
0
0
0.81





989
TCGACTCCTGGCTGGCTCGCCA
0
2
2
1
0
1.83
2.11
0.81





990
TCCCCGGCATCTCCACCAA
0
1
2
0
0
0.92
2.11
0





991
GCGGTGGATCACTCGGCTCGTGCGT
0
0
0
3
0
0
0
2.44





992
CCGGGTGTTGTAGA
2
0
0
0
2.04
0
0
0





993
TGTAGCGTGGCCGAGCGGT
0
1
0
0
0
0.92
0
0





994
TGGGGCGACCTCGGAGCAG
0
0
1
0
0
0
1.05
0





995
TGGCGTCCTAAGCCAGGGATTGTGGGT
0
0
0
1
0
0
0
0.81





996
TGGCAGGGGAGATACCATGATTT
0
0
1
0
0
0
1.05
0





997
TCTGATCAGGGTGAGCATC
0
1
0
0
0
0.92
0
0





998
TCGTAGGCACCATCCAT
0
0
0
1
0
0
0
0.81





999
GGGAAACGGGGCGCGGCTG
0
1
0
0
0
0.92
0
0





1000
CTACTCCTGCTCGCATCTGCTATA
0
0
1
0
0
0
1.05
0





1001
CGGGTGGGTTTTTACCGG
1
0
0
0
1.02
0
0
0





1002
CGAGGAATTCCCAGTAAG
0
0
1
0
0
0
1.05
0





1003
CGAACGCACTTGCGGCCCC
1
0
0
0
1.02
0
0
0





1004
CCCCGCGCGGGTTCGAATC
1
0
0
0
1.02
0
0
0





1005
AGGGGTATGATTCCCGCTT
0
0
0
1
0
0
0
0.81





1006
CGGGGCCACGCGCGCGTC
3
6
0
0
3.06
5.5
0
0





1007
TGGCGCTGCGGGATGAAC
0
3
1
0
0
2.75
1.05
0





1008
CCCCCCACTGCTAAATTTGACTGGCTT
0
0
2
2
0
0
2.11
1.63





1009
TAAAGGTTCGTTTGTAAAA
0
0
0
3
0
0
0
2.44





1010
CGGGGCCGAGGGAGCGA
1
2
0
0
1.02
1.83
0
0





1011
GGGTTAGGCCTCTTTT
0
1
1
0
0
0.92
1.05
0





1012
CTGCGGAAGGATCATTA
1
0
1
0
1.02
0
1.05
0





1013
CCCTACCCCCCCGG
0
2
0
0
0
1.83
0
0





1014
CCCGCCGGGTCCGCCC
2
0
0
0
2.04
0
0
0





1015
CCCCGCGCCCTCTCTCTCTC
0
2
0
0
0
1.83
0
0





1016
CAGGCCTCCCTGGAATC
2
0
0
0
2.04
0
0
0





1017
AGTCCCACCCGGGGTACCA
0
0
0
2
0
0
0
1.63





1018
TTGACACGCCCCAGTGCCCTGT
1
0
0
0
1.02
0
0
0





1019
TGGGAGCGGGCGGGCGGTC
0
1
0
0
0
0.92
0
0





1020
TGGCGTGGAGCCGGGCGT
0
1
0
0
0
0.92
0
0





1021
TGGAGGTCCGTAGCGGT
1
0
0
0
1.02
0
0
0





1022
TGAAGAAGGTCTCGAACA
0
0
0
1
0
0
0
0.81





1023
TCTCGCCGGGGCTTCCA
0
1
0
0
0
0.92
0
0





1024
TCGTAGCACCATCAATAA
0
0
0
1
0
0
0
0.81





1025
TCCGGGTCCCCCCTCCA
0
1
0
0
0
0.92
0
0





1026
TCCGGGGCTGCACGCGCGCT
0
1
0
0
0
0.92
0
0





1027
TCCGGCCGTGTCGGT
1
0
0
0
1.02
0
0
0





1028
TCCCTGTCCTCCAGGAGT
0
0
0
1
0
0
0
0.81





1029
TCCCCTCCTCGTCGCCA
1
0
0
0
1.02
0
0
0





1030
TCCCAGGTAGTCTAGTGGT
1
0
0
0
1.02
0
0
0





1031
TATTCATTTATCCCCAGCCTAT
0
1
0
0
0
0.92
0
0





1032
TAGTTGTTATAACCCAA
0
0
0
1
0
0
0
0.81





1033
TAGATCACCCCCTCCCC
0
1
0
0
0
0.92
0
0





1034
TACCGGCACCTGGCGCC
1
0
0
0
1.02
0
0
0





1035
GTATAGGGGCGAAAGAC
0
0
1
0
0
0
1.05
0





1036
GTAGCTGGTTCCCTCCGAA
0
0
0
1
0
0
0
0.81





1037
GGTAAGAAGCCCGGCTC
0
0
1
0
0
0
1.05
0





1038
GGGGGGGTTTAAAAAAAAA
0
0
0
1
0
0
0
0.81





1039
GGGGCGCACTACCGGCC
1
0
0
0
1.02
0
0
0





1040
GGGAGAGGCTGTCGCTGCG
0
0
0
1
0
0
0
0.81





1041
GGCGGGTGAAGCGGCG
0
1
0
0
0
0.92
0
0





1042
GCGGTTCCGGCGGCGTC
0
1
0
0
0
0.92
0
0





1043
GCGGGGCGCCTAGGCCTGGTTTGT
1
0
0
0
1.02
0
0
0





1044
GCGGCGGTCGGCGGGCGGCGGG
1
0
0
0
1.02
0
0
0





1045
GAGGGGGGGGGTGGGGGGGGA
0
0
0
1
0
0
0
0.81





1046
CTGTCGGCCACCATCAT
0
0
0
1
0
0
0
0.81





1047
CTGCAACTCGACCCCA
0
1
0
0
0
0.92
0
0





1048
CTCCTCTCCCCGCCCGCCG
0
0
1
0
0
0
1.05
0





1049
CTCAAAGATTAAGCCATGCATGTCTA
0
0
1
0
0
0
1.05
0





1050
CTACGCCGCGACGAG
1
0
0
0
1.02
0
0
0





1051
CGGGTGACGGGGAATCAGGGTT
1
0
0
0
1.02
0
0
0





1052
CGGGCAGCTTCCGGGA
0
0
0
1
0
0
0
0.81





1053
CGGGAGGCCCGGGTCCTG
1
0
0
0
1.02
0
0
0





1054
CGGCCCCGCATCCTCCC
1
0
0
0
1.02
0
0
0





1055
CGCGGGTAAACGGCGGGAGTAACTAT
0
0
1
0
0
0
1.05
0





1056
CGCCCCCCGTTCCCCCCTCC
0
1
0
0
0
0.92
0
0





1057
CGAGCGGAAACACCA
1
0
0
0
1.02
0
0
0





1058
CGAACCCGGCACCGC
1
0
0
0
1.02
0
0
0





1059
CCTCGGGCCGATCGCAC
0
0
1
0
0
0
1.05
0





1060
CCTATATATCTTACCA
0
1
0
0
0
0.92
0
0





1061
CCGTGGCGGCGACGACC
0
1
0
0
0
0.92
0
0





1062
CCGGGTTCCGGCACCA
1
0
0
0
1.02
0
0
0





1063
CCGCGAGGGGGGCCCG
1
0
0
0
1.02
0
0
0





1064
CCGCCTCACGGGACCA
1
0
0
0
1.02
0
0
0





1065
CCGCCCGTCCCCGCCCCTTG
0
1
0
0
0
0.92
0
0





1066
CCCGGGGCCGCGGTTCCG
1
0
0
0
1.02
0
0
0





1067
CCCGAGCCGCCTGGAT
0
1
0
0
0
0.92
0
0





1068
CCCGACGGCCGAACT
0
1
0
0
0
0.92
0
0





1069
CCCCGGGGAGCCCGGCGGG
1
0
0
0
1.02
0
0
0





1070
CCCCCTCGCGGCCCTCCCC
0
1
0
0
0
0.92
0
0





1071
CCCCCCGTGGCGGCGAC
0
1
0
0
0
0.92
0
0





1072
CCACCCAGGGCACGCCA
1
0
0
0
1.02
0
0
0





1073
CACGGGTGACGGGGAA
1
0
0
0
1.02
0
0
0





1074
ATGGGGAGGAAAAAAAAAAAAAA
0
0
0
1
0
0
0
0.81





1075
ATCCCACCGCTGCCCCCA
0
0
0
1
0
0
0
0.81





1076
ATCACGTCGGTCACCA
0
0
0
1
0
0
0
0.81





1077
ACGGGAAACCTCACCCGGCCCGG
0
0
1
0
0
0
1.05
0





1078
ACAGAGGCTTACGACCCCTTATTT
0
0
1
0
0
0
1.05
0





1079
AAAAAGGCATAATTAAACTT
0
0
1
0
0
0
1.05
0









Interestingly, several of these non-annotated sequences (i.e. CU-5004, CU-5021, CU-6030, CU-6069) were cloned multiple times and showed differential expression across libraries, suggesting they may represent short-RNAs with characteristics distinct from those currently recognized in “classic” miRNAs.


In conclusion, the generation of short-RNA libraries from normal and neoplastic B cells led to the identification of 401 bona fide miRNAs as well as other short-RNA species of unknown function.


Abundance and Evolutionary Conservation


Previously reported miRNAs appeared to be more abundant than newly discovered miRNAs (FIG. 13A). Approximately 21% of previously reported miRNAs appeared in the libraries as single occurrences compared to 57% of the newly discovered miRNAs. Approximately 42% of known miRNAs were expressed at all stages of mature B cell development, while newly identified miRNAs showed a more distinct stage-specificity (FIG. 13B), consistent with the notion that presently known miRNAs are mostly representative of ubiquitously expressed miRNAs.


Regardless of their novelty, stage-specific miRNAs were observed with frequencies (defined as the fraction of the total pool of cloned miRNAs represented by a given miRNA) ranging between 0.03 and 0.6% of their respective libraries. The most abundant GC-associated miRNAs showed restricted expression in GC B cells and, if unaffected by transformation, in Ramos cells. However, in naïve and memory B cell libraries only the rarest miRNAs were truly exclusive in their expression and most of the non-GC-specific miRNAs were expressed in both naïve and memory cells albeit at different levels.


In order to investigate the presence of orthologous miRNAs in other mammalian species, we relied on UCSC-provided Blastz pairwise alignments between human and target species and investigated conservation using two complementary methods, detailed in Supplementary Methods. The analysis was performed on the complete set of miRNAs deposited in the miRBase database and on the miRNAs (known and new) represented in the B cell libraries. Alignments of the human mature miRNA to its target species were required to have either perfect conservation of the entire mature miRNA sequence (FIG. 13C) or conservation of a seed composed of seven bases starting from the second position of the human mature sequence, followed by conservation of 3 bases starting from the 12th, 13th or 14th position, as suggested by [21]. The majority of miRBase miRNAs showed conservation across mammalian genomes, from primates to rodents. Conservation frequency mimicked known phylogenetic distances to human, with the highest conservation in chimp and lowest in rat. The conservation frequencies of known and new miRNAs in B cells were similar in chimp (Pan troglodytes) and rhesus monkey (Macaca mulatta), especially when conservation requirements were restricted to the seed region of miRNAs. However, conservation frequencies in dog, mouse and rat were significantly divergent, with known miRNAs more likely to exhibit conservation than new candidate miRNAs (FIG. 13C). In summary, new miRNAs expressed at specific stages of B cell differentiation were less abundant and showed a lower level of conservation across species.
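The two conservation criteria described above can be sketched in a few lines. This is only an illustration under stated assumptions: the human mature sequence and its orthologous sequence are taken to be already aligned without gaps and of equal length, and the function names `is_fully_conserved` and `is_seed_conserved` are hypothetical, not part of the pipeline described here.

```python
def is_fully_conserved(human: str, target: str) -> bool:
    """Perfect conservation of the entire mature sequence."""
    return len(human) == len(target) and human.upper() == target.upper()


def is_seed_conserved(human: str, target: str) -> bool:
    """Seed criterion: positions 2-8 conserved, plus 3 conserved bases
    starting at the 12th, 13th or 14th position (1-based)."""
    h, t = human.upper(), target.upper()
    if len(h) != len(t) or len(h) < 8:
        return False
    # Seven-base seed starting from the second position (0-based slice 1:8).
    if h[1:8] != t[1:8]:
        return False
    # Any one of the three 3-base windows is sufficient.
    for start in (11, 12, 13):  # 0-based offsets of positions 12, 13, 14
        if start + 3 <= len(h) and h[start:start + 3] == t[start:start + 3]:
            return True
    return False


if __name__ == "__main__":
    human = "TAAAGTGCTTAGTGCAGGTA"          # a cloned sequence from the table above
    ortholog = "C" + human[1:]              # hypothetical ortholog, first base differs
    print(is_fully_conserved(human, ortholog), is_seed_conserved(human, ortholog))
```

A sequence can therefore fail the full-length test while still counting as seed-conserved, which is why the two frequencies are reported separately.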


Validation of Newly Discovered miRNA


The newly identified miRNAs were investigated by Northern Blot analysis in order to validate their existence in vivo. Northern Blot analyses were performed using B cell lines and cells isolated from tonsil tissue obtained from multiple donors. Among 23 candidate miRNAs that had been cloned in any of the four libraries with 1-100 occurrences, 13 were detectable by Northern Blot (FIG. 14A). Detection of several miRNAs represented by a low number of occurrences in the libraries was successful only upon enrichment for the short-RNA fraction, suggesting that low-abundance miRNAs could be below the level of detection by Northern blotting. Overall, approximately 55% of the newly cloned and computationally validated miRNAs were detectable by Northern Blot.


Transcriptional and Post-Transcriptional Regulation


Most newly identified miRNAs showed a long abundant transcript (>150 nt) that might correspond to the primary miRNA transcript and a second transcript (~60-80 nt) consistent with the precursor miRNA. As shown in FIG. 14B (top panel), the precursor miRNA and the corresponding mature miRNA may be produced in some cell types but not in others, suggesting transcriptional regulation. Conversely, in some cases the miRNA precursor species may be present in cell types that lack expression of the mature form (FIG. 14B, bottom panel), suggesting the existence of a second level of regulation targeting the Dicer-dependent pre-miRNA processing [22-24].


Distinct miRNA Signatures in Normal B Cells


miRNA representation in the four constructed libraries suggested differential expression of miRNAs during B cell differentiation and GC transit (FIG. 13B). The correlation among miRNA profiles from normal B cells and the Ramos cell line was further investigated by hierarchical clustering using miRNA frequencies (defined as the fraction of the total pool of cloned miRNAs represented by a given miRNA in a library) obtained from the cloning data (FIG. 15A). Naïve and memory B cells appeared similar, sharing a large fraction of the most abundant miRNAs. Conversely, centroblasts and Ramos cells showed more distinct miRNA profiles, with a sizeable fraction of abundant miRNAs being specifically expressed in each library.


We also performed miRNA expression profiling of centroblasts, naïve and memory B cells (six donors/each) using a microarray representative of 723 known human miRNAs (miRBase v.10.1). Each B cell population showed a distinct miRNA expression profile. Consistent with the cloning data (FIGS. 13B and 15A) GC B cells appeared to be quite distinct from naïve and memory B cells which instead shared expression of a large fraction of miRNAs (FIG. 15A). The main differences between naïve and memory B cells resided in the level of miRNA expression. The expression of several miRNAs was tested by qRT-PCR analysis which confirmed that the microarray data were accurate for the relative quantification.


miRNA Signatures can Identify Subtypes of B Cell Malignancies


The miRNA library generated from the Ramos BL cell line demonstrated that tumors can display specific miRNA expression signatures. To investigate whether these signatures can identify subtypes of B cell malignancies, miRNA expression profiling was performed using the same microarray platform on a panel of GC-derived malignancies including BL (8), DLBCL (16) and FL (10). Unsupervised clustering analysis of the tumor miRNA profiles was able to identify three major clusters enriched for samples belonging to each malignant phenotype (FIG. 16). These tumors can also be discriminated by gene expression profiling; however, that approach requires a higher number of features. These results show that tumors deriving from the same stage of B cell differentiation acquire distinct miRNA expression profiles as a consequence of malignant transformation.


Discussion


The combination of cloning procedures and computational tools led us to the identification of a large fraction of known as well as newly discovered miRNA expressed during B cell differentiation. These findings have general implications for the understanding of the total miRNA content of the human genome as well as for future studies on the role of miRNAs in B cell differentiation, function and lymphomagenesis.


The discovery of >250 new miRNAs and of their tissue-specific pattern of expression is in sharp contrast with previous reports that suggested, based on the discovery of only 12 new human miRNAs [7] from an analysis of 26 different organ systems and cell types, that most miRNAs are ubiquitously expressed and that most miRNAs have already been identified. These discordant results and conclusions may be partially due to the significantly higher number of clones per library sequenced in this study (3500 versus 1300 on average in [7]), which allowed the detection of low-abundance miRNA species, and to the criteria applied in the miRNA prediction (see Supplementary Methods).


The relatively lower degree of evolutionary conservation of tissue-specific miRNAs (FIG. 13C) may have prevented the cross-species identification of miRNAs using murine libraries [18, 25]. Consistent with these observations, a recent report on short-RNAs in mouse embryonic stem cells discovered new Dicer-dependent miRNAs characterized by both low abundance and low level of conservation [26]. Thus, a large number of low-abundance, recently evolved, tissue-specific miRNAs remain to be discovered.


Two categories of short-RNAs were identified that could not be annotated as bona fide miRNAs. The first category is represented by those short-RNAs that could not be accurately mapped to the genome. Considering that a fraction of these RNAs were cloned multiple times and showed a stage-specific behavior, it is likely that such short-RNAs do actually exist and that the lack of a match to the human genome is due to polymorphisms, editing and other post-transcriptional modifications, or to incomplete or inaccurate sequencing of the corresponding genomic regions. The second category is represented by short-RNAs for which classic pre-miRNA structures could not be identified in the genome and no similarity to other non-coding RNAs was found in the available databases. These short-RNAs may be miRNAs for which RNA secondary structure prediction algorithms failed to predict the correct hairpin structure, new miRNA species generated by a presently unknown mechanism, or other not yet described types of short-RNAs.


The stage-specific expression of various miRNAs, especially in GC B cells, suggests highly specialized regulatory functions in B cell biology. The role of miRNAs that show cell type-specific functions in lymphocytes has just begun to be elucidated [8-10, 27]. The miRNAs specifically associated with GC or non-GC B cells by either cloning or miRNA expression profiling (FIG. 15) have not been previously reported in B cell differentiation, with the exception of miR-150 [10]. The miR-17-92 cluster, previously reported as a potential oncogene [11], was found over-expressed in the Ramos cell line compared to GC B cells, possibly as a consequence of the transformation process.


Specificity in mature miRNA expression may be regulated at the transcriptional as well as at the post-transcriptional, i.e. pre-miRNA processing, level. Pre-miRNA accumulation in the absence of a mature miRNA can occur in a cell type-restricted manner, suggesting the presence of a regulation mechanism at the pre-miRNA processing step. Both regulatory mechanisms may act during normal differentiation and may also be dysregulated during transformation as a consequence of genetic or epigenetic alterations [22-24]. The expanded B cell miRNome described here can be used to identify specific differences in miRNA expression in normal versus lymphoma cells that can guide searches for these tumor alterations.


miRNA expression profile differences between GC and non-GC B cells resembled those observed by expression profiling of coding genes [28], consistent with the previous observation that miRNA profiling may be equally or more informative in discriminating cell phenotypes [29]. miRNA expression profiling, especially if including new B-cell specific miRNAs, may be useful in the differential diagnosis of lymphoid malignancies.


Materials and Methods


Generation of Short-RNA Libraries


Purification of naïve, memory and GC B cells was performed as previously reported [28] using magnetic cell sorting of mononucleated cells obtained from human tonsils. Total RNA was purified using the Trizol Reagent (Invitrogen) following the manufacturer's indications. The short-RNA libraries were generated using an established protocol described in detail in [30]. Briefly, total RNA was separated on a 15% polyacrylamide gel and the fragment corresponding to 18-28 nucleotides in length was excised. The purified small RNAs were linked to adaptor oligonucleotides and gel purified. Upon adaptor ligation, RNA was reverse transcribed and cDNA was PCR amplified and cloned into the pCR2.1-TOPO vector (Invitrogen). Sequencing was performed on colony PCR amplicons.


Computational Identification of Mature and Precursor MiRNAs


The bioinformatics miRNA analysis pipeline (FIG. 18) includes: (a) identification of short-RNAs from each library, (b) identification of exact and partial matches of the short-RNA sequences to the human genome, (c) testing each short-RNA genomic region for compatibility with hairpin secondary structures, (d) clustering genomic regions to predict mature miRNAs, (e) annotating and filtering short-RNA and miRNA candidates, (f) estimation of predicted miRNA frequencies in the libraries and (g) clustering short-RNAs that do not support miRNA candidates. The details are reported in the Supplementary Methods.


Orthology Analysis


The identification of putative orthologous sequences of known and predicted precursor and mature human miRNAs in chimp (panTro2), monkey (rheMac2), dog (canFam2), mouse (mm8) and rat (rn4) was performed using UCSC-provided Blastz [31] pairwise alignments between human and target species. The details are reported in the Supplementary Methods.


miRNA Expression Profiling


The miRNA expression profiles were generated using the Human miRNA Microarray kit (Agilent Technologies), which allows detection of 723 known human (miRBase v.10.1) and 76 human viral miRNAs, following the manufacturer's indications. Analysis of raw data was performed using the Feature Extraction Software 9.5.3.1 (Agilent Technologies). The dendrograms (FIG. 15) were generated using a hierarchical clustering algorithm based on the average-linkage method [32, 33] and Spearman's correlation as provided by the geWorkbench platform (http://www.geworkbench.org).
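For readers unfamiliar with the clustering step, the following is a minimal sketch of average-linkage hierarchical clustering with a distance of one minus Spearman's correlation. It is not the geWorkbench implementation; the function names and the four example profiles are hypothetical values chosen only for illustration, and the code assumes no profile is constant (a constant profile has undefined rank correlation).

```python
from itertools import combinations


def ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's correlation: Pearson correlation computed on the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


def average_linkage(profiles):
    """Agglomerative clustering; returns the merges as (left, right, distance)."""
    dist = {frozenset(pair): 1 - spearman(profiles[pair[0]], profiles[pair[1]])
            for pair in combinations(profiles, 2)}

    def cluster_distance(c1, c2):
        # Average linkage: mean of all pairwise distances between members.
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda p: cluster_distance(*p))
        merges.append((c1, c2, cluster_distance(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


if __name__ == "__main__":
    # Hypothetical miRNA frequencies (one value per miRNA) for four libraries.
    example = {
        "naive": [5.0, 3.0, 0.5, 0.1, 2.0],
        "memory": [4.0, 3.5, 0.4, 0.2, 1.0],
        "centroblast": [0.2, 0.5, 4.0, 3.0, 1.0],
        "ramos": [0.1, 1.0, 2.0, 5.0, 0.3],
    }
    for left, right, d in average_linkage(example):
        print(left, right, round(d, 3))
```

Because Spearman's correlation depends only on rank order, two libraries that express the same miRNAs at different absolute levels still cluster together.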


Northern Blot


Total RNA and small RNA fractions were purified using the Trizol Reagent (Invitrogen) and the PureLink miRNA Isolation Kit (Invitrogen), respectively, following the manufacturer's indications. Electrophoresis was performed on a 15% denaturing polyacrylamide gel and the RNA was then transferred to a Duralon UV membrane (Stratagene) using a semidry transfer apparatus. Pre-hybridization and hybridization were performed in 5×SSC, 20 mM Na2HPO4 pH 7.2, 7% SDS, 3×Denhardt's Solution. Oligonucleotide probes were [γ-32P]-ATP labeled by polynucleotide kinase (Fermentas). The list of oligonucleotides is reported in Table 5.









TABLE 5

Listing of Oligonucleotide Probe Sequences

ID       Mature miRNA sequence (5′-3′)  SEQ ID NO:  Probe sequence (5′-3′)       SEQ ID NO:
CU-1303  ATCCCACTTCTGACACCA             237         TGGTGTCAGAAGTGGGAT           1080
CU-1403  GCATTGGTGGTTCAGTGGTAGA         391         TCTACCACTGAACCACCAATGC       1081
CU-1253  GTGAAGCGTTCCATATTTTT           281         AAAAATATGGAACGCTTCAC         1082
CU-1513  GCGGGTGATGCGAACTGGAGTCTGAGC    381         GCTCAGACTCCAGTTCGCATCACCCGC  1083
CU-1173  ATCCCACTCCTGACACCA             145         TGGTGTCAGGAGTGGGAT           1084
CU-1276  TCGATTCCCGGCCAATGCACCA         236         TGGTGCATTGGCCGGGAATCGA       1085
CU-1368  GACGAGGTGGCCGAGTGG             382         AACCACTCGGCCACCTCGTC         1086
CU-1254  TCCCCGGCACCTCCACCA             233         TGGTGGAGGTGCCGGGGA           1087
CU-1137  GCTAAGGAAGTCCTGTGCTCAGTTTT     132         AAAACTGAGCACAGGACTTCCTTAGC   1088
CU-1153  CCCCCCACTGCTAAATTTGACTGGCTT    142         AAGCCAGTCAAATTTAGCAGTGGGGGG  1089
CU-1241  AGTCCCATCTGGGTCGCCA            243         TGGCGACCCAGATGGGACT          1090
CU-1351  CCTTCCTTGGATGTCTGAGTGAG        316         CTCACTCAGACATCCAAGGAAGG      1091
CU-1142  TCGATTCCCGGCCCATGCACCA         149         TGGTGCATGGGCCGGGAATCGA       1092

After overnight hybridization, membranes were washed at the same temperature in 3×SSC, 25 mM NaH2PO4 pH 7.5, 5% SDS, 10×Denhardt's Solution for 15-20 min and in 1×SSC, 1% SDS for 5 min. Images were obtained by exposure to a phosphoimager cassette and acquisition by a Storm 840 Phosphoimager (Molecular Dynamics), and by film exposure for approximately 2 weeks.
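The probes listed in Table 5 appear to be reverse complements of the corresponding mature sequences, as expected for antisense oligonucleotide probes. A minimal sketch of that relationship follows; the function name `reverse_complement` is an illustrative choice, and the example uses the CU-1303 mature sequence with its wrapped final base joined.

```python
# Translation table for DNA bases; any other character is left unchanged.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence given 5'-3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    mature = "ATCCCACTTCTGACACCA"   # CU-1303 mature sequence from Table 5
    print(reverse_complement(mature))
```

Comparing the computed value with the tabulated probe is a quick consistency check on a reconstructed row.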


REFERENCES



  • 1. Bartel, D. P. MicroRNAs: genomics, biogenesis, mechanism, and function. Cell 116, 281-97 (2004).

  • 2. Kim, V. N. MicroRNA biogenesis: coordinated cropping and dicing. Nat Rev Mol Cell Biol 6, 376-85 (2005).

  • 3. Griffiths-Jones, S. miRBase: the microRNA sequence database. Methods Mol Biol 342, 129-38 (2006).

  • 4. Griffiths-Jones, S., Grocock, R. J., van Dongen, S., Bateman, A. & Enright, A. J. miRBase: microRNA sequences, targets and gene nomenclature. Nucleic Acids Res 34, D140-4 (2006).

  • 5. Miranda, K. C. et al. A pattern-based method for the identification of MicroRNA binding sites and their corresponding heteroduplexes. Cell 126, 1203-17 (2006).

  • 6. Bentwich, I. et al. Identification of hundreds of conserved and nonconserved human microRNAs. Nat Genet 37, 766-70 (2005).

  • 7. Landgraf, P. et al. A mammalian microRNA expression atlas based on small RNA library sequencing. Cell 129, 1401-14 (2007).

  • 8. Thai, T. H. et al. Regulation of the germinal center response by microRNA-155. Science 316, 604-8 (2007).

  • 9. Rodriguez, A. et al. Requirement of bic/microRNA-155 for normal immune function. Science 316, 608-11 (2007).

  • 10. Xiao, C. et al. MiR-150 Controls B Cell Differentiation by Targeting the Transcription Factor c-Myb. Cell 131, 146-59 (2007).

  • 11. He, L. et al. A microRNA polycistron as a potential human oncogene. Nature 435, 828-33 (2005).

  • 12. Calin, G. A. et al. Frequent deletions and down-regulation of micro-RNA genes miR15 and miR16 at 13q14 in chronic lymphocytic leukemia. Proc Natl Acad Sci USA 99, 15524-9 (2002).

  • 13. Calin, G. A. et al. A MicroRNA signature associated with prognosis and progression in chronic lymphocytic leukemia. N Engl J Med 353, 1793-801 (2005).

  • 14. Kuppers, R. & Dalla-Favera, R. Mechanisms of chromosomal translocations in B cell lymphomas. Oncogene 20, 5580-94 (2001).

  • 15. Klein, U. & Dalla-Favera, R. Germinal centres: role in B-cell physiology and malignancy. Nat Rev Immunol 8, 22-33 (2008).

  • 16. Harrell, F. E. Regression modeling strategies: with applications to linear models, logistic regression, and survival analysis (Springer, N.Y., 2001).

  • 17. Davison, A. C. & Hinkley, D. V. Bootstrap Methods and their Application (Cambridge University Press, New York, 1997).

  • 18. Neilson, J. R., Zheng, G. X., Burge, C. B. & Sharp, P. A. Dynamic regulation of miRNA expression in ordered stages of cellular development. Genes Dev 21, 578-89 (2007).

  • 19. Kawahara, Y. et al. Redirection of silencing targets by adenosine-to-inosine editing of miRNAs. Science 315, 1137-40 (2007).

  • 20. Luciano, D. J., Mirsky, H., Vendetti, N. J. & Maas, S. RNA editing of a miRNA precursor. RNA 10, 1174-7 (2004).

  • 21. Grimson, A. et al. MicroRNA targeting specificity in mammals: determinants beyond seed pairing. Mol Cell 27, 91-105 (2007).

  • 22. Thomson, J. M. et al. Extensive post-transcriptional regulation of microRNAs and its implications for cancer. Genes Dev 20, 2202-7 (2006).

  • 23. Michael, M. Z., O'Connor, S. M., van Holst Pellekaan, N. G., Young, G. P. & James, R. J. Reduced accumulation of specific microRNAs in colorectal neoplasia. Mol Cancer Res 1, 882-91 (2003).

  • 24. Lee, E. J. et al. Systematic evaluation of microRNA processing patterns in tissues, cell lines, and tumors. RNA 14, 35-42 (2007).

  • 25. Chen, C. Z., Li, L., Lodish, H. F. & Bartel, D. P. MicroRNAs modulate hematopoietic lineage differentiation. Science 303, 83-6 (2004).

  • 26. Calabrese, J. M., Seila, A. C., Yeo, G. W. & Sharp, P. A. RNA sequence analysis defines Dicer's role in mouse embryonic stem cells. Proc Natl Acad Sci USA 104, 18097-102 (2007).

  • 27. Li, Q. J. et al. miR-181a is an intrinsic modulator of T cell sensitivity and selection. Cell 129, 147-61 (2007).

  • 28. Klein, U. et al. Transcriptional analysis of the B cell germinal center reaction. Proc Natl Acad Sci USA 100, 2639-44 (2003).

  • 29. Lu, J. et al. MicroRNA expression profiles classify human cancers. Nature 435, 834-8 (2005).

  • 30. Lau, N. C., Lim, L. P., Weinstein, E. G. & Bartel, D. P. An abundant class of tiny RNAs with probable regulatory roles in Caenorhabditis elegans. Science 294, 858-62 (2001).

  • 31. Schwartz, S. et al. Human-mouse alignments with BLASTZ. Genome Res 13, 103-7 (2003).

  • 32. Eisen, M. B., Spellman, P. T., Brown, P. O. & Botstein, D. Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci USA 95, 14863-8 (1998).

  • 33. Hartigan, J. A. Clustering Algorithms (Wiley, New York, 1975).



Supplementary Methods


Bioinformatics Analysis of Short-RNA Libraries


The bioinformatics microRNA (miRNA) analysis pipeline includes (a) identification of short-RNAs from each library, (b) identification of exact and partial matches of the short-RNA sequences to the human genome, (c) testing each short-RNA genomic region for compatibility with hairpin secondary structures, (d) clustering genomic regions to predict mature miRNAs, (e) annotating and filtering short-RNA and miRNA candidates, (f) estimation of predicted miRNA frequencies in the libraries, and (g) clustering short-RNAs that do not support miRNA candidates.


a. Identification of Short-RNAs from Cloned cDNA Sequences:


Short-RNA sequences of length 15-30 bp were recovered from within cloned cDNA sequences using 8 bp adaptor oligonucleotides as oriented markers. An exact ≥6 bp match to the suffix of the 5′ adaptor oligonucleotide and to the prefix of the 3′ adaptor oligonucleotide was required. All short-RNAs of length 17 to 27 nt (length range of miRNAs deposited in the miRBase database v11.0) that are bounded by adaptors were reported, and short-RNAs containing adaptor fragments were tagged as lower confidence observations. miRNA candidates supported by low confidence short-RNAs were later evaluated individually and discarded by expert decision. The remaining ones are listed here: CU-1153 is supported by 29 short-RNAs including 1 low confidence; CU-1293 and CU-1079 are supported by low confidence short-RNAs including sequence fragments that could not originate from linkers.
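The adaptor-bounded extraction can be illustrated with a short sketch. The adaptor sequences below are hypothetical placeholders (the actual 8 bp adaptors are not given in this section), the function name `extract_short_rnas` is an assumption, and the tagging of lower confidence inserts that contain adaptor fragments is not modeled.

```python
def extract_short_rnas(clone, adaptor5, adaptor3,
                       min_match=6, min_len=17, max_len=27):
    """Return inserts bounded by the 5' adaptor suffix and the 3' adaptor prefix.

    Any exact match of at least `min_match` bases to the adaptor suffix or
    prefix necessarily contains the `min_match`-base anchor searched for here.
    """
    left = adaptor5[-min_match:]    # suffix of the 5' adaptor
    right = adaptor3[:min_match]    # prefix of the 3' adaptor
    found = []
    start = clone.find(left)
    while start != -1:
        insert_start = start + len(left)
        end = clone.find(right, insert_start)
        while end != -1:
            length = end - insert_start
            if length > max_len:
                break               # later matches only give longer inserts
            if length >= min_len:
                found.append(clone[insert_start:end])
            end = clone.find(right, end + 1)
        start = clone.find(left, start + 1)
    return found


if __name__ == "__main__":
    adaptor5, adaptor3 = "GTCAGCTA", "CATGGACT"   # hypothetical 8 bp adaptors
    insert = "TCGAATCCCACTCCTGACACCA"             # sequence from the table above
    clone = "TT" + adaptor5 + insert + adaptor3 + "GG"
    print(extract_short_rnas(clone, adaptor5, adaptor3))
```

Keeping the length window as parameters makes it easy to recover the broader 15-30 bp range first and then report only the 17-27 nt subset.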


b. Mapping to the Human Genome:


Each short-RNA was aligned to the human genome assembly from March 2006 (hg18) using WU-Blast. WU-Blast was run with locally optimal alignment, default word length 6, maximum allowed separation 2, no gaps, and no high-scoring-pair consistency. All WU-Blast reported matches were retrieved and mismatches marked. Only the genomic matches with the smallest number of mismatches were recorded, and we refer to them as short-RNA genomic locations.
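The rule of keeping only the matches with the fewest mismatches is simple to express in code. In this sketch the `Hit` record and its field names are hypothetical; they stand in for whatever fields are parsed from the alignment output.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    """One reported genomic match for a short-RNA (illustrative fields)."""
    chrom: str
    start: int
    strand: str
    mismatches: int


def best_genomic_locations(hits: List[Hit]) -> List[Hit]:
    """Keep only the matches with the smallest number of mismatches."""
    if not hits:
        return []
    fewest = min(hit.mismatches for hit in hits)
    return [hit for hit in hits if hit.mismatches == fewest]


if __name__ == "__main__":
    hits = [
        Hit("chr1", 1000, "+", 1),
        Hit("chr7", 5200, "-", 0),
        Hit("chr9", 880, "+", 0),
    ]
    print(best_genomic_locations(hits))
```

Note that a short-RNA can keep several genomic locations when they tie for the fewest mismatches.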


c. Testing Short-RNA Genomic Locations for Hairpin Secondary Structures:


Short-RNA genomic locations were tested for compatibility with hairpin secondary structures. The criteria were established upon investigation of the characteristics of mammalian miRNA precursors deposited in the miRBase database (v.11.0) [s1, s2]. Only short-RNAs that have genomic locations compatible with a hairpin structure were used to define putative miRNAs and their precursors. A short-RNA genomic location was considered if the following criteria were satisfied: i) one or more hairpin structures were predicted in the genomic region starting at most 90 bases upstream and ending at most 90 bases downstream of the short-RNA genomic location; ii) the lowest free-energy predicted secondary structure for the containing region was a hairpin with maximum fold free energy of −8 joules, which is the maximum fold free energy observed for miRBase miRNAs; iii) the short-RNA genomic location could not overlap the predicted hairpin loop, which was required to be 3-20 bp long (the range of hairpin-loop lengths observed in the miRBase database); iv) the ratio between the number of complementary base pairs and the total number of base pairs in the stem was larger than 0.645, which is the minimum ratio observed in the miRBase database. We used RNAfold 1.6, a part of the ViennaRNA package (http://www.tbi.univie.ac.at/~ivo/RNA/), with temperature set to 37° C. to predict the lowest energy secondary RNA structure for each candidate genomic sequence.
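Criteria ii) to iv) can be checked mechanically once a predicted structure is available. The sketch below assumes the structure is supplied in dot-bracket notation together with its free energy (in whatever unit the threshold of −8 is expressed in), and it interprets the stem ratio as paired bases divided by all bases in the stem; both the function names and that interpretation are assumptions, not the procedure actually used.

```python
def hairpin_loop(structure):
    """Return (loop_start, loop_end) for a single stem-loop, else None.

    In a single hairpin every '(' precedes every ')'; the terminal loop is
    the stretch between the last '(' and the first ')'.
    """
    if structure.count("(") != structure.count(")"):
        return None
    last_open = structure.rfind("(")
    first_close = structure.find(")")
    if last_open == -1 or last_open > first_close:
        return None                      # unpaired or multi-branched
    return last_open + 1, first_close


def passes_hairpin_criteria(structure, energy, mature_start, mature_end,
                            max_energy=-8.0, loop_range=(3, 20),
                            min_pair_ratio=0.645):
    """Apply criteria ii-iv; mature coordinates are 0-based, end-exclusive."""
    loop = hairpin_loop(structure)
    if loop is None or energy > max_energy:
        return False
    loop_start, loop_end = loop
    if not loop_range[0] <= loop_end - loop_start <= loop_range[1]:
        return False
    if mature_start < loop_end and mature_end > loop_start:
        return False                     # short-RNA overlaps the loop
    stem_start = structure.find("(")
    stem_end = structure.rfind(")") + 1
    stem_bases = (stem_end - stem_start) - (loop_end - loop_start)
    paired_bases = structure.count("(") + structure.count(")")
    return paired_bases / stem_bases > min_pair_ratio


if __name__ == "__main__":
    left = "(" * 6 + "." + "(" * 7
    right = ")" * 7 + "." + ")" * 6
    structure = ".." + left + "...." + right + ".."
    print(passes_hairpin_criteria(structure, -12.0, 2, 14))
```

Criterion i) is not covered here, since it concerns which genomic window is folded rather than the properties of a given structure.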


d. Prediction of Mature miRNA:


Short-RNA genomic locations consistent with hairpin secondary structures (smirREGs) were clustered based on genomic region overlap, and smirREG clusters were pruned, split, merged, accepted, or eliminated iteratively. Clusters were first constructed from smirREGs corresponding to: (1) regions perfectly aligned to short-RNAs; (2) regions aligned to short-RNAs with 1-mismatch, represented by an ‘A’ in the last position; and (3) regions aligned to short-RNAs with 1-mismatch, where the associated short-RNAs had no perfect matches, were not used in (1) or (2) and were 1-mismatch away from short RNAs associated with smirREGs defined in (1) or (2). These smirREGs are major contributors to each region cluster. Regions associated with perfectly matching short-RNAs and 1-mismatch short-RNAs that were not used to define smirREG clusters, but overlapped 5 or fewer such clusters were tagged as minor contributors and added to the overlapping smirREG clusters. Each cluster was pruned or divided by identifying regions corresponding to short-RNAs where more than 25% of the region is supported by no more than 50% of the observations. For each cluster, all such regions were identified and first minor then major contributors with the largest ratio of unsupported portions to total length were iteratively removed until at least 75% of every contributing region was supported by more than 50% of the observations.


One exception is the CU-1088 cluster, where a short-RNA was included despite the fact that only 70% of its sequence contributed to the majority observation. This exception was made because this short-RNA was the only one matching a known mature miRNA (miR-320a) and it would otherwise have been excluded by the 75% rule.


Pruned smirREGs were merged and used to construct new and possibly overlapping clusters, but minor contributors were discarded. Finally, smirREG clusters were used to define putative mature miRNAs. The mature sequence was defined as the majority nucleotide in each position supported by more than 50% of the observations, with the genomic sequence allowed to break ties. SmirREG clusters were matched based on mature sequence containment. Clusters corresponding to a short-RNA set that is fully contained in a set corresponding to a matched cluster were eliminated, and matching clusters with partial containment were merged. The process was repeated until no elimination or merging was needed. Finally, a putative precursor was identified for each smirREG cluster following the procedure described in (c) with the added restriction that no more than 5 positions in the mature sequence were allowed to dangle off of the precursor region encapsulated by complementary base pairs. When the mature region dangled off of the precursor region (but no more than 5 bases), the precursor was extended with non-complementary bases to include the mature region.
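The majority-nucleotide rule for defining the mature sequence can be sketched as below. The reading adopted here is an assumption: a position is kept only if more than 50% of all observations in the cluster cover it, and the most frequent base is reported at each kept position, with the genomic base used when the top counts tie. The input layout (genomic start, sequence, observation count) is hypothetical, and the alphabetical fallback when the genomic base is not among the tied bases is arbitrary.

```python
from collections import Counter
from typing import Dict, List, Tuple


def consensus_mature(reads: List[Tuple[int, str, int]],
                     genome: Dict[int, str]) -> Tuple[int, str]:
    """Return (start, sequence) of the majority consensus for one cluster."""
    total = sum(count for _, _, count in reads)
    columns: Dict[int, Counter] = {}
    for start, seq, count in reads:
        for offset, base in enumerate(seq):
            columns.setdefault(start + offset, Counter())[base] += count
    # Keep positions covered by more than half of the observations.
    kept = sorted(p for p, c in columns.items() if 2 * sum(c.values()) > total)
    if not kept:
        return -1, ""
    bases = []
    for pos in kept:
        counts = columns[pos]
        top = max(counts.values())
        tied = sorted(b for b, k in counts.items() if k == top)
        if len(tied) > 1 and genome.get(pos) in tied:
            bases.append(genome[pos])    # genomic sequence breaks the tie
        else:
            bases.append(tied[0])
    return kept[0], "".join(bases)
```

The sketch assumes the kept positions form one contiguous run, which holds when the contributing short-RNAs overlap heavily, as they do after pruning.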


The mature miRNA prediction was followed by the elimination of incompatible predictions. Putative miRNAs whose predicted locations overlapped loops of known miRNAs or precursors of other higher-confidence predictions that could not form mature-star pairs were eliminated. Predicted miRNAs that were entirely composed of low confidence single-observation short-RNAs that contained linker fragments were also eliminated, with the exception of CU-1293 as described in (a). Mature miRNA predictions of length shorter than 17 nt or longer than 28 nt were discarded. Candidate miRNAs that were supported by a single observation were tagged as lower confidence predictions; some of these are likely to be miRNA, but others may be degradation products of previously unannotated RNA.


e. Annotation and Filtering of Candidate miRNA and sRNA:


Putative miRNAs and short RNAs were matched to several RNA databases (see below) via regular expression scans (for sequence databases) or genomic region containment (for databases that specify genomic regions). Databases identifying validated human mRNA, tRNA, snoRNA and yRNA were used to eliminate putative miRNA candidates with one observation and to annotate putative miRNAs with multiple observations. All putative miRNAs matching rRNAs were eliminated regardless of the number of observations because of the extreme abundance of rRNAs. Other RNA databases were used for annotation purposes only.


Precursor and mature miRNA as well as sRNA sequences were aligned to several sequence databases (see below) using the BLAST and MEGABLAST programs from NCBI. Candidate miRNA precursors showing a full match to non-coding RNAs (tRNA, rRNA, snoRNA, other nc-RNA) or to mRNA were disregarded. Predicted precursor and mature miRNAs were further classified as either “known” or “new” based on whether exact matches can be found in the miRBase database. Short-RNA sequences included in a miRNA precursor, but not overlapping the mature miRNA, were considered as degradation products of miRNA precursor processing and marked as “miRNA other” (Table 6). All cloned short-RNAs were annotated using the databases reported below and results are shown in Table 2.









TABLE 6

Databases used for annotation of short-RNA and miRNA.

| Name/Description | Source | Version/Date |
|---|---|---|
| miRBASE | Sanger Institute, http://microrna.sanger.ac.uk/sequences | version 11.0 |
| Human fraction of REPBASE | GIRI, http://www.girinst.org/Repbase_Update.html | version 12.02 |
| Human tRNA | EMBL, http://www.trna.uni-bayreuth.de, Bayreuth Univ., Germany | September 2007 edition |
| Human snoRNA | The University of Queensland, Australia, IMB, http://imb.uq.edu.au | September 2006 edition |
| Human rRNA | NCBI, ftp://ftp.ncbi.nlm/nih.gov, compiled manually | Downloaded January 2007 |
| Human yRNA | NCBI, ftp://ftp.ncbi.nlm/nih.gov, compiled manually | Downloaded January 2007 |
| Non coding RNA | Compiled manually combining resources from NCBI, ftp://ftp.ncbi.nlm/nih.gov and IMB http://imb.uq.edu.au | Compiled in October 2006 using current ENTREZ and September 2006 edition of Univ. of Queensland database |
| VECTOR databases | NCBI, ftp://ftp.ncbi.nlm/nih.gov | Downloaded January 2007 |
| Human protein coding genes and mRNA | NCBI annotation for human genome, ftp://ftp.ncbi.nlm/nih.gov | Downloaded January 2007 |
| mRNA dataset | NCBI, ftp://ftp.ncbi.nlm/nih.gov compiled manually using ENTREZ NR database | Downloaded January 2007 |
| exEID (exon subset of BID database) | University of Toledo, Ohio, http://hsc.utoledo.edu/bioinfo/eid/ | September 2005 (hs35p1) |
| Human mitochondrial genome | NCBI, ftp://ftp.ncbi.nlm/nih.gov | Downloaded January 2007 |
| Human EST database | NCBI, ftp://ftp.ncbi.nlm/nih.gov | |
| Human viral genomes | NCBI, http://www.ncbi.nlm.nih.gov, viral genomes section | Compiled in March 2007 |
| E. coli genomes | NCBI, ftp://ftp.ncbi.nlm/nih.gov | Downloaded August 2007 |
| RefGene Intron | UCSC, http://genome.ucsc.edu, annotation track for hg18 | annotation track for hg18 |
| RefGene Exon | UCSC, http://genome.ucsc.edu, annotation track for hg18 | annotation track for hg18 |
| rnaGenes | UCSC, http://genome.ucsc.edu, annotation track for hg18 | annotation track for hg18 |
| wgRNA | UCSC, http://genome.ucsc.edu, annotation track for hg18 | annotation track for hg18 |
| snoRNA database | EMBL, http://www-snorna.biotoul.fr/ | Version 3 |
| Genscan | MIT, http://genes.mit.edu/GENSCAN.html | Version 1.0 |
| Computational tRNA prediction | UCSC, http://genome.ucsc.edu, annotation track for hg18 | annotation track for hg18 |
| piRNA | The University of Queensland, Australia, IMB, http://imb.uq.edu.au | September 2006 edition |
| Morozov database | Manually curated | Version 1.0 |









f. Estimation of Mature miRNA Frequencies:


Short-RNAs may contribute to more than one miRNA. In order to compare observation frequencies of predicted miRNAs we normalized the contribution of each supporting short-RNA to its associated predicted miRNA in each library. Short-RNAs supporting a single miRNA prediction were not affected, but the level of support of short-RNAs associated with several miRNAs was prorated. The normalization procedure was performed iteratively. First, for each library, we assigned an observation frequency to each miRNA, taken to be the sum of the library-specific observations across all of its supporting short-RNAs. Second, for each short-RNA observed in this library, we computed sum, the sum of the observation frequencies of the predicted miRNAs it supports. Then, short-RNA support for each miRNA was adjusted to be the number of its observations multiplied by the ratio between the observation frequency of this miRNA and sum. Finally, the frequency of each predicted miRNA was recalculated to be the sum of the adjusted frequencies of the short-RNAs supporting it. To compare the abundance of predicted miRNAs across libraries we normalized the frequencies of observations in each library to sum to 100%, comparing frequencies of observations in each library rather than raw observations. This normalization step was necessary due to the variability between the total number of observations across libraries.
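The prorating scheme for one library can be sketched as follows. This is a minimal illustration under stated assumptions: the data layout (`obs` mapping each short-RNA to its observation count, `supports` mapping each short-RNA to the miRNAs it supports), the function names, and the fixed number of iterations are all hypothetical, since the text does not state a stopping rule.

```python
from typing import Dict, List


def prorate(obs: Dict[str, int], supports: Dict[str, List[str]],
            n_iter: int = 10) -> Dict[str, float]:
    """Iteratively prorate short-RNA observations among the miRNAs they support."""
    # Initial frequency: every supporting short-RNA counts in full.
    freq: Dict[str, float] = {}
    for rna, mirnas in supports.items():
        for m in mirnas:
            freq[m] = freq.get(m, 0.0) + obs.get(rna, 0)
    for _ in range(n_iter):
        new = dict.fromkeys(freq, 0.0)
        for rna, mirnas in supports.items():
            total = sum(freq[m] for m in mirnas)   # "sum" in the text
            if total == 0:
                continue
            for m in mirnas:
                # Share of this short-RNA assigned to miRNA m.
                new[m] += obs.get(rna, 0) * freq[m] / total
        freq = new
    return freq


def to_percent(freq: Dict[str, float]) -> Dict[str, float]:
    """Rescale one library's frequencies so that they sum to 100%."""
    total = sum(freq.values())
    return {m: 100.0 * f / total for m, f in freq.items()} if total else dict(freq)
```

After the first pass, each short-RNA's observations are split completely among its miRNAs, so the library total equals the number of observations assigned to at least one miRNA. `to_percent` then makes libraries of different depth comparable.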


g. Clustering Short-RNAs.


Short-RNAs that did not support predicted miRNAs were categorized according to the quality of their best alignments to hg18. These short-RNAs were clustered following the procedure described in (c) but with no secondary structure requirements (Table 2 and Tables 9-10). Table 3 and Tables 9-10 were constructed from perfect matches and single mismatches, respectively, as described in (c). Tables 4 and 5 smirREGs were constructed from single-mismatches, double mismatches and three or more mismatches, respectively.


Estimation of Libraries Complexity


A bootstrap technique was used to estimate the total number of miRNAs expressed in each library and the number of short-RNAs that must be sequenced to achieve complete coverage. Bootstrapping is a statistical technique for estimating properties of an “estimator” by measuring those properties in multiple subsets of the samples (refs. s3, s4). Specifically, we estimated the distribution of mature miRNAs obtained by random sub-sampling different size short-RNA libraries from each complete library. For each size $N = 10, 20, \ldots, N_t$, where $N_t$ is the total number of short-RNAs in the library, we randomly sampled 1000 libraries of size N and computed the number r(N) of inferred miRNAs, resulting in a distribution p(r(N)) for which we could compute standard statistical parameters such as average, variance, mode and median. Based on this sampling, we can extrapolate p(r(N)) for increasing values of N to determine at which point it is no longer efficient to use larger values of N to increase miRNA coverage. To achieve this, we fitted the data to the parametric function $f(x) = K(1 - e^{-mx})$. Since we include both experimentally confirmed and putative mature miRNAs and since bootstrapping can produce optimistic results, we expect that the estimated values constitute an upper boundary on the real library complexity.
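The sub-sampling and curve fit can be sketched as below. Several parts are assumptions made for illustration: sampling is done without replacement, `count_mirnas` is a hypothetical stand-in for the full prediction pipeline, and the fit uses a plain grid search over m with a closed-form least-squares K rather than whatever fitting routine was actually used.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def subsample_curve(library: Sequence[str],
                    count_mirnas: Callable[[Sequence[str]], int],
                    step: int = 10, n_rep: int = 1000,
                    seed: int = 0) -> List[Tuple[int, float]]:
    """Mean number of inferred miRNAs r(N) for N = step, 2*step, ..., len(library)."""
    rng = random.Random(seed)
    curve = []
    for n in range(step, len(library) + 1, step):
        counts = [count_mirnas(rng.sample(library, n)) for _ in range(n_rep)]
        curve.append((n, sum(counts) / n_rep))
    return curve


def fit_saturation(points: Sequence[Tuple[float, float]],
                   m_values: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of f(x) = K * (1 - exp(-m * x)); returns (K, m)."""
    best = None
    for m in m_values:
        g = [1.0 - math.exp(-m * x) for x, _ in points]
        denom = sum(v * v for v in g)
        if denom == 0:
            continue
        # For a fixed m the optimal K has a closed form.
        k = sum(v * y for v, (_, y) in zip(g, points)) / denom
        sse = sum((k * v - y) ** 2 for v, (_, y) in zip(g, points))
        if best is None or sse < best[0]:
            best = (sse, k, m)
    if best is None:
        raise ValueError("no usable m value")
    return best[1], best[2]
```

In this form K is the plateau of the curve, that is, the estimated total number of miRNAs in the library, and the ratio of observed miRNAs to K gives the estimated coverage.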


Based on this analysis, we estimated that the total numbers of mature miRNAs are: 188 (naïve), 211 (memory), 219 (centroblasts) and 225 (Ramos). Thus, the libraries sequenced in this study cover respectively 84.0% (naïve), 87.2% (memory), 86.8% (centroblasts), and 85.8% (Ramos) of the expressed miRNAs in these cellular phenotypes. FIG. 17 gives the 95% confidence intervals for p(r(N)) at each sampling point, in addition to the curve of the associated extrapolated function for each library. Clearly, the bootstrap analysis estimate of the total number of miRNA is correct only if the abundance of the miRNAs expressed in the sampled populations closely matches that of known miRNA in miRBase. This is not unreasonable if, as was done here, only miRNAs that are specific to a B cell differentiation stage or transformation are considered. Thus, this does not estimate the total number of miRNA expressed across all human cell types, stages of differentiation and neoplastic transformations, which could be several fold larger than what was estimated from the B cell clone libraries.


Orthology and Conservation Analysis


We investigated conservation of known and predicted precursor and mature human miRNA in chimp (panTro2), monkey (rheMac2), dog (canFam2), mouse (mm8) and rat (rn4). We obtained 678 miRNA precursor sequences from miRBase (v.11.0); 677 (672 unique) mature miRNAs; and 170 (167 unique) star sequences. In total, we obtained 947 locations for mature and star miRBase sequences. We predicted 926 precursors of which 146 match miRBase precursors and 780 are newly predicted. Categorizing these by their corresponding mature sequences, 762 precursors correspond to mature miRNAs that are not included in the miRBase and 164 precursors are associated with 129 predicted miRNAs that match miRBase miRNAs. Of the 762 newly predicted precursors, one overlaps with a miRBase precursor and its corresponding predicted miRNA is a candidate star sequence; 19 precursors are associated with 8 mature sequences listed in the miRBase database.


Here, we predicted 762 miRNA genomic locations associated with unique mature miRNA sequences not included in miRBase. Of these 762 predicted miRNA genomic locations, one overlaps with a miRBase precursor and is a candidate star sequence. We identified 164 precursors associated with 129 predicted miRNAs matching the sequence of known miRNAs; 19 of the 164 precursors, associated with 9 known miRNAs, do not match known precursors. miRNA conservation has been repeatedly used to help identify putative miRNA mappings to genomes. To identify putative ortholog miRNAs we relied on UCSC-provided Blastz pairwise alignments between human and target species (ref. s5). We used two related but complementary methods: (1) map the mature human miRNA to its ortholog location as specified by pairwise alignment; and (2) map the precursor of the human miRNA to its ortholog location as specified by pairwise alignment, expanding the human region to include at least 80 bases from both sides of the mature region, and identifying regions in the target that match the sequence of the mature human miRNA.


Method 1 is the simplest but fails to account for alignment inaccuracies and local mutations that may shift the position of the mature sequence in the target species. Method 2 accounts for locally imperfect Blastz mapping, but relies on conservation of larger regions that may not be subject to the same selective pressure as the mature miRNA. Alignment-based mappings of the human mature miRNA to its target were required to have either perfect conservation of the entire mature miRNA sequence or conservation of seeds composed of seven bases starting from the second position of the human mature sequence followed by conservation of 3 bases starting from the 12th, 13th, or 14th position, as suggested by Grimson et al. (ref. s6). We scanned the entire mapped ortholog region for a match to the human mature sequence or to its seed.
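The two conservation tests (full-length match, or seed plus a 3′ triplet) can be illustrated with a simple scan of the mapped ortholog region. This sketch assumes an ungapped comparison, so the 3-base window in the target is taken at the same offset from the seed as in the human sequence; the function name and return labels are hypothetical.

```python
def conservation(human: str, region: str) -> str:
    """Classify a mapped ortholog region as 'full', 'seed', or 'none'."""
    if human in region:
        return "full"                  # entire mature sequence conserved
    seed = human[1:8]                  # seven bases from the second position
    j = region.find(seed)
    while j != -1:
        origin = j - 1                 # region index aligned to human position 1
        for p in (12, 13, 14):         # 1-based start of the 3-base window
            window = human[p - 1:p + 2]
            target = region[origin + p - 1:origin + p + 2]
            if len(window) == 3 and window == target:
                return "seed"
        j = region.find(seed, j + 1)
    return "none"
```

Every occurrence of the seed in the region is tried, so a spurious upstream seed match does not hide a later occurrence that also satisfies the 3′ condition.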


Comparison with Previously Reported MiRNA Prediction from Short-RNA Libraries


Landgraf et al.s7 used more restrictive miRNA characterizations for mature miRNA prediction, annotation and conservation. They required that at least 60% of the observations associated with a predicted miRNA align at the 5′ end. We made no such restriction, and some of the miRNAs with the highest number of observations in our libraries, such as CU-1026 and CU-1018, are supported by a high proportion of 5′-misaligned cloned sequences. Landgraf et al. eliminated predictions that can be derived from repeat sequences by excluding precursors that contain more than 30% repetitive elements and that match hg18 more than 10 times with at least 75% identity. We annotated mature sequences that match known repeats but did not eliminate them. Finally, Landgraf et al. formulated extensive criteria for sequence orthology, requiring sequence conservation greater than 75% in multiple alignments across vertebrates and either restricting the 20 5′-most misaligned nucleotides to be transitions or requiring 100% conservation in positions 2 through 8 and 90% conservation overall. We, as described above, simply considered full conservation and seed-based conservation, mapping human to primates, dog and rodents.


REFERENCES



  • s1. Griffiths-Jones, S. miRBase: the microRNA sequence database. Methods Mol Biol 342, 129-38 (2006).

  • s2. Griffiths-Jones, S., Grocock, R. J., van Dongen, S., Bateman, A. & Enright, A. J. miRBase: microRNA sequences, targets and gene nomenclature. Nucleic Acids Res 34, D140-4 (2006).

  • s3. Harrell, F. E. Regression modeling strategies: with applications to linear models, logistic regression, and survival analysis (Springer, N.Y., 2001).

  • s4. Davison, A. C. & Hinkley, D. V. Bootstrap Methods and their Applications (Cambridge University Press, New York, 1997).

  • s5. Schwartz, S. et al. Human-mouse alignments with BLASTZ. Genome Res 13, 103-7 (2003).

  • s6. Grimson, A. et al. MicroRNA targeting specificity in mammals: determinants beyond seed pairing. Mol Cell 27, 91-105 (2007).

  • s7. Landgraf, P. et al. A mammalian microRNA expression atlas based on small RNA library sequencing. Cell 129, 1401-14 (2007).



Example 4

miRNome of Human Mature B Cells


Summary:


The full set of microRNAs (miRNAs) in the human genome is not known. Because presently known miRNAs have been identified by virtue of their abundant expression in a few cell types, many tissue-specific miRNAs remain unrevealed. To understand the role of miRNAs in B cell function and lymphomagenesis, we generated short-RNA libraries from normal human B cells at different stages of development (naïve, germinal center, memory) and from a Burkitt lymphoma cell line. A combination of cloning and computational analysis identified 178 miRNAs (miRNome) expressed in normal and/or transformed B cell libraries. Most notably, the B cell miRNome included 75 miRNAs which to our knowledge have not been previously reported and of which 66 have been validated by RNA blot and/or RT-PCR analyses. Numerous miRNAs were expressed in a stage- or transformation-specific fashion in B cells, suggesting specific functional or pathologic roles. These results provide a resource for studying the role of miRNAs in B cell development, immune function, and lymphomagenesis.


A new mechanism of post-transcriptional regulation has been revealed with the discovery of microRNAs (miRNAs), a class of short RNAs that impair translation or induce mRNA degradation by binding to the 3′ untranslated region of target mRNA ([Bartel, 2004] and [Kim, 2005]). A recent release of the miRBase database (v.11.0) ([Griffiths-Jones, 2006] and [Griffiths-Jones et al., 2006]) reports 847 human miRNAs. However, the discovery of miRNAs is still an on-going process with variable predictions about the total number of miRNAs expressed in mammalian cells ranging from one thousand to several thousands ([Bentwich et al., 2005] and [Miranda et al., 2006]). The reported miRNAs have been identified from a limited number of cell types or from tissues whose cellular heterogeneity may favor the identification of ubiquitous and abundant miRNA. In fact, a recent report aiming for the identification of miRNA expression profiles from a large panel of different mammalian tissues and cell types led to the discovery of only 12 previously unreported human miRNA (Landgraf et al., 2007). These findings led to the conclusion that most miRNAs are known and that most of them are ubiquitously expressed (Landgraf et al., 2007). Nonetheless, additional analyses of purified cell populations have led to the identification of tissue- and stage of differentiation-specific miRNAs in a few tissues, suggesting the existence of tissue-specific miRNA expression ([Calabrese et al., 2007] and [Cummins et al., 2006]).


The role of miRNAs in B lymphocyte development and B cell lymphomagenesis is largely unknown. A critical stage of the differentiation process leading to effector B cells is represented by the germinal centers (GC), the structures that develop when mature naive B cells encounter the antigen in the secondary lymphoid organs and are stimulated to proliferate and differentiate into GC centroblasts (CB). During the GC reaction, B cells undergo somatic hypermutation of their immunoglobulin-variable regions and class switch recombination. B cells that have acquired the ability to express high-affinity immunoglobulins are then positively selected and further differentiate into the final effectors of the humoral immune response, i.e., memory B cells and plasma cells (Klein and Dalla-Favera, 2008). Naive, GC, and memory B cells are also relevant targets of disease because each of these B cell subpopulations can be affected by malignant transformation leading to different types of lymphomas and leukemias ([Klein and Dalla-Favera, 2008] and [Kuppers and Dalla-Favera, 2001]).


Several initial observations suggest an important role of specific miRNAs in B cell function and malignancy. Via mouse models, miR-155 has been demonstrated to affect regulation of the GC response through modulation of cytokine production ([Rodriguez et al., 2007] and [Thai et al., 2007]) and by direct post-transcriptional regulation of the activation-induced cytidine deaminase (AID) ([Dorsett et al., 2008] and [Teng et al., 2008]). Recently, miR-150 has been shown to target MYB, a critical transcription factor involved in the control of B cell differentiation (Xiao et al., 2007). In B cell lymphomas, 13q31 amplification has been associated with the overexpression of the miR-17-92 cluster and its enforced expression in a murine B cell lymphoma model showed a role in accelerating tumor development (He et al., 2005). Furthermore, miR-15a and miR-16 have been implicated in the pathogenesis of B cell chronic lymphocytic leukemia (CLL) ([Calin et al., 2002] and [Calin et al., 2005]).


As a basis for a comprehensive analysis of the role of miRNAs in B cell function and lymphomagenesis, this study was aimed at identifying the miRNAs expressed (miRNome) in the human mature B cell compartment, including naive, GC, and memory B cells. By using a combination of cloning and computational analysis, we report the identification of 178 miRNAs representing the mature B cell miRNome, including 75 previously unreported miRNAs. The results show that normal B cell subpopulations are characterized by specific miRNA “signatures,” suggesting functional roles of miRNAs in B cell differentiation and transformation.


Results


Construction of Short-RNA Libraries from Human B Cell Subpopulations.


Short-RNA libraries were generated by cloning RNA fractions of 15-30 nt from human centroblasts, naive, and memory B cells purified from tonsils, as well as from the Burkitt lymphoma cell line Ramos, which is representative of malignant transformation of GC B cells. Approximately 3,500 sequences were analyzed from each library, corresponding to 13,788 total short-RNAs (2,632 nonredundant sequences). By using a bootstrap approach ([Harrell, 2001] and [Davison and Hinkley, 1997]), we estimated that more than 85% of the complexity of the libraries has been examined (FIG. 23).


Mapping of Short-RNA Sequences to the Human Genome.


The cloned sequences were subjected to a computational analysis (see Supplemental Experimental Procedures described in Example 3) summarized in the flowchart illustrated in FIG. 24. Each cloned sequence was first matched to the human genome assembly (March 2006, hg18) to retrieve the genomic regions from which the short RNAs originated. One or more genomic locations were identified for 2086 (80%) of the cloned sequences considering both perfect matches and single mismatches (FIG. 24). Consistent with previous observations, 3′-end mismatches were the most common and showed a clear preference for A in the last position (Neilson et al., 2007). The failure of 546 short-RNA sequences to align with the human genome is likely due, at least in part, to errors introduced by PCR during the cloning procedure (FIG. 24). However, a small subset of these short RNAs lacking a corresponding genomic region in Homo sapiens have been cloned with high frequencies in multiple libraries and showed differential expression during B cell differentiation, suggesting that they may represent bona fide short-RNA species, which cannot be mapped on the current reference genome probably because of polymorphisms and/or post-transcriptional modifications. However, given the difficulty of assigning genomic coordinates to these sequences, they were omitted from further analyses.


Computational Prediction of Precursor and Mature miRNAs.


In order to identify candidate miRNAs among the cloned short-RNA sequences, we developed a computational pipeline aiming at the identification of potential miRNA precursors based on the investigation of their genomic location and folding characteristics (FIG. 18 and Supplemental Experimental Procedures in Example 3). In brief, short RNA sequences were mapped to the human genome and their respective candidate genomic precursors (±90 nt) were retrieved and analyzed for secondary structure, size and energy of the loop, and number of complementary base pairs in the stem of the loop. The prediction was performed on the full set of nonredundant short RNAs (2632 sequences) for which one or more genomic locations could be identified (FIG. 24). The analysis led to the identification of candidate miRNA precursors for 1646 short-RNA sequences, which were then clustered allowing for (1) the variability observed at the mature miRNA 3′ ends (and less dramatically at the 5′ ends) including nucleotide substitutions and deletions, and (2) the possibility of miRNA editing as previously reported ([Kawahara et al., 2007] and [Luciano et al., 2004]) (Supplemental Experimental Procedures in Example 3). After annotating each candidate mature miRNA, those which matched mRNA, rRNA, tRNA, post-transcriptionally modified t-RNA, and other ncRNA (yRNA, sn/snoRNA) sequences, and were present only once in the libraries were not considered further. The remaining sequences were still considered miRNAs based on criteria (identification of genomic loci consistent with a pre-miRNA, length, recurrence, differential expression, detection in the Ago complex) that suggest their existence as bona fide miRNA. Moreover, consistent with the miRNA length of the Homo sapiens miRBase database (v11.0), only mature candidate miRNAs of length 17-28 nt were considered.


Overall, the analysis identified 178 mature miRNAs, of which 103 were known and 75 were not previously reported, to our knowledge (Table 7 and FIG. 24). Computational prediction of precursor miRNAs (pre-miRNA) identified 114 precursors already reported to potentially code for the 103 known mature miRNA, and 274 genomic locations containing new candidate pre-miRNA associated with the 75 previously unreported and 8 known mature miRNAs (FIG. 19 and Table 7).









TABLE 7

(PART A) List of known and newly identified bona fide and candidate mature miRNAs and their predicted precursors. Genomic locations are provided for all candidate miRNA. Frequencies have been calculated only for bona fide miRNA (cloned at least 2 times in the B cell libraries).

| ID | SEQ ID NO. | Mature miRNA sequence | Annotations |
|---|---|---|---|
| CU-1026 | 1 | TGTAGTGTTTCCTACTTTATGGA | Mature:hsa-miR-142-3p:MIMAT0000434 |
| CU-1064 | 2 | TAGCTTATCAGACTGATGTTGA | Mature:hsa-miR-21:MIMAT0000076 |
| CU-1061 | 3 | TAAAGTGCTTATAGTGCAGGTAG | Mature:hsa-miR-20a:MIMAT0000075 |
| CU-1035 | 4 | TAGCAGCACATCATGGTTTACA | Mature:hsa-miR-15b:MIMAT0000417 |
| CU-1037 | 5 | TAGCAGCACGTAAATATTGGCG | Mature:hsa-miR-16:MIMAT0000069 |
| CU-1001 | 6 | TGAGGTAGTAGGTTGTATAGTT | Mature:hsa-let-7a:MIMAT0000062 |
| CU-1116 | 7 | TATTGCACTTGTCCCGGCCTGT | Mature:hsa-miR-92a:MIMAT0000092 |
| CU-1018 | 8 | TCCCACCGCTGCCACCA | Mature:hsa-miR-1280:MIMAT0005946 |
| CU-1006 | 9 | TGAGGTAGTAGATTGTATAGTT | Mature:hsa-let-7f:MIMAT0000067 |
| CU-1079 | 10 | TAGCACCATCTGAAATCGGTTA | Mature:hsa-miR-29a:MIMAT0000086 |
| CU-1033 | 11 | TAGCAGCACATAATGGTTTGT | Mature:hsa-miR-15a:MIMAT0000068 |
| CU-1124 | 12 | CCCATAAAGTAGAAAGCACTA | Mature:hsa-miR-142-5p:MIMAT0000433 |
| CU-1007 | 13 | TGAGGTAGTAGTTTGTACAGTT | Mature:hsa-let-7g:MIMAT0000414 |
| CU-1008 | 14 | TGAGGTAGTAGTTTGTGCTGTT | Mature:hsa-let-7i:MIMAT0000415 |
| CU-1082 | 15 | TAGCACCATTTGAAATCGGTTA | Mature:hsa-miR-29c:MIMAT0000681 |
| CU-1085 | 16 | TGTAAACATCCTACACTCTCAGC | Mature:hsa-miR-30c:MIMAT0000244 |
| CU-1039 | 17 | CAAAGTGCTTACAGTGCAGGTAG | Mature:hsa-miR-17:MIMAT0000070 |
| CU-1071 | 18 | CATTGCACTTGTCTCGGTCTGA | Mature:hsa-miR-25:MIMAT0000081 |
| CU-1046 | 19 | CAACGGAATCCCAAAAGCAGCTG | Mature:hsa-miR-191:MIMAT0000440 |
| CU-1057 | 20 | TGTGCAAATCCATGCAAAACTGA | Mature:hsa-miR-19b:MIMAT0000074 |
| CU-1024 | 21 | TACCACAGGGTAGAACCACGGA | Mature:hsa-miR-140-3p:MIMAT0004597 |
| CU-1084 | 22 | TGTAAACATCCTACACTCAGCT | Mature:hsa-miR-30b:MIMAT0000420 |
| CU-1003 | 23 | TGAGGTAGTAGGTTGTGTGGTT | Mature:hsa-let-7b:MIMAT0000063 |
| CU-1080 | 24 | TAGCACCATTTGAAATCAGTGTT | Mature:hsa-miR-29b:MIMAT0000100 |
| CU-1012 | 25 | TAAAGTGCTGACAGTGCAGAT | Mature:hsa-miR-106b:MIMAT0000680 |
| CU-1092 | 26 | TCCCTGTCCTCCAGGAGCTC | Mature:hsa-miR-339-5p:MIMAT0000764 |
| CU-1072 | 27 | TTCAAGTAATCCAGGATAGGCT | Mature:hsa-miR-26a:MIMAT0000082 |
| CU-1118 | 28 | CAAAGTGCTGTTCGTGCAGGTAG | Mature:hsa-miR-93:MIMAT0000093 |
| CU-1067 | 29 | TGTCAGTTTGTCAAATACCCCA | Mature:hsa-miR-223:MIMAT0000280 |
| CU-1027 | 30 | TGAGAACTGAATTCCATGGGTT | Mature:hsa-miR-146a:MIMAT0000449 |
| CU-1029 | 31 | TCTCCCAACCCTTGTACCAGT | Mature:hsa-miR-150:MIMAT0000451 |
| CU-1015 | 32 | TCCCTGAGACCCTAACTTGTGA | Mature:hsa-miR-125b:MIMAT0000423 |
| CU-1093 | 33 | TCTCACACAGAAATCGCACCCGTC | Mature:hsa-miR-342-3p:MIMAT0000753 |
| CU-1016 | 34 | GTCCCTGTTCGGGCGCCA | Mature:hsa-miR-1274b:MIMAT0005938 |
| CU-1056 | 35 | TGTGCAAATCTATGCAAAACTGA | Mature:hsa-miR-19a:MIMAT0000073 |
| CU-1086 | 36 | TGTAAACATCCCCGACTGGAAG | Mature:hsa-miR-30d:MIMAT0000245 |
| CU-1065 | 37 | AGCTACATTGTCTGCTGGGTT | Mature:hsa-miR-221:MIMAT0000278 |
| CU-1004 | 38 | AGAGGTAGTAGGTTGCATAGTT | Mature:hsa-let-7d:MIMAT0000065 |
| CU-1011 | 39 | CCGCACTGTGGGTACTTGCT | Star:hsa-miR-106b*:MIMAT0004672 |
| CU-1010 | 40 | AGCAGCATTGTACAGGGCTATGA | Mature:hsa-miR-103:MIMAT0000101 |
| CU-1050 | 41 | AACTGGCCCTCAAAGTCCCGCT | Mature:hsa-miR-193b:MIMAT0002819 |
| CU-1091 | 42 | GCCCCTGGGCCTATCCTAGAA | Mature:hsa-miR-331-3p:MIMAT0000760 |
| CU-1023 | 43 | AGCTGGTGTTGTGAATCAGGCCGT | Mature:hsa-miR-138:MIMAT0000430 |
| CU-1101 | 44 | TGAGGGGCAGAGAGCGAGACTT | Mature:hsa-miR-423-5p:MIMAT0004748 |
| CU-1066 | 45 | GCTACATCTGGCTACTGGGTCT | Mature:hsa-miR-222:MIMAT0000279 |
| CU-1017 | 46 | GTGGGGGAGAGGCTGTA | Mature:hsa-miR-1275:MIMAT0005929 |
| CU-5001 | 47 | CTATACGACCTGCTGCCTTTC | Star:hsa-let-7d*:MIMAT0004484 |
| CU-1032 | 48 | TTAATGCTAATCGTGATAGGGGT | Mature:hsa-miR-155:MIMAT0000646 |
| CU-1108 | 49 | AGGGGGAAAGTTCTATAGTC | Mature:hsa-miR-625:MIMAT0003294 |
| CU-1055 | 50 | ACAGTAGTCTGCACATTGGTT | Mature:hsa-miR-199b-3p:MIMAT0004563 |
| CU-1042 | 51 | AACATTCAACGCTGTCGGTGAGTT | Mature:hsa-miR-181a:MIMAT0000256 |
| CU-1113 | 52 | TGGAAGACTAGTGATTTTGTTGT | Mature:hsa-miR-7:MIMAT0000252 |
| CU-1098 | 53 | TAATGCCCCTAAAAATCCTTAT | Mature:hsa-miR-365:MIMAT0000710 |
| CU-1052 | 54 | TAGCAGCACAGAAATATTGGCA | Mature:hsa-miR-195:MIMAT0000461 |
| CU-1568 | 55 | TGAGGTAGTAGGTTGTAT | Mature:hsa-let-7c:MIMAT0000064 |
| CU-1103 | 56 | TCCTGTACTGAGCTGCCCCGAG | Mature:hsa-miR-486-5p:MIMAT0002177 |
| CU-1014 | 57 | TCCCTGAGACCCTTTAACCTGTGA | Mature:hsa-miR-125a-5p:MIMAT0000443 |
| CU-1068 | 58 | ATCACATTGCCAGGGATTTCCA | Mature:hsa-miR-23a:MIMAT0000078 |
| CU-1019 | 59 | TCACAGTGAACCGGTCTCTTT | Mature:hsa-miR-128:MIMAT0000424 |
| CU-1076 | 60 | CACTAGATTGTGAGCTCCTGGA | Mature:hsa-miR-28-3p:MIMAT0004502 |
| CU-1111 | 61 | CAACAAATCACAGTCTGCCAT | Star:hsa-miR-7-1*:MIMAT0004553 |
| CU-1062 | 62 | CAAAGTGCTTATAGTGCAGGTAG | Mature:hsa-miR-20b-mm:MIMAT0001413 |
| CU-1115 | 63 | AGGTTGGGATCGGTTGCAATGCT | Star:hsa-miR-92a-1*:MIMAT0004507 |
| CU-1126 | 64 | ACATTCATTGCTGTCGGTGGGTT | Mature:hsa-miR-181b:MI0000270 |
| CU-5016 | 1093 | AATGACACGATCACTCCCGTTGAG | Mature:hsa-miR-425:MIMAT0003393 |
| CU-1096 | 65 | TCCCCCAGGTGTGATTCTGATT | Mature:hsa-miR-361-3p:MIMAT0004682 |
| CU-1054 | 66 | CCCAGTGTTCAGACTACCTGTTC | Mature:hsa-miR-199a-5p:MIMAT0000231 |
| CU-1112 | 67 | ACCAATATTACTGTGCTGCTT | Star:hsa-miR-16-2*:MIMAT0004518 |
| CU-1087 | 68 | TGTAAACATCCTTGACTGGAAGCT | Mature:hsa-miR-30e:MIMAT0000692 |
| CU-1045 | 69 | TAAGGTGCATCTAGTGCAGATA | Mature:hsa-miR-18a:MIMAT0000072 |
| CU-1069 | 70 | ATCACATTGCCAGGGATTACCA | Mature:hsa-miR-23b:MIMAT0000418 |
| CU-1044 | 71 | ACTGCCCTAAGTGCTCCTTCTG | Star:hsa-miR-18a*:MIMAT0002891 |
| CU-1083 | 72 | TGTAAACATCCTCGACTGGA | Mature:hsa-miR-30a:MIMAT0000087 |
| CU-1009 | 73 | TACAGTACTGTGATAACTGAAG | Mature:hsa-miR-101:MIMAT0000099 |
| CU-1030 | 74 | CTAGACTGAAGCTCCTTGAGG | Mature:hsa-miR-151-3p:MIMAT0000757 |
| CU-1088 | 1094 | TGGGTTGAGAGGGCGAA | Mature:hsa-miR-320a:MIMAT0000510 |
| CU-1095 | 75 | TGGCAGTGTCTTAGCTGGTTGTT | Mature:hsa-miR-34a:MIMAT0000255 |
| CU-1119 | 76 | TGAGGTAGTAAGTTGTATTGTT | Mature:hsa-miR-98:MIMAT0000096 |
| CU-1028 | 77 | TGAGAACTGAATTCCATAGGCTGT | Mature:hsa-miR-146b-5p:MIMAT0002809 |
| CU-1031 | 78 | TCGAGGAGCTCACAGTCTAGTA | Mature:hsa-miR-151-5p:MIMAT0004697 |
| CU-1100 | 79 | AGCTCGGTCTGAGGCCCCTCAG | Mature:hsa-miR-423-3p:MIMAT0001340 |
| CU-1038 | 80 | ACTGCAGTGAAGGCACTTGTAG | Star:hsa-miR-17*:MIMAT0000071 |
| CU-1040 | 81 | ACCATCGACCGTTGATTGTA | Star:hsa-miR-181a*:MIMAT0000270 |
| CU-1053 | 82 | TTCACCACCTTCTCCACCCAG | Mature:hsa-miR-197:MIMAT0000227 |
| CU-1075 | 83 | TTCACAGTGGCTAAGTTCTG | Mature:hsa-miR-27b:MIMAT0000419 |
| CU-1073 | 84 | TTCAAGTAATTCAGGATAGGTT | Mature:hsa-miR-26b:MIMAT0000083 |
| CU-1110 | 85 | TGGGTTTACGTTGGGAGAACT | Mature:hsa-miR-629:MIMAT0004810 |
| CU-1005 | 87 | TGAGGTAGGAGGTTGTATAGTT | Mature:hsa-let-7e:MIMAT0000066 |
| CU-1081 | 88 | TGACCGATTTCTCCTGGTGTT | Star:hsa-miR-29c*:MIMAT0004673 |
| CU-1117 | 89 | TATTGCACTCGTCCCGGCC | Mature:hsa-miR-92b:MIMAT0003218 |
| CU-1094 | 90 | GGGGTGCTATCTGTGATTGA | Mature:hsa-miR-342-5p:MIMAT0004694 |
| CU-1021 | 91 | GCATGGGTGGTTCAGTGGTAGAA | Mature:hsa-miR-1308:MIMAT0005947 |
| CU-1089 | 92 | CTGGCCCTCTCTGCCCTT | Mature:hsa-miR-328:MIMAT0000752 |





CU-1047
93
CTGACCTATGAATTGACAGC
Mature:hsa-miR-192:MIMAT0000222





CU-1099
94
CTCCTGACTCCAGGTCCTGTG
Star:hsa-miR-378*:MIMAT0000731





CU-1105
95
CGTCAACACTTGCTGGTT
Mature:hsa-miR-505:MIMAT0002876





CU-1034
96
CGAATCATTATTTGCTGCTCT
Star:hsa-miR-15b*:MIMAT0004586





CU-5002
97
CATCGGGAATGTCGTGTCCGCC
Star:hsa-miR-425*:MI0001448





CU-1025
98
CAGTGGTTTTACCCTATGGTA
Mature:hsa-miR-140-5p:MIMAT0000431





CU-1022
99
CAGTGCAATGATGAAAGGGCAT
Mature:hsa-miR-130b:MIMAT0000691





CU-1104
100
CAGCAGCACACTGTGGTTTGT
Mature:hsa-miR-497:MIMAT0002820





CU-1106
101
CACGCTCATGCACACACCCAC
Mature:hsa-miR-574-3p:MIMAT0003239





CU-1077
102
AAGGAGCTCACAGTCTATTGAG
Mature:hsa-miR-28-5p:MIMAT0000085





CU-1132
131
GCCGGGTACTTTCGTATTTT
NEW





CU-1137
132
GCTAAGGAAGTCCTGTGCTCAGTTTT
NEW





CU-1178
148
GGGTGTGCGTGTTTTT
NEW





CU-1164
150
GAGAGCGCTCGGTTTTT
NEW





CU-1148
151
TGGTGTGGTCTGTTGTTTT
NEW





CU-1221
152
TGTGCTCCGGAGTTACCTCGTTT
NEW





CU-1180
155
AACCGAGCGTCCAAGCTCTTTCCATTTT
NEW





CU-1155
156
TCCCCGCACCTCCACCA
NEW





CU-1175
162
GGCGTGATTCATACCTTTT
NEW





CU-1197
169
TGTGGTGGCTTACTTTT
NEW





CU-1146
172
AGAAAGGCCGAATTTTA
NEW





CU-1212
157
TCCCCGGCACTTCCACCA
NEW





CU-1251
232
CCCACCCAGGGACGCCA
tRNAprefix-annotate;refseqGeneIntron-annotate;





CU-1254
233
TCCCCGGCACCTCCACCA
tRNAprefix-annotate;refseqGeneIntron-annotate;





CU-1298
234
ATCCCGGACGAGCCCCCA
tRNAprefix-annotate;refseqGeneIntron-annotate;





CU-1153
142
CCCCCCACTGCTAAATTTGACTGGCTT
refseqGeneIntron-annotate;rnaGene-annotate





CU-1276
236
TCGATTCCCGGCCAATGCACCA
tRNAprefix-annotate;refseqGeneIntron-annotate;





CU-1303
237
ATCCCACTTCTGACACCA
computGene-annotate;refseqGeneIntron-annotate;tRNAprefix-annotate





CU-1242
239
TCCCCGTACGGGCCACCA
tRNAprefix-annotate;refseqGeneIntron-annotate;





CU-1241
243
AGTCCCATCTGGGTCGCCA
tRNAprefix-annotate;refseqGeneIntron-annotate;





CU-1575
244
CCCCCCACTGCTAAATTTGACTGGA
refseqGeneIntron-annotate;rnaGene-annotate





CU-1243
246
GTCCCTTCGTGGTCGCCA
tRNAprefix-annotate;refseqGeneIntron-annotate;





CU-1300
248
TCCTCACACGGGGCACCA
tRNAprefix-annotate;refseqGeneIntron-annotate;





CU-1278
249
TAACGGCCGCGGTACCC
refseqGeneIntron-annotate





CU-1264
250
GAGGGGGACCAAAAAAAA
refseqGeneIntron-annotate





CU-1130
133
CCCGGGTTTCGGCACCA
tRNAprefix-annotate;refseqGeneIntron-annotate;





CU-1380
354
TAGGTTTGGTCCTAGCCTTTCT
piRNA-annotate;refseqGeneIntron-annotate





CU-1246
252
GGGGGGTAAAAAAAAA
refseqGeneIntron-annotate





CU-1277
254
GAGCCATGATGATACCACTGAGC
refseqGeneIntron-annotate





CU-1345
257
AGAACACTACGAGCCACA
mRNA-annotate;refseqGeneIntron-annotate





CU-1352
258
ACCCCACTTCTGGTACCA
tRNAprefix-annotate;refseqGeneIntron-annotate;





CU-1324
260
TCTCGGTGGAACCTCCA
tRNAprefix-annotate;refseqGeneIntron-annotate;





CU-1269
262
TACCGAGCCTGGTGATAGC
refseqGeneIntron-annotate





CU-1281
263
GCAGCGCCAGCCTCCCGCCCTAC
refseqGeneIntron-annotate





CU-1339
265
ATCCCCAGCACCTCCACCA
tRNAprefix-annotate;refseqGeneIntron-annotate;





CU-1293
266
AGCAGTGATGTCCTGAAAATTCTGAAG
refseqGeneIntron-annotate





CU-1307
267
ACCCCACTATGCTTAGCCCT
mRNA-annotate;refseqGeneIntron-annotate





CU-1294
268
AAAGGACCTGGCGGTGCTTC
mRNA-annotate;refseqGeneIntron-annotate





CU-1369
350
TCCCCGGCATCTCCACCA
computGene-annotate;tRNAprefix-annotate





CU-1191
143
GCCCGCATCCTCCACCA
tRNAprefix-annotate;





CU-1173
145
ATCCCACTCCTGACACCA
tRNAprefix-annotate;





CU-1142
149
TCGATTCCCGGCCCATGCACCA
tRNAprefix-annotate;





CU-1186
153
TCCCCGACACCTCCACCA
tRNAprefix-annotate;





CU-1371
352
TCTAGAGGAGCCTGTTCTGTA
mRNA-annotate





CU-1381
353
TCGATTCCCGGTCAGGGAACCA
repeats-annotate;tRNAprefix-annotate





CU-1213
158
TCACCCCATAAACACCA
tRNAprefix-annotate;





CU-1363
355
CGTTCGCGCTTTCCCCTG
rnaGene-annotate





CU-1220
161
TTCCCCGACGGGGAGCCA
tRNAprefix-annotate;





CU-1396
356
TAAGTGTTTGTGGGTTA
rnaGene-annotate





CU-1570
171
ATCCCCAGCATCTCCACCA
tRNAprefix-annotate;





CU-1524
368
CCCCCACAACCGCGCTTGACTAGC
mRNAall-annotate;yRNA-eliminate;rnaGene-annotate





CU-1453
369
CCCTGCTCGCTGCGCCA
tRNAprefix-annotate;refseqGeneExon-eliminate;





CU-1477
370
CTCCCACTGCTTCACTTGACTAGC
yRNA-eliminate;refseqGeneIntron-annotate;rnaGene-annotate





CU-1222
372
TCACGTCGGGGTCACCA
refseqGeneExon-eliminate;tRNAprefix-annotate





CU-1388
373
TCCCTGGTGGTCTAGTGGTTAGGATTCG
tRNAcomputational-annotate;refseqGeneIntron-annotate;rnaGene-annotate;HStRNA-eliminate;piRNA-annotate





CU-1488
375
TCCTGCCGCGGTCGCCA
tRNAprefix-annotate;refseqGeneExon-eliminate;





CU-1557
376
GGAGAGAACGCGGTCTGAGTGGT
snoRNA-eliminate;wgRNA-annotate;rnaGene-annotate





CU-1379
377
TCGGGTGCGAGAGGTCCCGGGT
tRNAcomputational-annotate;HStRNA-eliminate;rnaGene-annotate





CU-1542
378
GGCTGGTCCGATGGTAGTGGGTT
mRNAall-annotate;yRNA-eliminate;refseqGeneIntron-annotate;rnaGene-annotate





CU-1550
379
CGGAAGCGTGCTGGGCCC
tRNAcomputational-annotate;tRNA-eliminate;rnaGene-annotate;HStRNA-eliminate;piRNA-annotate





CU-1513
381
GCGGGTGATGCGAACTGGAGTCTGAGC
computGene-annotate;snoRNA-annotate;refseqGeneExon-eliminate;rnaGene-annotate;snoRNA-eliminate;wgRNA-annotate





CU-1368
382
GACGAGGTGGCCGAGTGG
tRNAcomputational-annotate;rnaGene-annotate;HStRNA-eliminate;piRNA-annotate





CU-1370
351
CTGATTGCTCCTGTCTGATT
mRNAall-annotate;refseqGeneExon-eliminate;wgRNA-annotate;exEID-annotate;rnaGene-annotate





CU-1470
384
CTCCTGGCTGGCTCGCCA
mRNAall-annotate;computGene-annotate;exEID-annotate;tRNAprefix-annotate;refseqGeneIntron-annotate;refseqGeneExon-eliminate;





CU-1538
386
GGCTGGTCCGAGTGCAGTGGTGTTTA
yRNA-eliminate;refseqGeneIntron-annotate;rnaGene-annotate





CU-1486
387
CTGCTGTGATGACATTC
computGene-annotate;snoRNA-annotate;refseqGeneExon-eliminate;rnaGene-annotate;snoRNA-eliminate;wgRNA-annotate





CU-1382
389
TCCTCGTTAGTATAGTGGTGAGTATCCC
tRNAcomputational-annotate;rnaGene-annotate;HStRNA-eliminate;piRNA-annotate





CU-1403
391
GCATTGGTGGTTCAGTGGTAGA
rnaGene-annotate;tRNAcomputational-annotate;piRNA-annotate;tRNA-eliminate;refseqGeneIntron-annotate;HStRNA-eliminate





CU-1457
395
TTCTCACTACTGCACTTGACTA
mRNAall-annotate;exEID-annotate;yRNA-eliminate;rnaGene-annotate;refseqGeneIntron-annotate;refseqGeneExon-eliminate





CU-1440
396
TGGTTATCACGTTCGCC
tRNAcomputational-annotate;tRNA-eliminate;rnaGene-annotate;HStRNA-eliminate;piRNA-annotate





CU-1528
397
TAGGGGTATGATTCTCGCT
tRNAcomputational-annotate;tRNA-eliminate;HStRNA-eliminate;rnaGene-annotate





CU-1288
255
CGTCCATGATGTTCCGCAA
mRNAall-annotate;snoRNA-annotate;refseqGeneIntron-annotate;refseqGeneExon-eliminate;piRNA-annotate;wgRNA-annotate





CU-1545
398
CCACGAGGAAGAGAGGTAGC
snoRNA-annotate;snoRNA-eliminate;wgRNA-annotate;rnaGene-annotate





CU-1323
259
TGTATTGTGAGACATTC
mRNAall-annotate;refseqGeneExon-eliminate;refseqGeneIntron-annotate;wgRNA-annotate;rnaGene-annotate





CU-1244
399
GTCAGGATGGCCGAGCGGTCT
tRNAcomputational-annotate;rnaGene-annotate;HStRNA-eliminate;refseqGeneIntron-annotate










Candidate miRNAs observed only once in any of the four libraries










CU-1123
103
TTGGTCCCCTTCAACCAGCTGT
Mature:hsa-miR-133a:MIMAT0000427





CU-1074
104
TTCACAGTGGCTAAGTTCCGA
Mature:hsa-miR-27a:MIMAT0000084





CU-1097
105
TTATCAGAATCTCCAGGGGTAA
Mature:hsa-miR-361-5p:MIMAT0000703





CU-1043
106
TGGAGAGAAAGGCAGTTCCTGAT
Mature:hsa-miR-185:MIMAT0000455





CU-1112
107
TGAGACCTCTGGGTTCTGAGCT
Mature:hsa-miR-769-5p:MIMAT0003886





CU-1122
108
TCTTTGGTTATCTAGCTGTATGA
Mature:hsa-miR-9:MIMAT0000441





CU-1109
109
TCTAGTAAGAGTGGCAGTCGA
Mature:hsa-miR-628-3p:MIMAT0003297





CU-1090
110
TATTGCACATTACTAAGTTGA
Mature:hsa-miR-32:MIMAT0000090





CU-1013
111
TAAGGCACGCGGTGAATGCCA
Mature:hsa-miR-124:MIMAT0000422





CU-1058
112
TAACACTGTCTGGTAACGATGTT
Mature:hsa-miR-200a:MIMAT0000682





CU-1059
113
GTGAAATGTTTAGGACCACTAG
Mature:hsa-miR-203:MIMAT0000264





CU-1102
114
GCAGTCCATGGGCATATACACA
Mature:hsa-miR-455-3p:MIMAT0004784





CU-1107
115
GAGCTTATTCATAAAAGTGCAG
Mature:hsa-miR-590-5p:MIMAT0003258





CU-1114
116
CTGCCCTGGCCCGAGGGACCGA
Mature:hsa-miR-874:MIMAT0004911





CU-1002
117
CTATACAACCTACTGCCTTC
Star:hsa-let-7b*:MIMAT0004482





CU-1049
118
CGGGGTTTTGAGGGCGAGATGA
Star:hsa-miR-193b*:MIMAT0004767





CU-1051
119
CCAGTGGGGCTGCTGTTATCTG
Star:hsa-miR-194*:MIMAT0004671





CU-1036
120
CCAGTATTAACTGTGCTGCTGA
Star:hsa-miR-16-1*:MIMAT0004489





CU-1121
121
CACCCGTAGAACCGACCTTGCG
Mature:hsa-miR-99b:MIMAT0000689





CU-1120
122
CAAGCTCGTGTCTGTGGGTCCG
Star:hsa-miR-99b*:MIMAT0004678





CU-1063
123
CAACACCAGTCGATGGGCTGTA
Star:hsa-miR-21*:MIMAT0004494





CU-1070
124
AGGCGGAGACTTGGGCAATT
Star:hsa-miR-25*:MIMAT0004498





CU-1060
125
ACTGCATTATGAGCACTTAAAGT
Star:hsa-miR-20a*:MIMAT0004493





CU-1078
126
ACTGATTTCTTTTGGTGTTCA
Star:hsa-miR-29a*:MIMAT0004503





CU-1020
127
ACTCGGCGTGGCGTCGGTCGTGG
Mature:hsa-miR-1307:MIMAT0005951





CU-1041
128
ACCACTGACCGTTGACTGTAC
Star:hsa-miR-181a-2*:MIMAT0004558





CU-1048
129
AACTGGCCTACAAAGTCCCAGT
Mature:hsa-miR-193a-3p:MIMAT0000459





CU-1136
134
TCGGGCGGGAGTGGTGGCTTT
NEW





CU-1383
135
TAGAGGCACCGCCTGCCCA
NEW





CU-1131
136
CGGGGCGCGGCCTCGCTG
NEW





CU-1135
137
CCCACGGGGGTCTCCGGGCGAG
NEW





CU-1133
139
CAGCCCGGCCTGGCTCCTCCAT
NEW





CU-1134
140
CACGGAAGGTGGCCCGG
NEW





CU-1160
174
TGTCAGTTTGAACCCAA
NEW





CU-1189
175
TGTAGTGTTTCTTACTTTA
NEW





CU-1219
176
TGGCGAAGGTCGGCCGCG
NEW





CU-1190
179
TCGGCTTTCCCTGCTAACTGGGCTTTTT
NEW





CU-1144
180
TCAGAGCGCGGGCCGACCCC
NEW





CU-1384
183
TAACCCCAGGGTTGGTCA
NEW





CU-1171
185
GGGCGTGGGTGTGATGATTC
NEW





CU-1199
186
GGGAGGTGAGTAGGTCTG
NEW





CU-1226
187
GGAGACGTGGCCGAGAG
NEW





CU-1572
188
GCGGAATACCACGGGGA
NEW





CU-1151
189
GCAGGCGGGGGATTAGCTA
NEW





CU-1227
190
GCAGCGGAACGTCGGCGCGC
NEW





CU-1152
192
CTTGGACTAACCTGGTGTA
NEW





CU-1207
197
CGGTGGAACCTGCATTGGTTT
NEW





CU-1181
198
CGGGGCCGGGGCTAGGGT
NEW





CU-1185
199
CGGGCCGCCCCCGCCCACCG
NEW





CU-1366
201
CGGCCTATCCGGAATGCCCC
NEW





CU-1145
203
CGCGGCCAGTGTCCCCTTGTA
NEW





CU-1201
204
CGACACACGGCCCGTGGCGC
NEW





CU-1172
206
CCTCACTGGGGGCTCCA
NEW





CU-1217
210
CCGCCCCGACCTTAGCTA
NEW





CU-1177
214
CCCCGGCATCTCCATCA
NEW





CU-1360
218
CCACCCTGGAGCCTCCGT
NEW





CU-1179
221
ATGGCCTGGACCCCACTCCT
NEW





CU-1161
222
ATGGCCGCATATATTTT
NEW





CU-1168
225
AGCGAGGGTTCCGCCGGCC
NEW





CU-1195
226
ACTGGGGAGGGGGAGGAGCCTCGAGG
NEW





CU-1215
227
ACCCCGAGGGGACGGGCG
NEW





CU-1208
228
ACAGCGCTGTGTTCCCGT
NEW





CU-1373
230
AACTAAAACCCCTACGCA
NEW





CU-1196
231
AAAGGAGCCGAATCTTT
NEW





CU-1204
224
ATCCTGCTCACAGCCCCA
NEW





CU-1325
269
TTTGCCACACTGCAACACCTT
refseqGeneIntron-annotate





CU-1310
271
TTAAACCACCAAGATCGCTGATGCAC
refseqGeneIntron-annotate





CU-1299
272
TGTTCGCCGACCGTTGA
refseqGeneIntron-annotate





CU-1165
173
TGTCAGTTTTTACCCAA
refseqGeneIntron-annotate





CU-1322
274
TGGGAGAGCAGGGTATTGT
refseqGeneIntron-annotate





CU-1203
177
TGCAGGGCCGGCGGGGAGG
refseqGeneIntron-annotate





CU-1308
277
TCCGAAAGGCCTCCCGCACCG
refseqGeneIntron-annotate





CU-1376
181
TCAACACCCACTCCCTC
refseqGeneIntron-annotate





CU-1138
182
TATCAATGATGCTTCTGAGA
refseqGeneIntron-annotate





CU-1297
279
TAGATGAATAGGTAAAGAG
refseqGeneIntron-annotate





CU-1235
280
GTGTATGATGACCTCATGTAGCCTGAAC
refseqGeneIntron-annotate





CU-1253
281
GTGAAGCGTTCCATATTTTT
mRNAall-annotate;refseqGeneIntron-annotate;rnaGene-annotate





CU-1337
283
GGGGGGAGGGAAGGCAA
refseqGeneIntron-annotate





CU-1316
284
GGGGGCTGGGCTGGGTA
refseqGeneIntron-annotate





CU-1343
285
GGGGCCGCCGCCTGTGT
refseqGeneIntron-annotate





CU-1326
286
GGGAGTCCGCGGCGAGC
refseqGeneIntron-annotate





CU-1286
288
GGCTTGGTCTAGGGGTA
refseqGeneIntron-annotate





CU-1332
289
GGCTGGGACCCTGGACAC
refseqGeneIntron-annotate





CU-1262
290
GGCGACCTGCGACTCCTT
refseqGeneIntron-annotate





CU-1317
292
GGAGGGGGGAAAAAAAAAA
computGene-annotate;refseqGeneIntron-annotate





CU-1266
295
GCCGGGCGTGGTGGTCTG
refseqGeneIntron-annotate





CU-1261
296
GCCGCCGAGACCCCAGGACCC
refseqGeneIntron-annotate





CU-1259
298
GCAAATGATGCCCTCTGATC
refseqGeneIntron-annotate





CU-1349
299
GAGGGGGGTCAAAAAAA
refseqGeneIntron-annotate





CU-1272
300
CTTGATGATGAGCAGGATCTGAGT
refseqGeneIntron-annotate





CU-1313
303
CTGCTTAAGTCCTGACCAG
refseqGeneIntron-annotate





CU-1157
196
CTGATGTTGATGCATATGATGACA
refseqGeneIntron-annotate





CU-1296
304
CTGAGCACCTTTCCCTTCC
refseqGeneIntron-annotate





CU-1245
306
CGGTCACACGATTAACCCA
mRNA-annotate;refseqGeneIntron-annotate





CU-1319
310
CGGGAGTGGGGTGGCGCCCAG
refseqGeneIntron-annotate





CU-1569
312
CGGACCTGATAAATTCCCAC
refseqGeneIntron-annotate





CU-1351
316
CCTTCCTTGGATGTCTGAGTGAG
mRNAall-annotate;refseqGeneIntron-annotate;wgRNA-annotate;rnaGene-annotate





CU-1354
317
CCTCGCTGGGGCCTCCA
tRNAprefix-annotate;refseqGeneIntron-annotate;





CU-1228
321
CCGCCCGTCACCCTCCTCAAGTA
mRNA-annotate;refseqGeneIntron-annotate





CU-1271
323
CCCGCGGGCTTGCTGGGCGTCCC
refseqGeneIntron-annotate





CU-1166
213
CCCCGGCCCATGCACCA
refseqGeneIntron-annotate;





CU-1285
325
CCCCGGCATCTCCACTA
refseqGeneIntron-annotate





CU-1571
326
CCCCAGTGAGTGCCCTCTTCC
refseqGeneIntron-annotate





CU-1353
327
CCCAGAGACGCCGTCCTCGA
refseqGeneIntron-annotate





CU-1347
330
CCACTCCAGCCTAGCCCC
refseqGeneIntron-annotate





CU-1295
331
CAGTACAGGCACACCTC
refseqGeneIntron-annotate





CU-1250
333
CACGATTAACCCAAGTC
mRNA-annotate;refseqGeneIntron-annotate





CU-1311
337
ATACCATGATGAACAATAGCTGAGA
refseqGeneIntron-annotate





CU-1350
339
AGGCTGTGATGGACCTGGCTGAGCCTG
refseqGeneIntron-annotate





CU-1252
340
AGAGAGTAGGGGGAGGT
refseqGeneIntron-annotate





CU-1334
341
ACTGTCCCTGTCTACTA
refseqGeneIntron-annotate





CU-1340
342
ACCGCATCTGGCCTATTTTT
refseqGeneIntron-annotate





CU-1342
343
ACCAGACCTCCTGTGCGAAG
refseqGeneIntron-annotate





CU-1304
344
ACAGCCCGGATCCCAGCCCACTTA
refseqGeneIntron-annotate





CU-1230
345
ACACTGAGCCACAACCCA
refseqGeneIntron-annotate





CU-1192
229
ACAAAAAAAAAAGCCCAACCCT
refseqGeneIntron-annotate





CU-1312
346
AAGGGCTTGGCTTAATTA
refseqGeneIntron-annotate





CU-1255
347
AACCCGGAAGGCGGAGGTTGCGG
computGene-annotate;refseqGeneIntron-annotate





CU-1346
349
CAAAAGCTTCTTTGACGTCCCATCCAC
refseqGeneIntron-annotate





CU-1573
359
TGCCGTGATCGTATAGTGGTTA
piRNA-annotate





CU-1395
362
CTGACAGCCGGGGTTTTGGA
computGene-annotate





CU-1365
363
CGGCGGGGCCTGGAGTCTG
mRNAall-annotate;computGene-annotate;exEID-annotate





CU-1375
364
CCTGGCTCGCTGCGCCA
computGene-annotate





CU-1209
207
CCTCACCTGGAGCACCA
tRNAprefix-annotate;





CU-1174
366
CCCGAACGCTGCCAACCC
exEID-annotate





CU-1214
215
CCCCAGTACCTCCACCA
tRNAprefix-annotate;





CU-1218
223
ATCCTGTTCGTGACGCCA
tRNAprefix-annotate;





CU-1385
367
AGACCCGCGGGCGCTCTCCAGTC
rnaGene-annotate
















TABLE 7







(PART B) List of known and newly identified bona fide and candidate mature miRNAs. Counts and annotations are provided for all candidate miRNA. Frequencies have been calculated only for bona fide miRNA (cloned at least 2 times in the B cell libraries).










Mature miRNA sequence
SEQ ID NO.
Corrected Counts: Naïve (N)
Corrected Counts: Memory (M)
Corrected Counts: Centroblasts (CB)
Corrected Counts: Ramos (RA)
Frequencies: Naïve (N)
Frequencies: Memory (M)
Frequencies: Centroblasts (CB)
Frequencies: Ramos (RA)



















TGTAGTGTTTCCTACTTTATGGA
1
1329
592
635
391
40.74
21.5
25.55
17.89





TAGCTTATCAGACTGATGTTGA
2
196
353
144
13
6.01
12.82
5.79
0.59





TAAAGTGCTTATAGTGCAGGTAG
3
54
19
49.82
257.89
1.66
0.69
2
11.8





TAGCAGCACATCATGGTTTACA
4
38
61
176.84
105
1.16
2.21
7.12
4.8





TAGCAGCACGTAAATATTGGCG
5
131
97
53
35
4.02
3.52
2.13
1.6





TGAGGTAGTAGGTTGTATAGTT
6
62.84
78.99
92.19
63.25
1.93
2.87
3.71
2.89





TATTGCACTTGTCCCGGCCTGT
7
17
21
46
207
0.52
0.76
1.85
9.47





TCCCACCGCTGCCACCA
8
68
97
25
28
2.08
3.52
1.01
1.28





TGAGGTAGTAGATTGTATAGTT
9
41.28
44
64
51.38
1.27
1.6
2.58
2.35





TAGCACCATCTGAAATCGGTTA
10
78
60
42
22
2.39
2.18
1.69
1.01





TAGCAGCACATAATGGTTTGT
11
90
39
32.16
8
2.76
1.42
1.29
0.37





CCCATAAAGTAGAAAGCACTA
12
88
53
7
10
2.7
1.92
0.28
0.46





TGAGGTAGTAGTTTGTACAGTT
13
41.28
47
30.77
21.16
1.27
1.71
1.24
0.97





TGAGGTAGTAGTTTGTGCTGTT
14
23
24
32
42
0.71
0.87
1.29
1.92





TAGCACCATTTGAAATCGGTTA
15
44
41
16
1
1.35
1.49
0.64
0.05





TGTAAACATCCTACACTCTCAGC
16
27
25
26
20
0.83
0.91
1.05
0.91





CAAAGTGCTTACAGTGCAGGTAG
17
9
6
10.18
65.04
0.28
0.22
0.41
2.98





CATTGCACTTGTCTCGGTCTGA
18
11
9
34
39
0.34
0.33
1.37
1.78





CAACGGAATCCCAAAAGCAGCTG
19
17
21
36
18
0.52
0.76
1.45
0.82





TGTGCAAATCCATGCAAAACTGA
20
0
1
25
65
0
0.04
1.01
2.97





TACCACAGGGTAGAACCACGGA
21
31
22
17
21
0.95
0.8
0.68
0.96





TGTAAACATCCTACACTCAGCT
22
31
11
27
16
0.95
0.4
1.09
0.73





TGAGGTAGTAGGTTGTGTGGTT
23
19.48
19
29
5.08
0.6
0.69
1.17
0.23





TAGCACCATTTGAAATCAGTGTT
24
22
14
12
4
0.67
0.51
0.48
0.18





TAAAGTGCTGACAGTGCAGAT
25
7
6
13
26
0.21
0.22
0.52
1.19





TCCCTGTCCTCCAGGAGCTC
26
6
3
3
32
0.18
0.11
0.12
1.46





TTCAAGTAATCCAGGATAGGCT
27
2
8
13
16
0.06
0.29
0.52
0.73





CAAAGTGCTGTTCGTGCAGGTAG
28
9
2
13
14
0.28
0.07
0.52
0.64





TGTCAGTTTGTCAAATACCCCA
29
25
10
1
0
0.77
0.36
0.04
0





TGAGAACTGAATTCCATGGGTT
30
4
7
21
4
0.12
0.25
0.85
0.18





TCTCCCAACCCTTGTACCAGT
31
12
18
2
0
0.37
0.65
0.08
0





TCCCTGAGACCCTAACTTGTGA
32
0
1
28
2
0
0.04
1.13
0.09





TCTCACACAGAAATCGCACCCGTC
33
10
8
8
3
0.31
0.29
0.32
0.14





GTCCCTGTTCGGGCGCCA
34
12
10
6
1
0.37
0.36
0.24
0.05





TGTGCAAATCTATGCAAAACTGA
35
0
0
9
19
0
0
0.36
0.87





TGTAAACATCCCCGACTGGAAG
36
7
3
14
3
0.21
0.11
0.56
0.14





AGCTACATTGTCTGCTGGGTT
37
17
6
4
0
0.52
0.22
0.16
0





AGAGGTAGTAGGTTGCATAGTT
38
2
4
10
10
0.06
0.15
0.4
0.46





CCGCACTGTGGGTACTTGCT
39
8
6
2
8
0.25
0.22
0.08
0.37





AGCAGCATTGTACAGGGCTATGA
40
1
1
10
11
0.03
0.04
0.4
0.5





AACTGGCCCTCAAAGTCCCGCT
41
0
0
2
21
0
0
0.08
0.96





GCCCCTGGGCCTATCCTAGAA
42
1
0
10
10
0.03
0
0.4
0.46





AGCTGGTGTTGTGAATCAGGCCGT
43
0
0
15
5
0
0
0.6
0.23





TGAGGGGCAGAGAGCGAGACTT
44
5
1
7
4
0.15
0.04
0.28
0.18





AGCTACATCTGGCTACTGGGTCT
45
6
6
5
0
0.18
0.22
0.2
0





GTGGGGGAGAGGCTGTA
46
2
6
3
5
0.06
0.22
0.12
0.23





CTATACGACCTGCTGCCTTTC
47
6
3
4
1
0.18
0.11
0.16
0.05





TTAATGCTAATCGTGATAGGGGT
48
3
4
5
1
0.09
0.15
0.2
0.05





AGGGGGAAAGTTCTATAGTC
49
0
2
0
11
0
0.07
0
0.5





ACAGTAGTCTGCACATTGGTT
50
0
0
13
0
0
0
0.52
0





AACATTCAACGCTGTCGGTGAGTT
51
0
0
7
6
0
0
0.28
0.27





TGGAAGACTAGTGATTTTGTTGT
52
1
1
1
8
0.03
0.04
0.04
0.37





TAATGCCCCTAAAAATCCTTAT
53
0
0
6
4
0
0
0.24
0.18





TAGCAGCACAGAAATATTGGCA
54
4
0
5
0
0.12
0
0.2
0





TGAGGTAGTAGGTTGTAT
55
0.11
0.01
0.01
0.13
0
0
0
0.01





TCCTGTACTGAGCTGCCCCGAG
56
0
0
7
1
0
0
0.28
0.05





TCCCTGAGACCCTTTAACCTGTGA
57
0
0
8
0
0
0
0.32
0





ATCACATTGCCAGGGATTTCCA
58
0
0.5
7
0
0
0.02
0.28
0





TCACAGTGAACCGGTCTCTTT
59
1
0
0
6
0.03
0
0
0.27





CACTAGATTGTGAGCTCCTGGA
60
2
0
4
1
0.06
0
0.16
0.05





CAACAAATCACAGTCTGCCAT
61
3
0
1
3
0.09
0
0.04
0.14





CAAAGTGCTTATAGTGCAGGTAG
62
0
1
1
0.08
0
0.04
0.04
0





AGGTTGGGATCGGTTGCAATGCT
63
0
0
0
7
0
0
0
0.32





ACATTCATTGCTGTCGGTGGGTT
64
0
0
1
6
0
0
0.04
0.27





AATGACACGATCACTCCCGTTGAG
1095
0
0
7
0
0
0
0.28
0





TCCCCCAGGTGTGATTCTGATT
65
4
1
0
1
0.12
0.04
0
0.05





CCCAGTGTTCAGACTACCTGTTC
66
0
0
6
0
0
0
0.24
0





ACCAATATTACTGTGCTGCTT
67
1
1
2
2
0.03
0.04
0.08
0.09





TGTAAACATCCTTGACTGGAAGCT
68
2
0
3
0
0.06
0
0.12
0





TAAGGTGCATCTAGTGCAGATA
69
0
0
1
4
0
0
0.04
0.18





ATCACATTGCCAGGGATTACCA
70
0
0.5
3
1
0
0.02
0.12
0.05





ACTGCCCTAAGTGCTCCTTCTG
71
0
0
0
5
0
0
0
0.23





TGTAAACATCCTCGACTGGA
72
1
0
3
0
0.03
0
0.12
0





TACAGTACTGTGATAACTGAAG
73
1
0
0
3
0.03
0
0
0.14





CTAGACTGAAGCTCCTTGAGG
74
2
1
1
0
0.06
0.04
0.04
0





TGGGTTGAGAGGGCGAA
1094
1
0
1
1
0.03
0
0.04
0.05





TGGCAGTGTCTTAGCTGGTTGTT
75
0
1
2
0
0
0.04
0.08
0





TGAGGTAGTAAGTTGTATTGTT
76
0
1
1
1
0
0.04
0.04
0.05





TGAGAACTGAATTCCATAGGCTGT
77
1
0
2
0
0.03
0
0.08
0





TCGAGGAGCTCACAGTCTAGTA
78
1
0
1
1
0.03
0
0.04
0.05





AGCTCGGTCTGAGGCCCCTCAG
79
0
0
2
1
0
0
0.08
0.05





ACTGCAGTGAAGGCACTTGTAG
80
0
0
0
3
0
0
0
0.14





ACCATCGACCGTTGATTGTA
81
0
1
0
2
0
0.04
0
0.09





TTCACCACCTTCTCCACCCAG
82
0
0
0
2
0
0
0
0.09





TTCACAGTGGCTAAGTTCTG
83
0
0
2
0
0
0
0.08
0





TTCAAGTAATTCAGGATAGGTT
84
0
0
1
1
0
0
0.04
0.05





TGGGTTTACGTTGGGAGAACT
85
0
0
0
2
0
0
0
0.09





TGAGGTAGGAGGTTGTATAGTT
87
0
0
1.02
0
0
0
0.04
0





TGACCGATTTCTCCTGGTGTT
88
2
0
0
0
0.06
0
0
0





TATTGCACTCGTCCCGGCC
89
0
0
1
1
0
0
0.04
0.05





GGGGTGCTATCTGTGATTGA
90
2
0
0
0
0.06
0
0
0





GCATGGGTGGTTCAGTGGTAGAA
91
0
0
2
0
0
0
0.08
0





CTGGCCCTCTCTGCCCTT
92
0
0
1
1
0
0
0.04
0.05





CTGACCTATGAATTGACAGC
93
0
0
0
2
0
0
0
0.09





CTCCTGACTCCAGGTCCTGTG
94
0
0
0
2
0
0
0
0.09





CGTCAACACTTGCTGGTT
95
0
0
1
1
0
0
0.04
0.05





CGAATCATTATTTGCTGCTCT
96
0
0
1
1
0
0
0.04
0.05





CATCGGGAATGTCGTGTCCGCC
97
0
2
0
0
0
0.07
0
0





CAGTGGTTTTACCCTATGGTA
98
0
0
1
1
0
0
0.04
0.05





CAGTGCAATGATGAAAGGGCAT
99
0
0
2
0
0
0
0.08
0





CAGCAGCACACTGTGGTTTGT
100
0
0
2
0
0
0
0.08
0





CACGCTCATGCACACACCCAC
101
0
0
2
0
0
0
0.08
0





AAGGAGCTCACAGTCTATTGAG
102
0
0
2
0
0
0
0.08
0





GCCGGGTACTTTCGTATTTT
131
3
3
0
34
0.09
0.11
0
1.56





GCTAAGGAAGTCCTGTGCTCAGTTTT
132
0
0
1
19
0
0
0.04
0.87





AGGGTGTGCGTGTTTTT
148
0
0
0
20
0
0
0
0.91





GAGAGCGCTCGGTTTTT
150
0
0
1
9
0
0
0.04
0.41





TGGTGTGGTCTGTTGTTTT
151
0
0
0
9
0
0
0
0.41





TGTGCTCCGGAGTTACCTCGTTT
152
0
0
0
8
0
0
0
0.37





AACCGAGCGTCCAAGCTCTTTCCATTTT
155
0
0
0
5
0
0
0
0.23





TCCCCGCACCTCCACCA
156
0
2
1
1
0
0.07
0.04
0.05





GGCGTGATTCATACCTTTT
162
0
0
0
2
0
0
0
0.09





ATGTGGTGGCTTACTTTT
169
0
0
0
2
0
0
0
0.09





AGAAAGGCCGAATTTTA
172
0
0
1
1
0
0
0.04
0.05





TCCCCGGCACTTCCACCA
157
0
3
0
0
0
0.11
0
0





CCCACCCAGGGACGCCA
232
223
218
6
2
6.84
7.92
0.24
0.09





TCCCCGGCACCTCCACCA
233
60.47
101.82
40.28
34
1.85
3.7
1.62
1.56





ATCCCGGACGAGCCCCCA
234
48
60
80
45
1.47
2.18
3.22
2.06





CCCCCCACTGCTAAATTTGACTGGCTT
142
18
8
61
22
0.55
0.29
2.45
1.01





TCGATTCCCGGCCAATGCACCA
236
4
18
36
4
0.12
0.65
1.45
0.18





ATCCCACTTCTGACACCA
237
11
9
26.69
14
0.34
0.33
1.07
0.64





TCCCCGTACGGGCCACCA
239
11
6
3
2
0.34
0.22
0.12
0.09





AGTCCCATCTGGGTCGCCA
243
4
2
3
6
0.12
0.07
0.12
0.27





CCCCCCACTGCTAAATTTGACTGGA
244
1
1
6
2
0.03
0.04
0.24
0.09





GTCCCTTCGTGGTCGCCA
246
1
2
1
2
0.03
0.07
0.04
0.09





TCCTCACACGGGGCACCA
248
2
1
2
0
0.06
0.04
0.08
0





TAACGGCCGCGGTACCC
249
0
3
1
0
0
0.11
0.04
0





GAGGGGGACCAAAAAAAA
250
0
0
0
4
0
0
0
0.18





CCCGGGTTTCGGCACCA
133
0
3
0
1
0
0.11
0
0.05





ATAGGTTTGGTCCTAGCCTTTCT
354
0
0
3
1
0
0
0.12
0.05





AGGGGGGTAAAAAAAAA
252
0
0
0
4
0
0
0
0.18





GAGCCATGATGATACCACTGAGC
254
0
1
0
2
0
0.04
0
0.09





AGAACACTACGAGCCACA
257
3
0
0
0
0.09
0
0
0





ACCCCACTTCTGGTACCA
258
0
0
1
2
0
0
0.04
0.09





TCTCGGTGGAACCTCCA
260
0
0
1
1
0
0
0.04
0.05





TACCGAGCCTGGTGATAGC
262
0
1
1
0
0
0.04
0.04
0





GCAGCGCCAGCCTCCCGCCCTAC
263
2
0
0
0
0.06
0
0
0





ATCCCCAGCACCTCCACCA
265
0
0
0
2
0
0
0
0.09





AGCAGTGATGTCCTGAAAATTCTGAAG
266
0
0
0
2
0
0
0
0.09





ACCCCACTATGCTTAGCCCT
267
0
0
2
0
0
0
0.08
0





AAAGGACCTGGCGGTGCTTC
268
1
0
1
0
0.03
0
0.04
0





TCCCCGGCATCTCCACCA
350
116.53
75.18
104.72
59
3.57
9.99
4.21
2.7





GCCCGCATCCTCCACCA
143
38
61
2
4
1.16
2.21
0.08
0.18





ATCCCACTCCTGACACCA
145
7
13
11.31
3
0.21
0.47
0.46
0.14





TCGATTCCCGGCCCATGCACCA
149
1
2
10
4
0.03
0.07
0.4
0.18





TCCCCGACACCTCCACCA
153
2
2
2
1
0.06
0.07
0.08
0.05





TCTAGAGGAGCCTGTTCTGTA
352
0
1
3
0
0
0.04
0.12
0





TCGATTCCCGGTCAGGGAACCA
353
0
0
0
4
0
0
0
0.18





TCACCCCATAAACACCA
158
2
1
0
0
0.06
0.04
0
0





CGTTCGCGCTTTCCCCTG
355
0
1
2
0
0
0.04
0.08
0





TTCCCCGACGGGGAGCCA
161
1
0
0
1
0.03
0
0
0.05





TAAGTGTTTGTGGGTTA
356
1
1
0
0
0.03
0.04
0
0





ATCCCCAGCATCTCCACCA
171
0
0
2
0
0
0
0.08
0





CCCCCACAACCGCGCTTGACTAGC
368
12
11
7
9
0.37
0.4
0.28
0.41





CCCTGCTCGCTGCGCCA
369
7
20
5
1
0.21
0.73
0.2
0.05





CTCCCACTGCTTCACTTGACTAGC
370
2
2
18
9
0.06
0.07
0.72
0.41





TCACGTCGGGGTCACCA
372
16
4
5
1
0.49
0.15
0.2
0.05





TCCCTGGTGGTCTAGTGGTTAGGATTCG
373
0
1
10
6
0
0.04
0.4
0.27





TCCTGCCGCGGTCGCCA
375
6
8
0
1
0.18
0.29
0
0.05





GGAGAGAACGCGGTCTGAGTGGT
376
3
7
1
0
0.09
0.25
0.04
0





TCGGGTGCGAGAGGTCCCGGGT
377
0
0
0
10
0
0
0
0.46





GGCTGGTCCGATGGTAGTGGGTT
378
4
3
3
0
0.12
0.11
0.12
0





CGGAAGCGTGCTGGGCCC
379
1
5
0
4
0.03
0.18
0
0.18





GCGGGTGATGCGAACTGGAGTCTGAGC
381
0
0
6
1
0
0
0.24
0.05





GACGAGGTGGCCGAGTGG
382
2
3
2
0
0.06
0.11
0.08
0





CTGATTGCTCCTGTCTGATT
351
0
0
6
1
0
0
0.24
0.05





CTCCTGGCTGGCTCGCCA
384
0
0
3
3
0
0
0.12
0.14





GGCTGGTCCGAGTGCAGTGGTGTTTA
386
0
1
4
0
0
0.04
0.16
0





CTGCTGTGATGACATTC
387
1
2
2
0
0.03
0.07
0.08
0





TCCTCGTTAGTATAGTGGTGAGTATCCC
389
0
1
3
0
0
0.04
0.12
0





GCATTGGTGGTTCAGTGGTAGA
391
0
0
3
1
0
0
0.12
0.05





TTCTCACTACTGCACTTGACTA
395
0
0
2
1
0
0
0.08
0.05





TGGTTATCACGTTCGCC
396
0
2
0
1
0
0.07
0
0.05





TAGGGGTATGATTCTCGCT
397
1
0
0
2
0.03
0
0
0.09





CGTCCATGATGTTCCGCAA
255
1
0
2
0
0.03
0
0.08
0





CCACGAGGAAGAGAGGTAGC
398
2
1
0
0
0.06
0.04
0
0





TGTATTGTGAGACATTC
259
0
1
1
0
0
0.04
0.04
0





GTCAGGATGGCCGAGCGGTCT
399
0
1
1
0
0
0.04
0.04
0









A sizeable number of sequences (334) aligned to genomic regions that did not fulfill the criteria for miRNA precursors (FIG. 24). About 80% of these sequences were annotated or cloned only once and may represent degradation products originating from other RNA species (FIG. 24 and Table 8). The remaining 58 sequences, however, mapped to genomic regions that lack annotations and may therefore represent a part of the transcriptome whose functions are unknown (Table 9 and Table 10). Interestingly, several of these non-annotated sequences (e.g., CU-5004, CU-5021, CU-6030, CU-6069) were cloned multiple times and showed differential expression across libraries (Table 9 and Table 10), suggesting that they may represent short RNAs with characteristics distinct from those currently recognized in “classic” miRNAs.


In conclusion, the generation of short-RNA libraries from normal and neoplastic B cells led to the identification of 178 mature miRNAs cloned multiple times as well as other short-RNA species of unknown function.
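The distinction drawn above between sequences cloned multiple times and sequences observed only once can be illustrated with a minimal Python sketch. It is not the pipeline used in this work: the `ShortRna` class, the `cloned_multiple_times` helper and the choice to apply the threshold of 2 to the corrected counts summed over all four libraries are assumptions made for illustration, and the third record is a hypothetical singleton rather than an entry from the tables. The first two records reuse counts listed in Table 7 (Part B).

```python
from dataclasses import dataclass
from typing import Dict

# Library codes as used in Table 7 (Part B):
# naive (N), memory (M), centroblasts (CB), Ramos (RA).
LIBRARIES = ("N", "M", "CB", "RA")


@dataclass(frozen=True)
class ShortRna:
    """Hypothetical record holding one table row (illustration only)."""
    seq_id: int
    sequence: str
    counts: Dict[str, float]  # corrected count per library

    def total_count(self) -> float:
        # Sum the corrected counts over the four libraries.
        return sum(self.counts.get(lib, 0.0) for lib in LIBRARIES)


def cloned_multiple_times(rna: ShortRna, min_clones: float = 2.0) -> bool:
    # Assumption: the "cloned at least 2 times" criterion is applied to the
    # corrected counts summed over all libraries.
    return rna.total_count() >= min_clones


rows = [
    ShortRna(20, "TGTGCAAATCCATGCAAAACTGA", {"N": 0, "M": 1, "CB": 25, "RA": 65}),
    ShortRna(88, "TGACCGATTTCTCCTGGTGTT", {"N": 2, "M": 0, "CB": 0, "RA": 0}),
    # Hypothetical singleton, not taken from the tables.
    ShortRna(0, "ACGTACGTACGTACGTACGT", {"N": 0, "M": 0, "CB": 1, "RA": 0}),
]

multiple = [r.seq_id for r in rows if cloned_multiple_times(r)]
single = [r.seq_id for r in rows if not cloned_multiple_times(r)]
print(multiple, single)
```

Under these assumptions the first two records are classified as cloned multiple times and the hypothetical third record as observed only once.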









TABLE 8







Characterization of short-RNA libraries. Number of non-redundant short-RNAs cloned in each library (naïve B cells, memory B cells, centroblasts and Ramos cell line) and overall (total). Each short-RNA is annotated according to the listed RNA species. Results shown here refer only to short-RNAs with matches to the human genome. The same short-RNA might match multiple databases, and therefore the overall sum does not correspond to the total number of short-RNAs.












RNA species
Naïve
Memory
Centroblasts
Ramos
Total















Total (non redundant)
680
709
740
740
2086


miRNA
424
408
528
538
1259


miRNA other*
1
0
3
0
4


tRNA
27
33
32
29
108


rRNA
61
99
34
16
174


mRNA
76
72
25
34
176


snoRNA
8
13
15
6
40


yRNA
11
11
31
21
53


piRNA
46
54
70
62
148


Repeats
1
1
0
1
2


Mitochondrial genome
12
36
54
11
101


Human viruses
1
4
0
0
5



E. coli
5
4
0
0
7


Not Annotated
111
119
97
134
375





*miRNA other: includes fragments of miRNA precursors, not mature.


The databases used in this analysis are detailed in Supplementary Methods.
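The note to Table 8 that one short-RNA may match several databases, so that per-category counts do not add up to the number of short-RNAs, can be illustrated with a short Python sketch. It parses semicolon-separated annotation strings of the form listed in Tables 7 and 9. The `parse_annotations` helper is hypothetical, and the sketch assumes only that each tag has the form source-action (for example `tRNAprefix-annotate`); it makes no claim about what the `annotate` and `eliminate` actions mean in the original pipeline.

```python
from collections import Counter
from typing import List, Tuple


def parse_annotations(field: str) -> List[Tuple[str, str]]:
    """Split 'a-annotate;b-eliminate;' into [(source, action), ...].

    Entries marked 'NEW' carry no database match and yield an empty list.
    """
    tags = []
    for token in field.split(";"):
        token = token.strip()
        if not token or token == "NEW":
            continue
        source, sep, action = token.rpartition("-")
        tags.append((source, action) if sep else (token, ""))
    return tags


# Annotation strings copied from Table 7 (Part A).
records = {
    "CU-1251": "tRNAprefix-annotate;refseqGeneIntron-annotate;",
    "CU-1278": "refseqGeneIntron-annotate",
    "CU-1380": "piRNA-annotate;refseqGeneIntron-annotate",
    "CU-1132": "NEW",
}

# Count each short-RNA once per database it matches.
per_source = Counter()
for field in records.values():
    for source in {source for source, _ in parse_annotations(field)}:
        per_source[source] += 1

annotated = sum(1 for field in records.values() if parse_annotations(field))
print(dict(per_source), sum(per_source.values()), annotated)
```

In this small example three annotated short-RNAs produce five category matches, which is the reason the column sums in a table of this kind can exceed the number of non-redundant sequences.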













TABLE 9







(PART A) List of short-RNAs lacking genomic locations with appropriate RNA secondary structures to be defined as miRNAs. Table includes information on genomic locations and annotations.











ID
SEQ ID NO:
Short-RNA sequence
Annotations





CU-5004
1096
GAAGCGGGTGCTCTTATTTT
NEW





CU-5008
1097
GTGTAAGCAGGGTCGTTTT
NEW





CU-6003
1098
ATCCCACCGCTGCTACCA
NEW





CU-5023
1099
GGGAAGGTGACCTGAC
NEW





CU-5007
1100
CTCCCGCCTTTTTTCCC
NEW





CU-5024
1101
CGGAGCAAGAGCGT
NEW





CU-5026
1102
CCCCCGGCACCATCAATA
NEW





CU-5027
1103
CAGCCTAGCCCCTACCC
NEW





CU-5005
1104
CAGAAGGTCTCACTTTT
NEW





CU-5006
1105
AGTATTCTCTGTGGCTTT
NEW





CU-5028
1106
TGGAGTGACTATATGGATGCCCCC
NEW





CU-5029
1107
TCTGATAGCTTACTTT
NEW





CU-5030
1108
TCGAGCCCCAGTGGAACCAC
NEW





CU-5032
1109
TCCTCCCCACACTCATCGCCCTTACCA
NEW





CU-5033
1110
TATACTACAAGGACACCA
NEW





CU-5034
1111
TAGTGGGTGAAAAAAAAAAAA
NEW





CU-5035
1112
TACCACACATTCGAAGAACCCGTA
NEW





CU-5036
1113
TACAAAACCCACCCCATTCCTCCCCA
NEW





CU-5019
1114
GGAGGGGGGGTAAAAAAAA
NEW





CU-5037
1115
GCCCTCCTAATGACCTCC
NEW





CU-5038
1116
CTTCCCTCTACACTTATCATC
NEW





CU-5039
1117
CGGGCGGCCTGCGCTCTCA
NEW





CU-5040
1118
CCCGAGGCCGTGTGCAAATGCAT
NEW





CU-5020
1119
CCCCGGCATCTCCACC
NEW





CU-5041
1120
CCCCCAGTACCTCCACCA
NEW





CU-5042
1121
CCCCCACTGCTAAACTTGACTGGCTTT
NEW





CU-5043
1122
CCCACTCCACCTTACTACCA
NEW





CU-5044
1123
CCCAAGAACAGGGTGACCA
NEW





CU-5045
1124
CCAGTCGCGGCCAAATCA
NEW





CU-5046
1125
CCAGCTTCACCAAGGTATTGGTTA
NEW





CU-5047
1126
CCAGAAAAAACAGGCCTC
NEW





CU-5048
1127
CATCATAATCGGAGGCTTTGGCAAC
NEW





CU-5049
1128
CAGCAGGGGTAATAAGTGAAATCAAA
NEW





CU-5050
1129
CAATGGTGCAGCCGCTATTAAAGGTTCA
NEW





CU-5051
1130
CAACTCCTACATACTTCCCCC
NEW





CU-5053
1131
ATGCATCTCATATGCGAATAGGAATGC
NEW





CU-5054
1132
ATCCCACTTCTGTACCA
NEW





CU-5055
1133
ATAACACTAGAAAGTTGGGGCAGATTGC
NEW





CU-5056
1134
ACGTGGGCACATTACCCGTCTGACCTGA
NEW





CU-5057
1135
ACCCCTTATTAACCCA
NEW





CU-5058
1136
ACAAGGCACACCTACACCCCTTATCCC
NEW





CU-5059
1137
AAAAGACACCCCCCCACCA
NEW





CU-5060
1138
AAAACCCCTACGCATTTATAT
NEW





CU-5061
1139
AAAAAGACACCCCCCACCA
NEW





CU-5011
1140
GCTAAACCTAGCCCCAAACCC
piRNA-annotate;refseqGeneIntron-annotate





CU-5003
1141
ACCCCACTCCTGGTACCA
refseqGeneIntron-annotate





CU-5009
1142
TGCCCCCATGTCTAACAACATGGCTA
refseqGeneIntron-annotate;rnaGene-annotate





CU-5062
1143
CCCCGCCTGTTTACC
refseqGeneIntron-annotate





CU-5063
1144
CCCACTTCTGACACCA
computGene-annotate;refseqGeneIntron-annotate;





exEID-annotate





CU-5064
1145
CACCACCTCTTGCTCAGCC
mRNA-annotate;refseqGeneIntron-annotate





CU-5014
1146
CTGGAAAGTGCACTTGGACGAACA
refseqGeneIntron-annotate





CU-5065
1147
TGACCGCTCTGACCAC
refseqGeneIntron-annotate





CU-5066
1148
TGAAGTCCCTTTGCTTTGTT
refseqGeneIntron-annotate





CU-5067
1149
TGAACACACAATAGCTAAGACCC
mRNA-annotate;refseqGeneIntron-annotate





CU-5068
1150
TCGCCTTACCCCCCACTA
refseqGeneIntron-annotate





CU-5069
1151
TCGATAAACCCCGATCAACCT
mRNA-annotate;refseqGeneIntron-annotate





CU-5070
1152
TCCCCGTCACCTCCACCA
refseqGeneIntron-annotate





CU-5071
1153
TCCCCGGCACTCCACCA
refseqGeneIntron-annotate





CU-5072
1154
TCCCCCCGCTGCCACCA
refseqGeneIntron-annotate





CU-5073
1155
TCCCCCCCATCTCCACCA
refseqGeneIntron-annotate





CU-5074
1156
TACACACCGCCCGTCACCC
mRNA-annotate;refseqGeneIntron-annotate





CU-5076
1157
GCTTAGCCTAGCCACACCCCCACG
mRNA-annotate;refseqGeneIntron-annotate





CU-5077
1158
GCTCGCCAGAACACTACGA
mRNA-annotate;refseqGeneIntron-annotate





CU-5078
1159
GCCGGGGGGCGGGCGCA
refseqGeneIntron-annotate





CU-5079
1160
GAACCGGGCGGGAACACCA
refseqGeneIntron-annotate





CU-5080
1161
CGCCGCAGTACTGATCATTC
refseqGeneIntron-annotate





CU-5081
1162
CCGCACCAATAGGATCCTCC
refseqGeneIntron-annotate





CU-5082
1163
CCCGGCCGACGCACCA
refseqGeneIntron-annotate





CU-5083
1164
CCACCCCATCATACTCTTTC
refseqGeneIntron-annotate





CU-5084
1165
CACCCCCCAGCTCCTCCTTT
refseqGeneIntron-annotate





CU-5085
1166
ATAAGTAACATGAAAACATTCTCCTC
refseqGeneIntron-annotate





CU-5086
1167
ACTGCTCGCCAGAACAC
mRNA-annotate;refseqGeneIntron-annotate





CU-5087
1168
ACCCTGGTGTGGGATCTGCCCGATC
refseqGeneIntron-annotate





CU-5088
1169
AACCTCACCACCTCTTTCT
refseqGeneIntron-annotate





CU-5089
1170
AAAAGACACCCCCCACACCA
refseqGeneIntron-annotate





CU-5021
1171
ACCGGGCGGAAACACCA
tRNAprefix-annotate





CU-5022
1172
TCCCGGGTTCAAATCCCGGACGAGCCCCCA
tRNAprefix-annotate





CU-5010
1173
GGCCGTGATCGTATA
piRNA-annotate





CU-5025
1174
CCCCGTACTGGCCACCA
tRNAprefix-annotate





CU-5090
1175
TGGGATGCGAGAGGTCCCGGGT
rnaGene-annotate





CU-5031
1176
TCGAATCCTGTTCGTGACGCCA
tRNAprefix-annotate





CU-5091
1177
CTGAACTCCTCACACCC
piRNA-annotate





CU-5052
1178
ATTCAAAAAAGAGTACCA
tRNAprefix-annotate





CU-5092
1179
ATTAATCCCCTGGCCCAACCCG
computGene-annotate





CU-5093
1180
AGCCCCAAACCCACTCCAC
piRNA-annotate





CU-5094
1181
CGCGACCTCAGATCAGAC
rRNA-eliminate;piRNA-annotate;





refseqGeneIntron-annotate





CU-5013
1182
GGCCGGTGATGAGAACT
mRNAall-annotate;refseqGeneExon-eliminate;





refseqGeneIntron-annotate;wgRNA-annotate;





snoRNA-annotate





CU-5015
1183
TCAAGTGATGTCATCTTACTACTGAGA
mRNAall-annotate;snoRNA-annotate;





refseqGeneExon-eliminate;rnaGene-annotate;





refseqGeneIntron-annotate;snoRNA-eliminate;





wgRNA-annotate





CU-5095
1184
TTGGGTGCGAGAGGTCCCGGGT
tRNAcomputational-annotate;tRNA-eliminate;





HStRNA-eliminate;rnaGene-annotate





CU-5096
1185
TCTCGGTGGGACCTCCA
tRNAprefix-annotate;refseqGeneExon-eliminate





CU-5097
1186
CCGCCCCCCGTTCCCCC
rRNA-eliminate





CU-5098
1187
CCCACTGCTAAATTTGACTGGCTT
mRNAall-annotate;yRNA-eliminate;





refseqGeneIntron-annotate;rnaGene-annotate





CU-5099
1188
ACAGACCAAGAGCCTTC
tRNA-eliminate;rnaGene-annotate





CU-5100
1189
TGTAGTAGTCAATTAATGGATATTA
refseqGeneExon-eliminate





CU-5101
1190
TGGTTATCACGTTCGCCTCACACGCGA
tRNAcomputational-annotate;tRNA-eliminate;





HStRNA-eliminate;rnaGene-annotate





CU-5102
1191
TGGGAATACCGGGTG
rRNA-eliminate;rnaGene-annotate;piRNA-annotate;refseqGeneIntron-annotate





CU-5103
1192
TGGCGGCCAAGCGTTCATAGCGACGTC
rRNA-eliminate;refseqGeneIntron-annotate;





rnaGene-annotate





CU-5104
1193
TCGTCATCCAGCTAAGGGCTCAGA
mRNAall-annotate;refseqGeneExon-eliminate;





exEID-annotate





CU-5105
1194
TCGCCTGCCACGCGGGAGGCCCGGGT
rnaGene-annotate;tRNAcomputational-annotate;





tRNA-eliminate;refseqGeneIntron-annotate;





mRNA-annotate;HStRNA-eliminate





CU-5106
1195
TCCCACTGCTTCACTTGA
yRNA-eliminate;refseqGeneIntron-annotate;





rnaGene-annotate





CU-5107
1196
GTTTAGACGGGCTCACATCACCCCA
tRNA-eliminate;piRNA-annotate;





refseqGeneIntron-annotate





CU-5075
1197
GGCCGGTGATGAGAACTTCTCCC
mRNAall-annotate;refseqGeneExon-eliminate;





refseqGeneIntron-annotate;wgRNA-annotate;





snoRNA-annotate





CU-5108
1198
GCTAACTCATGCCCCCATGTC
tRNA-eliminate;refseqGeneIntron-annotate;





rnaGene-annotate





CU-5109
1199
GACTGTGGTGGTTGAATATA
mRNAall-annotate;computGene-annotate;





refseqGeneExon-eliminate;exEID-annotate





CU-5110
1200
CGCGACCTCAGATCAGACGTGGCGACC
rRNA-eliminate;piRNA-annotate;





refseqGeneIntron-annotate





CU-5111
1201
CGCCGCCGCCCCCCC
mRNAall-annotate;refseqGeneExon-eliminate;





refseqGeneIntron-annotate;exEID-annotate





CU-5112
1202
CGCCCGACTACCACCACATCCA
mRNAall-annotate;computGene-annotate;





refseqGeneExon-eliminate;exEID-annotate





CU-5113
1203
CCCCCCTCCACGCGCCC
rRNA-eliminate;refseqGeneIntron-annotate





CU-5114
1204
CCCCACCCCGCGCCCTC
rRNA-eliminate;refseqGeneIntron-annotate





CU-5115
1205
CAGAGTGTAGCTTAACACAAAGCACCCAA
tRNA-eliminate;piRNA-annotate;rnaGene-annotate





CU-5116
1206
CAATCTTGGCATGTTGGTCTGGTCACCCA
mRNAall-annotate;refseqGeneExon-eliminate;





exEID-annotate





CU-5117
1207
CAAAGCATCGCGAAGGCCC
mRNAall-annotate;rRNA-eliminate;piRNA-annotate;rnaGene-annotate





CU-5118
1208
AACACCCTGATTGCTCCTGTCTGAT
mRNAall-annotate;exEID-annotate;snoRNA-annotate;refseqGeneExon-eliminate;rnaGene-annotate;snoRNA-eliminate;wgRNA-annotate





CU-5119
1209
AAAAAGGGCCTAAAGAAGATGCA
mRNAall-annotate;computGene-annotate;





refseqGeneExon-eliminate;refseqGeneIntron-annotate;exEID-annotate
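The annotation column above packs several tags into one cell, separated by semicolons, with most tags ending in `-annotate` or `-eliminate` (entries such as `NEW` carry no suffix). As a minimal sketch, and not part of the disclosed pipeline, such a cell could be split into (source, action) pairs as shown below. The function name `parse_annotations` is hypothetical, and treating the suffix as an action label is an assumption read off the table layout.

```python
from typing import List, Tuple

# Suffixes observed in the annotation column; treating them as "actions"
# is an assumption made for illustration only.
KNOWN_ACTIONS = ("annotate", "eliminate")


def parse_annotations(cell: str) -> List[Tuple[str, str]]:
    """Split an annotation cell into (source, action) pairs.

    Tags without a recognized suffix (e.g. 'NEW' or a miRBase entry such as
    'Mature:hsa-miR-24:MIMAT0000080') are returned with an empty action.
    """
    pairs = []
    for tag in cell.split(";"):
        tag = tag.strip()
        if not tag:
            continue  # skip the empty piece left by a trailing semicolon
        for action in KNOWN_ACTIONS:
            suffix = "-" + action
            if tag.endswith(suffix):
                pairs.append((tag[: -len(suffix)], action))
                break
        else:
            pairs.append((tag, ""))
    return pairs


def has_eliminate_tag(cell: str) -> bool:
    """Return True if any tag in the cell ends in '-eliminate'."""
    return any(action == "eliminate" for _, action in parse_annotations(cell))
```

For example, `parse_annotations("rRNA-eliminate;piRNA-annotate")` yields the pairs `("rRNA", "eliminate")` and `("piRNA", "annotate")`.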
















TABLE 9







(PART B) List of short-RNA lacking genomic locations with appropriate RNA secondary structures to be defined as miRNAs.


Table includes information on counts.











Corrected Counts












SEQ ID

Naïve
Memory
Centroblasts
Ramos


NO:
Short-RNA sequence
(N)
(M)
(CB)
(RA)















1210
GAAGCGGGTGCTCTTATTTT
5
23
25
224





1211
GTGTAAGCAGGGTCGTTTT
0
0
0
7





1212
ATCCCACCGCTGCTACCA
0
1
0
2





1213
GGGAAGGTGACCTGAC
2
0
0
0





1214
CTCCCGCCTTTTTTCCC
0
2
0
0





1215
CGGAGCAAGAGCGT
2
0
0
0





1216
CCCCCGGCACCATCAATA
0
0
1
1





1217
CAGCCTAGCCCCTACCC
0
2
0
0





1218
CAGAAGGTCTCACTTTT
0
1
0
1





1219
AGTATTCTCTGTGGCTTT
0
0
0
2





1220
TGGAGTGACTATATGGATGCCCCC
0
0
1
0





1221
TCTGATAGCTTACTTT
0
1
0
0





1222
TCGAGCCCCAGTGGAACCAC
0
0
1
0





1223
TCCTCCCCACACTCATCGCCCTTACCA
0
0
1
0





1224
TATACTACAAGGACACCA
0
0
0
1





1225
TAGTGGGTGAAAAAAAAAAAA
0
0
0
1





1226
TACCACACATTCGAAGAACCCGTA
0
0
1
0





1227
TACAAAACCCACCCCATTCCTCCCCA
0
1
0
0





1228
GGAGGGGGGGTAAAAAAAA
0
0
1
0





1229
GCCCTCCTAATGACCTCC
0
0
1
0





1230
CTTCCCTCTACACTTATCATC
0
0
1
0





1231
CGGGCGGCCTGCGCTCTCA
1
0
0
0





1232
CCCGAGGCCGTGTGCAAATGCAT
0
0
1
0





1233
CCCCGGCATCTCCACC
1
0
0
0





1234
CCCCCAGTACCTCCACCA
0
1
0
0





1235
CCCCCACTGCTAAACTTGACTGGCTTT
0
0
1
0





1236
CCCACTCCACCTTACTACCA
0
0
0
1





1237
CCCAAGAACAGGGTGACCA
0
0
0
1





1238
CCAGTCGCGGCCAAATCA
0
1
0
0





1239
CCAGCTTCACCAAGGTATTGGTTA
0
0
1
0





1240
CCAGAAAAAACAGGCCTC
0
0
0
1





1241
CATCATAATCGGAGGCTTTGGCAAC
0
0
1
0





1242
CAGCAGGGGTAATAAGTGAAATCAAA
0
0
1
0





1243
CAATGGTGCAGCCGCTATTAAAGGTTCA
0
0
0
1





1244
CAACTCCTACATACTTCCCCC
1
0
0
0





1245
ATGCATCTCATATGCGAATAGGAATGC
0
0
1
0





1246
ATCCCACTTCTGTACCA
0
1
0
0





1247
ATAACACTAGAAAGTTGGGGCAGATTGC
0
0
1
0





1248
ACGTGGGCACATTACCCGTCTGACCTGA
0
0
0
1





1249
ACCCCTTATTAACCCA
0
1
0
0





1250
ACAAGGCACACCTACACCCCTTATCCC
0
0
1
0





1251
AAAAGACACCCCCCCACCA
0
0
0
1





1252
AAAACCCCTACGCATTTATAT
0
0
1
0





1253
AAAAAGACACCCCCCACCA
0
0
0
1





1254
GCTAAACCTAGCCCCAAACCC
9
16
13
18





1255
ACCCCACTCCTGGTACCA
1
11
5
6





1256
TGCCCCCATGTCTAACAACATGGCTA
7
4
1
1





1257
CCCCGCCTGTTTACC
0
5
2
0





1258
CCCACTTCTGACACCA
3
4
0
0





1259
CACCACCTCTTGCTCAGCC
1
3
0
0





1260
CTGGAAAGTGCACTTGGACGAACA
0
2
0
0





1261
TGACCGCTCTGACCAC
0
1
0
0





1262
TGAAGTCCCTTTGCTTTGTT
1
0
0
0





1263
TGAACACACAATAGCTAAGACCC
0
0
1
0





1264
TCGCCTTACCCCCCACTA
0
1
0
0





1265
TCGATAAACCCCGATCAACCT
0
0
1
0





1266
TCCCCGTCACCTCCACCA
0
0
1
0





1267
TCCCCGGCACTCCACCA
0
0
1
0





1268
TCCCCCCGCTGCCACCA
1
0
0
0





1269
TCCCCCCCATCTCCACCA
0
0
1
0





1270
TACACACCGCCCGTCACCC
0
0
1
0





1271
GCTTAGCCTAGCCACACCCCCACG
0
0
1
0





1272
GCTCGCCAGAACACTACGA
0
0
1
0





1273
GCCGGGGGGCGGGCGCA
0
1
0
0





1274
GAACCGGGCGGGAACACCA
0
0
0
1





1275
CGCCGCAGTACTGATCATTC
0
0
1
0





1276
CCGCACCAATAGGATCCTCC
0
1
0
0





1277
CCCGGCCGACGCACCA
1
0
0
0





1278
CCACCCCATCATACTCTTTC
0
0
1
0





1279
CACCCCCCAGCTCCTCCTTT
1
0
0
0





1280
ATAAGTAACATGAAAACATTCTCCTC
0
0
1
0





1281
ACTGCTCGCCAGAACAC
0
0
1
0





1282
ACCCTGGTGTGGGATCTGCCCGATC
0
0
1
0





1283
AACCTCACCACCTCTTTCT
0
0
1
0





1284
AAAAGACACCCCCCACACCA
0
0
0
1





1285
ACCGGGCGGAAACACCA
9
14
60
20





1286
TCCCGGGTTCAAATCCCGGACGAGCCCCCA
0
0
4
37





1287
GGCCGTGATCGTATA
2
0
0
0





1288
CCCCGTACTGGCCACCA
2
0
0
0





1289
TGGGATGCGAGAGGTCCCGGGT
0
0
0
1





1290
TCGAATCCTGTTCGTGACGCCA
0
0
0
1





1291
CTGAACTCCTCACACCC
0
1
0
0





1292
ATTCAAAAAAGAGTACCA
0
0
1
0





1293
ATTAATCCCCTGGCCCAACCCG
0
0
0
1





1294
AGCCCCAAACCCACTCCAC
0
0
1
0





1295
CGCGACCTCAGATCAGAC
1
5
8
1





1296
GGCCGGTGATGAGAACT
4
3
0
0





1297
TCAAGTGATGTCATCTTACTACTGAGA
0
0
3
1





1298
TTGGGTGCGAGAGGTCCCGGGT
0
0
0
3





1299
TCTCGGTGGGACCTCCA
0
2
0
0





1300
CCGCCCCCCGTTCCCCC
1
1
0
0





1301
CCCACTGCTAAATTTGACTGGCTT
0
0
1
1





1302
ACAGACCAAGAGCCTTC
0
0
2
0





1303
TGTAGTAGTCAATTAATGGATATTA
0
0
1
0





1304
TGGTTATCACGTTCGCCTCACACGCGA
0
0
0
1





1305
TGGGAATACCGGGTG
0
0
1
0





1306
TGGCGGCCAAGCGTTCATAGCGACGTC
0
0
0
1





1307
TCGTCATCCAGCTAAGGGCTCAGA
0
0
1
0





1308
TCGCCTGCCACGCGGGAGGCCCGGGT
0
0
1
0





1309
TCCCACTGCTTCACTTGA
0
0
0
1





1310
GTTTAGACGGGCTCACATCACCCCA
0
0
1
0





1311
GGCCGGTGATGAGAACTTCTCCC
1
0
0
0





1312
GCTAACTCATGCCCCCATGTC
0
0
1
0





1313
GACTGTGGTGGTTGAATATA
0
0
0
1





1314
CGCGACCTCAGATCAGACGTGGCGACC
0
0
1
0





1315
CGCCGCCGCCCCCCC
0
1
0
0





1316
CGCCCGACTACCACCACATCCA
1
0
0
0





1317
CCCCCCTCCACGCGCCC
0
1
0
0





1318
CCCCACCCCGCGCCCTC
0
1
0
0





1319
CAGAGTGTAGCTTAACACAAAGCACCCAA
0
0
1
0





1320
CAATCTTGGCATGTTGGTCTGGTCACCCA
0
0
1
0





1321
CAAAGCATCGCGAAGGCCC
0
0
1
0





1322
AACACCCTGATTGCTCCTGTCTGAT
0
0
1
0





1323
AAAAAGGGCCTAAAGAAGATGCA
0
0
1
0
















TABLE 10







(PART A) List of short-RNA consensus with maximum 1 mismatch to the human genome.


Table includes information on genomic locations and annotations.











SEQ ID




ID
NO:
Short-RNA sequence
Annotations





CU-6232
1324
TGGCTCAGTTCAGCAGGAACAGT
Mature:hsa-miR-24:MIMAT0000080





CU-6180
1325
GTGGGGGAGAGGCTGTCGA
Mature:hsa-miR-1275:MIMAT0005929





CU-6130
1326
CGGGGCAGCTCAGTACAGGATT
Mature:hsa-miR-486-3p:MIMAT0004762





CU-6044
1327
AATTGCACGGTATCCATCTGTAT
Mature:hsa-miR-363:MIMAT0000707





CU-6239
1328
TGTCAGTTTGTTAATTGACCCAA
NEW





CU-6006
1329
GGCAATACGAGCACCCTG
NEW





CU-6133
1330
CGGGGGAGCGCCGCGTA
NEW





CU-6004
1331
CCGGGGCGTCTCGTAC
NEW





CU-6056
1332
AGCGGCTGTGCACAAA
NEW





CU-6242
1333
TGTCAGTTTGTTTAATCCAA
NEW





CU-6241
1334
TGTCAGTTTGTTATTACCAA
NEW





CU-6237
1335
TGTCAGGCACCATCAATAA
NEW





CU-6225
1336
TGATCTTGACACTTAAAGCC
NEW





CU-6219
1337
TCGTAGGCACCATCAAT
NEW





CU-6215
1338
TCGATCCCGGGTTTCGGCACCA
NEW





CU-6211
1339
TCGACTCCCGGTATGGGAACCA
NEW





CU-6187
1340
TAGGGAGGTTATGATTAACTTTT
NEW





CU-6183
1341
TAAAGTGCTTAGTGCAGGTA
NEW





CU-6181
1342
GTTTATGTTGCTTACCTCC
NEW





CU-6176
1343
GTAGATAAAATATTGGCG
NEW





CU-6163
1344
GGCGGGGACGACGTCAG
NEW





CU-6162
1345
GGCGGCGTCGCGGCGGGTC
NEW





CU-6161
1346
GGAGGGGGTGAACAAAAAGAAAAA
NEW





CU-6159
1347
GCTAAACCTAGCCCCAAACCCACTCCACA
NEW





CU-6142
1348
CTGGATAGCGCACTTCGTT
NEW





CU-6129
1349
CGGGCGAGGGGCGGACGTTCG
NEW





CU-6123
1350
CGGACCTATACCGGA
NEW





CU-6096
1351
CCCCGGGTTCAATCCCCGGCACCTCCACCA
NEW





CU-6088
1352
CCCCCCACAACCGCGAA
NEW





CU-6087
1353
CCCAGCATCTCCTGTGTTTA
NEW





CU-6086
1354
CCCACGTTGGGACGCCA
NEW





CU-6072
1355
ATCGTATCCCACTTCTGACACCA
NEW





CU-6064
1356
ATCACGTCCGTGCCTCCA
NEW





CU-6063
1357
ATAGCAATGTCAGCAGTACCT
NEW





CU-6051
1358
ACCCTGCTCGCTGCGCCA
tRNAprefix-annotate;refseqGeneIntron-annotate





CU-6198
1359
TCCCACCCAGGGACGCCA
tRNAprefix-annotate;refseqGeneIntron-annotate





CU-6218
1360
TCGTAGGCACATCAATA
refseqGeneIntron-annotate





CU-6007
1361
CCCCCACAACCGCGTA
refseqGeneIntron-annotate





CU-6001
1362
ACCCCGTCCGTGCCTCCA
tRNAprefix-annotate;refseqGeneIntron-annotate





CU-6039
1363
AAAAAAGACACCCCCCACA
refseqGeneIntron-annotate





CU-6005
1364
TGTCAGTTTGTTAACCCAA
refseqGeneIntron-annotate





CU-6204
1365
TCCCTGTGGTCTAGTGGTTAGG
refseqGeneIntron-annotate





CU-6172
1366
GGGGGGGTAAAAAAA
refseqGeneIntron-annotate





CU-6171
1367
GGGGGGGGAAAAAAAA
refseqGeneIntron-annotate





CU-6128
1368
CGGGCCCGGGTCTTCCC
refseqGeneIntron-annotate





CU-6002
1369
CCGCCCCCCGTTCCCCCCA
refseqGeneIntron-annotate





CU-6050
1370
ACCCCCGGCTCCTCCACCA
tRNAprefix-annotate;refseqGeneIntron-annotate





CU-6244
1371
TTTGGTGGAAATTTTTTGA
refseqGeneIntron-annotate





CU-6240
1372
TGTCAGTTTGTTATACCAA
refseqGeneIntron-annotate





CU-6238
1373
TGTCAGTTTGTAATTATCCCAA
refseqGeneIntron-annotate





CU-6236
1374
TGTCAATTTTTAACCCAA
refseqGeneIntron-annotate





CU-6227
1375
TGCTAGGGTAAAAAAAAAA
refseqGeneIntron-annotate





CU-6226
1376
TGCAACTCCAAATAAAAGTACCA
tRNAprefix-annotate;refseqGeneIntron-annotate





CU-6224
1377
TGAGGTAACGGGGAATTA
refseqGeneIntron-annotate





CU-6209
1378
TCCTCGGCATCTCCACCA
refseqGeneIntron-annotate





CU-6197
1379
TCATATGAAGTCACCCTAGCCATC
mitochondrion-annotate;refseqGeneIntron-annotate





CU-6196
1380
TCAGTTTGTTTATTAACCCAA
refseqGeneIntron-annotate





CU-6195
1381
TCAGCGTGTCTTTGCCCT
refseqGeneIntron-annotate





CU-6194
1382
TCACTGGTGGTCTAGTGGT
refseqGeneIntron-annotate;rnaGene-annotate





CU-6193
1383
TCACAATGCTGCCACCA
refseqGeneIntron-annotate





CU-6189
1384
TAGTTGTTAATTAACCCAA
refseqGeneIntron-annotate





CU-6188
1385
TAGTCCTCATCGCCCTCC
mitochondrion-annotate;refseqGeneIntron-annotate





CU-6184
1386
TAAAGTGCTTATAGTGCGGGTAA
refseqGeneIntron-annotate





CU-6179
1387
GTCCCACCAGAGTCGCCA
tRNAprefix-annotate;refseqGeneIntron-annotate





CU-6170
1388
GGGGGAGGGGCCAAAAAAA
refseqGeneIntron-annotate





CU-6167
1389
GGGACGCCGCGGTGTCG
refseqGeneIntron-annotate





CU-6166
1390
GGGAATACCGGGTGCTTTAGGCTT
refseqGeneIntron-annotate;rnaGene-annotate





CU-6160
1391
GGAAGAAGGTGGTGGTATA
refseqGeneIntron-annotate





CU-6156
1392
GCGGTGAAATGCGTA
computGene-annotate;Ecoli-annotate;





refseqGeneIntron-annotate





CU-6154
1393
GCGGGGAAGGTGGCAAA
refseqGeneIntron-annotate





CU-6152
1394
GCGACGACCTCGCGCCCACCTGGTCA
refseqGeneIntron-annotate





CU-6151
1395
GCCACCCGATACTGCTGT
refseqGeneIntron-annotate





CU-6150
1396
GATGTATGCTTTGTTTCTGTT
refseqGeneIntron-annotate





CU-6148
1397
GAGGGGGATTTAGAAAAAAA
refseqGeneIntron-annotate





CU-6147
1398
GAAGGAAAGTTCTATAGT
refseqGeneIntron-annotate





CU-6146
1399
GAAGCGGCTCTCTTATTT
refseqGeneIntron-annotate





CU-6145
1400
GAACGAGACTCTGGCATGCTGA
refseqGeneIntron-annotate;rnaGene-annotate





CU-6143
1401
CTGGTAGGCCCATCAAT
refseqGeneIntron-annotate





CU-6132
1402
CGGGGCCGATCGCGCGC
computGene-annotate;refseqGeneIntron-annotate





CU-6125
1403
CGGCCCCGGGTTCCTCCC
computGene-annotate;refseqGeneIntron-annotate





CU-6118
1404
CGAGCCCGGTTAGTA
refseqGeneIntron-annotate;rnaGene-annotate





CU-6117
1405
CGACTCTTAGCGGTGGA
piRNA-annotate;refseqGeneIntron-annotate





CU-6116
1406
CGAATCCCACTTCTGACACCA
tRNAprefix-annotate;refseqGeneIntron-annotate





CU-6113
1407
CGAAAGGGAATCGGGTC
refseqGeneIntron-annotate





CU-6112
1408
CCTTAGGTCGCTGGTAAA
refseqGeneIntron-annotate





CU-6108
1409
CCGTGCGAGAATACCA
tRNAprefix-annotate;refseqGeneIntron-annotate





CU-6107
1410
CCGGTCTCTCAAGCGGCC
refseqGeneIntron-annotate





CU-6099
1411
CCCGGCCCTCGCGCGTCC
computGene-annotate;refseqGeneIntron-annotate





CU-6094
1412
CCCCGGCATTTCCACCA
computGene-annotate;refseqGeneIntron-annotate





CU-6090
1413
CCCCCCCGGCTCCTCCACCA
refseqGeneIntron-annotate





CU-6089
1414
CCCCCCACAACCGCTA
refseqGeneIntron-annotate





CU-6085
1415
CCCAAGTATTGACTCACCC
mitochondrion-annotate;refseqGeneIntron-annotate





CU-6084
1416
CCAGTAAGCGCGAGTC
refseqGeneIntron-annotate





CU-6082
1417
CCAAAGAAAGCACGTAGAG
refseqGeneIntron-annotate





CU-6081
1418
CATGTTTAACGGCCGCGGT
mitochondrion-annotate;refseqGeneIntron-annotate





CU-6080
1419
CAGTTTGTAATTAACCCAA
refseqGeneIntron-annotate





CU-6079
1420
CAGGAACGGCGCACCA
computGene-annotate;refseqGeneIntron-annotate





CU-6078
1421
CAGAACCCTCTAAATCCCC
mitochondrion-annotate;refseqGeneIntron-annotate





CU-6076
1422
CACCCGGCTGTGTGCACATGTGT
miRBASE-annotate;computGene-annotate;





refseqGeneIntron-annotate;wgRNA-annotate





CU-6075
1423
CAATTGGACCAATCTATC
mitochondrion-annotate;refseqGeneIntron-annotate





CU-6074
1424
ATTCCTGTACTGCGATA
refseqGeneIntron-annotate





CU-6070
1425
ATCCCTGCGGCGTCTCCA
refseqGeneIntron-annotate





CU-6067
1426
ATCCCACCGCTGCCATCA
refseqGeneIntron-annotate





CU-6062
1427
AGTCAATAGAAGCCGGCGTA
mitochondrion-annotate;refseqGeneIntron-annotate





CU-6061
1428
AGGTTCGTTTGTAAAAA
refseqGeneIntron-annotate





CU-6060
1429
AGGTCCTGGGTTTAAGTGT
computGene-annotate;refseqGeneIntron-annotate





CU-6058
1430
AGGGGGAAGTTCTATAGTC
refseqGeneIntron-annotate





CU-6057
1431
AGGCTGTGATGCTCTCNTGAGCCCT
refseqGeneIntron-annotate





CU-6055
1432
AGCCCCTCTCCGGCCCTTA
refseqGeneIntron-annotate





CU-6054
1433
ACTACCACCTACCTCCC
mitochondrion-annotate;refseqGeneIntron-annotate





CU-6052
1434
ACGCCCTTCCCCCCCTTCTTT
miRBASE-annotate;refseqGeneIntron-annotate





CU-6049
1435
ACCCCACTCCTGGTGCAC
refseqGeneIntron-annotate





CU-6048
1436
ACCACCTGATCCCTTCCC
refseqGeneIntron-annotate





CU-6047
1437
ACAGCTAAGCACCCACCA
refseqGeneIntron-annotate





CU-6045
1438
ACACATGTTTAACGGCC
mitochondrion-annotate;refseqGeneIntron-annotate





CU-6043
1439
AATTAGGGACCTGTATG
refseqGeneIntron-annotate





CU-6042
1440
AATGGCCCATTTGGGCAAACA
computGene-annotate;refseqGeneIntron-annotate





CU-6041
1441
AAAGCGGCTGTGCAAACA
refseqGeneIntron-annotate





CU-6030
1442
ATCCTGCCGACTACGCCA
tRNAprefix-annotate





CU-6210
1443
TCGAATCCCACTCCTGACACCA
tRNAprefix-annotate





CU-6069
1444
ATCCCATCCTCGTCGCCA
tRNAprefix-annotate





CU-6216
1445
TCGATTCCCCGACGGGGAGCCA
tRNAprefix-annotate





CU-6071
1446
ATCCGGGTGCCCCCTCCA
tRNAprefix-annotate





CU-6212
1447
TCGACTCCTGGCTGGCTCGCCA
tRNAprefix-annotate;wgRNA-annotate





CU-6202
1448
TCCCGGGCGGCGCACCA
tRNAprefix-annotate





CU-6066
1449
ATCCCACCAGAGTCGCCA
tRNAprefix-annotate





CU-6200
1450
TCCCCGGCATCTCCACCAA
computGene-annotate





CU-6192
1451
TCAAATCACGTCGGGGTCACCA
tRNAprefix-annotate





CU-6157
1452
GCGGTGGATCACTCGGCTCGTGCGT
rnaGene-annotate





CU-6214
1453
TCGATCCCCGTACGGGCCACCA
tRNAprefix-annotate





CU-6213
1454
TCGAGCCTCACCTGGAGCACCA
tRNAprefix-annotate





CU-6206
1455
TCCGGCTCGAAGGACCA
tRNAprefix-annotate





CU-6105
1456
CCGGGTGTTGTAGA
mRNAall-annotate;exEID-annotate





CU-6235
1457
TGTAGCGTGGCCGAGCGGT
rnaGene-annotate





CU-6234
1458
TGGGGCGACCTCGGAGCAG
mitochondrion-annotate





CU-6230
1459
TGGCGTCCTAAGCCAGGGATTGTGGGT
rnaGene-annotate





CU-6229
1460
TGGCAGGGGAGATACCATGATTT
rnaGene-annotate





CU-6222
1461
TCTGATCAGGGTGAGCATC
mitochondrion-annotate





CU-6220
1462
TCGTAGGCACCATCCAT
computGene-annotate





CU-6165
1463
GGGAAACGGGGCGCGGCTG
rnaGene-annotate





CU-6137
1464
CTACTCCTGCTCGCATCTGCTATA
mitochondrion-annotate





CU-6135
1465
CGGGTGGGTTTTTACCGG
computGene-annotate





CU-6120
1466
CGAGGAATTCCCAGTAAG
rnaGene-annotate





CU-6115
1467
CGAACGCACTTGCGGCCCC
rnaGene-annotate





CU-6093
1468
CCCCGCGCGGGTTCGAATC
rnaGene-annotate





CU-6059
1469
AGGGGTATGATTCCCGCTT
rnaGene-annotate





CU-6131
1470
CGGGGCCACGCGCGCGTC
mRNA-annotate;rRNA-eliminate





CU-6032
1471
TGGCGCTGCGGGATGAAC
rRNA-eliminate;refseqGeneIntron-annotate;





rnaGene-annotate





CU-1153
1472
CCCCCCACTGCTAAATTTGACTGGCTT
yRNA-eliminate;refseqGeneIntron-annotate;





rnaGene-annotate





CU-6182
1473
TAAAGGTTCGTTTGTAAAA
computGene-annotate;refseqGeneExon-eliminate





CU-6033
1474
CGGGGCCGAGGGAGCGA
rRNA-eliminate;refseqGeneExon-eliminate;





refseqGeneIntron-annotate





CU-6174
1475
GGGTTAGGCCTCTTTT
tRNA-eliminate;rnaGene-annotate





CU-6141
1476
CTGCGGAAGGATCATTA
rRNA-eliminate;rnaGene-annotate





CU-6101
1477
CCCTACCCCCCCGG
rRNA-eliminate;refseqGeneIntron-annotate





CU-6034
1478
CCCGCCGGGTCCGCCC
computGene-annotate;rRNA-eliminate;





refseqGeneExon-eliminate;refseqGeneIntron-annotate;rnaGene-annotate





CU-6035
1479
CCCCGCGCCCTCTCTCTCTC
rRNA-eliminate;refseqGeneIntron-annotate





CU-6028
1480
CAGGCCTCCCTGGAATC
computGene-annotate;refseqGeneExon-eliminate;





refseqGeneIntron-annotate





CU-6029
1481
AGTCCCACCCGGGGTACCA
computGene-annotate;refseqGeneExon-eliminate;





tRNAprefix-annotate





CU-6243
1482
TTGACACGCCCCAGTGCCCTGT
refseqGeneExon-eliminate





CU-6233
1483
TGGGAGCGGGCGGGCGGTC
rRNA-eliminate;rnaGene-annotate





CU-6231
1484
TGGCGTGGAGCCGGGCGT
rRNA-eliminate;refseqGeneIntron-annotate





CU-6228
1485
TGGAGGTCCGTAGCGGT
rRNA-eliminate;mRNA-annotate;





refseqGeneIntron-annotate;rnaGene-annotate





CU-6223
1486
TGAAGAAGGTCTCGAACA
computGene-annotate;refseqGeneExon-eliminate





CU-6221
1487
TCTCGCCGGGGCTTCCA
computGene-annotate;refseqGeneExon-eliminate;





rnaGene-annotate





CU-6217
1488
TCGTAGCACCATCAATAA
computGene-annotate;refseqGeneExon-eliminate





CU-6208
1489
TCCGGGTCCCCCCTCCA
computGene-annotate;refseqGeneExon-eliminate;





refseqGeneIntron-annotate





CU-6207
1490
TCCGGGGCTGCACGCGCGCT
rRNA-eliminate;rnaGene-annotate





CU-6205
1491
TCCGGCCGTGTCGGT
computGene-annotate;refseqGeneExon-eliminate;





refseqGeneIntron-annotate





CU-6203
1492
TCCCTGTCCTCCAGGAGT
miRBASE-annotate;computGene-annotate;





refseqGeneExon-eliminate;refseqGeneIntron-annotate;wgRNA-annotate





CU-6201
1493
TCCCCTCCTCGTCGCCA
refseqGeneExon-eliminate;refseqGeneIntron-annotate





CU-6199
1494
TCCCAGGTAGTCTAGTGGT
refseqGeneExon-eliminate;refseqGeneIntron-annotate;rnaGene-annotate





CU-6191
1495
TATTCATTTATCCCCAGCCTAT
miRBASE-annotate;snoRNA-eliminate;





refseqGeneIntron-annotate;wgRNA-annotate;





rnaGene-annotate





CU-6190
1496
TAGTTGTTATAACCCAA
refseqGeneExon-eliminate;refseqGeneIntron-annotate





CU-6186
1497
TAGATCACCCCCTCCCC
mitochondrion-annotate;refseqGeneExon-eliminate;refseqGeneIntron-annotate





CU-6185
1498
TACCGGCACCTGGCGCC
computGene-annotate;refseqGeneExon-eliminate;





refseqGeneIntron-annotate





CU-6178
1499
GTATAGGGGCGAAAGAC
rRNA-eliminate;mRNA-annotate;





refseqGeneIntron-annotate;rnaGene-annotate





CU-6177
1500
GTAGCTGGTTCCCTCCGAA
rRNA-eliminate;mRNA-annotate;





refseqGeneIntron-annotate;rnaGene-annotate





CU-6175
1501
GGTAAGAAGCCCGGCTC
computGene-annotate;rRNA-eliminate;





refseqGeneExon-eliminate;refseqGeneIntron-annotate;rnaGene-annotate





CU-6173
1502
GGGGGGGTTTAAAAAAAAA
refseqGeneExon-eliminate;refseqGeneIntron-annotate





CU-6169
1503
GGGGCGCACTACCGGCC
refseqGeneExon-eliminate





CU-6168
1504
GGGAGAGGCTGTCGCTGCG
computGene-annotate;refseqGeneExon-eliminate





CU-6164
1505
GGCGGGTGAAGCGGCG
computGene-annotate;refseqGeneExon-eliminate;





refseqGeneIntron-annotate





CU-6158
1506
GCGGTTCCGGCGGCGTC
rRNA-eliminate;refseqGeneExon-eliminate;





refseqGeneIntron-annotate





CU-6155
1507
GCGGGGCGCCTAGGCCTGGTTTGT
refseqGeneExon-eliminate





CU-6153
1508
GCGGCGGTCGGCGGGCGGCGGG
rRNA-eliminate;refseqGeneExon-eliminate;





refseqGeneIntron-annotate





CU-6149
1509
GAGGGGGGGGGTGGGGGGGGA
refseqGeneExon-eliminate;refseqGeneIntron-annotate





CU-6144
1510
CTGTCGGCCACCATCAT
computGene-annotate;refseqGeneExon-eliminate;





refseqGeneIntron-annotate





CU-6140
1511
CTGCAACTCGACCCCA
computGene-annotate;refseqGeneExon-eliminate;





refseqGeneIntron-annotate





CU-6139
1512
CTCCTCTCCCCGCCCGCCG
refseqGeneExon-eliminate;refseqGeneIntron-annotate





CU-6138
1513
CTCAAAGATTAAGCCATGCATGTCTA
rRNA-eliminate;rnaGene-annotate





CU-6136
1514
CTACGCCGCGACGAG
computGene-annotate;rRNA-eliminate





CU-6134
1515
CGGGTGACGGGGAATCAGGGTT
rRNA-eliminate;rnaGene-annotate





CU-6127
1516
CGGGCAGCTTCCGGGA
computGene-annotate;rRNA-eliminate;





refseqGeneExon-eliminate;refseqGeneIntron-annotate





CU-6126
1517
CGGGAGGCCCGGGTCCTG
refseqGeneExon-eliminate;refseqGeneIntron-annotate





CU-6124
1518
CGGCCCCGCATCCTCCC
computGene-annotate;refseqGeneExon-eliminate;





refseqGeneIntron-annotate





CU-6122
1519
CGCGGGTAAACGGCGGGAGTAACTAT
mRNAall-annotate;rRNA-eliminate;





refseqGeneIntron-annotate;rnaGene-annotate





CU-6121
1520
CGCCCCCCGTTCCCCCCTCC
rRNA-eliminate





CU-6119
1521
CGAGCGGAAACACCA
computGene-annotate;refseqGeneExon-eliminate;





refseqGeneIntron-annotate;tRNAprefix-annotate





CU-6114
1522
CGAACCCGGCACCGC
computGene-annotate;refseqGeneExon-eliminate;





refseqGeneIntron-annotate





CU-6111
1523
CCTCGGGCCGATCGCAC
rRNA-eliminate;rnaGene-annotate





CU-6110
1524
CCTATATATCTTACCA
computGene-annotate;refseqGeneExon-eliminate;





refseqGeneIntron-annotate;tRNAprefix-annotate





CU-6109
1525
CCGTGGCGGCGACGACC
computGene-annotate;rRNA-eliminate;





refseqGeneExon-eliminate





CU-6106
1526
CCGGGTTCCGGCACCA
computGene-annotate;refseqGeneExon-eliminate;





refseqGeneIntron-annotate





CU-6104
1527
CCGCGAGGGGGGCCCG
computGene-annotate;refseqGeneExon-eliminate;





refseqGeneIntron-annotate





CU-6103
1528
CCGCCTCACGGGACCA
computGene-annotate;refseqGeneExon-eliminate;





refseqGeneIntron-annotate





CU-6102
1529
CCGCCCGTCCCCGCCCCTTG
rRNA-eliminate;refseqGeneIntron-annotate;





rnaGene-annotate





CU-6100
1530
CCCGGGGCCGCGGTTCCG
computGene-annotate;rRNA-eliminate;





refseqGeneIntron-annotate





CU-6098
1531
CCCGAGCCGCCTGGAT
computGene-annotate;rRNA-eliminate;





refseqGeneExon-eliminate;refseqGeneIntron-annotate;rnaGene-annotate





CU-6097
1532
CCCGACGGCCGAACT
computGene-annotate;refseqGeneExon-eliminate;





refseqGeneIntron-annotate





CU-6095
1533
CCCCGGGGAGCCCGGCGGG
computGene-annotate;refseqGeneExon-eliminate;





refseqGeneIntron-annotate





CU-6092
1534
CCCCCTCGCGGCCCTCCCC
rRNA-eliminate;refseqGeneIntron-annotate





CU-6091
1535
CCCCCCGTGGCGGCGAC
rRNA-eliminate;refseqGeneIntron-annotate





CU-6083
1536
CCACCCAGGGCACGCCA
refseqGeneExon-eliminate;refseqGeneIntron-annotate





CU-6077
1537
CACGGGTGACGGGGAA
computGene-annotate;rnaGene-annotate;





refseqGeneIntron-annotate;rRNA-eliminate;





refseqGeneExon-eliminate;piRNA-annotate





CU-6073
1538
ATGGGGAGGAAAAAAAAAAAAAA
refseqGeneExon-eliminate;refseqGeneIntron-annotate





CU-6068
1539
ATCCCACCGCTGCCCCCA
computGene-annotate;refseqGeneExon-eliminate;





refseqGeneIntron-annotate





CU-6065
1540
ATCACGTCGGTCACCA
computGene-annotate;refseqGeneExon-eliminate;





refseqGeneIntron-annotate





CU-6053
1541
ACGGGAAACCTCACCCGGCCCGG
rRNA-eliminate;piRNA-annotate;rnaGene-annotate





CU-6046
1542
ACAGAGGCTTACGACCCCTTATTT
mitochondrion-annotate;tRNA-eliminate;





refseqGeneIntron-annotate;rnaGene-annotate





CU-6040
1543
AAAAAGGCATAATTAAACTT
mitochondrion-annotate;refseqGeneExon-eliminate;refseqGeneIntron-annotate
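The table above lists short-RNA consensus sequences that match the human genome with at most one mismatch. As a minimal sketch of what such a criterion means, the code below scans a reference string for windows that differ from a read by at most one substitution. This is an illustration only: the patent text here does not state which alignment tool was used, the function names are hypothetical, the reference string is a toy example rather than genomic sequence, and counting only substitutions (no insertions or deletions) is an assumption.

```python
from typing import List


def within_mismatches(a: str, b: str, max_mismatches: int = 1) -> bool:
    """Return True if equal-length strings a and b differ at no more than
    max_mismatches positions (Hamming distance, substitutions only)."""
    if len(a) != len(b):
        return False
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches > max_mismatches:
                return False
    return True


def find_near_matches(read: str, reference: str, max_mismatches: int = 1) -> List[int]:
    """Return 0-based start offsets in reference where read matches with at
    most max_mismatches substitutions. Brute-force scan, for illustration."""
    n = len(read)
    return [
        i
        for i in range(len(reference) - n + 1)
        if within_mismatches(read, reference[i : i + n], max_mismatches)
    ]


# Toy usage: embed a sequence from the table in a made-up flanking context.
read = "CGGACCTATACCGGA"               # SEQ ID NO: 1350 in the table above
toy_reference = "AAA" + read + "TTT"   # hypothetical reference, not genomic data
offsets = find_near_matches(read, toy_reference)
```

A real genome-scale search would rely on an indexed aligner rather than a linear scan; the sketch only makes the "maximum 1 mismatch" criterion concrete.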
















TABLE 10







(PART B) List of short-RNA consensus with maximum 1 mismatch to the human genome, including count information.









Corrected Counts












SEQ

Naïve
Memory
Centroblasts
Ramos


ID NO:
Short-RNA sequence
(N)
(M)
(CB)
(RA)















1544
TGGCTCAGTTCAGCAGGAACAGT
0
0
1
0





1545
GTGGGGGAGAGGCTGTCGA
0
0
0
1





1546
CGGGGCAGCTCAGTACAGGATT
0
0
1
0





1547
AATTGCACGGTATCCATCTGTAT
0
0
1
0





1548
TGTCAGTTTGTTAATTGACCCAA
0
0
1
1





1549
GGCAATACGAGCACCCTG
2
0
0
0





1550
CGGGGGAGCGCCGCGTA
2
0
0
0





1551
CCGGGGCGTCTCGTAC
2
0
0
0





1552
AGCGGCTGTGCACAAA
0
0
0
2





1553
TGTCAGTTTGTTTAATCCAA
0
0
0
1





1554
TGTCAGTTTGTTATTACCAA
0
0
0
1





1555
TGTCAGGCACCATCAATAA
0
0
0
1





1556
TGATCTTGACACTTAAAGCC
0
0
0
1





1557
TCGTAGGCACCATCAAT
0
0
0
1





1558
TCGATCCCGGGTTTCGGCACCA
0
0
1
0





1559
TCGACTCCCGGTATGGGAACCA
0
0
0
1





1560
TAGGGAGGTTATGATTAACTTTT
0
0
0
1





1561
TAAAGTGCTTAGTGCAGGTA
0
0
0
1





1562
GTTTATGTTGCTTACCTCC
0
0
1
0





1563
GTAGATAAAATATTGGCG
1
0
0
0





1564
GGCGGGGACGACGTCAG
0
0
0
1





1565
GGCGGCGTCGCGGCGGGTC
0
1
0
0





1566
GGAGGGGGTGAACAAAAAGAAAAA
0
0
0
1





1567
GCTAAACCTAGCCCCAAACCCACTCCACA
0
0
0
1





1568
CTGGATAGCGCACTTCGTT
0
0
0
1





1569
CGGGCGAGGGGCGGACGTTCG
0
0
1
0





1570
CGGACCTATACCGGA
1
0
0
0





1571
CCCCGGGTTCAATCCCCGGCACCTCCACCA
0
0
1
0





1572
CCCCCCACAACCGCGAA
0
1
0
0





1573
CCCAGCATCTCCTGTGTTTA
0
1
0
0





1574
CCCACGTTGGGACGCCA
1
0
0
0





1575
ATCGTATCCCACTTCTGACACCA
0
0
0
1





1576
ATCACGTCCGTGCCTCCA
0
1
0
0





1577
ATAGCAATGTCAGCAGTACCT
0
0
1
0





1578
ACCCTGCTCGCTGCGCCA
9
17
4
7





1579
TCCCACCCAGGGACGCCA
8
2
1
0





1580
TCGTAGGCACATCAATA
0
0
0
4





1581
CCCCCACAACCGCGTA
0
4
0
0





1582
ACCCCGTCCGTGCCTCCA
2
1
1
0





1583
AAAAAAGACACCCCCCACA
0
0
0
3





1584
TGTCAGTTTGTTAACCCAA
0
0
0
2





1585
TCCCTGTGGTCTAGTGGTTAGG
0
0
1
1





1586
GGGGGGGTAAAAAAA
0
0
0
1





1587
GGGGGGGGAAAAAAAA
0
0
0
1





1588
CGGGCCCGGGTCTTCCC
1
1
0
0





1589
CCGCCCCCCGTTCCCCCCA
0
2
0
0





1590
ACCCCCGGCTCCTCCACCA
0
1
0
1





1591
TTTGGTGGAAATTTTTTGA
0
0
0
1





1592
TGTCAGTTTGTTATACCAA
0
0
0
1





1593
TGTCAGTTTGTAATTATCCCAA
0
0
0
1





1594
TGTCAATTTTTAACCCAA
0
0
0
1





1595
TGCTAGGGTAAAAAAAAAA
0
0
0
1





1596
TGCAACTCCAAATAAAAGTACCA
0
0
0
1





1597
TGAGGTAACGGGGAATTA
0
0
0
1





1598
TCCTCGGCATCTCCACCA
0
0
1
0





1599
TCATATGAAGTCACCCTAGCCATC
0
0
1
0





1600
TCAGTTTGTTTATTAACCCAA
0
0
0
1





1601
TCAGCGTGTCTTTGCCCT
1
0
0
0





1602
TCACTGGTGGTCTAGTGGT
0
1
0
0





1603
TCACAATGCTGCCACCA
1
0
0
0





1604
TAGTTGTTAATTAACCCAA
0
0
0
1





1605
TAGTCCTCATCGCCCTCC
0
1
0
0





1606
TAAAGTGCTTATAGTGCGGGTAA
0
0
0
1





1607
GTCCCACCAGAGTCGCCA
0
0
1
0





1608
GGGGGAGGGGCCAAAAAAA
0
0
0
1





1609
GGGACGCCGCGGTGTCG
1
0
0
0





1610
GGGAATACCGGGTGCTTTAGGCTT
0
1
0
0





1611
GGAAGAAGGTGGTGGTATA
0
0
0
1





1612
GCGGTGAAATGCGTA
1
0
0
0





1613
GCGGGGAAGGTGGCAAA
0
0
0
1





1614
GCGACGACCTCGCGCCCACCTGGTCA
0
1
0
0





1615
GCCACCCGATACTGCTGT
0
1
0
0





1616
GATGTATGCTTTGTTTCTGTT
0
0
1
0





1617
GAGGGGGATTTAGAAAAAAA
0
0
0
1





1618
GAAGGAAAGTTCTATAGT
0
0
0
1





1619
GAAGCGGCTCTCTTATTT
0
0
0
1





1620
GAACGAGACTCTGGCATGCTGA
0
0
1
0





1621
CTGGTAGGCCCATCAAT
0
0
0
1





1622
CGGGGCCGATCGCGCGC
0
1
0
0





1623
CGGCCCCGGGTTCCTCCC
1
0
0
0





1624
CGAGCCCGGTTAGTA
1
0
0
0





1625
CGACTCTTAGCGGTGGA
0
0
1
0





1626
CGAATCCCACTTCTGACACCA
0
0
0
1





1627
CGAAAGGGAATCGGGTC
1
0
0
0





1628
CCTTAGGTCGCTGGTAAA
0
0
1
0





1629
CCGTGCGAGAATACCA
0
1
0
0





1630
CCGGTCTCTCAAGCGGCC
1
0
0
0





1631
CCCGGCCCTCGCGCGTCC
0
1
0
0





1632
CCCCGGCATTTCCACCA
0
0
1
0





1633
CCCCCCCGGCTCCTCCACCA
0
0
0
1





1634
CCCCCCACAACCGCTA
0
1
0
0





1635
CCCAAGTATTGACTCACCC
0
1
0
0





1636
CCAGTAAGCGCGAGTC
1
0
0
0





1637
CCAAAGAAAGCACGTAGAG
0
0
0
1





1638
CATGTTTAACGGCCGCGGT
0
0
1
0





1639
CAGTTTGTAATTAACCCAA
0
0
0
1





1640
CAGGAACGGCGCACCA
0
0
1
0





1641
CAGAACCCTCTAAATCCCC
0
0
1
0





1642
CACCCGGCTGTGTGCACATGTGT
1
0
0
0





1643
CAATTGGACCAATCTATC
0
0
1
0





1644
ATTCCTGTACTGCGATA
0
0
0
1





1645
ATCCCTGCGGCGTCTCCA
0
0
0
1





1646
ATCCCACCGCTGCCATCA
0
1
0
0





1647
AGTCAATAGAAGCCGGCGTA
0
0
1
0





1648
AGGTTCGTTTGTAAAAA
0
0
0
1





1649
AGGTCCTGGGTTTAAGTGT
0
0
0
1





1650
AGGGGGAAGTTCTATAGTC
0
0
0
1





1651
AGGCTGTGATGCTCTCNTGAGCCCT
0
0
1
0





1652
AGCCCCTCTCCGGCCCTTA
0
1
0
0





1653
ACTACCACCTACCTCCC
1
0
0
0





1654
ACGCCCTTCCCCCCCTTCTTT
0
0
0
1





1655
ACCCCACTCCTGGTGCAC
1
0
0
0





1656
ACCACCTGATCCCTTCCC
1
0
0
0





1657
ACAGCTAAGCACCCACCA
0
0
1
0





1658
ACACATGTTTAACGGCC
1
0
0
0





1659
AATTAGGGACCTGTATG
0
0
1
0





1660
AATGGCCCATTTGGGCAAACA
0
0
0
1





1661
AAAGCGGCTGTGCAAACA
0
0
0
1





1662
ATCCTGCCGACTACGCCA
13
15
13
6





1663
TCGAATCCCACTCCTGACACCA
1
2
7
7





1664
ATCCCATCCTCGTCGCCA
0
0
10
3





1665
TCGATTCCCCGACGGGGAGCCA
1
1
1
9





1666
ATCCGGGTGCCCCCTCCA
2
4
0
1





1667
TCGACTCCTGGCTGGCTCGCCA
0
2
2
1





1668
TCCCGGGCGGCGCACCA
2
2
1
0





1669
ATCCCACCAGAGTCGCCA
0
0
2
3





1670
TCCCCGGCATCTCCACCAA
0
1
2
0





1671
TCAAATCACGTCGGGGTCACCA
0
1
2
0





1672
GCGGTGGATCACTCGGCTCGTGCGT
0
0
0
3





1673
TCGATCCCCGTACGGGCCACCA
0
0
1
1





1674
TCGAGCCTCACCTGGAGCACCA
0
0
2
0





1675
TCCGGCTCGAAGGACCA
0
0
2
0





1676
CCGGGTGTTGTAGA
2
0
0
0





1677
TGTAGCGTGGCCGAGCGGT
0
1
0
0





1678
TGGGGCGACCTCGGAGCAG
0
0
1
0





1679
TGGCGTCCTAAGCCAGGGATTGTGGGT
0
0
0
1





1680
TGGCAGGGGAGATACCATGATTT
0
0
1
0





1681
TCTGATCAGGGTGAGCATC
0
1
0
0





1682
TCGTAGGCACCATCCAT
0
0
0
1





1683
GGGAAACGGGGCGCGGCTG
0
1
0
0





1684
CTACTCCTGCTCGCATCTGCTATA
0
0
1
0





1685
CGGGTGGGTTTTTACCGG
1
0
0
0





1686
CGAGGAATTCCCAGTAAG
0
0
1
0





1687
CGAACGCACTTGCGGCCCC
1
0
0
0





1688
CCCCGCGCGGGTTCGAATC
1
0
0
0





1689
AGGGGTATGATTCCCGCTT
0
0
0
1





1690
CGGGGCCACGCGCGCGTC
3
6
0
0





1691
TGGCGCTGCGGGATGAAC
0
3
1
0





1692
CCCCCCACTGCTAAATTTGACTGGCTT
0
0
2
2





1693
TAAAGGTTCGTTTGTAAAA
0
0
0
3





1694
CGGGGCCGAGGGAGCGA
1
2
0
0





1695
GGGTTAGGCCTCTTTT
0
1
1
0





1696
CTGCGGAAGGATCATTA
1
0
1
0





1697
CCCTACCCCCCCGG
0
2
0
0





1698
CCCGCCGGGTCCGCCC
2
0
0
0





1699
CCCCGCGCCCTTCTCTCTC
0
2
0
0





1700
CAGGCCTCCCTGGAATC
2
0
0
0





1701
AGTCCCACCCGGGGTACCA
0
0
0
2





1702
TTGACACGCCCCAGTGCCCTGT
1
0
0
0





1703
TGGGAGCGGGCGGGCGGTC
0
1
0
0





1704
TGGCGTGGAGCCGGGCGT
0
1
0
0





1705
TGGAGGTCCGTAGCGGT
1
0
0
0





1706
TGAAGAAGGTCTCGAACA
0
0
0
1





1707
TCTCGCCGGGGCTTCCA
0
1
0
0





1708
TCGTAGCACCATCAATAA
0
0
0
1





1709
TCCGGGTCCCCCCTCCA
0
1
0
0





1710
TCCGGGGCTGCACGCGCGCT
0
1
0
0





1711
TCCGGCCGTGTCGGT
1
0
0
0





1712
TCCCTGTCCTCCAGGAGT
0
0
0
1





1713
TCCCCTCCTCGTCGCCA
1
0
0
0





1714
TCCCAGGTAGTCTAGTGGT
1
0
0
0





1715
TATTCATTTATCCCCAGCCTAT
0
1
0
0





1716
TAGTTGTTATAACCCAA
0
0
0
1





1717
TAGATCACCCCCTCCCC
0
1
0
0





1718
TACCGGCACCTGGCGCC
1
0
0
0





1719
GTATAGGGGCGAAAGAC
0
0
1
0





1720
GTAGCTGGTTCCCTCCGAA
0
0
0
1





1721
GGTAAGAAGCCCGGCTC
0
0
1
0





1722
GGGGGGGTTTAAAAAAAAA
0
0
0
1





1723
GGGGCGCACTACCGGCC
1
0
0
0





1724
GGGAGAGGCTGTCGCTGCG
0
0
0
1





1725
GGCGGGTGAAGCGGCG
0
1
0
0





1726
GCGGTTCCGGCGGCGTC
0
1
0
0





1727
GCGGGGCGCCTAGGCCTGGTTTGT
1
0
0
0





1728
GCGGCGGTCGGCGGGCGGCGGG
1
0
0
0





1729
GAGGGGGGGGGTGGGGGGGGA
0
0
0
1





1730
CTGTCGGCCACCATCAT
0
0
0
1





1731
CTGCAACTCGACCCCA
0
1
0
0





1732
CTCCTCTCCCCGCCCGCCG
0
0
1
0





1733
CTCAAAGATTAAGCCATGCATGTCTA
0
0
1
0





1734
CTACGCCGCGACGAG
1
0
0
0





1735
CGGGTGACGGGGAATCAGGGTT
1
0
0
0





1736
CGGGCAGCTTCCGGGA
0
0
0
1





1737
CGGGAGGCCCGGGTCCTG
1
0
0
0





1738
CGGCCCCGCATCCTCCC
1
0
0
0





1739
CGCGGGTAAACGGCGGGAGTAACTAT
0
0
1
0





1740
CGCCCCCCGTTCCCCCCTCC
0
1
0
0





1741
CGAGCGGAAACACCA
1
0
0
0





1742
CGAACCCGGCACCGC
1
0
0
0





1743
CCTCGGGCCGATCGCAC
0
0
1
0





1744
CCTATATATCTTACCA
0
1
0
0





1745
CCGTGGCGGCGACGACC
0
1
0
0





1746
CCGGGTTCCGGCACCA
1
0
0
0





1747
CCGCGAGGGGGGCCCG
1
0
0
0





1748
CCGCCTCACGGGACCA
1
0
0
0





1749
CCGCCCGTCCCCGCCCCTTG
0
1
0
0





1750
CCCGGGGCCGCGGTTCCG
1
0
0
0





1751
CCCGAGCCGCCTGGAT
0
1
0
0





1752
CCCGACGGCCGAACT
0
1
0
0





1753
CCCCGGGGAGCCCGGCGGG
1
0
0
0





1754
CCCCCTCGCGGCCCTCCCC
0
1
0
0





1755
CCCCCCGTGGCGGCGAC
0
1
0
0





1756
CCACCCAGGGCACGCCA
1
0
0
0





1757
CACGGGTGACGGGGAA
1
0
0
0





1758
ATGGGGAGGAAAAAAAAAAAAAA
0
0
0
1





1759
ATCCCACCGCTGCCCCCA
0
0
0
1





1760
ATCACGTCGGTCACCA
0
0
0
1





1761
ACGGGAAACCTCACCCGGCCCGG
0
0
1
0





1762
ACAGAGGCTTACGACCCCTTATTT
0
0
1
0





1763
AAAAAGGCATAATTAAACTT
0
0
1
0









Abundance and Evolutionary Conservation.


Previously reported miRNAs appeared to be generally more abundant than newly discovered miRNAs. Approximately 50% of previously reported miRNAs appeared in the libraries with more than 10 occurrences compared to 29% of the newly discovered miRNAs (FIG. 20A). Moreover, 48% of known miRNAs were expressed at all stages of mature B-cell development, while newly identified miRNAs showed a more distinct stage-specificity (FIG. 20B), consistent with the notion that presently known miRNAs are mostly representative of ubiquitously expressed miRNAs.


To investigate the presence of orthologous miRNAs in other mammalian species, we relied on UCSC-provided Blastz pairwise alignments between human and target species and assessed conservation using two complementary methods, detailed in Supplemental Experimental Procedures. The analysis was performed on the complete set of miRNAs deposited in the miRBase database and on the miRNAs (known and new) represented in the B-cell libraries. Alignments of the human mature miRNA to its target species were required to show either perfect conservation of the entire mature miRNA sequence or conservation of a seed composed of seven bases starting from the second position of the human mature sequence, followed by conservation of 3 bases starting from the 12th, 13th or 14th position, as suggested by Grimson et al. (2007) (FIG. 20C and Appendix Table 11).
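The two conservation criteria described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: it assumes the human and ortholog sequences are already aligned position by position without gaps, and it reads "3 bases starting from the 12th, 13th or 14th position" as "any one of the three 3-base windows matches". The function names are hypothetical.

```python
def seed_conserved(human: str, ortholog: str) -> bool:
    """Seed rule: bases 2-8 match, plus one 3-base window starting at
    position 12, 13 or 14 (1-based positions on the human sequence)."""
    # Positions 2..8 (1-based) are indices 1..7 (0-based).
    if human[1:8] != ortholog[1:8]:
        return False
    for start in (12, 13, 14):
        window = slice(start - 1, start + 2)  # three bases from 'start'
        if len(human) >= start + 2 and human[window] == ortholog[window]:
            return True
    return False


def is_conserved(human: str, ortholog: str) -> bool:
    """Either the full mature sequence or the seed rule is satisfied."""
    return human == ortholog or seed_conserved(human, ortholog)


# CU-1142 mature sequence from Table 12; the variants are made up.
human = "TCGATTCCCGGCCCATGCACCA"
first_base_changed = "A" + human[1:]            # outside the seed
seed_changed = human[:4] + "C" + human[5:]      # inside bases 2-8
print(is_conserved(human, first_base_changed))  # True
print(is_conserved(human, seed_changed))        # False
```

A mismatch at position 1 leaves the ortholog conserved under the seed rule, whereas a single mismatch inside bases 2-8 does not.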


The majority of miRBase-miRNAs showed conservation across mammalian genomes, from primates to rodents. Conservation frequency mimicked known phylogenetic distances to human, with the highest conservation in chimp and lowest in rat. The conservation frequencies of known and newly identified miRNAs in B cells were similar in chimp (Pan troglodytes) and monkey (Macacus rhesus), especially when conservation requirements were restricted to the seed region of miRNAs. However, conservation frequencies in dog, mouse and rat were significantly divergent, with known miRNAs more likely to exhibit conservation than new candidate miRNAs (FIG. 20C and Appendix Table 11). In summary, previously unreported miRNAs expressed at specific stages of B-cell differentiation were generally less abundant and showed a lower degree of conservation across species, as shown for other tissue-specific miRNAs.


Validation of Previously Unreported miRNAs.


All 75 newly identified miRNAs were investigated by RT-PCR analysis in order to independently validate their existence in vivo in B-cell lines and cells isolated from tonsils. Positive results were obtained in 66 of the cases (see FIG. 21A for representative results and Table 12). Eighteen previously unreported miRNAs were also tested by RNA blot analysis and 11 were detectable (FIG. 21B and Table 12), either using total cellular RNA or upon enrichment for the short-RNA fraction. Overall, 88% of the newly cloned and computationally validated miRNAs were detectable by RNA blot and/or RT-PCR. The validation process also led to the identification of numerous miRNAs that are differentially regulated in normal versus transformed cells (see examples CU-1440, CU-1241, CU-1276 and CU-1137 in FIG. 21) as well as during the GC reaction (FIG. 22).
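The detection rates quoted above follow directly from the counts in the text. The short sketch below reproduces the arithmetic; the helper name `percent` is an illustrative choice, and the overall 88% figure equals the RT-PCR rate on the assumption (consistent with Table 12) that every blot-positive miRNA was also RT-PCR positive.

```python
def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


rt_pcr_rate = percent(66, 75)  # RT-PCR positives among all 75 candidates
blot_rate = percent(11, 18)    # RNA blot positives among the 18 tested
print(rt_pcr_rate, blot_rate)  # prints: 88.0 61.1
```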









TABLE 12







Summary of results obtained from the Northern Blot and/or RT-PCR analyses performed on newly identified mature miRNAs cloned multiple times in the B-cell libraries.

| ID | Seq ID No. | Mature miRNA sequence | Northern Blot | RT-PCR |
|---|---|---|---|---|
| CU-1369 | 1764 | TCCCCGGCATCTCCACCA | negative | positive |
| CU-1254 | 1765 | TCCCCGGCACCTCCACCA | positive | positive |
| CU-1298 | 1766 | ATCCCGGACGAGCCCCCA | not tested | positive |
| CU-1303 | 1767 | ATCCCACTTCTGACACCA | positive | positive |
| CU-1173 | 1768 | ATCCCACTCCTGACACCA | positive | positive |
| CU-1242 | 1769 | TCCCCGTACGGGCCACCA | not tested | positive |
| CU-1550 | 1770 | CGGAAGCGTGCTGGGCCC | not tested | positive |
| CU-1186 | 1771 | TCCCCGACACCTCCACCA | not tested | positive |
| CU-1368 | 1772 | GACGAGGTGGCCGAGTGG | positive | positive |
| CU-1243 | 1773 | GTCCCTTCGTGGTCGCCA | not tested | positive |
| CU-1470 | 1774 | CTCCTGGCTGGCTCGCCA | not tested | positive |
| CU-1300 | 1775 | TCCTCACACGGGGCACCA | not tested | positive |
| CU-1264 | 1776 | GAGGGGGACCAAAAAAAA | not tested | negative |
| CU-1212 | 1777 | TCCCCGGCACTTCCACCA | not tested | positive |
| CU-1345 | 1778 | AGAACACTACGAGCCACA | not tested | positive |
| CU-1352 | 1779 | ACCCCACTTCTGGTACCA | negative | positive |
| CU-1363 | 1780 | CGTTCGCGCTTTCCCCTG | not tested | negative |
| CU-1220 | 1781 | TTCCCCGACGGGGAGCCA | not tested | positive |
| CU-1197 | 1782 | ATGTGGTGGCTTACTTTT | not tested | positive |
| CU-1241 | 1783 | AGTCCCATCTGGGTCGCCA | positive | positive |
| CU-1148 | 1784 | TGGTGTGGTCTGTTGTTTT | not tested | positive |
| CU-1288 | 1785 | CGTCCATGATGTTCCGCAA | not tested | positive |
| CU-1528 | 1786 | TAGGGGTATGATTCTCGCT | not tested | negative |
| CU-1175 | 1787 | GGCGTGATTCATACCTTTT | not tested | positive |
| CU-1570 | 1788 | ATCCCCAGCATCTCCACCA | not tested | positive |
| CU-1269 | 1789 | TACCGAGCCTGGTGATAGC | not tested | positive |
| CU-1339 | 1790 | ATCCCCAGCACCTCCACCA | not tested | positive |
| CU-1132 | 1791 | GCCGGGTACTTTCGTATTTT | not tested | negative |
| CU-1370 | 1792 | CTGATTGCTCCTGTCTGATT | not tested | positive |
| CU-1545 | 1793 | CCACGAGGAAGAGAGGTAGC | not tested | negative |
| CU-1307 | 1794 | ACCCCACTATGCTTAGCCCT | not tested | positive |
| CU-1294 | 1795 | AAAGGACCTGGCGGTGCTTC | not tested | positive |
| CU-1371 | 1796 | TCTAGAGGAGCCTGTTCTGTA | not tested | positive |
| CU-1244 | 1797 | GTCAGGATGGCCGAGCGGTCT | not tested | positive |
| CU-1276 | 1798 | TCGATTCCCGGCCAATGCACCA | positive | positive |
| CU-1142 | 1799 | TCGATTCCCGGCCCATGCACCA | positive | positive |
| CU-1379 | 1800 | TCGGGTGCGAGAGGTCCCGGGT | negative | positive |
| CU-1381 | 1801 | TCGATTCCCGGTCAGGGAACCA | not tested | positive |
| CU-1403 | 1802 | GCATTGGTGGTTCAGTGGTAGA | positive | positive |
| CU-1457 | 1803 | TTCTCACTACTGCACTTGACTA | not tested | positive |
| CU-1557 | 1804 | GGAGAGAACGCGGTCTGAGTGGT | not tested | positive |
| CU-1542 | 1805 | GGCTGGTCCGATGGTAGTGGGTT | not tested | positive |
| CU-1221 | 1806 | TGTGCTCCGGAGTTACCTCGTTT | not tested | negative |
| CU-1380 | 1807 | ATAGGTTTGGTCCTAGCCTTTCT | not tested | positive |
| CU-1277 | 1808 | GAGCCATGATGATACCACTGAGC | not tested | positive |
| CU-1281 | 1809 | GCAGCGCCAGCCTCCCGCCCTAC | not tested | positive |
| CU-1524 | 1810 | CCCCCACAACCGCGCTTGACTAGC | not tested | positive |
| CU-1477 | 1811 | CTCCCACTGCTTCACTTGACTAGC | not tested | positive |
| CU-1575 | 1812 | CCCCCCACTGCTAAATTTGACTGGA | not tested | positive |
| CU-1137 | 1813 | GCTAAGGAAGTCCTGTGCTCAGTTTT | positive | positive |
| CU-1538 | 1814 | GGCTGGTCCGAGTGCAGTGGTGTTTA | not tested | positive |
| CU-1153 | 1815 | CCCCCCACTGCTAAATTTGACTGGCTT | positive | positive |
| CU-1513 | 1816 | GCGGGTGATGCGAACTGGAGTCTGAGC | positive | positive |
| CU-1293 | 1817 | AGCAGTGATGTCCTGAAAATTCTGAAG | not tested | negative |
| CU-1388 | 1818 | TCCCTGGTGGTCTAGTGGTTAGGATTCG | negative | positive |
| CU-1180 | 1819 | AACCGAGCGTCCAAGCTCTTTCCATTTT | not tested | positive |
| CU-1382 | 1820 | TCCTCGTTAGTATAGTGGTGAGTATCCC | negative | positive |
| CU-1251 | 1821 | CCCACCCAGGGACGCCA | negative | positive |
| CU-1191 | 1822 | GCCCGCATCCTCCACCA | negative | positive |
| CU-1453 | 1823 | CCCTGCTCGCTGCGCCA | not tested | positive |
| CU-1222 | 1824 | TCACGTCGGGGTCACCA | not tested | positive |
| CU-1178 | 1825 | AGGGTGTGCGTGTTTTT | not tested | positive |
| CU-1488 | 1826 | TCCTGCCGCGGTCGCCA | not tested | positive |
| CU-1164 | 1827 | GAGAGCGCTCGGTTTTT | not tested | negative |
| CU-1486 | 1828 | CTGCTGTGATGACATTC | not tested | positive |
| CU-1130 | 1829 | CCCGGGTTTCGGCACCA | not tested | positive |
| CU-1155 | 1830 | TCCCCGCACCTCCACCA | not tested | positive |
| CU-1278 | 1831 | TAACGGCCGCGGTACCC | not tested | positive |
| CU-1246 | 1832 | AGGGGGGTAAAAAAAAA | not tested | positive |
| CU-1440 | 1833 | TGGTTATCACGTTCGCC | not tested | positive |
| CU-1213 | 1834 | TCACCCCATAAACACCA | not tested | positive |
| CU-1146 | 1835 | AGAAAGGCCGAATTTTA | not tested | positive |
| CU-1323 | 1836 | TGTATTGTGAGACATTC | not tested | positive |
| CU-1324 | 1837 | TCTCGGTGGAACCTCCA | not tested | positive |
| CU-1396 | 1838 | TAAGTGTTTGTGGGTTA | not tested | negative |









In order to gain preliminary evidence of the functionality of the previously unreported miRNAs, a small subset of these miRNAs which were fully validated at the expression level was tested for incorporation in the functional miRNA-mRNA complex by co-immunoprecipitation with Ago2 proteins (Mourelatos et al., 2002). The results showed that the RNA fraction associated with the Ago complex was indeed enriched for each of the four tested previously unreported miRNAs (FIG. 25), confirming that the identified sequences enter the expected miRNA functional pathway.


Indirect clues to the functionality of miRNAs may also be obtained by analyzing the effect of stage-specific miRNAs on the corresponding transcriptome, since most miRNAs have been shown to affect the expression of their targets, albeit to a modest degree (Filipowicz et al., 2008). Toward this end, the targets of 15 previously unreported GC-over-expressed miRNAs were predicted by two algorithms (miRanda and RNA22) (John et al., 2004; Miranda et al., 2006) and were tested for enrichment in genes down-regulated in GC versus naïve B cells. Eleven out of 15 miRNAs showed an increase (and only two a decrease) in their candidate target enrichment p-value for GC down-regulated genes compared to control populations (FIG. 26 and Table 13). These results suggest that miRNAs associated with GC B cells do specifically affect the GC transcriptome.
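The text does not state which statistical test produced the enrichment p-values. One common choice for this kind of question is a one-sided hypergeometric test, sketched below purely as an assumption; the function name and the toy numbers are hypothetical and are not taken from the study.

```python
from math import comb


def hypergeom_enrichment_p(n_genes: int, n_down: int,
                           n_targets: int, n_overlap: int) -> float:
    """P(X >= n_overlap) when n_targets genes are drawn at random from
    n_genes, of which n_down are down-regulated (upper-tail probability)."""
    max_overlap = min(n_down, n_targets)
    total = comb(n_genes, n_targets)
    tail = sum(
        comb(n_down, k) * comb(n_genes - n_down, n_targets - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Toy example: 10 genes, 3 down-regulated, 3 predicted targets,
# all 3 targets among the down-regulated genes.
p = hypergeom_enrichment_p(10, 3, 3, 3)  # equals 1/120
```

A small p-value here would indicate that predicted targets overlap the down-regulated gene set more often than expected by chance.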









TABLE 13







Enrichment for predicted miRNA targets in genes downregulated in CB and in memory compared to naïve B cells.

| CB over-expressed miRNA | Targets enrichment in genes downregulated in CB vs N (p-value) | Targets enrichment in genes downregulated in M vs N (p-value) |
|---|---|---|
| CU-1380 | 0.0001 | 0.4079 |
| CU-1388 | 0.0002 | 0.2699 |
| CU-1477 | 0.0002 | 0.0514 |
| CU-1538 | 0.0014 | 0.1609 |
| CU-1142 | 0.0016 | 0.0012 |
| CU-1382 | 0.0016 | 0.0242 |
| CU-1403 | 0.0026 | 0.0032 |
| CU-1470 | 0.0029 | 0.5392 |
| CU-1276 | 0.0193 | 0.0187 |
| CU-1371 | 0.0413 | 0.2252 |
| CU-1153 | 0.091 | 1 |
| CU-1575 | 0.1598 | 1 |
| CU-1370 | 0.1708 | 0.4595 |
| CU-1303 | 1 | 1 |
| CU-1513 | 1 | 1 |









In summary, previously unreported miRNAs identified by cloning and computational analysis were validated at the expression level by multiple detection methods. For a small subset tested, their incorporation in the Ago complex and their activity on the GC transcriptome suggest biological functionality.


Transcriptional and Post-Transcriptional Regulation.


Most newly identified miRNAs tested by RNA blot showed a long abundant transcript (>150 nt), likely corresponding to the primary miRNA transcript, and a second transcript (~60-80 nt) consistent with the precursor miRNA. As shown in FIG. 21C (top panel), the precursor miRNA and the corresponding mature miRNA may be produced in some cell types but not in others, suggesting transcriptional regulation. Conversely, the relative abundance of precursor and mature miRNA differed in some cell types (FIG. 21C, bottom panel), suggesting the existence of post-transcriptional regulation, most likely targeting the Dicer-dependent pre-miRNA processing (Lee et al., 2007; Michael et al., 2003; Thomson et al., 2006).


Taken together, these observations suggest that the expression of mature miRNAs may be affected by both transcriptional and post-transcriptional regulatory mechanisms.


Distinct miRNA Signatures in Normal B-Cell Subpopulations.


In order to further investigate whether specific miRNA regulation occurred in normal B-cell development or in transformed cells, miRNA representation was examined in libraries constructed from naïve, GC and memory B cells, as well as from the Ramos BL cell line. Differential expression of numerous known and newly identified miRNAs was evident during B-cell differentiation and GC transit as shown by hierarchical clustering using miRNA frequencies (defined as the fraction of the total pool of cloned miRNAs represented by a given miRNA in a library) obtained from the cloning data (FIG. 22A). Naïve and memory B cells appeared similar, sharing a large fraction of the most abundant miRNAs. Conversely, centroblasts showed a more distinct miRNA profile with a sizeable fraction of abundant miRNAs being specifically expressed in the CB library, suggesting specific functions. Some miRNAs were expressed in the GC-derived Ramos cells, but not in normal GC B cells, or vice versa in the normal but not in the tumor cells, suggesting that malignant transformation affects miRNA expression.
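The miRNA frequency defined above (the fraction of the total pool of cloned miRNAs represented by a given miRNA in a library) can be computed as in the following sketch. The miRNA names and clone counts are hypothetical placeholders, not data from the libraries.

```python
def clone_frequencies(counts: dict) -> dict:
    """Convert per-miRNA clone counts for one library into fractions
    of the total pool of cloned miRNAs in that library."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library contains no cloned miRNAs")
    return {mirna: n / total for mirna, n in counts.items()}


# Hypothetical clone counts for a single library.
naive_counts = {"miR-A": 13, "miR-B": 1, "miR-C": 6}
naive_freqs = clone_frequencies(naive_counts)
```

Normalizing each library separately makes libraries of different sizes comparable before clustering.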


To independently validate the results of the cloning experiment, miRNA expression profiling of centroblasts, naïve and memory B cells (six donors each) was performed using a commercial microarray representative of 723 known human miRNAs (miRBase v.10.1). The Spearman correlation between cloning and microarray data was 0.7, corresponding to a p-value <3.9e-28 (FIG. 27). Each B-cell population showed a distinct miRNA expression profile. Consistent with the cloning data (FIG. 22A), GC B cells appeared to be quite distinct from naïve and memory B cells, which instead shared expression of a large fraction of miRNAs (FIG. 22B). The expression of several miRNAs was tested by qRT-PCR analysis, which confirmed that the microarray data were quantitatively accurate. Overall, these results show that the GC reaction is characterized by the specific expression of multiple miRNAs.
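For readers unfamiliar with the statistic, Spearman's correlation is the Pearson correlation computed on ranks. The sketch below is a plain-Python illustration with average ranks for ties; it is not the software used in the study, and the example values are invented.

```python
def average_ranks(values):
    """1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Pearson correlation of the rank-transformed inputs."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (vx * vy) ** 0.5


# Invented cloning frequencies and array intensities for four miRNAs.
cloning = [0.10, 0.02, 0.30, 0.05]
array = [800.0, 150.0, 2500.0, 900.0]
rho = spearman(cloning, array)
```

Because only ranks matter, the two platforms do not need to share a measurement scale.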


Discussion


The combination of cloning procedures and computational tools used in this study led to the identification of a large fraction of the miRNAs expressed during B-cell differentiation. These included 75 previously unreported miRNAs, as well as a potentially distinct class of short-RNAs not fulfilling current criteria for miRNAs. These findings have general implications for the understanding of the total miRNA content of the human genome as well as for future studies on the role of miRNAs in B-cell differentiation, function and lymphomagenesis.


The discovery of 75 previously unreported miRNAs expressed in normal and/or malignant B cells contrasts with a previous study that reported the discovery of only 12 new human miRNAs from an analysis of a large panel of different organ systems and cell types and suggested that most miRNAs have already been identified and are ubiquitously expressed (Landgraf et al., 2007). These discordant results and conclusions may be due i) to the higher number of clones per library sequenced in this study (3500 versus 1300 on average in Landgraf et al., 2007), which allowed the detection of low-abundance miRNA species, and ii) to the criteria applied in the miRNA identification, which do not include conservation and allow consideration of repetitive elements (see Supplemental Experimental Procedures in Example 3). Moreover, the relatively lower degree of evolutionary conservation of previously unreported miRNAs may have prevented the cross-species identification of miRNAs using murine libraries (Chen et al., 2004; Neilson et al., 2007). Consistent with these observations, a recent report on short-RNAs in mouse embryonic stem cells discovered Dicer-dependent miRNAs characterized by both low abundance and low degree of conservation (Calabrese et al., 2007). Since 88% of the previously unreported miRNAs have been independently detected by RT-PCR and/or RNA blot analyses, our cloning and computational approach is largely validated.


We note that a fraction of the validated miRNAs display similarity to the 3′-end of post-transcriptionally modified tRNAs, raising the possibility that they may derive from loci with tRNA homology or by direct processing of tRNAs. We also identified a large set of candidate miRNAs (101 unreported, to our knowledge, and 27 known) that have been cloned as single occurrences in the B-cell libraries (Table 7). This group of candidate miRNAs has not yet been fully investigated, but it may nevertheless include bona fide miRNAs because 3 out of 3 tested were detectable by RNA blot or RT-PCR analyses. Thus, our data in B cells suggest that a large number of low-abundance, recently evolved, tissue-specific miRNAs remain to be discovered.


Two categories of short-RNAs were identified that could not be annotated as bona fide miRNAs. The first category is represented by short-RNAs which display all features required by the computational pipeline to be defined as candidate miRNAs, but nevertheless have an atypical length (<17 nt or >28 nt; 75 candidate miRNAs). Sequences belonging to this first category may include bona fide miRNAs since 2 out of 2 tested were detectable by RT-PCR. The second category is represented by those short-RNAs for which classic pre-miRNA structures could not be identified in the genome and no similarity to other non-coding RNA was found in the available databases. These short-RNAs may either be miRNA for which RNA secondary structure prediction algorithms failed to predict the correct hairpin structure or may represent new miRNA species of presently unknown mechanism of generation or other not yet described types of short-RNAs.
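The length criterion for the first category (<17 nt or >28 nt) amounts to a simple range check, sketched below. The function name and default bounds are illustrative; the two sequences are taken from the tables in this section and serve only as string examples.

```python
def has_typical_length(seq: str, min_len: int = 17, max_len: int = 28) -> bool:
    """True when the sequence length falls inside the 17-28 nt range;
    shorter or longer sequences count as atypical."""
    return min_len <= len(seq) <= max_len


print(has_typical_length("TCCCCGGCATCTCCACCA"))  # 18 nt -> True
print(has_typical_length("CCGGGTGTTGTAGA"))      # 14 nt -> False
```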


Finally, this analysis led to the discovery of short-RNAs that could not be accurately mapped to the genome. Considering that a fraction of these RNAs were cloned multiple times and showed a stage-specific behavior, we suggest that such short-RNAs do actually exist and that the lack of a match to the human genome may be due to polymorphisms, editing and other post-transcriptional modifications or to an incomplete or inaccurate sequencing of the corresponding genomic regions.


The specificity in mature miRNA expression appears to be regulated at the transcriptional as well as at the post-transcriptional (i.e. pre-miRNA processing) level. Indeed, the accumulation of pre-miRNA in the absence of a mature miRNA can occur in a cell type-restricted manner, suggesting the presence of a mechanism of regulation at the pre-miRNA processing step. Both regulatory mechanisms may act during normal differentiation and may also be dysregulated during transformation as a consequence of genetic or epigenetic alterations (Lee et al., 2007; Michael et al., 2003; Thomson et al., 2006). Indeed, miRNAs CU-1137 and CU-1368 represent examples of transcriptional activation and post-transcriptional silencing associated with malignant transformation, respectively.


The stage-specific expression of various miRNAs strongly suggests highly specialized regulatory functions in B-cell biology. The role of miRNAs that show cell type-specific functions in lymphocytes has just begun to be elucidated (Dorsett et al., 2008; Li et al., 2007; Rodriguez et al., 2007; Teng et al., 2008; Thai et al., 2007; Xiao et al., 2007). The miRNAs specifically associated with GC or non-GC B cells by either cloning or miRNA expression profiling have not been previously reported in B-cell differentiation, with the exception of miR-150 (Xiao et al., 2007). For example, the miR-199 and miR-125 families as well as miR-138 show a distinct expression in GC B cells, although none of these miRNAs has been investigated for a role in this cell compartment. The extent of post-transcriptional regulation added by miRNAs will be fully uncovered only in the context of the complex network of cellular interactions (Basso et al., 2005), which will require the integration of large scale gene and miRNA expression data.


miRNA expression can be affected by malignant transformation. For instance, the miR-17-92 cluster, previously reported as a potential oncogene (He et al., 2005), was found over-expressed in the Ramos cell line compared to GC B cells. Moreover, several miRNAs (e.g. CU-1137, CU-1148) show expression in Ramos cells and in several additional BL cell lines, but not in their normal GC counterpart. Vice versa, as observed for the miR-199 family, the expression of some miRNAs is lost in the tumor cells. The data herein represent a useful basis to investigate whether lymphoma-associated chromosomal lesions affect genomic regions associated with miRNA expression.


Finally, the differences in miRNA expression profile between GC and non-GC B cells resembled those observed by expression profiling of coding genes (Klein et al., 2003), consistent with the previous observation that miRNA profiling may be equally or more informative in discriminating tumor phenotypes (Calin et al., 2005; Lu et al., 2005). This suggests that miRNA expression profiling, especially if including new B-cell specific miRNAs, may be useful in the differential diagnosis of lymphoid malignancies.


The expanded B-cell miRNome described here represents a resource which can be used to identify miRNAs expressed during the GC transit as well as specific differences in miRNA expression in normal versus lymphoma cells, and which can guide studies to unveil the function of miRNAs in normal B cell development and lymphomagenesis.


Experimental Procedures


Generation of Short-RNA Libraries.


Purification of naïve, memory and GC B cells was performed as previously reported (Klein et al., 2003) using magnetic cell sorting of mononucleated cells obtained from human tonsils. Total RNA was purified using the Trizol Reagent (Invitrogen) following the manufacturer's instructions. The short-RNA libraries were generated using an established protocol described in detail by Lau et al. (2001). Briefly, total RNA was separated on a 15% polyacrylamide gel and the fragment corresponding to 15-30 nucleotides in length was excised. The purified small RNAs were ligated to adaptor oligonucleotides and gel purified. After adaptor ligation, RNA was reverse transcribed, and the cDNA was PCR amplified and cloned into the pCR2.1-TOPO vector (Invitrogen). Sequencing was performed on colony PCR amplicons.


Computational Identification of Precursor and Mature miRNAs.


The bioinformatics miRNA analysis pipeline (FIG. 18) includes: (a) identifying short-RNAs from each library, (b) identifying exact and partial matches of the short-RNA sequences to the human genome, (c) testing each short-RNA genomic region for compatibility with hairpin secondary structures, (d) clustering genomic regions to predict mature miRNAs, (e) annotating and filtering short-RNA and miRNA candidates, (f) estimating predicted miRNA frequencies in the libraries and (g) clustering short-RNAs that do not support miRNA candidates. The details are reported in the Supplemental Experimental Procedures in Example.


Orthology Analysis.


The identification of putative orthologous sequences of known and predicted precursor and mature human miRNAs in chimp (panTro2), monkey (rheMac2), dog (canFam2), mouse (mm8) and rat (rn4) was performed using UCSC-provided Blastz (Schwartz et al., 2003) pairwise alignments between human and target species. The details are reported below.


miRNA Expression Profiling.


The miRNA expression profiles were generated, following the manufacturer's instructions, using the Human miRNA Microarray kit (Agilent Technologies), which allows detection of 723 known human (miRBase v.10.1) and 76 human viral miRNAs. Analysis of raw data was performed using the Feature Extraction Software 9.5.3.1 (Agilent Technologies). The dendrograms (FIG. 22) were generated using a hierarchical clustering algorithm based on the average-linkage method (Eisen et al., 1998; Hartigan, 1975) and Spearman's correlation as provided by the geWorkbench platform (http://www.geworkbench.org).
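Average-linkage hierarchical clustering, as named above, repeatedly merges the two clusters with the smallest mean pairwise distance. The sketch below is a generic illustration and not the geWorkbench implementation; it assumes the caller supplies pairwise distances (for example 1 minus Spearman's correlation), and the sample names and distance values are hypothetical.

```python
from itertools import combinations


def average_linkage(names, distance):
    """Agglomerative clustering with average linkage.

    distance maps frozenset({a, b}) -> float for every pair of names.
    Returns the merges in order as (cluster_1, cluster_2, linkage)."""
    def linkage(c1, c2):
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(distance[frozenset(p)] for p in pairs) / len(pairs)

    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest mean distance.
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((c1, c2, linkage(c1, c2)))
        clusters = [c for c in clusters if c != c1 and c != c2] + [c1 + c2]
    return merges


# Hypothetical distances between three library profiles.
distance = {
    frozenset(("naive", "memory")): 0.1,
    frozenset(("naive", "CB")): 0.8,
    frozenset(("memory", "CB")): 0.6,
}
merges = average_linkage(["naive", "memory", "CB"], distance)
```

With these made-up distances the naive and memory profiles merge first, and the CB profile joins last; the merge order is what a dendrogram displays.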


RT-PCR Analysis.


Small RNA fractions were purified using the Trizol Reagent (Invitrogen) and the PureLink miRNA Isolation Kit (Invitrogen), following the manufacturer's instructions. RT-PCR was performed as previously described (Sharbati-Tehrani et al., 2008). Briefly, miRNA sequences were reverse-transcribed from 50 ng short-RNA using Superscript III First Strand Synthesis Kit (Invitrogen), in the presence of 0.2 μM RTFS primer (miRNA-specific primers, see Table 14). One tenth of the cDNA volume was then used as template for 34 cycles of PCR amplification in the presence of 4 nM SS primer (miRNA-specific primers, see Table 14) and 0.4 μM each of MPF and MPR universal primers (Table 14). PCR products were separated on a 12% non-denaturing polyacrylamide gel, detected by SybrGold (1:10,000 dilution; Invitrogen) and visualized under UV light.
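The Northern blot probes listed in Table 14 are antisense to the mature miRNA, so a probe can be checked against its target by taking the reverse complement. The sketch below illustrates this for two probe and mature sequence pairs copied from Table 14; the function name is an illustrative choice, and the check assumes DNA-alphabet sequences containing only A, C, G and T.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA-alphabet sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


# (mature miRNA sequence, probe sequence) pairs from Table 14.
pairs = {
    "CU-1173": ("ATCCCACTCCTGACACCA", "TGGTGTCAGGAGTGGGAT"),
    "CU-1142": ("TCGATTCCCGGCCCATGCACCA", "TGGTGCATGGGCCGGGAATCGA"),
}
for name, (mature, probe) in pairs.items():
    print(name, reverse_complement(mature) == probe)  # True for both pairs
```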









TABLE 14





List of probes and primers used for Northern Blot and RT-PCR analyses, respectively.

| ID | Seq ID No. | Mature miRNA sequence (5′-3′) | Seq ID No. | Probe sequence (5′-3′) | Hybridization Temperature [° C.] |
|---|---|---|---|---|---|
| CU-1137 | 1839 | GCTAAGGAAGTCCTGTGCTCAGTTTT | 1926 | AAAACTGAGCACAGGACTTCCTTAGC | 60 |
| CU-1142 | 1840 | TCGATTCCCGGCCCATGCACCA | 1927 | TGGTGCATGGGCCGGGAATCGA | 55 |
| CU-1153 | 1841 | CCCCCCACTGCTAAATTTGACTGGCTT | 1928 | AAGCCAGTCAAATTTAGCAGTGGGGGG | 50 |
| CU-1173 | 1842 | ATCCCACTCCTGACACCA | 1929 | TGGTGTCAGGAGTGGGAT | 50 |
| CU-1241 | 1843 | AGTCCCATCTGGGTCGCCA | 1930 | TGGCGACCCAGATGGGACT | 55 |
| CU-1254 | 1844 | TCCCCGGCACCTCCACCA | 1931 | TGGTGGAGGTGCCGGGGA | 55 |
| CU-1276 | 1845 | TCGATTCCCGGCCAATGCACCA | 1932 | TGGTGCATTGGCCGGGAATCGA | 60 |
| CU-1303 | 1846 | ATCCCACTTCTGACACCA | 1933 | TGGTGTCAGAAGTGGGAT | 50 |
| CU-1368 | 1847 | GACGAGGTGGCCGAGTGG | 1934 | AACCACTCGGCCACCTCGTC | 60 |
| CU-1403 | 1848 | GCATTGGTGGTTCAGTGGTAGA | 1935 | TCTACCACTGAACCACCAATGC | 60 |
| CU-1513 | 1849 | GCGGGTGATGCGAACTGGAGTCTGAGC | 1936 | GCTCAGACTCCAGTTCGCATCACCCGC | 50 |






Seq



SS



ID
mature miRNA
Seq ID
RTFS primer
primer


ID
No.
sequence (5′-3′)
No.
sequence (5′-3′)
sequence (5′-3′)





CU-
1850

CCCGGGTTTCGGCA

1937
TGTCAGGCAACCGTATTCA
SEQ ID NO: 2013


1130


CCA


CCGTGAGTGGTTGGTGC
CGTCAGATGTCCGAGTA







GAGGGGGAACGGCGCCC








GGGTTTCG






CU-
1851

GCCGGGTACTTTCG

1938
TGTCAGGCAACCGTATTCA
SEQ ID NO: 2014


1132


TATTTT


CCGTGAGTGGTAAAATACG
CGTCAGATGTCCGAGTA







GAGGGGGAACGGCGGCC








GGGTACTTT






CU-
1852

GCTAAGGAAGTCCT

1939
TGTCAGGCAACCGTATTCA
SEQ ID NO: 2015


1137


GTGCTCAGTTTT


CCGTGAGTGGTAAAACTGA
CGTCAGATGTCCGAGTA







GC

GAGGGGGAACGGCGGCT








AAGGAAGTCCTGT






CU-
1853

TATCAATGATGCTT

1940
TGTCAGGCAACCGTATTCA
SEQ ID NO: 2016


1138


CTGAGA


CCGTGAGTGGTTCTCAG
CGTCAGATGTCCGAGTA







GAGGGGGAACGGCGTAT








CAATGATGCTT






CU-
1854

TCGATTCCCGGCCC

1941
TGTCAGGCAACCGTATCAC
SEQ ID NO: 2017


1142


ATGCACCA


CGTGAGTGGTTTGGTGC
CGTCAGATGTCCGAGTA







GAGGGGGAACGGCGTCG








ATTCCCGGCCCAT






CU-
1855

AGAAAGGCCGAATT

1942
TGTCAGGCAACCGTATTCA
SEQ ID NO: 2018


1146


TTA


CCGTGAGTGGTTAAAATT
CGTCAGATGTCCGAGTA







CGG

GAGGGGGAACGGCGAGA








AAGGCCG






CU-
1856

TGGTGTGGTCTGTT

1943
TGTCAGGCAACCGTATTCA
SEQ ID NO: 2019


1148


GTTTT


CCGTGAGTGGTAAAACAAC
CGTCAGATGTCCGAGTA







AG

GAGGGGGAACGGCGTGG








TGTGGTCTG






CU-
1857

CCCCCCACTGCTAA

1944
TGTCAGGCAACCGTATTCA
SEQ ID NO: 2020


1153


ATTTGACTGGCTT


CCGTGAGTGGTAAGCCA
CGTCAGATGTCCGAGTA







GAGGGGGAACGGCGCCC








CCCACTGCTAAATTTG






CU-
1858

TCCCCGCACCTCCA

1945
TGTCAGGCAACCGTATTCA
SEQ ID NO: 2021


1155


CCA


CCGTGAGTGGTTGGTGG
CGTCAGATGTCCGAGTA







GAGGGGGAACGGCGTCC








CCGCACCT






CU-
1859

GAGAGCGCTCGGTT

1946
TGTCAGGCAACCGTATTCA
SEQ ID NO: 2022


1164


TTT


CCGTGAGTGGTAAAAACCG
CGTCAGATGTCCGAGTA







GAGGGGGAACGGCGGAG








AGCGCT






CU-
1860

ATCCCACTCCTGAC

1947
TGTCAGGCAACCGTATTCA
SEQ ID NO: 2023


1173


ACCA


CCGTGAGTGGTTGGTGT
CGTCAGATGTCCGAGTA







GAGGGGGAACGGCGATC








CCACTCCTG






CU-
1861

GGCGTGATTCATAC

1948
TGTCAGGCAACCGTATTCA
SEQ ID NO: 2024


1175


CTTTT


CCGTGAGTGGTAAAAGGTA
CGTCAGATGTCCGAGTA







TG

GAGGGGGAACGGCGGGC








GTGATTCAT






CU-
1862

AGGGTGTGCGTGTT

1949
TGTCAGGCAACCGTATTCA
SEQ ID NO: 2025


1178


TTT


CCGTGAGTGGTAAAAACA
CGTCAGATGTCCGAGTA







CGC

GAGGGGGAACGGCGAGG








GTGTGCGT






CU-
1863

AACCGAGCGTCCAA

1950
TGTCAGGCAACCGTATTCA
SEQ ID NO: 2026


1180


GCTCTTTCCATTTT


CCGTGAGTGGTAAAATG
CGTCAGATGTCCGAGTA







GAGGGGGAACGGCGAAC








CGAGCGTCCAAGCTCT






CU-
1864

TCCCCGACACCTCC

1951
TGTCAGGCAACCGTATTCA
SEQ ID NO: 2027


1186


ACCA


CCGTGAGTGGTTGGTGG
CGTCAGATGTCCGAGTA







GAGGGGGAACGGCGTCC








CCGACACCT






CU-
1865

GCCCGCATCCTCCA

1952
TGTCAGGCAACCGTATTCA
SEQ ID NO: 2028


1191


CCA


CCGTGAGTGGTTGGTGG
CGTCAGATGTCCGAGTA







GAGGGGGAACGGCGGCC








CGCATCCT






CU-
1866

ATGTGGTGGCTTAC

1953
TGTCAGGCAACCGTATTCA
SEQ ID NO: 2029


1197


TTTT


CCGTGAGTGGTAAAAGTAA
CGTCAGATGTCCGAGTA







GC

GAGGGGGAACGGCGATG








TGGTGGCTT






CU-
1867

TCCCCGGCACTTCC

1954
TGTCAGGCAACCGTATTCA
SEQ ID NO: 2030


1212


ACCA


CCGTGAGTGGTTGGTGG
CGTCAGATGTCCGAGTA







GAGGGGGAACGGCGTCC








CCGGCACTT






CU-
1868

TCACCCCATAAACA

1955
TGTCAGGCAACCGTATTCA
SEQ ID NO: 2031


1213


CCA


CCGTGAGTGGTTGGTGT
CGTCAGATGTCCGAGTA







GAGGGGGAACGGCGTCA








CCCCATAA






CU-
1869

TTCCCCGACGGGGA

1956
TGTCAGGCAACCGTATTCA
SEQ ID NO: 2032


1220


GCCA


CCGTGAGTGGTTGGCTC
CGTCAGATGTCCGAGTA







GAGGGGGAACGGCGTTC








CCCGACGGG






CU-
1870

TGTGCTCCGGAGTT

1957
TGTCAGGCAACCGTATTCA
SEQ ID NO: 2033


1221


ACCTCGTTT


CCGTGAGTGGTAAACGAGG
CGTCAGATGTCCGAGTA







GAGGGGGAACGGCGTGT








GCTCCGGAGTTA






CU-
1871

TCACGTCGGGGTCA

1958
TGTCAGGCAACCGTATTCA
SEQ ID NO: 2034


1222


CCA


CCGTGAGTGGTTGGTGA
CGTCAGATGTCCGAGTA







GAGGGGGAACGGCGTCA








CGTCGGGG






CU-
1872

AGTCCCATCTGGGT

1959
TGTCAGGCAACCGTATTCA
SEQ ID NO: 2035


1241


CGCCA


CCGTGAGTGGTTGGCGA
CGTCAGATGTCCGAGTA







GAGGGGGAACGGCGAGT








CCCATCTGGG






CU-
1873

TCCCCGTACGGGCC

1960
TGTCAGGCAACCGTATTCA
SEQ ID NO: 2036


1242


ACCA


CCGTGAGTGGTTGGTGG
CGTCAGATGTCCGAGTA







GAGGGGGAACGGCGTCC








CCGTACGGG






CU-
1874

GTCCCTTCGTGGTC

1961
TGTCAGGCAACCGTATTCA
SEQ ID NO: 2037


1243


GCCA


CCGTGAGTGGTTGGCGA
CGTCAGATGTCCGAGTA







GAGGGGGAACGGCGGTC








CCTTCGTGG






CU-
1875

GTCAGGATGGCCGA

1962
TGTCAGGCAACCGTATTCA
SEQ ID NO: 2038


1244


GCGGTCT


CCGTGAGTGGTAGACCG
CGTCAGATGTCCGAGTA







GAGGGGGAACGGCGGTC








AGGATGGCCGAG






CU-
1876

AGGGGGGTAAAAAA

1963
TGTCAGGCAACCGTATTCA
SEQ ID NO: 2039


1246


AAA


CCGTGAGTGGTTTTTTT
CGTCAGATGTCCGAGTA







GAGGGGGAACGGCGAGG








GGGGTAAA






CU-
1877

CCCACCCAGGGACG

1964
TGTCAGGCAACCGTATTCA
SEQ ID NO: 2040


1251


CCA


CCGTGAGTGGTTGGCGT
CGTCAGATGTCCGAGTA







GAGGGGGAACGGCGCCC








ACCCAGGG






CU-
1878

TCCCCGGCACCTCC

1965
TGTCAGGCAACCGTATTCA
SEQ ID NO: 2041


1254


ACCA


CCGTGAGTGGTTGGTGG
CGTCAGATGTCCGAGTA







GAGGGGGAACGGCGTCC








CCGGCACCT






CU-
1879

GAGGGGGACCAAAA

1966
TGTCAGGCAACCGTATTCA
SEQ ID NO: 2042


1264


AAAA


CCGTGAGTGGTTTTTTTTT
CGTCAGATGTCCGAGTA







GG

GAGGGGGAACGGCGGAG








GGGGA






CU-
1880

TACCGAGCCTGGTG

1967
TGTCAGGCAACCGTATTCA
SEQ ID NO: 2043


1269


ATAGC


CCGTGAGTGGTGCTATC
CGTCAGATGTCCGAGTA







GAGGGGGAACGGCGTAC








CGAGCCTGGT






CU-
1881

TCGATTCCCGGCCA

1968
TGTCAGGCAACCGTATTCA
SEQ ID NO: 2044


1276


ATGCACCA


CCGTGAGTGGTTGGTGCAT
CGTCAGATGTCCGAGTA







TG

GAGGGGGAACGGCGTCG








ATTCCCGGC






CU-
1882

GAGCCATGATGATA

1969
TGTCAGGCAACCGTATTCA
SEQ ID NO: 2045


1277


CCACTGAGC


CCGTGAGTGGTGCTCAG
CGTCAGATGTCCGAGTA







GAGGGGGAACGGCGGAG








CCATGATGATACCA






CU-
1883

TAACGGCCGCGGTA

1970
TGTCAGGCAACCGTATTCA
SEQ ID NO: 2046


1278


CCC


CCGTGAGTGGTGGGTAC
CGTCAGATGTCCGAGTA







GAGGGGGAACGGCGTAA








CGGCCGCG






CU-
1884

GCAGCGCCAGCCTC

1971
TGTCAGGCAACCGTATTCA
SEQ ID NO: 2047


1281


CCGCCCTAC


CCGTGAGTGGTGTAGGG
CGTCAGATGTCCGAGTA







GAGGGGGAACGGCGGCA








GCGCCAGCCTCCCG






CU-
1885

CGTCCATGATGTTC

1972
TGTCAGGCAACCGTATTCA
SEQ ID NO: 2048


1288


CGCAA


CCGTGAGTGGTTTGCGG
CGTCAGATGTCCGAGTA







GAGGGGGAACGGCGCGT








CCATGATGTT






CU-
1886

AGCAGTGATGTCCT

1973
TGTCAGGCAACCGTATTCA
SEQ ID NO: 2049


1293


GAAAATTCTGAAG


CCGTGAGTGGTCTTCAGAA
CGTCAGATGTCCGAGTA







TTT

GAGGGGGAACGGCGAGC








AGTGATGTCCTGA






CU-
1887

AAAGGACCTGGCGG

1974
TGTCAGGCAACCGTATTCA
SEQ ID NO: 2050


1294


TGCTTC


CCGTGAGTGGTGAAGCA
CGTCAGATGTCCGAGTA







GAGGGGGAACGGCGAAA








GGACCTGGCGG






CU-
1888

ATCCCGGACGAGCC

1975
TGTCAGGCAACCGTATTCA
SEQ ID NO: 2051


1298


CCCA


CCGTGAGTGGTTGGGGG
CGTCAGATGTCCGAGTA







GAGGGGGAACGGCGATC








CCGGACGAG






CU-
1889

TCCTCACACGGGGC

1976
TGTCAGGCAACCGTATTCA
SEQ ID NO: 2052


1300


ACCA


CCGTGAGTGGTTGGTGC
CGTCAGATGTCCGAGTA







GAGGGGGAACGGCGTCC








TCACACGGG






CU-
1890

ATCCCACTTCTGAC

1977
TGTCAGGCAACCGTATTCA
SEQ ID NO: 2053


1303


ACCA


CCGTGAGTGGTTGGTGT
CGTCAGATGTCCGAGTA







GAGGGGGAACGGCGATC








CCACTTCTG






CU-
1891

ACCCCACTATGCTT

1978
TGTCAGGCAACCGTATTCA
SEQ ID NO: 2054


1307


AGCCCT


CCGTGAGTGGTAGGGCT
CGTCAGATGTCCGAGTA







GAGGGGGAACGGCGACC








CCACTATGCTT






CU-
1892

TGTATTGTGAGACA

1979
TGTCAGGCAACCGTATTCA
SEQ ID NO: 2055


1323


TTC


CCGTGAGTGGTGAATGT
CGTCAGATGTCCGAGTA







GAGGGGGAACGGCGTGT








ATTGTGAG






CU-
1893

TCTCGGTGGAACCT

1980
TGTCAGGCAACCGTATTCA
SEQ ID NO: 2056


1324


CCA


CCGTGAGTGGTTGGAGG
CGTCAGATGTCCGAGTA







GAGGGGGAACGGCGTCT








CGGTGGAA






CU-
1894

ATCCCCAGCACCTC

1981
TGTCAGGCAACCGTATTCA
SEQ ID NO: 2057


1339


CACCA


CCGTGAGTGGTTGGTGG
CGTCAGATGTCCGAGTA







GAGGGGGAACGGCGATC








CCCAGCACCT






CU-
1895

AGAACACTACGAGC

1982
TGTCAGGCAACCGTATTCA
SEQ ID NO: 2058


1345


CACA


CCGTGAGTGGTTGTGGC
CGTCAGATGTCCGAGTA







GAGGGGGAACGGCGAGA








ACACTACGA






CU-
1896

ACCCCACTTCTGGT

1983
TGTCAGGCAACCGTATTCA
SEQ ID NO: 2059


1352


ACCA


CCGTGAGTGGTTGGTACCA
CGTCAGATGTCCGAGTA







GAGGGGGAACGGCGACC








CCACTTC






CU-
1897

CGTTCGCGCTTTCC

1984
TGTCAGGCAACCGTATTCA
SEQ ID NO: 2060


1363


CCTG


CCGTGAGTGGTCAGGGGAA
CGTCAGATGTCCGAGTA







AG

GAGGGGGAACGGCGCGT








TCGCG






CU-
1898

GACGAGGTGGCCGA

1985
TGTCAGGCAACCGTATTCA
SEQ ID NO: 2061


1368


GTGG


CCGTGAGTGGTCCACTC
CGTCAGATGTCCGAGTA







GAGGGGGAACGGCGGAC








GAGGTGGCC






CU-
1899

TCCCCGGCATCTCC

1986
TGTCAGGCAACCGTATTCA
SEQ ID NO: 2062


1369


ACCA


CCGTGAGTGGTTGGTGG
CGTCAGATGTCCGAGTA







GAGGGGGAACGGCGTCC








CCGGCATCT






CU-
1900

CTGATTGCTCCTAT

1987
TGTCAGGCAACCGTATTCA
SEQ ID NO: 2063


1370_MOD


CTGATT


CCGTGAGTGGTAATCAG
CGTCAGATGTCCGAGTA







GAGGGGGAACGGCGCTG








ATTGCTCCTAT






CU-
1901

TCTAGAGGAGCCTG

1988
TGTCAGGCAACCGTATTCA
SEQ ID NO: 2064


1371


TTCTGTA


CCGTGAGTGGTTACAGA
CGTCAGATGTCCGAGTA







GAGGGGGAACGGCGTCT








AGAGGAGCCTGT






CU-
1902

TCGGGTGCGAGAGG

1989
TGTCAGGCAACCGTATTCA
SEQ ID NO: 2065


1379


TCCCGGGT


CCGTGAGTGGTACCCGGGA
CGTCAGATGTCCGAGTA







CC

GAGGGGGAACGGCGTCG








GGTGCGAGA






CU-
1903

ATAGGTTTGGTCCT

1990
TGTCAGGCAACCGTATTCA
SEQ ID NO: 2066


1380


AGCCTTTCT


CCGTGAGTGGTAGAAAG
CGTCAGATGTCCGAGTA







GAGGGGGAACGGCGATA








GGTTTGGTCCTAGC






CU-
1904

TCGATTCCCGGTCA

1991
TGTCAGGCAACCGTATTCA
SEQ ID NO: 2067


1381


GGGAACCA


CCGTGAGTGGTTGGTTC
CGTCAGATGTCCGAGTA







GAGGGGGAACGGCGTCG








ATTCCCGGTCAGG






CU-
1905

TCCTCGTTAGTATA

1992
TGTCAGGCAACCGTATTCA
SEQ ID NO: 2068


1382


GTGGTGAGTATCCC


CCGTGAGTGGTGGGATA
CGTCAGATGTCCGAGTA







GAGGGGGAACGGCGTCC








TCGTTAGTATAGTGGT






CU-
1906

TCCCTGGTGGTCTA

1993
TGTCAGGCAACCGTATTCA
SEQ ID NO: 2069


1388


GTGGTTAGGATTCG


CCGTGAGTGGTCGAATC
CGTCAGATGTCCGAGTA







GAGGGGGAACGGCGTCC








CTGGTGGTCTAGTGGT






CU-
1907

TAAGTGTTTGTGGG

1994
TGTCAGGCAACCGTATTCA
SEQ ID NO: 2070


1396


TTA


CCGTGAGTGGTTAACCCAC
CGTCAGATGTCCGAGTA







A

GAGGGGGAACGGCGTAA








GTGTT






CU-
1908

GCATTGGTGGTTCA

1995
TGTCAGGCAACCGTATTCA
SEQ ID NO: 2071


1403


GTGGTAGA


CCGTGAGTGGTTCTACC
CGTCAGATGTCCGAGTA







GAGGGGGAACGGCGGCA








TTGGTGGTTCAGT






CU-
1909

TGGTTATCACGTTC

1996
TGTCAGGCAACCGTATTCA
SEQ ID NO: 2072


1440


GCC


CCGTGAGTGGTGGCGAACG
CGTCAGATGTCCGAGTA







T

GAGGGGGAACGGCGTGG








TTATC






CU-
1910

CCCTGCTCGCTGCG

1997
TGTCAGGCAACCGTATTCA
SEQ ID NO: 2073


1453


CCA


CCGTGAGTGGTTGGCGC
CGTCAGATGTCCGAGTA







GAGGGGGAACGGCGCCC








TGCTCGCT






CU-
1911

TTCTCACTACTGCA

1998
TGTCAGGCAACCGTATTCA
SEQ ID NO: 2074


1457


CTTGACTA


CCGTGAGTGGTTAGTCA
CGTCAGATGTCCGAGTA







GAGGGGGAACGGCGTTC








TCACTACTGCACT






CU-
1912

CTCCTGGCTGGCTC

1999
TGTCAGGCAACCGTATTCA
SEQ ID NO: 2075


1470


GCCA


CCGTGAGTGGTTGGCGA
CGTCAGATGTCCGAGTA







GAGGGGGAACGGCGCTC








CTGGCTGGC






CU-
1913

CTCCCACTGCTTCA

2000
TGTCAGGCAACCGTATTCA
SEQ ID NO: 2076


1477


CTTGACTAGC


CCGTGAGTGGTGCTAGT
CGTCAGATGTCCGAGTA







GAGGGGGAACGGCGCTC








CCACTGCTTCACTTG






CU-
1914

CTGCTGTGATGACA

2001
TGTCAGGCAACCGTATTCA
SEQ ID NO: 2077


1486


TTC


CCGTGAGTGGTGAATGT
CGTCAGATGTCCGAGTA







GAGGGGGAACGGCGCTG








CTGTGATG






CU-
1915

TCCTGCCGCGGTCG

2002
TGTCAGGCAACCGTATTCA
SEQ ID NO: 2078


1488


CCA


CCGTGAGTGGTTGGCGA
CGTCAGATGTCCGAGTA







GAGGGGGAACGGCGTCC








TGCCGCGG






CU-
1916

GCGGGTGATGCGAA

2003
TGTCAGGCAACCGTATTCA
SEQ ID NO: 2079


1513


CTGGAGTCTGAGC


CCGTGAGTGGTGCTCAG
CGTCAGATGTCCGAGTA







GAGGGGGAACGGCGGCG








GGTGATGCGAACTGGA






CU-
1917

CCCCCACAACCGCG

2004
TGTCAGGCAACCGTATTCA
SEQ ID NO: 2080


1524


CTTGACTAGC


CCGTGAGTGGTGCTAGT
CGTCAGATGTCCGAGTA







GAGGGGGAACGGCGCCC








CCACAACCGCGCTTG






CU-
1918

TAGGGGTATGATTC

2005
TGTCAGGCAACCGTATTCA
SEQ ID NO: 2081


1528


TCGCT


CCGTGAGTGGTAGCGAG
CGTCAGATGTCCGAGTA







GAGGGGGAACGGCGTAG








GGGTATGATT






CU-
1919

GGCTGGTCCGAGTG

2006
TGTCAGGCAACCGTATTCA
SEQ ID NO: 2082


1538


CAGTGGTGTTTA


CCGTGAGTGGTTAAACACC
CGTCAGATGTCCGAGTA







AC

GAGGGGGAACGGCGGGC








TGGTCCGAGTGCAGTG






CU-
1920

GGCTGGTCCGATGG

2007
TGTCAGGCAACCGTATTCA
SEQ ID NO: 2083


1542


TAGTGGGTT


CCGTGAGTGGTAACCCA
CGTCAGATGTCCGAGTA







GAGGGGGAACGGCGGGC








TGGTCCGATGGTAG






CU-
1921

CCACGAGGAAGAGA

2008
TGTCAGGCAACCGTATTCA
SEQ ID NO: 2084


1545


GGTAGC


CCGTGAGTGGTGCTACCTC
CGTCAGATGTCCGAGTA







T

GAGGGGGAACGGCGCCA








CGAGGAAG






CU-
1922

CGGAAGCGTGCTGG

2009
TGTCAGGCAACCGTATTCA
SEQ ID NO: 2085


1550


GCCC


CCGTGAGTGGTGGGCCC
CGTCAGATGTCCGAGTA







GAGGGGGAACGGCGCGG








AAGCGTGCT






CU-
1923

GGAGAGAACGCGGT

2010
TGTCAGGCAACCGTATTCA
SEQ ID NO: 2086


1557


CTGAGTGGT


CCGTGAGTGGTACCACT
CGTCAGATGTCCGAGTA







GAGGGGGAACGGCGGGA








GAGAACGCGGTCTG






CU-
1924

ATCCCCAGCATCTC

2011
TGTCAGGCAACCGTATTCA
SEQ ID NO: 2087


1570


CACCA


CCGTGAGTGGTTGGTGG
CGTCAGATGTCCGAGTA







GAGGGGGAACGGCGATC








CCCAGCATCT






CU-
1925

CCCCCCACTGCTAA

2012
TGTCAGGCAACCGTATTCA
SEQ ID NO: 2088


1575


ATTTGACTGGA


CCGTGAGTGGTTCCAGT
CGTCAGATGTCCGAGTA







GAGGGGGAACGGCGCCC








CCCACTGCTAAATTTG














Universal primer ID: Universal primer sequences (5′-3′)
MPF: TGTCAGGCAACCGTATTCACC (SEQ ID NO: 2089)
MPR: CGTCAGATGTCCGAGTAGAGG (SEQ ID NO: 2090)

Control Primer ID: Control primer sequences (5′-3′)
5s_rRNA_FWD: GCCCGATCTCGTCTGATCT (SEQ ID NO: 2091)
5s_rRNA_REV: AGCCTACAGCACCCGGTATT (SEQ ID NO: 2092)









RNA Blot.


Total RNA and small RNA fractions were purified using the Trizol Reagent (Invitrogen) and the PureLink miRNA Isolation Kit (Invitrogen), respectively, following the manufacturer's instructions. Electrophoresis was performed on a 15% denaturing polyacrylamide gel, and the RNA was then transferred to Duralon UV membrane (Stratagene) using a semidry transfer apparatus. Pre-hybridization and hybridization were performed in 5×SSC, 20 mM Na2HPO4 pH 7.2, 7% SDS, 3×Denhardt's Solution. Oligonucleotide probes were labeled with [γ-32P]-ATP by polynucleotide kinase (Fermentas). The list of oligonucleotides and their hybridization temperatures is reported in Table S8. After overnight hybridization, membranes were washed at the same temperature in 3×SSC, 25 mM NaH2PO4 pH 7.5, 5% SDS, 10×Denhardt's Solution for 15-20 min and in 1×SSC, 1% SDS for 5 min. Images were obtained by exposure to a phosphorimager cassette and acquisition with a Storm 840 Phosphorimager (Molecular Dynamics), and by film exposure for approximately 2 weeks.


Estimation of Library Complexity.


A bootstrap technique was used to estimate the total number of miRNAs expressed in each library and the number of short-RNAs that must be sequenced to achieve complete coverage. Bootstrapping is a statistical technique for estimating properties of an “estimator” by measuring those properties in multiple subsets of the samples (Harrell, 2001; Hinkley, 1997). Specifically, we estimated the distribution of mature miRNAs obtained by randomly sub-sampling short-RNA libraries of different sizes from each complete library. For each size N=10, 20, . . . Nt, where Nt is the total number of short-RNAs in the library, we randomly sampled 1000 libraries of size N and computed the number r(N) of inferred miRNAs, resulting in a distribution p(r(N)) for which we could compute standard statistical parameters such as average, variance, mode and median. Based on this sampling, we can extrapolate p(r(N)) for increasing values of N to determine the point at which larger values of N no longer efficiently increase miRNA coverage. To achieve this, we fitted the data to the parametric function ƒ(x) = K·(1 − e^(−mx)), in which K is the plateau value, that is, the estimated total number of miRNAs in the library.
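The sub-sampling and curve-fitting steps described above can be illustrated with a minimal sketch. It is not the code used in this study: the function names, the choice of sampling without replacement, and the golden-section search over m (with K solved in closed form for each m, and the residual assumed to have a single minimum) are all assumptions made for illustration.

```python
import math
import random


def subsample_richness(reads, size, n_draws=1000, rng=None):
    """Mean number of distinct miRNA labels over n_draws random
    sub-libraries of `size` reads (assumption: sampled without replacement)."""
    rng = rng or random.Random(0)
    counts = [len(set(rng.sample(reads, size))) for _ in range(n_draws)]
    return sum(counts) / len(counts)


def fit_saturation(xs, ys, m_lo=1e-6, m_hi=1.0, iters=200):
    """Least-squares fit of y = K * (1 - exp(-m * x)); returns (K, m).

    For a fixed m the best K has a closed form, so only m is searched,
    here by golden-section search on log(m) (hypothetical choice)."""

    def best_k(m):
        g = [1.0 - math.exp(-m * x) for x in xs]
        return sum(y * gi for y, gi in zip(ys, g)) / sum(gi * gi for gi in g)

    def sse(m):
        k = best_k(m)
        return sum((y - k * (1.0 - math.exp(-m * x))) ** 2
                   for x, y in zip(xs, ys))

    phi = (math.sqrt(5) - 1) / 2
    a, b = math.log(m_lo), math.log(m_hi)
    for _ in range(iters):
        c = b - phi * (b - a)
        d = a + phi * (b - a)
        if sse(math.exp(c)) < sse(math.exp(d)):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    m = math.exp((a + b) / 2)
    return best_k(m), m
```

In use, `subsample_richness` would be evaluated for N = 10, 20, ... up to the library size to obtain the r(N) curve, and `fit_saturation` would then return the plateau K as the estimated library complexity.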


Since we include both experimentally confirmed and putative mature miRNAs, and since bootstrapping can produce optimistic results, we expect that the estimated values constitute an upper bound on the real library complexity. Based on this analysis, we estimated that the total numbers of mature miRNAs are: 129 (naïve), 154 (memory), 204 (centroblasts) and 189 (Ramos). Thus, the libraries sequenced in this study cover respectively 90.7% (naïve), 88.3% (memory), 85.8% (centroblasts), and 91% (Ramos) of the expressed miRNAs in these cellular phenotypes. FIG. 23 gives the 95% confidence intervals for p(r(N)) at each sampling point, in addition to the curve of the associated extrapolated function for each library. Clearly, the bootstrap estimate of the total number of miRNAs is correct only if the abundance of the miRNAs expressed in the sampled populations closely matches that of known miRNAs in miRBase. This is not unreasonable if, as done here, only miRNAs that are specific to a B cell differentiation stage or transformation are considered. Thus, this analysis does not estimate the total number of miRNAs expressed across all human cell types, stages of differentiation and neoplastic transformations, which could be several fold larger than what was estimated from the B cell libraries.


Orthology and Conservation Analysis.


We investigated conservation of known and predicted precursor and mature human miRNAs in chimp (panTro2), monkey (rheMac2), dog (canFam2), mouse (mm8) and rat (rn4). We obtained 678 miRNA precursor sequences from miRBase (v.11.0), 666 mature miRNAs and 167 star sequences. In total, we obtained 947 locations for mature and star miRBase sequences. We predicted 388 precursors, of which 114 match miRBase precursors and 274 are newly predicted. Categorizing these by their corresponding mature sequences, 255 precursors correspond to mature miRNAs that are not included in miRBase and 133 precursors are associated with 103 predicted miRNAs that match miRBase miRNAs. Of the 274 newly predicted precursors, 19 are associated with 8 mature sequences listed in the miRBase database.


miRNA conservation has been repeatedly used to help identify putative miRNA mappings to genomes. To identify putative ortholog miRNAs we relied on UCSC-provided Blastz pairwise alignments between human and target species (Schwartz et al., 2003). We used two related but complementary methods: (1) map the mature human miRNA to its ortholog location as specified by pairwise alignment; and (2) map the precursor of the human miRNA to its ortholog location as specified by pairwise alignment, expanding the human region to include at least 80 bases from both sides of the mature region, and identifying regions in the target that match the sequence of the mature human miRNA.


Method 1 is the simplest but fails to account for alignment inaccuracies and local mutations that may shift the position of the mature sequence in the target species. Method 2 accounts for locally imperfect Blastz mapping, but relies on conservation of larger regions that may not be subject to the same selective pressure as the mature miRNA. Alignment-based mappings of the human mature miRNA to its target were required to have either perfect conservation of the entire mature miRNA sequence, or conservation of a seed composed of seven bases starting from the second position of the human mature sequence, followed by conservation of 3 bases starting from the 12th, 13th or 14th position, as suggested by Grimson et al. (2007) (Appendix Table 11). We scanned the entire mapped ortholog region for a match to the human mature sequence or to its seed.
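The conservation criterion above can be sketched as a simple scan over the mapped ortholog region. This is one possible reading of the rule, not the pipeline's actual code: the function name is hypothetical, and the sketch assumes a gap-free, same-strand comparison in which the seed and the 3′ triplet are checked at the same offsets as in the human mature sequence.

```python
def conserved_sites(human_mature, target_region):
    """Return (offset, kind) pairs for windows of the target region that
    conserve the human mature miRNA.

    kind is "full" for a perfect match of the whole mature sequence, or
    "seed" when the 7-base seed (positions 2-8, 1-based) matches and at
    least one 3-base block starting at position 12, 13 or 14 also matches.
    Assumes the mature sequence is at least 16 bases long."""
    n = len(human_mature)
    seed = human_mature[1:8]  # positions 2..8 in 1-based numbering
    hits = []
    for start in range(len(target_region) - n + 1):
        window = target_region[start:start + n]
        if window == human_mature:
            hits.append((start, "full"))
        elif window[1:8] == seed and any(
            window[p:p + 3] == human_mature[p:p + 3]
            for p in (11, 12, 13)  # 1-based positions 12, 13, 14
        ):
            hits.append((start, "seed"))
    return hits
```

A real implementation would also need to handle the reverse strand and alignment gaps, which this sketch deliberately leaves out.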


miRNA Target Prediction and Analysis.


Target predictions for previously unreported miRNAs were performed with miRanda v1.0 (John et al., 2004) and RNA22 (Miranda et al., 2006) using the recommended parameters, with the exception of the RNA22 energy threshold, which was changed from the default of −25.0 kcal/mol to −20.0 kcal/mol.


In order to investigate the potential effect of miRNAs on the transcriptome, predicted targets were tested for enrichment in genes down-regulated in the same population over-expressing the tested miRNA. Over-expressed miRNA were selected based on a minimum frequency value >0.08 and a three-fold increase in their cloning frequency comparing CB vs naïve or memory B cell libraries. Genes differentially expressed across normal B cell populations were identified based on intensity fold change greater than 1.5, and p-value under 0.01 according to a non-parametric U test applied to six biological replicates per cell type (gene expression data are available from GEO database; GSE2350).


For each miRNA, using a Fisher exact test, we compared the numbers of down- and up-regulated predicted targets to down- and up-regulated genes that are not predicted targets. Setting a p-value threshold of 0.01, targets of most GC-specific miRNAs were significantly down regulated in CB. Conversely, targets of naïve- and memory-specific miRNAs were not significantly differentially regulated. Therefore the analysis was focused on targets of the 15 GC-specific miRNAs. Predicted targets of 8 out of 15 miRNAs showed significant enrichment (p-value <0.001) in genes down-regulated in GC compared to naïve B cells and 2 of them showed enrichment for genes down-regulated in a control population (memory compared to naïve) (Table 13). We can conclude that targets of GC-specific miRNAs are significantly more likely to be down regulated in CB than in naïve B cells with p<0.05 according to a Fisher exact test. Moreover, down-regulation p-values in CB were systematically lower than in memory (FIG. 26 and Table 13). Of the target sets for the 15 GC-specific miRNAs, 11 were more significantly down regulated in CB, 2 were more significantly down regulated in the control population (memory), and 2 were not down regulated in either (FIG. 26). Using down-regulation in memory as control, we therefore conclude that down-regulation p-values are lower for CB with p<0.05 according to a binomial test with an 11/15 rate under a null hypothesis of equally likely odds for greater down regulation. In summary, while targets of naïve and memory specific miRNAs were not found differentially expressed in our data, we were able to demonstrate that predicted targets of GC-specific miRNAs are enriched in genes that are down regulated in GC.
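The per-miRNA enrichment test described above amounts to a 2×2 contingency table (predicted targets versus non-targets, down- versus up-regulated genes). A minimal sketch of a one-sided Fisher exact test follows; the function name and the row and column layout are assumptions made for illustration, and any counts passed to it would be hypothetical rather than taken from Table 13.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher exact p-value for the table

                      down-regulated   up-regulated
        targets             a               b
        non-targets         c               d

    Returns P(X >= a) under the hypergeometric null with fixed margins,
    i.e. the probability of seeing at least `a` down-regulated targets
    by chance."""
    row1 = a + b          # number of predicted targets
    col1 = a + c          # number of down-regulated genes
    n = a + b + c + d     # all differentially expressed genes
    upper = min(row1, col1)
    tail = sum(comb(col1, x) * comb(n - col1, row1 - x)
               for x in range(a, upper + 1))
    return tail / comb(n, row1)
```

A small p-value indicates that the predicted targets of a miRNA are over-represented among down-regulated genes relative to genes that are not predicted targets.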


Correlation Between Cloning and Microarray miRNA Profiling.


In order to compare cloning and microarray data, we focused on the 89 miRNAs for which both types of data were available. A significant correlation (p-value ≦3.9e-28) was shown between cloning and miRNA microarray data as measured by Spearman correlation. The corresponding scatter plot is shown in FIG. 27. Furthermore, to investigate whether miRNA cloning counts were predictive of differential expression as measured by miRNA microarray, we identified 39 miRNAs whose cloning frequency was at least 2-fold greater in one normal B cell subset relative to each of the remaining two subsets. Of these, 25 (64.1%) miRNAs were found to be over-expressed in the same B cell subset according to miRNA microarray profiling. Over-expression was measured using a one-sided U test, with a threshold corresponding to p<0.01. We used permutation testing to estimate the significance of the success rate, randomly shuffling expression labels while keeping clone frequencies unchanged. The distribution of confirmed clone predictions using the shuffled expression data had a mean of 1.2% and a standard deviation of 5.1%, corresponding to 12.2 standard deviations away from our prediction success rate and a p-value near zero. We conclude that miRNA cloning counts are predictive of miRNA concentration levels and differential expression.
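The two statistics used above, a Spearman rank correlation and a permutation estimate of the null success rate, can be sketched as follows. This is an illustrative simplification rather than the analysis code: the function names are hypothetical, and the permutation step shuffles per-miRNA subset labels, whereas the study shuffled expression labels in the microarray data.

```python
import math
import random


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)


def permutation_null(predicted, observed, n_perm=1000, seed=0):
    """Success rate of `predicted` subset labels against `observed` labels,
    plus the mean and standard deviation of that rate after shuffling the
    observed labels (predictions are kept fixed)."""
    rng = random.Random(seed)

    def rate(obs):
        return sum(p == o for p, o in zip(predicted, obs)) / len(predicted)

    actual = rate(observed)
    shuffled = list(observed)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null.append(rate(shuffled))
    mean = sum(null) / n_perm
    sd = math.sqrt(sum((r - mean) ** 2 for r in null) / (n_perm - 1))
    return actual, mean, sd
```

The distance of the observed success rate from the shuffled distribution, in standard deviations, is then `(actual - mean) / sd`.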


Immunoprecipitation.


Immunoprecipitations were performed from Ramos cells grown in IMDM, 10% fetal bovine serum, 1% Penicillin/Streptomycin. 1-2×10^8 cells were collected and resuspended in 1 ml lysis buffer (10 mM Tris pH 7.5, 2 mM MgCl2, 10 mM KCl, 2.5 mM DTT, 1× protease inhibitors, 40 U/μl Ambion Superase-IN). Lysate supernatant was mixed with 500 μl ATP depletion mix (450 mM KCl, 100 mM glucose, 0.5 U/μl Sigma-Aldrich hexokinase). Cleared supernatant was divided equally between paramagnetic protein G beads (New England Biolabs) bound to either a monoclonal rat antibody raised against human Ago2 protein (Rudel et al., 2008) or total purified rat IgG (Sigma-Aldrich). Beads were incubated with lysate under rotation for 2 hours at 4° C., then washed three times with ice-cold lysis buffer and collected in Trizol (Invitrogen) for RNA extraction. RNA from three sequential immunoprecipitations was pooled, and 1/10th of the yield was used for reverse transcription of each miRNA species using the Superscript III First Strand Synthesis Kit (Invitrogen), in the presence of 0.2 μM RTFS primer (miRNA-specific primers, see Table S8). cDNA was also generated by reverse transcription in the presence of random hexamers to test expression of 5S rRNA. 1/10th of the cDNA volume was used as template for SYBR (Applied Biosystems) qPCR amplification in the presence of 4 nM SS primer (miRNA-specific primers, see Table 14) and either 0.4 μM each of the MPF and MPR universal primers or 0.4 μM each of the 5S rRNA primers (Table 14). Each qPCR reaction was performed in triplicate. The tested miRNAs were selected, among those detectable by both RNA blot and RT-PCR, based on the availability of optimized qRT-PCR conditions.


Accession Numbers.


The miRNA array profiles data are available from the GEO repository (GSE15144).


REFERENCES



  • Bartel, D. P. (2004). MicroRNAs: genomics, biogenesis, mechanism, and function. Cell 116, 281-297.

  • Basso, K., Margolin, A. A., Stolovitzky, G., Klein, U., Dalla-Favera, R., and Califano, A. (2005). Reverse engineering of regulatory networks in human B cells. Nat Genet. 37, 382-390.

  • Bentwich, I., et al. (2005). Identification of hundreds of conserved and nonconserved human microRNAs. Nat Genet. 37, 766-770.

  • Calabrese, J. M., Seila, A. C., Yeo, G. W., and Sharp, P. A. (2007). RNA sequence analysis defines Dicer's role in mouse embryonic stem cells. Proc Natl Acad Sci USA 104, 18097-18102.

  • Calin, G. A., et al. (2002). Frequent deletions and down-regulation of micro-RNA genes miR15 and miR16 at 13q14 in chronic lymphocytic leukemia. Proc Natl Acad Sci USA 99, 15524-15529.

  • Calin, G. A., Ferracin, M., Cimmino, A., Di Leva, G., Shimizu, M., Wojcik, S. E., Iorio, M. V., Visone, R., Sever, N. I., Fabbri, M., et al. (2005). A MicroRNA signature associated with prognosis and progression in chronic lymphocytic leukemia. N Engl J Med 353, 1793-1801.

  • Chen, C. Z., Li, L., Lodish, H. F., and Bartel, D. P. (2004). MicroRNAs modulate hematopoietic lineage differentiation. Science 303, 83-86.

  • Cummins, J. M., He, Y., Leary, R. J., Pagliarini, R., Diaz, L. A., Jr., Sjoblom, T., Barad, O., Bentwich, Z., Szafranska, A. E., Labourier, E., et al. (2006). The colorectal microRNAome. Proc Natl Acad Sci USA 103, 3687-3692.

  • Dorsett, Y., et al. (2008). MicroRNA-155 suppresses activation-induced cytidine deaminase-mediated Myc-Igh translocation. Immunity 28, 630-638.

  • Eisen, M. B., Spellman, P. T., Brown, P. O., and Botstein, D. (1998). Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci USA 95, 14863-14868.

  • Filipowicz, W., Bhattacharyya, S. N., and Sonenberg, N. (2008). Mechanisms of post-transcriptional regulation by microRNAs: are the answers in sight? Nat Rev Genet. 9, 102-114.

  • Griffiths-Jones, S. (2006). miRBase: the microRNA sequence database. Methods Mol Biol 342, 129-138.

  • Griffiths-Jones, S., Grocock, R. J., van Dongen, S., Bateman, A., and Enright, A. J. (2006). miRBase: microRNA sequences, targets and gene nomenclature. Nucleic Acids Res 34, D140-144.

  • Grimson, A., Farh, K. K., Johnston, W. K., Garrett-Engele, P., Lim, L. P., and Bartel, D. P. (2007). MicroRNA targeting specificity in mammals: determinants beyond seed pairing. Mol Cell 27, 91-105.

  • Harrell, F. E. (2001). Regression modeling strategies: with applications to linear models, logistic regression, and survival analysis (N.Y., Springer).

  • Hartigan, J. A. (1975). Clustering Algorithms (New York, Wiley).

  • He, L., et al. (2005). A microRNA polycistron as a potential human oncogene. Nature 435, 828-833.

  • Davison, A. C., and Hinkley, D. V. (1997). Bootstrap Methods and their Applications (New York, Cambridge University Press).

  • John, B., Enright, A. J., Aravin, A., Tuschl, T., Sander, C., and Marks, D. S. (2004). Human MicroRNA targets. PLoS Biol 2, e363.

  • Kawahara, Y., Zinshteyn, B., Sethupathy, P., Iizasa, H., Hatzigeorgiou, A. G., and Nishikura, K. (2007). Redirection of silencing targets by adenosine-to-inosine editing of miRNAs. Science 315, 1137-1140.

  • Kim, V. N. (2005). MicroRNA biogenesis: coordinated cropping and dicing. Nat Rev Mol Cell Biol 6, 376-385.

  • Klein, U., and Dalla-Favera, R. (2008). Germinal centres: role in B-cell physiology and malignancy. Nat Rev Immunol 8, 22-33.

  • Klein, U., Tu, Y., Stolovitzky, G. A., Keller, J. L., Haddad, J., Jr., Miljkovic, V., Cattoretti, G., Califano, A., and Dalla-Favera, R. (2003). Transcriptional analysis of the B cell germinal center reaction. Proc Natl Acad Sci USA 100, 2639-2644.

  • Kuppers, R., and Dalla-Favera, R. (2001). Mechanisms of chromosomal translocations in B cell lymphomas. Oncogene 20, 5580-5594.

  • Landgraf, P., Rusu, M., Sheridan, R., Sewer, A., Iovino, N., Aravin, A., Pfeffer, S., Rice, A., Kamphorst, A. O., Landthaler, M., et al. (2007). A mammalian microRNA expression atlas based on small RNA library sequencing. Cell 129, 1401-1414.

  • Lau, N. C., Lim, L. P., Weinstein, E. G., and Bartel, D. P. (2001). An abundant class of tiny RNAs with probable regulatory roles in Caenorhabditis elegans. Science 294, 858-862.

  • Lee, E. J., Baek, M., Gusev, Y., Brackett, D. J., Nuovo, G. J., and Schmittgen, T. D. (2007). Systematic evaluation of microRNA processing patterns in tissues, cell lines, and tumors. Rna 14, 35-42.

  • Li, Q. J., Chau, J., Ebert, P. J., Sylvester, G., Min, H., Liu, G., Braich, R., Manoharan, M., Soutschek, J., Skare, P., et al. (2007). miR-181a is an intrinsic modulator of T cell sensitivity and selection. Cell 129, 147-161.

  • Lu, J., et al. (2005). MicroRNA expression profiles classify human cancers. Nature 435, 834-838.

  • Luciano, D. J., Mirsky, H., Vendetti, N. J., and Maas, S. (2004). RNA editing of a miRNA precursor. Rna 10, 1174-1177.

  • Michael, M. Z., O'Connor, S. M., van Holst Pellekaan, N. G., Young, G. P., and James, R. J. (2003). Reduced accumulation of specific microRNAs in colorectal neoplasia. Mol Cancer Res 1, 882-891.

  • Miranda, K. C., Huynh, T., Tay, Y., Ang, Y. S., Tam, W. L., Thomson, A. M., Lim, B., and Rigoutsos, I. (2006). A pattern-based method for the identification of MicroRNA binding sites and their corresponding heteroduplexes. Cell 126, 1203-1217.

  • Mourelatos, Z., Dostie, J., Paushkin, S., Sharma, A., Charroux, B., Abel, L., Rappsilber, J., Mann, M., and Dreyfuss, G. (2002). miRNPs: a novel class of ribonucleoproteins containing numerous microRNAs. Genes Dev 16, 720-728.

  • Neilson, J. R., Zheng, G. X., Burge, C. B., and Sharp, P. A. (2007). Dynamic regulation of miRNA expression in ordered stages of cellular development. Genes Dev 21, 578-589.

  • Rodriguez, A., Vigorito, E., Clare, S., Warren, M. V., Couttet, P., Soond, D. R., van Dongen, S., Grocock, R. J., Das, P. P., Miska, E. A., et al. (2007). Requirement of bic/microRNA-155 for normal immune function. Science 316, 608-611.

  • Schwartz, S., Kent, W. J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R. C., Haussler, D., and Miller, W. (2003). Human-mouse alignments with BLASTZ. Genome Res 13, 103-107.

  • Sharbati-Tehrani, S., Kutz-Lohroff, B., Bergbauer, R., Scholven, J., and Einspanier, R. (2008). miR-Q: a novel quantitative RT-PCR approach for the expression profiling of small RNA molecules such as miRNAs in a complex sample. BMC Mol Biol 9, 34.

  • Teng, G., et al. (2008). MicroRNA-155 is a negative regulator of activation-induced cytidine deaminase. Immunity 28, 621-629.

  • Thai, T. H., et al. (2007). Regulation of the germinal center response by microRNA-155. Science 316, 604-608.

  • Thomson, J. M., Newman, M., Parker, J. S., Morin-Kensicki, E. M., Wright, T., and Hammond, S. M. (2006). Extensive post-transcriptional regulation of microRNAs and its implications for cancer. Genes Dev 20, 2202-2207.

  • Xiao, C., Calado, D. P., Galler, G., Thai, T. H., Patterson, H. C., Wang, J., Rajewsky, N., Bender, T. P., and Rajewsky, K. (2007). MiR-150 Controls B Cell Differentiation by Targeting the Transcription Factor c-Myb. Cell 131, 146-159.

  • Griffiths-Jones, S. (2006). miRBase: the microRNA sequence database. Methods Mol Biol 342, 129-138.

  • Griffiths-Jones, S., Grocock, R. J., van Dongen, S., Bateman, A., and Enright, A. J. (2006). miRBase: microRNA sequences, targets and gene nomenclature. Nucleic Acids Res 34, D140-144.

  • Grimson, A., Farh, K. K., Johnston, W. K., Garrett-Engele, P., Lim, L. P., and Bartel, D. P. (2007). MicroRNA targeting specificity in mammals: determinants beyond seed pairing. Mol Cell 27, 91-105.

  • Harrell, F. E. (2001). Regression modeling strategies: with applications to linear models, logistic regression, and survival analysis (N.Y., Springer).

  • Hinkley, A. C. D. a. D. V. (1997). Bootstrap Methods and their Applications (New York, Cambridge University Press).

  • John, B., Enright, A. J., Aravin, A., Tuschl, T., Sander, C., and Marks, D. S. (2004). Human MicroRNA targets. PLoS Biol 2, e363.

  • Landgraf, P., Rusu, M., Sheridan, R., Sewer, A., Iovino, N., Aravin, A., Pfeffer, S., Rice, A., Kamphorst, A. O., Landthaler, M., et al. (2007). A mammalian microRNA expression atlas based on small RNA library sequencing. Cell 129, 1401-1414.

  • Miranda, K. C., Huynh, T., Tay, Y., Ang, Y. S., Tam, W. L., Thomson, A. M., Lim, B., and Rigoutsos, I. (2006). A pattern-based method for the identification of MicroRNA binding sites and their corresponding heteroduplexes. Cell 126, 1203-1217.

  • Rudel, S., Flatley, A., Weinmann, L., Kremmer, E., and Meister, G. (2008). A multifunctional human Argonaute2-specific monoclonal antibody. Rna 14, 1244-1253.



Claims
  • 1. An isolated nucleotide sequence differing by no more than one nucleotide from the sequence of SEQ ID NO: 236, wherein the nucleotide sequence is 19, 20, 21, or 22 nucleotides in length.
  • 2. An isolated nucleotide sequence that is 19, 20, 21, or 22 nucleotides in length and is complementary to the nucleic acid of claim 1.
  • 3. An isolated nucleotide sequence that is 19, 20, 21, or 22 nucleotides in length and is complementary to all but 1, 2, 3, 4, or 5 nucleotides of the nucleotide sequence of claim 2.
  • 4. An isolated nucleotide sequence that is 19, 20, 21, or 22 nucleotides, wherein the nucleotide sequence is complementary to at least 19, 20, 21, or 22 nucleotides of the nucleotide sequence of claim 1.
  • 5. The nucleic acid of claim 1, wherein the nucleotide sequence is single stranded.
  • 6. The nucleic acid of claim 2, wherein the nucleotide sequence is single stranded.
  • 7. The nucleic acid of claim 3, wherein the nucleotide sequence is single stranded.
  • 8. The nucleic acid of claim 1, wherein the nucleotide sequence is expressed by a B cell.
  • 9. The nucleotide sequence of claim 8, wherein the B cell comprises a Naïve B cell, a centroblast, a memory B cell, or a Ramos Burkitt Lymphoma cell.
  • 10. A composition comprising one or more nucleotide sequences of claim 1.
  • 11. The composition of claim 10, further comprising one or more carriers, excipients, solvents, bases, or a combination thereof.
  • 12. A composition comprising one or more nucleotide sequences of claim 2.
  • 13. A composition comprising one or more nucleotide sequences of claim 3.
  • 14. The composition of claim 12 or 13, further comprising one or more carriers, excipients, solvents, bases, or a combination thereof.
Parent Case Info

This application is a Continuation-In-Part of International Patent Application No. PCT/US2008/070082, filed Jul. 15, 2008, which claims priority of U.S. Provisional Patent Application No. 60/950,474, filed Jul. 18, 2007, and of U.S. Provisional Patent Application No. 61/020,625, filed Jan. 11, 2008, each of which is incorporated herewith in its entirety.

GOVERNMENT INTERESTS

The work described herein was supported in whole, or in part, by National Cancer Institute Grant No. R01-CA109755 “Genetic Network Interference with Combinatorial Phenotypes”, and National Institute of Allergy and Infectious Diseases Grant No. R01 AI066116 “Regulatory Modules in Normal and Transformed B-Cell”. Thus, the United States Government has certain rights to the invention.

US Referenced Citations (36)
Number Name Date Kind
3536809 Applezweig Oct 1970 A
3598123 Zaffaroni Aug 1971 A
3845770 Theeuwes et al. Nov 1974 A
3916899 Theeuwes et al. Nov 1975 A
4008719 Theeuwes et al. Feb 1977 A
4683195 Mullis et al. Jul 1987 A
4710384 Rotman Dec 1987 A
5059595 Le Grazie Oct 1991 A
5073543 Marshall et al. Dec 1991 A
5120548 McClelland et al. Jun 1992 A
5139941 Muzyczka et al. Aug 1992 A
5252479 Srivastava Oct 1993 A
5270163 Gold et al. Dec 1993 A
5354556 Sparks et al. Oct 1994 A
5427916 Gewirtz et al. Jun 1995 A
5582981 Toole et al. Dec 1996 A
5591767 Mohr et al. Jan 1997 A
5639476 Oshlack et al. Jun 1997 A
5674533 Santus et al. Oct 1997 A
5733566 Lewis Mar 1998 A
5756291 Griffin et al. May 1998 A
5849902 Arrow et al. Dec 1998 A
5989912 Arrow et al. Nov 1999 A
6083685 Petrik Jul 2000 A
7232806 Tuschl et al. Jun 2007 B2
20020086356 Tuschl et al. Jul 2002 A1
20020173478 Gewirtz Nov 2002 A1
20040014113 Yang et al. Jan 2004 A1
20040018176 Tolentino et al. Jan 2004 A1
20050182005 Tuschl et al. Aug 2005 A1
20050261218 Esau et al. Nov 2005 A1
20060051771 Murphy et al. Mar 2006 A1
20060105360 Croce et al. May 2006 A1
20060142228 Ford et al. Jun 2006 A1
20070072204 Hannon et al. Mar 2007 A1
20070123482 Stoffel et al. May 2007 A1
Foreign Referenced Citations (17)
Number Date Country
WO-8804300 Jun 1988 WO
WO-9324641 Dec 1993 WO
WO-9413788 Jun 1994 WO
WO-9742317 Nov 1997 WO
WO-9907409 Feb 1999 WO
WO-9927133 Jun 1999 WO
WO-9932619 Jul 1999 WO
WO-9954506 Oct 1999 WO
WO-0001846 Jan 2000 WO
WO-0020040 Apr 2000 WO
WO-0044895 Aug 2000 WO
WO-0044914 Aug 2000 WO
WO-0129058 Apr 2001 WO
WO-0136646 May 2001 WO
WO-0329459 Apr 2003 WO
WO-2005097205 Oct 2005 WO
WO-2009012263 Jan 2009 WO
Non-Patent Literature Citations (150)
Entry
Davison, A. C. and Hinkley, D. V. Bootstrap Methods and their Application (Cambridge University Press, New York, 1997).
Klein et al., New Insights into the phenotype and cell derivation of B cell chronic lymphocytic leukemia, Curr Top Microbiol Immunol, vol. 294, pp. 31-49 (2005).
International Search Report and Written Opinion mailed Jan. 22, 2009 for International Patent Application No. PCT/US08/70082 filed Jul. 15, 2008.
“Transcription and Translation” B. D. Hames & S. J. Higgins eds. 1984.
Alexander et al., “Selected technologies to control genes and their products for experimental and clinical purposes,” (2007) Arch Immunol Ther Exp (Warsz). May-Jun. 2007;55(3):139-49.
Anderson (1998), “Human gene therapy,” Nature 392:25-30.
B. Perbal, A Practical Guide to Molecular Cloning (1984).
Bandres et al., “MicroRNAs as Cancer Players: Potential Clinical and Biological Effects,” (2007) DNA Cell Biol. 26(5):273-82.
Bartel, D. P. (2004). MicroRNAs: genomics, biogenesis, mechanism, and function. Cell 116, 281-297.
Barutcuoglu et al., “Hierarchical multi-label prediction of gene function,” Bioinformatics, vol. 22, pp. 830-836 (2006).
Bass (2001), “The short answer,” Nature, 411, 428-429.
Basso et al., “Identification of the Human mature B cell miRNome,” Immunity, vol. 30, pp. 744-752, (May 2009).
Basso, K., Margolin, A. A., Stolovitzky, G., Klein, U., Dalla-Favera, R., and Califano, A. (2005). Reverse engineering of regulatory networks in human B cells. Nat Genet 37, 382-390.
Been and Cech, “One binding site determines sequence specificity of Tetrahymena pre-rRNA Self-splicing, Trans-Splicing, and RNA Enzyme Activity,” 1986, Cell, 47:207-216.
Bentwich, I. et al. Identification of hundreds of conserved and nonconserved human microRNAs. Nat Genet 37, 766-70 (2005).
Blind (1999), “Cytoplasmic RNA modulators of an inside-out signal-transduction cascade,” Proc. Nat'l. Acad. Sci. USA 96:3606-3610.
Brummelkamp et al. (2002), “A System for Stable Expression of Short Interfering RNAs in mammalian Cells,” Science 296:550-553.
Calabrese, J. M., Seila, A. C., Yeo, G. W. & Sharp, P. A. RNA sequence analysis defines Dicer's role in mouse embryonic stem cells. Proc Natl Acad Sci U S A 104, 18097-102 (2007).
Caldas et al., “Sizing up miRNAs as cancer genes,” Nature Medicine, vol. 11, pp. 712-714 (Jul. 2005).
Calin et al., “Genomics of Chronic Lymphocytic Leukemia MicroRNAs as New Players with Clinical Significance,” Semin Oncol 33: 167-173 (2006).
Calin et al., “Human microRNA genes are frequently located at fragile sites and genomic regions involved in cancers,” PNAS, vol. 101, pp. 2999-3004 (Mar. 2004).
Calin et al., “MicroRNA profiling reveals distinct signatures in B cell chronic lymphocytic leukemias,” PNAS, vol. 101, pp. 11755-11760 (Aug. 2004).
Calin et al., “MicroRNA signatures in human cancers,” Nature reviews, vol. 6, pp. 857-866 (Nov. 2006).
Calin et al., “MicroRNA-Cancer Connection: the Beginning of a New tale,” Cancer Res, vol. 66, pp. 7390-7394 (2006).
Calin, G. A. et al. A MicroRNA signature associated with prognosis and progression in chronic lymphocytic leukemia. N Engl J Med 353, 1793-801 (2005).
Calin, G. A. et al. Frequent deletions and down-regulation of micro-RNA genes miR15 and miR16 at 13q14 in chronic lymphocytic leukemia. Proc Natl Acad Sci U S A 99, 15524-9 (2002).
Chen, C. Z., Li, L., Lodish, H. F. & Bartel, D. P. MicroRNAs modulate hematopoietic lineage differentiation. Science 303, 83-6 (2004).
Colas et al., (1996), “Genetic selection of peptide aptamers that recognize and inhibit cyclin-dependent kinase 2,” Nature 380:548-550.
Costinean et al., “Pre-B cell proliferation and lymphoblastic leukemia/high grade lymphoma in Eμ-miR155 transgenic mice,” PNAS, vol. 103, pp. 7024-7029 (May 2006).
Crooke, 1997, “Advances in understanding the pharmacological property of antisense oligonucleotides,” Ad. Pharmacol., 40, 1-49.
Crooke, 1998, “Antisense Therapeutics,” Biotech. Genet. Eng. Rev., 15, 121-157.
Crooke, 2000, “Progress in antisense technology. The end of the beginning,” Methods Enzymol., 313, 3-45.
Culture of Animal Cells (R. I. Freshney, Alan R. Liss, Inc., 1986).
Cummins, J. M., He, Y., Leary, R. J., Pagliarini, R., Diaz, L. A., Jr., Sjoblom, T., Barad, O., Bentwich, Z., Szafranska, A. E., Labourier, E., et al. (2006). The colorectal microRNAome. Proc Natl Acad Sci U S A 103, 3687-3692.
Dallas et al., “RNAi: A novel antisense technology and its therapeutic potential,” (2006) Med. Sci. Monit. 12(4):RA67-74.
Delihas et al., 1997, “Natural antisense RNA/target RNA interactions: Possible models for antisense oligonucleotide drug design,” Nature, 15, 751-753.
DeRisi et al., “Use of a cDNA microarray to analyse gene expression patterns in human cancer,” Nature genetics vol. 14, pp. 457-460 (Dec. 1996).
DNA Cloning, vols. I and II (D. N. Glover ed., 1985).
Dornburg (1995), “Reticuloendotheliosis virus and derived vectors,” Gene Therap. 2:301-310.
Dorsett, Y., et al. (2008). MicroRNA-155 suppresses activation-induced cytidine deaminase-mediated Myc-Igh translocation. Immunity 28, 630-638.
Dvorak et al., 2003, Biomed Pap Med Fac Univ Palacky Olomouc Czech Repub 147(2):131-5.
Eglitis (1988), “Retroviral vectors for introduction of genes into mammalian cells,” Biotechniques 6:608-614.
Eis et al., “Accumulation of miR-155 and BIC RNA in human B cell lymphomas,” PNAS, vol. 102, pp. 3627-3632 (Mar. 2005).
Eisen, M. B., Spellman, P. T., Brown, P. O. & Botstein, D. Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci U S A 95, 14863-8 (1998).
Elbashir et al., (2001), “Duplexes of 21-nucleotide RNAs mediate RNA interference in cultured mammalian cells,” Nature, 411, 494-498.
Ellington and Szostak (1990), “In vitro selection of RNA molecules that bind specific ligands,” Nature 346:818.
Feinberg et al. (1983), “A Technique for radiolabeling DNA Restriction Endonuclease fragments to high specific activity,” Anal. Biochem. 132:6-13.
Filipowicz, W., Bhattacharyya, S. N., and Sonenberg, N. (2008). Mechanisms of post-transcriptional regulation by microRNAs: are the answers in sight? Nat Rev Genet 9, 102-114.
Fisher et al. (1996), “Transduction with recombinant adeno-associated virus for gene therapy is limited by leading-strand synthesis,” J. Virol., 70:520-532.
Fodor et al., 1991; “Light-Directed, spatially addressable parallel chemical synthesis,” Maskos and Southern, 1992.
Gaidatzis et al., “Inference of miRNA targets using evolutionary conservation and pathway analysis,” BMC Bioinformatics, vol. 8:69 p. 1-22 (2007).
Gene Transfer Vectors for Mammalian Cells (J. H. Miller and M. P. Calos eds., 1987, Cold Spring Harbor Laboratory).
Griffiths-Jones, S. miRBase: the microRNA sequence database. Methods Mol Biol 342, 129-38 (2006).
Griffiths-Jones, S., Grocock, R. J., van Dongen, S., Bateman, A. & Enright, A. J. miRBase: microRNA sequences, targets and gene nomenclature. Nucleic Acids Res 34, D140-4 (2006).
Grimson, A. et al. MicroRNA targeting specificity in mammals: determinants beyond seed pairing. Mol. Cell 27, 91-105 (2007).
Hammond SM, 2006; “MicroRNAs as oncogenes,” Curr Opin Genet Dev. 16(1):4-9.
Handbook of Experimental Immunology, vols. I-IV (D. M. Weir and C. C. Blackwell, eds., 1986).
Harrell, F. E. Regression modeling strategies: with applications to linear models, logistic regression, and survival analysis (Springer, New York, 2001).
Haseloff et al., “Simple RNA enzymes with new and highly specific endoribonuclease activities,” Nature, vol. 334, pp. 585-591 (Aug. 1988).
He, L. et al. A microRNA polycistron as a potential human oncogene. Nature 435, 828-33 (2005).
Huizenga and Szostak (1995), “A DNA aptamer that binds adenosine and ATP,” Biochem. 34:656-665.
Hwang and Mendell, 2006, “MicroRNAs in cell proliferation, cell death, and tumorigenesis,” Br J Cancer 94(6):776-80.
James W., (2001), “Nucleic acid and polypeptide aptamers: a powerful approach to ligand discovery,” Current Opinion in Pharmacology, 1:540-546.
Jay et al., “miRNA Profiling for diagnosis and prognosis of human cancer,” DNA and Cell Biology, vol. 26, pp. 293-300 (2007).
John, B., Enright, A. J., Aravin, A., Tuschl, T., Sander, C., and Marks, D. S. (2004). Human MicroRNA targets. PLoS Biol 2, e363.
Kalota et al., (2006) “Progress in the development of Nucleic Acid Therapeutics,” Handb. Exp. Pharmacol. 173:173-96.
Kawahara, Y. et al. Redirection of silencing targets by adenosine-to-inosine editing of miRNAs. Science 315, 1137-40 (2007).
Kim, V. N. (2005). MicroRNA biogenesis: coordinated cropping and dicing. Nat Rev Mol Cell Biol 6, 376-385.
Klein, U. & Dalla-Favera, R. Germinal centres: role in B-cell physiology and malignancy. Nat Rev Immunol 8, 22-33 (2008).
Klein, U. et al. Transcriptional analysis of the B cell germinal center reaction. Proc Natl Acad Sci U S A 100, 2639-44 (2003).
Kricka, “Nonisotopic DNA Probe Techniques”, Academic Press San Diego, Calif. (1992).
Krützfeldt et al., 2007, “Specificity, duplex degradation and subcellular localization of antagomirs,” Nuc Acid Res 35(9): 2885-2892.
Kuppers, R., and Dalla-Favera, R. (2001). Mechanisms of chromosomal translocations in B cell lymphomas. Oncogene 20, 5580-5594.
Landgraf, P. et al. A mammalian microRNA expression atlas based on small RNA library sequencing. Cell 129, 1401-14 (2007).
Lau, N. C., Lim, L. P., Weinstein, E. G. & Bartel, D. P. An abundant class of tiny RNAs with probable regulatory roles in Caenorhabditis elegans. Science 294, 858-62 (2001).
Lawrie et al., “MicroRNA expression distinguishes between germinal center B cell-like and activated B cell-like subtypes of diffuse large B cell lymphoma,” Int. J Cancer, vol. 121, pp. 1156-1161 (2007).
Lee et al. (2002), “Expression of small interfering RNAs targeted against HIV-1 rev transcripts in human cells,” Nat. Biotechnol. 20:500-505.
Lee et al., “Systematic evaluation of microRNA processing patterns in tissues, cell lines, and tumors,” RNA, vol. 14, pp. 35-42 (2008).
Lee, E. J. et al. Systematic evaluation of microRNA processing patterns in tissues, cell lines, and tumors. Rna 14, 35-42 (2007).
Leung and Whittaker (2005), “RNA Interference: from gene silencing to gene-specific therapeutics,” Pharmacol Ther. 107(2):222-39.
Li, Q. J. et al. miR-181a is an intrinsic modulator of T cell sensitivity and selection. Cell 129, 147-61 (2007).
Lindow et al., “Principles and limitations of computational MicroRNA Gene and target finding,” DNA and Cell Biology, vol. 26, pp. 339-351 (2007).
Lockhart et al., “Expression monitoring by hybridization to high-density oligonucleotide arrays,” Nature Biotechnology, vol. 14, pp. 1675-1680 (Dec. 1996).
Lorsch and Szostak (1994), “In vitro selection of RNA Aptamers specific for Cyanocobalamin,” Biochem. 33:973.
Lu et al., (2005), “In vivo application of RNA interference: from functional Genomics to therapeutics,” Adv Genet. 54:117-42.
Lu, J. et al. MicroRNA expression profiles classify human cancers. Nature 435, 834-8 (2005).
Luciano, D. J., Mirsky, H., Vendetti, N. J. & Maas, S. RNA editing of a miRNA precursor. Rna 10, 1174-7 (2004).
Lutzelburger et al., (2006) “Strategies to identify Potential Therapeutic target sites in RNA,” Handb. Exp. Pharmacol. 173:243-59.
Mannironi et al., “In vitro selection of dopamine RNA ligands,” (1997) Biochem. 36:9726.
Maskos et al., “Oligonucleotide hybridisations on glass supports: a novel linker for oligonucleotide synthesis and hybridisation properties of oligonucleotides synthesised in situ,” Nucleic Acids Research, vol. 20, pp. 1679-1684 (1992).
Maskos et al., “Parallel analysis of oligodeoxyribonucleotide (oligonucleotide) interactions. I. Analysis of factors influencing oligonucleotide duplex formation,” Nucleic Acids Research, vol. 20, pp. 1675-1678 (1992).
Mattes et al., (2007) “Regulation of MicroRNA by Antagomirs. A New Class of Pharmacological Antagonists for the Specific Regulation of Gene Function?,” Am J Resp Cell Mol Biol 36: 8-12.
McCaffrey et al., 2002, “RNA interference in adult mice,” Nature, 418:38-9.
McManus et al., 2002, “Gene silencing using micro-RNA designed hairpins,” RNA, 8:842-50.
Mendell, JT, 2005, “MicroRNAs. Critical regulators of development, cellular physiology and malignancy,” Cell Cycle 4(9):1179-84.
Methods in Enzymology, vols. 154 and 155 (Wu et al. eds.).
Michael, M. Z., O'Connor, S. M., van Holst Pellekaan, N. G., Young, G. P. & James, R. J. Reduced accumulation of specific microRNAs in colorectal neoplasia. Mol. Cancer Res 1, 882-91 (2003).
Miller (1990), “Retrovirus Packaging Cells,” Hum. Gene Therap. 1:5-14.
Miranda, K. C. et al. A pattern-based method for the identification of MicroRNA binding sites and their corresponding heteroduplexes. Cell 126, 1203-17 (2006).
Miyagishi et al. (2002), “U6 promoter-driven siRNAs with four uridine 3′ overhangs efficiently suppress targeted gene expression in mammalian cells,” Nat. Biotechnol. 20:497-500.
Molecular Cloning a Laboratory Manual, 2nd Ed., ed. by Sambrook, Fritsch and Maniatis (Cold Spring Harbor Laboratory Press: 1989).
Mourelatos, Z., Dostie, J., Paushkin, S., Sharma, A., Charroux, B., Abel, L., Rappsilber, J., Mann, M., and Dreyfuss, G. (2002). miRNPs: a novel class of ribonucleoproteins containing numerous microRNAs. Genes Dev 16, 720-728.
Neilson, J. R., Zheng, G. X., Burge, C. B. & Sharp, P. A. Dynamic regulation of miRNA expression in ordered stages of cellular development. Genes Dev 21, 578-89 (2007).
Nucleic Acid Hybridization; B. D. Hames & S. J. Higgins eds. 1984.
O'Driscoll, “The emerging world of MicroRNAs,” Anticancer Research, vol. 26, pp. 4271-4278 (2006).
Oligonucleotide Synthesis (M. J. Gait ed., 1984).
Osada and Takahashi, 2007, “MicroRNAs in biological processes and carcinogenesis,” Carcinogenesis 28(1):2-12.
Paddison et al. (2002), “Short hairpin RNAs (shRNAs) induce sequence-specific silencing in mammalian cells,” Genes Dev. 16:948-958.
Paul et al. (2002), “Effective expression of small interfering RNA in human cells,” Nat. Biotechnol. 20:505-508.
Pekarsky et al., “Tcl1 Expression in Chronic Lymphocytic leukemia is regulated by miR-29 and miR-181,” Cancer Res., vol. 66, pp. 11590-11593 (2006).
Piccaluga et al., “Gene Expression analysis of peripheral T cell Lymphoma, unspecified reveals distinct profiles and new potential therapeutic targets,” The Journal of Clinical Investigation, vol. 117, pp. 823-834 (2007).
Pietu et al., “Novel Gene transcripts preferentially expressed in Human Muscles revealed by quantitative hybridization of a High density cDNA Array,” Genome Research vol. 6, pp. 492-503 (1996).
Rabinowitz J. E. et al. (2002), “Cross-Packaging of a single adeno-associated virus (AAV) type 2 vector genome into AAV Serotypes enables transduction with broad specificity,” J Virol 76:791-801.
Ramkissoon et al., “Hematopoietic-specific microRNA expression in human cells,” Leukemia Research, vol. 30, pp. 643-647 (2006).
Rigby et al. (1977), “Labeling deoxyribonucleic acid to high specific activity in vitro by nick translation with DNA polymerase I,” J. Mol. Biol. 113:237-251.
Rodriguez, A. et al. Requirement of bic/microRNA-155 for normal immune function. Science 316, 608-11 (2007).
Romkes et al., 2005, “Strategies for measurement of biotransformation enzyme gene expression,” Methods Mol Biol. 291:387-98.
Rudel, S., Flatley, A., Weinmann, L., Kremmer, E., and Meister, G. (2008). A multifunctional human Argonaute2-specific monoclonal antibody. Rna 14, 1244-1253.
Samulski et al. (1987), “A Recombinant plasmid from which an infectious adeno-associated virus genome can be excised in vitro and its use to study viral replication,” J. Virol. 61:3096-3101.
Samulski et al. (1989), “Helper-Free stocks of recombinant adeno-associated viruses: normal integration does not require viral gene expression,” J. Virol. 63:3822-3826.
Sasaki et al., 2003, “Identification of eight members of argonaute family in the human genome,” Genomics 82, 323-330.
Schena et al., “Parallel human genome analysis: Microarray-based expression monitoring of 1000 genes,” Proc. Natl. Acad. Sci. USA, vol. 93, pp. 10614-10619 (Oct. 1996).
Schena et al., “Quantitative monitoring of Gene Expression Patterns with a complementary DNA Microarray,” Science vol. 270, pp. 467-470 (1995).
Schmajuk et al., 1999, “Antisense oligonucleotides with different backbones,” J. Biol. Chem., 274, 21783-21789.
Schwartz, S. et al. Human-mouse alignments with BLASTZ. Genome Res 13, 103-7 (2003).
Shalon et al., “A DNA microarray system for analyzing complex DNA samples using two-color fluorescent probe hybridization,” Genome Research, vol. 6, pp. 639-645 (1996).
Sharbati-Tehrani, S., Kutz-Lohroff, B., Bergbauer, R., Scholven, J., and Einspanier, R. (2008). miR-Q: a novel quantitative RT-PCR approach for the expression profiling of small RNA molecules such as miRNAs in a complex sample. BMC Mol Biol 9, 34.
Shivdasani RA, 2006, “MicroRNAs: regulators of gene expression and cell differentiation,” Blood 108(12):3646-53.
Stein and Cheng (1993), “Antisense Oligonucleotides as therapeutic agents—Is the bullet really magical?,” Science 261:1004.
Stein et al., 1997, “A specificity comparison of four antisense types: morpholino, 2′-O-Methyl RNA, DNA and Phosphorothioate DNA,” Antisense N. A. Drug Dev., 7, 151.
Takeshita and Ochiya (2006), “Therapeutic potential of RNA interference against cancer,” Cancer Sci. 97(8):689-96.
Teng, G., et al. (2008). MicroRNA-155 is a negative regulator of activation-induced cytidine deaminase. Immunity 28, 621-629.
Thai, T. H., et al. (2007). Regulation of the germinal center response by microRNA-155. Science 316, 604-608.
Thomson, J. M. et al. Extensive post-transcriptional regulation of microRNAs and its implications for cancer. Genes Dev 20, 2202-7 (2006).
Troyanskaya et al., A Bayesian framework for combining heterogeneous data sources for gene function prediction (in Saccharomyces cerevisiae), PNAS, vol. 100, pp. 8348-8353 (Jul. 2003).
Tuerk and Gold (1990), “Systematic evolution of ligands by exponential enrichment: RNA ligands to bacteriophage T4 DNA polymerase,” Science 249:505.
Tuschl (2002), “Expanding small RNA interference,” Nat. Biotechnol, 20:446-448.
Venturini et al., “Expression of the miR-17-92 polycistron in chronic myeloid leukemia (CML) CD34+ cells,” Blood, vol. 109, pp. 4399-4405 (2007).
Woodward, “Immobilized Cells and Enzymes” (IRL Press, 1986).
Xia et al. (2002), “siRNA-mediated gene silencing in vitro and in vivo,” Nat. Biotech. 20:1006-1010.
Xiao, C. et al. MiR-150 Controls B Cell Differentiation by Targeting the Transcription Factor c-Myb. Cell 131, 146-59 (2007).
Xu and Li (2007), “MicroRNA gene expression in malignant lymphoproliferative disorders,” Chin Med J (Engl). 120(11):996-9.
Yu et al., “RNA interference by expression of short-interfering RNAs and Hairpin RNAs in mammalian cells,” Proc Natl Acad Sci USA, 99:6047-52.
Yu et al., 2002, “Unique MicroRNA signature and clinical outcome of cancers,” DNA and Cell Biology, vol. 26, pp. 283-292.
Zaug and Cech, 1986, “The intervening Sequence RNA of Tetrahymena is an enzyme,” Science, 231:470-475.
Zaug, et al., 1984, “A labile phosphodiester bond at the ligation junction in a circular intervening sequence RNA,” Science, 224:574-578.
Zeng et al. (2002), “Both natural and designed micro RNAs can inhibit the expression of cognate mRNAs when expressed in human cells,” Molecular Cell 9:1327-1333.
Zhang et al., 2007 “MicroRNAs as oncogenes and tumor suppressors,” Dev Biol. 302(1):1-12.
Zhao et al., “High-density cDNA filter analysis: a novel approach for large-scale, quantitative analysis of gene expression,” Gene, vol. 156, pp. 207-213 (1995).
Zhou et al., “miR-150, a microRNA expressed in mature B and T cells, blocks early B cell development when expressed prematurely,” PNAS vol. 104, pp. 7080-7085 (Apr. 2007).
Related Publications (1)
Number Date Country
20100197772 A1 Aug 2010 US
Provisional Applications (2)
Number Date Country
60950474 Jul 2007 US
61020625 Jan 2008 US
Continuation in Parts (1)
Number Date Country
Parent PCT/US2008/070082 Jul 2008 US
Child 12688680 US