RNA CONTAINING COMPOSITIONS AND METHODS OF THEIR USE

FIELD OF THE INVENTION

The present invention relates to RNA containing compositions and methods of their use.

BACKGROUND OF THE INVENTION

The recent development of total RNA sequencing has allowed a better appreciation of the complexity and breadth of the entire transcriptome (Djebali et al., “Landscape of Transcription in Human Cells,” Nature 489:101-108 (2012); ENCODE Project Consortium, “An Integrated Encyclopedia of DNA Elements in the Human Genome,” Nature 489:57-74 (2012); Harrow et al., “GENCODE: The Reference Human Genome Annotation for the ENCODE Project,” Genome Res. 22:1760-1774 (2012); and Martin et al., “Next-Generation Transcriptome Assembly,” Nature Rev. Genet. 12:671-682 (2011)). Analysis by the Encyclopedia of DNA Elements (“ENCODE”) consortium unexpectedly showed that far more of the mammalian genome than previously appreciated is transcribed into non-coding RNA (“ncRNA”). Several short ncRNAs have conserved metabolic and regulatory functions, and some anti-viral properties have been assigned to novel classes of ncRNA such as eukaryotic small-interfering RNA, piwi-interacting RNA, and prokaryotic CRISPR RNA (Rinn et al., “Genome Regulation by Long Noncoding RNAs,” Ann. Rev. Biochem. 81:145-66 (2012)). In eukaryotes, long non-coding RNA (“lncRNA”), such as long intergenic non-coding RNA, has been associated with transcriptional, post-transcriptional, and epigenetic regulation (Atianand et al., “Molecular Basis of DNA Recognition in the Immune System,” J. Immunol. 190:1911-1918 (2013) and Zhang et al., “The Ways of Action of Long Non-Coding RNAs in Cytoplasm and Nucleus,” Gene 547:1-9 (2014)).

It is now evident that germ line and cancer cells can have atypical ncRNA transcription, including repetitive elements from regions usually silenced in steady state (Leonova et al., “P53 Cooperates with DNA Methylation and a Suicidal Interferon Response to Maintain Epigenetic Silencing of Repeats and Noncoding RNAs,” Proc. Natl. Acad. Sci. 110:E89-E98 (2013) and Ting et al., “Aberrant Overexpression of Satellite Repeats in Pancreatic and Other Epithelial Cancers,” Science 331:593-596 (2011)). In eukaryotes, transcription of endogenous retroviruses and mobile elements is mostly repressed epigenetically through processes such as histone modification and DNA methylation, preventing disruptive or deregulatory effects due to integration into coding regions. In mammals, DNA methylation targets the cytidine in CpG motifs to form 5-methylcytosine, contributing to down-regulation of transcription of methylated sequences (Jones et al., “The Role of DNA Methylation in Mammalian Epigenetics,” Science 293:1068-1070 (2001)). Epigenetic regulation is strongly associated with developmental processes, whereas its deregulation, such as by disruption of DNA methylation, can be associated with de-differentiation and carcinogenic processes (Feinberg et al., “The History of Cancer Epigenetics,” Nature Rev. Cancer 4:143-153 (2004) and Yi et al., “Multiple Roles of p53-Related Pathways in Somatic Cell Reprogramming and Stem Cell Differentiation,” Cancer Res. 72:5635-5645 (2012)).

When expressed, endogenous retroviral RNA can activate the innate immune response via several pathways (Zeng et al., “MAVS, cGAS, and Endogenous Retroviruses in T-independent B Cell Responses,” Science 346:1486-1492 (2014)). In cancers, such as those driven by p53 mutations and epigenetic alterations, ncRNA associated with repetitive elements can be induced (Leonova et al., “P53 Cooperates with DNA Methylation and a Suicidal Interferon Response to Maintain Epigenetic Silencing of Repeats and Noncoding RNAs,” Proc. Natl. Acad. Sci. 110:E89-E98 (2013) and Ting et al., “Aberrant Overexpression of Satellite Repeats in Pancreatic and Other Epithelial Cancers,” Science 331:593-596 (2011)). In a study of mouse and human epithelial malignancies (Ting et al., “Aberrant Overexpression of Satellite Repeats in Pancreatic and Other Epithelial Cancers,” Science 331:593-596 (2011)), several repetitive elements emanating from genomic dark matter and often repressed in steady-state conditions, particularly pericentromeric repeats such as GSAT (major satellite) in mouse and HSATII in humans, were transcribed only in cancer cells. A strong induction of repetitive elements from the mouse genome (particularly GSAT, B1, and B2), along with several other ncRNAs, has been demonstrated in cells bearing oncogenic p53 mutations and exposed to epigenome-altering demethylating agents (Leonova et al., “P53 Cooperates with DNA Methylation and a Suicidal Interferon Response to Maintain Epigenetic Silencing of Repeats and Noncoding RNAs,” Proc. Natl. Acad. Sci. 110:E89-E98 (2013)). Anomalous expression of the murine repetitive element GSAT was shown to trigger a repeat-dependent interferon response termed TRAIN (transcription of repeats activates interferon), which can regulate apoptosis-related cell death. In the proposed mechanism, double-stranded RNA forms immediately via bidirectional transcription. That is, as GSAT is transcribed in the positive sense by one polymerase (pol II), its complementary DNA strand is simultaneously transcribed by pol III. In this model, single-stranded GSAT RNA is never produced; the double-stranded RNA forms during transcription. There has been no indication in Leonova et al., “P53 Cooperates with DNA Methylation and a Suicidal Interferon Response to Maintain Epigenetic Silencing of Repeats and Noncoding RNAs,” Proc. Natl. Acad. Sci. 110:E89-E98 (2013) or elsewhere that single-stranded GSAT RNA would be immunostimulatory.

The present invention is directed to overcoming these and other deficiencies in the art.

SUMMARY OF THE INVENTION

One aspect of the present invention relates to a composition comprising an isolated, single-stranded RNA molecule having a nucleotide sequence comprising 20 or more bases and a pattern of CpG dinucleotides defined by a strength of statistical bias greater than or equal to zero, and a pharmaceutically acceptable carrier suitable for injection.

Another aspect of the present invention relates to a kit comprising a cancer vaccine and the composition of the present invention as an adjuvant to the cancer vaccine.

A further aspect of the present invention relates to a method of treating a subject for a tumor. This method involves administering to a subject the composition of the present invention (i.e., a composition comprising an isolated, single-stranded RNA molecule having a nucleotide sequence comprising 20 or more bases and a pattern of CpG dinucleotides defined by a strength of statistical bias greater than or equal to zero, and a pharmaceutically acceptable carrier suitable for injection) under conditions effective to treat the subject for the tumor.

Another aspect of the present invention relates to a method of stimulating an immune response. This method involves providing the composition of the present invention (i.e., a composition comprising an isolated, single-stranded RNA molecule having a nucleotide sequence comprising 20 or more bases and a pattern of CpG dinucleotides defined by a strength of statistical bias greater than or equal to zero, and a pharmaceutically acceptable carrier suitable for injection) and contacting a cell or tissue with the composition under conditions effective to induce or increase an immune response against cancer in the cell or tissue.

A set of novel mathematical tools, originally developed to analyze potentially immunostimulatory motif usage in viral and host genome coding sequences, was used here. These methods were recently recast in the language of statistical physics and are extended here to analyze ncRNA motif usage (Greenbaum et al., “Patterns of Evolution and Host Gene Mimicry in Influenza and Other RNA Viruses,” PLoS Path. 4:e1000079 (2008) and Greenbaum et al., “Quantitative Theory of Entropic Forces Acting on Constrained Nucleotide Sequences Applied to Viruses,” Proc. Natl. Acad. Sci. 111:5054-5059 (2014)). For the first time, large-scale patterns of motif usage in human and murine transcriptomes were analyzed and used to find anomalous ncRNA expressed in cancer transcriptomes (Rinn et al., “Genome Regulation by Long Noncoding RNAs,” Ann. Rev. Biochem. 81:145-66 (2012) and Ulitsky et al., “lincRNAs: Genomics, Evolution, and Mechanisms,” Cell 154:26-46 (2013)). As a result, features of ncRNA over-expressed in cancerous cells relative to normal cells were characterized (Leonova et al., “P53 Cooperates with DNA Methylation and a Suicidal Interferon Response to Maintain Epigenetic Silencing of Repeats and Noncoding RNAs,” Proc. Natl. Acad. Sci. 110:E89-E98 (2013); Ting et al., “Aberrant Overexpression of Satellite Repeats in Pancreatic and Other Epithelial Cancers,” Science 331:593-596 (2011); Levine et al., “The maintenance of epigenetic states by p53: the guardian of the epigenome,” Oncotarget 3:1503-1504 (2012)). This analysis includes several large datasets of functionally characterized ncRNA, in addition to pseudogenes and repetitive elements such as satellite DNA, endogenous retroviruses, and long and short interspersed elements. It is demonstrated that many ncRNAs preferentially expressed in cancerous cells display anomalous motif usage patterns compared to the vast majority of ncRNAs, whose patterns of motif usage are shown to be consistent with those in coding regions. Based on their unusual pattern of motif usage and differential expression in cancerous versus normal cells, it is predicted that the ncRNAs HSATII (human) and GSAT (murine) incorporate immunostimulatory motifs in humans and mice, respectively. Remarkably, this prediction is validated: both directly stimulate antigen-presenting cells, and they are accordingly labeled immunostimulatory ncRNAs (“i-ncRNAs”).

Other features and advantages of the invention will be apparent from the following detailed description and claims.

BRIEF DESCRIPTION OF DRAWINGS

FIGS. 1A-B demonstrate that ncRNAs expressed in cancer differ from general lncRNA motif usage patterns. FIG. 1A shows the fraction of GENCODE human lncRNA sequences in which a motif occurs the expected number of times, defined as corresponding to a probability p greater than 0.05 (EQUATION 5). FIG. 1B is a graph showing the fraction of GENCODE lncRNA sequences in humans and mice in which CpG motifs occur the expected number of times, compared to those expressed in human cancerous cells and mouse cancer cell lines.

FIGS. 2A-B are graphs demonstrating that CpG and UpA are generally under-represented in ncRNA. FIG. 2A shows the histogram of forces (i.e., strength of statistical bias) on CpG, and FIG. 2B shows the histogram of forces (i.e., strength of statistical bias) on UpA, both for lncRNA from the GENCODE human transcript database. These forces (i.e., strengths of statistical bias) are consistent with those observed in mice and those from coding regions.

FIGS. 3A-B demonstrate that forces (i.e., strengths of statistical bias) on CpG and UpA dinucleotides are independent. FIG. 3A is a graph showing the least principal components for all significant forces (i.e., strengths of statistical bias) on motifs for human GENCODE ncRNA, and FIG. 3B shows the least principal components for all significant forces (i.e., strengths of statistical bias) on motifs for mouse GENCODE ncRNA. In both cases, CpG and UpA project predominantly onto the two axes of least variation.

FIGS. 4A-B demonstrate that GSAT is expressed in mouse testicular teratoma and liposarcoma, showing the relative expression levels of GSAT RNA, measured by a custom TaqMan assay, in normal murine tissue versus murine tumor tissue samples. FIG. 4A is a graph showing results from the testicular teratoma mouse models. FIG. 4B is a graph showing results from liposarcoma tumors induced in a p53 KO background. In all instances, GSAT levels were increased, to varying degrees, in the tumor samples compared to normal samples.

FIGS. 5A-D demonstrate that ncRNA from cancer cells contain outliers from normal motif usage. The distribution of the strength (force) of statistical bias is shown for UpA and CpG (FIGS. 5A-B) and CAG and CUG (FIGS. 5C-D) in lncRNA taken from human tumors (FIG. 5A and FIG. 5C) and murine cell lines (FIG. 5B and FIG. 5D), (dark data points), plotted against lncRNA from GENCODE (light grey data points). Each ellipse indicates one standard deviation from the mean value in the GENCODE dataset.

FIGS. 6A-C demonstrate that ncRNAs require transfection to induce cellular innate immune responses. Human DCs in 96-well plates were stimulated with 2 µg/ml of the various ncRNAs (HSATII, HSATII-sc, GSAT, GSAT-sc) with (DOTAP) or without (NT) DOTAP as a gentle liposomal transfection reagent. In the absence of transfection reagent, the ncRNAs were not sensed by the DCs, whereas the transfected immunogenic ncRNAs HSATII and GSAT, as well as Poly-IC and R848, were sensed and induced a cellular inflammatory response, as measured by TNFalpha (FIG. 6A), IL-12 (FIG. 6B), and IL-6 (FIG. 6C).

FIG. 7 is a schematic illustration showing the innate immune pathways involved in the sensing of nucleic acids which were investigated in the work described herein. MYD88 and UNC93b were directly implicated in i-ncRNA sensing.

FIGS. 8A-B demonstrate that i-ncRNA stimulates human moDC cytokine production. Quantification of inflammatory cytokine production upon liposomal transfection of human i-ncRNA (HSATII) and murine i-ncRNA (GSAT), versus their scrambled and endogenous controls, is shown for human moDCs in FIG. 8A and for murine imBM in FIG. 8B. Each point represents the mean value of the experimental replicates for each individual condition; the bar represents the median. The significance of i-ncRNA stimulation was analyzed by the non-parametric Mann-Whitney test, comparing the effect of each i-ncRNA with that of its scrambled and endogenous controls.

FIGS. 9A-C demonstrate that human moDCs and mouse imBM cells respond to common PAMPs and DAMPs. Quantification of inflammatory cytokine production in human moDCs is shown in the graphs of FIG. 9A, and in murine imBM in the graph of FIG. 9B, upon stimulation with common PAMPs or DAMPs known to activate PRR innate immune pathways, which are listed in the Examples infra. Each point represents the mean value of the experimental replicates for each individual condition; the bar represents the median. FIG. 9C is a heat map showing the inflammatory response related to type I IFN pathway induction in imBM upon stimulation of the PRR-related innate immune pathways, analyzed by qRT-PCR. The heat map represents the log of the relative expression of each gene, based on relative quantification by the ΔΔCt method with two-way normalization (to housekeeping genes and to non-stimulated cells).

FIGS. 10A-C demonstrate that MYD88 and UNC93b control GSAT i-ncRNA stimulation. FIGS. 10A-C are graphs showing the results of genetic screening of the innate immune pathway related to i-ncRNA function in murine imBM. imBM cells of different genotypes (WT (FIG. 10A), MYD88 KO (FIG. 10B), and UNC93b3d/3d MUT (FIG. 10C)) were stimulated by liposomal transfection of the murine i-ncRNA (GSAT). TNFa production in the supernatant was quantified, and each point represents the mean value of the experimental replicates for each individual condition; the bar represents the median.

FIGS. 11A-B show the genetic screen of innate immune pathways related to i-ncRNA function in murine imBM. FIG. 11A is a series of graphs showing imBM cells of different knockout genotypes related to TLR PRRs (TLR2-4 dbKO, TLR3 KO, TLR4 KO, TLR7 KO, TLR9 KO). FIG. 11B is a series of graphs showing imBM cells of different knockout genotypes related to STING, inflammasome, and MAV-dependent helicase pathways (STING KO, MAV KO, ICE KO), and to common innate immune signaling (TRIF KO, TRAM KO, IRF3/IRF7 dbKO). Cells were stimulated by liposomal transfection of the murine i-ncRNA (GSAT). TNFa production in the supernatant was quantified, and each point represents the mean value of the experimental replicates for each individual condition; the bar represents the median.

FIGS. 12A-B show the stimulation of KO and mutant imBM with common PAMPs and DAMPs. Quantification of inflammatory cytokine production in PRR KO imBM (FIG. 12A) and in KO and mutant imBM related to innate immune signaling (FIG. 12B), upon stimulation with common PAMPs or DAMPs known to activate PRR innate immune pathways, is shown. Each point represents the mean value of the experimental replicates for each individual condition; the bar represents the median.

FIG. 13 demonstrates that motif usage in HSATII and GSAT clusters with that of foreign RNA. A comparison of the forces (i.e., strengths of statistical bias) on CpG dinucleotides is plotted against the distribution of forces (i.e., strengths of statistical bias) on all GENCODE lncRNA, relative to each sequence's nucleotide bias. The forces on CpG dinucleotides for HSATII and GSAT are shown on the distribution, along with the average values for the longest gene (PB2) in human influenza B and avian H5N1 and for all E. coli coding regions.

FIGS. 14A-S show mouse repeat RNA sequences from the Repbase database with anomalous CpG motif usage.

FIGS. 15A-F show mouse ncRNA sequences from the ENCODE database with anomalous CpG motif usage.

FIGS. 16A-Y show human repeat RNA sequences from the Repbase database with anomalous CpG motif usage.

FIGS. 17A-L show human ncRNA repeat sequences from the ENCODE database with anomalous CpG motif usage.

DETAILED DESCRIPTION OF THE INVENTION

The invention described herein relates to RNA-containing compositions and methods of their use.

In a first aspect, the present invention relates to a composition comprising an isolated, single-stranded RNA molecule having a nucleotide sequence comprising 20 or more bases and a pattern of CpG dinucleotides defined by a strength of statistical bias greater than or equal to zero, and a pharmaceutically acceptable carrier suitable for injection.

The composition of the present invention may be a pharmaceutical composition in the form of a vaccine, or a pharmaceutical composition intended to be co-administered with a vaccine, e.g., as an adjuvant.

In one embodiment, the RNA molecule in the composition of the present invention is an isolated RNA molecule. The term “isolated RNA molecule” includes RNA molecules which are separated from other nucleic acid molecules which are present in the natural source of the RNA. An “isolated” nucleic acid molecule is free of sequences which naturally flank the nucleic acid (i.e., sequences located at the 5′ and 3′ ends of the nucleic acid molecule). For example, in various embodiments, the isolated RNA molecule contains a defined number of bases. Moreover, an “isolated” nucleic acid molecule is substantially free of other cellular material, or culture medium, when produced by recombinant techniques, or substantially free of chemical precursors or other chemicals when chemically synthesized.

In one embodiment, the RNA molecule is a single-stranded RNA molecule.

In another embodiment, the composition comprises an isolated RNA molecule having a nucleotide sequence comprising 20 or more bases and a pattern of CpG dinucleotides defined by a strength of statistical bias greater than or equal to zero, with the proviso that the RNA molecule is not GSAT.

Suitable RNA molecules in the composition of the present invention include, without limitation, an RNA molecule having the nucleotide sequence of SEQ ID NOs:1-319, or a fragment thereof. Such RNA molecules can be isolated using standard molecular biology techniques and the sequence information provided herein. In one embodiment, using all or a portion of the nucleic acid sequence of SEQ ID NOs:1-319 as a hybridization probe, RNA molecules can be isolated using standard hybridization and cloning techniques (e.g., as described in Sambrook, J. et al., Molecular Cloning: A Laboratory Manual, 2nd ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1989, which is hereby incorporated by reference in its entirety).

Moreover, an RNA molecule in the composition of the present invention can be isolated by the polymerase chain reaction (PCR) using synthetic oligonucleotide primers. In one embodiment, the primers are designed based upon the sequence (or a portion thereof) of any one or more of SEQ ID NOs:1-319.

The RNA molecule in the composition is an RNA molecule of about 20 or more bases in length. The length of the RNA molecule (i.e., the total number of bases) may vary depending on the pattern of CpG dinucleotides and the strength of statistical bias. In one embodiment, the RNA molecule has about 20-1200 bases, about 20-1100 bases, about 20-1000 bases, about 20-900 bases, about 20-800 bases, about 20-700 bases, about 20-600 bases, about 20-500 bases, about 20-450 bases, about 20-400 bases, about 20-350 bases, about 20-300 bases, about 20-250 bases, about 20-200 bases, about 20-190 bases, about 20-185 bases, about 20-180 bases, about 20-175 bases, about 20-170 bases, about 20-165 bases, about 20-160 bases, about 20-155 bases, about 20-150 bases, about 20-145 bases, about 20-140 bases, about 20-135 bases, about 20-130 bases, about 20-125 bases, about 20-120 bases, about 20-115 bases, about 20-110 bases, about 20-105 bases, about 20-100 bases, about 20-95 bases, about 20-90 bases, about 20-85 bases, about 20-80 bases, about 20-75 bases, about 20-70 bases, about 20-65 bases, about 20-60 bases, about 20-55 bases, about 20-50 bases, about 20-45 bases, about 20-40 bases, about 20-35 bases, or about 20-30 bases.

The RNA molecule of the composition has a pattern of CpG dinucleotides defined by a strength of statistical bias greater than or equal to zero. A physical system can be defined by the various states in which it can exist and by the parameters of any known constraints on it. When no assumption is made about the particular state the system is in, the system can be described by the probability distribution over its states, i.e., the probability of each state being occupied.

An RNA molecule with a pattern of motifs (e.g., CpG dinucleotides) can be defined by its length, nucleotide frequencies (i.e., the proportion of each nucleotide present in the sequence), and the number of times the motif is observed in the sequence. An RNA molecule of length L can take 4^L different states, with each of those states being characterized by a number of motifs.

When considering the probability of a number of motifs (e.g., CpG dinucleotides) observed in a particular sequence, a random-nucleotide model can be used to define the probability distribution of observing a given number of motifs in all 4^L possible sequences of length L, with nucleotide frequencies according to the proportion observed in the given sequence. The random model gives rise to a distribution of states for such a sequence, each state having a number of motifs.
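To make the random-nucleotide model concrete, the sketch below enumerates every one of the 4^L sequences for a short length L, weights each by the product of its nucleotide frequencies, and tallies the resulting distribution of CpG counts. It is a minimal Python illustration, not the procedure used in the Examples; the 8-nt sequence and the helper names (`count_motif`, `random_model_distribution`) are hypothetical, and brute-force enumeration is practical only for very small L.

```python
from collections import defaultdict
from itertools import product

NUCLEOTIDES = "ACGU"


def count_motif(seq: str, motif: str = "CG") -> int:
    """Count overlapping occurrences of `motif` in `seq`."""
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == motif)


def random_model_distribution(length: int, freqs: dict, motif: str = "CG") -> dict:
    """Exact motif-count distribution over all 4**length sequences.

    Each sequence is weighted by the product of its nucleotide frequencies
    (the random-nucleotide model). Brute force: only feasible for small lengths.
    """
    dist = defaultdict(float)
    for letters in product(NUCLEOTIDES, repeat=length):
        weight = 1.0
        for nt in letters:
            weight *= freqs[nt]
        dist[count_motif("".join(letters), motif)] += weight
    return dict(sorted(dist.items()))


# Hypothetical 8-nt example; frequencies are taken from the sequence itself.
observed_seq = "GCGAUCGU"
freqs = {nt: observed_seq.count(nt) / len(observed_seq) for nt in NUCLEOTIDES}
dist = random_model_distribution(len(observed_seq), freqs)
mean = sum(k * p for k, p in dist.items())
print(count_motif(observed_seq), round(mean, 5))  # 2 0.65625
```

The mean of this distribution equals (L - 1)·f_C·f_G, here 7 × 0.25 × 0.375 = 0.65625, which is the CpG count the random model expects for a sequence of this length and composition; the example sequence, with 2 CpG, lies above that expectation.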

To quantify deviation of the particular observed sequence (i.e., state) from the random expectation, an additional parameter, referred to here as selective force, or simply force (e.g., force on CpG or force on UpA), may be added to the model. This additional parameter introduces a statistical bias in the probability distribution towards observing a particular state (i.e., a particular number of observed motifs). In the absence of this statistical bias, the probability of a given state (i.e., a particular sequence) simplifies to the product of its nucleotide frequencies, whereas a positive force shifts the distribution towards a larger number of observed motifs than would be expected under the purely random model, and a negative force shifts it towards a smaller number. Given a particular sequence, the “strength of statistical bias” is defined herein as the value of the force that maximizes the probability of the observed sequence. That is, the strength of statistical bias is the value of the force that yields a probability distribution of the number of motifs, for a sequence of the given length L and nucleotide frequencies, whose mean equals the observed number of motifs in the sequence, as demonstrated in Example 5 (infra).
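The model described above can be written compactly as follows. This is one formalization consistent with the text, not the exact equations of the Examples; it assumes the force x enters as an exponential weight on the motif count N(s) of a sequence s = s_1 ... s_L, and that the nucleotide frequencies f are held at their observed values. Maximizing the probability of the observed sequence then reduces to matching the model's mean motif count to the observed count.

```latex
\[
P(s \mid x) = \frac{1}{Z(x)} \prod_{i=1}^{L} f_{s_i}\, e^{x N(s)},
\qquad
Z(x) = \sum_{s \in \{A,C,G,U\}^{L}} \; \prod_{i=1}^{L} f_{s_i}\, e^{x N(s)}
\]
\[
\frac{\partial}{\partial x} \ln P(s_{\mathrm{obs}} \mid x)
  = N(s_{\mathrm{obs}}) - \frac{\partial \ln Z}{\partial x}
  = N(s_{\mathrm{obs}}) - \langle N \rangle_x = 0
\quad\Longrightarrow\quad
\langle N \rangle_{x^{*}} = N(s_{\mathrm{obs}})
\]
\[
\frac{d \langle N \rangle_x}{dx} = \operatorname{Var}_x(N) \ge 0,
\qquad
Z(0) = 1,
\qquad
\langle N_{\mathrm{CpG}} \rangle_0 = (L - 1)\, f_C\, f_G
\]
```

At x = 0 the normalization is Z(0) = 1, so P(s | 0) reduces to the product of nucleotide frequencies, as stated above. Because the mean motif count never decreases with x, the maximizing force x* is unique whenever Var_x(N) > 0.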

The larger the deviation of the number of the motifs observed in a given sequence is from random, the larger the force required to generate a distribution in which the number of observed motifs in the sequence is equal to the mean of the distribution.
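A numerical sketch of how the strength of statistical bias on CpG might be computed is given below. It assumes a simplified model in which the observed nucleotide frequencies are held fixed and a single force x multiplies each CpG occurrence by e^x; it is not presented as the implementation used in the Examples. The partition function is evaluated with a 4 × 4 transfer matrix over adjacent nucleotides, the mean CpG count is obtained as the derivative of ln Z by a central finite difference, and the force is found by bisection, which is valid because the mean count never decreases as the force grows. All function names and the search interval [-20, 20] are illustrative assumptions.

```python
import math

NUCLEOTIDES = "ACGU"


def log_partition(x: float, freqs: dict, length: int) -> float:
    """ln Z(x) for P(s) proportional to prod_i f(s_i) * exp(x * N_CpG(s)).

    Sums over all 4**length sequences with a 4x4 transfer matrix on adjacent
    nucleotides; the forward vector is rescaled each step to avoid underflow.
    """
    boost = math.exp(x)  # extra weight for each C immediately followed by G
    v = [freqs[b] for b in NUCLEOTIDES]
    log_z = 0.0
    for _ in range(length - 1):
        new_v = []
        for b in NUCLEOTIDES:
            total = 0.0
            for a, va in zip(NUCLEOTIDES, v):
                weight = boost if (a == "C" and b == "G") else 1.0
                total += va * freqs[b] * weight
            new_v.append(total)
        scale = sum(new_v)
        log_z += math.log(scale)
        v = [val / scale for val in new_v]
    return log_z + math.log(sum(v))


def expected_cpg(x: float, freqs: dict, length: int, h: float = 1e-5) -> float:
    """Mean CpG count <N>_x = d ln Z / dx, by central finite difference."""
    return (log_partition(x + h, freqs, length)
            - log_partition(x - h, freqs, length)) / (2 * h)


def cpg_force(seq: str, lo: float = -20.0, hi: float = 20.0,
              tol: float = 1e-8) -> float:
    """Force at which the model's mean CpG count equals the observed count."""
    seq = seq.upper().replace("T", "U")
    if len(seq) < 2 or set(seq) - set(NUCLEOTIDES):
        raise ValueError("expected an A/C/G/U sequence of length >= 2")
    length = len(seq)
    freqs = {nt: seq.count(nt) / length for nt in NUCLEOTIDES}
    observed = sum(1 for i in range(length - 1) if seq[i:i + 2] == "CG")
    # <N>_x never decreases with x (d<N>/dx = Var_x(N) >= 0), so bisection works.
    # Zero or maximal CpG counts (or no C/G at all) have no finite solution;
    # the result then sits at the edge of [lo, hi].
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if expected_cpg(mid, freqs, length) < observed:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


x_star = cpg_force("GCGAUCGU")
print(x_star > 0)  # True: more CpG than the random model expects
```

The transfer matrix keeps the cost linear in sequence length, so the same sketch applies to repeat elements thousands of bases long; a finite difference is used only for brevity, and an exact forward-backward count would be a natural refinement.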

The strength of statistical bias can be used as a parameter for identifying anomalous (i.e., outlier) states in a system, including anomalous use of motifs (e.g., CpG dinucleotides and other dinucleotide or trinucleotide repeats) in nucleotide sequences. In order to identify outliers, one must identify a threshold for which any strength of statistical bias that meets or exceeds the threshold will be considered anomalous. In order to identify a threshold, one may generate the distribution of observed strengths of statistical bias against a collection of samples chosen to represent the system (i.e., a reference set or panel). For example, a reference set for nucleotide sequences may include a set of biologically similar sequences, such as non-coding RNAs drawn from a database, such as the ENCODE database, as described in the Examples (infra). After the distribution of observed strengths of statistical bias is generated, it may be fit to a Gaussian distribution, characterized by a mean and standard deviation, and utilized as a null hypothesis (i.e., null distribution) against which to test the strength of statistical bias on any single sample. Once a statistical threshold is set, the identification of anomalous states may be carried out based only on the strength of statistical bias for the particular state in question, without the use of a reference set.
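The thresholding procedure just described can be sketched with Python's standard `statistics.NormalDist`: fit a Gaussian to the forces of a reference panel, take a one-sided upper-tail cutoff, and then classify single sequences against that cutoff alone. The reference values below are placeholders, not data from this work, and the one-sided test at alpha = 0.05 is an assumption for illustration; the threshold actually used herein is the one set in Example 6 (infra).

```python
from statistics import NormalDist


def fit_null_threshold(reference_forces, alpha=0.05):
    """Fit a Gaussian null to reference forces; return it and a one-sided cutoff.

    A force at or above the cutoff lies in the upper `alpha` tail of the null.
    """
    null = NormalDist.from_samples(reference_forces)
    return null, null.inv_cdf(1.0 - alpha)


def upper_tail_p(force, null):
    """One-sided p-value P(X >= force) under the fitted Gaussian null."""
    return 1.0 - null.cdf(force)


# Placeholder reference forces, NOT data from this work.
reference = [-1.9, -1.4, -1.1, -0.9, -0.7, -0.6, -0.2, 0.1]
null, cutoff = fit_null_threshold(reference)

# With the cutoff fixed, single sequences are classified without the panel.
for force in (0.8, -1.0):
    print(force, force >= cutoff, round(upper_tail_p(force, null), 3))
```

Returning the fitted null alongside the cutoff keeps both uses available: the cutoff for fast classification, and the distribution itself when a p-value is wanted.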

The present invention, as demonstrated in Example 6 (infra), has defined the statistical threshold for identifying sequences with anomalous patterns of CpG dinucleotides as those sequences having a strength of statistical bias greater than or equal to zero.
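Under a simplified model in which the observed nucleotide frequencies are held fixed and a single force acts on CpG, the zero-force threshold has a simple arithmetic reading. At zero force the model is the purely random one, whose expected CpG count for a sequence of length L is (L - 1)·f_C·f_G, and the expected count never decreases as the force increases; a force greater than or equal to zero therefore corresponds to an observed CpG count at or above that expectation. The sketch below, with a hypothetical function name, applies this check using exact fractions.

```python
from fractions import Fraction


def cpg_bias_nonnegative(seq: str) -> bool:
    """True when the CpG force is >= 0 under a fixed-frequency random model.

    Equivalent to: observed CpG count >= (L - 1) * f_C * f_G. Sequences lacking
    C or G have an undefined force; for them this check returns True (0 >= 0).
    """
    seq = seq.upper().replace("T", "U")
    length = len(seq)
    if length < 2:
        raise ValueError("need at least two nucleotides")
    f_c = Fraction(seq.count("C"), length)
    f_g = Fraction(seq.count("G"), length)
    observed = sum(1 for i in range(length - 1) if seq[i:i + 2] == "CG")
    # Exact rational arithmetic avoids floating-point ties at the boundary.
    return observed >= (length - 1) * f_c * f_g


print(cpg_bias_nonnegative("GCGAUCGU"), cpg_bias_nonnegative("GGCCAAUU"))  # True False
```

For GCGAUCGU, the 2 observed CpG exceed the expected 7 × 2/8 × 3/8 = 21/32, so the check passes; GGCCAAUU contains no CpG against an expectation of 7/16, so it fails.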

Specific exemplary RNA molecules of the composition include, without limitation, SEQ ID NOs:1-96 (FIGS. 14A-S), SEQ ID NOs:97-120 (FIGS. 15A-F), SEQ ID NOs:121-255 (FIGS. 16A-Y), SEQ ID NOs:256-319 (FIGS. 17A-L), and immunostimulating fragments thereof.

The RNA molecule in the composition of the present invention has an immunostimulating effect on cells, including tumor cells. As used herein, the term “immunostimulating effect” or “stimulating an immune response” includes eliciting an immune response, e.g., inducing or increasing T cell-mediated and/or B cell-mediated immune responses that are influenced by modulation of T cell costimulation. Exemplary immune responses include B cell responses (e.g., antibody production), T cell responses (e.g., cytokine production and cellular cytotoxicity), and activation of cytokine-responsive cells, e.g., macrophages. Eliciting an immune response includes an increase in any one or more immune responses. It will be understood that upmodulation of one type of immune response may lead to a corresponding downmodulation in another type of immune response. For example, upmodulation of the production of certain cytokines (e.g., IL-10) can lead to downmodulation of cellular immune responses. The RNA molecule elicits an immunostimulating effect on immune cells. As used herein, the term “immune cell” includes cells that are of hematopoietic origin and that play a role in the immune response. Immune cells include lymphocytes, such as B cells and T cells; natural killer cells; and myeloid cells, such as monocytes, macrophages, eosinophils, mast cells, basophils, and granulocytes. The term “T cell” includes CD4+ T cells and CD8+ T cells. The term T cell also includes both T helper 1 type T cells and T helper 2 type T cells.

In formulating the RNA-containing composition of the present invention, the amount of RNA molecule included in the composition will vary depending on the choice of RNA molecule, its immunostimulating activity, and its intended treatment and subject.

In the composition of the present invention, the RNA molecule is incorporated into pharmaceutical compositions suitable for administration (e.g., by injection). Such compositions typically comprise the RNA molecule and a carrier, e.g., a pharmaceutically acceptable carrier. The pharmaceutically acceptable carrier suitable for injection is, according to one embodiment, a carrier for the RNA molecule. As used herein the language “pharmaceutically acceptable carrier” is intended to include any and all solvents, dispersion media, coatings, antibacterial and antifungal agents, isotonic and absorption delaying agents, and the like, compatible with pharmaceutical administration. Supplementary active compounds can also be incorporated into the compositions.

The pharmaceutically acceptable carrier may be a stabilizer, an emulsion, a liposome, a microsphere, an immune stimulating complex, nanospheres, montanide, squalene, cyclic dinucleotides, complementary immune modulators, or any combination thereof. The carrier should be suitable for the desired mode of delivery of the composition (i.e., suitable for injection). Exemplary modes of delivery include, without limitation, intravenous, intra-arterial, intramuscular, and intracavitary injection, as well as subcutaneous, intradermal, transcutaneous, intrapleural, intraperitoneal, intraventricular, intra-articular, intraocular, intratumoral, or intraspinal administration.

A pharmaceutical composition of the invention is formulated to be compatible with its intended route of administration. Solutions or suspensions used for parenteral, intradermal, or subcutaneous application can include the following components: a sterile diluent such as water for injection, saline solution, fixed oils, polyethylene glycols, glycerine, propylene glycol, or other synthetic solvents; antibacterial agents such as benzyl alcohol or methyl parabens; antioxidants such as ascorbic acid or sodium bisulfite; chelating agents such as ethylenediaminetetraacetic acid; buffers such as acetates, citrates, or phosphates; and agents for the adjustment of tonicity such as sodium chloride or dextrose. pH can be adjusted with acids or bases, such as hydrochloric acid or sodium hydroxide. The parenteral preparation can be enclosed in ampoules, disposable syringes, or multiple dose vials made of glass or plastic.

Pharmaceutical compositions suitable for injectable use include sterile aqueous solutions (where water soluble) or dispersions and sterile powders for the extemporaneous preparation of sterile injectable solutions or dispersions. For intravenous administration, suitable carriers include physiological saline, bacteriostatic water, Cremophor EL™ (BASF, Parsippany, N.J.) or phosphate buffered saline (PBS). The composition must be sterile and should be fluid to the extent that easy syringeability exists. It must be stable under the conditions of manufacture and storage and must be preserved against the contaminating action of microorganisms such as bacteria and fungi. The carrier can be a solvent or dispersion medium containing, for example, water, ethanol, polyol (for example, glycerol, propylene glycol, liquid polyethylene glycol, and the like), and suitable mixtures thereof. The proper fluidity can be maintained, for example, by the use of a coating such as lecithin, by the maintenance of the required particle size in the case of dispersion, and by the use of surfactants. Prevention of the action of microorganisms can be achieved by various antibacterial and antifungal agents, for example, parabens, chlorobutanol, phenol, ascorbic acid, thimerosal, and the like. It may be preferable to include isotonic agents, for example, sugars, polyalcohols such as mannitol, sorbitol, and sodium chloride in the composition. Prolonged absorption of the injectable compositions can be brought about by including in the composition an agent which delays absorption, for example, aluminum monostearate and gelatin.

Sterile injectable solutions can be prepared by incorporating the active compound (i.e., RNA molecule) in the required amount in an appropriate solvent with one or a combination of the ingredients enumerated above, as required, followed by filter sterilization. Generally, dispersions are prepared by incorporating the active compound into a sterile vehicle that contains a basic dispersion medium and the required other ingredients from those enumerated above. In the case of sterile powders for the preparation of sterile injectable solutions, the preferred methods of preparation are vacuum drying and freeze-drying, which yield a powder of the active ingredient plus any additional desired ingredient from a previously sterile-filtered solution thereof.

It is especially advantageous to formulate parenteral compositions in dosage unit form for ease of administration and uniformity of dosage. Dosage unit form as used herein refers to physically discrete units suited as unitary dosages for the subject to be treated, each unit containing a predetermined quantity of active compound (i.e., RNA molecule) calculated to produce the desired therapeutic effect in association with the required pharmaceutical carrier. The specification for the dosage unit forms of the invention is dictated by and directly dependent on the unique characteristics of the active compound and the particular therapeutic effect to be achieved, and the limitations inherent in the art of compounding such an active compound for the treatment of individuals.

Toxicity and therapeutic efficacy of such compounds can be determined by standard pharmaceutical procedures in cell cultures or experimental animals. The data obtained from the cell culture assays and animal studies can be used in formulating a range of dosage for use in humans. The dosage of such compounds lies preferably within a range of circulating concentrations that includes the ED₅₀ with little or no toxicity. The dosage may vary within this range depending upon the dosage form employed and the route of administration utilized. For any compound used in the methods of the invention (described infra), the therapeutically effective dose can be estimated initially from cell culture assays. A dose may be formulated in animal models to achieve a circulating plasma concentration range that includes the IC₅₀ (i.e., the concentration of the test compound that achieves half-maximal activity) as determined in cell culture. Such information can be used to more accurately determine useful doses in humans. Levels in plasma may be measured, for example, by high performance liquid chromatography.

As defined herein, a therapeutically effective amount of an RNA molecule (i.e., an effective dosage) ranges from about 0.001 to 30 mg/kg body weight, or about 0.01 to 25 mg/kg body weight, or about 0.1 to 20 mg/kg body weight, or about 1 to 10 mg/kg, 2 to 9 mg/kg, 3 to 8 mg/kg, 4 to 7 mg/kg, or 5 to 6 mg/kg body weight. The skilled artisan will appreciate that certain factors may influence the dosage required to effectively treat a subject, including but not limited to, the severity of the disease or disorder, previous treatments, the general health and/or age of the subject, and other diseases present. Moreover, treatment of a subject with a therapeutically effective amount of an agent can include a single treatment or, preferably, can include a series of treatments.
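As a worked illustration of how the per-kilogram ranges above translate into an amount for one administration, the sketch below multiplies a mg/kg range by body weight. The 70 kg body weight is a hypothetical value chosen only for the arithmetic; it is not a recommended dose, and the factors listed above (disease severity, prior treatment, age, general health) would still govern the actual amount.

```python
def total_dose_range_mg(body_weight_kg, low_mg_per_kg, high_mg_per_kg):
    """Convert a per-kilogram dosage range into total milligrams for one subject."""
    if body_weight_kg <= 0:
        raise ValueError("body weight must be positive")
    if not 0 <= low_mg_per_kg <= high_mg_per_kg:
        raise ValueError("expected 0 <= low <= high")
    return body_weight_kg * low_mg_per_kg, body_weight_kg * high_mg_per_kg


# Hypothetical 70 kg subject, using the 0.1 to 20 mg/kg range described above.
low, high = total_dose_range_mg(70, 0.1, 20)
print(f"{low:.1f} mg to {high:.1f} mg")  # 7.0 mg to 1400.0 mg
```

The same conversion applies to any of the narrower ranges, e.g. 5 to 6 mg/kg for a 60 kg subject gives 300 to 360 mg.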

In one embodiment, a subject is treated with the composition of the present invention in the range of between about 0.1 to 20 mg/kg body weight, one time per week for between about 1 to 10 weeks, preferably between 2 to 8 weeks, more preferably between about 3 to 7 weeks, and even more preferably for about 4, 5, or 6 weeks. It will also be appreciated that the effective dosage of composition used for treatment may increase or decrease over the course of a particular treatment. Changes in dosage may result and become apparent from the results of diagnostic assays.

In one embodiment, nucleic acid molecules can be inserted into vectors and used as gene therapy vectors. Gene therapy vectors can be delivered to a subject by, for example, intravenous injection, local administration (U.S. Pat. No. 5,328,470, which is hereby incorporated by reference in its entirety), or stereotactic injection (Chen et al., “Regression of Experimental Gliomas by Adenovirus-Mediated Gene Transfer In Vivo,” Proc. Natl. Acad. Sci. USA 91:3054-3057 (1994), which is hereby incorporated by reference in its entirety). The pharmaceutical preparation of the gene therapy vector can include the gene therapy vector in an acceptable diluent or can comprise a slow release matrix in which the gene delivery vehicle is embedded. Alternatively, where the complete gene delivery vector can be produced intact from recombinant cells, e.g., retroviral vectors, the pharmaceutical preparation can include one or more cells that produce the gene delivery system. The pharmaceutical compositions can be included in a container, pack, or dispenser together with instructions for administration.

The composition of the present invention can also include an effective amount of an additional adjuvant or mitogen.

Suitable additional adjuvants include, without limitation, Freund's complete or incomplete adjuvant, mineral gels such as aluminum hydroxide, surface active substances such as lysolecithin, pluronic polyols, polyanions, peptides, oil emulsions, dinitrophenol, Bacille Calmette-Guerin, Corynebacterium parvum, non-toxic cholera toxin, N-acetyl-muramyl-L-threonyl-D-isoglutamine (thr-MDP), N-acetyl-nor-muramyl-L-alanyl-D-isoglutamine (CGP 11637, referred to as nor-MDP), N-acetylmuramyl-L-alanyl-D-isoglutaminyl-L-alanine-2-(1′-2′-dipalmitoyl-sn-glycero-3-hydroxyphosphoryloxy)-ethylamine (CGP 19835A, referred to as MTP-PE), and RIBI, which contains three components extracted from bacteria, namely monophosphoryl lipid A, trehalose dimycolate, and cell wall skeleton (MPL+TDM+CWS), in a 2% squalene/TWEEN® 80 emulsion.

As used herein, “mitogen” refers to any agent that stimulates lymphocytes to proliferate independently of an antigen. The mitogen, in combination with the RNA molecule in the composition of the present invention, helps to promote an immunostimulating effect on tumor cells. Exemplary mitogens include, without limitation, CpG oligodeoxynucleotides that stimulate immune activation as described in U.S. Pat. No. 6,194,388; U.S. Pat. No. 6,207,646; U.S. Pat. No. 6,214,806; U.S. Pat. No. 6,218,371; U.S. Pat. No. 6,239,116; U.S. Pat. No. 6,339,068; U.S. Pat. No. 6,406,705; and U.S. Pat. No. 6,429,199, each of which is hereby incorporated by reference in its entirety. Any suitable dosage of mitogen can be used to promote an immunostimulating effect on tumor cells. For example, a suitable dosage of mitogen comprises about 50 ng up to about 100 μg per ml, about 100 ng up to about 25 μg per ml, or about 500 ng up to about 5 μg per ml.

The composition may also include an antigen or an antigen-encoding RNA molecule. As used herein, “antigen” refers to any agent that induces an immune response, i.e., a protective immune response, against the antigen, and thereby affords protection against a pathogen or disease (e.g., cancer). The antigen can take any suitable form including, without limitation, whole virus or bacteria; virus-like particles; anti-idiotype antibodies; bacterial, viral, or parasite subunit vaccines or recombinant vaccines; and bacterial outer membrane (“OM”) bleb formations containing one or more bacterial OM proteins.

The antigen can be present in the compositions in any suitable amount that is sufficient to generate an immunologically desired response. The amount of antigen or antigen-encoding RNA molecule to be included in the composition will depend on the immunogenicity of the antigen itself and the efficacy of any adjuvants co-administered therewith. In general, an immunologically or prophylactically effective dose comprises about 1 μg to about 1,000 μg of the antigen, about 5 μg to about 500 μg, or about 10 μg to about 200 μg.

According to another embodiment, the composition (i.e., a first pharmaceutical composition) may further include a cancer vaccine (i.e., as a second pharmaceutical composition) that includes an antigen or a nucleic acid molecule encoding the antigen, and a pharmaceutically suitable carrier. According to this embodiment, the first pharmaceutical composition is intended to be co-administered with the second pharmaceutical composition for purposes of enhancing the efficacy of the vaccine. The first pharmaceutical composition is formulated for and/or administered in a manner that achieves an immunostimulating effect on tumor cells.

Cancer vaccines are known, and include, for example, sipuleucel-T (Provenge®, manufactured by Dendreon), which is approved for use in some men with metastatic prostate cancer. This vaccine is designed to stimulate an immune response to prostatic acid phosphatase (“PAP”), an antigen that is found on most prostate cancer cells. Sipuleucel-T is customized to each patient. The vaccine is created by isolating immune system cells called antigen-presenting cells (“APCs”) from a patient's blood through a procedure called leukapheresis. The APCs are sent to Dendreon, where they are cultured with a protein called PAP-GM-CSF. This protein consists of PAP linked to another protein called granulocyte-macrophage colony-stimulating factor (“GM-CSF”). The latter protein stimulates the immune system and enhances antigen presentation. APCs cultured with PAP-GM-CSF constitute the active component of sipuleucel-T. Each patient's cells are returned to the patient's treating physician and infused into the patient. Patients receive three treatments, usually 2 weeks apart, with each round of treatment requiring the same manufacturing process. Although the precise mechanism of action of sipuleucel-T is not known, it appears that the APCs that have taken up PAP-GM-CSF stimulate T cells of the immune system to kill tumor cells that express PAP.

Vaccines to prevent HPV infection and to treat several types of cancer are being studied in clinical trials. Active clinical trials of cancer treatment vaccines include vaccines for bladder cancer, brain tumors, breast cancer, cervical cancer, Hodgkin lymphoma, kidney cancer, leukemia, lung cancer, melanoma, multiple myeloma, non-Hodgkin lymphoma, pancreatic cancer, prostate cancer, and solid tumors. Active clinical trials of cancer preventive vaccines include those for cervical cancer and solid tumors. Cancer vaccines approved from these and other trials may be suitable cancer vaccines for use in combination with the composition of the present invention.

Another aspect of the present invention relates to a kit comprising a cancer vaccine and the composition of the present invention, as well as instructions and a suitable delivery device, which can optionally be pre-filled with the vaccine formulation (i.e., the composition of the present invention and the cancer vaccine). An exemplary delivery device includes, without limitation, a syringe comprising an injectable dose.

In one embodiment of this and other methods described herein, the subject is a mammal including, without limitation, humans, non-human primates, dogs, cats, rodents, horses, cattle, sheep, and pigs. Both juvenile and adult mammals can be treated. The subject to be treated in accordance with the present invention can be a healthy subject, a subject with a tumor, a subject with cancer, a subject being treated for cancer, a subject in cancer remission, or a subject that has an immune deficiency or is immunosuppressed. Although otherwise healthy, the elderly and the very young may have a less effective (or less developed) immune system and they may benefit greatly from the enhanced immune response.

Tumors include, without limitation, sarcoma, melanoma, lymphoma, leukemia, neuroblastoma, or carcinoma cell tumors.

In carrying out this and the other methods described herein, administering may be carried out as described supra, including, for example, intratumorally or systemically using a pharmaceutical composition as described supra, and amounts, dosages, and administration frequencies described supra.

A further aspect of the present invention relates to a method of stimulating an immune response against cancer in a cell or tissue. This method involves providing the composition of the present invention and contacting a cell or tissue with the composition under conditions effective to stimulate an immune response against cancer in the cell or tissue.

Cancers suitable for treatment in carrying out this aspect of the present invention include, for example and without limitation, those that are incident to pathogen infection, e.g., cervical cancer, vaginal cancer, vulvar cancer, oropharyngeal cancers, anal cancer, penile cancer, and squamous cell carcinoma of the skin caused by papillomavirus infection (D'Souza et al., “Case-Control Study of Human Papillomavirus and Oropharyngeal Cancer,” NEJM 356(19):1944-1956 (2007); Harper et al., “Sustained Immunogenicity and High Efficacy Against HPV 16/18 Related Cervical Neoplasia: Long-term Follow up Through 6.4 Years in Women Vaccinated with Cervarix (GSK's HPV-16/18 AS04 candidate vaccine),” Gynecol. Oncol. 109:158-159 (2008), each of which is hereby incorporated by reference in its entirety); liver cancer caused by Hepatitis B virus infection (Chang et al., “Decreased Incidence of Hepatocellular Carcinoma in Hepatitis B Vaccinees: A 20-Year Follow-up Study,” J. Natl. Cancer Inst. 101:1348-1355 (2009), which is hereby incorporated by reference in its entirety) or Hepatitis C virus infection; Burkitt lymphoma, non-Hodgkin lymphoma, Hodgkin lymphoma, and nasopharyngeal carcinoma caused by the Epstein-Barr virus; Kaposi sarcoma caused by the Kaposi sarcoma-associated herpesvirus; adult T-cell leukemia/lymphoma caused by the human T-cell lymphotropic virus type 1; stomach cancer and mucosa-associated lymphoid tissue lymphoma caused by the bacterium Helicobacter pylori; bladder cancer caused by the parasite Schistosoma haematobium; and cholangiocarcinoma caused by the parasite Opisthorchis viverrini. An enhanced immune response achieved by the methods of treatment and compositions of the present invention may enhance the efficacy of such vaccines in preventing cancer.

In one embodiment, this and other methods of the present invention are carried out to treat cancers that have already developed in a subject. Thus, the methods and compositions of the present invention are intended to delay or stop cancer cell growth; to cause tumor shrinkage; to prevent cancer from coming back; or to eliminate cancer cells that have not been killed by other forms of treatment.

According to one embodiment, a composition to be administered includes the antigen that is intended to generate the desired immune response as well as the RNA molecule having a pattern of CpG dinucleotides defined by a strength of statistical bias greater than or equal to zero. Thus, the antigen and the RNA molecule are co-administered simultaneously. The composition may be administered as a vaccine in a single dose or in multiple doses, which can be the same or different.

This embodiment may optionally include further administration of a composition of the present invention that includes the RNA molecule but not the antigen. This composition can be administered once or twice daily within several days preceding vaccine administration and for a period of time following vaccine administration. By way of example, post-vaccine administration can be carried out for up to about six weeks following each vaccine administration, preferably at least about two to three weeks, or at least about 3 to 10 days following each vaccine administration.
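The dosing window described above (daily administration of the RNA-only composition for several days before a vaccine administration and for up to about six weeks afterward) can be laid out as a calendar. The sketch below is a hypothetical scheduling helper: the three pre-vaccine days, 21 post-vaccine days, twice-daily frequency, and the choice to leave the vaccine day itself out of the RNA-only schedule are illustrative assumptions within the ranges stated, not values prescribed by the text.

```python
from datetime import date, timedelta


def rna_dosing_days(vaccine_day, days_before=3, days_after=21, doses_per_day=2):
    """Return (day, doses) pairs for the RNA-only composition around one vaccine dose.

    Hypothetical helper. Assumption: the vaccine day is excluded, since the
    RNA-only composition is described as given before and after it.
    """
    if doses_per_day not in (1, 2):
        raise ValueError("the text describes once- or twice-daily administration")
    if days_after > 42:
        raise ValueError("post-vaccine window exceeds about six weeks")
    schedule = []
    for offset in range(-days_before, days_after + 1):
        if offset == 0:
            continue  # vaccine administration day
        schedule.append((vaccine_day + timedelta(days=offset), doses_per_day))
    return schedule


plan = rna_dosing_days(date(2024, 3, 1))
print(plan[0][0], plan[-1][0], len(plan))  # 2024-02-27 2024-03-22 24
```

For multiple vaccine administrations, the helper would be called once per vaccine date and overlapping days merged.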

According to a second embodiment, a vaccine composition to be administered includes the antigen that is intended to generate the desired immune response but not the RNA molecule. However, the RNA molecule can be co-administered at about the same time. For instance, the vaccine can be administered intraperitoneally or intranasally, and a dosage of the RNA molecule can be administered orally at about the same time (same day). The dosage containing the RNA molecule can also be administered once or twice daily for up to about six weeks following the vaccine administration.

In carrying out this method of the present invention, contacting the cell or tissue with the composition may be carried out in vitro or in vivo.

According to another aspect of the present invention, the RNA-containing composition has an immunostimulating effect that primes (e.g., stimulates, induces, enhances, alters, or modulates) the anti-pathogen response of a subject's innate immune system in non-tumor cells. Such a response may find use, e.g., as an adjuvant to a vaccine, a vaccine supplement, or under conditions where such an immunostimulating effect is desirable.

Yet a further aspect of the present invention relates to a method for identifying RNA molecules with immunostimulating patterns of CpG dinucleotides. This method involves providing an RNA molecule, determining the length and frequency of nucleotides in the RNA molecule, determining the number of CpG dinucleotides present in the RNA molecule, calculating the strength of statistical bias on CpG dinucleotides for the RNA molecule, defining a threshold of statistical bias, determining if the strength of statistical bias on CpG dinucleotides for the RNA molecule meets or exceeds the threshold, and characterizing the RNA molecule sequence as possessing an immunostimulating pattern if it meets or exceeds the threshold of statistical bias.

In carrying out this method of the present invention, nucleotide frequencies are calculated by counting the number of times that a nucleotide occurs and dividing that number by the total length of the sequence, L, which may include ambiguously defined bases that cannot be assigned as A, C, G, U, or T. For example, f^0(A), the frequency of A nucleotides, is the number of occurrences of the base A in S₀ divided by L, the length of S₀, even when ambiguous bases are included.
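A minimal sketch of the counting steps, assuming the sequence is a plain Python string. As in the text, ambiguous symbols (such as N) stay in the denominator L, so the four frequencies can sum to less than 1. Converting T to U and the function names are illustrative choices, not part of the described method.

```python
from collections import Counter


def nucleotide_frequencies(seq):
    """Return ({base: f0(base)} for A, C, G, U, L).

    Counts are divided by the full length L, including ambiguous symbols.
    T is treated as U so that DNA-style input maps onto the RNA alphabet.
    """
    seq = seq.upper().replace("T", "U")
    length = len(seq)
    counts = Counter(seq)
    return {base: counts[base] / length for base in "ACGU"}, length


def count_dinucleotide(seq, motif="CG"):
    """Number of positions i where seq[i:i+2] equals the motif (RNA alphabet)."""
    seq = seq.upper().replace("T", "U")
    return sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == motif)


freqs, L = nucleotide_frequencies("ACGNNCGUAT")
print(L, freqs["A"], count_dinucleotide("ACGNNCGUAT"))  # 10 0.2 2
```

Here the two N symbols count toward L = 10 but toward no base, so the frequencies of A, C, G, and U sum to 0.8.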

In a further embodiment, the strength of statistical bias on CpG dinucleotides for the RNA molecule sequence (x(S₀)) is determined by maximizing the probability of a sequence (S₀) over x, where

$P(S \mid x, m) = \frac{1}{Z_{m}(x)} \left[ \prod_{i=1}^{L} f^{0}(s_{i}) \right] \exp\left( x N_{m}(S) \right) \qquad \text{[EQUATION 1]}$

$Z_{m}(x) = \sum_{S} \left[ \prod_{i=1}^{L} f^{0}(s_{i}) \right] \exp\left( x N_{m}(S) \right) \qquad \text{[EQUATION 2]}$

Z_m(x) is the normalization constant, obtained by summing over all possible sequences S of length L,

P(S|x, m) is the probability of the sequence given the force (x) and motif m,

x is the force on the motif m that introduces a statistical bias over P,

N_m(S) is the number of observed motifs, and

f^0(s_i) is the frequency, in S₀, of the nucleotide at position i.
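The maximization over x can be carried out numerically. The sketch below is one way to do it for a single dinucleotide motif such as CpG; it is an illustration under stated assumptions, not the procedure of Example 5. It assumes the sum in EQUATION 2 runs over all sequences of length L drawn from A, C, G, and U; it evaluates that sum with a transfer-matrix (forward) recursion instead of enumerating 4^L sequences; and it locates the maximum with a golden-section search on a hypothetical bounded interval of −10 to 10. Because log Z_m(x) is convex in x, the objective x N_m(S₀) − log Z_m(x) is concave, so the search finds the single maximum within the bounds.

```python
import math
from collections import Counter

ALPHABET = "ACGU"


def log_partition(x, freqs, length, motif="CG"):
    """log Z_m(x) (EQUATION 2) for a dinucleotide motif via a transfer-matrix recursion.

    weights[a] holds the (rescaled) total weight of all prefixes ending in base a;
    rescaling at every step avoids overflow, and the logs of the scale factors
    are accumulated in log_scale.
    """
    first, second = motif
    boost = math.exp(x)
    weights = {a: freqs[a] for a in ALPHABET}  # position 1: f0(s_1)
    log_scale = 0.0
    for _ in range(length - 1):
        new = {}
        for b in ALPHABET:
            incoming = sum(
                weights[a] * (boost if (a, b) == (first, second) else 1.0)
                for a in ALPHABET
            )
            new[b] = incoming * freqs[b]
        norm = sum(new.values())
        weights = {b: w / norm for b, w in new.items()}
        log_scale += math.log(norm)
    return log_scale + math.log(sum(weights.values()))


def estimate_force(seq, motif="CG", lo=-10.0, hi=10.0, tol=1e-6):
    """Maximum-likelihood force x(S0) for a dinucleotide motif (EQUATIONS 1-2).

    log P(S0 | x) = sum_i log f0(s_i) + x * N_m(S0) - log Z_m(x); only the last
    two terms depend on x. If the motif is absent (or saturated) the likelihood
    keeps rising toward a bound, and the result is clipped at lo (or hi).
    """
    seq = seq.upper().replace("T", "U")
    length = len(seq)
    counts = Counter(seq)
    freqs = {b: counts[b] / length for b in ALPHABET}
    n_obs = sum(1 for i in range(length - 1) if seq[i:i + 2] == motif)

    def objective(x):
        return x * n_obs - log_partition(x, freqs, length, motif)

    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = objective(c), objective(d)
    while b - a > tol:
        if fc > fd:  # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = objective(c)
        else:  # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = objective(d)
    return (a + b) / 2


# Toy sequences: one with no CpG at all, one made entirely of CpG repeats.
print(estimate_force("AUGC" * 25), estimate_force("CG" * 50))
```

The two toy sequences land near the lower and upper search bounds respectively, mirroring the negative and positive forces discussed in the examples. Trinucleotide motifs would need a transfer matrix indexed by the preceding two bases rather than one.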

Defining a threshold of statistical bias can be carried out by providing a reference set comprising a plurality of RNA molecule sequences, calculating the strength of statistical bias on CpG dinucleotides for each RNA molecule sequence in the reference set, generating a distribution of the strengths of statistical bias on CpG dinucleotides for the RNA molecule sequences in the reference set to define a null distribution, setting a statistical significance level, and determining the value of the strength of statistical bias at which that significance level is met or exceeded.
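The thresholding steps above can be sketched with an empirical null distribution. The function names, the default significance level of 0.05, and the rule that at most floor(alpha × n) reference values may meet or exceed the threshold are assumptions made for illustration; the text does not specify how the cut-off is read from the distribution.

```python
import math


def empirical_threshold(reference_forces, alpha=0.05):
    """Smallest reference force t with at most floor(alpha * n) reference values >= t.

    reference_forces: CpG forces computed for every sequence in the reference set
    (the null distribution). Ties in the reference set are ignored in this sketch.
    """
    ranked = sorted(reference_forces)
    n = len(ranked)
    tail = math.floor(alpha * n)
    if tail < 1:
        raise ValueError("reference set too small for this significance level")
    return ranked[n - tail]


def has_immunostimulating_pattern(force, threshold):
    """Characterize a sequence as immunostimulating if its force meets or exceeds the threshold."""
    return force >= threshold


# Hypothetical null distribution (not data from the examples): 0.00, 0.01, ..., 0.99
reference = [i / 100 for i in range(100)]
threshold = empirical_threshold(reference, alpha=0.05)
print(threshold)  # 0.95
```

A Gaussian approximation of the null distribution, as used in Example 2, would replace the rank lookup with a cut-off derived from the mean and standard deviation of the reference forces.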

The present invention may be further illustrated by reference to the following examples, which should not be construed as limiting.

EXAMPLES
Example 1—General Motif Usage Patterns in lncRNAs

Using a novel approach from statistical physics, the experiments described herein quantify global, transcriptome-wide motif usage for the first time in human and murine ncRNAs and determine that most have motif usage consistent with the coding genome. However, an outlier subset of tumor-associated ncRNAs, typically of recent evolutionary origin, has motif usage that is often indicative of pathogen-associated RNA. For instance, as demonstrated in these examples, the tumor-associated human repeat HSATII is enriched in motifs containing CpG dinucleotides in AU-rich contexts, which most of the human genome and human-adapted viruses have evolved to avoid. It is further demonstrated that a key subset of these ncRNAs functions as immunostimulatory “self-agonists” that directly activate cells of the mononuclear phagocytic system to produce pro-inflammatory cytokines. These ncRNAs arise from endogenous repetitive elements that are normally silenced, yet are often very highly expressed in cancers. The innate response in tumors may partially originate from direct interaction of immunogenic ncRNAs expressed in cancer cells with innate pattern recognition receptors, which would assign a new danger-associated function to a set of dark matter repetitive elements. These findings potentially reconcile several observations concerning the role of ncRNA expression in cancers and their relationship to the tumor microenvironment.

Employing the GENCODE database of long non-coding RNA transcripts from humans and mice (Versions 19 and 2 for human and mouse, respectively), the strength of statistical bias (referred to as a force) on sequence motif usage was calculated for all contained lncRNAs as described in Example 5 (infra). GENCODE lncRNAs, which are expressed in a broad array of cells and tissues, established a baseline of sequence motif usage against which the motif usage of ncRNAs expressed in certain cancers could be compared. For each sequence, the force (i.e., strength of statistical bias) on all two- and three-nucleotide motifs was calculated using EQUATION 5 (infra), which gives the probability of observing a sequence with that number of motifs. The number of sequences in GENCODE for which a given dinucleotide is aberrantly expressed is illustrated in FIG. 1A. CpG dinucleotides are vastly underrepresented, as indicated by their negative forces (i.e., strengths of statistical bias) in Table 1. UpA dinucleotides are often underrepresented, though to a lesser extent. These patterns cannot be explained by nucleotide frequencies, such as GC content, which this method accounts and normalizes for.

TABLE 1

Average Forces on Motifs are Similar between Humans and Mice

Motif    Human      Mouse
CG       −1.419     −1.3750
UA       −0.6040    −0.5480
ACG      −1.7586    −1.6216
CAG       0.5534     0.5612
CCG      −1.5095    −1.3287
CGA      −1.8995    −1.7082
CGC      −1.7304    −1.5525
CGG      −1.5110    −1.2629
CGU      −1.7833    −1.6463
CUG       0.6690     0.6748
GCG      −1.7480    −1.5592
GUA      −0.8632    −0.7451
UAC      −0.7368    −0.6298
UAG      −0.7330    −0.5920
UCG      −1.9391    −1.7049

Average force (i.e. strength of statistical bias) on a given motif in the Human and Mouse GENCODE dataset, for lncRNAs with length greater than 500 nucleotides. The forces (i.e. strengths of statistical bias) are listed for the significant motifs in humans. The force is a measure of the strength of statistical bias to enhance or suppress a motif versus what is expected from that sequence's nucleotide content.

These dinucleotide motif usage patterns are similar in human and mouse genomes across the wide array of cells and cell lines contained in GENCODE (Djebali et al., “Landscape of Transcription in Human Cells,” Nature 489:101-108 (2012) and Harrow et al., “GENCODE: The Reference Human Genome Annotation for the ENCODE Project,” Genome Res. 22:1760-1774 (2012), which are hereby incorporated by reference in their entirety). Strikingly, avoidance of the CpG and UpA dinucleotide motifs in this dataset is stronger than in coding regions (FIGS. 2A-B). One can therefore conclude that the patterns previously observed in virus and host coding genes are not due to effects specific to coding regions, such as codon usage patterns (Coleman et al., “Virus Attenuation by Genome-Scale Changes in Codon Pair Bias,” Science 320:1784-1787 (2008); Mueller et al., “Live Attenuated Influenza Virus Vaccines by Computer-Aided Rational Design,” Nature Biotech. 28:723-726 (2010); Mueller et al., “Reduction of The Rate of Poliovirus Protein Synthesis Through Large-Scale Codon Deoptimization Causes Attenuation of Viral Virulence by Lowering Specific Infectivity,” J. Virol. 80:9687-9696 (2006), which are hereby incorporated by reference in their entirety). Rather, constraints in coding regions likely weaken a statistical bias that arises from the same underlying mechanisms. This suggests that the restrictions on dinucleotide frequencies observed in ncRNAs are selective, either preserving a function or avoiding a detrimental consequence such as a chronic autoinflammatory response that could result from presenting danger-associated molecular patterns (DAMPs).
Adaptation of dinucleotide motif usage in these elements over time is analogous to the viral mimicry of host patterns of sequence motif usage (Greenbaum et al., “Patterns of Evolution and Host Gene Mimicry in Influenza and Other RNA Viruses,” PLoS Path. 4:e1000079 (2008) and Karlin et al., “Why is CpG Suppressed in the Genomes of Virtually All Small Eukaryotic Viruses but Not in Those of Large Eukaryotic Viruses?” J. Virol. 68:2889-2897 (1994), which are hereby incorporated by reference in their entirety). When an avian influenza virus enters the human population, adaptation toward analogous patterns can be observed emerging over time (Greenbaum et al., “Patterns of Evolution and Host Gene Mimicry in Influenza and Other RNA Viruses,” PLoS Path. 4:e1000079 (2008); Greenbaum et al., “Quantitative Theory of Entropic Forces Acting on Constrained Nucleotide Sequences Applied to Viruses,” Proc. Natl. Acad. Sci. 111:5054-5059 (2014); Greenbaum et al., “Patterns of Oligonucleotide Sequences in Viral and Host Cell RNA Identify Mediators of the Host Innate Immune System,” PLoS One 4:e5969 (2009); Jimenez-Baranda et al., “Oligonucleotide Motifs that Disappear During the Evolution of Influenza Virus in Humans Increase Alpha Interferon Secretion by Plasmacytoid Dendritic Cells,” J. Virol. 85:3893-3904 (2011), which are hereby incorporated by reference in their entirety). Because mutation rates in influenza are very high, these evolutionary adaptations can be followed over far shorter time periods.

Trinucleotide motifs with significant forces are listed in Table 1 along with the dinucleotide motifs. As with dinucleotides, the trinucleotide motifs subject to significant forces (i.e., strengths of statistical bias) are conserved between humans and mice, with the exception of UAC and UAG, which are significant in humans but less so in mice. Except for UAG (a chain-termination codon in coding RNAs), whenever a trinucleotide motif is significantly enhanced or avoided in humans, its reverse complement is also significantly enhanced or avoided, suggesting avoidance of complementary motifs. The strongest forces (i.e., strengths of statistical bias) suppress CpG and CpG-containing trinucleotides, particularly when an A or U is adjacent to the core CpG motif. This is consistent with the avoidance of CpGs in AU contexts observed in influenza viruses replicating in humans (Greenbaum et al., “Quantitative Theory of Entropic Forces Acting on Constrained Nucleotide Sequences Applied to Viruses,” Proc. Natl. Acad. Sci. 111:5054-5059 (2014); Greenbaum et al., “Patterns of Oligonucleotide Sequences in Viral and Host Cell RNA Identify Mediators of the Host Innate Immune System,” PLoS One 4:e5969 (2009); Jimenez-Baranda et al., “Oligonucleotide Motifs that Disappear During the Evolution of Influenza Virus in Humans Increase Alpha Interferon Secretion by Plasmacytoid Dendritic Cells,” J. Virol. 85:3893-3904 (2011), which are hereby incorporated by reference in their entirety). Given the apparent bias against both CpG and UpA, it was further determined whether these two biases are linked. The Pearson correlation between the CpG and UpA forces across all GENCODE ncRNAs in humans and mice showed no correlation (r=0.0006; FIGS. 3A-B), so the forces on CpG and UpA are likely independent. Moreover, every significant trimer across GENCODE is correlated with CpG, UpA, or both; as a result, all significant trimers can be explained by their CpG or UpA motif usage.
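The reverse-complement observation can be checked mechanically against Table 1. The sketch below copies the human trinucleotide forces from Table 1 and reports motifs whose reverse complement is not listed, as well as listed pairs whose forces have opposite signs; it checks only the tabulated values and does not recompute significance.

```python
COMPLEMENT = {"A": "U", "U": "A", "C": "G", "G": "C"}

# Human trinucleotide forces copied from Table 1.
HUMAN_FORCES = {
    "ACG": -1.7586, "CAG": 0.5534, "CCG": -1.5095, "CGA": -1.8995,
    "CGC": -1.7304, "CGG": -1.5110, "CGU": -1.7833, "CUG": 0.6690,
    "GCG": -1.7480, "GUA": -0.8632, "UAC": -0.7368, "UAG": -0.7330,
    "UCG": -1.9391,
}


def reverse_complement(motif):
    """Reverse complement of an RNA motif, e.g. ACG -> CGU."""
    return "".join(COMPLEMENT[base] for base in reversed(motif))


def check_reverse_complements(forces):
    """Return (motifs lacking a listed reverse complement, motifs whose pair has the opposite sign)."""
    unpaired, sign_mismatch = [], []
    for motif, force in forces.items():
        rc = reverse_complement(motif)
        if rc not in forces:
            unpaired.append(motif)
        elif (force > 0) != (forces[rc] > 0):
            sign_mismatch.append(motif)
    return unpaired, sign_mismatch


print(check_reverse_complements(HUMAN_FORCES))  # (['UAG'], [])
```

UAG is the only listed motif whose reverse complement (CUA) is absent from the table, and every listed pair agrees in sign.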

Example 2—Cancer Enriched Non-Coding Repeat RNA May have Anomalous Motif Usage

Prior work revealed aberrant expression of non-coding RNA across a spectrum of mouse and human cancers (Leonova et al., “P53 Cooperates with DNA Methylation and a Suicidal Interferon Response to Maintain Epigenetic Silencing of Repeats and Noncoding RNAs,” Proc. Natl. Acad. Sci. 110:E89-E98 (2013) and Ting et al., “Aberrant Overexpression of Satellite Repeats in Pancreatic and Other Epithelial Cancers,” Science 331:593-596 (2011), which are hereby incorporated by reference in their entirety). These sequences were found in the Repbase database of human and murine repetitive elements and the FANTOM database of murine non-coding elements (currently NONCODE) (Jurka et al., “Repbase Update, a Database of Eukaryotic Repetitive Elements,” Cytogenetic and Genome Res. 110:462-467 (2005) and Xie et al., “NONCODEv4: Exploring the World of Long Non-Coding RNA Genes,” Nucleic Acids Res. 42:D98-D103 (2014), which are hereby incorporated by reference in their entirety). A high induction of GSAT in a murine testicular teratoma and liposarcoma tumor model was also found (FIGS. 4A-B) (Leonova et al., “P53 Cooperates with DNA Methylation and a Suicidal Interferon Response to Maintain Epigenetic Silencing of Repeats and Noncoding RNAs,” Proc. Natl. Acad. Sci. 110:E89-E98 (2013) and Ting et al., “Aberrant Overexpression of Satellite Repeats in Pancreatic and Other Epithelial Cancers,” Science 331:593-596 (2011), which are hereby incorporated by reference in their entirety). Focusing on these cancer-expressed repeats, a surprisingly significant enrichment of anomalous motif usage patterns was found as compared to other ncRNAs. In Repbase, it was tested whether the bias on di- and trinucleotide motifs observed in repetitive element sequences fell outside the distribution obtained from GENCODE lncRNAs. Remarkably, hundreds of sequences falling outside of this distribution were found.
Many have high usage of CpG dinucleotides, including a set of endogenous viruses (Table 2) recently implicated in the innate immune response in tumors (Zeng et al., “MAVS, cGAS, and Endogenous Retroviruses in T-independent B Cell Responses,” Science 346:1486-1492 (2014), which is hereby incorporated by reference in its entirety). It was concluded that while the portions of the noncoding genome typically expressed as lncRNAs have motif usage patterns similar to those of RNA from coding regions, there are many genomic regions with atypical motif usage that are not transcribed in normal cells or tissues.

TABLE 2

Many Repetitive Elements Have High CpG Forces

ncRNA         Class                 Level of Conservation  CpG Force (Strength of Statistical Bias)
MER123        DNA_transposon        Amniota                1.1039
HSATII        SAT                   Primates               1.0360
UCON21        Transposable_Element  Amniota                0.9465
MER6B         Mariner/Tc1           Homo_sapiens           0.9230
Eulor1        Transposable_Element  Amniota                0.8481
Eulor5B       Transposable_Element  Tetrapoda              0.8474
Eulor2C       Transposable_Element  Amniota                0.7676
Eulor6A       Transposable_Element  Tetrapoda              0.7466
MER131        SINE                  Amniota                0.6223
Eulor4        Transposable_Element  Tetrapoda              0.6067
Eulor10       Transposable_Element  Amniota                0.6064
MER6C         Mariner/Tc1           Eutheria               0.5667
Eulor12       Transposable_Element  Amniota                0.5295
MER5C1        hAT                   Eutheria               0.4582
MER47B        Mariner/Tc1           Eutheria               0.4518
UCON39        DNA_transposon        Mammalia               0.4443
UCON16        Transposable_Element  Amniota                0.4436
Tigger3d      Mariner/Tc1           Primates               0.4374
TIGGER5A      Mariner/Tc1           Eutheria               0.4212
MER75         DNA_transposon        Homo_sapiens           0.4134
Tigger4a      Mariner/Tc1           Primates               0.3815
npiggy2_Mm    piggyBac              Microcebus_murinus     0.3725
MER58B        hAT                   Eutheria               0.3657
Eulor6C       Transposable_Element  Tetrapoda              0.3571
Eulor11       Transposable_Element  Amniota                0.3561
UCON15        Transposable_Element  Amniota                0.3560
Tigger2b_Pri  Mariner/Tc1           Primates               0.3548
MER44B        Mariner/Tc1           Homo_sapiens           0.3536
SUBTEL_sat    Satellite             Primates               0.3527
Eulor9A       Transposable_Element  Amniota                0.3465
MER44C        Mariner/Tc1           Homo_sapiens           0.3439
Eulor8        Transposable_Element  Amniota                0.3416
MER44D        Mariner/Tc1           Eutheria               0.3211
npiggy1_Mm    piggyBac              Microcebus_murinus     0.3131
UCON26        Transposable_Element  Amniota                0.2985
MER127        Mariner/Tc1           Amniota                0.2984
MER97d        hAT                   Eutheria               0.2939
Eulor6D       Transposable_Element  Tetrapoda              0.2866
Eulor2B       Transposable_Element  Amniota                0.2852
MER119        hAT                   Homo_sapiens           0.2794
MER134        Transposable_Element  Amniota                0.2786
Eulor9C       Transposable_Element  Amniota                0.2751
MER8          Mariner/Tc1           Homo_sapiens           0.2669
Ricksha_a     MuDR                  Eutheria               0.2607
MER129        SINE                  Amniota                0.2444
MacERV6_LTR3  ERV3                  Cercopithecidae        0.2404
MER57B2       ERV1                  Homo_sapiens           0.2403
HSMAR1        Mariner/Tc1           Homo_sapiens           0.2397
Eulor12_CM    Transposable_Element  Amniota                0.2269
MERX          Mariner/Tc1           Eutheria               0.2207
Tigger12A     Mariner/Tc1           Mammalia               0.2170
MER58A        hAT                   Eutheria               0.2006

Listed above are the repetitive elements from Repbase with a significantly high CpG force. These elements are typically not found to be expressed in normal tissue, yet some may be expressed in cancer cells and cell lines.

The forces quantifying the strength of the statistical bias on the often underrepresented CpG and UpA dinucleotides were used to differentiate between ncRNAs found preferentially in cancerous cells and the total lncRNA referenced in GENCODE for humans and mice, as these two dinucleotides account for essentially all significant trinucleotide motifs in this set. The distribution of forces (i.e., strengths of statistical bias) on CpG and UpA was used to define a null hypothesis, which was approximated by a Gaussian distribution (FIGS. 5A-D). Many ncRNAs from cancerous cells lie clearly outside this distribution, often to a large extent. In particular, HSATII, the main ncRNA upregulated in human pancreatic cancers, is far outside the human distribution, and GSAT, the main ncRNA implicated in murine tumoral cell lines, is well outside the mouse distribution. Under the null hypothesis, the p-values for all ncRNAs considered here are less than 10⁻⁶¹ for the human pancreatic cancer data and less than 10⁻² for the murine cell line data.
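As an illustration of testing a single sequence against the Gaussian null, the Python sketch below standardizes a force and computes a one-sided upper-tail p-value. It assumes a one-sided test for over-representation and uses, purely as example inputs, the human GENCODE CpG mean and standard deviation reported in Example 6 and the HSATII CpG force from Table 2. The function names are hypothetical, and the sketch is not intended to reproduce the aggregate p-values quoted above, whose exact computation is not specified here.

```python
import math


def z_score(force: float, mean: float, sd: float) -> float:
    """Standardize a force against a Gaussian null with the given mean and SD."""
    return (force - mean) / sd


def upper_tail_p(z: float) -> float:
    """One-sided p-value P(Z >= z) for a standard normal variable Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Illustrative inputs: human GENCODE lncRNA CpG-force mean and SD (Example 6)
# and the HSATII CpG force listed in Table 2.
HUMAN_CPG_MEAN = -1.4341
HUMAN_CPG_SD = 0.6505
HSATII_CPG_FORCE = 1.0360

z = z_score(HSATII_CPG_FORCE, HUMAN_CPG_MEAN, HUMAN_CPG_SD)
p = upper_tail_p(z)
print(f"HSATII CpG force: z = {z:.3f}, one-sided p = {p:.1e}")
```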

Many of the ncRNAs from Leonova et al., “P53 Cooperates with DNA Methylation and a Suicidal Interferon Response to Maintain Epigenetic Silencing of Repeats and Noncoding RNAs,” Proc. Natl. Acad. Sci. 110:E89-E98 (2013) and Ting et al., “Aberrant Overexpression of Satellite Repeats in Pancreatic and Other Epithelial Cancers,” Science 331:593-596 (2011), which are hereby incorporated by reference in their entirety, are outliers by at least three standard deviations with respect to at least one of the significant motifs identified in the previous section. These outliers account for 70.46% of the modulated Repbase RNA expression induced in pancreatic cancer, and for even higher percentages (74.86% and 85.30%, respectively) in the smaller sets of prostate and lung cancers. HSATII is the most differentially expressed ncRNA (by a considerable margin) in the pancreatic cancer data, and HSATII and BSR are the most differentially expressed in prostate and lung. In p53 knockout murine cell lines treated with demethylating agents, around 68 ncRNAs are significantly modulated (Leonova et al., “P53 Cooperates with DNA Methylation and a Suicidal Interferon Response to Maintain Epigenetic Silencing of Repeats and Noncoding RNAs,” Proc. Natl. Acad. Sci. 110:E89-E98 (2013), which is hereby incorporated by reference in its entirety). Among those, 78.96% of the total expression comes from outliers as defined above, with the vast majority coming from GSAT and B2. Overall, repetitive sequences with unusual motif usage were observed to have varying degrees of conservation. However, the subset preferentially expressed in cancerous cells and tissues is encoded by sequences of more recent evolutionary origin.
HSATII and GSAT are conserved only back to primates and mouse, respectively, and 21 of the 22 ncRNAs from Ting et al., “Aberrant Overexpression of Satellite Repeats in Pancreatic and Other Epithelial Cancers,” Science 331:593-596 (2011), hereby incorporated by reference in its entirety, are conserved in humans and primates but no further back in evolution. Any function is therefore likely to be species-specific.
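The share of expression attributable to outliers can be computed as in the following sketch. It assumes that each element's expression and its CpG and UpA Z-scores are already available, treats deviations of at least three standard deviations in either direction as outliers, and uses hypothetical toy values rather than data from the cited studies.

```python
from dataclasses import dataclass


@dataclass
class RepeatExpression:
    name: str
    expression: float  # modulated expression, arbitrary units
    z_cpg: float       # Z-score of the CpG force against the GENCODE null
    z_upa: float       # Z-score of the UpA force against the GENCODE null


def outlier_expression_share(elements, k=3.0):
    """Percent of total expression contributed by elements lying at least k
    standard deviations from the null mean on CpG or UpA (either direction)."""
    total = sum(e.expression for e in elements)
    if total == 0:
        return 0.0
    from_outliers = sum(
        e.expression for e in elements if max(abs(e.z_cpg), abs(e.z_upa)) >= k
    )
    return 100.0 * from_outliers / total


# Hypothetical toy data for illustration only (not values from the study).
toy = [
    RepeatExpression("repeat_A", 60.0, 3.8, 0.5),
    RepeatExpression("repeat_B", 25.0, 0.4, -3.2),
    RepeatExpression("repeat_C", 15.0, 1.1, 0.7),
]
print(f"{outlier_expression_share(toy):.2f}% of expression comes from outliers")
```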

Example 3—ncRNAs with Unusual Motif Usage Highly Expressed in Cancers are Immunostimulatory

This analysis highlights that many ncRNAs upregulated in cancer display abnormal nucleotide motif usage that had previously been related to immunogenic properties in viruses. The innate immune system contains several effector cells that react to immunogenic nucleic acids, such as exogenous viral and bacterial nucleic acids as well as endogenous nucleic acids that can be released upon cell death (Atianand et al., “Molecular Basis of DNA Recognition in the Immune System,” J. Immunol. 190:1911-1918 (2013), which is hereby incorporated by reference in its entirety). Among those effectors, the mononuclear phagocytic system (macrophages, monocytes, and dendritic cells (“DCs”)) contains key regulators of innate immune activation and adaptive immunity (Guilliams et al., “Dendritic Cells, Monocytes and Macrophages: A Unified Nomenclature Based on Ontogeny,” Nature Rev. Immunol. 14:571-578 (2014); Kroemer et al., “Immunogenic Cell Death in Cancer Therapy,” Ann. Rev. Immunol. 31:51-72 (2013); Sabado et al., “Dendritic Cell Immunotherapy,” Ann. New York Acad. Sci. 1284:31-45 (2013), which are hereby incorporated by reference in their entirety). DCs efficiently sense and sample their environment to integrate information and mount an appropriate response, which may be tolerogenic or immunogenic. To test whether ncRNA with highly unusual motif usage could be recognized as a danger-associated molecular pattern (“DAMP”) by nucleic acid-sensing pattern recognition receptors (“PRRs”), the effect of transfecting human HSATII and murine GSAT into human monocyte-derived DCs (“moDCs”) and murine bone marrow-derived macrophages was studied. Liposomal transfection was required for stimulation, whereas naked RNA had no effect, consistent with recognition by an endosomal or intracellular sensor (FIGS. 6A-C). The general sets of recognition pathways tested are indicated in FIG. 7.

Different ncRNAs were generated by in vitro transcription from minigenes coding for the two main candidate outliers computationally predicted to have immunogenic motif usage (HSATII and GSAT). As controls, RNA was also derived from minigenes encoding scrambled versions with the same nucleotide content but normal motif usage (labeled “HSATII-sc” and “GSAT-sc”) and from repetitive elements of comparable length that have normal motif usage patterns (RMER33 and UCON18), as described below. In human moDCs, liposomal transfection of HSATII induced significant production of interleukin 6 and 12 (IL-6 and IL-12) and TNFalpha relative to both the endogenous controls and the scrambled versions (FIGS. 8A-B). A similar cytokine profile was elicited from moDCs by selected Toll-like receptor (TLR) agonists (FIG. 9A). The candidate murine immunogenic ncRNA GSAT had less pronounced immunogenic properties but still induced IL-12 (FIG. 8A). Upon liposomal transfection of the same ncRNAs into immortalized murine bone marrow-derived macrophages (“imBMs”), the immunogenic properties of HSATII were strongly attenuated, whereas murine GSAT induced high levels of TNFalpha (FIG. 8B) and MCP-1, but not interferon gamma, IL-6, or IL-12. imBMs almost exclusively regulate TNFalpha in response to pattern recognition receptor agonists (FIG. 9B).

HSATII and GSAT ncRNA induced IL-12 in human moDCs similarly to the TLR3 ligand poly-IC (a synthetic dsRNA mimic; FIG. 7). The absence of an effect from ncRNA with normal motif usage, i.e., the scrambled forms (FIGS. 8A-B), suggests that specific sequence patterns within the RNA, such as CpG and UpA motifs, regulate immunostimulatory activity. Such motif usage could also influence secondary conformation, which may contribute to immunogenic properties; however, it was checked that the scrambled sequences did not lower the RNA minimum folding energy. Based upon these observations, HSATII and GSAT are referred to as immunogenic-ncRNA or “i-ncRNA.” Interestingly, this study corroborates previous findings by Leonova et al., “P53 Cooperates with DNA Methylation and a Suicidal Interferon Response to Maintain Epigenetic Silencing of Repeats and Noncoding RNAs,” Proc. Natl. Acad. Sci. 110:E89-E98 (2013) that ncRNA such as GSAT can induce an innate response, although in those studies the type I interferon pathway was also activated. The initial investigations into this pathway were inconclusive (FIG. 9C).

Example 4—Dissection of the Immunostimulatory Properties of i-ncRNA

Pathogen-associated molecular patterns (“PAMPs”) and danger-associated molecular patterns (DAMPs) activate innate immune cells through pattern recognition receptors (PRRs). To better characterize the mechanisms involved in sensing i-ncRNA, the immunomodulatory properties of HSATII and GSAT were studied on a panel of imBMs lacking specific PRRs or effector molecules in their downstream signaling pathways (FIG. 7). Whereas GSAT induced a TNFalpha response, HSATII did not induce differential cytokine expression in these immortalized cells, indicating either a species-specific effect, as the cells are murine, or a cell type-specific effect, as these cells are macrophages. This is perhaps unsurprising, as different species and cell types express different pattern recognition receptors, and HSATII and GSAT have different sequence compositions. Significantly, the absence of two key adaptor and regulatory proteins, MYD88 and UNC93B1:UNC93B3d (UNC93b), respectively, eliminated the differential response to GSAT in imBMs (FIGS. 10A-C).

MYD88 is a key cytosolic adaptor protein used by all TLRs except TLR3 to activate the transcription factor NFkB. Similarly, the mutated form of UNC93b essentially eliminated inflammatory responses in imBMs. While less well characterized than MYD88, this protein is known to interact with several endosomal Toll-like receptors (TLR3, 7, and 9) and has been implicated in TLR trafficking between the endoplasmic reticulum and endosomes and in their resultant maturation (Casrouge et al., “Herpes Simplex Virus Encephalitis in Human UNC-93B Deficiency,” Science 314:308-312 (2006); Lee et al., “UNC93B1 Mediates Differential Trafficking of Endosomal TLRs,” eLife 2:e00291 (2013); Tabeta et al., “The Unc93B1 Mutation 3d Disrupts Exogenous Antigen Presentation and Signaling via Toll-like Receptors 3, 7 and 9,” Nature Immunol. 7:156-164 (2006), which are hereby incorporated by reference in their entirety). The requirement for TLR3, TLR7, and TLR9, which are known to recognize double-stranded RNA, single-stranded RNA, and CpG DNA, respectively, was tested (FIGS. 11A-B, FIGS. 12A-B) (O'Neill et al., “The History of Toll-Like Receptors—Redefining Innate Immunity,” Nature Rev. Immunol. 13:453-60 (2013); Broz et al., “Newly Described Pattern Recognition Receptors Team Up Against Intracellular Pathogens,” Nature Rev. Immunol. 13:551-565 (2013); Gajewski et al., “Innate and Adaptive Immune Cells in the Tumor Microenvironment,” Nature Immunol. 14:1014-1022 (2013), which are hereby incorporated by reference in their entirety). None of these receptors was required for GSAT to activate TNFalpha production from imBMs. Additional pathways, including the STING and inflammasome pathways, were also investigated; as discussed below, they did not contribute to i-ncRNA stimulatory activity. Altogether, the data are consistent with i-ncRNA activation requiring signaling pathways that rely upon MYD88 and UNC93b. The precise receptor involved in initial recognition remains to be determined.

A surprising parallel can be drawn between foreign viral nucleotide sequences and select ncRNAs that are silent in normal cells yet transcribed in cancer cells: both can activate innate immunity (Jimenez-Baranda et al., “Oligonucleotide Motifs That Disappear During the Evolution of Influenza Virus in Humans Increase Alpha Interferon Secretion by Plasmacytoid Dendritic Cells,” J. Virol. 85:3893-3904 (2011); Casrouge et al., “Herpes Simplex Virus Encephalitis in Human UNC-93B Deficiency,” Science 314:308-312 (2006); Bogunovic et al., “Immune Profile and Mitotic Index of Metastatic Melanoma Lesions Enhance Clinical Staging in Predicting Patient Survival,” Proc. Natl. Acad. Sci. 106:20429-20434 (2009); Cosset et al., “Comprehensive Metagenomic Analysis of Glioblastoma Reveals Absence of Known Virus Despite Antiviral-Like Type I Interferon Gene Response,” International J. Cancer 135:1381-1389 (2014), which are hereby incorporated by reference in their entirety). It was determined that ncRNAs expressed predominantly in normal human and mouse cells reflect patterns of nucleotide sequence motif avoidance similar to those of protein-coding RNA, often including a many-fold underrepresentation of CpG-containing sequences and reduced UpA motif usage relative to expected levels. However, the genome also harbors repetitive elements whose CpG and UpA motif usage often departs from that observed in RNA expressed in normal cells and tissues. Sets of these ncRNAs, typically newer genome entries over evolutionary time scales, can be expressed at very high levels in cancerous cells and tumors. This is why human and mouse elements expressed in cancer cells can have different sequences yet share high CpG content, and are not generally observed in the human or mouse transcriptome of normal cells.

It was previously proposed that the immunostimulatory and proinflammatory properties of highly inflammatory influenza and other RNA viruses derive in part from RNA containing CpGs in AU-rich contexts, which are avoided in RNA viruses circulating in humans. Experimental evidence has supported this hypothesis (Jimenez-Baranda et al., “Oligonucleotide Motifs That Disappear During the Evolution of Influenza Virus in Humans Increase Alpha Interferon Secretion by Plasmacytoid Dendritic Cells,” J. Virol. 85:3893-3904 (2011); Atkinson et al., “The Influence of CpG and UpA Dinucleotide Frequencies on RNA Virus Replication and Characterization of the Innate Cellular Pathways Underlying Virus Attenuation and Enhanced Replication,” Nucleic Acids Res. 42:4527-4545 (2014) and Vabret et al., “The Biased Nucleotide Composition of HIV-1 Triggers Type I Interferon Response and Correlates with Subtype D Increased Pathogenicity,” PLoS One 7:e33501 (2012), which are hereby incorporated by reference in their entirety). The analysis was recently recast in the language of statistical physics in a way that is theoretically insightful and computationally efficient (Greenbaum et al., “Quantitative Theory of Entropic Forces Acting on Constrained Nucleotide Sequences Applied to Viruses,” Proc. Natl. Acad. Sci. 111:5054-5059 (2014), which is hereby incorporated by reference in its entirety). In this language, the evolution and optimization of nucleotide sequence motifs are driven by the interplay between selective and entropic forces. The latter randomize motif frequencies in a genome under constraints, while the former are largely Darwinian, optimizing for functions that enhance viral replication and spread. However, ncRNAs transcribed mostly in cancerous cells would not be exposed to the same selective and entropic forces as coding RNA and ncRNA transcribed in normal cells.
Based on motif usage patterns, it is predicted that many ncRNAs may have immunogenic properties and present danger-associated molecular patterns.

Experiments focused on human HSATII and murine GSAT, as they are preferentially and highly expressed in carcinogenic processes and exhibit abnormal patterns of motif usage. In particular, human HSATII is enriched in CpG motifs in AU-rich contexts that are avoided in the genomes of humans and human-adapted viruses. Their computationally predicted immunogenic properties were shown to lead to the induction of inflammatory cytokines in human and murine innate cells (FIGS. 8A-B). These observations, together with previous work by Leonova et al., “P53 Cooperates with DNA Methylation and a Suicidal Interferon Response to Maintain Epigenetic Silencing of Repeats and Noncoding RNAs,” Proc. Natl. Acad. Sci. 110:E89-E98 (2013), which is hereby incorporated by reference in its entirety, strongly suggest that these endogenous i-ncRNAs are recognized as DAMPs by cellular nucleic acid pattern recognition receptors.

A key role for MYD88 and UNC93b as regulators of GSAT immunogenicity was identified, but without evidence for involvement of the common endosomal nucleic acid sensors typically regulated by UNC93b or associated with the MYD88 adaptor (TLRs 2, 4, 7, and 9). These results indicate that GSAT potently induces TNFalpha in the murine imBM background. Further studies will be required to elucidate whether TLR13, which has been identified in murine cells and recognizes bacterial ribosomal and viral RNA, is involved, or whether there exist intracellular sensors of i-ncRNA associated with MYD88 (Li et al., “Sequence Specific Detection of Bacterial 23S Ribosomal RNA by TLR13,” eLife 1:e00102 (2012); Oldenburg et al., “TLR13 Recognizes Bacterial 23S rRNA Devoid of Erythromycin Resistance-Forming Modification,” Science 337:1111-1115 (2012); Shi et al., “A Novel Toll-like Receptor That Recognizes Vesicular Stomatitis Virus,” J. Biol. Chem. 286:4517-4524 (2011), which are hereby incorporated by reference in their entirety), as there are for dsDNA (DHX-9 or -36) (Kim et al., “Aspartate-Glutamate-Alanine-Histidine Box Motif (DEAH)/RNA Helicase A Helicases Sense Microbial DNA in Human Plasmacytoid Dendritic Cells,” Proc. Natl. Acad. Sci. 107:15181-15186 (2010), which is hereby incorporated by reference in its entirety). Interestingly, an alignment of GSAT contains a subsequence conserved in immunogenic RNA isolated from bacterial ribosomal RNA, which specifically activates murine TLR13 (Oldenburg et al., “TLR13 Recognizes Bacterial 23S rRNA Devoid of Erythromycin Resistance-Forming Modification,” Science 337:1111-1115 (2012), which is hereby incorporated by reference in its entirety).

Activation of innate immune signaling can contribute either to carcinogenesis or to antitumoral immunity. Toll-like receptor signaling and MYD88 have been associated with tumor development (Wang et al., “Toll-like Receptors and Cancer: MYD88 Mutation and Inflammation,” Frontiers in Immunology 5(367):1-10 (2014), which is hereby incorporated by reference in its entirety). Given that HSATII and GSAT expression has been found to be pervasive in many tumor types, and that these ncRNAs induce responses that differ by species or cell type, the role of i-ncRNA in tumorigenesis is likely to depend on the particular RNA expressed and on other properties of the tumor microenvironment. For instance, HSATII activates macrophages and monocytes in this study, suggesting it may be a mechanism for attracting and retaining tumor-associated macrophages. These macrophages have consistently been shown to be a poor prognostic factor in cancer, leading to increased tumorigenesis, metastasis, and immunoevasion (Noy et al., “Tumor-Associated Macrophages: From Mechanisms to Therapy,” Immunity 41:49-61 (2014), which is hereby incorporated by reference in its entirety). Under this hypothesis, HSATII would be used by the tumor to keep macrophages in the tumor microenvironment while driving out T cells. Interestingly, the viral-like behavior of HSATII transcripts is seen not only in the immune response to these elements but also in their ability to be reverse transcribed in cancer cells, akin to retroviruses (Bersani et al., “Pericentromeric Satellite Repeat Expansions Through RNA-Derived DNA Intermediates in Cancer,” Proc. Natl. Acad. Sci. 112(49):15148-15153 (2015), which is hereby incorporated by reference in its entirety).

i-ncRNA, not subject to the same forces as ncRNA transcribed in the steady state, may retain or evolve features that mimic foreign RNA, as seen by comparing HSATII and GSAT with typical human ncRNA and foreign genomic material in FIG. 13 (Greenbaum et al., “Quantitative Theory of Entropic Forces Acting on Constrained Nucleotide Sequences Applied to Viruses,” Proc. Natl. Acad. Sci. 111:5054-5059 (2014) and Kent et al., “The Human Genome Browser at UCSC,” Genome Res. 12:996-1006 (2002), which are hereby incorporated by reference in their entirety). Indeed, in terms of motif usage patterns, HSATII and GSAT cluster more closely with bacterial RNA than with human RNA. Such RNA may have been selected to identify and eliminate cells whose epigenetic state is disrupted. In essence, self “junk” RNA may have been maintained, or may have evolved, to mimic non-self pathogen-associated patterns and thereby create a danger signal. Such a mechanism would be a new aspect of “genetic mimicry,” in which the host is for all practical purposes mimicking pathogen-associated nucleic acid patterns. HSATII and GSAT emanate from the pericentromeres, which harbor new repetitive elements with no known function (Maumus et al., “Ancestral Repeats Have Shaped Epigenomic and Genome Composition for Millions of Years in Arabidopsis thaliana,” Nature Comm. 5:4014 (2014), which is hereby incorporated by reference in its entirety). This region, unlike centromeres or regions critical for structure or regulation, may dynamically produce unusual repetitive elements that can adapt to a particular organism's pattern recognition receptors. These studies indicate that under the “extraordinary” circumstances in which these repetitive elements are expressed, they could play a critical role in regulating immune responses against cancer.

Example 5—Entropy of Nucleotide Sequences for a Given Motif

Consider an RNA sequence of length L, hereafter called S₀, and a motif m (a series of contiguous nucleotides, e.g., CpG). L is the total sequence length, counting the nucleotides A, C, G, and U along with any nucleotide bases that are not clearly defined. The objective is to define a probabilistic model over the set of the 4^L sequences S = (s₁ s₂ … s_i … s_L) such that the average value of the number N_m(S) of occurrences of the motif m in S coincides with the number N_m(S₀) of occurrences of that motif in S₀. To do so, a random-nucleotide model is considered, in which nucleotides are independently distributed according to the frequencies f⁰(s), where s = A, C, G, U, found in S₀ (or s = A, C, G, T when S₀ is represented as an untranscribed DNA sequence). The frequency of a nucleotide is calculated by counting the number of times that nucleotide occurs and dividing that number by the total length L of the sequence, which also counts any ambiguously defined bases that cannot be assigned as A, C, G, U, or T. For example, f⁰(A), the frequency of A nucleotides, is the number of occurrences of A in S₀ divided by L, the length of S₀, even when ambiguous bases are included.
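A minimal Python sketch of these two quantities follows. It assumes that T is read as U and that motif occurrences are counted with overlaps (which makes no difference for CpG or UpA, since neither dinucleotide can overlap itself). The helper names and the example sequence are hypothetical.

```python
from collections import Counter

RNA_BASES = "ACGU"


def to_rna(seq: str) -> str:
    """Uppercase the sequence and map T to U so DNA input is handled like RNA."""
    return seq.upper().replace("T", "U")


def nucleotide_frequencies(seq: str) -> dict:
    """f0(s) = (occurrences of s) / L, where L also counts ambiguous bases such
    as N, so the four frequencies may sum to less than 1."""
    seq = to_rna(seq)
    counts = Counter(seq)
    return {base: counts[base] / len(seq) for base in RNA_BASES}


def count_motif(seq: str, motif: str) -> int:
    """N_m(S): number of occurrences of motif in seq, overlaps included."""
    seq, motif = to_rna(seq), to_rna(motif)
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == motif)


s0 = "ACGCGTNA"  # hypothetical DNA input with one ambiguous base (N)
print(nucleotide_frequencies(s0))
print(count_motif(s0, "CG"))
```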

The probability of a sequence S in this least-constrained, maximum entropy model is

$P(S \mid x, m) = \frac{1}{Z_m(x)} \prod_{i=1}^{L} f^{0}(s_i)\, \exp\big(x\, N_m(S)\big) \qquad \text{[EQUATION 1]}$

where

$Z_m(x) = \sum_{S} \prod_{i=1}^{L} f^{0}(s_i)\, \exp\big(x\, N_m(S)\big) \qquad \text{[EQUATION 2]}$

ensures the probability is correctly normalized. Parameter x, referred to as a selective force (or simply force) on the motif m, introduces a statistical bias into P (Greenbaum et al., “Quantitative Theory of Entropic Forces Acting on Constrained Nucleotide Sequences Applied to Viruses,” Proc. Natl. Acad. Sci. 111:5054-5059 (2014), which is hereby incorporated by reference in its entirety). The force quantifies the strength of the statistical bias, which may be due to selection on the motif. In the absence of bias (x = 0), the probability of S simplifies to the product of its nucleotide frequencies, and the number of motifs is what one would expect in a typical sequence with nucleotide frequencies f⁰(s). Positive values of x push the distribution toward sequences with N_m(S) larger than expected, while negative values of x favor sequences with a smaller N_m(S) than expected.
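To make EQUATIONS 1 and 2 concrete, the sketch below evaluates Z_m(x) and P(S | x, m) by brute-force enumeration of all 4^L sequences. This is only feasible for very short L and is meant to illustrate the definitions, not as a practical method; the function names and the uniform frequencies in the usage example are assumptions.

```python
import itertools
import math


def count_motif(seq, motif):
    """N_m(S): occurrences of motif in seq, overlaps included."""
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == motif)


def boltzmann_weight(seq, f0, motif, x):
    """Unnormalized weight prod_i f0(s_i) * exp(x * N_m(S)) from EQUATIONS 1-2."""
    return math.prod(f0[c] for c in seq) * math.exp(x * count_motif(seq, motif))


def partition_function(f0, length, motif, x):
    """Z_m(x) (EQUATION 2) by enumerating all 4**L sequences (tiny L only)."""
    return sum(
        boltzmann_weight("".join(s), f0, motif, x)
        for s in itertools.product("ACGU", repeat=length)
    )


def sequence_probability(seq, f0, motif, x):
    """P(S | x, m) (EQUATION 1)."""
    z = partition_function(f0, len(seq), motif, x)
    return boltzmann_weight(seq, f0, motif, x) / z


uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "U": 0.25}
for x in (-1.0, 0.0, 1.0):
    print(x,
          sequence_probability("CGCG", uniform, "CG", x),
          sequence_probability("AUAU", uniform, "CG", x))
```

With x = 0 the two sequences are equally likely under uniform frequencies; a positive force favors the CpG-rich sequence and a negative force favors the CpG-free one.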

The value of the force, x(S₀), is computed by maximizing the probability P(S₀|x, m) of the sequence S₀ over x. This is equivalent to finding the value of x such that the average number of motifs

$N_m^{av}(x) = \sum_{S} P(S \mid x, m)\, N_m(S) = \frac{\partial \log Z_m}{\partial x}(x) \qquad \text{[EQUATION 3]}$

equals N_m(S₀). By scanning the sequences S₀ in the GENCODE database, the forces x(S₀) shown in FIGS. 5A-D are obtained.
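Because ∂/∂x log P(S₀|x, m) = N_m(S₀) − N_m^av(x), and N_m^av(x) increases with x (its derivative is the variance of N_m under the model), the force can be found by a one-dimensional root search. The sketch below uses bisection and, for transparency, a brute-force grouping of all 4^L sequences by motif count, so it only handles short toy sequences. The helper names, the search bracket, and the example sequences are assumptions; sequences whose motif count is already at its minimum or maximum are rejected because the force diverges there.

```python
import itertools
import math
from collections import defaultdict


def count_motif(seq, motif):
    """N_m(S): occurrences of motif in seq, overlaps included."""
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == motif)


def weights_by_count(seq, motif):
    """W[n] = sum of prod_i f0(s_i) over length-L sequences with N_m(S) = n,
    so that Z_m(x) = sum_n W[n] * exp(x * n). Brute force: short seq only."""
    f0 = {b: seq.count(b) / len(seq) for b in "ACGU"}
    W = defaultdict(float)
    for tup in itertools.product("ACGU", repeat=len(seq)):
        w = math.prod(f0[c] for c in tup)
        if w > 0:  # keep only motif counts that are actually attainable
            W[count_motif("".join(tup), motif)] += w
    return dict(W)


def mean_count(W, x):
    """N_m^av(x) = d log Z_m / dx, written as a weighted average (EQUATION 3)."""
    z = sum(w * math.exp(x * n) for n, w in W.items())
    return sum(n * w * math.exp(x * n) for n, w in W.items()) / z


def fit_force(seq, motif, lo=-30.0, hi=30.0, tol=1e-10):
    """Solve N_m^av(x) = N_m(S0) by bisection; N_m^av increases with x."""
    W = weights_by_count(seq, motif)
    target = count_motif(seq, motif)
    if not min(W) < target < max(W):
        raise ValueError("motif count is extremal; the force diverges")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_count(W, mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


print(fit_force("ACGUUA", "CG"))  # one CpG vs. ~0.14 expected: positive force
print(fit_force("CGGGCC", "CG"))  # one CpG vs. 1.25 expected: negative force
```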

The logarithm of the number of sequences having N_m(S) repetitions of m is bounded from above by the entropy of the random-nucleotide model; equality is reached only in the absence of bias (x = 0). The difference between those entropies is the entropy cost corresponding to the constraint on the average number of occurrences of m, and is denoted by σ_m. It is the Legendre transform of log Z_m(x) (see EQUATIONS 2 and 3, supra):

$\sigma_m = x(S_0)\, N_m(S_0) - \log Z_m\big(x(S_0)\big) \qquad \text{[EQUATION 4]}$

Efficient computational techniques allow the sum over the 4^L sequences in EQUATION 2 to be calculated in a time that grows only linearly with L.
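The passage does not name the technique. One standard way to reach linear time for a dinucleotide motif, assumed here, is a transfer-matrix (forward) recursion over the last nucleotide, rescaled at each step to avoid overflow. The sketch also recovers N_m^av(x) from EQUATION 3 by a finite difference; the function names and step size are hypothetical.

```python
import math

ALPHABET = "ACGU"


def log_partition_dinucleotide(f0, length, motif, x):
    """log Z_m(x) (EQUATION 2) for a dinucleotide motif m = ab via a
    transfer-matrix (forward) recursion in O(16 * L) operations.

    v[t] is the rescaled total weight of all prefixes ending in base t. Each
    step appends a base t, multiplies by f0(t), and multiplies by exp(x) when
    the new adjacent pair (s, t) equals the motif."""
    a, b = motif
    boost = math.exp(x)
    v = {t: f0[t] for t in ALPHABET}
    log_z = 0.0
    for _ in range(length - 1):
        new_v = {
            t: f0[t] * sum(v[s] * (boost if (s == a and t == b) else 1.0)
                           for s in ALPHABET)
            for t in ALPHABET
        }
        norm = sum(new_v.values())  # rescale so long sequences do not overflow
        log_z += math.log(norm)
        v = {t: w / norm for t, w in new_v.items()}
    return log_z + math.log(sum(v.values()))


def mean_motif_count(f0, length, motif, x, h=1e-5):
    """N_m^av(x) = d log Z_m / dx (EQUATION 3), by a central finite difference."""
    return (log_partition_dinucleotide(f0, length, motif, x + h)
            - log_partition_dinucleotide(f0, length, motif, x - h)) / (2 * h)


uniform = {base: 0.25 for base in ALPHABET}
print(log_partition_dinucleotide(uniform, 10_000, "CG", 0.5))
print(mean_motif_count(uniform, 1_000, "CG", 0.0))
```

For uniform frequencies at x = 0, N_m^av equals (L − 1)/16, the expected number of CpG pairs among L − 1 adjacent positions, which gives a simple check of the recursion.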

The aim is to find anomalous motif usage in a sequence, where the number of motif occurrences differs from what is expected by chance in the random-nucleotide model, that is, motif usage associated with a significant nonzero force. The likelihood of observing the natural sequence S₀ with a given motif count is expressed as

$P(S_0 \mid m) = \max_{x} \big[ P(S_0 \mid x, m) \big] = e^{\sigma_m} \prod_{i=1}^{L} f^{0}(s_i^{0}) \qquad \text{[EQUATION 5]}$

This likelihood is therefore directly related to the entropic cost: the larger the cost, the more likely the motif bias is to be statistically significant.
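The relation between EQUATIONS 4 and 5 can be checked numerically. The sketch below computes σ_m as the Legendre transform max_x [x N_m(S₀) − log Z_m(x)] using a ternary search, which is valid because the objective is concave in x (its second derivative is minus the variance of N_m). It then confirms that the maximized log-likelihood equals σ_m + Σ_i log f⁰(s_i⁰). It relies on a brute-force partition function, so it is limited to short toy sequences; the names, the search bracket, and the example sequence are assumptions.

```python
import itertools
import math
from collections import defaultdict


def count_motif(seq, motif):
    """N_m(S): occurrences of motif in seq, overlaps included."""
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == motif)


def make_log_z(seq, motif):
    """Return f0 and a function x -> log Z_m(x) for the random-nucleotide
    model of seq, by enumeration grouped by motif count (short seq only)."""
    f0 = {b: seq.count(b) / len(seq) for b in "ACGU"}
    W = defaultdict(float)
    for tup in itertools.product("ACGU", repeat=len(seq)):
        w = math.prod(f0[c] for c in tup)
        if w > 0:
            W[count_motif("".join(tup), motif)] += w

    def log_z(x):
        return math.log(sum(w * math.exp(x * n) for n, w in W.items()))

    return f0, log_z


def entropy_cost(seq, motif, lo=-30.0, hi=30.0, iters=200):
    """sigma_m = max_x [x N_m(S0) - log Z_m(x)] (EQUATION 4). The maximizer x*
    is also the fitted force x(S0)."""
    f0, log_z = make_log_z(seq, motif)
    n0 = count_motif(seq, motif)

    def objective(x):
        return x * n0 - log_z(x)

    for _ in range(iters):  # ternary search on a concave function
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if objective(m1) < objective(m2):
            lo = m1
        else:
            hi = m2
    x_star = 0.5 * (lo + hi)
    return x_star, objective(x_star), f0, log_z


seq = "ACGUUA"  # hypothetical toy sequence
x_star, sigma, f0, log_z = entropy_cost(seq, "CG")
baseline = sum(math.log(f0[c]) for c in seq)  # sum_i log f0(s_i^0)
log_p_max = baseline + x_star * count_motif(seq, "CG") - log_z(x_star)
print(f"x* = {x_star:.4f}, sigma_m = {sigma:.4f}")
print(f"log P(S0|m) = {log_p_max:.6f}; sigma_m + sum log f0 = {sigma + baseline:.6f}")
```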

Example 6—Outlier Detection

GSAT and HSATII were demonstrated to be immunogenic and were outliers relative to the distribution of strengths of statistical bias on CpG and UpA dinucleotides. Since GSAT was less of an outlier than HSATII, GSAT is used to define a minimal threshold of the strength of statistical bias for an immunogenic non-coding RNA. In the mouse GENCODE dataset, version 2 (which is hereby incorporated by reference in its entirety), of long non-coding RNA transcripts, the mean value of the strength of statistical bias on CpG dinucleotides is −1.3678 with a standard deviation of 0.5788, and the mean value of the strength of statistical bias on UpA dinucleotides is −0.5691 with a standard deviation of 0.2455. In the human GENCODE dataset, version 19 (which is hereby incorporated by reference in its entirety), of long non-coding RNA transcripts, the mean value of the strength of statistical bias on CpG dinucleotides is −1.4341 with a standard deviation of 0.6505, and the mean value of the strength of statistical bias on UpA dinucleotides is −0.6152 with a standard deviation of 0.2834. The strength of statistical bias on GSAT is 0 for CpG dinucleotides and −0.8566 for UpA dinucleotides. This is 2.3629 standard deviations from the mean of the mouse GENCODE distribution of strengths of statistical bias on CpG dinucleotides and 0.8831 standard deviations from the mean for UpA dinucleotides. Because the strength of statistical bias on UpA dinucleotides is not significant for GSAT, it was not needed to define GSAT as an outlier.

The CpG strength of statistical bias on GSAT is 2.3629 standard deviations from the mean of the distribution of CpG strengths of statistical bias for the mouse GENCODE dataset and 2.2046 standard deviations from the mean for the human GENCODE dataset. Therefore, an outlier in the human dataset was defined as a sequence whose strength of statistical bias on CpG dinucleotides has a Z-score (the strength of statistical bias on CpG minus the mean strength of statistical bias, divided by the standard deviation) greater than 2.2046, and an outlier in the mouse dataset as one having a Z-score greater than 2.3629. This ensures both that the sequence is an outlier and that CpG is overrepresented relative to the GENCODE distribution.
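The outlier rule can be expressed as a short classifier. The sketch below hard-codes the GENCODE means, standard deviations, and GSAT-derived Z-score thresholds reported above and tests a few CpG forces taken from Tables 2 and 3 (the −1.40 entry is a hypothetical typical lncRNA). Recomputing GSAT's Z-score from the rounded mean and standard deviation can differ from the reported threshold in the last digit, so the reported thresholds are used directly; the class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CpGNull:
    """Gaussian summary of CpG forces over GENCODE lncRNAs, plus the outlier
    threshold expressed as a Z-score (GSAT's Z-score in Example 6)."""
    mean: float
    sd: float
    z_threshold: float

    def z_score(self, force: float) -> float:
        return (force - self.mean) / self.sd

    def is_outlier(self, force: float) -> bool:
        # Strictly greater than the threshold, as in the definition above.
        return self.z_score(force) > self.z_threshold


HUMAN = CpGNull(mean=-1.4341, sd=0.6505, z_threshold=2.2046)  # GENCODE v19
MOUSE = CpGNull(mean=-1.3678, sd=0.5788, z_threshold=2.3629)  # mouse GENCODE v2

examples = [
    ("HSATII (Table 2)", HUMAN, 1.0360),
    ("MER58A (Table 2)", HUMAN, 0.2006),
    ("typical lncRNA (hypothetical)", HUMAN, -1.40),
    ("Zaphod3 (Table 3)", MOUSE, 0.0077),
]
for label, null, force in examples:
    print(f"{label}: z = {null.z_score(force):.4f}, outlier = {null.is_outlier(force)}")
```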

Mouse repetitive elements meeting this threshold from mouse repeat sequences from the Repbase database are found in Table 3, and their corresponding nucleotide sequences are displayed in FIGS. 14A-S. For calculated values contained herein and throughout the present application, four significant digits are presented.

TABLE 3

Outlier Sequences from the Mouse Repeat Dataset

Showing Anomalous CpG Motif Usage

Strength of

Statistical

Repeat Name
Repeat Class
Conservation
Bias on CpG

(CCCGAA)n
Simple Repeat
Eukaryota
1.0173

(CG)n
Simple Repeat
Eukaryota
7.4253

(CGAA)n
Simple Repeat
Eukaryota
2.2781

(CGGA)n
Simple Repeat
Eukaryota
1.3857

(GCC)n
Simple Repeat
Eukaryota
1.3414

(GCCC)n
Simple Repeat
Eukaryota
0.6942

(GCCCC)n
Simple Repeat
Eukaryota
0.3504

(GCCCCC)n
Simple Repeat
Eukaryota
0.2198

(GCGCA)n
Simple Repeat
Eukaryota
0.4899

Charlie25
hAT
Mammalia
0.0738

Charlie26a
hAT
Mammalia
0.0000

Charlie27
hAT
Eutheria
0.0860

Eulor1
Transposable
Amniota
0.8481

Element

Eulor10
Transposable
Amniota
0.6064

Element

Eulor11
Transposable
Amniota
0.3561

Element

Eulor12
Transposable
Amniota
0.5295

Element

Eulor12_CM
Transposable
Amniota
0.2269

Element

Eulor2B
Transposable
Amniota
0.2852

Element

Eulor2C
Transposable
Amniota
0.7676

Element

Eulor4
Transposable
Tetrapoda
0.6067

Element

Eulor5A
Transposable
Tetrapoda
0.0000

Element

Eulor5B
Transposable
Tetrapoda
0.8474

Element

Eulor6A
Transposable
Tetrapoda
0.7466

Element

Eulor6C
Transposable
Tetrapoda
0.3571

Element

Eulor6D
Transposable
Tetrapoda
0.2866

Element

Eulor6E
Transposable
Tetrapoda
0.1268

Element

Eulor8
Transposable
Amniota
0.3416

Element

Eulor9A
Transposable
Amniota
0.3465

Element

Eulor9B
Transposable
Amniota
0.0000

Element

Eulor9C
Transposable
Amniota
0.2751

Element

GSAT_MM
SAT
Mus musculus
0.0000

IAPEY2_LTR
ERV2
Mus musculus
0.0783

IAPEY_LTR
ERV2
Mus
0.1998

Kanga11a
Mariner/Tc1
Mammalia
0.1891

LSU-rRNA_Cel
rRNA
Metazoa
0.0186

LSU-rRNA_Hsa
rRNA
Metazoa
0.0330

MamRep1894
hAT
Mammalia
0.4662

MER104
DNA transposon
Eutheria
0.1428

MER104C
DNA transposon
Eutheria
0.0370

MER121
hAT
Mammalia
0.0000

MER123
DNA transposon
Amniota
1.1039

MER125
DNA transposon
Amniota
0.0000

MER127
Mariner/Tc1
Amniota
0.2984

MER129
SINE
Amniota
0.2444

MER130
Transposable
Amniota
0.0000

Element

MER131
SINE
Amniota
0.6223

MER133A
Transposable
Amniota
0.4020

Element

MER133B
Transposable
Amniota
0.0000

Element

MER134
Transposable
Amniota
0.2786

Element

MER2
Mariner/Tc1
Eutheria
0.1577

MER44D
Mariner/Tc1
Eutheria
0.3211

MER47B
Mariner/Tc1
Eutheria
0.4518

MER47C
Mariner/Tc1
Eutheria
0.7929

MER58A
hAT
Eutheria
0.2006

MER58B
hAT
Eutheria
0.3657

MER58D
hAT
Eutheria
0.0802

MER5C1
hAT
Eutheria
0.4582

MER6
Mariner/Tc1
Eutheria
0.1783

MER6C
Mariner/Tc1
Eutheria
0.5667

MER97d
hAT
Eutheria
0.2939

MERX
Mariner/Tc1
Eutheria
0.2207

RICKSHA_0
MuDR
Eutheria
0.0000

Ricksha_a
MuDR
Eutheria
0.2607

RMER30
hAT
Muridae
0.1104

SSU-rRNA_Cel
rRNA
Metazoa
0.0830

SSU-rRNA_Hsa
rRNA
Metazoa
0.0464

Tigger12A
Mariner/Tc1
Mammalia
0.2170

Tigger2b
Mariner/Tc1
Rodentia
0.4588

TIGGER5A
Mariner/Tc1
Eutheria
0.4212

TIGGER5_B
Mariner/Tc1
Eutheria
0.1648

Tigger9b
Mariner/Tc1
Eutheria
0.1869

tRNA-Arg-CGA
tRNA
Vertebrata
0.0000

tRNA-Arg-CGG
tRNA
Vertebrata
0.2001

tRNA-Asp-
tRNA
Vertebrata
0.1489

GAY

tRNA-His-
tRNA
Vertebrata
0.2007

CAY_—

tRNA-Ile-ATA
tRNA
Vertebrata
0.1118

tRNA-Ile-ATT
tRNA
Vertebrata
0.1970

tRNA-Leu-CTA
tRNA
Vertebrata
0.0000

tRNA-Leu-CTG
tRNA
Vertebrata
0.0000

tRNA-Met_—
tRNA
Vertebrata
0.0000

tRNA-Pro-CCG
tRNA
Vertebrata
0.0000

tRNA-Ser-AGY
tRNA
Vertebrata
0.0000

tRNA-Ser-TCA
tRNA
Vertebrata
0.0000

tRNA-Ser-TCA_—
tRNA
Vertebrata
0.2097

tRNA-Ser-TCY
tRNA
Vertebrata
0.1452

tRNA-Tyr-TAC
tRNA
Vertebrata
0.0000

UCON1
Transposable Element
Amniota
0.0841

UCON15
Transposable Element
Amniota
0.3560

UCON16
Transposable Element
Amniota
0.4436

UCON21
Transposable Element
Amniota
0.9465

UCON26
Transposable Element
Amniota
0.2985

UCON27
Transposable Element
Amniota
0.0400

UCON39
DNA transposon
Mammalia
0.4443

UCON63
Repetitive element
Mammalia
0.0000

UCON9
Transposable Element
Amniota
0.0979

Zaphod3
hAT
Eutheria
0.0077

The lncRNAs from the Mouse ENCODE dataset that meet this threshold are listed in Table 4, and their corresponding nucleotide sequences are displayed in FIGS. 15A-F.

Human repetitive elements from the Repbase human repeat sequences that meet this threshold are listed in Table 5, and their corresponding nucleotide sequences are displayed in FIGS. 16A-Y.

TABLE 5

Outlier Sequences from the Human Repeat Dataset Showing Anomalous CpG Motif Usage

Repeat Name
Repeat Class
Conservation
Force on CpG

(CCCGAA)n
Simple Repeat
Eukaryota
1.0173

(CG)n
Simple Repeat
Eukaryota
7.4253

(CGAA)n
Simple Repeat
Eukaryota
2.2781

(CGGA)n
Simple Repeat
Eukaryota
1.3857

(GCC)n
Simple Repeat
Eukaryota
1.3414

(GCCC)n
Simple Repeat
Eukaryota
0.6942

(GCCCC)n
Simple Repeat
Eukaryota
0.3504

(GCCCCC)n
Simple Repeat
Eukaryota
0.2198

(GCGCA)n
Simple Repeat
Eukaryota
0.4899

Charlie25
hAT
Mammalia
0.0738

Charlie26a
hAT
Mammalia
0.0000

Charlie27
hAT
Eutheria
0.0860

Eulor1
Transposable Element
Amniota
0.8481

Eulor10
Transposable Element
Amniota
0.6064

Eulor11
Transposable Element
Amniota
0.3561

Eulor12
Transposable Element
Amniota
0.5295

Eulor12_CM
Transposable Element
Amniota
0.2269

Eulor2B
Transposable Element
Amniota
0.2852

Eulor2C
Transposable Element
Amniota
0.7676

Eulor4
Transposable Element
Tetrapoda
0.6067

Eulor5A
Transposable Element
Tetrapoda
0.0000

Eulor5B
Transposable Element
Tetrapoda
0.8474

Eulor6A
Transposable Element
Tetrapoda
0.7466

Eulor6C
Transposable Element
Tetrapoda
0.3571

Eulor6D
Transposable Element
Tetrapoda
0.2866

Eulor6E
Transposable Element
Tetrapoda
0.1268

Eulor8
Transposable Element
Amniota
0.3416

Eulor9A
Transposable Element
Amniota
0.3465

Eulor9B
Transposable Element
Amniota
0.0000

Eulor9C
Transposable Element
Amniota
0.2751

GGAAT
SAT
Homo sapiens
0.0000

GOLEM_A
Mariner/Tc1
Homo sapiens
0.1066

HSAT6
SAT
Homo sapiens
0.6156

HSATII
SAT
Primates
1.0360

HSMAR1
Mariner/Tc1
Homo sapiens
0.2397

Kanga11a
Mariner/Tc1
Mammalia
0.1891

LSU-rRNA_Cel
rRNA
Metazoa
0.0186

LSU-rRNA_Hsa
rRNA
Metazoa
0.0330

MacERV4_LTR1b
ERV2
Cercopithecidae
0.0000

MacERV4_LTR2
ERV2
Cercopithecidae
0.0455

MacERV5b_LTR
ERV1
Cercopithecidae
0.0000

MacERV6_LTR2a
ERV3
Cercopithecidae
0.0000

MacERV6_LTR2c
ERV3
Cercopithecidae
0.0307

MacERV6_LTR3
ERV3
Cercopithecidae
0.2404

MacERV6_LTR4
ERV3
Cercopithecidae
0.0373

MacERV6_LTR5
ERV3
Cercopithecidae
0.0305

MacERVK1_LTR1b
ERV2
Cercopithecidae
0.0000

MacERVK1_LTR1e
ERV2
Cercopithecidae
0.0000

MamRep1894
hAT
Mammalia
0.4662

MER104
DNA transposon
Eutheria
0.1428

MER104C
DNA transposon
Eutheria
0.0370

MER119
hAT
Homo sapiens
0.2794

MER121
hAT
Mammalia
0.0000

MER123
DNA transposon
Amniota
1.1039

MER125
DNA transposon
Amniota
0.0000

MER127
Mariner/Tc1
Amniota
0.2984

MER129
SINE
Amniota
0.2444

MER130
Transposable Element
Amniota
0.0000

MER131
SINE
Amniota
0.6223

MER133A
Transposable Element
Amniota
0.4020

MER133B
Transposable Element
Amniota
0.0000

MER134
Transposable Element
Amniota
0.2786

MER2
Mariner/Tc1
Eutheria
0.1577

MER44A
Mariner/Tc1
Homo sapiens
0.1388

MER44B
Mariner/Tc1
Homo sapiens
0.3536

MER44C
Mariner/Tc1
Homo sapiens
0.3439

MER44D
Mariner/Tc1
Eutheria
0.3211

MER45B
DNA transposon
Homo sapiens
0.1120

MER47B
Mariner/Tc1
Eutheria
0.4518

MER47C
Mariner/Tc1
Eutheria
0.7929

MER57A1
ERV1
Homo sapiens
0.0000

MER57B2
ERV1
Homo sapiens
0.2403

MER58A
hAT
Eutheria
0.2006

MER58B
hAT
Eutheria
0.3657

MER58D
hAT
Eutheria
0.0802

MER5C1
hAT
Eutheria
0.4582

MER6
Mariner/Tc1
Eutheria
0.1783

MER63D
hAT
Homo sapiens
0.0665

MER6A
Mariner/Tc1
Primates
0.0913

MER6B
Mariner/Tc1
Homo sapiens
0.9230

MER6C
Mariner/Tc1
Eutheria
0.5667

MER75
DNA transposon
Homo sapiens
0.4134

MER75A
piggyBac
Primates
0.0000

MER8
Mariner/Tc1
Homo sapiens
0.2669

MER97A
hAT
Homo sapiens
0.0315

MER97d
hAT
Eutheria
0.2939

MERX
Mariner/Tc1
Eutheria
0.2207

npiggy1_Mm
piggyBac
Microcebus murinus
0.3131

npiggy2_Mm
piggyBac
Microcebus murinus
0.3725

RICKSHA_0
MuDR
Eutheria
0.0000

Ricksha_a
MuDR
Eutheria
0.2607

SSU-rRNA_Cel
rRNA
Metazoa
0.0830

SSU-rRNA_Hsa
rRNA
Metazoa
0.0464

SUBTEL2_sat
SAT
Primates
0.2960

SUBTEL_sat
Satellite
Primates
0.3527

Tigger12A
Mariner/Tc1
Mammalia
0.2170

Tigger2b_Pri
Mariner/Tc1
Primates
0.3548

Tigger3c
Mariner/Tc1
Primates
0.1192

Tigger3d
Mariner/Tc1
Primates
0.4374

Tigger4a
Mariner/Tc1
Primates
0.3815

TIGGER5A
Mariner/Tc1
Eutheria
0.4212

TIGGER5_B
Mariner/Tc1
Eutheria
0.1648

Tigger9b
Mariner/Tc1
Eutheria
0.1869

tRNA-Arg-CGA
tRNA
Vertebrata
0.0000

tRNA-Arg-CGG
tRNA
Vertebrata
0.2001

tRNA-Asp-GAY
tRNA
Vertebrata
0.1489

tRNA-His-CAY_—
tRNA
Vertebrata
0.2007

tRNA-Ile-ATA
tRNA
Vertebrata
0.1118

tRNA-Ile-ATT
tRNA
Vertebrata
0.1970

tRNA-Leu-CTA
tRNA
Vertebrata
0.0000

tRNA-Leu-CTG
tRNA
Vertebrata
0.0000

tRNA-Met_—
tRNA
Vertebrata
0.0000

tRNA-Pro-CCG
tRNA
Vertebrata
0.0000

tRNA-Ser-AGY
tRNA
Vertebrata
0.0000

tRNA-Ser-TCA
tRNA
Vertebrata
0.0000

tRNA-Ser-TCA_—
tRNA
Vertebrata
0.2097

tRNA-Ser-TCY
tRNA
Vertebrata
0.1452

tRNA-Tyr-TAC
tRNA
Vertebrata
0.0000

TRNA_ALA
tRNA
Homo sapiens
0.0000

TRNA_ASN
tRNA
Homo sapiens
0.1580

TRNA_GLU
tRNA
Homo sapiens
0.0000

TRNA_VAL
tRNA
Homo sapiens
0.5721

U4B
snRNA
Homo sapiens
0.2960

U6
snRNA
Homo sapiens
0.3083

UCON1
Transposable Element
Amniota
0.0841

UCON15
Transposable Element
Amniota
0.3560

UCON16
Transposable Element
Amniota
0.4436

UCON21
Transposable Element
Amniota
0.9465

UCON26
Transposable Element
Amniota
0.2985

UCON27
Transposable Element
Amniota
0.0400

UCON39
DNA transposon
Mammalia
0.4443

UCON63
Repetitive element
Mammalia
0.0000

UCON9
Transposable Element
Amniota
0.0979

Zaphod3
hAT
Eutheria
0.0077

ZOMBI_A
Mariner/Tc1
Homo sapiens
0.1808

Human ENCODE elements from the Human ENCODE dataset that meet this threshold are listed in Table 6, and their corresponding nucleotide sequences are displayed in FIGS. 17A-L.

Example 7—Design of Experimental Controls

For HSATII and GSAT, negative controls were designed in two ways, and both sets of negative controls were compared with HSATII and GSAT in all experiments. First, the full RNA sequences of both satellites were randomly permuted until scrambled sequences were obtained that fell within one half of a standard deviation of the mean strength of statistical bias against CpG and UpA dinucleotides for humans and mice, respectively. These sequences are denoted HSATII-sc and GSAT-sc. In other words, these sequences have the same length and nucleotide composition as HSATII and GSAT but fall within the inner ellipse in FIG. 5A (HSATII-sc) and FIG. 5B (GSAT-sc). In addition, it was verified in both cases that the minimum free energy of RNA folding was not lowered by the scrambling process, so that the permutations did not appear to introduce additional RNA secondary structure that could stimulate innate immunity via TLR3. The free energy was calculated using the MATLAB RNAfold routine (Matthews et al., “Expanded Sequence Dependence of Thermodynamic Parameters Improves Prediction of RNA Secondary Structure,” J. Mol. Biol. 288:911-940 (1999) and Wuchty et al., “Complete Suboptimal Folding of RNA and the Stability of Secondary Structures,” Biopolymers 49:145-165 (1999), which are hereby incorporated by reference in their entirety). Second, endogenous negative controls were selected by searching Repbase for repetitive elements that fell within one standard deviation of the mean strength of statistical bias against CpG and UpA in humans and mice and that were also closest in length to HSATII and GSAT. These were UCON38 for HSATII and RMER16A3 for GSAT.
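The permute-and-filter procedure for the scrambled controls can be sketched as rejection sampling. This is a minimal Python sketch under stated assumptions: the statistical "force" on CpG and UpA used in the document is not reproduced here, so a hypothetical `bias_score` based on plain observed/expected (O/E) dinucleotide ratios stands in for it; the reference mean and standard deviation are placeholders to be supplied by the user; "within one half of a standard deviation" is assumed to mean an ellipse in standardized CpG/UpA coordinates; and the folding-energy check (done with RNAfold in the source) is represented only by an optional, unimplemented `accept` callback.

```python
import random
from collections import Counter
from typing import Callable, Optional, Sequence, Tuple


def dinucleotide_oe(seq: str, first: str, second: str) -> float:
    """Observed/expected ratio of the dinucleotide `first`+`second` in an RNA string.

    O/E = f(XY) / (f(X) * f(Y)), where f(XY) is counted over the N-1 overlapping
    dinucleotides and f(X), f(Y) over the N single nucleotides.
    """
    n = len(seq)
    counts = Counter(seq)
    if n < 2 or counts[first] == 0 or counts[second] == 0:
        return 0.0
    pair = first + second
    observed = sum(1 for i in range(n - 1) if seq[i:i + 2] == pair) / (n - 1)
    expected = (counts[first] / n) * (counts[second] / n)
    return observed / expected


def bias_score(seq: str) -> Tuple[float, float]:
    """Hypothetical stand-in for the (CpG, UpA) bias statistic: plain O/E ratios."""
    return dinucleotide_oe(seq, "C", "G"), dinucleotide_oe(seq, "U", "A")


def within_ellipse(point: Tuple[float, float], mean: Sequence[float],
                   sd: Sequence[float], band: float) -> bool:
    """True if `point` lies within `band` standard deviations of `mean` (elliptical region)."""
    dx = (point[0] - mean[0]) / sd[0]
    dy = (point[1] - mean[1]) / sd[1]
    return dx * dx + dy * dy <= band * band


def scramble_until_typical(seq: str, mean: Sequence[float], sd: Sequence[float],
                           band: float = 0.5,
                           accept: Optional[Callable[[str], bool]] = None,
                           seed: int = 0, max_tries: int = 100_000) -> str:
    """Randomly permute `seq` until its bias score falls inside the target ellipse.

    Shuffling keeps length and nucleotide composition fixed. `accept` is an optional
    extra filter (e.g. a folding-energy check), which is not implemented here.
    """
    rng = random.Random(seed)
    letters = list(seq)
    for _ in range(max_tries):
        rng.shuffle(letters)
        candidate = "".join(letters)
        if not within_ellipse(bias_score(candidate), mean, sd, band):
            continue
        if accept is not None and not accept(candidate):
            continue
        return candidate
    raise RuntimeError("no permutation fell inside the target region")


if __name__ == "__main__":
    source = "ACGU" * 50  # toy sequence, strongly enriched in CpG and UpA
    scrambled = scramble_until_typical(source, mean=(1.0, 1.0), sd=(1.0, 1.0))
    print(bias_score(source), bias_score(scrambled))
```

Because a permutation cannot change length or base composition, the only property the filter has to enforce is the position of the scrambled sequence relative to the reference distribution.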

Example 8—GSAT RNA Expression Level Detection

GSAT RNA expression levels in normal mouse tissue and mouse tumor tissue samples were measured using a custom TaqMan assay (FIGS. 4A-B). Two tumor models were investigated: a model of testicular teratoma (p53−/− 129/SvSL) and a model of liposarcoma (p53LoxP/LoxP; PtenLoxP/LoxP). In all instances, GSAT levels were higher in tumor samples than in normal samples, although to varying degrees. In the liposarcoma model, GSAT levels did not differ significantly between tumors arising in females and those arising in males. Likewise, GSAT levels did not differ between p53−/− 129/SvSL mice that developed teratomas at a young age (~1 month old) and those that developed them at an older age (~3-4 months old) (Harvey et al., “Genetic Background Alters the Spectrum of Tumors that Develop in p53-Deficient Mice,” The FASEB Journal 7:938-943 (1993) and Muller et al., “A Male Germ Cell Tumor Susceptibility Determining Locus pgct1 Identified on Murine Chromosome 13,” Proc. Natl. Acad. Sci. 97:8421-8426 (2000), which are hereby incorporated by reference in their entirety).

Example 9—i-ncRNA Generation

Sequences encoding murine GSAT and human HSATII were generated by custom gene synthesis (GenScript) and cloned into a pcDNA3 backbone (EcoRI/EcoRV) that carries a T7 promoter on the + strand and an SP6 promoter on the − strand (Invitrogen). Sequences encoding GSAT-sc, HSATII-sc, UCON38, and RMER16A3 were generated as minigenes and subcloned into a pIDT-blue backbone (IDT), with a T7 promoter on the + strand and a T3 promoter on the − strand flanking the sequence of interest. To produce high-quality RNA, plasmids were digested with the restriction enzymes NotI/NdeI (pcDNA3) or ApaLI (pIDT-blue), and the fragment containing the sequence of interest was isolated by gel purification (Qiagen). The sequences of interest, together with the T7 promoter, were then amplified by PCR (AccuPrime Pfx, Invitrogen) using the following primer pairs:

pIDT blue

Forward:

(SEQ ID NO: 320)

GCGCGTAATACGACTCACTATAGGCGA;

Reverse:

(SEQ ID NO: 321)

CGCAARRAACCCTCACTAAAGGGAACA

and

pcDNA3

Forward:

(SEQ ID NO: 322)

GAAATTAATACGACTCAATAGG;

Reverse:

(SEQ ID NO: 323)

TCTAGCATTTAGGTGACACTATAGAATAG.

PCR products were purified by PCR cleanup (Qiagen) and checked by electrophoresis on a 0.8% agarose gel. RNAs were generated by in vitro transcription using the mMESSAGE mMACHINE T7 Ultra kit (Ambion), followed by capping and a short poly(A) tailing reaction. RNAs were then purified by RNA cleanup (Qiagen), quantified using a NanoDrop, and checked by electrophoresis after denaturation at 65° C. for 10 minutes (15% agarose gel).
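A quick sanity check on primer pairs like those above is to confirm that each primer carries the intended phage RNA polymerase promoter. The sketch below is a hypothetical helper, not part of the described protocol: it searches a 5'→3' primer for exact copies of the widely used minimal T7, T3, and SP6 promoter sequences. It does not handle IUPAC ambiguity codes or mismatches, so a primer that returns no hit should be checked against the vector map rather than assumed to be wrong.

```python
from typing import Dict, List

# Minimal promoter sequences (5'->3') for the phage RNA polymerases named in the text.
PROMOTER_CORES: Dict[str, str] = {
    "T7": "TAATACGACTCACTATAG",
    "T3": "AATTAACCCTCACTAAAG",
    "SP6": "ATTTAGGTGACACTATAG",
}


def promoters_in(primer: str) -> List[str]:
    """Return the names of promoter cores found verbatim in a 5'->3' primer sequence."""
    seq = primer.strip().upper().rstrip(";.")  # tolerate trailing punctuation from the text
    return [name for name, core in PROMOTER_CORES.items() if core in seq]


if __name__ == "__main__":
    primers = {
        "pIDT blue forward (SEQ ID NO: 320)": "GCGCGTAATACGACTCACTATAGGCGA",
        "pcDNA3 reverse (SEQ ID NO: 323)": "TCTAGCATTTAGGTGACACTATAGAATAG",
    }
    for label, seq in primers.items():
        print(label, promoters_in(seq) or "no exact promoter core found")
```

Exact substring matching keeps the check transparent; a mismatch-tolerant search would be the natural extension if degenerate primers are involved.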

Example 10—Cell Stimulation

Both moDCs and imBMs were stimulated with i-ncRNA in the same way; their culture is described below. Briefly, cells were plated in flat-bottom 96-well plates at 200,000 cells per well for primary cells (moDCs) and 100,000 cells per well for cell lines (imBMs). i-ncRNAs were transfected using liposomes formed with DOTAP (Roche Life Science) at a ratio of 1 μg DNA per 6 μl DOTAP, diluted in HBS according to the user-guide recommendations. Cells were stimulated with 2 μg/ml purified i-ncRNA or 10 μg/ml total RNA. To stimulate specific pathways, the following agonists were used: for TLR4, 100 ng/ml Ultrapure LPS (Invivogen); for TLR2, 500 ng/ml Pam2CSK4 (Invivogen); for TLR3, 2 μg/ml HMW Poly(I:C) (Invivogen); for TLR7/8, 1 μg/ml CL097 (Invivogen) and 100 ng/ml R848 (Invivogen); for TLR9, 3 μM CpG B ODN 1826; and for STING, 5 μg/ml CDN (Aduro).
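The stimulation conditions above translate into per-well amounts with simple arithmetic. This Python sketch assumes a hypothetical final volume of 200 μl per well (the source does not state one) and applies the stated 1 μg : 6 μl DOTAP ratio to RNA, although the source phrases that ratio per μg of DNA. `per_well_dose` and `master_mix` are illustrative names, and the 10% overage is an arbitrary pipetting allowance.

```python
from dataclasses import dataclass


@dataclass
class WellDose:
    nucleic_acid_ug: float  # micrograms of RNA
    dotap_ul: float         # microliters of DOTAP


def per_well_dose(conc_ug_per_ml: float, well_volume_ul: float,
                  dotap_ul_per_ug: float = 6.0) -> WellDose:
    """RNA mass and DOTAP volume needed to reach `conc_ug_per_ml` in one well."""
    rna_ug = conc_ug_per_ml * well_volume_ul / 1000.0  # μg/ml × ml
    return WellDose(rna_ug, rna_ug * dotap_ul_per_ug)


def master_mix(dose: WellDose, n_wells: int, overage: float = 0.1) -> WellDose:
    """Scale a per-well dose to `n_wells`, adding a fractional pipetting overage."""
    factor = n_wells * (1.0 + overage)
    return WellDose(dose.nucleic_acid_ug * factor, dose.dotap_ul * factor)


if __name__ == "__main__":
    WELL_UL = 200.0  # hypothetical final volume per well (not stated in the source)
    for label, conc in [("purified i-ncRNA", 2.0), ("total RNA", 10.0)]:
        d = per_well_dose(conc, WELL_UL)
        print(f"{label}: {d.nucleic_acid_ug:.2f} μg RNA + {d.dotap_ul:.1f} μl DOTAP per well")
```

Under those assumptions, 2 μg/ml purified i-ncRNA corresponds to 0.4 μg RNA and 2.4 μl DOTAP per well, and 10 μg/ml total RNA to 2.0 μg RNA and 12 μl DOTAP per well.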

Example 11—Cell Culture

Human moDCs: Human monocyte-derived DCs were differentiated as previously described (Frleta et al., “HIV-1 Infection-Induced Apoptotic Microparticles Inhibit Human DCs via CD44,” J. Clinical Invest. 122:4685 (2012), which is hereby incorporated by reference in its entirety). Briefly, PBMCs were prepared from healthy donor buffy coats (New York Blood Center) by centrifugation over Ficoll-Hypaque gradients (BioWhittaker). Monocytes were isolated from PBMCs by adherence and then treated with 100 U/ml GM-CSF (Leukine, Sanofi Oncology) and 300 U/ml IL-4 (R&D) in RPMI supplemented with 5% human AB serum (Gemini Bio Products). The differentiation medium was renewed on days 2 and 4 of culture. Mature moDCs were harvested for use on days 5 to 7. For all experiments, harvested DCs were washed and equilibrated in serum-free X-Vivo 15 medium (Lonza).

Murine imBMs: Macrophages were immortalized by infecting bone marrow progenitors with the J2 retrovirus, which expresses the v-myc and v-raf oncogenes, as previously described (Blasi et al., “Selective Immortalization of Murine Macrophages from Fresh Bone Marrow by a raf/myc Recombinant Murine Retrovirus,” Nature 318:667-670 (1985), which is hereby incorporated by reference in its entirety), and were differentiated in macrophage differentiation medium containing M-CSF. ImBMs were maintained in DMEM (Gibco) supplemented with 10% FCS and PSN. The following imBM lines were provided by several collaborators or obtained from BEI Resources: ICE (Casp1/Casp11), MAVS, IFN-R, IRF3/IRF7, STING and their rescues, Unc93b1 3d/3d, TLR3, TLR4, TLR7, TLR9, TLR2/9, TLR2/4, MyD88, TRIF, TRAM, and TRIF/TRAM.

Example 12—Investigation of Type I Interferon Pathway

To determine whether the type I interferon pathway could be modulated in these models, type I interferon production in response to i-ncRNA stimulation was evaluated using human and murine interferon-stimulated response element (ISRE) reporter cell lines, and transcriptional regulation of a panel of immune genes related to the interferon pathway was monitored. Whereas the effect on the inflammatory response was significant in terms of TNF-α, IL-6, and IL-12 production, the effect on the type I interferon pathway was less prominent.

Example 13—Additional Pathways Investigated

Neither TLR2 nor TLR4 was required, indicating that the observed effect was independent of contamination by bacterial products such as lipoproteins and endotoxins (FIGS. 12A-B). TRIF, TRIF/TRAM, and IRF3/IRF7, which act downstream in TLR3, TLR4, and TLR7 signaling, were also not required (FIG. 13). No role was identified for other candidate sensors of murine GSAT, such as sensors related to cGAS-STING signaling or DEAD-box RNA helicases such as RIG-I and MDA5 (Atianand et al., “Molecular Basis of DNA Recognition in the Immune System,” J. Immunol. 190:1911-1918 (2013); Lee et al., “UNC93B1 Mediates Differential Trafficking of Endosomal TLRs,” eLife 2:e00291 (2013); Burdette et al., “STING and the Innate Immune Response to Nucleic Acids in the Cytosol,” Nature Immunol. 14:19-26 (2013); Vanaja et al., “Mechanisms of Inflammasome Activation: Recent Advances and Novel Insights,” Trends Cell Biol. 25(5):308-15 (2015), which are hereby incorporated by reference in their entirety). Inflammatory responses to GSAT did not depend on the stimulator of interferon genes (STING), which induces type I interferon production when cells are infected with intracellular pathogens. RIG-I (retinoic acid-inducible gene I) is a dsRNA helicase that senses RNA viruses and signals through the mitochondrial antiviral-signaling protein (MAVS) (Zeng et al., “MAVS, cGAS, and Endogenous Retroviruses in T-Independent B Cell Responses,” Science 346:1486-1492 (2014); Broz et al., “Newly Described Pattern Recognition Receptors Team up Against Intracellular Pathogens,” Nature Rev. Immunol. 13:551-565 (2013); Gajewski et al., “Innate and Adaptive Immune Cells in the Tumor Microenvironment,” Nature Immunol. 14:1014-1022 (2013), which are hereby incorporated by reference in their entirety). MAVS-deficient imBMs still responded to GSAT stimulation, ruling out a contribution of RIG-I to i-ncRNA signaling (FIG. 11B).
Finally, a role for inflammasome-related pathways was ruled out using ICE-KO imBMs, which lack Caspase-1 and carry an inactivating mutation in Caspase-11.

Although preferred embodiments have been depicted and described in detail herein, it will be apparent to those skilled in the relevant art that various modifications, additions, substitutions, and the like can be made without departing from the spirit of the invention and these are therefore considered to be within the scope of the invention as defined in the claims which follow.
