The instant application contains a Sequence Listing which has been submitted electronically in ASCII format and is hereby incorporated by reference in its entirety. Said ASCII copy, created on Oct. 25, 2019, is named SequenceListing.txt and is 67.2 MB in size.
This invention relates to methods of modulating RNA interactions with Polycomb Repressive Complex 1 (PRC1) using inhibitory nucleic acids that bind RNAs and inhibit the PRC1-RNA interaction, to modulate gene expression.
Polycomb group proteins play critically important roles in stem cell biology and mammalian development (Simon and Kingston, 2013). Polycomb proteins exist in at least two multi-subunit complexes, Polycomb repressive complex 1 (PRC1) and Polycomb repressive complex 2 (PRC2). While PRC2 trimethylates histone H3 at lysine 27 (H3K27me3), PRC1 ubiquitylates histone H2A on lysine 119 (H2AK119Ub) through its RING-finger catalytic subunit, RING1a/1b, and compacts chromatin. Unlike PRC2, PRC1 has a heterogeneous composition in mammals. The “canonical” form of PRC1 is defined by inclusion of the chromobox homolog protein, CBX, which binds the H3K27me3 mark; canonical PRC1 is therefore partially dependent on PRC2 for chromatin binding. Canonical PRC1 has been associated with chromatin compaction through the CBX subunit (Grau et al., 2011). By contrast, the “noncanonical” form contains RING1 and YY1 Binding Protein (RYBP) and is associated predominantly with ubiquitylation of H2AK119 through the RING1 subunit. Noncanonical PRC1 binds chromatin independently of PRC2 and possibly helps direct PRC2 to chromatin through its H2AK119Ub mark (Aranda et al., 2015). In addition, PRC1 contains several other subunits, including Polycomb group RING finger protein 4 (PCGF4, also known as BMI1) or PCGF2 (MEL18); the canonical (CBX) form also includes the polyhomeotic homolog (PHC), and the noncanonical (RYBP) form includes PCGF1, 2, 4, 5, and 6. Together, PRC1 and PRC2 bind and regulate expression from thousands of genes in mammals (Blackledge et al., 2015).
The studies described herein demonstrate that PRC1 binds both noncoding RNA and coding RNA at identifiable sequence motifs, and that these motifs can be targeted to alter gene expression. The PRC1-interacting transcriptome includes antisense, intergenic, and promoter-associated transcripts, as well as many unannotated RNAs. A large number of transcripts occur within imprinted regions, oncogene and tumor suppressor loci, and stem-cell-related bivalent domains. Further evidence is provided that inhibitory oligonucleotides that specifically bind to these PRC1-interacting RNAs can successfully modulate gene expression in a variety of separate and independent examples, presumably by inhibiting PRC1-associated effects. PRC1 binding sites can be classified into several groups, including (i) 3′ untranslated region (3′ UTR), (ii) promoter-associated, (iii) gene body, (iv) antisense, and (v) intergenic. Inhibiting the PRC1-RNA interactions can lead to either activation or repression, depending on context.
In another aspect, the invention features an inhibitory nucleic acid that specifically binds to, or is complementary to, a region of an RNA that comprises a motif as described herein and that is known to bind to Polycomb repressive complex 1 (PRC1), wherein the sequence of the region is selected from the group consisting of SEQ ID NOs:1 to 5893 (human), 5894 to 17415 (human), and 17416 to 36368 (mouse), which are identical to those set forth in Tables 1-3 of WO 2016/149455. Without being bound by a theory of invention, these inhibitory nucleic acids are able to interfere with the binding and function of PRC1 by preventing recruitment of PRC1 to a specific chromosomal locus. For example, data herein show that a single administration of inhibitory nucleic acids designed to specifically bind an RNA can alter expression of a gene associated with the RNA. Data provided herein also indicate that putative ncRNA binding sites for PRC1 show no conserved primary sequence motif, making it possible to design specific inhibitory nucleic acids that will interfere with PRC1 interaction with a single ncRNA, without generally disrupting PRC1 interactions with other ncRNAs. Further, data provided herein support that RNA can recruit PRC1 in cis, repressing gene expression at or near the specific chromosomal locus from which the RNA was transcribed, thus making it possible to design inhibitory nucleic acids that inhibit the function of PRC1 and increase the expression of a specific target gene.
In some embodiments, the inhibitory nucleic acid is provided for use in a method of modulating expression of a “gene targeted by the PRC1-binding RNA” (e.g., an intersecting or nearby gene, as set forth in Tables 1-3 of WO 2016/149455), meaning a gene whose expression is regulated by the PRC1-binding RNA. The term “PRC1-binding RNA” or “RNA that binds PRC1” is used interchangeably with “PRC1-associated RNA” and “PRC1-interacting RNA”, and refers to an RNA transcript or a region thereof (e.g., a Peak as described below) that binds the PRC1 complex, directly or indirectly. Such binding may be determined by the dCLIP-seq techniques described herein using a component of the PRC1 complex, e.g., PRC1 itself. SEQ ID NOs: 1 to 5893 represent human RNA sequences containing portions that have been experimentally determined to bind PRC1 using the dCLIP-seq method described in WO 2016/149455; SEQ ID NOs: 17416 to 36368 represent murine RNA sequences containing portions that have been experimentally determined to bind PRC1 using the dCLIP-seq method; and SEQ ID NOs: 5894 to 17415 represent human RNA sequences corresponding to the murine RNA sequences.
Such methods of modulating gene expression may be carried out in vitro, ex vivo, or in vivo. Tables 1-3 of WO 2016/149455 list genes targeted by the PRC1-binding RNA; the SEQ ID NOs of the PRC1-associated RNA are set forth in the same row as the gene name. In some embodiments, the inhibitory nucleic acid is provided for use in a method of treating disease, e.g., a disease category as described herein. The treatment may involve modulating expression (either up or down) of a gene targeted by the PRC1-binding RNA, preferably upregulating gene expression. The inhibitory nucleic acid may be formulated as a sterile composition for parenteral administration. It is understood that any reference to uses of compounds throughout the description contemplates use of the compound in preparation of a pharmaceutical composition or medicament for use in the treatment of a disease. Thus, as one nonlimiting example, this aspect of the invention includes use of such inhibitory nucleic acids in the preparation of a medicament for use in the treatment of disease, wherein the treatment involves upregulating expression of a gene targeted by the PRC1-binding RNA.
Diseases, disorders or conditions that may be treated according to the invention include cardiovascular, metabolic, inflammatory, bone, neurological or neurodegenerative, pulmonary, hepatic, kidney, urogenital, cancer, and/or protein deficiency disorders.
In a related aspect, the invention features a process of preparing an inhibitory nucleic acid that modulates gene expression, the process comprising the step of synthesizing an inhibitory nucleic acid of between 5 and 40 bases in length, or about 8 to 40, or about 5 to 50 bases in length, optionally single stranded, that specifically binds, or is complementary to, a motif as described herein within an RNA sequence that has been identified as binding to PRC1, optionally an RNA of any of Tables 1-3 of WO 2016/149455 or any one of SEQ ID NOs: 1 to 5893, or 5894 to 17415, or 17416 to 36368. This aspect of the invention may further comprise the step of identifying the RNA sequence as binding to PRC1, optionally through the dCLIP-seq method described herein.
In a further aspect of the present invention a process of preparing an inhibitory nucleic acid that specifically binds to an RNA that binds to Polycomb repressive complex 1 (PRC1) is provided, the process comprising the step of designing and/or synthesizing an inhibitory nucleic acid of between 5 and 40 bases in length, or about 8 to 40, or about 5 to 50 bases in length, optionally single stranded, that specifically binds to a motif within an RNA sequence that binds to PRC1, optionally an RNA of any of Tables 1-3 of WO 2016/149455 or any one of SEQ ID NOs: 1 to 5893, or 5894 to 17415, or 17416 to 36368.
In some embodiments, prior to synthesizing the inhibitory nucleic acid, the process further comprises identifying an RNA that binds to PRC1.
In some embodiments the RNA has been identified by a method involving identifying an RNA that binds to PRC1.
In some embodiments, the inhibitory nucleic acid is at least 80% complementary to a contiguous sequence of between 5 and 40 bases, or about 8 to 40, or about 5 to 50 bases comprising said motif in said RNA sequence that binds to PRC1. In some embodiments, the sequence of the designed and/or synthesized inhibitory nucleic acid is based on said motif in an RNA sequence that binds to PRC1, or a portion thereof, said portion having a length of from 5 to 40 contiguous base pairs, or about 8 to 40 bases, or about 5 to 50 bases.
In some embodiments the sequence of the designed and/or synthesized inhibitory nucleic acid is based on a nucleic acid sequence that is complementary to said motif in an RNA sequence that binds to PRC1, or is complementary to a portion thereof, said portion having a length of from 5 to 40 contiguous base pairs, or about 8 to 40 base pairs, or about 5 to 50 base pairs.
The designed and/or synthesized inhibitory nucleic acid may be at least 80% complementary to (optionally one of at least 90%, 95%, 96%, 97%, 98%, 99% or 100% complementary to) the portion of the RNA sequence to which it binds or targets, or is intended to bind or target. In some embodiments it may contain 1, 2 or 3 base mismatches compared to the portion of the target RNA sequence or its complement respectively. In some embodiments it may have up to 3 mismatches over 15 bases, or up to 2 mismatches over 10 bases.
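The percent-complementarity and mismatch thresholds above can be checked computationally. The following minimal Python sketch assumes ungapped, antiparallel pairing of equal-length strands and counts only Watson-Crick pairs as complementary (G-U wobble pairs count as mismatches); the function names and the example sequence are hypothetical and are not taken from the disclosure.

```python
# Minimal sketch (assumption): only Watson-Crick pairs (A-U/T, G-C) count as
# complementary; G-U wobble pairs and gaps are treated as mismatches.
# All function names are hypothetical, chosen for illustration.

WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def to_rna(seq: str) -> str:
    """Uppercase a DNA or RNA string and convert T to U."""
    return seq.upper().replace("T", "U")


def count_mismatches(oligo: str, target: str) -> int:
    """Count unpaired positions when `oligo` (5'->3') hybridizes antiparallel
    to an equal-length `target` RNA segment (also written 5'->3')."""
    oligo, target = to_rna(oligo), to_rna(target)
    if len(oligo) != len(target):
        raise ValueError("oligo and target segment must be the same length")
    # Antiparallel: the oligo's 5' base pairs with the target's 3' base.
    return sum((o, t) not in WATSON_CRICK for o, t in zip(oligo, reversed(target)))


def percent_complementarity(oligo: str, target: str) -> float:
    """Percentage of oligo positions that form Watson-Crick pairs."""
    n = len(oligo)
    return 100.0 * (n - count_mismatches(oligo, target)) / n


def meets_criteria(oligo: str, target: str,
                   min_percent: float = 80.0, max_mismatches: int = 3) -> bool:
    """Apply thresholds such as 'at least 80% complementary' and
    'up to 3 mismatches over 15 bases'."""
    return (percent_complementarity(oligo, target) >= min_percent
            and count_mismatches(oligo, target) <= max_mismatches)


target = "GGAUCCAUGCAUGCA"  # hypothetical 15-nt target segment
print(count_mismatches("UGCAUGCAUGGAUCC", target))  # 0: perfect antisense
print(meets_criteria("AAAAUGCAUGGAUCC", target))    # True: 3 mismatches over 15 nt
```

In this sketch a DNA oligonucleotide (T instead of U) is scored the same way as its RNA counterpart, since T and U both pair with A.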
The inhibitory nucleic acid or the portion of the RNA sequence that binds to PRC1 may have a length of 8 to 40, 10 to 50, 5 to 50, or 5 to 40 bases, e.g., 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50 bases. Where the inhibitory nucleic acid is based on an RNA motif sequence that binds to PRC1, on a nucleic acid sequence that is complementary to said RNA motif sequence, or on a portion of either sequence, it may be based on information about that sequence, e.g., sequence information available in written or electronic form, which may include sequence information contained in publicly available scientific publications or sequence databases.
In some embodiments, the isolated single stranded oligonucleotide is of 5 to 40 nucleotides in length and has a region of complementarity that is complementary with at least 5, 6, 7, 8, 9, or 10 contiguous nucleotides of a motif within the PRC1-binding RNA that inhibits expression of the target gene, e.g., as described in WO 2016/149455, wherein the oligonucleotide is complementary to and binds specifically within a motif in a PRC1-binding region of the PRC1-binding RNA and interferes with binding of PRC1 to the PRC1-binding region without inducing degradation of the PRC1-binding RNA (e.g., wherein the PRC1-binding region has a nucleotide sequence identified as a motif as described herein), and without interfering with binding of PRC2 to a PRC2-binding region of the RNA (as described in WO 2012/087983 or WO 2012/065143, wherein the PRC2-binding region has a nucleotide sequence protected from nucleases during an RNA immunoprecipitation procedure using an antibody directed against PRC2), optionally wherein the PRC1-binding RNA is transcribed from a sequence of the chromosomal locus of the target gene, and optionally wherein a decrease in recruitment of PRC1 to the target gene in the cell following delivery of the single stranded oligonucleotide to the cell, compared with an appropriate control cell to which the single stranded oligonucleotide has not been delivered, indicates effectiveness of the single stranded oligonucleotide.
Where the design and/or synthesis involves design and/or synthesis of a sequence that is complementary to a nucleic acid described by such sequence information the skilled person is readily able to determine the complementary sequence, e.g. through understanding of Watson-Crick base pairing rules which form part of the common general knowledge in the field.
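As a simple illustration of applying Watson-Crick rules, the short Python sketch below derives the antisense (reverse-complement) sequence of a target RNA segment, written 5′ to 3′. The helper name and the option to emit DNA letters are assumptions for illustration; chemical modifications are not represented.

```python
# Hypothetical helper: derive an antisense sequence from a target RNA segment
# using Watson-Crick rules (A<->U/T, G<->C), reversed so it reads 5'->3'.
_COMPLEMENT = str.maketrans("ACGUT", "UGCAA")


def reverse_complement(target: str, as_dna: bool = False) -> str:
    """Return the reverse complement of `target` (RNA or DNA letters)."""
    seq = target.upper()
    if set(seq) - set("ACGUT"):
        raise ValueError("only A, C, G, U and T are supported")
    antisense = seq.translate(_COMPLEMENT)[::-1]
    # Emit DNA letters if the oligo is to be made as (modified) DNA.
    return antisense.replace("U", "T") if as_dna else antisense


print(reverse_complement("GGAUCCAUGCAUGCA"))               # UGCAUGCAUGGAUCC
print(reverse_complement("GGAUCCAUGCAUGCA", as_dna=True))  # TGCATGCATGGATCC
```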
In the methods described above the RNA that binds to PRC1 may be, or have been, identified, or obtained, by a method that involves identifying RNA that binds to PRC1, e.g., as described herein or in WO 2016/149455.
In one embodiment, the method involves the dCLIP-seq method described herein and in WO 2016/149455.
In accordance with the above, in some embodiments the RNA that binds to PRC1 may be one that is known to bind PRC1, e.g., information about the sequence of the RNA and/or its ability to bind PRC1 is available to the public in written or electronic form, allowing the design and/or synthesis of the inhibitory nucleic acid to be based on that information. As such, an RNA that binds to PRC1 may be selected from known sequence information and used to inform the design and/or synthesis of the inhibitory nucleic acid.
In other embodiments the RNA that binds to PRC1 may be identified as one that binds PRC1 as part of the method of design and/or synthesis.
In preferred embodiments, design and/or synthesis of an inhibitory nucleic acid involves manufacture of a nucleic acid from starting materials by techniques known to those of skill in the art, where the synthesis may be based on a sequence of an RNA (or portion thereof) that has been selected as known to bind to Polycomb repressive complex 1.
Methods of design and/or synthesis of an inhibitory nucleic acid may involve one or more of the steps of:
Identifying and/or selecting a portion of an RNA sequence that binds to PRC1 (e.g., as shown in Tables 1-3 of WO 2016/149455 or as set forth in SEQ ID NOs:1 to 5893 (human), 5894 to 17415 (human), and 17416 to 36368 (mouse));
Designing a nucleic acid sequence having a desired degree of sequence identity or complementarity to a sequence comprising a motif within an RNA sequence that binds to PRC1 or a portion thereof;
Synthesizing a nucleic acid to the designed sequence;
Mixing the synthesized nucleic acid with at least one pharmaceutically acceptable diluent, carrier or excipient to form a pharmaceutical composition or medicament.
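The first two steps above (selecting a region and designing complementary sequences) can be sketched as enumerating every fixed-length target window that fully covers a motif and taking each window's reverse complement. This is a minimal Python sketch under stated assumptions: the motif is given as an exact substring, candidates are fully complementary, and the function name and default length are hypothetical. Synthesis and formulation are laboratory steps and are not modeled.

```python
from typing import List, Tuple

# Minimal sketch of the region-selection and sequence-design steps only.
# Assumptions: exact-substring motif, fully complementary candidates,
# hypothetical function name and default length.

_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def design_candidates(region: str, motif: str,
                      length: int = 16) -> List[Tuple[int, str]]:
    """Return (start, antisense) pairs for every `length`-nt window of
    `region` that fully covers the first occurrence of `motif`."""
    region = region.upper().replace("T", "U")
    motif = motif.upper().replace("T", "U")
    if not 5 <= length <= 50:
        raise ValueError("length outside the 5-50 base range described above")
    if length < len(motif):
        raise ValueError("window is shorter than the motif")
    m = region.find(motif)
    if m < 0:
        raise ValueError("motif not found in region")
    first = max(0, m + len(motif) - length)  # window must reach the motif end
    last = min(m, len(region) - length)      # window must start at/before motif
    candidates = []
    for start in range(first, last + 1):
        window = region[start:start + length]
        # Antisense oligo written 5'->3': complement, then reverse.
        candidates.append((start, window.translate(_COMPLEMENT)[::-1]))
    return candidates


for start, oligo in design_candidates("AAAAGGAUCCAAAA", "GGAUCC", length=8):
    print(start, oligo)
```

Enumerating all covering windows, rather than a single one, leaves room to rank candidates by other criteria before synthesis.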
Inhibitory nucleic acids so designed and/or synthesized may be useful in methods of modulating gene expression as described herein.
As such, the process of preparing an inhibitory nucleic acid may be a process that is for use in the manufacture of a pharmaceutical composition or medicament for use in the treatment of disease, optionally wherein the treatment involves modulating expression of a gene targeted by the RNA that binds to PRC1.
Methods for isolating RNA sequences that interact with a selected protein, e.g., with chromatin complexes, in a cell are further described in WO 2016/149455.
In yet another aspect, the invention features methods for increasing expression of a tumor suppressor in a mammal, e.g., a human, in need thereof. The methods include administering to said mammal an inhibitory nucleic acid that specifically binds, or is complementary, to a sequence comprising a motif within a human PRC1-interacting RNA corresponding to a tumor suppressor locus of any of Tables 1-3 of WO 2016/149455, or a human RNA corresponding to an imprinted gene of any of Tables 1-3 of WO 2016/149455 or as set forth in SEQ ID NOs:1 to 5893 (human), 5894 to 17415 (human), and 17416 to 36368 (mouse), or a related naturally occurring RNA that is orthologous or at least 90% (e.g., 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%) identical over at least 15 (e.g., at least 20, 21, 25, 30, 100) nucleobases thereof, in an amount effective to increase expression of the tumor suppressor or growth-suppressing gene. It is understood that one method of determining a human orthologous RNA that corresponds to a murine RNA is to identify a corresponding human sequence at least 90% identical (e.g., 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical) to at least 15 nucleobases of the murine sequence (or at least 20, 21, 25, 30, 40, 50, 60, 70, 80, 90 or 100 nucleobases).
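The orthology criterion above (at least 90% identity over at least 15 nucleobases) can be illustrated with a deliberately simple Python sketch. It is an assumption-laden stand-in: it uses an ungapped sliding-window comparison, whereas in practice an alignment tool such as BLAST would typically be used. The function name and example sequences are hypothetical.

```python
# Simplified sketch (assumption): an ungapped, sliding-window comparison
# stands in for a proper alignment tool. It asks whether any `window`-nt
# stretch of the murine sequence is at least `min_percent` identical to some
# equally long stretch of the human sequence. Names are hypothetical.


def has_identical_window(mouse: str, human: str, window: int = 15,
                         min_percent: int = 90) -> bool:
    mouse = mouse.upper().replace("T", "U")
    human = human.upper().replace("T", "U")
    for i in range(len(mouse) - window + 1):
        a = mouse[i:i + window]
        for j in range(len(human) - window + 1):
            b = human[j:j + window]
            matches = sum(x == y for x, y in zip(a, b))
            # Integer arithmetic avoids floating-point edge cases.
            if matches * 100 >= min_percent * window:
                return True
    return False


mouse = "GGAUCCAUGCAUGCA"
print(has_identical_window(mouse, "UUU" + "AGAUCCAUGCAUGCA"))  # True: 14/15 identical
print(has_identical_window(mouse, "AAAUCCAUGCAUGCA"))          # False: 13/15 identical
```

Over a 15-nt window, "at least 90%" requires at least 14 identical positions, because 13/15 is only about 86.7%.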
In an additional aspect, the invention provides methods for inhibiting or suppressing tumor growth in a mammal, e.g., a human, with cancer, comprising administering to said mammal an inhibitory nucleic acid that specifically binds, or is complementary, to a sequence comprising a motif within a human PRC1-interacting RNA corresponding to a tumor suppressor locus of any of Tables 1-3 of WO 2016/149455, or a human RNA corresponding to an imprinted gene of any of Tables 1-3 of WO 2016/149455 or as set forth in SEQ ID NOs:1 to 5893 (human), 5894 to 17415 (human), and 17416 to 36368 (mouse), or a related naturally occurring RNA that is orthologous or at least 90% (e.g., 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100%) identical over at least 15 (e.g., at least 20, 21, 25, 30, 50, 70, 100) nucleobases thereof, in an amount effective to suppress or inhibit tumor growth.
In another aspect, the invention features methods for treating a mammal, e.g., a human, with cancer comprising administering to said mammal an inhibitory nucleic acid that specifically binds, or is complementary, to a sequence comprising a motif within a human RNA corresponding to a tumor suppressor locus of any of Tables 1-3 of WO 2016/149455, or a human RNA corresponding to an imprinted gene of Tables 1-3 of WO 2016/149455, or as set forth in SEQ ID NOs:1 to 5893 (human), 5894 to 17415 (human), and 17416 to 36368 (mouse), or a related naturally occurring RNA that is orthologous or at least 90% (e.g., 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%) identical over at least 15 (e.g., at least 20, 21, 25, 30, 50, 70, 100) nucleobases thereof, in a therapeutically effective amount.
Also provided herein are inhibitory nucleic acids that specifically bind, or are complementary to, a region of an RNA that is known to bind to Polycomb repressive complex 1 (PRC1) comprising a motif as described herein, wherein the sequence of the region is selected from the group consisting of SEQ ID NOs:1 to 5893 (human), 5894 to 17415 (human), and 17416 to 36368 (mouse), for use in the treatment of disease, wherein the treatment involves modulating expression of a gene targeted by the RNA, wherein the inhibitory nucleic acid is between 5 and 40 bases in length, and wherein the inhibitory nucleic acid is formulated as a sterile composition.
Further described herein are processes for preparing an inhibitory nucleic acid that specifically binds, or is complementary to, a sequence comprising a motif as described herein within an RNA that is known to bind to Polycomb repressive complex 1 (PRC1), selected from the group consisting of SEQ ID NOs:1 to 5893 (human), 5894 to 17415 (human), and 17416 to 36368 (mouse); the processes include the step of designing and/or synthesizing an inhibitory nucleic acid of between 5 and 40 bases in length, optionally single stranded, that specifically binds to a region of the RNA that binds PRC1.
In some embodiments, the sequence of the designed and/or synthesized inhibitory nucleic acid is a nucleic acid sequence that is complementary to said RNA sequence that binds to PRC1, or is complementary to a portion thereof, said portion having a length of from 5 to 40 contiguous base pairs.
In some embodiments, the inhibitory nucleic acid is for use in the manufacture of a pharmaceutical composition or medicament for use in the treatment of disease, optionally wherein the treatment involves modulating expression of a gene targeted by the RNA that binds to PRC1.
In some embodiments, the modulation is increasing expression of a gene, and the region of the RNA that binds PRC1 can be in intergenic space mapping to a noncoding RNA, antisense to the coding gene, or in the promoter, 3′UTR, 5′UTR, exons, or introns of a coding gene.
In some embodiments, the modulation is decreasing expression of a gene, and the region of the RNA that binds PRC1 can be in intergenic space mapping to a noncoding RNA, antisense to the coding gene, or in the promoter, 3′UTR, 5′UTR, exons, or introns of a coding gene.
In some embodiments, the modulation is to influence gene expression by altering splicing of a gene, and the region of the RNA that binds PRC1 can be in intergenic space mapping to a noncoding RNA, antisense to the coding gene, or in the promoter, 3′UTR, 5′UTR, exons, or introns of a coding gene.
Also provided herein are sterile compositions comprising an inhibitory nucleic acid that specifically binds, or is complementary to, a sequence comprising a motif as described herein within an RNA sequence of any one of SEQ ID NOs:1 to 5893 (human), 5894 to 17415 (human), and 17416 to 36368 (mouse) and is capable of modulating expression of a gene targeted by the RNA as set forth in Tables 1-3 of WO 2016/149455. In some embodiments, the composition is for parenteral administration. In some embodiments, the RNA sequence is in the 3′UTR of a gene, and the inhibitory nucleic acid is capable of upregulating or downregulating expression of a gene targeted by the RNA.
Also provided herein is an inhibitory nucleic acid for use in the treatment of disease, wherein said inhibitory nucleic acid specifically binds, or is complementary to, a sequence comprising a motif as described herein within an RNA sequence of any one of SEQ ID NOs:1 to 5893 (human) or 5894 to 17415 (human), and wherein the treatment involves modulating expression of a gene targeted by the RNA according to Tables 1-3 of WO 2016/149455.
The present disclosure also provides methods for modulating gene expression in a cell or a mammal comprising administering to the cell or the mammal an inhibitory nucleic acid that specifically binds, or is complementary to, a sequence comprising a motif as described herein within an RNA sequence of any one of SEQ ID NOs:1 to 5893 (human) or 5894 to 17415 (human) or 17416 to 36368 (mouse), in an amount effective for modulating expression of a gene targeted by the RNA according to Tables 1-3 of WO 2016/149455.
In addition, provided herein are inhibitory nucleic acids of about 5 to 50 bases in length that specifically bind, or are complementary to, at least 5, 6, 7, 8, 9 or 10 consecutive bases within a sequence comprising a motif as described herein within any of SEQ ID NOs:1 to 5893 (human) or 5894 to 17415 (human) or 17416 to 36368 (mouse), optionally for use in the treatment of disease, wherein the treatment involves modulating expression of a gene targeted by the RNA.
In addition, provided are methods for modulating expression of a gene comprising administering to a mammal an inhibitory nucleic acid as described herein in an amount effective for modulating expression of a gene targeted by the RNA as set forth in Tables 1-3 of WO 2016/149455.
In some embodiments, the modulation is upregulating gene expression, optionally wherein the gene targeted by the RNA is selected from the group set forth in Tables 1-3 of WO 2016/149455, and wherein the RNA sequence is listed in the same row as the gene.
In some embodiments, the inhibitory nucleic acid is 5 to 40 bases in length (optionally 12-30, 12-28, or 12-25 bases in length), and optionally the sequence that binds to the motif is centered in the nucleic acid.
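One way to read "the sequence that binds to the motif is centered in the nucleic acid" is to choose the target window whose midpoint coincides with the motif midpoint. The hypothetical Python helper below does this, shifting the window only when it would run past either end of the RNA; the function name, 0-based coordinates, and tie-breaking rule for odd flanks are all assumptions for illustration.

```python
# Hypothetical helper: choose the start (0-based) of an oligo_len-nt target
# window centered on a motif, clamped so the window stays inside the RNA.


def centered_window_start(region_len: int, motif_start: int,
                          motif_len: int, oligo_len: int) -> int:
    if not motif_len <= oligo_len <= region_len:
        raise ValueError("need motif_len <= oligo_len <= region_len")
    flank = oligo_len - motif_len
    # Split flanking bases evenly; an odd extra base goes 3' of the motif.
    start = motif_start - flank // 2
    return max(0, min(start, region_len - oligo_len))


print(centered_window_start(100, 40, 6, 16))  # 35: 5 nt of flank on each side
print(centered_window_start(100, 2, 6, 16))   # 0: clamped at the 5' end
```

The antisense oligonucleotide would then be the reverse complement of the selected window.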
In some embodiments, the inhibitory nucleic acid is 10 to 50 bases in length.
In some embodiments, the inhibitory nucleic acid comprises a base sequence at least 90% complementary to at least 10 bases of the RNA sequence.
In some embodiments, the inhibitory nucleic acid comprises a sequence of bases at least 80% or 90% complementary to, e.g., at least 5-30, 10-30, 15-30, 20-30, 25-30 or 5-40, 10-40, 15-40, 20-40, 25-40, or 30-40 bases of the RNA sequence.
In some embodiments, the inhibitory nucleic acid comprises a sequence of bases with up to 3 mismatches (e.g., up to 1, or up to 2 mismatches) in complementary base pairing over 10, 15, 20, 25 or 30 bases of the RNA sequence. In some embodiments, the mismatches are not in the motif-binding region.
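The condition that mismatches fall outside the motif-binding region can be checked by mapping each unpaired position back onto the target RNA. The minimal Python sketch below assumes ungapped, antiparallel, Watson-Crick-only pairing of equal-length strands and a motif given by its 0-based start and length in the target; the function names are hypothetical.

```python
from typing import List

# Minimal sketch (assumption): Watson-Crick pairing only; coordinates are
# 0-based and both strands are written 5'->3'. Function names are hypothetical.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def mismatch_positions(oligo: str, target: str) -> List[int]:
    """Target positions left unpaired by an antiparallel, equal-length oligo."""
    oligo = oligo.upper().replace("T", "U")
    target = target.upper().replace("T", "U")
    if len(oligo) != len(target):
        raise ValueError("oligo and target segment must be the same length")
    n = len(target)
    # Target position t pairs with oligo position n - 1 - t.
    return [t for t in range(n)
            if (oligo[n - 1 - t], target[t]) not in WATSON_CRICK]


def mismatches_outside_motif(oligo: str, target: str, motif_start: int,
                             motif_len: int, max_mismatches: int = 3) -> bool:
    """True if there are at most `max_mismatches` mismatches and none of them
    falls within the motif span of the target."""
    mm = mismatch_positions(oligo, target)
    motif = range(motif_start, motif_start + motif_len)
    return len(mm) <= max_mismatches and not any(t in motif for t in mm)


target = "GGAUCCAUGCAUGCA"   # hypothetical target segment
oligo = "AAAAUGCAUGGAUCC"    # 3 mismatches at the target's 3' end
print(mismatch_positions(oligo, target))               # [12, 13, 14]
print(mismatches_outside_motif(oligo, target, 0, 6))   # True
print(mismatches_outside_motif(oligo, target, 10, 5))  # False
```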
In some embodiments, the inhibitory nucleic acid comprises a sequence of bases at least 80% complementary to at least 10 bases of the RNA sequence.
In some embodiments, the inhibitory nucleic acid comprises a sequence of bases with up to 3 mismatches over 15 bases of the RNA sequence.
In some embodiments, the inhibitory nucleic acid is single stranded.
In some embodiments, the inhibitory nucleic acid is double stranded.
In some embodiments, the inhibitory nucleic acid comprises one or more modifications comprising: a modified sugar moiety, a modified internucleoside linkage, a modified nucleotide and/or combinations thereof.
In some embodiments, the inhibitory nucleic acid is an antisense oligonucleotide, LNA molecule, PNA molecule, ribozyme or siRNA.
In some embodiments, the inhibitory nucleic acid is double stranded and comprises an overhang (optionally 2-6 bases in length) at one or both termini.
In some embodiments, the inhibitory nucleic acid is selected from the group consisting of antisense oligonucleotides, ribozymes, external guide sequence (EGS) oligonucleotides, siRNA compounds, microRNAs (miRNAs), small temporal RNAs (stRNAs), and single- or double-stranded RNA interference (RNAi) compounds.
In some embodiments, the RNAi compound is selected from the group consisting of short interfering RNAs (siRNAs), short hairpin RNAs (shRNAs), compounds mediating small RNA-induced gene activation (RNAa), and small activating RNAs (saRNAs).
In some embodiments, the antisense oligonucleotide is selected from the group consisting of antisense RNAs, antisense DNAs, and chimeric antisense oligonucleotides.
In some embodiments, the modified internucleoside linkage comprises at least one of: alkylphosphonate, phosphorothioate, phosphorodithioate, alkylphosphonothioate, phosphoramidate, carbamate, carbonate, phosphate triester, acetamidate, carboxymethyl ester, or combinations thereof.
In some embodiments, the modified sugar moiety comprises a 2′-O-methoxyethyl modified sugar moiety, a 2′-methoxy modified sugar moiety, a 2′-O-alkyl modified sugar moiety, or a bicyclic sugar moiety. In some embodiments, the inhibitory nucleic acids include 2′-OMe, 2′-F, LNA, PNA, FANA, ENA or morpholino modifications.
Further provided are sterile compositions comprising an isolated nucleic acid as described herein.
Further, provided herein are methods of inducing expression of a target gene in a cell, the method comprising delivering to the cell a single stranded oligonucleotide of 5 to 40 nucleotides in length having a region of complementarity that is complementary with at least 5, 6, 7, 8, 9, or 10 contiguous nucleotides including a motif as described herein within a PRC1-binding RNA that inhibits expression of the target gene, wherein the oligonucleotide is complementary to and binds specifically to the PRC1-binding RNA, and wherein the PRC1-binding RNA is transcribed from a sequence of the chromosomal locus of the target gene.
In some embodiments, the RNA is a non-coding RNA.
In some embodiments, the methods include detecting expression of the PRC1-binding RNA in the cell, wherein expression of the PRC1-binding RNA in the cell indicates that the single stranded oligonucleotide is suitable for increasing expression of the target gene in the cell.
In some embodiments, the methods include detecting a change in expression of the target gene following delivery of the single stranded oligonucleotide to the cell, wherein an increase in expression of the target gene compared with an appropriate control cell indicates effectiveness of the single stranded oligonucleotide.
In some embodiments, the methods include detecting a change in recruitment of PRC1 to the target gene in the cell following delivery of the single stranded oligonucleotide to the cell, wherein a decrease in recruitment compared with an appropriate control cell indicates effectiveness of the single stranded oligonucleotide.
In some embodiments, the cell is in vitro.
In some embodiments, the cell is in vivo.
In some embodiments, at least one nucleotide of the oligonucleotide is a modified nucleotide.
In some embodiments, the PRC1-binding RNA is transcribed from the same strand as the target gene in a genomic region containing the target gene.
In some embodiments, the oligonucleotide has complementarity to a region of the PRC1-binding RNA comprising a motif as described herein and transcribed from a portion of the target gene corresponding to an exon.
In some embodiments, the oligonucleotide has complementarity to a region of the PRC1-binding RNA comprising a motif as described herein and transcribed from the same strand as the target gene within a chromosomal region spanning −2.0 kb to +0.001 kb of the transcription start site of the target gene.
In some embodiments, the oligonucleotide has complementarity to a region of the PRC1-binding RNA comprising a motif as described herein and transcribed from the opposite strand of the target gene within a chromosomal region spanning −0.5 kb to +0.1 kb of the transcription start site of the target gene.
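The strand-specific windows around the transcription start site (TSS) can be expressed as a small, strand-aware coordinate check. The Python sketch below is illustrative only: it assumes the PRC1-binding region is reduced to a single genomic coordinate (e.g., its midpoint), measures offsets in the target gene's direction of transcription (upstream negative), and restates the kilobase ranges in base pairs. All names are hypothetical.

```python
# Sketch under stated assumptions: a PRC1-binding region is reduced to one
# genomic coordinate (e.g., its midpoint); offsets are measured in the target
# gene's direction of transcription, so upstream is negative. Window bounds
# restate the kilobase ranges in base pairs. Names are hypothetical.
WINDOWS_BP = {
    "same": (-2000, 1),       # same strand: -2.0 kb to +0.001 kb
    "opposite": (-500, 100),  # opposite strand: -0.5 kb to +0.1 kb
}


def offset_from_tss(pos: int, tss: int, gene_strand: str) -> int:
    """Signed distance of `pos` from the TSS in transcriptional orientation."""
    if gene_strand == "+":
        return pos - tss
    if gene_strand == "-":
        return tss - pos
    raise ValueError("strand must be '+' or '-'")


def in_tss_window(pos: int, tss: int, gene_strand: str, rna_strand: str) -> bool:
    if rna_strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    kind = "same" if rna_strand == gene_strand else "opposite"
    low, high = WINDOWS_BP[kind]
    return low <= offset_from_tss(pos, tss, gene_strand) <= high


print(in_tss_window(8500, 10000, "+", "+"))   # True: 1.5 kb upstream, same strand
print(in_tss_window(8500, 10000, "+", "-"))   # False: beyond -0.5 kb, opposite strand
print(in_tss_window(10300, 10000, "-", "+"))  # True: 0.3 kb upstream of a minus-strand gene
```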
In some embodiments, the oligonucleotide has complementarity to the PRC1-binding RNA in a region of the PRC1-binding RNA comprising a motif as described herein, that optionally forms a stem-loop structure.
In some embodiments, at least one nucleotide of the oligonucleotide is an RNA or DNA nucleotide.
In some embodiments, at least one nucleotide of the oligonucleotide is a ribonucleic acid analogue comprising a ribose ring having a bridge between its 2′-oxygen and 4′-carbon.
In some embodiments, the ribonucleic acid analogue comprises a methylene bridge between the 2′-oxygen and the 4′-carbon.
In some embodiments, at least one nucleotide of the oligonucleotide comprises a modified sugar moiety.
In some embodiments, the modified sugar moiety comprises a 2′-O-methoxyethyl modified sugar moiety, a 2′-methoxy modified sugar moiety, a 2′-O-alkyl modified sugar moiety, or a bicyclic sugar moiety.
In some embodiments, the oligonucleotide comprises at least one modified internucleoside linkage.
In some embodiments, the at least one modified internucleoside linkage is selected from phosphorothioate, phosphorodithioate, alkylphosphonothioate, phosphoramidate, carbamate, carbonate, phosphate triester, acetamidate, carboxymethyl ester, and combinations thereof.
In some embodiments, the oligonucleotide is configured such that hybridization of the single stranded oligonucleotide to the PRC1-binding RNA does not activate an RNase H pathway in the cell.
In some embodiments, the oligonucleotide is configured such that hybridization of the single stranded oligonucleotide to the PRC1-binding RNA does not induce substantial cleavage or degradation of the PRC1-binding RNA in the cell.
In some embodiments, the oligonucleotide is configured such that hybridization of the single stranded oligonucleotide to the PRC1-binding RNA interferes with interaction of the RNA with PRC1 in the cell.
In some embodiments, the target gene is a protein-coding gene.
In some embodiments, the chromosomal locus of the target gene is an endogenous gene of an autosomal chromosome.
In some embodiments, the cell is a cell of a male subject.
In some embodiments, the oligonucleotide has complementarity to a region of the PRC1-binding RNA transcribed from a portion of the target gene corresponding to an intron-exon junction or an intron.
In some embodiments, the oligonucleotide has complementarity to a region of the PRC1-binding RNA transcribed from a portion of the target gene corresponding to a translation initiation region or a translation termination region.
In some embodiments, the oligonucleotide has complementarity to a region of the PRC1-binding RNA transcribed from a portion of the target gene corresponding to a promoter.
In some embodiments, the oligonucleotide has complementarity to a region of the PRC1-binding RNA transcribed from a portion of the target gene corresponding to a 5′-UTR.
In some embodiments, the oligonucleotide has complementarity to a region of the PRC1-binding RNA transcribed from a portion of the target gene corresponding to a 3′-UTR.
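The alternative target regions recited above (intron, intron-exon junction, translation initiation or termination region, promoter, 5′-UTR, 3′-UTR) can be illustrated with a small interval classifier. This is a sketch under stated assumptions, not a procedure from this disclosure: the `GeneModel` type, the plus-strand 0-based half-open coordinates, the 15-nt flank used to delimit the translation initiation and termination regions, and the 1,000-nt promoter window are all hypothetical choices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneModel:
    """Hypothetical plus-strand gene model in 0-based, half-open coordinates."""
    exons: tuple[tuple[int, int], ...]  # sorted, non-overlapping (start, end) pairs
    cds_start: int                      # first base of the start codon
    cds_end: int                        # one past the last base of the stop codon


def _overlaps(a0: int, a1: int, b0: int, b1: int) -> bool:
    """True if non-empty intervals [a0, a1) and [b0, b1) share a position."""
    return a0 < a1 and b0 < b1 and a0 < b1 and b0 < a1


def _hits_exonic(gene: GeneModel, t0: int, t1: int, lo: int, hi: int) -> bool:
    """True if [t0, t1) overlaps the exonic part of the window [lo, hi)."""
    return any(_overlaps(t0, t1, max(e0, lo), min(e1, hi)) for e0, e1 in gene.exons)


def classify_target(gene: GeneModel, t0: int, t1: int,
                    flank: int = 15, promoter_len: int = 1000) -> set[str]:
    """Label the gene regions overlapped by the target span [t0, t1)."""
    labels: set[str] = set()
    introns = [(a[1], b[0]) for a, b in zip(gene.exons, gene.exons[1:])]
    in_exon = any(_overlaps(t0, t1, e0, e1) for e0, e1 in gene.exons)
    in_intron = any(_overlaps(t0, t1, i0, i1) for i0, i1 in introns)
    if in_exon and in_intron:
        labels.add("intron-exon junction")
    elif in_intron:
        labels.add("intron")
    elif in_exon:
        labels.add("exon")
    tx_start, tx_end = gene.exons[0][0], gene.exons[-1][1]
    if _overlaps(t0, t1, tx_start - promoter_len, tx_start):
        labels.add("promoter")
    if _hits_exonic(gene, t0, t1, tx_start, gene.cds_start):
        labels.add("5'-UTR")
    if _hits_exonic(gene, t0, t1, gene.cds_end, tx_end):
        labels.add("3'-UTR")
    if _overlaps(t0, t1, gene.cds_start - flank, gene.cds_start + 3 + flank):
        labels.add("translation initiation region")
    if _overlaps(t0, t1, gene.cds_end - 3 - flank, gene.cds_end + flank):
        labels.add("translation termination region")
    return labels


gene = GeneModel(exons=((1000, 1200), (1500, 1700), (2000, 2400)),
                 cds_start=1100, cds_end=2100)
print(classify_target(gene, 1190, 1210))  # {'intron-exon junction'}
print(classify_target(gene, 1300, 1320))  # {'intron'}
print(classify_target(gene, 1095, 1110))  # exon, 5'-UTR, initiation region
```

A single target span can carry several labels (for example, a site straddling the start codon lies in both the 5′-UTR and the translation initiation region), which matches the overlapping embodiments listed above.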
In some or any embodiments, the inhibitory nucleic acid is an oligomeric base compound or oligonucleotide mimetic that hybridizes to at least a portion of the target nucleic acid and modulates its function. In some or any embodiments, the inhibitory nucleic acid is single stranded or double stranded. A variety of exemplary inhibitory nucleic acids are known and described in the art. In some examples, the inhibitory nucleic acid is an antisense oligonucleotide, locked nucleic acid (LNA) molecule, peptide nucleic acid (PNA) molecule, ribozyme, siRNA, antagomir, external guide sequence (EGS) oligonucleotide, microRNA (miRNA), small temporal RNA (stRNA), or single- or double-stranded RNA interference (RNAi) compound. It is understood that the term “LNA molecule” refers to a molecule that comprises at least one LNA modification; thus LNA molecules may have one or more locked nucleotides (conformationally constrained) and one or more non-locked nucleotides. It is also understood that the term “LNA” includes a nucleotide that comprises any constrained sugar that retains the desired properties of high affinity binding to complementary RNA, nuclease resistance, lack of immune stimulation, and rapid kinetics. Exemplary constrained sugars include those listed below. Similarly, it is understood that the term “PNA molecule” refers to a molecule that comprises at least one PNA modification and that such molecules may include unmodified nucleotides or internucleoside linkages.
In some or any embodiments, the inhibitory nucleic acid comprises at least one nucleotide and/or nucleoside modification (e.g., modified bases or modified sugar moieties), modified internucleoside linkages, and/or combinations thereof. Thus, inhibitory nucleic acids can comprise natural as well as modified nucleosides and linkages. Examples of such chimeric inhibitory nucleic acids, including hybrids or gapmers, are described below.
In some embodiments, the inhibitory nucleic acid comprises one or more modifications comprising: a modified sugar moiety, and/or a modified internucleoside linkage, and/or a modified nucleotide and/or combinations thereof. In some embodiments, the modified internucleoside linkage comprises at least one of: alkylphosphonate, phosphorothioate, phosphorodithioate, alkylphosphonothioate, phosphoramidate, carbamate, carbonate, phosphate triester, acetamidate, carboxymethyl ester, or combinations thereof. In some embodiments, the modified sugar moiety comprises a 2′-O-methoxyethyl modified sugar moiety, a 2′-methoxy modified sugar moiety, a 2′-O-alkyl modified sugar moiety, or a bicyclic sugar moiety. Other examples of modifications include locked nucleic acid (LNA), peptide nucleic acid (PNA), arabinonucleic acid (ANA), optionally with 2′-F modification, 2′-fluoro-D-Arabinonucleic acid (FANA), phosphorodiamidate morpholino oligomer (PMO), ethylene-bridged nucleic acid (ENA), optionally with 2′-O,4′-C-ethylene bridge, and bicyclic nucleic acid (BNA). Yet other examples are described below and/or are known in the art.
In some embodiments, the inhibitory nucleic acid is 5-40 bases in length (e.g., 12-30, 12-28, 12-25). The inhibitory nucleic acid may also be 10-50 or 5-50 bases in length. For example, the inhibitory nucleic acid may be any of 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50 bases in length. In some embodiments, the inhibitory nucleic acid is double stranded and comprises an overhang (optionally 2-6 bases in length) at one or both termini. In other embodiments, the inhibitory nucleic acid is double stranded and blunt-ended. In some embodiments, the inhibitory nucleic acid comprises or consists of a sequence of bases at least 80% or 90% complementary to, e.g., at least 5, 10, 15, 20, 25 or 30 bases of, or up to 30 or 40 bases of, the target RNA, or comprises a sequence of bases with up to 3 mismatches (e.g., up to 1, or up to 2 mismatches) over 10, 15, 20, 25 or 30 bases of the target RNA.
Thus, the inhibitory nucleic acid can comprise or consist of a sequence of bases at least 80% complementary to at least 10 contiguous bases of the target RNA comprising a motif as described herein, or at least 80% complementary to at least 15, or 15-30, or 15-40 contiguous bases of the target RNA comprising a motif as described herein, or at least 80% complementary to at least 20, or 20-30, or 20-40 contiguous bases of the target RNA comprising a motif as described herein, or at least 80% complementary to at least 25, or 25-30, or 25-40 contiguous bases of the target RNA comprising a motif as described herein, or at least 80% complementary to at least 30, or 30-40 contiguous bases of the target RNA comprising a motif as described herein, or at least 80% complementary to at least 40 contiguous bases of the target RNA comprising a motif as described herein. Moreover, the inhibitory nucleic acid can comprise or consist of a sequence of bases at least 90% complementary to at least 10 contiguous bases of the target RNA comprising a motif as described herein, or at least 90% complementary to at least 15, or 15-30, or 15-40 contiguous bases of the target RNA comprising a motif as described herein, or at least 90% complementary to at least 20, or 20-30, or 20-40 contiguous bases of the target RNA comprising a motif as described herein, or at least 90% complementary to at least 25, or 25-30, or 25-40 contiguous bases of the target RNA comprising a motif as described herein, or at least 90% complementary to at least 30, or 30-40 contiguous bases of the target RNA comprising a motif as described herein, or at least 90% complementary to at least 40 contiguous bases of the target RNA comprising a motif as described herein. Similarly, the inhibitory nucleic acid can comprise or consist of a sequence of bases fully complementary to at least 5, 10, or 15 contiguous bases of the target RNA comprising a motif as described herein.
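The percentage thresholds above can be made concrete with a short calculation. The sketch below is an illustration, not a method taken from this disclosure: it lays an oligonucleotide (written 5′→3′) antiparallel on each equal-length window of a target RNA (also written 5′→3′) and reports the window with the highest percent complementarity. It assumes Watson-Crick pairing only (G·U wobble pairs count as mismatches) and treats T and U as equivalent; the function names and toy sequences are hypothetical.

```python
# Watson-Crick partners in RNA; T in a DNA oligo is read as U.
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def to_rna(seq: str) -> str:
    return seq.upper().replace("T", "U")


def percent_complementarity(oligo: str, window: str) -> float:
    """Percent of positions that Watson-Crick pair when `oligo` (5'->3') is
    laid antiparallel on an equal-length target `window` (5'->3')."""
    oligo, window = to_rna(oligo), to_rna(window)
    if len(oligo) != len(window):
        raise ValueError("oligo and window must have the same length")
    # In an antiparallel duplex, oligo[k] pairs with window[-1 - k].
    paired = sum(PAIR[a] == b for a, b in zip(oligo, reversed(window)))
    return 100.0 * paired / len(oligo)


def best_window(oligo: str, target: str) -> tuple[int, float]:
    """Scan every window of the target; return (0-based start, percent) of the best."""
    n = len(oligo)
    if n > len(target):
        raise ValueError("oligo is longer than the target")
    start = max(range(len(target) - n + 1),
                key=lambda i: percent_complementarity(oligo, target[i:i + n]))
    return start, percent_complementarity(oligo, target[start:start + n])


target = "CCCCGAUUACAGCCCCCC"             # toy target RNA
print(best_window("GGCTGTAATC", target))  # (4, 100.0)
print(best_window("AGCTGTAATC", target))  # (4, 90.0): one mismatch in 10
```

Under these assumptions, a 10-base oligonucleotide with one mismatch meets the "at least 90%" threshold over 10 contiguous bases, and one with two mismatches meets only the "at least 80%" threshold.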
In some or any embodiments, the inhibitory nucleic acid is 5 to 40, or 8 to 40, or 10 to 50 bases in length (e.g., 12-30, 12-28, 12-25, 5-25, or 10-25 bases in length), and comprises a sequence of bases with up to 3 mismatches in complementary base pairing over 15 bases of the target RNA, or up to 2 mismatches over 10 bases of the target RNA.
In an additional aspect, the invention provides methods for enhancing pluripotency of a stem cell. The methods include contacting the cell with an inhibitory nucleic acid that specifically binds, or is complementary, to a nucleic acid sequence that is at least about 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% homologous to a sequence comprising a motif as described herein within a PRC1-binding RNA, as referred to in Tables 1-3 of WO 2016/149455. PRC1-binding fragments of murine or orthologous RNAs, including human RNAs, are contemplated in the aforementioned method.
In a further aspect, the invention features methods for enhancing differentiation of a stem cell, the method comprising contacting the cell with an inhibitory nucleic acid that specifically binds, or is complementary, to a PRC1-binding RNA sequence as set forth in SEQ ID NOs: 17416 to 36368 [mouse Peaks], 1 to 5893 [human Peaks], or 5894 to 17415 [human Peaks identified by LiftOver].
In some embodiments, the stem cell is an embryonic stem cell. In some embodiments, the stem cell is an iPS cell or an adult stem cell.
In an additional aspect, the invention provides sterile compositions including an inhibitory nucleic acid as described herein. In some embodiments, the inhibitory nucleic acid is selected from the group consisting of antisense oligonucleotides, ribozymes, external guide sequence (EGS) oligonucleotides, siRNA compounds, microRNAs (miRNAs), small temporal RNAs (stRNAs), and single- or double-stranded RNA interference (RNAi) compounds. In some embodiments, the RNAi compound is selected from the group consisting of short interfering RNAs (siRNAs), short hairpin RNAs (shRNAs), small RNA-induced gene activation (RNAa), and small activating RNAs (saRNAs).
In some embodiments, the antisense oligonucleotide is selected from the group consisting of antisense RNAs, antisense DNAs, chimeric antisense oligonucleotides, and antisense oligonucleotides.
In some embodiments, the inhibitory nucleic acid comprises one or more modifications comprising: a modified sugar moiety, a modified internucleoside linkage, a modified nucleotide and/or combinations thereof. In some embodiments, the modified internucleoside linkage comprises at least one of: alkylphosphonate, phosphorothioate, phosphorodithioate, alkylphosphonothioate, phosphoramidate, carbamate, carbonate, phosphate triester, acetamidate, carboxymethyl ester, or combinations thereof. In some embodiments, the modified sugar moiety comprises a 2′-O-methoxyethyl modified sugar moiety, a 2′-methoxy modified sugar moiety, a 2′-O-alkyl modified sugar moiety, or a bicyclic sugar moiety. Other examples of modifications include locked nucleic acid (LNA), peptide nucleic acid (PNA), arabinonucleic acid (ANA), optionally with 2′-F modification, 2′-fluoro-D-Arabinonucleic acid (FANA), phosphorodiamidate morpholino oligomer (PMO), ethylene-bridged nucleic acid (ENA), optionally with 2′-O,4′-C-ethylene bridge, and bicyclic nucleic acid (BNA). Yet other examples are described below and/or are known in the art.
Inhibitory nucleic acids that specifically bind to a sequence comprising a motif as described herein within any of the RNA peaks set forth in any one of SEQ ID NOs: 1 to 5893, 5894 to 17415, or 17416 to 36368 are also contemplated. In particular, the invention features uses of these inhibitory nucleic acids to upregulate expression of any of the genes set forth in Tables 1-3 of WO 2016/149455 (whether in the “opposite strand” column or the “same strand” column), for use in treating a disease, disorder, condition or association known in the art; upregulation of a set of genes grouped together in any one of the categories is contemplated. In some embodiments, it is contemplated that expression may be increased by at least about 15-fold, 20-fold, 30-fold, 40-fold, 50-fold or 100-fold, or any range between any of the foregoing numbers. In other experiments, increased mRNA expression has been shown to correlate with increased protein expression.
Thus, in various aspects, the invention features inhibitory nucleic acids that specifically bind to motifs as described herein within any of the RNA sequences as set forth in SEQ ID NOs: 1 to 5893 (human), 5894 to 17415 (human), and 17416 to 36368 (mouse) or in any of Tables 1-3 of WO 2016/149455, for use in modulating expression of a group of reference genes that fall within any one or more of the categories set forth in the tables, and for treating corresponding diseases, disorders or conditions.
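The three SEQ ID NO ranges used throughout this section are contiguous and do not overlap, so every Peak identifier maps to exactly one category. The short lookup below simply encodes those ranges as stated here (1 to 5893 human Peaks, 5894 to 17415 human Peaks identified by LiftOver, 17416 to 36368 mouse Peaks); `PEAK_RANGES` and `peak_category` are hypothetical names used for illustration and are not part of the Sequence Listing.

```python
# SEQ ID NO ranges as stated in the text, inclusive on both ends.
PEAK_RANGES = (
    (1, 5893, "human Peak"),
    (5894, 17415, "human Peak identified by LiftOver"),
    (17416, 36368, "mouse Peak"),
)

# Sanity check: each range starts immediately after the previous one ends,
# so every SEQ ID NO from 1 to 36368 falls in exactly one category.
assert all(a[1] + 1 == b[0] for a, b in zip(PEAK_RANGES, PEAK_RANGES[1:]))


def peak_category(seq_id: int) -> str:
    """Return the Peak category for a SEQ ID NO in the 1-36368 Peak listing."""
    for first, last, label in PEAK_RANGES:
        if first <= seq_id <= last:
            return label
    raise ValueError(f"SEQ ID NO {seq_id} is outside the Peak ranges")


for n in (1, 5893, 5894, 17415, 17416, 36368):
    print(n, peak_category(n))
```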
In another aspect, the invention also features inhibitory nucleic acids that specifically bind, or are complementary, to motifs as described herein within any of the RNA sequences of SEQ ID NOs: 17416 to 36368 [mouse Peaks] or 1 to 5893 [human Peaks] or 5894 to 17415 [human Peaks identified by LiftOver], whether in the “opposite strand” column or the “same strand” column of Tables 1-3 of WO 2016/149455. In some embodiments, the inhibitory nucleic acid is provided for use in a method of modulating expression of a gene targeted by the PRC1-binding RNA (e.g., an intersecting or nearby gene, as set forth in any of Tables 1-4 of WO 2016/149455). Such methods may be carried out in vitro, ex vivo, or in vivo. In some embodiments, the inhibitory nucleic acid is provided for use in methods of treating disease, e.g., as described below. The treatments may involve modulating expression (either up or down) of a gene targeted by the PRC1-binding RNA, preferably upregulating gene expression. In some embodiments, the inhibitory nucleic acid is formulated as a sterile composition for parenteral administration. Thus, in one aspect the invention describes a group of inhibitory nucleic acids that specifically bind, or are complementary to, sequences comprising motifs as described herein within a group of RNA sequences, i.e., Peaks, in any one of Tables 1, 2, or 3 of WO 2016/149455. In particular, the invention features uses of such inhibitory nucleic acids to upregulate expression of any of the reference genes set forth in SEQ ID NOs: 1 to 5893 (human), 5894 to 17415 (human), and 17416 to 36368 (mouse) or in Tables 1-3 of WO 2016/149455, for use in treating a disease, disorder, or condition.
It is understood that inhibitory nucleic acids of the invention may be complementary to, or specifically bind to, motifs within Peaks, or regions adjacent to (within 100, 200, 300, 400, or 500 nts of) Peaks, as shown in SEQ ID NOs:1 to 5893 (human), 5894 to 17415 (human), and 17416 to 36368 (mouse) or in Tables 1-3 of WO 2016/149455.
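Whether a candidate target site lies within a Peak or within 100, 200, 300, 400, or 500 nts of one can be checked with a simple interval calculation. The sketch assumes that the Peak and the candidate site are both given as 0-based, half-open coordinates on the same RNA and measures distance as the number of intervening nucleotides (0 when they overlap); the helper names and coordinates are hypothetical.

```python
def distance_to_peak(t_start: int, t_end: int, p_start: int, p_end: int) -> int:
    """Number of nucleotides between a target span and a Peak, both 0-based
    half-open intervals on the same RNA; 0 if they overlap or abut."""
    if t_start >= t_end or p_start >= p_end:
        raise ValueError("intervals must be non-empty")
    if t_end <= p_start:      # target lies upstream of the Peak
        return p_start - t_end
    if p_end <= t_start:      # target lies downstream of the Peak
        return t_start - p_end
    return 0                  # target overlaps the Peak


def within_or_adjacent(t_start: int, t_end: int, p_start: int, p_end: int,
                       max_distance: int = 500) -> bool:
    """True if the target overlaps the Peak or lies within max_distance nts of it."""
    return distance_to_peak(t_start, t_end, p_start, p_end) <= max_distance


peak = (10_000, 10_200)
print(distance_to_peak(9_650, 9_670, *peak))                      # 330
print(within_or_adjacent(9_650, 9_670, *peak, max_distance=300))  # False
print(within_or_adjacent(9_650, 9_670, *peak, max_distance=500))  # True
print(distance_to_peak(10_050, 10_070, *peak))                    # 0
```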
Also provided herein are methods for treating a subject with MECP2 Duplication Syndrome. The methods include administering a therapeutically effective amount of an inhibitory nucleic acid targeting a motif within a PRC1-binding region on Mecp2 RNA, e.g., an inhibitory nucleic acid targeting a motif within a sequence within the 3′UTR of Mecp2. In some embodiments, the inhibitory nucleic acid comprises a sequence shown herein, and/or does not comprise any of SEQ ID NOs:36399 to 36404.
Further provided herein are methods for treating a subject with systemic lupus erythematosus. The methods include administering a therapeutically effective amount of an inhibitory nucleic acid targeting a motif within a PRC1-binding region on IRAK1 RNA, e.g., an inhibitory nucleic acid targeting a motif within a sequence within the 3′UTR of IRAK1. In some embodiments, the inhibitory nucleic acid comprises a sequence shown herein, and/or does not comprise any of SEQ ID NOs:36396 to 36398.
In some embodiments, the inhibitory nucleic acid comprises at least one locked nucleotide.
Also provided herein are inhibitory nucleic acids targeting a motif within a PRC1-binding region on Mecp2 RNA, preferably wherein the PRC1 binding region comprises SEQ ID NO:5876 or 5877, and/or preferably an inhibitory nucleic acid targeting a sequence comprising a motif within the 3′UTR of Mecp2, for use in treating a subject with MECP2 Duplication Syndrome. In some embodiments, the inhibitory nucleic acid comprises a sequence shown herein, and/or does not comprise any of SEQ ID NOs:36399 to 36404.
In addition, provided herein are inhibitory nucleic acids targeting a motif within a PRC1-binding region on IRAK1 RNA, preferably wherein the PRC1 binding region comprises SEQ ID NO:5874 or 5875, and/or preferably an inhibitory nucleic acid targeting a sequence comprising a motif within the 3′UTR of IRAK1, for use in treating a subject with systemic lupus erythematosus. In some embodiments, the inhibitory nucleic acid comprises a sequence shown herein, and/or does not comprise any of SEQ ID NOs:36396 to 36398.
In some or any embodiments, the inhibitory nucleic acids are, e.g., about 5 to 40, about 8 to 40, or 10 to 50 bases, or 5 to 50 bases in length. In some embodiments, the inhibitory nucleic acid comprises or consists of a sequence of bases at least 80% or 90% complementary to, e.g., at least 5, 10, 15, 20, 25 or 30 bases of, or up to 30 or 40 bases of, the target RNA (e.g., any one of SEQ ID NOs: 1 to 36,368), or comprises a sequence of bases with up to 3 mismatches (e.g., up to 1, or up to 2 mismatches) over 10, 15, 20, 25 or 30 bases of the target RNA comprising a motif as described herein.
Thus, as noted above, the inhibitory nucleic acid can comprise or consist of a sequence of bases at least 80% complementary to at least 10, or 10-30 or 10-40 contiguous bases of the target RNA comprising a motif as described herein, or at least 80% complementary to at least 15, or 15-30, or 15-40 contiguous bases of the target RNA comprising a motif as described herein, or at least 80% complementary to at least 20, or 20-30, or 20-40 contiguous bases of the target RNA comprising a motif as described herein, or at least 80% complementary to at least 25, or 25-30, or 25-40 contiguous bases of the target RNA comprising a motif as described herein, or at least 80% complementary to at least 30, or 30-40 contiguous bases of the target RNA comprising a motif as described herein, or at least 80% complementary to at least 40 contiguous bases of the target RNA comprising a motif as described herein. Moreover, the inhibitory nucleic acid can comprise or consist of a sequence of bases at least 90% complementary to at least 5, or 5-30 or 5-40 or 8-40 contiguous bases of the target RNA comprising a motif as described herein, or at least 90% complementary to at least 10, or 10-30, or 10-40 contiguous bases of the target RNA comprising a motif as described herein, or at least 90% complementary to at least 15, or 15-30, or 15-40 contiguous bases of the target RNA comprising a motif as described herein, or at least 90% complementary to at least 20, or 20-30, or 20-40 contiguous bases of the target RNA comprising a motif as described herein, or at least 90% complementary to at least 25, or 25-30, or 25-40 contiguous bases of the target RNA comprising a motif as described herein, or at least 90% complementary to at least 30, or 30-40 contiguous bases of the target RNA comprising a motif as described herein, or at least 90% complementary to at least 40 contiguous bases of the target RNA comprising a motif as described herein. Similarly, the inhibitory nucleic acid can comprise or consist of a sequence of bases fully complementary to at least 5, 10, or 15 contiguous bases of the target RNA comprising a motif as described herein. It is understood that inhibitory nucleic acids comprising such sequences of bases may also comprise additional non-complementary bases. For example, an inhibitory nucleic acid can be 20 bases in total length but comprise a 15-base portion that is fully complementary to 15 bases of the target RNA comprising a motif as described herein. Similarly, an inhibitory nucleic acid can be 20 bases in total length but comprise a 15-base portion that is at least 80% complementary to 15 bases of the target RNA comprising a motif as described herein. Preferably, the portion that is complementary to the motif sequence is 100% complementary.
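The 20-base example above (a 15-base fully complementary portion plus additional non-complementary bases) can be reproduced by locating the longest fully complementary contiguous portion of an oligonucleotide. The sketch below does this as a longest-common-substring search between the oligonucleotide's reverse complement and the target RNA. It assumes Watson-Crick pairing only, and the sequences and function names are illustrative, not taken from the Sequence Listing.

```python
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def to_rna(seq: str) -> str:
    return seq.upper().replace("T", "U")


def reverse_complement(seq: str) -> str:
    """RNA reverse complement: the sequence a fully complementary target would contain."""
    return "".join(PAIR[b] for b in reversed(to_rna(seq)))


def longest_complementary_portion(oligo: str, target: str) -> tuple[int, int, int]:
    """Return (length, oligo_start, target_start), 0-based, of the longest run of
    consecutive oligo bases fully complementary to a contiguous target stretch."""
    rc, tgt = reverse_complement(oligo), to_rna(target)
    best_len, best_rc_end, best_t_end = 0, 0, 0
    prev = [0] * (len(tgt) + 1)
    # Longest-common-substring dynamic programme between rc(oligo) and target.
    for i in range(1, len(rc) + 1):
        cur = [0] * (len(tgt) + 1)
        for j in range(1, len(tgt) + 1):
            if rc[i - 1] == tgt[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_rc_end, best_t_end = cur[j], i, j
        prev = cur
    rc_start = best_rc_end - best_len
    # rc[k] corresponds to oligo[len(oligo) - 1 - k], so map the run back.
    oligo_start = len(oligo) - rc_start - best_len
    return best_len, oligo_start, best_t_end - best_len


target = "CCCCC" + "GAUUACAGCCAUGGA" + "CCCCC"  # toy target RNA
oligo = "TCCATGGCTGTAATC" + "AAAAA"             # 20 nt: 15 complementary + 5 extra
length, o_start, t_start = longest_complementary_portion(oligo, target)
print(length, o_start, t_start)         # 15 0 5
print(oligo[o_start:o_start + length])  # TCCATGGCTGTAATC
```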
Complementarity can also be referenced in terms of the number of mismatches in complementary base pairing, as noted above. Thus, the inhibitory nucleic acid can comprise or consist of a sequence of bases with up to 3 mismatches over 10 contiguous bases of the target RNA, or up to 3 mismatches over 15 contiguous bases of the target RNA, or up to 3 mismatches over 20 contiguous bases of the target RNA, or up to 3 mismatches over 25 contiguous bases of the target RNA, or up to 3 mismatches over 30 contiguous bases of the target RNA. Similarly, the inhibitory nucleic acid can comprise or consist of a sequence of bases with up to 2 mismatches over 10 contiguous bases of the target RNA, or up to 2 mismatches over 15 contiguous bases of the target RNA, or up to 2 mismatches over 20 contiguous bases of the target RNA, or up to 2 mismatches over 25 contiguous bases of the target RNA, or up to 2 mismatches over 30 contiguous bases of the target RNA. Similarly, the inhibitory nucleic acid can comprise or consist of a sequence of bases with one mismatch over 10, 15, 20, 25 or 30 contiguous bases of the target RNA.
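Counting mismatches over a stated number of contiguous bases, as in "up to 3 mismatches over 15 contiguous bases," can be sketched as an exhaustive comparison of every equal-length pairing of an oligonucleotide portion with a target window. This brute-force approach is illustrative only (it is adequate for oligonucleotides of the lengths discussed here); it assumes Watson-Crick pairing only and uses toy sequences and hypothetical helper names.

```python
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def to_rna(seq: str) -> str:
    return seq.upper().replace("T", "U")


def count_mismatches(oligo_part: str, window: str) -> int:
    """Mismatched positions when `oligo_part` (5'->3') pairs antiparallel with an
    equal-length target `window` (5'->3'); G-U wobbles count as mismatches."""
    a, b = to_rna(oligo_part), to_rna(window)
    if len(a) != len(b):
        raise ValueError("lengths differ")
    return sum(PAIR[x] != y for x, y in zip(a, reversed(b)))


def fewest_mismatches(oligo: str, target: str, span: int) -> int:
    """Minimum mismatches over all pairings of a `span`-base portion of the
    oligo with a `span`-base contiguous window of the target."""
    if span > len(oligo) or span > len(target):
        raise ValueError("span exceeds a sequence length")
    return min(
        count_mismatches(oligo[i:i + span], target[j:j + span])
        for i in range(len(oligo) - span + 1)
        for j in range(len(target) - span + 1)
    )


def meets_mismatch_limit(oligo: str, target: str, span: int, max_mismatches: int) -> bool:
    """E.g. span=15, max_mismatches=3 for 'up to 3 mismatches over 15 bases'."""
    return fewest_mismatches(oligo, target, span) <= max_mismatches


target = "GAUUACAGCCAUGGA"  # toy 15-nt target window
oligo = "ACCATGGATGTAATA"   # full complement with 3 substituted positions
print(count_mismatches(oligo, target))             # 3
print(meets_mismatch_limit(oligo, target, 15, 3))  # True
print(meets_mismatch_limit(oligo, target, 15, 2))  # False
```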
In some or any of the embodiments of inhibitory nucleic acids described herein (e.g., in the summary, detailed description, or examples of embodiments) or the processes for designing or synthesizing them, the inhibitory nucleic acids may optionally exclude (a) any LNA that disrupts binding of PRC2 to an RNA, e.g., as described in WO 2012/087983 or WO 2012/065143; (b) any one or more of the specific inhibitory nucleic acids made or actually disclosed (i.e., specific chemistry, single or double-stranded, specific modifications, and specific base sequence), set forth in WO 2012/065143 or WO 2012/087983; and/or the general base sequence of any one or more of the inhibitory nucleic acids of (b); and/or (c) the group of inhibitory nucleic acids that specifically bind or are complementary to the same specific portion of RNA (a stretch of contiguous bases) as any one or more of the inhibitory nucleic acids of (a); as disclosed in any one or more of the following publications: as targeting ANRIL RNA (as described in Yap et al., Mol Cell. 2010 Jun. 11; 38(5):662-74), HOTAIR RNA (Rinn et al., 2007), Tsix, RepA, or Xist RNAs ((Zhao et al., 2008) [SEQ ID NOs: 936166-936170 of WO 2012/087983], or (Sarma et al., 2010) [SEQ ID NOs: 936177-936186 of WO 2012/087983] or (Zhao et al., 2010) [SEQ ID NOs: 936187-936188 of WO 2012/087983] or (Prasanth et al., 2005) [SEQ ID NOs: 936173-936176 of WO 2012/087983] or (Shamovsky et al., 2006) [SEQ ID NO: 936172 of WO 2012/087983] or (Mariner et al., 2008) [SEQ ID NO: 936171 of WO 2012/087983] or (Sunwoo et al., 2008) or (Bernard et al., 2010) [SEQ ID NO: 936189 of WO 2012/087983]; or as targeting short RNAs of 50-200 nt that are identified as candidate PRC2 regulators (Kanhere et al., 2010); or (Kuwabara et al., US 2005/0226848) [SEQ ID NOs: 936190-936191 of WO 2012/087983] or (Li et al., US 2010/0210707) [SEQ ID NOs: 936192-936227 of WO 2012/087983] or (Corey et al., U.S. Pat. No. 7,709,456) [SEQ ID NOs: 936228-936245] or (Mattick et al., WO 2009/124341), or (Corey et al., US 2010/0273863) [SEQ ID NOs: 936246-936265 of WO 2012/087983], or (Wahlstedt et al., US 2009/0258925) [SEQ ID NOs: 935060-935126 of WO 2012/087983], or BACE: US 2009/0258925 [SEQ ID NOs: 935060-935126 of WO 2012/087983]; ApoA1: US 2010/0105760/EP235283 [SEQ ID NOs: 935127-935299 of WO 2012/087983]; P73, p53, PTEN: WO 2010/065787 A2/EP2370582 [SEQ ID NOs: 935300-935345 of WO 2012/087983]; SIRT1: WO 2010/065662 A2/EP09831068 [SEQ ID NOs: 935346-935392 of WO 2012/087983]; VEGF: WO 2010/065671 A2/EP2370581 [SEQ ID NOs: 935393-935403 of WO 2012/087983]; EPO: WO 2010/065792 A2/EP09831152 [SEQ ID NOs: 935404-935412 of WO 2012/087983]; BDNF: WO 2010/093904 [SEQ ID NOs: 935413-935423 of WO 2012/087983]; DLK1: WO 2010/107740 [SEQ ID NOs: 935424-935430 of WO 2012/087983]; NRF2/NFE2L2: WO 2010/107733 [SEQ ID NOs: 935431-935438 of WO 2012/087983]; GDNF: WO 2010/093906 [SEQ ID NOs: 935439-935476 of WO 2012/087983]; SOX2, KLF4, Oct3A/B, “reprogramming factors”: WO 2010/135329 [SEQ ID NOs: 935477-935493 of WO 2012/087983]; Dystrophin: WO 2010/129861 [SEQ ID NOs: 935494-935525 of WO 2012/087983]; ABCA1, LCAT, LRP1, ApoE, LDLR, ApoA1: WO 2010/129799 [SEQ ID NOs: 935526-935804 of WO 2012/087983]; HgF: WO 2010/127195 [SEQ ID NOs: 935805-935809 of WO 2012/087983]; TTP/Zfp36: WO 2010/129746 [SEQ ID NOs: 935810-935824 of WO 2012/087983]; TFE3, IRS2: WO 2010/135695 [SEQ ID NOs: 935825-935839 of WO 2012/087983]; RIG1, MDA5, IFNA1: WO 2010/138806 [SEQ ID NOs: 935840-935878 of WO 2012/087983]; PON1: WO 2010/148065 [SEQ ID NOs: 935879-935885 of WO 2012/087983]; Collagen: WO/2010/148050 [SEQ ID NOs: 935886-935918 of WO 2012/087983]; Dyrk1A, Dscr1, “Down Syndrome Gene”: WO/2010/151674 [SEQ ID NOs: 935919-935942 of WO 2012/087983]; TNFR2: WO/2010/151671 [SEQ ID NOs: 935943-935951 of WO 2012/087983]; Insulin: WO/2011/017516 [SEQ ID NOs: 935952-935963 of WO 2012/087983]; ADIPOQ: WO/2011/019815 [SEQ ID NOs: 935964-935992 of WO 2012/087983]; CHIP: WO/2011/022606 [SEQ ID NOs: 935993-936004 of WO 2012/087983]; ABCB1: WO/2011/025862 [SEQ ID NOs: 936005-936014 of WO 2012/087983]; NEUROD1, EUROD1, HNF4A, MAFA, PDX, KX6, “Pancreatic development gene”: WO/2011/085066 [SEQ ID NOs: 936015-936054 of WO 2012/087983]; MBTPS1: WO/2011/084455 [SEQ ID NOs: 936055-936059 of WO 2012/087983]; SHBG: WO/2011/085347 [SEQ ID NOs: 936060-936075 of WO 2012/087983]; IRF8: WO/2011/082409 [SEQ ID NOs: 936076-936080 of WO 2012/087983]; UCP2: WO/2011/079263 [SEQ ID NOs: 936081-936093 of WO 2012/087983]; HGF: WO/2011/079261 [SEQ ID NOs: 936094-936104 of WO 2012/087983]; GH: WO/2011/038205 [SEQ ID NOs: 936105-936110 of WO 2012/087983]; IQGAP: WO/2011/031482 [SEQ ID NOs: 936111-936116 of WO 2012/087983]; NRF1: WO/2011/090740 [SEQ ID NOs: 936117-936123 of WO 2012/087983]; P63: WO/2011/090741 [SEQ ID NOs: 936124-936128 of WO 2012/087983]; RNAseH1: WO/2011/091390 [SEQ ID NOs: 936129-936140 of WO 2012/087983]; ALOX12B: WO/2011/097582 [SEQ ID NOs: 936141-936146 of WO 2012/087983]; PYCR1: WO/2011/103528 [SEQ ID NOs: 936147-936151 of WO 2012/087983]; CSF3: WO/2011/123745 [SEQ ID NOs: 936152-936157 of WO 2012/087983]; FGF21: WO/2011/127337 [SEQ ID NOs: 936158-936165 of WO 2012/087983]; SIRTUIN (SIRT): WO2011/139387 [SEQ ID NOs: 936266-936369 and 936408-936425 of WO 2012/087983]; PAR4: WO2011/143640 [SEQ ID NOs: 936370-936376 and 936426 of WO 2012/087983]; LHX2: WO2011/146675 [SEQ ID NOs: 936377-936388 and 936427-936429 of WO 2012/087983]; BCL2L11: WO2011/146674 [SEQ ID NOs: 936389-936398 and 936430-936431 of WO 2012/087983]; MSRA: WO2011/150007 [SEQ ID NOs: 936399-936405 and 936432 of WO 2012/087983]; ATOH1: WO2011/150005 [SEQ ID NOs: 936406-936407 and 936433 of WO 2012/087983]; each of the foregoing is incorporated by reference herein in its entirety.
In some or any of the embodiments, optionally excluded from the invention are inhibitory nucleic acids that specifically bind to, or are complementary to, any one or more of the following regions: Nucleotides 1-932 of SEQ ID NO: 935128 of WO 2012/087983; Nucleotides 1-1675 of SEQ ID NO: 935306 of WO 2012/087983; Nucleotides 1-518 of SEQ ID NO: 935307 of WO 2012/087983; Nucleotides 1-759 of SEQ ID NO: 935308 of WO 2012/087983; Nucleotides 1-25892 of SEQ ID NO: 935309 of WO 2012/087983; Nucleotides 1-279 of SEQ ID NO: 935310 of WO 2012/087983; Nucleotides 1-1982 of SEQ ID NO: 935311 of WO 2012/087983; Nucleotides 1-789 of SEQ ID NO: 935312 of WO 2012/087983; Nucleotides 1-467 of SEQ ID NO: 935313 of WO 2012/087983; Nucleotides 1-1028 of SEQ ID NO: 935347 of WO 2012/087983; Nucleotides 1-429 of SEQ ID NO: 935348 of WO 2012/087983; Nucleotides 1-156 of SEQ ID NO: 935349 of WO 2012/087983; Nucleotides 1-593 of SEQ ID NO: 935350 of WO 2012/087983; Nucleotides 1-643 of SEQ ID NO: 935395 of WO 2012/087983; Nucleotides 1-513 of SEQ ID NO: 935396 of WO 2012/087983; Nucleotides 1-156 of SEQ ID NO: 935406 of WO 2012/087983; Nucleotides 1-3175 of SEQ ID NO: 935414 of WO 2012/087983; Nucleotides 1-1347 of SEQ ID NO: 935426 of WO 2012/087983; Nucleotides 1-5808 of SEQ ID NO: 935433 of WO 2012/087983; Nucleotides 1-237 of SEQ ID NO: 935440 of WO 2012/087983; Nucleotides 1-1246 of SEQ ID NO: 935441 of WO 2012/087983; Nucleotides 1-684 of SEQ ID NO: 935442 of WO 2012/087983; Nucleotides 1-400 of SEQ ID NO: 935473 of WO 2012/087983; Nucleotides 1-619 of SEQ ID NO: 935474 of WO 2012/087983; Nucleotides 1-813 of SEQ ID NO: 935475 of WO 2012/087983; Nucleotides 1-993 of SEQ ID NO: 935480 of WO 2012/087983; Nucleotides 1-401 of SEQ ID NO: 935480 of WO 2012/087983; Nucleotides 1-493 of SEQ ID NO: 935481 of WO 2012/087983; Nucleotides 1-418 of SEQ ID NO: 935482 of WO 2012/087983; Nucleotides 1-378 of SEQ ID NO: 935496 of WO 2012/087983; Nucleotides 1-294 of SEQ ID NO: 935497 of WO 2012/087983; Nucleotides 1-686 of SEQ ID NO: 935498 of WO 2012/087983; Nucleotides 1-480 of SEQ ID NO: 935499 of WO 2012/087983; Nucleotides 1-501 of SEQ ID NO: 935500 of WO 2012/087983; Nucleotides 1-1299 of SEQ ID NO: 935533 of WO 2012/087983; Nucleotides 1-918 of SEQ ID NO: 935534 of WO 2012/087983; Nucleotides 1-1550 of SEQ ID NO: 935535 of WO 2012/087983; Nucleotides 1-329 of SEQ ID NO: 935536 of WO 2012/087983; Nucleotides 1-1826 of SEQ ID NO: 935537 of WO 2012/087983; Nucleotides 1-536 of SEQ ID NO: 935538 of WO 2012/087983; Nucleotides 1-551 of SEQ ID NO: 935539 of WO 2012/087983; Nucleotides 1-672 of SEQ ID NO: 935540 of WO 2012/087983; Nucleotides 1-616 of SEQ ID NO: 935541 of WO 2012/087983; Nucleotides 1-471 of SEQ ID NO: 935542 of WO 2012/087983; Nucleotides 1-707 of SEQ ID NO: 935543 of WO 2012/087983; Nucleotides 1-741 of SEQ ID NO: 935544 of WO 2012/087983; Nucleotides 1-346 of SEQ ID NO: 935545 of WO 2012/087983; Nucleotides 1-867 of SEQ ID NO: 935546 of WO 2012/087983; Nucleotides 1-563 of SEQ ID NO: 935547 of WO 2012/087983; Nucleotides 1-970 of SEQ ID NO: 935812 of WO 2012/087983; Nucleotides 1-1117 of SEQ ID NO: 935913 of WO 2012/087983; Nucleotides 1-297 of SEQ ID NO: 935814 of WO 2012/087983; Nucleotides 1-497 of SEQ ID NO: 935827 of WO 2012/087983; Nucleotides 1-1267 of SEQ ID NO: 935843 of WO 2012/087983; Nucleotides 1-586 of SEQ ID NO: 935844 of WO 2012/087983; Nucleotides 1-741 of SEQ ID NO: 935845 of WO 2012/087983; Nucleotides 1-251 of SEQ ID NO: 935846 of WO 2012/087983; Nucleotides 1-681 of SEQ ID NO: 935847 of WO 2012/087983; Nucleotides 1-580 of SEQ ID NO: 935848 of WO 2012/087983; Nucleotides 1-534 of SEQ ID NO: 935880 of WO 2012/087983; Nucleotides 1-387 of SEQ ID NO: 935889 of WO 2012/087983; Nucleotides 1-561 of SEQ ID NO: 935890 of WO 2012/087983; Nucleotides 1-335 of SEQ ID NO: 935891 of WO 2012/087983; Nucleotides 1-613 of SEQ ID NO: 935892 of WO 2012/087983; Nucleotides 1-177 of SEQ ID NO: 935893 of WO 2012/087983; Nucleotides 1-285 of SEQ ID NO: 935894 of WO 2012/087983; Nucleotides 1-3814 of SEQ ID NO: 935921 of WO 2012/087983; Nucleotides 1-633 of SEQ ID NO: 935922 of WO 2012/087983; Nucleotides 1-497 of SEQ ID NO: 935923 of WO 2012/087983; Nucleotides 1-545 of SEQ ID NO: 935924 of WO 2012/087983; Nucleotides 1-413 of SEQ ID NO: 935950 of WO 2012/087983; Nucleotides 1-413 of SEQ ID NO: 935951 of WO 2012/087983; Nucleotides 1-334 of SEQ ID NO: 935962 of WO 2012/087983; Nucleotides 1-582 of SEQ ID NO: 935963 of WO 2012/087983; Nucleotides 1-416 of SEQ ID NO: 935964 of WO 2012/087983; Nucleotides 1-3591 of SEQ ID NO: 935990 of WO 2012/087983; Nucleotides 1-875 of SEQ ID NO: 935991 of WO 2012/087983; Nucleotides 1-194 of SEQ ID NO: 935992 of WO 2012/087983; Nucleotides 1-2074 of SEQ ID NO: 936003 of WO 2012/087983; Nucleotides 1-1237 of SEQ ID NO: 936004 of WO 2012/087983; Nucleotides 1-4050 of SEQ ID NO: 936013 of WO 2012/087983; Nucleotides 1-1334 of SEQ ID NO: 936014 of WO 2012/087983; Nucleotides 1-1235 of SEQ ID NO: 936048 of WO 2012/087983; Nucleotides 1-17,964 of SEQ ID NO: 936049 of WO 2012/087983; Nucleotides 1-50,003 of SEQ ID NO: 936050 of WO 2012/087983; Nucleotides 1-486 of SEQ ID NO: 936051 of WO 2012/087983; Nucleotides 1-494 of SEQ ID NO: 936052 of WO 2012/087983; Nucleotides 1-1992 of SEQ ID NO: 936053 of WO 2012/087983; Nucleotides 1-1767 of SEQ ID NO: 936054 of WO 2012/087983; Nucleotides 1-1240 of SEQ ID NO: 936059 of WO 2012/087983; Nucleotides 1-3016 of SEQ ID NO: 936074 of WO 2012/087983; Nucleotides 1-1609 of SEQ ID NO: 936075 of WO 2012/087983; Nucleotides 1-312 of SEQ ID NO: 936080 of WO 2012/087983; Nucleotides 1-243 of SEQ ID NO: 936092 of WO 2012/087983; Nucleotides 1-802 of SEQ ID NO: 936093 of WO 2012/087983; Nucleotides 1-514 of SEQ ID NO: 936102 of WO 2012/087983; Nucleotides 1-936 of SEQ ID NO: 936103 of WO 2012/087983; Nucleotides 1-1075 of SEQ ID NO: 936104 of WO 2012/087983; Nucleotides 1-823 of SEQ ID NO: 936110 of WO 2012/087983; Nucleotides 1-979 of SEQ ID NO: 936116 of WO 2012/087983; Nucleotides 1-979 of SEQ ID NO: 936123 of WO 2012/087983; Nucleotides 1-288 of SEQ ID NO: 936128 of WO 2012/087983; Nucleotides 1-437 of SEQ ID NO: 936137 of WO 2012/087983; Nucleotides 1-278 of SEQ ID NO: 936138 of WO 2012/087983; Nucleotides 1-436 of SEQ ID NO: 936139 of WO 2012/087983; Nucleotides 1-1140 of SEQ ID NO: 936140 of WO 2012/087983; Nucleotides 1-2082 of SEQ ID NO: 936146 of WO 2012/087983; Nucleotides 1-380 of SEQ ID NO: 936151 of WO 2012/087983; Nucleotides 1-742 of SEQ ID NO: 936157 of WO 2012/087983; Nucleotides 1-4246 of SEQ ID NO: 936165 of WO 2012/087983; Nucleotides 1-1028 of SEQ ID NO: 936408 of WO 2012/087983; Nucleotides 1-429 of SEQ ID NO: 936409 of WO 2012/087983; Nucleotides 1-508 of SEQ ID NO: 936410 of WO 2012/087983; Nucleotides 1-593 of SEQ ID NO: 936411 of WO 2012/087983; Nucleotides 1-373 of SEQ ID NO: 936412 of WO 2012/087983; Nucleotides 1-1713 of SEQ ID NO: 936413 of WO 2012/087983; Nucleotides 1-660 of SEQ ID NO: 936414 of WO 2012/087983; Nucleotides 1-589 of SEQ ID NO: 936415 of WO 2012/087983; Nucleotides 1-726 of SEQ ID NO: 936416 of WO 2012/087983; Nucleotides 1-320 of SEQ ID NO: 936417 of WO 2012/087983; Nucleotides 1-616 of SEQ ID NO: 936418 of WO 2012/087983; Nucleotides 1-492 of SEQ ID NO: 936419 of WO 2012/087983; Nucleotides 1-428 of SEQ ID NO: 936420 of WO 2012/087983; Nucleotides 1-4041 of SEQ ID NO: 936421 of WO 2012/087983; Nucleotides 1-705 of SEQ ID NO: 936422 of WO 2012/087983; Nucleotides 1-2714 of SEQ ID NO: 936423 of WO 2012/087983; Nucleotides 1-1757 of SEQ ID NO: 936424 of WO 2012/087983; Nucleotides 1-3647 of SEQ ID NO: 936425 of WO 2012/087983; Nucleotides 1-354 of SEQ ID NO: 936426 of WO 2012/087983; Nucleotides 1-2145 of SEQ ID NO: 936427 of WO 2012/087983; Nucleotides 1-606 of SEQ ID NO: 936428 of WO 2012/087983; Nucleotides 1-480 of SEQ ID NO: 936429 of WO 2012/087983; Nucleotides 1-3026 of SEQ ID NO: 936430 of WO 2012/087983; Nucleotides 1-1512 of SEQ ID NO: 936431 of WO 2012/087983; Nucleotides 1-3774 of SEQ ID NO: 936432 of WO 2012/087983; and Nucleotides 1-589 of SEQ ID NO: 936433 of WO 2012/087983.
In some of the embodiments of inhibitory nucleic acids described herein, or processes for designing or synthesizing them, the inhibitory nucleic acids will upregulate gene expression and may specifically bind or specifically hybridize or be complementary to a sequence comprising a motif as described herein within the PRC1-binding RNA that is transcribed from the same strand as a protein-coding reference gene. The inhibitory nucleic acid may bind to a region of the PRC1-binding RNA that originates within or overlaps an intron, exon, intron-exon junction, 5′ UTR, 3′ UTR, a translation initiation region, or a translation termination region of a protein-coding sense strand of a reference gene (refGene).
In some or any of the embodiments of inhibitory nucleic acids described herein, or processes for designing or synthesizing them, the inhibitory nucleic acids will upregulate gene expression and may specifically bind or specifically hybridize or be complementary to a sequence comprising a motif as described herein within a PRC1-binding RNA that is transcribed from the opposite strand (the antisense strand) of a protein-coding reference gene.
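As a minimal illustration of the complementarity requirement above, the following Python sketch derives antisense sequences (written 5′ to 3′ in the DNA alphabet) whose target window on a PRC1-binding RNA fully contains an exact occurrence of a motif. The function names, the exact-string motif search, and the default 16-nt length are hypothetical choices made for illustration only; sugar and backbone modifications (for example, LNA positions in a mixmer) are not modeled.

```python
# Watson-Crick complements; accepts RNA (U) or DNA (T) input.
COMPLEMENT = {"A": "T", "U": "A", "T": "A", "G": "C", "C": "G"}


def antisense_oligo(target: str) -> str:
    """Return the 5'->3' sequence fully complementary to a 5'->3' target."""
    return "".join(COMPLEMENT[base] for base in reversed(target.upper()))


def oligos_covering_motif(transcript: str, motif: str, length: int = 16) -> list:
    """Hypothetical helper: antisense oligos of `length` nt whose target
    window contains each exact occurrence of `motif` in `transcript`."""
    transcript, motif = transcript.upper(), motif.upper()
    if len(motif) > length:
        raise ValueError("motif is longer than the oligo")
    oligos = []
    start = transcript.find(motif)
    while start != -1:
        # Window [w, w + length) contains [start, start + len(motif)) when
        # start + len(motif) - length <= w <= start; w must also stay on the transcript.
        first = max(0, start + len(motif) - length)
        last = min(start, len(transcript) - length)
        for w in range(first, last + 1):
            oligos.append(antisense_oligo(transcript[w:w + length]))
        start = transcript.find(motif, start + 1)
    return oligos
```

For example, `oligos_covering_motif("AAGGCUUAA", "GGC", length=4)` returns `["GCCT", "AGCC"]`, the two 4-nt antisense sequences whose targets span the GGC motif. Tiling every window is a deliberately simple design choice; practical selection would also weigh accessibility, off-target matches, and chemistry.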
The inhibitory nucleic acids described herein may be modified, e.g., comprise a modified sugar moiety, a modified internucleoside linkage, a modified nucleotide, and/or combinations thereof. In addition, the inhibitory nucleic acids can exhibit one or more of the following properties: do not induce substantial cleavage or degradation of the target RNA; do not cause substantially complete cleavage or degradation of the target RNA; do not activate the RNase H pathway; do not activate RISC; do not recruit any Argonaute family protein; are not cleaved by Dicer; do not mediate alternative splicing; are not immunostimulatory; are nuclease resistant; have improved cell uptake compared to unmodified oligonucleotides; are not toxic to cells or mammals; may have improved endosomal exit; do interfere with interaction of ncRNA with PRC1, preferably the Ezh2 subunit but optionally the Suz12, Eed, RbAp46/48 subunits or accessory factors such as Jarid2; do decrease histone H3-lysine27 methylation and/or do upregulate gene expression.
In some or any of the embodiments of inhibitory nucleic acids described herein, or processes for designing or synthesizing them, the inhibitory nucleic acids may optionally exclude those that bind DNA of a promoter region, as described in Kuwabara et al., US 2005/0226848 or Li et al., US 2010/0210707 or Corey et al., U.S. Pat. No. 7,709,456 or Mattick et al., WO 2009/124341, or those that bind DNA of a 3′ UTR region, as described in Corey et al., US 2010/0273863.
Inhibitory nucleic acids that are designed to interact with RNA to modulate gene expression are a distinct subset of base sequences from those that are designed to bind a DNA target (e.g., are complementary to the underlying genomic DNA sequence from which the RNA is transcribed).
This application incorporates by reference the entire disclosures of U.S. provisional No. 61/425,174 filed on Dec. 20, 2010, and 61/512,754 filed on Jul. 28, 2011, and International Patent Application Nos. PCT/US2011/060493, filed Nov. 12, 2011, and PCT/US2011/065939, filed on Dec. 19, 2011.
In some embodiments, the motif as described herein is a motif as shown in
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Methods and materials are described herein for use in the present invention; other, suitable methods and materials known in the art can also be used. The materials, methods, and examples are illustrative only and not intended to be limiting. All publications, patent applications, patents, sequences, database entries, and other references mentioned herein are incorporated by reference in their entirety. In case of conflict, the present specification, including definitions, will control.
Other features and advantages of the invention will be apparent from the following detailed description and figures, and from the claims.
The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.
(Panel A) Schematic workflow for dCLIP assay.
(Panel B) Representative dCLIP experiment. Left panel, autoradiography of dCLIP experiment. Right panel, Western blot with anti-CBX7 antibody for streptavidin pull-down samples. Lanes which contained input samples have been omitted for clarity. Red arrows, Biotagged-CBX7 signal. 3E and 6F are two clonal cell lines expressing physiological levels of Biotagged-CBX7. 3E and 6F are used as biological replicates for CBX7 dCLIP-seq libraries.
(Panel C) Representative CBX7 dCLIP and ChIP profiles for selected genes. DHS, DNAseI-hypersensitive sites from Vierstra et al (Vierstra et al., 2014). Orange boxes, LNA ASO cocktails. Red stars, primer pairs for ChIP-qPCR. Green hexagons, primer pairs for FAIRE-qPCR.
(Panel D) Strand-specific enriched peaks (called by “PeakRanger”) from three individual CLIP libraries were pooled, and overlapping peaks were merged into longer regions in a strand-specific manner. The length distribution of the enriched CLIP peaks, as well as the mean, median, and standard deviation, was calculated.
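A minimal Python sketch of the pooling-and-merging step described for Panel D follows. It assumes half-open, BED-style (chrom, start, end, strand) coordinates; PeakRanger's own output format and the exact merge rules used for the figure are not reproduced here. Peaks are merged only when they overlap on the same chromosome and strand, and the summary statistics are computed on the merged lengths.

```python
from statistics import mean, median, stdev


def merge_stranded_peaks(peaks):
    """Merge overlapping peaks separately for each (chrom, strand).

    peaks: iterable of (chrom, start, end, strand), half-open [start, end).
    Returns merged (chrom, start, end, strand) regions sorted by chrom, strand, start.
    """
    by_key = {}
    for chrom, start, end, strand in peaks:
        by_key.setdefault((chrom, strand), []).append((start, end))

    merged = []
    for (chrom, strand), intervals in sorted(by_key.items()):
        intervals.sort()
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start < cur_end:
                # Overlaps the current region on the same strand: extend it.
                cur_end = max(cur_end, end)
            else:
                merged.append((chrom, cur_start, cur_end, strand))
                cur_start, cur_end = start, end
        merged.append((chrom, cur_start, cur_end, strand))
    return merged


def length_stats(regions):
    """Mean, median and sample standard deviation of region lengths."""
    lengths = [end - start for _, start, end, _ in regions]
    if not lengths:
        raise ValueError("no regions")
    sd = stdev(lengths) if len(lengths) > 1 else 0.0
    return {"mean": mean(lengths), "median": median(lengths), "sd": sd}
```

In use, the three libraries would be concatenated before merging, e.g. `length_stats(merge_stranded_peaks(lib1 + lib2 + lib3))`, where `lib1` to `lib3` are hypothetical peak lists.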
(Panel E) Comparison of metagene profiles for CBX7 dCLIP-seq peaks and CBX7 ChIP-seq peaks. TSS, transcriptional start site. TTS, transcriptional termination site.
(Panel F) Correlation between gene expression levels and CLIP signal. Black, expressed RefSeq genes with reproducible dCLIP signal. Green, genes with the highest CLIP signals. Red, expressed genes with no reproducible CLIP signals.
(Panel A) Bioinformatics pipeline: Schematic workflow of search algorithm for CBX7 binding motifs.
(Panel B) Families of binding motifs identified for CBX7 dCLIP. Groups of motifs arranged into families according to similarity.
(Panel C) Abundance of motif families across different transcript features.
(Panel D) Box plot of FAM occupancy ratios, i.e., the number of putative CBX7 binding sites predicted by motif analysis and confirmed by dCLIP data divided by the total number of putative binding sites predicted by motif analysis. An occupancy ratio of 1 indicates that all putative binding sites in a specific gene's genomic feature were validated as CBX7-binding sites by CLIP-seq analysis. Black line depicts median. CDS, coding DNA sequences (coding exons).
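The occupancy ratio defined for Panel D can be computed as in the following sketch. It assumes that motif hits are given as single 0-based genomic positions, that dCLIP footprints are half-open stranded intervals, and that a hit counts as confirmed when it falls inside any footprint on the same chromosome and strand. These representations are assumptions for illustration, and gene or feature names in any usage are placeholders rather than data from the study.

```python
from collections import defaultdict
from statistics import median


def occupancy_ratios(predicted_sites, clip_regions):
    """Fraction of motif-predicted sites per (gene, feature) lying in a dCLIP footprint.

    predicted_sites: (gene, feature, chrom, strand, pos) tuples, 0-based pos.
    clip_regions: (chrom, start, end, strand) tuples, half-open [start, end).
    """
    totals = defaultdict(int)
    confirmed = defaultdict(int)
    for gene, feature, chrom, strand, pos in predicted_sites:
        key = (gene, feature)
        totals[key] += 1
        # Naive linear scan; an interval tree would scale better.
        if any(c == chrom and s == strand and start <= pos < end
               for c, start, end, s in clip_regions):
            confirmed[key] += 1
    return {key: confirmed[key] / totals[key] for key in totals}


def median_by_feature(ratios):
    """Median occupancy ratio per genomic feature (the line in each box)."""
    groups = defaultdict(list)
    for (_gene, feature), ratio in ratios.items():
        groups[feature].append(ratio)
    return {feature: median(values) for feature, values in groups.items()}
```

A ratio of 1.0 for a (gene, feature) pair corresponds to the case in the legend where every predicted site in that feature is validated by CLIP-seq.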
(Panel A) “Nearest neighbor” analysis for motif families. Note the strong tendency for FAM1 motifs to congregate next to each other.
(Panel B) Distance distribution between motif pairs. While certain motif pairs such as FAM3-FAM3 and FAM4-FAM4 occurred in a very close proximity, other motif pairs such as FAM4-FAM2 exhibited much broader spectra of inter-motif distances.
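The nearest-neighbor and pairwise-distance summaries of Panels A and B can be sketched as follows for motifs on a single transcript strand. This is an illustrative reconstruction rather than the published pipeline; representing motifs as (position, family) pairs and breaking distance ties by family name are arbitrary choices made here.

```python
from collections import Counter
from itertools import combinations


def nearest_neighbor_pairs(motifs):
    """Count (family, nearest-neighbor family) pairs.

    motifs: list of (position, family) on one transcript strand.
    Ties in distance are broken by family name (an arbitrary choice).
    """
    counts = Counter()
    for i, (pos, fam) in enumerate(motifs):
        others = [(abs(pos - p), f) for j, (p, f) in enumerate(motifs) if j != i]
        if others:
            _, neighbor_family = min(others)
            counts[(fam, neighbor_family)] += 1
    return counts


def pair_distances(motifs, fam_a, fam_b):
    """Sorted distances between every unordered motif pair of families fam_a and fam_b."""
    return sorted(
        abs(p1 - p2)
        for (p1, f1), (p2, f2) in combinations(motifs, 2)
        if {f1, f2} == {fam_a, fam_b}
    )
```

A family that mostly counts itself as its nearest neighbor, as reported for FAM1, would show a large (FAM1, FAM1) count, while `pair_distances` yields the per-pair distance spectra plotted in Panel B.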
(Panel C) Histogram plotting the number of CBX7 footprints (dCLIP fibers) with indicated adjacent FAM motifs in the same footprint. Note a tendency of motifs to congregate.
(Panel D) The FAM occupancy ratio on CBX7 footprints with a single FAM (upper graph) versus those with clustered FAMs (lower graph). Congregation of motifs in the 3′UTR regions is positively correlated with higher occupancy ratios, suggesting the possibility of cooperative binding.
(Panel A) CBX7 binding motifs bear a significant similarity to the binding motifs of known RNA binding proteins. hPDI motifs were adopted from Xie et al (Xie et al., 2010). RNAcompete motifs were adopted from Ray et al (Ray et al., 2013).
(Panel B) Effect of RNA secondary structure probed in vivo and in vitro on CBX7 RNA binding. icSHAPE profiles centered on the genomic sequences predicted as carrying CBX7 binding motifs. icSHAPE analysis was adopted from Spitale et al (Spitale et al., 2015). Purified RNA molecules were subjected to treatment with icSHAPE reagent in vitro or isolated from cells exposed to icSHAPE reagent in cell culture (in vivo). The extent of RNA folding in a particular region is determined by its accessibility to modification by icSHAPE reagent, with higher icSHAPE signal representing more open structure. RealFAM, sequences predicted by motif analysis and confirmed as actual CBX7 binding sites by dCLIP. PredictedFAM, sequences predicted by motif analysis but lacking a dCLIP signal. Note that despite a marked similarity of the curves between binding and non-binding FAM sequences, the average icSHAPE signal in actual binding sequences is significantly higher than in non-binding FAM sequences, reflecting more open overall structures.
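Averaging reactivity around motif centers, as in the icSHAPE profiles above, reduces to a simple metaprofile computation. The sketch below assumes a per-nucleotide reactivity list for one transcript (None where no value was measured) and 0-based motif centers; skipping windows that run off either end of the transcript is an assumption here, not a documented rule of the original analysis.

```python
def centered_profile(signal, centers, flank):
    """Average per-position reactivity in windows [c - flank, c + flank].

    signal: list of floats, or None where no reactivity was measured.
    centers: 0-based motif center positions on the same transcript.
    Returns one average per window offset (None where no data were seen).
    """
    width = 2 * flank + 1
    sums = [0.0] * width
    counts = [0] * width
    for c in centers:
        if c - flank < 0 or c + flank >= len(signal):
            continue  # window runs off the transcript: skip it
        for k in range(width):
            value = signal[c - flank + k]
            if value is not None:
                sums[k] += value
                counts[k] += 1
    return [s / n if n else None for s, n in zip(sums, counts)]
```

Computing one profile for RealFAM centers and another for PredictedFAM centers, then comparing their averages, mirrors the comparison described in the legend.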
(Panel A) Validation of CBX7 dCLIP data for selected transcripts by nRIP-qPCR. Average fold-enrichment over IgG control is plotted, with standard deviations (error bars). U1 small nuclear RNA, negative control.
(Panel B) RNA-EMSAs with purified CBX7 (5.6 μM) and in vitro transcribed RNAs demonstrate direct RNA-protein interactions. Concentrations of RNA: 26.5 nM for Dcaf12l1, 62 nM for Dusp9, 37.9 nM for Calm2. Green arrows, unbound RNA probes. Red asterisks, bound and shifted RNA-protein complexes. Blue arrows, LNA-shifted probes. Red arrowheads, supershifted complexes after gene-specific LNA addition.
(Panel C) Representative EMSA showing titration of CBX7 protein against fixed concentration (40 nM) of the 3′UTR fragment of Dcaf12l1. Red arrows, unbound probe. Black arrows, shifted CBX7-RNA complexes of different mobilities.
(Panel D) Competition assay: Shift of 40 nM Dcaf12l1 3′UTR probe by 2.8 μM of CBX7 is competed away by excess cold Dcaf12l1. Red arrows, unbound probe. Black arrows, shifted CBX7-RNA complexes of different mobilities.
(Panel E) Binding curves for CBX7-3′UTR interactions for selected transcripts. Kd's and Hill coefficients determined by fitting datapoints to sigmoidal plots by non-linear regression (STAR Methods).
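As a stdlib-only illustration of how a dissociation constant and a Hill coefficient describe these binding curves, the sketch below fits the Hill equation θ = [P]^n / (Kd^n + [P]^n) through its linearized Hill plot, log(θ/(1 − θ)) = n·log[P] − n·log Kd. This is a swapped-in technique: the Panel E values were obtained by non-linear regression, which weights data points differently. The sketch also assumes that the added CBX7 concentration approximates the free protein concentration, which is reasonable only when protein is in large excess over RNA.

```python
import math


def hill_fraction_bound(p, kd, n):
    """Hill equation: fraction of RNA bound at protein concentration p."""
    return p ** n / (kd ** n + p ** n)


def fit_hill_linearized(concs, fractions):
    """Estimate (Kd, n) by least squares on the Hill plot.

    Uses log(theta / (1 - theta)) = n*log(P) - n*log(Kd); only points with
    P > 0 and 0 < theta < 1 can be transformed. Kd is returned in the units of concs.
    """
    xs, ys = [], []
    for p, theta in zip(concs, fractions):
        if p > 0 and 0 < theta < 1:
            xs.append(math.log(p))
            ys.append(math.log(theta / (1 - theta)))
    if len(xs) < 2:
        raise ValueError("need at least two points with 0 < fraction < 1")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    n = sxy / sxx                   # slope = Hill coefficient
    intercept = my - n * mx         # intercept = -n * log(Kd)
    kd = math.exp(-intercept / n)
    return kd, n
```

A fitted n greater than 1 is the conventional signature of positive cooperativity. On noisy gel-shift data, the linearization over-weights points near 0 or 1, which is one reason direct non-linear regression is usually preferred.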
(Panel F) RNA-EMSAs with purified CBX7 (5.6 μM) and 100 nM of in vitro-transcribed wildtype oligos bearing a single FAM motif versus their mutated versions. Green arrows, unbound RNA probes. Red arrows, bound and shifted RNA-protein complexes.
(Panel G) FAM3 competition EMSA using 2.8 μM CBX7 and 100 nM labeled Nucks1-FAM3 RNA probe. Increasing concentrations of Nucks1-FAM3 cold competitor were added, as indicated. Green arrows, unbound RNA probes. Red arrows, bound and shifted RNA-protein complexes.
(Panel A) LNA administration resulted in gene upregulation, as shown by RT-qPCR. LNA cocktails are used for each transcript (see
(Panel B) ChIP-qPCR for CBX7 localization and H2AK119Ub levels after LNA administration, as indicated. IgG, control ChIP pulldown. Data presented are averages ± S.D. of at least three biological replicates.
(Panel C) FAIRE-qPCR analysis of chromatin compaction in the promoter regions (DNase-sensitive regions) versus regions corresponding to CBX7 ChIP peaks (DNase-resistant regions) following LNA treatment. Values were normalized to the β-actin locus (constant). Averages ± S.D. of at least three biological replicates are shown.
(Panel D) RT-qPCR to determine effect of LNA administration on nascent transcripts (intronic primer pairs) compared to total mRNA (inter-exonic primer pairs) levels. Expression levels are relative to those of cells treated with scrambled LNA. P values determined by t-test from 3 biological replicates.
(Panel E) Dcaf12l1 upregulation after LNA treatment depends on CBX7. Relative Dcaf12l1 expression in Cbx7−/− versus wildtype ES cells. RT-qPCR of nascent (intronic primer pairs) versus processed mRNA (inter-exonic primer pairs) is shown. P values determined by t-test from 3 biological replicates.
(Panel F) Probability density functions for CBX7-bound versus unbound transcripts. Relative FPKM values are determined from RNA-seq of the ES cells in which dCLIP was performed. Note that bound transcripts have a tendency towards higher expression.
(Panel G) Western immunoblot for DCAF12L1 and loading control CTCF protein. Western analysis is quantitative and showed a linear response between 2.5 and 20.0 μl of extract for both proteins. The standard curve for the Western analysis displayed squared correlation coefficients (R2) of approximately 1.0, suggesting an excellent fit of the curve to observed values.
(Panel H) One example of quantitative Western blot analysis for expression of DCAF12L1 protein following treatment with LNA oligomers. Densitometric analysis was performed and values are normalized to control-LNA-treated samples. Western immunoblots appearing in panels G and H, which were part of images generated by Chemidoc MP Imaging System (as described under STAR Methods) were cropped from their original context and recomposed into separate panels for presentation purposes.
(Panel I) Average of three biological replicates of quantitative Western blot analysis for DCAF12L1 protein. Values are fold-changes in protein signal compared to cells treated with scrambled LNA. P values determined by t-test from 3 biological replicates.
(Panel A) Length distribution frequency of the enriched hCBX7 dCLIP peaks, as well as mean, median, and standard deviation.
(Panel B) Metagene profiles for hCBX7 dCLIP-seq peaks shows enrichment at the 3′ end of mRNAs. TSS, transcriptional start site. TTS, transcriptional termination site.
(Panel C) Representative hCBX7 dCLIP profile. BMI1 analysis was performed on previous GRIP dataset (Ray et al., 2016).
(Panel D) Similarity analysis for families of binding motifs identified for hCBX7 and mCBX7 dCLIP. Groups of motifs arranged according to similarity. Note partial clustering of human and mouse motifs.
(Panel E) Validation of CBX7 dCLIP data for selected transcripts by dCLIP-qPCR. Average fold-enrichment over GFP control is plotted, with standard deviations (error bars). PES1 served as a negative control that did not exhibit significant binding to CBX7. P values were determined by t-test from 3 biological replicates.
(Panel F) hCBX7 motifs bear significant similarity to motifs of known RNA binding proteins. hPDI motifs were adopted from Xie et al (Xie et al., 2010). RNAcompete motifs were adopted from Ray et al (Ray et al., 2013).
(Panel A) Representative autoradiography of CLIP experiment using specific antibodies against CBX7 and RYBP. Rabbit IgG and anti-Sox2 antibodies were used as controls. Expected sizes of CBX7 and RYBP proteins are marked by red and green arrowheads, respectively. Note a strong background around 40 kDa, which was observed for both CBX7 and RYBP proteins and could not be removed even with washes of up to 1 M salt, as outlined in STAR Methods.
(Panel B) Representative autoradiography of CLIP experiment with anti-HA tag antibody. 6C and 12D are two clonal cell lines expressing physiological levels of HA-tagged CBX7. Red arrowhead, HA-CBX7-related signal. Note the presence of strong background with anti-HA antibody, similar to anti-CBX7 and anti-RYBP CLIP in (A).
(Panel A) Representative dCLIP experiment for RYBP protein. Left panel, autoradiography of dCLIP experiment. Right panel, Western blot with anti-RYBP antibody. Red arrows, Biotagged-RYBP signal. 1A and 3H are two clonal cell lines expressing physiological levels of Biotagged-RYBP.
(Panel B) Representative dCLIP experiment performed simultaneously for CBX7 and RYBP proteins. Note a much weaker signal for RYBP (green asterisk) compared to CBX7 (red asterisk).
(Panel C) Radioactively labeled RNA from the dCLIP experiment was extracted from the nitrocellulose membrane and subjected to DNase or RNase treatment. Subsequently, denaturing PAGE was performed and the resulting gel was exposed to a phosphorimaging screen. Note that the radioactive signal was specifically eliminated by RNase treatment, whereas DNase treatment had no visible effect.
Dusp9 RNA dCLIP profile in (Panel A) and Nucks1 RNA dCLIP profile in (Panel B) were examined to assess differences between two RNA extraction methods—elution directly from beads vs. SDS-PAGE purification, nitrocellulose membrane transfer, with elution of RNA from membrane. Note that in both cases, RYBP presented a weaker signal compared to CBX7.
(Panel A) A scatter plot of gene expression values derived from RNA-seq data of two control lines and two Biotag-CBX7 expressing lines. Note the lack of significant change in overall gene expression. Average FPKM values for endogenous CBX7 expression are 44.38 for control cells versus 46.03 for Biotag-CBX7 cells.
(Panel B) Genome-wide pairwise comparisons of enriched dCLIP peaks over 1 kb bins across three biological replicates (see STAR Methods for details). Note a positive correlation between individual replicates.
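A replicate comparison over 1 kb bins might be sketched as follows. The STAR Methods details are not reproduced here, so counting peaks by their midpoints and restricting the comparison to bins occupied in at least one replicate are assumptions; Pearson's r is computed directly to keep the example dependency-free.

```python
import math
from collections import Counter


def bin_counts(peaks, bin_size=1000):
    """Count peaks per (chrom, bin index), assigning each peak by its midpoint."""
    counts = Counter()
    for chrom, start, end in peaks:
        counts[(chrom, ((start + end) // 2) // bin_size)] += 1
    return counts


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def replicate_correlation(peaks_a, peaks_b, bin_size=1000):
    """Correlate per-bin peak counts over bins occupied in either replicate."""
    ca, cb = bin_counts(peaks_a, bin_size), bin_counts(peaks_b, bin_size)
    bins = sorted(set(ca) | set(cb))
    # Counter returns 0 for bins a replicate never hit.
    return pearson([ca[b] for b in bins], [cb[b] for b in bins])
```

Running `replicate_correlation` on each of the three replicate pairs gives the pairwise values that a scatter-plot comparison like Panel B summarizes.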
(Panel C) Probability density functions for CBX7-bound versus the bulk of expressed transcripts. Relative FPKM values are determined from RNA-seq of the ES cells in which dCLIP was performed. Note that bound transcripts have a tendency towards higher expression.
Calm2 (Top Panel) mRNAs represents high binders (green,
(Panel A) CEAS analysis for CBX7 dCLIP-seq peaks (right pie) with enrichment for each genomic feature shown relative to the overall ES transcriptome profile (left pie).
(Panel B) CEAS analysis for CBX7 ChIP-seq peaks (right pie) with enrichment for each genomic feature shown relative to the overall ES genomic profile (left pie).
(Panel C) Comparison of CBX7 enrichment in distinct genomic features for CBX7 dCLIP-seq vs CBX7 ChIP-seq.
(Panel D) To assess the relationship between CBX7 binding to RNA vs. DNA, we determined the number of CBX7-bound loci and the number of CBX7-bound transcripts in ES cells. Among the 1,333 transcripts with CBX7 binding sites, only 12% were associated with a CBX7 ChIP peak in the same RefSeq locus, inclusive of promoter region. For bulk expressed transcripts in ES cells, the percentage was significantly greater. To compare the 1,333 transcripts to bulk transcripts, we performed 1,000 rounds of random sampling in each cohort. CBX7 binding to target transcripts is inversely correlated with recruitment of CBX7 to chromatin.
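The 1,000 rounds of random sampling can be illustrated with the helper below, which returns, for each round, the fraction of sampled transcripts whose RefSeq locus carries a CBX7 ChIP peak. Sampling without replacement and the per-round sample size are assumptions, since the text does not state them; `flags` is a hypothetical per-transcript boolean list.

```python
import random
from statistics import mean, stdev


def resample_fraction(flags, sample_size, rounds=1000, seed=0):
    """Fraction of True flags in each of `rounds` random subsamples.

    flags: one bool per transcript (True = CBX7 ChIP peak in the same RefSeq
    locus, promoter included). Sampling is without replacement (assumed).
    """
    if not 0 < sample_size <= len(flags):
        raise ValueError("sample_size must be between 1 and len(flags)")
    rng = random.Random(seed)  # fixed seed for reproducible rounds
    return [sum(rng.sample(flags, sample_size)) / sample_size for _ in range(rounds)]


def summarize(fractions):
    """Mean and standard deviation of the resampled fractions."""
    return mean(fractions), stdev(fractions)
```

Applying `summarize(resample_fraction(...))` once to the CBX7-bound cohort and once to bulk expressed transcripts, with the same sample size, yields the two distributions being compared.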
icSHAPE profiles centered on the genomic sequences predicted as carrying CBX7 binding motifs. icSHAPE analysis was adopted from Spitale et al (Spitale et al., 2015). Purified RNA molecules were subjected to treatment with icSHAPE reagent in vitro or isolated from cells exposed to icSHAPE reagent in cell culture (in vivo). The extent of RNA folding in a particular region is determined by its accessibility to modification by icSHAPE reagent, with higher icSHAPE signal representing more open structure. FAM_Sing, a single motif per dCLIP fiber. FAM Mult, multiple motifs per dCLIP fiber. Note that despite a marked similarity of the curves between single and multiple FAM sequences, the average icSHAPE signal in FAM1 and FAM4 is significantly higher than in clustered motif sequences, reflecting more open overall structures.
(Panel A) Western blotting with specific CBX7 antibody to confirm the absence of CBX7 protein in the knockout line. Beta-tubulin served as loading control.
(Panel B) Gene expression analysis of Cbx7 knockout cells. RT-qPCR experiments with fold-change expression values in Cbx7−/− cells compared to expression in Cbx7+/+ cells. While the 5′ region of Cbx7 mRNA was still expressed, the 3′ region was absent, consistent with the knockout scheme described previously (Cheng et al., 2014). Cbx8 is a positive control. Notably, it is known that CBX8 is upregulated in ES cells when CBX7 is depleted, in order to maintain stem cell self-renewal (Morey et al., 2012; O'Loghlen et al., 2012) (
(Panel A) Schematic domain structure of CBX7 protein. CD, chromodomain, which is involved in binding to methylated lysines and RNA. Note the addition of 58 aa between the CD and PC-box domains in the human isoform.
(Panel B) Clustal Omega protein sequence alignment between mouse and human CBX7 isoforms. Note that despite the addition of 58 aa to human CBX7 in the course of evolution, a very high degree of similarity in the CD and PC-box domains has persisted.
(Panel C, Panel D) CEAS analysis for CBX7 CLIP-seq peaks (right pie) with enrichment for each genomic feature shown relative to the overall ES transcriptome profile (left pie).
Table 1—Matrix of FBPs presented in
While it is now established that many chromatin-modifying complexes interact with RNA (Magistri et al., 2012), a major obstacle in understanding the regulation and function of such interactions has been the difficulty of identifying specific RNA motifs. For instance, interactions between RNA and Polycomb repressive complexes have served as a leading model in our understanding of RNA-protein interactions at the chromatin interface (Khalil et al., 2009; Zhao et al., 2010), but definitive RNA motifs have yet to be identified. Such motifs could exist in the primary RNA sequence or as specific 3D structures. At present, proposed motifs have come either from in vitro binding studies that have yet to be validated in vivo (Wang et al., 2017), or from in vivo binding data that yielded whole transcripts or very large footprints of >1 kb (Beltran et al., 2016; Hendrickson et al., 2016; Kaneko et al., 2014a; Kaneko et al., 2014b; Kaneko et al., 2013).
Revealing binding motifs would require a high-fidelity method of generating RNA-binding footprints at a transcriptome-wide level, i.e., footprints that represent the protein-binding site on the RNA. While current methodologies have been excellent for highly abundant proteins, including cytoplasmic RNA-binding proteins (Marchese et al., 2016), nuclear epigenetic complexes have presented a greater challenge because of their chromatin association and (hence) a less soluble nature. Such proteins also tend to exist in multi-subunit complexes, with the potential to have several points of contact within a long transcript. New, highly stringent methods that complement existing techniques are therefore much needed in order to obtain a well-rounded view of specific RNA-protein networks.
A major limitation of most existing methodologies is the reliance on antibodies for specific purification of protein-RNA complexes. The relatively low, nanomolar-range affinities of antibody-antigen interactions have direct consequences for antibody-based CLIP methods, as they constrain the stringency of washes during the purification step. Because washes must not disrupt the antibody-antigen interaction, nonspecific RNAs cannot be removed efficiently prior to elution. To solve this problem, here we develop “dCLIP” (denaturing CLIP) and provide proof-of-concept in two systems. We show that dCLIP can be applied to both mouse and human CBX7 protein to reveal specific RNA footprints, from which consensus motifs and functionally relevant binding sites can be deduced. We chose the CBX7 subunit of canonical PRC1 for its biological importance. CBX7 is highly expressed in embryonic stem (ES) cells and plays an essential role in maintaining stem cell pluripotency (Morey et al., 2012; O'Loghlen et al., 2012). Existing studies have hinted that CBX7's RNA-binding activity may be critical to its epigenomic function. It is known that CBX7 localization to chromatin depends on its RNA-binding domain, and one RNA (ANRIL) is known to negatively regulate the INK4a locus through CBX7 (Bernstein et al., 2006; Yap et al., 2010). Below we demonstrate that CBX7 interacts with a large family of messenger RNAs (mRNAs), identify short RNA footprints, and develop a bioinformatic pipeline to uncover specific functional motifs.
Here we have developed the denaturing CLIP (dCLIP) methodology and identified a large RNA interactome for CBX7 in human and mouse cells. Interestingly, CBX7 interacts predominantly with mRNA—a somewhat unexpected finding given that previous work with the BMI1 subunit indicated a preference for noncoding RNA (Ray et al., 2016). However, CBX7 is unlike the other CBX isoforms (CBX2, 4, 6, 8) in that it lacks the signature polynucleosome compaction function (Grau et al., 2011). Indeed, our present analysis indicates that CBX7, when associated with the 3′UTR of mRNAs, does not compact or modulate chromatin. Rather, CBX7 is paradoxically associated with a gene upregulatory function. Thus, the RNA-bound CBX7-containing form of PRC1 may not operate as a repressive complex in the same way as PRC1 complexes that contain compaction-competent CBX isoforms. Together, these observations raise the possibility that the immensely heterogeneous PRC1 complexes (as defined by their distinct subunit compositions) may bind different types of transcripts and serve diverse gene regulatory functions, both positive and negative in nature. Recent work with the EZH2 subunit of PRC2 has also revealed direct positive effects on gene regulation (Zovoilis et al., 2016). Thus, although Polycomb proteins have largely been associated with gene-repressive activities, they can serve gene-upregulatory functions in specific instances.
Our current work provides proof-of-concept for the dCLIP methodology. We suggest that dCLIP can complement a number of existing methods, each offering various pros and cons. A recent popular method is eCLIP (Van Nostrand et al., 2016), which relies on antibody-antigen interactions for RNA precipitation and can be applied to any endogenous protein with good antibodies. Similarly, nRIP and fRIP can also be applied to a wide range of proteins without the need for construction of affinity tags (Hendrickson et al., 2016; Ray et al., 2016; Zhao et al., 2010). These methods have all provided valuable information regarding nuclear RNA-protein networks. What dCLIP offers is a complementary view with certain advantages. One key feature is the highly stringent conditions that enable separation (through denaturation) of tightly associated protein complexes into individual components, which therefore makes possible the assessment of RNA binding activities of a single component within the complex.
Another major advantage of the dCLIP method is that it yields a high signal-to-noise ratio and generates reproducible footprints with median sizes of 171 nt (mouse) and 183 nt (human). The small footprints enabled us to identify consensus binding motifs in the RNA that are concordant between two species. We identified families of motifs that tend to co-cluster in the 3′UTR and that share significant similarities between species (mCBX7, hCBX7). While the overall binding affinity of CBX7 to any one FAM is relatively low (Kd in the micromolar range), our data suggest a potential for positive cooperativity that could considerably boost binding dynamics in cells. First, icSHAPE analysis showed that FAM clustering predisposes to an open RNA conformation in vivo (
The mRNA upregulation following the administration of FAM-targeted LNAs is reminiscent of the RNA-upregulation seen after targeting PRC2-RNA interactions with LNAs against the long noncoding RNA, SMN-AS1, for human spinal muscular atrophy locus (Woo et al., 2017). In the case of SMN-AS1, the LNA blocked PRC2 from binding to the antisense regulatory transcript for SMN2 and thereby prevented the deposition of the repressive H3K27me3 mark. Interestingly, however, chromatin assays suggest that our CBX7-mediated upregulation was not due to reduced levels of the repressive H2AK119Ub mark, nor was it due to increased chromatin accessibility. These findings suggested a co-transcriptional and/or post-transcriptional effect. Indeed, the gene-specific LNAs can increase the steady state levels of both nascent and processed mRNA. Furthermore, Western blot analysis indicated that mRNA upregulation was accompanied by increased protein expression. Potential mechanisms include enhanced transcriptional elongation, RNA splicing, mRNA stability, improved export, or increased translation. One possible hint may come from the paradoxical finding that the mixmer LNAs enhanced (rather than blocked) the CBX7-3′UTR interactions, producing a strong supershift in gel retardation assays. Thus, the binding of CBX7 to the 3′UTR may play a role in transcript stabilization and processing, rather than in chromatin modulation. Notably, our data show that mRNAs bound by CBX7 have a higher probability of expression than transcripts not associated with CBX7 (
Methods of Modulating Gene Expression
The inhibitory nucleic acids and small molecules targeting (e.g., complementary to) a PRC1 binding RNA can be used to modulate gene expression in a cell, e.g., a cancer cell, a stem cell, or other normal cell types for gene or epigenetic therapy. The cells can be in vitro, including ex vivo, or in vivo (e.g., in a subject who has cancer, e.g., a tumor).
In various related aspects, including with respect to the targeting of RNAs by LNA molecules, PRC1-binding RNAs can include endogenous coding and non-coding cellular RNAs, including but not limited to those RNAs that are greater than 60 nt in length, e.g., greater than 100 nt, e.g., greater than 200 nt, have no positive-strand open reading frames greater than 100 amino acids in length, are identified as ncRNAs by experimental evidence, and are distinct from known (smaller) functional-RNA classes (including but not limited to ribosomal, transfer, and small nuclear/nucleolar RNAs, siRNA, piRNA, and miRNA). See, e.g., Lipovich et al., “MacroRNA underdogs in a microRNA world: Evolutionary, regulatory, and biomedical significance of mammalian long non-protein-coding RNA” Biochimica et Biophysica Acta (2010) doi:10.1016/j.bbagrm.2010.10.001; Ponting et al., Cell 136(4):629-641 (2009), Jia et al., RNA 16 (8) (2010) 1478-1487, Dinger et al., Nucleic Acids Res. 37 1685 (2009) D122-D126 (database issue); and references cited therein. ncRNAs have also been referred to as, and can include, long non-coding RNA, long RNA, large RNA, macro RNA, intergenic RNA, and NonCoding Transcripts.
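The length and open-reading-frame criteria above can be screened computationally, as in the following sketch. It assumes that an ORF starts at ATG, ends before the first in-frame stop codon (TAA, TAG, TGA), and, if no stop is found, runs to the end of the sequence; only the positive strand is scanned, and the experimental-evidence criterion cannot be checked in code. The thresholds are parameters so that the 60, 100, or 200 nt variants can each be applied.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_aa(seq):
    """Length in amino acids of the longest ATG-initiated ORF on the positive strand."""
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
        start = None  # codon index of the currently open ATG, if any
        for idx, codon in enumerate(codons):
            if start is None and codon == "ATG":
                start = idx
            elif start is not None and codon in STOP_CODONS:
                best = max(best, idx - start)  # codons from ATG up to the stop
                start = None
        if start is not None:
            best = max(best, len(codons) - start)  # ORF open at the 3' end
    return best


def passes_ncrna_filter(seq, min_len=200, max_orf_aa=100):
    """True if longer than min_len nt with no positive-strand ORF above max_orf_aa."""
    return len(seq) > min_len and longest_orf_aa(seq) <= max_orf_aa
```

Only ATG-initiated reading frames are counted here; a stricter screen might treat any stop-free stretch as a potential ORF, which would reject more transcripts.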
The methods described herein can be used to target both coding and non-coding RNAs. Known classes of RNAs include large intergenic non-coding RNAs (lincRNAs, see, e.g., Guttman et al., Nature. 2009 Mar. 12; 458(7235):223-7. Epub 2009 Feb. 1, which describes over a thousand exemplary highly conserved large non-coding RNAs in mammals; and Khalil et al., PNAS 106(28)11675-11680 (2009)); promoter associated short RNAs (PASRs; see, e.g., Seila et al., Science. 2008 Dec. 19; 322(5909):1849-51. Epub 2008 Dec. 4; Kanhere et al., Molecular Cell 38, 675-688, (2010)); endogenous antisense RNAs (see, e.g., Numata et al., BMC Genomics. 10:392 (2009); Okada et al., Hum Mol Genet. 17(11):1631-40 (2008); Numata et al., Gene 392(1-2):134-141 (2007); and Rosok and Sioud, Nat Biotechnol. 22(1):104-8 (2004)); and RNAs that bind chromatin modifiers such as PRC2 and LSD1 (see, e.g., Tsai et al., Science. 2010 Aug. 6; 329(5992):689-93. Epub 2010 Jul. 8; and Zhao et al., Science. 2008 Oct. 31; 322(5902):750-6).
Exemplary ncRNAs include XIST, TSIX, SRA1, and KCNQ1OT1. The sequences for more than 17,000 long human ncRNAs can be found in the NCode™ Long ncRNA Database on the Invitrogen website. Additional long ncRNAs can be identified using, e.g., manual published literature, Functional Annotation of Mouse (FANTOM3) project, Human Full-length cDNA Annotation Invitational (H-Invitational) project, antisense ncRNAs from cDNA and EST database for mouse and human using a computation pipeline (Zhang et al., Nucl. Acids Res. 35 (suppl 1): D156-D161 (2006); Engstrom et al., PLoS Genet. 2:e47 (2006)), human snoRNAs and scaRNAs derived from snoRNA-LBME-db, RNAz (Washietl et al. 2005), Noncoding RNA Search (Torarinsson, et al. 2006), and EvoFold (Pedersen et al. 2006).
A transcriptome of exemplary PRC1-binding RNAs that can be targeted with the present methods is described in WO 2016/149455, which is incorporated by reference herein in its entirety. See, e.g., Table 1 of WO 2016/149455: Human CBX7-RNA binding sites as determined by denaturing CLIP-seq analysis in Human 293 cells. All coordinates in hg19. The columns (c) correspond to: c1, SEQ ID Number. c2, Chromosome number. c3, Read start position. c4, Read end position. c5, chromosome strand that the transcript is made from (+, top or Watson strand; −, bottom or Crick strand of each chromosome). c6, nearest gene name. c7, gene categories as defined in Example 2.
See also Table 2 of WO 2016/149455: Human LiftOver sequences corresponding to CBX7-RNA binding sites as determined by denaturing CLIP-seq analysis in mouse ES cells. All coordinates are in hg19. CBX7-binding sites derived from CLIP-seq performed in the mouse ES cell line EL16.7, as shown in Table 3, are translated from mouse mm9 to human hg19 coordinates.
In addition, see Table 3 of WO 2016/149455: Mouse CBX7-RNA binding sites as determined by denaturing CLIP-seq analysis in ES cells derived from Mus musculus. All coordinates are in mm9. CLIP-seq was performed in the mouse ES cell line EL16.7; CBX7 binding sites in the RNA are shown.
Calculations of homology or sequence identity between sequences (the terms are used interchangeably herein) are performed as follows.
To determine the percent identity of two nucleic acid sequences, the sequences are aligned for optimal comparison purposes (e.g., gaps can be introduced in one or both of a first and a second amino acid or nucleic acid sequence for optimal alignment and non-homologous sequences can be disregarded for comparison purposes). The length of a reference sequence aligned for comparison purposes is at least 80% of the length of the reference sequence, and in some embodiments is at least 90% or 100%. The nucleotides at corresponding amino acid positions or nucleotide positions are then compared. When a position in the first sequence is occupied by the same nucleotide as the corresponding position in the second sequence, then the molecules are identical at that position (as used herein nucleic acid “identity” is equivalent to nucleic acid “homology”). The percent identity between the two sequences is a function of the number of identical positions shared by the sequences, taking into account the number of gaps, and the length of each gap, which need to be introduced for optimal alignment of the two sequences.
For purposes of the present invention, the comparison of sequences and determination of percent identity between two sequences can be accomplished using a BLOSUM62 scoring matrix with a gap penalty of 12, a gap extend penalty of 4, and a frameshift gap penalty of 5.
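Once two sequences have been aligned (the alignment step itself, with its scoring matrix and gap penalties, is assumed to be done by a separate alignment tool), the percent identity described above reduces to counting columns. The sketch below uses one common convention, assumed here rather than taken from the text: identical positions divided by the number of alignment columns, so every gap column lowers the score.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over an existing pairwise alignment.

    Identical positions / alignment columns * 100. Columns where either
    sequence has a gap count toward the length but never as a match.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper())
               if not (x == gap and y == gap)]  # drop all-gap columns
    if not columns:
        raise ValueError("empty alignment")
    matches = sum(1 for x, y in columns if x == y and x != gap)
    return 100.0 * matches / len(columns)


# One mismatch and one gap in a 10-column alignment: 8 identical positions.
print(percent_identity("ACGTACGTAC", "ACGTTCGT-C"))  # 80.0
```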
The methods described herein can be used for modulating expression of oncogenes and tumor suppressors in cells, e.g., cancer cells. For example, to decrease expression of a gene (e.g., an oncogene or imprinted gene) in a cell, the methods include introducing into the cell an inhibitory nucleic acid or small molecule that specifically binds, or is complementary, to a PRC1-binding region of an RNA that increases expression of the gene, e.g., an oncogene and/or an imprinted gene, set forth in Tables 1-3. As another example, to increase expression of a gene, e.g., a tumor suppressor, in a cell, the methods include introducing into the cell an inhibitory nucleic acid or small molecule that specifically binds, or is complementary, to a PRC1-binding region of an RNA that decreases expression of the gene, e.g., of a tumor suppressor gene, set forth in Tables 1-3, e.g., in subjects with cancer, e.g., lung adenocarcinoma patients.
In general, the methods include introducing into the cell an inhibitory nucleic acid that specifically binds, or is complementary, to a region of an RNA that modulates expression of a gene as set forth in Tables 1-3.
In preferred embodiments, the inhibitory nucleic acid binds to a region within or near (e.g., within 100, 200, 300, 400, 500, 600, 700, 1K, 2K, or 5K bases of) a PRC1-binding region of the RNA as set forth in Tables 1-3. The empirically identified “peaks,” which are believed to represent PRC1-binding regions, are shown in Table 1 with 500 nts of sequence on each side, so in some embodiments the methods can include targeting a sequence as shown in one of the sequences in Tables 1-3, or a sequence that lies between 500 nts from the start and 500 nts from the end of a sequence shown in Tables 1-3, or between 400 nts from the start and 400 nts from the end, between 300 nts from the start and 300 nts from the end, between 200 nts from the start and 200 nts from the end, or between 100 nts from the start and 100 nts from the end of a sequence shown in Tables 1-3. A nucleic acid that binds “specifically” binds primarily to the target RNA or related RNAs to inhibit the regulatory function of the RNA, but not of other non-target RNAs. The specificity of the nucleic acid interaction thus refers to its function (e.g., inhibiting the PRC1-associated repression of gene expression) rather than its hybridization capacity. Inhibitory nucleic acids may exhibit nonspecific binding to other sites in the genome or to other RNAs without interfering with binding of other regulatory proteins and without causing degradation of the non-specifically bound RNA. Thus this nonspecific binding does not significantly affect the function of other non-target RNAs and results in no significant adverse effects.
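The flank arithmetic above can be made concrete. The helper names below are hypothetical; the sketch assumes each table entry is a plain string carrying a fixed flank (500 nt in the tables) on each side of the peak, and it uses 0-based, half-open coordinates for the proximity window.

```python
def core_window(seq: str, trim: int, flank: int = 500) -> str:
    """Drop `trim` nt from each end of a table sequence carrying `flank` nt per side.

    trim == flank leaves just the peak; smaller trims keep part of each flank.
    """
    if not 0 <= trim <= flank:
        raise ValueError("trim must lie between 0 and the flank length")
    if 2 * trim >= len(seq):
        raise ValueError("sequence too short for the requested trim")
    return seq[trim:len(seq) - trim]


def neighborhood(peak_start: int, peak_end: int, distance: int,
                 rna_length: int) -> tuple[int, int]:
    """0-based, half-open window spanning the peak plus `distance` nt on each
    side, clipped to the ends of the RNA."""
    return max(0, peak_start - distance), min(rna_length, peak_end + distance)


# Toy example with a 5-nt flank instead of 500 nt.
toy = "AAAAA" + "GCGC" + "TTTTT"
print(core_window(toy, trim=5, flank=5))  # GCGC (peak only)
print(core_window(toy, trim=3, flank=5))  # AAGCGCTT (peak plus 2 nt of each flank)
print(neighborhood(150, 250, 100, 300))   # (50, 300)
```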
These methods can be used to treat a cancer in a subject by administering to the subject a composition (e.g., as described herein) comprising a PRC1-binding fragment of an RNA as described herein and/or an inhibitory nucleic acid that binds to an RNA (e.g., an inhibitory nucleic acid that binds to an RNA that inhibits a tumor suppressor, or cancer-suppressing gene, or imprinted gene and/or other growth-suppressing genes in any of Tables 1-3). Examples of cellular proliferative and/or differentiative disorders include cancer, e.g., carcinoma, sarcoma, metastatic disorders or hematopoietic neoplastic disorders, e.g., leukemias. A metastatic tumor can arise from a multitude of primary tumor types, including but not limited to those of prostate, colon, lung, breast and liver origin.
As used herein, treating includes “prophylactic treatment,” which means reducing the incidence of or preventing (or reducing the risk of) a sign or symptom of a disease in a patient at risk for the disease, and “therapeutic treatment,” which means reducing signs or symptoms of a disease, reducing progression of a disease, or reducing severity of a disease in a patient diagnosed with the disease. With respect to cancer, treating includes inhibiting tumor cell proliferation, increasing tumor cell death or killing, inhibiting the rate of tumor cell growth or metastasis, reducing the size of tumors, reducing the number of tumors, reducing the number of metastases, or increasing the 1-year or 5-year survival rate.
As used herein, the terms “cancer”, “hyperproliferative” and “neoplastic” refer to cells having the capacity for autonomous growth, i.e., an abnormal state or condition characterized by rapidly proliferating cell growth. Hyperproliferative and neoplastic disease states may be categorized as pathologic, i.e., characterizing or constituting a disease state, or may be categorized as non-pathologic, i.e., a deviation from normal but not associated with a disease state. The term is meant to include all types of cancerous growths or oncogenic processes, metastatic tissues or malignantly transformed cells, tissues, or organs, irrespective of histopathologic type or stage of invasiveness. “Pathologic hyperproliferative” cells occur in disease states characterized by malignant tumor growth. Examples of non-pathologic hyperproliferative cells include proliferation of cells associated with wound repair.
The terms “cancer” or “neoplasms” include malignancies of the various organ systems, such as those affecting the lung (e.g., small cell, non-small cell, squamous, adenocarcinoma), breast, thyroid, lymphoid, gastrointestinal, genito-urinary tract, kidney, bladder, liver (e.g., hepatocellular cancer), pancreas, ovary, cervix, endometrium, uterus, prostate, and brain, as well as adenocarcinomas, which include malignancies such as most colon cancers, colorectal cancer, renal-cell carcinoma, prostate cancer and/or testicular tumors, non-small cell carcinoma of the lung, cancer of the small intestine, and cancer of the esophagus.
The term “carcinoma” is art recognized and refers to malignancies of epithelial or endocrine tissues including respiratory system carcinomas, gastrointestinal system carcinomas, genitourinary system carcinomas, testicular carcinomas, breast carcinomas, prostatic carcinomas, endocrine system carcinomas, and melanomas. In some embodiments, the disease is renal carcinoma or melanoma. Exemplary carcinomas include those forming from tissue of the cervix, lung, prostate, breast, head and neck, colon, and ovary. The term also includes carcinosarcomas, which include malignant tumors composed of carcinomatous and sarcomatous tissues. An “adenocarcinoma” refers to a carcinoma derived from glandular tissue or in which the tumor cells form recognizable glandular structures.
The term “sarcoma” is art recognized and refers to malignant tumors of mesenchymal derivation.
Additional examples of proliferative disorders include hematopoietic neoplastic disorders. As used herein, the term “hematopoietic neoplastic disorders” includes diseases involving hyperplastic/neoplastic cells of hematopoietic origin, e.g., arising from myeloid, lymphoid or erythroid lineages, or precursor cells thereof. Preferably, the diseases arise from poorly differentiated acute leukemias, e.g., erythroblastic leukemia and acute megakaryoblastic leukemia. Additional exemplary myeloid disorders include, but are not limited to, acute promyeloid leukemia (APML), acute myelogenous leukemia (AML) and chronic myelogenous leukemia (CML) (reviewed in Vaickus, L. (1991) Crit Rev. in Oncol./Hematol. 11:267-97); lymphoid malignancies include, but are not limited to, acute lymphoblastic leukemia (ALL), which includes B-lineage ALL and T-lineage ALL, chronic lymphocytic leukemia (CLL), prolymphocytic leukemia (PLL), hairy cell leukemia (HCL) and Waldenstrom's macroglobulinemia (WM). Additional forms of malignant lymphomas include, but are not limited to, non-Hodgkin lymphoma and variants thereof, peripheral T cell lymphomas, adult T cell leukemia/lymphoma (ATL), cutaneous T-cell lymphoma (CTCL), large granular lymphocytic leukemia (LGL), Hodgkin's disease and Reed-Sternberg disease.
In some embodiments, specific cancers that can be treated using the methods described herein include, but are not limited to: breast, lung, prostate, CNS (e.g., glioma), salivary gland, ovarian, and leukemias (e.g., ALL, CML, or AML). Associations of these genes with a particular cancer are known in the art, e.g., as described in Futreal et al., Nat Rev Cancer. 2004; 4:177-83; and The COSMIC (Catalogue of Somatic Mutations in Cancer) database and website, Bamford et al., Br J Cancer. 2004; 91:355-8; see also Forbes et al., Curr Protoc Hum Genet. 2008; Chapter 10; Unit 10.11, and the COSMIC database, e.g., v.50 (Nov. 30, 2010).
In addition, the methods described herein can be used for modulating (e.g., enhancing or decreasing) pluripotency of a stem cell and to direct stem cells down specific differentiation pathways to make endoderm, mesoderm, ectoderm, and their developmental derivatives. To increase, maintain, or enhance pluripotency, the methods include introducing into the cell an inhibitory nucleic acid that specifically binds to, or is complementary to, a motif as described herein within a PRC1-binding site on a non-coding RNA as set forth in SEQ ID NOs:1 to 5893 (human), 5894 to 17415 (human), and 17416 to 36368 (mouse) or in any of Tables 1-3 of WO 2016/149455. Stem cells useful in the methods described herein include adult stem cells (e.g., adult stem cells obtained from the inner ear, bone marrow, mesenchyme, skin, fat, liver, muscle, or blood of a subject, e.g., the subject to be treated); embryonic stem cells, or stem cells obtained from a placenta or umbilical cord; progenitor cells (e.g., progenitor cells derived from the inner ear, bone marrow, mesenchyme, skin, fat, liver, muscle, or blood); and induced pluripotent stem cells (e.g., iPS cells).
Furthermore, the present methods can be used to treat systemic lupus erythematosus (SLE), an autoimmune disease that affects 1.5 million Americans (16,000 new cases per year). Ages 10-50 are the most affected, with more sufferers being female than male. SLE is a multi-organ disease; the effects include arthritis, joint pain and swelling, chest pain, fatigue, general malaise, hair loss, mouth sores, sensitivity to light, skin rash, and swollen lymph nodes. Current treatments include corticosteroids, immunosuppressants, and more recently belimumab (an inhibitor of B cell activating factor).
The causes of SLE are probably multiple, including HLA haplotypes. The interleukin 1 receptor associated kinase 1 (IRAK1) has been implicated in some patients. IRAK1 is X-linked (possibly explaining the female predominance of the disease) and is involved in immune response to foreign antigens and pathogens. IRAK1 has been associated with SLE in both adult and pediatric forms. Overexpression of IRAK1 in animal models causes SLE, and knocking out IRAK1 in mice alleviates symptoms of SLE. See, e.g., Jacob et al., Proc Natl Acad Sci USA. 2009 Apr. 14; 106(15):6256-61. The present methods can include treating a subject with SLE by administering an inhibitory nucleic acid that is complementary to a PRC1-binding region on IRAK1 RNA, e.g., an LNA targeting the 3′ UTR as shown in
The present methods can also be used to treat MECP2 Duplication Syndrome in a subject. This condition is characterized by mental retardation, weak muscle tone, and feeding difficulties, as well as poor or absent speech, seizures, and muscle spasticity. More cases have been reported in males than in females; female carriers may have skewed X-chromosome inactivation (XCI). The condition is associated with a 50% mortality rate by age 25 and accounts for 1-2% of X-linked mental retardation. The true incidence is unknown, as many cases go undiagnosed. Genetically, the cause is duplication (or even triplication) of the MECP2 gene. There is no current treatment. The present methods can include treating a subject with MECP2 Duplication Syndrome by administering an inhibitory nucleic acid that is complementary to a motif as described herein within a PRC1-binding region on Mecp2 RNA, e.g., an LNA targeting the 3′UTR of Mecp2 as shown in
In some embodiments, the methods described herein include administering a composition, e.g., a sterile composition, comprising an inhibitory nucleic acid that is complementary to a motif as described herein within a PRC1-binding region on an RNA, e.g., as set forth in SEQ ID NOs:1 to 5893 (human), 5894 to 17415 (human), and 17416 to 36368 (mouse) or in any of Tables 1-3 of WO 2016/149455. Inhibitory nucleic acids for use in practicing the methods described herein can be an antisense or small interfering RNA, including but not limited to an shRNA or siRNA. In some embodiments, the inhibitory nucleic acid is a modified nucleic acid polymer (e.g., a locked nucleic acid (LNA) molecule).
Inhibitory nucleic acids have been employed as therapeutic moieties in the treatment of disease states in animals, including humans, and can be configured for use in treatment regimens for cells, tissues, and animals, especially humans.
For therapeutics, an animal, preferably a human, suspected of having cancer is treated by administering an RNA or inhibitory nucleic acid in accordance with this invention. For example, in one non-limiting embodiment, the methods comprise the step of administering to the animal in need of treatment a therapeutically effective amount of an RNA or inhibitory nucleic acid as described herein.
Inhibitory Nucleic Acids
Inhibitory nucleic acids useful in the present methods and compositions include antisense oligonucleotides, ribozymes, external guide sequence (EGS) oligonucleotides, siRNA compounds, single- or double-stranded RNA interference (RNAi) compounds, molecules comprising modified bases, locked nucleic acid molecules (LNA molecules), antagomirs, peptide nucleic acid molecules (PNA molecules), and other oligomeric compounds or oligonucleotide mimetics that hybridize to at least a portion of the target nucleic acid and modulate its function. In some embodiments, the inhibitory nucleic acids include antisense RNA; antisense DNA; chimeric antisense oligonucleotides; antisense oligonucleotides comprising modified linkages; interference RNA (RNAi); short interfering RNA (siRNA); microRNA (miRNA); small temporal RNA (stRNA); short hairpin RNA (shRNA); small RNAs that induce gene activation (RNAa); small activating RNAs (saRNAs); or combinations thereof. See, e.g., WO 2010/040112.
In the present methods, the inhibitory nucleic acids are preferably designed to target a motif as described herein within a region of the RNA that binds to PRC1, e.g., as described in WO 2016/149455 (see Tables 1-3 thereof). The motifs are shown in
These “inhibitory” nucleic acids are believed to work by inhibiting the interaction between the RNA and PRC1, and as described herein can be used to modulate expression of a gene.
In some embodiments, the inhibitory nucleic acids are 10 to 50, 13 to 50, or 13 to 30 nucleotides in length. One having ordinary skill in the art will appreciate that this embodies oligonucleotides having antisense (complementary) portions of 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50 nucleotides in length, or any range therewithin. It is understood that non-complementary bases may be included in such inhibitory nucleic acids; for example, an inhibitory nucleic acid 30 nucleotides in length may have a portion of 15 bases that is complementary to the targeted RNA. In some embodiments, the oligonucleotides are 15 nucleotides in length. In some embodiments, the antisense or oligonucleotide compounds of the invention are 12 or 13 to 30 nucleotides in length. One having ordinary skill in the art will appreciate that this embodies inhibitory nucleic acids having antisense (complementary) portions of 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29 or 30 nucleotides in length, or any range therewithin.
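To illustrate the length ranges above, the sketch below tiles a target region with candidate antisense sequences of a fixed length. It writes each candidate as the RNA reverse complement of the target window (5′→3′); the actual chemistry (DNA, LNA, 2′-modified, etc.) is a separate design choice not modeled here, and the function names and example region are hypothetical.

```python
from collections.abc import Iterator

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def antisense(target: str) -> str:
    """Reverse complement of an RNA target; both strands written 5'->3'."""
    return "".join(COMPLEMENT[b] for b in reversed(target.upper().replace("T", "U")))


def tile_antisense(region: str, length: int, step: int = 1) -> Iterator[tuple[int, str]]:
    """Yield (offset, candidate) pairs for every window of `length` nt in `region`.

    Being a generator, argument errors surface on the first iteration.
    """
    if not 10 <= length <= 50:
        raise ValueError("length outside the 10-50 nt range described above")
    if step < 1:
        raise ValueError("step must be positive")
    for i in range(0, len(region) - length + 1, step):
        yield i, antisense(region[i:i + length])


# Hypothetical 22-nt region tiled with 15-mers every 3 nt (offsets 0, 3, 6).
for offset, oligo in tile_antisense("AUGCUAGCUAGGCUAAUCGGAU", length=15, step=3):
    print(offset, oligo)
```

A step of 1 enumerates every possible window; larger steps give a coarser first-pass screen.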
Preferably the inhibitory nucleic acid comprises one or more modifications comprising: a modified sugar moiety, and/or a modified internucleoside linkage, and/or a modified nucleotide and/or combinations thereof. It is not necessary for all positions in a given oligonucleotide to be uniformly modified, and in fact more than one of the modifications described herein may be incorporated in a single oligonucleotide or even within a single nucleoside within an oligonucleotide.
In some embodiments, the inhibitory nucleic acids are chimeric oligonucleotides that contain two or more chemically distinct regions, each made up of at least one nucleotide. These oligonucleotides typically contain at least one region of modified nucleotides that confers one or more beneficial properties (such as, for example, increased nuclease resistance, increased uptake into cells, or increased binding affinity for the target) and a region that is a substrate for enzymes capable of cleaving RNA:DNA or RNA:RNA hybrids. Chimeric inhibitory nucleic acids of the invention may be formed as composite structures of two or more oligonucleotides, modified oligonucleotides, oligonucleosides and/or oligonucleotide mimetics as described above. Such compounds have also been referred to in the art as hybrids or gapmers. Representative United States patents that teach the preparation of such hybrid structures include, but are not limited to, U.S. Pat. Nos. 5,013,830; 5,149,797; 5,220,007; 5,256,775; 5,366,878; 5,403,711; 5,491,133; 5,565,350; 5,623,065; 5,652,355; 5,652,356; and 5,700,922, each of which is herein incorporated by reference.
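The wing-gap-wing arrangement of a gapmer can be represented as a per-position chemistry map: modified wings for affinity and nuclease resistance flanking a central DNA stretch that, once hybridized, forms an RNA:DNA duplex that RNase H can cleave. This is a hypothetical sketch; the 16-nt length with 3-nt LNA wings and a 10-nt DNA gap is chosen only for illustration, and the labels are plain strings rather than any synthesis vendor's notation.

```python
def gapmer_layout(length: int, wing5: int, wing3: int,
                  wing_chem: str = "LNA", gap_chem: str = "DNA") -> list[str]:
    """Per-position chemistry labels (5'->3') for a wing-gap-wing chimeric oligo."""
    gap = length - wing5 - wing3
    if wing5 < 1 or wing3 < 1 or gap < 1:
        raise ValueError("each wing and the gap need at least one nucleotide")
    return [wing_chem] * wing5 + [gap_chem] * gap + [wing_chem] * wing3


def describe(layout: list[str]) -> str:
    """Compact run-length notation, e.g. 'LNA3-DNA10-LNA3'."""
    runs: list[list] = []
    for chem in layout:
        if runs and runs[-1][0] == chem:
            runs[-1][1] += 1
        else:
            runs.append([chem, 1])
    return "-".join(f"{chem}{count}" for chem, count in runs)


print(describe(gapmer_layout(16, 3, 3)))  # LNA3-DNA10-LNA3
```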
In some embodiments, the inhibitory nucleic acid comprises at least one nucleotide modified at the 2′ position of the sugar, most preferably a 2′-O-alkyl, 2′-O-alkyl-O-alkyl or 2′-fluoro-modified nucleotide. In other preferred embodiments, RNA modifications include 2′-fluoro, 2′-amino and 2′-O-methyl modifications on the ribose of pyrimidines, abasic residues, or an inverted base at the 3′ end of the RNA. Such modifications are routinely incorporated into oligonucleotides, and these oligonucleotides have been shown to have a higher Tm (i.e., higher target binding affinity) than 2′-deoxyoligonucleotides against a given target.
A number of nucleotide and nucleoside modifications have been shown to make the oligonucleotide into which they are incorporated more resistant to nuclease digestion than the native oligodeoxynucleotide; these modified oligos survive intact for a longer time than unmodified oligonucleotides. Specific examples of modified oligonucleotides include those comprising modified backbones, for example, phosphorothioates, phosphotriesters, methyl phosphonates, short chain alkyl or cycloalkyl intersugar linkages or short chain heteroatomic or heterocyclic intersugar linkages. Most preferred are oligonucleotides with phosphorothioate backbones and those with heteroatom backbones, particularly CH2—NH—O—CH2, CH2—N(CH3)—O—CH2 (known as a methylene(methylimino) or MMI backbone), CH2—O—N(CH3)—CH2, CH2—N(CH3)—N(CH3)—CH2 and O—N(CH3)—CH2—CH2 backbones, wherein the native phosphodiester backbone is represented as O—P—O—CH2; amide backbones (see De Mesmaeker et al. Acc. Chem. Res. 1995, 28:366-374); morpholino backbone structures (see Summerton and Weller, U.S. Pat. No. 5,034,506); and peptide nucleic acid (PNA) backbones (wherein the phosphodiester backbone of the oligonucleotide is replaced with a polyamide backbone, the nucleotides being bound directly or indirectly to the aza nitrogen atoms of the polyamide backbone; see Nielsen et al., Science 1991, 254, 1497).
Phosphorus-containing linkages include, but are not limited to, phosphorothioates, chiral phosphorothioates, phosphorodithioates, phosphotriesters, aminoalkylphosphotriesters, methyl and other alkyl phosphonates comprising 3′-alkylene phosphonates and chiral phosphonates, phosphinates, phosphoramidates comprising 3′-amino phosphoramidate and aminoalkylphosphoramidates, thionophosphoramidates, thionoalkylphosphonates, thionoalkylphosphotriesters, and boranophosphates having normal 3′-5′ linkages, 2′-5′ linked analogs of these, and those having inverted polarity wherein the adjacent pairs of nucleoside units are linked 3′-5′ to 5′-3′ or 2′-5′ to 5′-2′; see U.S. Pat. Nos. 3,687,808; 4,469,863; 4,476,301; 5,023,243; 5,177,196; 5,188,897; 5,264,423; 5,276,019; 5,278,302; 5,286,717; 5,321,131; 5,399,676; 5,405,939; 5,453,496; 5,455,233; 5,466,677; 5,476,925; 5,519,126; 5,536,821; 5,541,306; 5,550,111; 5,563,253; 5,571,799; 5,587,361; and 5,625,050.
Morpholino-based oligomeric compounds are described in Dwaine A. Braasch and David R. Corey, Biochemistry, 2002, 41(14), 4503-4510; Genesis, volume 30, issue 3, 2001; Heasman, J., Dev. Biol., 2002, 243, 209-214; Nasevicius et al., Nat. Genet., 2000, 26, 216-220; Lacerra et al., Proc. Natl. Acad. Sci., 2000, 97, 9591-9596; and U.S. Pat. No. 5,034,506, issued Jul. 23, 1991. In some embodiments, the morpholino-based oligomeric compound is a phosphorodiamidate morpholino oligomer (PMO) (e.g., as described in Iversen, Curr. Opin. Mol. Ther., 3:235-238, 2001; and Wang et al., J. Gene Med., 12:354-364, 2010; the disclosures of which are incorporated herein by reference in their entireties).
Cyclohexenyl nucleic acid oligonucleotide mimetics are described in Wang et al., J. Am. Chem. Soc., 2000, 122, 8595-8602.
Additional modifications are possible as described in WO 2016/149455.
The inhibitory nucleic acids useful in the present methods are sufficiently complementary to the target RNA, e.g., hybridize sufficiently well and with sufficient biological functional specificity, to give the desired effect. “Complementary” refers to the capacity for pairing, through base stacking and specific hydrogen bonding, between two sequences comprising naturally or non-naturally occurring (e.g., modified as described above) bases (nucleosides) or analogs thereof. For example, if a base at one position of an inhibitory nucleic acid is capable of hydrogen bonding with a base at the corresponding position of an RNA, then the bases are considered to be complementary to each other at that position. 100% complementarity is not required. As noted above, inhibitory nucleic acids can comprise universal bases, or inert abasic spacers that provide no positive or negative contribution to hydrogen bonding. Base pairings may include both canonical Watson-Crick base pairing and non-Watson-Crick base pairing (e.g., Wobble base pairing and Hoogsteen base pairing). It is understood that for complementary base pairings, adenosine-type bases (A) are complementary to thymidine-type bases (T) or uracil-type bases (U), that cytosine-type bases (C) are complementary to guanosine-type bases (G), and that universal bases such as 3-nitropyrrole or 5-nitroindole can hybridize to and are considered complementary to any A, C, U, or T. Nichols et al., Nature, 1994; 369:492-493 and Loakes et al., Nucleic Acids Res., 1994; 22:4039-4043. Inosine (I) has also been considered in the art to be a universal base and is considered complementary to any A, C, U, or T. See Watkins and SantaLucia, Nucl. Acids Research, 2005; 33 (19): 6258-6267.
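The pairing rules above can be expressed as a per-position check. In this sketch the oligo and the target are both written 5′→3′, so oligo position i faces target position len−1−i (antiparallel pairing). Inosine is represented as "I" and, following the text, treated as complementary to A, C, U, or T; G·U wobble pairing is optional. Representing other universal bases such as 5-nitroindole would need a symbol of your choosing, and none is assumed here. The function names are hypothetical.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("A", "T"), ("T", "A"),
                ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}
# Per the text, a universal base is treated as complementary to A, C, U, or T.
UNIVERSAL = {"I"}  # inosine
UNIVERSAL_PARTNERS = {"A", "C", "U", "T"}


def pairs(oligo_base: str, target_base: str, allow_wobble: bool = False) -> bool:
    o, t = oligo_base.upper(), target_base.upper()
    if o in UNIVERSAL:
        return t in UNIVERSAL_PARTNERS
    if (o, t) in WATSON_CRICK:
        return True
    return allow_wobble and (o, t) in WOBBLE


def complementarity_profile(oligo: str, target: str,
                            allow_wobble: bool = False) -> list[bool]:
    """Per-position pairing for an ungapped duplex; both strands written 5'->3'."""
    if len(oligo) != len(target):
        raise ValueError("oligo and target must be the same length")
    # Antiparallel: oligo position i faces target position len - 1 - i.
    return [pairs(o, t, allow_wobble) for o, t in zip(oligo, reversed(target))]


print(complementarity_profile("GCAG", "UUGC"))                     # [True, True, True, False]
print(complementarity_profile("GCAG", "UUGC", allow_wobble=True))  # [True, True, True, True]
```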
In some embodiments, the location on a target RNA to which an inhibitory nucleic acid hybridizes is defined as a region to which a protein binding partner binds, as shown in Tables 1-3. Routine methods can be used to design an inhibitory nucleic acid that binds to this sequence with sufficient specificity. In some embodiments, the methods include using bioinformatics methods known in the art to identify regions of secondary structure, e.g., one, two, or more stem-loop structures, or pseudoknots, and selecting those regions to target with an inhibitory nucleic acid. For example, methods of designing oligonucleotides similar to the inhibitory nucleic acids described herein, and various options for modified chemistries or formats, are exemplified in Lennox and Behlke, Gene Therapy (2011) 18: 1111-1120, which is incorporated herein by reference in its entirety, with the understanding that the present disclosure does not target miRNA ‘seed regions’.
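The paragraph above leaves the choice of structure-prediction method open. As one illustration only, the sketch below implements Nussinov base-pair maximization, a simple textbook dynamic-programming algorithm swapped in here for clarity; practical tools typically use thermodynamic (free-energy) models instead, and nothing in the text specifies this algorithm. It returns a dot-bracket string in which paired bases, "(" and ")", outline candidate stems around hairpin loops. The minimum loop size of 3 and the inclusion of G·U wobble pairs are assumptions, and pseudoknots cannot be represented by this method.

```python
# Watson-Crick pairs plus G-U wobble, a common simplification.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def can_pair(a: str, b: str) -> bool:
    return (a, b) in PAIRS


def nussinov(seq: str, min_loop: int = 3) -> str:
    """Dot-bracket structure with the maximum number of nested base pairs.

    min_loop is the minimum number of unpaired nucleotides closed by a hairpin.
    """
    seq = seq.upper().replace("T", "U")
    n = len(seq)
    if n == 0:
        return ""
    dp = [[0] * n for _ in range(n)]  # dp[i][j]: max pairs within seq[i..j]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = max(dp[i + 1][j], dp[i][j - 1])          # i or j unpaired
            if can_pair(seq[i], seq[j]):
                best = max(best, dp[i + 1][j - 1] + 1)      # i pairs with j
            for k in range(i + 1, j):                       # split into two parts
                best = max(best, dp[i][k] + dp[k + 1][j])
            dp[i][j] = best

    # Traceback: recover one optimal set of pairs.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop or dp[i][j] == 0:
            continue
        if dp[i][j] == dp[i + 1][j]:
            stack.append((i + 1, j))
        elif dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
        elif can_pair(seq[i], seq[j]) and dp[i][j] == dp[i + 1][j - 1] + 1:
            structure[i], structure[j] = "(", ")"
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):
                if dp[i][j] == dp[i][k] + dp[k + 1][j]:
                    stack.extend([(i, k), (k + 1, j)])
                    break
    return "".join(structure)


print(nussinov("GGGAAAACCC"))  # (((....)))
```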
While the specific sequences of certain exemplary target segments are set forth herein, one of skill in the art will recognize that these serve to illustrate and describe particular embodiments within the scope of the present invention. Additional target segments are readily identifiable by one having ordinary skill in the art in view of this disclosure. Target segments 5-500 nucleotides in length comprising a stretch of at least five (5) consecutive nucleotides within the protein binding region, or immediately adjacent thereto, are considered to be suitable for targeting as well. Target segments can include sequences that comprise at least the 5 consecutive nucleotides from the 5′-terminus of one of the protein binding regions (the remaining nucleotides being a consecutive stretch of the same RNA beginning immediately upstream of the 5′-terminus of the binding segment and continuing until the inhibitory nucleic acid contains about 5 to about 100 nucleotides). Similarly preferred target segments are represented by RNA sequences that comprise at least the 5 consecutive nucleotides from the 3′-terminus of one of the illustrative preferred target segments (the remaining nucleotides being a consecutive stretch of the same RNA beginning immediately downstream of the 3′-terminus of the target segment and continuing until the inhibitory nucleic acid contains about 5 to about 100 nucleotides). One having skill in the art armed with the sequences provided herein will be able, without undue experimentation, to identify further preferred protein binding regions to target with complementary inhibitory nucleic acids.
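The two kinds of extended target segment described above (at least 5 nt from one terminus of a binding region, extended with the contiguous RNA sequence beyond that terminus) can be sketched as follows. The helper names are hypothetical, coordinates are 0-based, and the binding-region end is exclusive.

```python
def upstream_segment(rna: str, region_start: int, total_len: int, core: int = 5) -> str:
    """First `core` nt of a binding region plus the nt immediately upstream of
    it, `total_len` nt in all."""
    lo = region_start - (total_len - core)
    if core < 5 or total_len < core or lo < 0 or region_start + core > len(rna):
        raise ValueError("segment does not fit within the RNA")
    return rna[lo:region_start + core]


def downstream_segment(rna: str, region_end: int, total_len: int, core: int = 5) -> str:
    """Last `core` nt of a binding region (region_end is exclusive) plus the nt
    immediately downstream of it, `total_len` nt in all."""
    hi = region_end + (total_len - core)
    if core < 5 or total_len < core or region_end - core < 0 or hi > len(rna):
        raise ValueError("segment does not fit within the RNA")
    return rna[region_end - core:hi]


# Toy RNA: 5 nt upstream, a 10-nt binding region at [5, 15), 5 nt downstream.
rna = "UUUUU" + "GGGGGCCCCC" + "AAAAA"
print(upstream_segment(rna, region_start=5, total_len=8))   # UUUGGGGG
print(downstream_segment(rna, region_end=15, total_len=8))  # CCCCCAAA
```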
In the context of the present disclosure, hybridization means base stacking and hydrogen bonding, which may be Watson-Crick, Hoogsteen or reversed Hoogsteen hydrogen bonding, between complementary nucleoside or nucleotide bases. For example, adenine and thymine are complementary nucleobases which pair through the formation of hydrogen bonds. Complementary, as the term is used in the art, refers to the capacity for precise pairing between two nucleotides. For example, if a nucleotide at a certain position of an oligonucleotide is capable of hydrogen bonding with a nucleotide at the same position of an RNA molecule, then the inhibitory nucleic acid and the RNA are considered to be complementary to each other at that position. The inhibitory nucleic acids and the RNA are complementary to each other when a sufficient number of corresponding positions in each molecule are occupied by nucleotides that can hydrogen bond with each other through their bases. Thus, “specifically hybridizable” and “complementary” are terms which are used to indicate a sufficient degree of complementarity or precise pairing such that stable and specific binding occurs between the inhibitory nucleic acid and the RNA target. For example, if a base at one position of an inhibitory nucleic acid is capable of hydrogen bonding with a base at the corresponding position of an RNA, then the bases are considered to be complementary to each other at that position. 100% complementarity is not required.
It is understood in the art that a complementary nucleic acid sequence need not be 100% complementary to that of its target nucleic acid to be specifically hybridizable. A complementary nucleic acid sequence for purposes of the present methods is specifically hybridizable when binding of the sequence to the target RNA molecule interferes with the normal function of the target RNA to cause a loss of activity (e.g., inhibiting PRC1-associated repression with consequent up-regulation of gene expression) and there is a sufficient degree of complementarity to avoid non-specific binding of the sequence to non-target RNA sequences under conditions in which avoidance of non-specific binding is desired, e.g., under physiological conditions in the case of in vivo assays or therapeutic treatment, and under suitably stringent assay conditions in the case of in vitro assays. For example, stringent salt concentration will ordinarily be less than about 750 mM NaCl and 75 mM trisodium citrate, preferably less than about 500 mM NaCl and 50 mM trisodium citrate, and more preferably less than about 250 mM NaCl and 25 mM trisodium citrate. Low stringency hybridization can be obtained in the absence of organic solvent, e.g., formamide, while high stringency hybridization can be obtained in the presence of at least about 35% formamide, and more preferably at least about 50% formamide. Stringent temperature conditions will ordinarily include temperatures of at least about 30° C., more preferably of at least about 37° C., and most preferably of at least about 42° C. Varying additional parameters, such as hybridization time, the concentration of detergent, e.g., sodium dodecyl sulfate (SDS), and the inclusion or exclusion of carrier DNA, are well known to those skilled in the art. Various levels of stringency are accomplished by combining these various conditions as needed. In a preferred embodiment, hybridization will occur at 30° C. in 750 mM NaCl, 75 mM trisodium citrate, and 1% SDS. In a more preferred embodiment, hybridization will occur at 37° C. in 500 mM NaCl, 50 mM trisodium citrate, 1% SDS, 35% formamide, and 100 μg/ml denatured salmon sperm DNA (ssDNA). In a most preferred embodiment, hybridization will occur at 42° C. in 250 mM NaCl, 25 mM trisodium citrate, 1% SDS, 50% formamide, and 200 μg/ml ssDNA. Useful variations on these conditions will be readily apparent to those skilled in the art.
For most applications, washing steps that follow hybridization will also vary in stringency. Wash stringency conditions can be defined by salt concentration and by temperature. As above, wash stringency can be increased by decreasing salt concentration or by increasing temperature. For example, stringent salt concentration for the wash steps will preferably be less than about 30 mM NaCl and 3 mM trisodium citrate, and most preferably less than about 15 mM NaCl and 1.5 mM trisodium citrate. Stringent temperature conditions for the wash steps will ordinarily include a temperature of at least about 25° C., more preferably of at least about 42° C., and even more preferably of at least about 68° C. In a preferred embodiment, wash steps will occur at 25° C. in 30 mM NaCl, 3 mM trisodium citrate, and 0.1% SDS. In a more preferred embodiment, wash steps will occur at 42° C. in 15 mM NaCl, 1.5 mM trisodium citrate, and 0.1% SDS. In a most preferred embodiment, wash steps will occur at 68° C. in 15 mM NaCl, 1.5 mM trisodium citrate, and 0.1% SDS. Additional variations on these conditions will be readily apparent to those skilled in the art. Hybridization techniques are well known to those skilled in the art and are described, for example, in Benton and Davis (Science 196:180, 1977); Grunstein and Hogness (Proc. Natl. Acad. Sci., USA 72:3961, 1975); Ausubel et al. (Current Protocols in Molecular Biology, Wiley Interscience, New York, 2001); Berger and Kimmel (Guide to Molecular Cloning Techniques, 1987, Academic Press, New York); and Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Press, New York.
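The NaCl and trisodium citrate concentrations listed for hybridization and washing keep a fixed 10:1 ratio, the same as standard saline-sodium citrate (SSC) buffer, in which 1x SSC is 150 mM NaCl and 15 mM trisodium citrate. The sketch below, a hypothetical helper and not part of the described methods, expresses each listed condition as an SSC equivalent (5x, about 3.3x and about 1.7x SSC for hybridization; 0.2x and 0.1x SSC for the washes) and checks that each successive condition is at least as stringent as the one before it.

```python
from dataclasses import dataclass

SSC_1X_NACL_MM = 150.0    # 1x SSC: 150 mM NaCl ...
SSC_1X_CITRATE_MM = 15.0  # ... and 15 mM trisodium citrate


@dataclass(frozen=True)
class Condition:
    temp_c: float
    nacl_mm: float
    citrate_mm: float
    formamide_pct: float = 0.0

    @property
    def ssc_x(self) -> float:
        """Equivalent SSC strength, based on the NaCl concentration."""
        return self.nacl_mm / SSC_1X_NACL_MM


# Conditions listed in the text, in the order given.
HYBRIDIZATION = [Condition(30, 750, 75), Condition(37, 500, 50, 35), Condition(42, 250, 25, 50)]
WASH = [Condition(25, 30, 3), Condition(42, 15, 1.5), Condition(68, 15, 1.5)]


def at_least_as_stringent(later: Condition, earlier: Condition) -> bool:
    """No more salt, no lower temperature, and no less formamide."""
    return (later.nacl_mm <= earlier.nacl_mm
            and later.temp_c >= earlier.temp_c
            and later.formamide_pct >= earlier.formamide_pct)


for name, series in (("hybridization", HYBRIDIZATION), ("wash", WASH)):
    for c in series:
        print(f"{name}: {c.temp_c:g} C, {c.ssc_x:.2f}x SSC, {c.formamide_pct:g}% formamide")
```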
In general, the inhibitory nucleic acids useful in the methods described herein have at least 80% sequence complementarity to a target region within the target nucleic acid, e.g., 90%, 95%, or 100% sequence complementarity to the target region within an RNA. For example, an antisense compound in which 18 of 20 nucleobases of the antisense oligonucleotide are complementary, and would therefore specifically hybridize, to a target region would represent 90 percent complementarity. Percent complementarity of an inhibitory nucleic acid with a region of a target nucleic acid can be determined routinely using basic local alignment search tools (BLAST programs) (Altschul et al., J. Mol. Biol., 1990, 215, 403-410; Zhang and Madden, Genome Res., 1997, 7, 649-656). Antisense and other compounds of the invention that hybridize to an RNA are identified through routine experimentation. In general, the inhibitory nucleic acids must retain specificity for their target, i.e., they either do not directly bind to transcripts other than the intended target, or do not directly and significantly affect the expression levels of such transcripts.
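The percent-complementarity arithmetic in the 18-of-20 example can be sketched as an ungapped comparison. This is a simplified stand-in for the BLAST-based determination cited above, assuming equal-length sequences written 5′→3′ and counting only Watson-Crick pairs; the function name and sequences are hypothetical.

```python
# Watson-Crick pairs; U (RNA) and T (DNA) are both accepted as partners of A.
# G-U wobble pairs are deliberately NOT counted as complementary.
WATSON_CRICK = {("A", "T"), ("T", "A"), ("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def percent_complementarity(oligo: str, target_region: str) -> float:
    """Percent of oligo positions that Watson-Crick pair with an equal-length target region.

    Both sequences are written 5'->3'. Because the duplex is antiparallel,
    oligo position i pairs with target position len - 1 - i. This is an
    ungapped comparison; an aligner such as BLAST is needed when gaps matter.
    """
    oligo, target_region = oligo.upper(), target_region.upper()
    if len(oligo) != len(target_region) or not oligo:
        raise ValueError("oligo and target region must be non-empty and equal in length")
    paired = sum((o, t) in WATSON_CRICK for o, t in zip(oligo, reversed(target_region)))
    return 100.0 * paired / len(oligo)


target = "AUGGCUACGUUAGCCAUGCA"          # hypothetical 20-nt target region (RNA)
perfect = "TGCATGGCTAACGTAGCCAT"         # its fully complementary DNA oligo
two_mismatches = "AGCATAGCTAACGTAGCCAT"  # same oligo with 2 substituted bases

print(percent_complementarity(perfect, target))         # 100.0
print(percent_complementarity(two_mismatches, target))  # 90.0
```

The second call reproduces the 18-of-20 case: two non-pairing positions out of twenty give 90 percent complementarity.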
Target-specific effects, with corresponding target-specific functional biological effects, are possible even when the inhibitory nucleic acid exhibits non-specific binding to a large number of non-target RNAs. For example, short 8-base inhibitory nucleic acids that are fully complementary to an RNA may have multiple 100% matches to hundreds of sequences in the genome, yet may produce target-specific effects, e.g., upregulation of a specific target gene through inhibition of PRC1 activity. Eight-base inhibitory nucleic acids have been reported to prevent exon skipping with a high degree of specificity and reduced off-target effects. See Singh et al., RNA Biol., 2009; 6(3): 341-350. Eight-base inhibitory nucleic acids have also been reported to interfere with miRNA activity without significant off-target effects. See Obad et al., Nature Genetics, 2011; 43: 371-378.
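To illustrate why a short oligonucleotide can have many fully matching sites, the sketch below counts, for a hypothetical 8-base DNA oligo, every position in each of a set of transcripts where the oligo would pair with no mismatches (overlapping sites included). The function names and transcripts are hypothetical, and a genome-scale search would normally use an index-based aligner rather than this linear scan.

```python
from __future__ import annotations

# Complement of a DNA oligo written with RNA letters (A pairs with U, T pairs with A).
DNA_TO_RNA_COMPLEMENT = str.maketrans("ACGT", "UGCA")


def perfect_site(oligo_dna: str) -> str:
    """RNA site (5'->3') to which the DNA oligo (5'->3') is fully complementary."""
    return oligo_dna.upper().translate(DNA_TO_RNA_COMPLEMENT)[::-1]


def count_perfect_sites(oligo_dna: str, transcripts: dict[str, str]) -> dict[str, int]:
    """Count fully complementary (100% match) sites per transcript, overlaps included."""
    site = perfect_site(oligo_dna)
    k = len(site)
    counts = {}
    for name, seq in transcripts.items():
        rna = seq.upper().replace("T", "U")  # accept DNA- or RNA-letter input
        counts[name] = sum(rna[i:i + k] == site for i in range(len(rna) - k + 1))
    return counts


transcripts = {                       # hypothetical sequences
    "transcript_a": "CUAAGCUAAGCUA",  # two overlapping sites
    "transcript_b": "GGGGCCCC",       # no site
    "transcript_c": "aaCTAAGCTAaa",   # one site, given in DNA letters
}
print(count_perfect_sites("TAGCTTAG", transcripts))
```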
For further disclosure regarding inhibitory nucleic acids, please see WO 2016/149455 as well as US2010/0317718 (antisense oligos); US2010/0249052 (double-stranded ribonucleic acid (dsRNA)); US2009/0181914 and US2010/0234451 (LNA molecules); US2007/0191294 (siRNA analogues); US2008/0249039 (modified siRNA); and WO2010/129746 and WO2010/040112 (inhibitory nucleic acids).
Antisense
In some embodiments, the inhibitory nucleic acids are antisense oligonucleotides. Antisense oligonucleotides are typically designed to block expression of a DNA or RNA target by binding to the target and halting expression at the level of transcription, translation, or splicing. Antisense oligonucleotides of the present invention are complementary nucleic acid sequences designed to hybridize under stringent conditions to an RNA in vitro, and are expected to inhibit the activity of PRC1 in vivo. Thus, oligonucleotides are chosen that are sufficiently complementary to the target, i.e., that hybridize sufficiently well and with sufficient biological functional specificity, to give the desired effect.
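Designing an antisense oligonucleotide for a chosen target region starts from the rule that the oligo is the reverse complement of the target. A minimal Python sketch follows, assuming a DNA-based oligo and a hypothetical target window; it does not model hybridization strength or the stringency conditions described above.

```python
# Complement of each RNA base, written with DNA letters (assumes a DNA-based oligo).
RNA_TO_DNA_COMPLEMENT = str.maketrans("ACGU", "TGCA")


def antisense_oligo(target_rna: str) -> str:
    """Return the DNA antisense oligo (5'->3') fully complementary to target_rna (5'->3')."""
    rna = target_rna.upper().replace("T", "U")  # tolerate DNA-letter input
    unexpected = set(rna) - set("ACGU")
    if unexpected:
        raise ValueError(f"unexpected characters in target: {sorted(unexpected)}")
    # Complement each base, then reverse so the oligo reads 5'->3' (antiparallel duplex).
    return rna.translate(RNA_TO_DNA_COMPLEMENT)[::-1]


transcript = "GGAUGGCUACGUUAGCCAUGCAUU"   # hypothetical transcript fragment
start, length = 2, 20                      # hypothetical target window within it
print(antisense_oligo(transcript[start:start + length]))  # TGCATGGCTAACGTAGCCAT
```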
Modified Base, Including Locked Nucleic Acids (LNAs)
In some embodiments, the inhibitory nucleic acids used in the methods described herein comprise one or more modified bonds or bases. Such modifications include phosphorothioate and methylphosphonate linkages, peptide nucleic acids, and locked nucleic acids (LNAs). Preferably, the modified nucleotides are part of locked nucleic acid molecules, including α-L-LNAs. LNAs include ribonucleic acid analogues wherein the ribose ring is “locked” by a methylene bridge between the 2′-oxygen and the 4′-carbon, i.e., oligonucleotides containing at least one LNA monomer, that is, one 2′-O,4′-C-methylene-β-D-ribofuranosyl nucleotide. LNA bases form standard Watson-Crick base pairs, but the locked configuration increases the rate and stability of the base-pairing reaction (Jepsen et al., Oligonucleotides, 14, 130-146 (2004)). LNAs also have increased affinity to base pair with RNA as compared to DNA. These properties render LNAs especially useful as probes for fluorescence in situ hybridization (FISH) and comparative genomic hybridization, as knockdown tools for miRNAs, and as antisense oligonucleotides to target mRNAs or other RNAs, e.g., RNAs as described herein.
The modified base/LNA molecules can include molecules comprising 10-30, e.g., 12-24, e.g., 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 nucleotides in each strand, wherein one of the strands is substantially identical, e.g., at least 80% (or more, e.g., 85%, 90%, 95%, or 100%) identical, e.g., having 3, 2, 1, or 0 mismatched nucleotide(s), to a target region in the RNA. The modified base/LNA molecules can be chemically synthesized using methods known in the art.
The modified base/LNA molecules can be designed using any method known in the art; a number of algorithms are known, and are commercially available (e.g., on the internet, for example at exiqon.com). See, e.g., You et al., Nuc. Acids. Res. 34:e60 (2006); McTigue et al., Biochemistry 43:5388-405 (2004); and Levin et al., Nuc. Acids. Res. 34:e142 (2006). For example, “gene walk” methods, similar to those used to design antisense oligos, can be used to optimize the inhibitory activity of a modified base/LNA molecule; for example, a series of oligonucleotides of 10-30 nucleotides spanning the length of a target RNA can be prepared, followed by testing for activity. Optionally, gaps, e.g., of 5-10 nucleotides or more, can be left between the LNAs to reduce the number of oligonucleotides synthesized and tested. GC content is preferably between about 30-60%. General guidelines for designing modified base/LNA molecules are known in the art; for example, LNA sequences will bind very tightly to other LNA sequences, so it is preferable to avoid significant complementarity within an LNA molecule. Contiguous runs of three or more Gs or Cs, or more than four LNA residues, should be avoided where possible (for example, it may not be possible with very short (e.g., about 9-10 nt) oligonucleotides). In some embodiments, the LNAs are xylo-LNAs.
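The “gene walk” and the sequence guidelines above can be combined into a simple candidate filter. The sketch below tiles windows of a chosen length along a target RNA, optionally leaving gaps between them, and discards oligos whose GC content falls outside 30-60%, that contain GGG or CCC, or that contain a short self-complementary stretch. The 4-nt self-complementarity threshold, the reading of “runs of Gs or Cs” as GGG/CCC homopolymer runs, and all names are assumptions; the rule limiting runs of LNA residues concerns the modification pattern and is not modeled here.

```python
from __future__ import annotations

from dataclasses import dataclass

RNA_TO_DNA_COMPLEMENT = str.maketrans("ACGU", "TGCA")
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


@dataclass
class Candidate:
    start: int          # 0-based start of the target window in the RNA
    target: str         # target window, 5'->3' (RNA letters)
    oligo: str          # antisense oligo, 5'->3' (DNA letters)
    gc_percent: float


def antisense_of(window_rna: str) -> str:
    return window_rna.translate(RNA_TO_DNA_COMPLEMENT)[::-1]


def gc_percent(seq: str) -> float:
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def self_complementary(oligo: str, min_stem: int = 4) -> bool:
    """Crude check: does any min_stem-mer have its reverse complement within the oligo?"""
    kmers = {oligo[i:i + min_stem] for i in range(len(oligo) - min_stem + 1)}
    return any(k.translate(DNA_COMPLEMENT)[::-1] in kmers for k in kmers)


def gene_walk(target_rna: str, length: int = 16, gap: int = 0,
              gc_range: tuple[float, float] = (30.0, 60.0),
              min_stem: int = 4) -> list[Candidate]:
    """Tile antisense candidates along target_rna and keep those passing the filters."""
    if not 10 <= length <= 30 or gap < 0:
        raise ValueError("length should be 10-30 nt and gap non-negative")
    rna = target_rna.upper().replace("T", "U")
    candidates = []
    # Step by length + gap so consecutive windows leave `gap` untested nucleotides.
    for start in range(0, len(rna) - length + 1, length + gap):
        window = rna[start:start + length]
        oligo = antisense_of(window)
        gc = gc_percent(oligo)
        if not gc_range[0] <= gc <= gc_range[1]:
            continue                      # GC content outside ~30-60%
        if "GGG" in oligo or "CCC" in oligo:
            continue                      # run of three or more Gs or Cs
        if self_complementary(oligo, min_stem):
            continue                      # risk of intramolecular pairing
        candidates.append(Candidate(start, window, oligo, gc))
    return candidates


print(gene_walk("CUGUCUGUCUGUCUGU", length=16))  # hypothetical 16-nt target
```

Candidates that survive such a filter would still need to be synthesized and tested for activity, as the gene-walk approach above describes.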
For additional information regarding LNA molecules see U.S. Pat. Nos. 6,268,490; 6,734,291; 6,770,748; 6,794,499; 7,034,133; 7,053,207; 7,060,809; 7,084,125; and 7,572,582; and U.S. Pre-Grant Pub. Nos. 20100267018; 20100261175; and 20100035968; Koshkin et al. Tetrahedron 54, 3607-3630 (1998); Obika et al. Tetrahedron Lett. 39, 5401-5404 (1998); Jensen et al., Oligonucleotides 14:130-146 (2004); Kauppinen et al., Drug Disc. Today 2(3):287-290 (2005); and Ponting et al., Cell 136(4):629-641 (2009), and references cited therein.
As demonstrated herein and previously (see, e.g., WO 2012/065143 and WO 2012/087983, incorporated herein by reference), LNA molecules can be used as a valuable tool to manipulate and aid analysis of RNAs. Advantages offered by an LNA molecule-based system are the relatively low costs, easy delivery, and rapid action. While other inhibitory nucleic acids may exhibit effects only after longer periods of time, LNA molecules act more rapidly (i.e., with a comparatively early onset of activity); their effects are fully reversible after a recovery period that allows synthesis of new RNA; and they act without causing substantial or substantially complete RNA cleavage or degradation. One or more of these design properties may be desired properties of the inhibitory nucleic acids of the invention. Additionally, LNA molecules make possible the systematic targeting of domains within much longer nuclear transcripts. Although a PNA-based system has been described earlier, its effects on the inactive X chromosome (Xi) were apparent only after 24 hours (Beletskii et al., Proc Natl Acad Sci USA. 2001; 98:9215-9220). The LNA technology enables high-throughput screens for functional analysis of non-coding RNAs and also provides a novel tool to manipulate chromatin states in vivo for therapeutic applications.
In various related aspects, the methods described herein include using LNA molecules to target RNAs for a number of uses, including as a research tool to probe the function of a specific RNA, e.g., in vitro or in vivo. The methods include selecting one or more desired RNAs, designing one or more LNA molecules that target the RNA, providing the designed LNA molecule, and administering the LNA molecule to a cell or animal. The methods can optionally include selecting a region of the RNA and designing one or more LNA molecules that target that region of the RNA.
Aberrant imprinted gene expression is implicated in several diseases including Long QT syndrome, Beckwith-Wiedemann, Prader-Willi, and Angelman syndromes, as well as behavioral disorders and carcinogenesis (see, e.g., Falls et al., Am. J. Pathol. 154:635-647 (1999); Lalande, Annu Rev Genet 30:173-195 (1996); Hall, Annu Rev Med. 48:35-44 (1997)). LNA molecules can be created to treat such imprinted diseases. As one example, long QT syndrome can be caused by dysfunction of a voltage-gated K+ channel encoded by Kcnq1. This gene is regulated by its antisense counterpart, the long noncoding RNA Kcnq1ot1 (Pandey et al., Mol Cell. 2008 Oct. 24; 32(2):232-46). Disease arises when Kcnq1ot1 is aberrantly expressed. LNA molecules can be created to downregulate Kcnq1ot1, thereby restoring expression of Kcnq1. As another example, LNA molecules could inhibit RNA cofactors for polycomb complex chromatin modifiers to reverse the imprinted defect.
From a commercial and clinical perspective, the timepoints between about 1 and 24 hours potentially define a window for epigenetic reprogramming. The advantage of the LNA system is that it works quickly, with a defined half-life, and is therefore reversible upon degradation of LNAs, at the same time that it provides a discrete timeframe during which epigenetic manipulations can be made. By targeting nuclear long RNAs, LNA molecules or similar polymers, e.g., xylo-LNAs, might be utilized to manipulate the chromatin state of cells in culture or in vivo, by transiently eliminating the regulatory RNA and associated proteins long enough to alter the underlying locus for therapeutic purposes. In particular, LNA molecules or similar polymers that specifically bind to, or are complementary to, PRC1-binding RNA can prevent recruitment of PRC1 to a specific chromosomal locus, in a gene-specific fashion.
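The reversibility argument rests on the LNA being cleared over time. Assuming simple first-order (exponential) decay, which is an assumption rather than a measured property of any particular LNA, the fraction remaining after t hours is 0.5^(t / t½). The sketch below uses a hypothetical 6-hour half-life to show how such a model would map onto the 1-24 hour window.

```python
import math


def fraction_remaining(t_hours: float, half_life_hours: float) -> float:
    """Fraction of the starting amount left after t_hours, assuming first-order decay."""
    return 0.5 ** (t_hours / half_life_hours)


def hours_until(fraction: float, half_life_hours: float) -> float:
    """Time for the amount to fall to `fraction` of its starting value (0 < fraction < 1)."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be between 0 and 1")
    return half_life_hours * math.log2(1.0 / fraction)


HALF_LIFE_H = 6.0  # hypothetical half-life, chosen only for illustration
for t in (1, 6, 12, 24):
    print(f"{t:>2} h: {fraction_remaining(t, HALF_LIFE_H):.3f} of the LNA remaining")
print(f"falls below 10% after about {hours_until(0.10, HALF_LIFE_H):.1f} h")
```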
LNA molecules might also be administered in vivo to treat other human diseases, such as but not limited to cancer, neurological disorders, infections, inflammation, and myotonic dystrophy. For example, LNA molecules might be delivered to tumor cells to downregulate the biologic activity of a growth-promoting or oncogenic long nuclear RNA (e.g., Gtl2 or MALAT1 (Luo et al., Hepatology. 44(4):1012-24 (2006)), an RNA that is associated with metastasis and frequently upregulated in cancers). Repressive RNAs downregulating tumor suppressors can also be targeted by LNA molecules to promote reexpression. For example, expression of the INK4b/ARF/INK4a tumor suppressor locus is controlled by Polycomb group proteins including PRC1 and PRC2 and is repressed by the antisense noncoding RNA ANRIL (Yap et al., Mol Cell. 2010 Jun. 11; 38(5):662-74). PRC1-binding regions described herein in ANRIL can be targeted by LNA molecules to promote reexpression of the INK4b/ARF/INK4a tumor suppressor. Some ncRNAs may be positive regulators of oncogenes. Such “activating ncRNAs” have been described recently (e.g., Jpx (Tian et al., Cell. 143(3):390-403 (2010)) and others (Ørom et al., Cell. 143(1):46-58 (2010))). Therefore, LNA molecules could be directed at these activating ncRNAs to downregulate oncogenes. LNA molecules could also be delivered to inflammatory cells to downregulate regulatory ncRNAs that modulate the inflammatory or immune response (e.g., LincRNA-Cox2; see Guttman et al., Nature. 458(7235):223-7. Epub 2009 Feb. 1 (2009)).
In still other related aspects, the LNA molecules targeting PRC1-binding regions in RNAs described herein can be used to create animal or cell models of conditions associated with altered gene expression (e.g., as a result of altered epigenetics).
The methods described herein may also be useful for creating animal or cell models of other conditions associated with aberrant imprinted gene expression, e.g., as noted above.
In various related aspects, the results described herein demonstrate the utility of LNA molecules for targeting RNA, for example, to transiently disrupt chromatin for purposes of reprogramming chromatin states ex vivo. Because LNA molecules stably displace RNA for hours and chromatin does not rebuild for hours thereafter, LNA molecules create a window of opportunity to manipulate the epigenetic state of specific loci ex vivo, e.g., for reprogramming of hiPS and hESC prior to stem cell therapy. For example, Gtl2 controls expression of DLK1, which modulates the pluripotency of iPS cells. Low Gtl2 and high DLK1 expression are correlated with increased pluripotency and stability in human iPS cells. Thus, LNA molecules targeting Gtl2 can be used to inhibit differentiation and increase pluripotency and stability of iPS cells.
See also PCT/US11/60493, which is incorporated by reference herein in its entirety.
Interfering RNA, Including siRNA/shRNA
In some embodiments, the inhibitory nucleic acid sequence that is complementary to an RNA can be an interfering RNA, including but not limited to a small interfering RNA (“siRNA”) or a small hairpin RNA (“shRNA”). Methods for constructing interfering RNAs are well known in the art. For example, the interfering RNA can be assembled from two separate oligonucleotides, where one strand is the sense strand and the other is the antisense strand, wherein the antisense and sense strands are self-complementary (i.e., each strand comprises nucleotide sequence that is complementary to nucleotide sequence in the other strand, such as where the antisense strand and sense strand form a duplex or double-stranded structure); the antisense strand comprises nucleotide sequence that is complementary to a nucleotide sequence in a target nucleic acid molecule or a portion thereof (i.e., an undesired gene), and the sense strand comprises nucleotide sequence corresponding to the target nucleic acid sequence or a portion thereof. Alternatively, interfering RNA is assembled from a single oligonucleotide, where the self-complementary sense and antisense regions are linked by means of nucleic acid-based or non-nucleic acid-based linker(s). The interfering RNA can be a polynucleotide with a duplex, asymmetric duplex, hairpin or asymmetric hairpin secondary structure, having self-complementary sense and antisense regions, wherein the antisense region comprises a nucleotide sequence that is complementary to nucleotide sequence in a separate target nucleic acid molecule or a portion thereof and the sense region has nucleotide sequence corresponding to the target nucleic acid sequence or a portion thereof.
The interfering RNA can be a circular single-stranded polynucleotide having two or more loop structures and a stem comprising self-complementary sense and antisense regions, wherein the antisense region comprises nucleotide sequence that is complementary to nucleotide sequence in a target nucleic acid molecule or a portion thereof and the sense region has nucleotide sequence corresponding to the target nucleic acid sequence or a portion thereof, and wherein the circular polynucleotide can be processed either in vivo or in vitro to generate an active siRNA molecule capable of mediating RNA interference.
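The two-strand assembly described above can be sketched as follows: the sense strand carries the target sequence and the antisense strand its reverse complement, so the two form a duplex. The 3′ UU overhang on each strand is an assumed, illustrative design choice (the text does not specify overhangs), and the function names and 19-nt region are hypothetical.

```python
from __future__ import annotations

RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def rna_reverse_complement(seq: str) -> str:
    return seq.translate(RNA_COMPLEMENT)[::-1]


def design_sirna_duplex(target_region: str, overhang: str = "UU") -> tuple[str, str]:
    """Return (sense, antisense) strands, each 5'->3', for a target region.

    The sense strand corresponds to the target and the antisense strand is its
    reverse complement. Each strand gets the same 3' overhang (an assumed,
    illustrative design choice).
    """
    region = target_region.upper().replace("T", "U")
    sense = region + overhang
    antisense = rna_reverse_complement(region) + overhang
    return sense, antisense


def duplex_is_complementary(sense: str, antisense: str, overhang_len: int = 2) -> bool:
    """Check that the paired (non-overhang) cores of the two strands are fully complementary."""
    core_sense = sense[:len(sense) - overhang_len]
    core_antisense = antisense[:len(antisense) - overhang_len]
    return core_antisense == rna_reverse_complement(core_sense)


sense, antisense = design_sirna_duplex("GCAUGCAUGCAUGCAUGCA")  # hypothetical 19-nt region
print(sense)      # GCAUGCAUGCAUGCAUGCAUU
print(antisense)  # UGCAUGCAUGCAUGCAUGCUU
print(duplex_is_complementary(sense, antisense))  # True
```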
In some embodiments, the interfering RNA coding region encodes a self-complementary RNA molecule having a sense region, an antisense region and a loop region. Such an RNA molecule when expressed desirably forms a “hairpin” structure, and is referred to herein as an “shRNA.” The loop region is generally between about 2 and about 10 nucleotides in length. In some embodiments, the loop region is from about 6 to about 9 nucleotides in length. In some embodiments, the sense region and the antisense region are between about 15 and about 20 nucleotides in length. Following post-transcriptional processing, the small hairpin RNA is converted into a siRNA by a cleavage event mediated by the enzyme Dicer, which is a member of the RNase III family. The siRNA is then capable of inhibiting the expression of a gene with which it shares homology. For details, see Brummelkamp et al., Science 296:550-553, (2002); Lee et al., Nature Biotechnol., 20, 500-505, (2002); Miyagishi and Taira, Nature Biotechnol 20:497-500, (2002); Paddison et al. Genes & Dev. 16:948-958, (2002); Paul, Nature Biotechnol, 20, 505-508, (2002); Sui, Proc. Natl. Acad. Sci. USA, 99(6), 5515-5520, (2002); Yu et al. Proc Natl Acad Sci USA 99:6047-6052, (2002).
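The sense-loop-antisense layout of an shRNA can be sketched in the same way, checking the stem and loop lengths against the ranges given above. The default 9-nt loop sequence, the function names, and the 19-nt target region are illustrative assumptions, and the sketch does not model Dicer processing or RNA folding.

```python
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def rna_reverse_complement(seq: str) -> str:
    return seq.translate(RNA_COMPLEMENT)[::-1]


def design_shrna(target_region: str, loop: str = "UUCAAGAGA") -> str:
    """Build a sense-loop-antisense shRNA transcript (5'->3') for a target region.

    Stem length is checked against the ~15-20 nt range and loop length against
    the ~2-10 nt range described in the text. The default loop is illustrative.
    """
    stem = target_region.upper().replace("T", "U")
    if not 15 <= len(stem) <= 20:
        raise ValueError("sense/antisense regions should be about 15-20 nt")
    if not 2 <= len(loop) <= 10:
        raise ValueError("loop should be about 2-10 nt")
    # The antisense arm is the reverse complement of the sense arm, so the
    # transcript can fold back on itself into a hairpin.
    return stem + loop.upper() + rna_reverse_complement(stem)


hairpin = design_shrna("GCAUGCAUGCAUGCAUGCA")  # hypothetical 19-nt target region
print(hairpin)  # GCAUGCAUGCAUGCAUGCAUUCAAGAGAUGCAUGCAUGCAUGCAUGC
```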
The target RNA cleavage reaction guided by siRNAs is highly sequence specific. In general, siRNAs containing nucleotide sequences identical to a portion of the target nucleic acid are preferred for inhibition. However, 100% sequence identity between the siRNA and the target gene is not required to practice the present invention. Thus the invention has the advantage of being able to tolerate sequence variations that might be expected due to genetic mutation, strain polymorphism, or evolutionary divergence. For example, siRNA sequences with insertions, deletions, and single point mutations relative to the target sequence have also been found to be effective for inhibition. Alternatively, siRNA sequences with nucleotide analog substitutions or insertions can be effective for inhibition. In general, the siRNAs must retain specificity for their target, i.e., they must not directly bind to, or directly and significantly affect the expression levels of, transcripts other than the intended target.
Ribozymes
In some embodiments, the inhibitory nucleic acids are ribozymes. Trans-cleaving enzymatic nucleic acid molecules can also be used; they have shown promise as therapeutic agents for human disease (Usman & McSwiggen, 1995 Ann. Rep. Med. Chem. 30, 285-294; Christoffersen and Marr, 1995 J. Med. Chem. 38, 2023-2037). Enzymatic nucleic acid molecules can be designed to cleave specific RNA targets within the background of cellular RNA. Such a cleavage event renders the RNA non-functional.
In general, enzymatic nucleic acids with RNA cleaving activity act by first binding to a target RNA. Such binding occurs through the target binding portion of an enzymatic nucleic acid, which is held in close proximity to an enzymatic portion of the molecule that acts to cleave the target RNA. Thus, the enzymatic nucleic acid first recognizes and then binds a target RNA through complementary base pairing, and once bound to the correct site, acts enzymatically to cut the target RNA. Strategic cleavage of such a target RNA will destroy its ability to direct synthesis of an encoded protein. After an enzymatic nucleic acid has bound and cleaved its RNA target, it is released from that RNA to search for another target and can repeatedly bind and cleave new targets.
Several approaches, such as in vitro selection (evolution) strategies (Orgel, 1979, Proc. R. Soc. London, B 205, 435), have been used to evolve new nucleic acid catalysts capable of catalyzing a variety of reactions, such as cleavage and ligation of phosphodiester linkages and amide linkages (Joyce, 1989, Gene, 82, 83-87; Beaudry et al., 1992, Science 257, 635-641; Joyce, 1992, Scientific American 267, 90-97; Breaker et al., 1994, TIBTECH 12, 268; Bartel et al., 1993, Science 261:1411-1418; Szostak, 1993, TIBS 17, 89-93; Kumar et al., 1995, FASEB J., 9, 1183; Breaker, 1996, Curr. Op. Biotech., 1, 442). The development of ribozymes that are optimal for catalytic activity would contribute significantly to any strategy that employs RNA-cleaving ribozymes for the purpose of regulating gene expression. The hammerhead ribozyme, for example, functions with a catalytic rate (kcat) of about 1 min−1 in the presence of saturating (10 mM) concentrations of Mg2+ cofactor. An artificial “RNA ligase” ribozyme has been shown to catalyze the corresponding self-modification reaction with a rate of about 100 min−1. In addition, it is known that certain modified hammerhead ribozymes that have substrate binding arms made of DNA catalyze RNA cleavage with multiple turnover rates that approach 100 min−1.
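To make the kcat figures concrete: at saturating substrate, each ribozyme molecule completes on average kcat catalytic cycles per minute, so product accumulates as [P] = kcat·[E]·t. The sketch below applies this only to the multiple-turnover, trans-cleaving hammerhead cases; the function names, the 10 nM ribozyme concentration, and the 30-minute reaction time are hypothetical, and the relation holds only while substrate remains saturating.

```python
def product_formed_nM(kcat_per_min: float, enzyme_nM: float, minutes: float) -> float:
    """Product formed when the enzyme is saturated with substrate: [P] = kcat * [E] * t.

    Valid only while substrate stays saturating (zero-order regime) and the
    enzyme turns over (multiple-turnover, trans-cleaving reaction).
    """
    return kcat_per_min * enzyme_nM * minutes


def seconds_per_turnover(kcat_per_min: float) -> float:
    """Average time for one catalytic cycle at saturation."""
    return 60.0 / kcat_per_min


HAMMERHEAD_KCAT = 1.0    # ~1 min^-1 at saturating (10 mM) Mg2+, as stated in the text
DNA_ARMED_KCAT = 100.0   # upper figure quoted for DNA-armed hammerhead variants

print(seconds_per_turnover(HAMMERHEAD_KCAT))  # 60.0
print(seconds_per_turnover(DNA_ARMED_KCAT))   # 0.6
print(product_formed_nM(HAMMERHEAD_KCAT, enzyme_nM=10.0, minutes=30.0))  # 300.0
```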
Making and Using Inhibitory Nucleic Acids
The nucleic acid sequences used to practice the methods described herein, whether RNA, cDNA, genomic DNA, vectors, viruses or hybrids thereof, can be isolated from a variety of sources, genetically engineered, amplified, and/or expressed/generated recombinantly. If desired, nucleic acid sequences of the invention can be inserted into delivery vectors and expressed from transcription units within the vectors. The recombinant vectors can be DNA plasmids or viral vectors. Generation of the vector construct can be accomplished using any suitable genetic engineering techniques well known in the art, including, without limitation, the standard techniques of PCR, oligonucleotide synthesis, restriction endonuclease digestion, ligation, transformation, plasmid purification, and DNA sequencing, for example as described in Sambrook et al. (Molecular Cloning: A Laboratory Manual (1989)), Coffin et al. (Retroviruses (1997)), and “RNA Viruses: A Practical Approach” (Alan J. Cann, Ed., Oxford University Press (2000)).
Preferably, inhibitory nucleic acids of the invention are synthesized chemically. Nucleic acid sequences used to practice this invention can be synthesized in vitro by well-known chemical synthesis techniques, as described in, e.g., Adams (1983) J. Am. Chem. Soc. 105:661; Belousov (1997) Nucleic Acids Res. 25:3440-3444; Frenkel (1995) Free Radic. Biol. Med. 19:373-380; Blommers (1994) Biochemistry 33:7886-7896; Narang (1979) Meth. Enzymol. 68:90; Brown (1979) Meth. Enzymol. 68:109; Beaucage (1981) Tetra. Lett. 22:1859; U.S. Pat. No. 4,458,066; WO/2008/043753 and WO/2008/049085, and the references cited therein.
Nucleic acid sequences of the invention can be stabilized against nucleolytic degradation, such as by the incorporation of a modification, e.g., a nucleotide modification. For example, nucleic acid sequences of the invention can include a phosphorothioate at the first, second, and/or third internucleotide linkage from the 5′ or 3′ end of the nucleotide sequence. As another example, the nucleic acid sequence can include a 2′-modified nucleotide, e.g., a 2′-deoxy, 2′-deoxy-2′-fluoro, 2′-O-methyl, 2′-O-methoxyethyl (2′-O-MOE), 2′-O-aminopropyl (2′-O-AP), 2′-O-dimethylaminoethyl (2′-O-DMAOE), 2′-O-dimethylaminopropyl (2′-O-DMAP), 2′-O-dimethylaminoethyloxyethyl (2′-O-DMAEOE), or 2′-O-N-methylacetamido (2′-O-NMA). As another example, the nucleic acid sequence can include at least one 2′-O-methyl-modified nucleotide, and in some embodiments, all of the nucleotides include a 2′-O-methyl modification. In some embodiments, the nucleic acids are “locked,” i.e., comprise nucleic acid analogues in which the ribose ring is “locked” by a methylene bridge connecting the 2′-O atom and the 4′-C atom (see, e.g., Kauppinen et al., Drug Disc. Today 2(3):287-290 (2005); Koshkin et al., J. Am. Chem. Soc., 120(50):13252-13253 (1998)). For additional modifications see US 20100004320, US 20090298916, and US 20090143326.
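Because an n-nucleotide oligo has n − 1 internucleotide linkages, placing phosphorothioates at terminal linkages and 2′-O-methyl groups on nucleotides amounts to a simple position pattern. The sketch below renders such a pattern in a hypothetical notation ('m' before a 2′-O-methyl nucleotide, '*' for a phosphorothioate linkage); the function name and notation are assumptions, loosely in the spirit of common oligo-ordering shorthands but not any vendor's exact syntax.

```python
def annotate_oligo(seq: str, ps_5prime: int = 3, ps_3prime: int = 3, all_2ome: bool = True) -> str:
    """Render a modification pattern in a simple hypothetical notation.

    'm' before a base marks a 2'-O-methyl nucleotide; '*' between two bases
    marks a phosphorothioate linkage. Linkage i joins nucleotides i and i+1,
    so an n-mer has n - 1 internucleotide linkages.
    """
    seq = seq.upper()
    n_links = len(seq) - 1
    if ps_5prime < 0 or ps_3prime < 0:
        raise ValueError("phosphorothioate counts must be non-negative")
    # First ps_5prime linkages from the 5' end plus the last ps_3prime from the 3' end.
    ps_links = set(range(min(ps_5prime, n_links))) | set(range(max(n_links - ps_3prime, 0), n_links))
    parts = []
    for i, base in enumerate(seq):
        parts.append(("m" if all_2ome else "") + base)
        if i < n_links and i in ps_links:
            parts.append("*")
    return "".join(parts)


print(annotate_oligo("ACGUACGU", ps_5prime=2, ps_3prime=2))                   # mA*mC*mGmUmAmC*mG*mU
print(annotate_oligo("ACGUACGU", ps_5prime=1, ps_3prime=1, all_2ome=False))  # A*CGUACG*U
```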
It is understood that any of the modified chemistries or formats of inhibitory nucleic acids described herein can be combined with each other, and that one, two, three, four, five, or more different types of modifications can be included within the same molecule.
Techniques for the manipulation of nucleic acids used to practice this invention, such as, e.g., subcloning, labeling probes (e.g., random-primer labeling using Klenow polymerase, nick translation, amplification), sequencing, hybridization and the like are well described in the scientific and patent literature, see, e.g., Sambrook et al., Molecular Cloning: A Laboratory Manual 3d ed. (2001); Current Protocols in Molecular Biology, Ausubel et al., eds. (John Wiley & Sons, Inc., New York 2010); Kriegler, Gene Transfer and Expression: A Laboratory Manual (1990); Laboratory Techniques In Biochemistry And Molecular Biology: Hybridization With Nucleic Acid Probes, Part I. Theory and Nucleic Acid Preparation, Tijssen, ed. Elsevier, N.Y. (1993).
Pharmaceutical Compositions
The methods described herein can include the administration of pharmaceutical compositions and formulations comprising inhibitory nucleic acid sequences designed to target an RNA.
In some embodiments, the compositions are formulated with a pharmaceutically acceptable carrier. The pharmaceutical compositions and formulations can be administered parenterally, topically, orally or by local administration, such as by aerosol or transdermally. The pharmaceutical compositions can be formulated in any way and can be administered in a variety of unit dosage forms depending upon the condition or disease and the degree of illness, the general medical condition of each patient, the resulting preferred method of administration and the like. Details on techniques for formulation and administration of pharmaceuticals are well described in the scientific and patent literature, see, e.g., Remington: The Science and Practice of Pharmacy, 21st ed., 2005.
The inhibitory nucleic acids can be administered alone or as a component of a pharmaceutical formulation (composition). The compounds may be formulated for administration in any convenient way for use in human or veterinary medicine. Wetting agents, emulsifiers and lubricants, such as sodium lauryl sulfate and magnesium stearate, as well as coloring agents, release agents, coating agents, sweetening, flavoring and perfuming agents, preservatives and antioxidants can also be present in the compositions.
Formulations of the compositions of the invention include those suitable for intradermal, inhalation, oral/nasal, topical, parenteral, rectal, and/or intravaginal administration. The formulations may conveniently be presented in unit dosage form and may be prepared by any methods well known in the art of pharmacy. The amount of active ingredient (e.g., nucleic acid sequences of this invention) which can be combined with a carrier material to produce a single dosage form will vary depending upon the host being treated and the particular mode of administration, e.g., intradermal or inhalation. The amount of active ingredient which can be combined with a carrier material to produce a single dosage form will generally be that amount of the compound which produces a therapeutic effect, e.g., an antigen-specific T cell or humoral response.
Pharmaceutical formulations of this invention can be prepared according to any method known to the art for the manufacture of pharmaceuticals. Such drugs can contain sweetening agents, flavoring agents, coloring agents and preserving agents. A formulation can be admixed with nontoxic pharmaceutically acceptable excipients that are suitable for manufacture. Formulations may comprise one or more diluents, emulsifiers, preservatives, buffers, excipients, etc. and may be provided in such forms as liquids, powders, emulsions, lyophilized powders, sprays, creams, lotions, controlled release formulations, tablets, pills, gels, on patches, in implants, etc.
Pharmaceutical formulations for oral administration can be formulated using pharmaceutically acceptable carriers well known in the art in appropriate and suitable dosages. Such carriers enable the pharmaceuticals to be formulated in unit dosage forms as tablets, pills, powder, dragees, capsules, liquids, lozenges, gels, syrups, slurries, suspensions, etc., suitable for ingestion by the patient. Pharmaceutical preparations for oral use can be prepared by combining the active agent with a solid excipient, optionally grinding the resulting mixture, and processing the mixture of granules, after adding suitable additional compounds if desired, to obtain tablets or dragee cores. Suitable solid excipients are carbohydrate or protein fillers, including, e.g., sugars, including lactose, sucrose, mannitol, or sorbitol; starch from corn, wheat, rice, potato, or other plants; cellulose such as methyl cellulose, hydroxypropylmethyl-cellulose, or sodium carboxy-methylcellulose; gums, including arabic and tragacanth; and proteins, e.g., gelatin and collagen. Disintegrating or solubilizing agents may be added, such as cross-linked polyvinyl pyrrolidone, agar, alginic acid, or a salt thereof, such as sodium alginate. Push-fit capsules can contain active agents mixed with a filler or binders such as lactose or starches, lubricants such as talc or magnesium stearate, and, optionally, stabilizers. In soft capsules, the active agents can be dissolved or suspended in suitable liquids, such as fatty oils, liquid paraffin, or liquid polyethylene glycol with or without stabilizers.
Aqueous suspensions can contain an active agent (e.g., nucleic acid sequences of the invention) in admixture with excipients suitable for the manufacture of aqueous suspensions, e.g., for aqueous intradermal injections. Such excipients include a suspending agent, such as sodium carboxymethylcellulose, methylcellulose, hydroxypropylmethylcellulose, sodium alginate, polyvinylpyrrolidone, gum tragacanth and gum acacia, and dispersing or wetting agents such as a naturally occurring phosphatide (e.g., lecithin), a condensation product of an alkylene oxide with a fatty acid (e.g., polyoxyethylene stearate), a condensation product of ethylene oxide with a long chain aliphatic alcohol (e.g., heptadecaethylene oxycetanol), a condensation product of ethylene oxide with a partial ester derived from a fatty acid and a hexitol (e.g., polyoxyethylene sorbitol mono-oleate), or a condensation product of ethylene oxide with a partial ester derived from fatty acid and a hexitol anhydride (e.g., polyoxyethylene sorbitan mono-oleate). The aqueous suspension can also contain one or more preservatives such as ethyl or n-propyl p-hydroxybenzoate, one or more coloring agents, one or more flavoring agents and one or more sweetening agents, such as sucrose, aspartame or saccharin. Formulations can be adjusted for osmolarity.
In some embodiments, oil-based pharmaceuticals are used for administration of nucleic acid sequences of the invention. Oil-based suspensions can be formulated by suspending an active agent in a vegetable oil, such as arachis oil, olive oil, sesame oil or coconut oil, or in a mineral oil such as liquid paraffin; or a mixture of these. See, e.g., U.S. Pat. No. 5,716,928, describing using essential oils or essential oil components for increasing bioavailability and reducing inter- and intra-individual variability of orally administered hydrophobic pharmaceutical compounds (see also U.S. Pat. No. 5,858,401). The oil suspensions can contain a thickening agent, such as beeswax, hard paraffin or cetyl alcohol. Sweetening agents, such as glycerol, sorbitol or sucrose, can be added to provide a palatable oral preparation. These formulations can be preserved by the addition of an antioxidant such as ascorbic acid. As an example of an injectable oil vehicle, see Minto (1997) J. Pharmacol. Exp. Ther. 281:93-102.
Pharmaceutical formulations can also be in the form of oil-in-water emulsions. The oily phase can be a vegetable oil or a mineral oil, described above, or a mixture of these. Suitable emulsifying agents include naturally-occurring gums, such as gum acacia and gum tragacanth, naturally occurring phosphatides, such as soybean lecithin, esters or partial esters derived from fatty acids and hexitol anhydrides, such as sorbitan mono-oleate, and condensation products of these partial esters with ethylene oxide, such as polyoxyethylene sorbitan mono-oleate. The emulsion can also contain sweetening agents and flavoring agents, as in the formulation of syrups and elixirs. Such formulations can also contain a demulcent, a preservative, or a coloring agent. In alternative embodiments, these injectable oil-in-water emulsions of the invention comprise a paraffin oil, a sorbitan monooleate, an ethoxylated sorbitan monooleate and/or an ethoxylated sorbitan trioleate.
The pharmaceutical compounds can also be administered by intranasal, intraocular and intravaginal routes, including suppositories, insufflation, powders and aerosol formulations (for examples of steroid inhalants, see e.g., Rohatagi (1995) J. Clin. Pharmacol. 35:1187-1193; Tjwa (1995) Ann. Allergy Asthma Immunol. 75:107-111). Suppository formulations can be prepared by mixing the drug with a suitable non-irritating excipient that is solid at ordinary temperatures but liquid at body temperature and will therefore melt in the body to release the drug. Such materials include cocoa butter and polyethylene glycols.
In some embodiments, the pharmaceutical compounds can be delivered transdermally, by a topical route, formulated as applicator sticks, solutions, suspensions, emulsions, gels, creams, ointments, pastes, jellies, paints, powders, and aerosols.
In some embodiments, the pharmaceutical compounds can also be delivered as microspheres for slow release in the body. For example, microspheres can be administered via intradermal injection of drug that is slowly released subcutaneously (see Rao (1995) J. Biomater Sci. Polym. Ed. 7:623-645); as biodegradable and injectable gel formulations (see, e.g., Gao (1995) Pharm. Res. 12:857-863); or as microspheres for oral administration (see, e.g., Eyles (1997) J. Pharm. Pharmacol. 49:669-674).
In some embodiments, the pharmaceutical compounds can be parenterally administered, such as by intravenous (IV) administration or administration into a body cavity or lumen of an organ. These formulations can comprise a solution of active agent dissolved in a pharmaceutically acceptable carrier. Acceptable vehicles and solvents that can be employed are water and Ringer's solution, an isotonic sodium chloride solution. In addition, sterile fixed oils can be employed as a solvent or suspending medium. For this purpose any bland fixed oil can be employed, including synthetic mono- or diglycerides. In addition, fatty acids such as oleic acid can likewise be used in the preparation of injectables. These solutions are sterile and generally free of undesirable matter. These formulations may be sterilized by conventional, well known sterilization techniques. The formulations may contain pharmaceutically acceptable auxiliary substances as required to approximate physiological conditions, such as pH adjusting and buffering agents and tonicity adjusting agents, e.g., sodium acetate, sodium chloride, potassium chloride, calcium chloride, sodium lactate and the like. The concentration of active agent in these formulations can vary widely and will be selected primarily based on fluid volumes, viscosities, body weight, and the like, in accordance with the particular mode of administration selected and the patient's needs. For IV administration, the formulation can be a sterile injectable preparation, such as a sterile injectable aqueous or oleaginous suspension. This suspension can be formulated using suitable dispersing or wetting agents and suspending agents. The sterile injectable preparation can also be a suspension in a nontoxic parenterally-acceptable diluent or solvent, such as a solution of 1,3-butanediol. The administration can be by bolus or continuous infusion (e.g., substantially uninterrupted introduction into a blood vessel for a specified period of time).
In some embodiments, the pharmaceutical compounds and formulations can be lyophilized. Stable lyophilized formulations comprising an inhibitory nucleic acid can be made by lyophilizing a solution comprising a pharmaceutical of the invention and a bulking agent, e.g., mannitol, trehalose, raffinose, sucrose, or mixtures thereof. A process for preparing a stable lyophilized formulation can include lyophilizing a solution comprising about 2.5 mg/mL protein, about 15 mg/mL sucrose, about 19 mg/mL NaCl, and a sodium citrate buffer having a pH greater than 5.5 but less than 6.5. See, e.g., U.S. 20040028670.
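The example pre-lyophilization solution above is specified as mass concentrations, so preparing a batch is simple multiplication. The sketch below treats the stated "about" values as exact targets; the helper names and the 100 mL batch size are hypothetical, and the NaCl conversion assumes a molar mass of 58.44 g/mol.

```python
# Hypothetical helpers for the example lyophilization solution described above.
NACL_MOLAR_MASS_G_PER_MOL = 58.44

COMPOSITION_MG_PER_ML = {
    "protein": 2.5,   # about 2.5 mg/mL
    "sucrose": 15.0,  # about 15 mg/mL
    "NaCl": 19.0,     # about 19 mg/mL
}


def batch_masses_mg(volume_ml: float) -> dict[str, float]:
    """Mass (mg) of each solute needed for volume_ml of solution."""
    if volume_ml <= 0:
        raise ValueError("volume must be positive")
    return {name: conc * volume_ml for name, conc in COMPOSITION_MG_PER_ML.items()}


def nacl_molarity_mM() -> float:
    """Convert the NaCl mass concentration (mg/mL is the same as g/L) to mM."""
    return COMPOSITION_MG_PER_ML["NaCl"] / NACL_MOLAR_MASS_G_PER_MOL * 1000.0


def citrate_ph_ok(ph: float) -> bool:
    """The buffer pH must be greater than 5.5 but less than 6.5 (strict bounds)."""
    return 5.5 < ph < 6.5


if __name__ == "__main__":
    print(batch_masses_mg(100.0))        # hypothetical 100 mL batch
    print(round(nacl_molarity_mM()), "mM NaCl")
    print(citrate_ph_ok(6.0))
```

Under these assumptions, 19 mg/mL NaCl corresponds to roughly 325 mM, which is worth knowing when the solution is later reconstituted.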
The compositions and formulations can be delivered by the use of liposomes. By using liposomes, particularly where the liposome surface carries ligands specific for target cells, or are otherwise preferentially directed to a specific organ, one can focus the delivery of the active agent into target cells in vivo. See, e.g., U.S. Pat. Nos. 6,063,400; 6,007,839; Al-Muhammed (1996) J. Microencapsul. 13:293-306; Chonn (1995) Curr. Opin. Biotechnol. 6:698-708; Ostro (1989) Am. J. Hosp. Pharm. 46:1576-1587. As used in the present invention, the term “liposome” means a vesicle composed of amphiphilic lipids arranged in a bilayer or bilayers. Liposomes are unilamellar or multilamellar vesicles that have a membrane formed from a lipophilic material and an aqueous interior that contains the composition to be delivered. Cationic liposomes are positively charged liposomes that are believed to interact with negatively charged DNA molecules to form a stable complex. Liposomes that are pH-sensitive or negatively-charged are believed to entrap DNA rather than complex with it. Both cationic and noncationic liposomes have been used to deliver DNA to cells.
Liposomes can also include “sterically stabilized” liposomes, i.e., liposomes comprising one or more specialized lipids. When incorporated into liposomes, these specialized lipids result in liposomes with enhanced circulation lifetimes relative to liposomes lacking such specialized lipids. Examples of sterically stabilized liposomes are those in which part of the vesicle-forming lipid portion of the liposome comprises one or more glycolipids or is derivatized with one or more hydrophilic polymers, such as a polyethylene glycol (PEG) moiety. Liposomes and their uses are further described in U.S. Pat. No. 6,287,860.
The formulations of the invention can be administered for prophylactic and/or therapeutic treatments. In some embodiments, for therapeutic applications, compositions are administered to a subject who is in need of reduced triglyceride levels, or who is at risk of or has a disorder described herein, in an amount sufficient to cure, alleviate or partially arrest the clinical manifestations of the disorder or its complications; this can be called a therapeutically effective amount. For example, in some embodiments, pharmaceutical compositions of the invention are administered in an amount sufficient to decrease serum levels of triglycerides in the subject.
The amount of pharmaceutical composition adequate to accomplish this is a therapeutically effective dose. The dosage schedule and amounts effective for this use, i.e., the dosing regimen, will depend upon a variety of factors, including the stage of the disease or condition, the severity of the disease or condition, the general state of the patient's health, the patient's physical status, age and the like. In calculating the dosage regimen for a patient, the mode of administration also is taken into consideration.
The dosage regimen also takes into consideration pharmacokinetic parameters well known in the art, i.e., the active agents' rate of absorption, bioavailability, metabolism, clearance, and the like (see, e.g., Hidalgo-Aragones (1996) J. Steroid Biochem. Mol. Biol. 58:611-617; Groning (1996) Pharmazie 51:337-341; Fotherby (1996) Contraception 54:59-69; Johnson (1995) J. Pharm. Sci. 84:1144-1146; Rohatagi (1995) Pharmazie 50:610-613; Brophy (1983) Eur. J. Clin. Pharmacol. 24:103-108; Remington: The Science and Practice of Pharmacy, 21st ed., 2005). The state of the art allows the clinician to determine the dosage regimen for each individual patient, active agent and disease or condition treated. Guidelines provided for similar compositions used as pharmaceuticals can be used to determine the dosage regimen, i.e., the dose schedule and dosage levels, that is correct and appropriate when practicing the methods of the invention.
Single or multiple administrations of formulations can be given depending on for example: the dosage and frequency as required and tolerated by the patient, the degree and amount of therapeutic effect generated after each administration (e.g., effect on tumor size or growth), and the like. The formulations should provide a sufficient quantity of active agent to effectively treat, prevent or ameliorate conditions, diseases or symptoms.
In alternative embodiments, pharmaceutical formulations for oral administration are given in a daily amount of between about 1 and 100 or more mg per kilogram of body weight. In contrast to oral administration, lower dosages can be used for administration into the blood stream, into a body cavity or into a lumen of an organ. Substantially higher dosages can be used for topical or oral administration or for administration by powders, spray or inhalation. Actual methods for preparing parenterally or non-parenterally administrable formulations will be known or apparent to those skilled in the art and are described in more detail in such publications as Remington: The Science and Practice of Pharmacy, 21st ed., 2005.
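Because the oral range above is expressed per kilogram of body weight, the daily amount scales linearly with weight. A minimal sketch of that arithmetic follows; the function name and the 70 kg body weight are hypothetical, the 100 mg/kg figure is not a ceiling (the text says "100 or more"), and the output is an arithmetic illustration only, not dosing guidance.

```python
# Lower and upper figures of the oral range stated above (mg per kg per day).
# The upper figure is open-ended in the text ("100 or more").
ORAL_RANGE_MG_PER_KG_DAY = (1.0, 100.0)


def daily_amount_mg(body_weight_kg: float, dose_mg_per_kg_day: float) -> float:
    """Total daily amount (mg) for a weight-normalized dose level."""
    if body_weight_kg <= 0:
        raise ValueError("body weight must be positive")
    if dose_mg_per_kg_day < 0:
        raise ValueError("dose must be non-negative")
    return body_weight_kg * dose_mg_per_kg_day


if __name__ == "__main__":
    weight_kg = 70.0  # hypothetical body weight, for illustration only
    low, high = ORAL_RANGE_MG_PER_KG_DAY
    print(daily_amount_mg(weight_kg, low), "to", daily_amount_mg(weight_kg, high), "mg/day")
```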
Various studies have reported successful mammalian dosing using complementary nucleic acid sequences. For example, Esau C., et al., (2006) Cell Metabolism, 3(2):87-98 reported dosing of normal mice with intraperitoneal doses of miR-122 antisense oligonucleotide ranging from 12.5 to 75 mg/kg twice weekly for 4 weeks. The mice appeared healthy and normal at the end of treatment, with no loss of body weight or reduced food intake. Plasma transaminase levels were in the normal range (AST ¾ 45, ALT ¾ 35) for all doses with the exception of the 75 mg/kg dose of miR-122 ASO, which showed a very mild increase in ALT and AST levels. They concluded that 50 mg/kg was an effective, non-toxic dose. In another study, Krützfeldt J., et al., (2005) Nature 438, 685-689, injected antagomirs to silence miR-122 in mice using a total dose of 80, 160 or 240 mg per kg body weight. The highest dose resulted in a complete loss of miR-122 signal. In yet another study, locked nucleic acid molecules ("LNA molecules") were successfully applied in primates to silence miR-122. Elmen J., et al., (2008) Nature 452, 896-899, report that efficient silencing of miR-122 was achieved in primates by three doses of 10 mg/kg LNA-antimiR, leading to a long-lasting and reversible decrease in total plasma cholesterol without any evidence of LNA-associated toxicities or histopathological changes in the study animals.
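The regimens cited above are reported either as per-dose amounts or as a total, which makes them hard to compare at a glance. The sketch below converts each to a cumulative mg/kg figure, assuming that "twice weekly for 4 weeks" means 8 equal doses; the function names are hypothetical, and the 25 g mouse weight is a hypothetical value used only to show the per-animal conversion.

```python
def cumulative_dose_mg_per_kg(dose_mg_per_kg: float, n_doses: int) -> float:
    """Total exposure over a regimen of n equal doses."""
    if n_doses < 1:
        raise ValueError("n_doses must be >= 1")
    return dose_mg_per_kg * n_doses


def absolute_dose_mg(dose_mg_per_kg: float, body_weight_g: float) -> float:
    """Amount for one animal; body weight is given in grams."""
    return dose_mg_per_kg * body_weight_g / 1000.0


if __name__ == "__main__":
    # Esau et al. (2006): twice weekly for 4 weeks -> assumed 8 doses.
    for dose in (12.5, 50.0, 75.0):
        print(dose, "mg/kg x 8 =", cumulative_dose_mg_per_kg(dose, 2 * 4), "mg/kg")
    # Elmen et al. (2008): three doses of 10 mg/kg.
    print("LNA-antimiR total:", cumulative_dose_mg_per_kg(10.0, 3), "mg/kg")
    # Hypothetical 25 g mouse at 50 mg/kg, for illustration only.
    print(absolute_dose_mg(50.0, 25.0), "mg per injection")
```

Under these assumptions, the Esau regimen spans 100 to 600 mg/kg cumulatively (400 mg/kg at the 50 mg/kg dose judged effective and non-toxic), the same order of magnitude as the 80 to 240 mg/kg totals of Krützfeldt et al., whereas the primate LNA study used 30 mg/kg in total.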
In some embodiments, the methods described herein can include co-administration with other drugs or pharmaceuticals, e.g., compositions for providing cholesterol homeostasis. For example, the inhibitory nucleic acids can be co-administered with drugs for treating or reducing risk of a disorder described herein.
The invention is further described in the following examples, which do not limit the scope of the invention described in the claims.
Materials and Methods
The following materials and methods were used in the Examples set forth below.
Experimental Model and Subject Details
EL16.7 (129/Cas) female mouse embryonic stem cells were described previously (Lee and Lu, 1999). Cbx7+/+ and Cbx7−/− mouse embryonic stem cell lines were generated by Dr. Bo Cheng in the laboratory of Dr. T. Kerppola (University of Michigan) as described (Cheng et al., 2014), and were kindly provided by Dr. Xiaojun Ren (University of Colorado). All stem cell lines were routinely maintained in DME supplemented with 15% FCS and 500 U/ml LIF on feeder layers of gamma-irradiated mouse embryonic fibroblasts. For differentiation, 7×10⁵ cells were plated on pre-gelatinized 150 mm tissue culture plates and grown in monolayer for 7 days in DME + 15% FBS without LIF. HEK293 cells were routinely maintained in DME + 10% FBS.
Stable Transfection
The following plasmid vectors were used for stable transfection into EL16.7 ES cells:
The pCAGGS-mouse CBX7-Flag-HA-IRES-Puro-GFP plasmid was used for stable expression of HA-tagged CBX7 for ChIP-seq experiments. The pCAGGS-IRES-Puro-GFP plasmid was a kind gift from Dr. Mitinori Saitou (Kyoto University, Japan).
pEF1aBirAV5His plasmid was utilized for stable expression of V5-His-tagged BirA bacterial biotinylase in EL16.7 ES cells.
pEF1a-Flag-biotag-PGKpuro-mCBX7 and pEF1a-Flag-biotag-PGKpuro-mRYBP plasmids were employed for stable transfection of mouse CBX7 and RYBP carrying a biotinylation tag in EL16.7 cells expressing BirA biotinylase.
pCAG-Avi-GFP-hCBX7-IRES-Puro plasmid was employed for stable transfection of human CBX7 carrying a biotinylation tag in HEK293 cells expressing BirA biotinylase.
pEF1aBirAV5His and pEF1-Flag-Biotag plasmid vectors were a kind gift from Dr. Stuart Orkin (Harvard Medical School) and have been described previously (Kim et al., 2009).
pCAG-Avi-GFP-IRES-Puro plasmid was a kind gift from Dr. Mitinori Saitou, Department of Anatomy and Cell Biology, Graduate School of Medicine, Kyoto University.
To create mouse ES cells with stable expression of recombinant proteins, EL16.7 mouse ES cells were grown to 70% confluence on embryonic fibroblast feeder layers in T75 flasks. Cells were trypsinized, and 2×10⁷ cells were electroporated with 30 μg of linearized vector in PBS using a GenePulser II (Bio-Rad). Positive cells were selected using growth medium supplemented with 1 μg/ml puromycin (Gibco) alone or in combination with 300 μg/ml G418. To create HEK293 cells with stable expression of recombinant proteins, cells were grown to 70% confluence in T75 flasks. Cells were trypsinized, and 1×10⁷ cells were electroporated with 15 μg of linearized vector in PBS using a GenePulser II (Bio-Rad). Positive cells were selected using growth medium supplemented with 1 μg/ml puromycin (Gibco) alone or in combination with 300 μg/ml G418. Stable transfection and expression of recombinant proteins were confirmed by PCR genotyping and Western blotting with specific antibodies.
CLIP Method—Small Scale
The conventional CLIP method was performed as described previously (Jeon and Lee, 2011). Cells were grown to full confluence in 15 cm tissue culture dishes. The medium was then aspirated and cells were washed with 10 ml ice-cold phosphate-buffered saline (PBS) (8.1 mM Na2HPO4, 1.45 mM KH2PO4, 137 mM NaCl, 2.7 mM KCl, pH 7.4). To covalently cross-link protein-RNA complexes in vivo, ice-cold PBS (5 ml) was added to the cells, the lid was removed, and cells were exposed to 400 mJ/cm2 of UV irradiation at a wavelength of 254 nm. After adding 5 ml of ice-cold PBS, cross-linked cells were scraped and collected into 16 ml tubes. Cells were pelleted by centrifugation at 1,000×g for 5 min at 4° C. The supernatant was removed, and cell pellets were shock-frozen in liquid nitrogen and stored at −80° C. Protein G Dynabeads (Life Technologies) were used for pre-clearing and immunoprecipitation. Beads were thoroughly resuspended, and a volume of beads corresponding to 20 μl × number of samples + 5 μl was transferred into a clean 1.5 ml tube. Beads were then captured on a magnetic separator. Pre-clearing beads were washed 3 times with 1 ml lysis buffer (PBS supplemented with 1 mM MgCl2, 0.1 mM CaCl2, 0.5% Nonidet-P-40, and 0.5% sodium deoxycholate), resuspended in 100 μl lysis buffer per 20 μl beads, and dispensed in 100 μl portions into 1.5 ml tubes. Beads for immunoprecipitation were washed 3 times with 1 ml lysis buffer + 0.5% BSA, resuspended in 100 μl lysis buffer per 20 μl beads, and dispensed in 100 μl portions into 1.5 ml tubes. Then 400 μl lysis buffer + 0.5% BSA + 5 μg of specific antibody were added, and beads were incubated for 4 hrs at 4° C. on a rotatory wheel.
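The bead volumes above follow a simple rule: a fixed bead volume per sample plus a small excess, then resuspension at a fixed buffer-to-bead ratio. The hypothetical helper below computes both. It assumes the 5 μl excess is resuspended at the same ratio, which leaves a small remainder after the full portions are dispensed. The same function also fits the large-scale variant described later (80 μl beads per pellet, 150 μl buffer per 80 μl beads).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BeadPlan:
    bead_slurry_ul: float   # bead suspension taken from the stock tube
    resuspension_ul: float  # lysis buffer used to resuspend the washed beads
    portions: int           # full aliquots that can be dispensed


def plan_beads(n_samples: int, beads_per_sample_ul: float, excess_ul: float,
               buffer_per_portion_ul: float) -> BeadPlan:
    """Scale bead and buffer volumes for n samples.

    Each portion of buffer_per_portion_ul carries one sample's worth of beads
    (e.g. 100 ul of buffer per 20 ul of beads).
    """
    if n_samples < 1:
        raise ValueError("need at least one sample")
    slurry = beads_per_sample_ul * n_samples + excess_ul
    resuspension = slurry / beads_per_sample_ul * buffer_per_portion_ul
    portions = int(resuspension // buffer_per_portion_ul)
    return BeadPlan(slurry, resuspension, portions)


if __name__ == "__main__":
    # Small-scale CLIP: four samples per cell line (three RNase I doses + control).
    print(plan_beads(4, beads_per_sample_ul=20.0, excess_ul=5.0, buffer_per_portion_ul=100.0))
    # Large-scale dCLIP: two pellets per cell type.
    print(plan_beads(2, beads_per_sample_ul=80.0, excess_ul=5.0, buffer_per_portion_ul=150.0))
```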
To prepare cell lysate, cell pellets (1 pellet for each cell type) were resuspended in 1.25 ml of ice-cold lysis buffer supplemented with 1 Complete Mini EDTA-free protease inhibitor tablet (Roche), 40 U/ml Protector RNase Inhibitor (Roche), and 1 mM dithiothreitol (DTT), transferred into 2 ml tubes, and incubated for 25 min at 4° C. on a rotatory wheel. After a brief spin-down, 25 μl (50 U) of TURBO DNase (Life Technologies) was added to each tube. The entire content of each tube was then split equally among four 1.5 ml tubes. Two dilutions of RNase I (Life Technologies) in lysis buffer containing additives were prepared: 10-fold (10 U/ml) and 100-fold (1 U/ml). For each cell line, three samples were treated with different concentrations of RNase I: (1) undiluted RNase I (×1), (2) 10-fold diluted, and (3) 100-fold diluted. The volume of RNase I solution corresponded to 1/100th of the total sample volume, so the final dilutions of RNase I were 100-fold, 1,000-fold and 10,000-fold, respectively. In parallel, a fourth sample, untreated with RNase I, was prepared and used as an immunoprecipitation control for Western blotting. Samples were thoroughly mixed and incubated for 15 min in a 37° C. water bath with gentle mixing every 5 min. After a brief spin-down, each sample received 6 μl (12 U) of SUPERase•In (Life Technologies) diluted 10-fold in lysis buffer. The SDS concentration of each sample was then brought up to 0.1% by addition of 1/100th volume of 10% SDS. After centrifugation at 21,130×g for 10 min at 4° C., the supernatant was transferred into a clean 1.5 ml tube and centrifuged for another 10 min at 21,130×g and 4° C. to remove remaining cell debris. The 1.5 ml tubes containing 100 μl of pre-clearing beads were placed on a magnetic separator and the lysis buffer was removed. The entire supernatant from the previous step was added to the beads, and samples were incubated for 1 hr at 4° C. on a rotatory wheel.
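The RNase I titration above combines two dilution steps: a working dilution of the enzyme, then addition of that solution at 1/100th of the sample volume. The sketch below reproduces the stated final dilutions and estimates per-aliquot volumes. It assumes no volume is lost during lysis and that the four aliquots are exactly equal; the constant and function names are hypothetical.

```python
LYSIS_BUFFER_UL = 1250.0      # lysis buffer per cell pellet
TURBO_DNASE_UL = 25.0         # DNase added after lysis
N_ALIQUOTS = 4                # three RNase I doses + one untreated control
RNASE_INTO_SAMPLE_FOLD = 100  # RNase I solution added at 1/100th of sample volume


def aliquot_volume_ul() -> float:
    """Volume of each of the four equal lysate aliquots (no losses assumed)."""
    return (LYSIS_BUFFER_UL + TURBO_DNASE_UL) / N_ALIQUOTS


def rnase_volume_ul() -> float:
    """Volume of (diluted) RNase I solution added to one aliquot."""
    return aliquot_volume_ul() / RNASE_INTO_SAMPLE_FOLD


def final_fold_dilution(working_fold: int) -> int:
    """Overall dilution of the RNase I stock in the digestion reaction."""
    return working_fold * RNASE_INTO_SAMPLE_FOLD


if __name__ == "__main__":
    print(f"aliquot: {aliquot_volume_ul():.2f} ul, RNase I added: {rnase_volume_ul():.2f} ul")
    for working in (1, 10, 100):  # undiluted, 10-fold and 100-fold working solutions
        print(f"working dilution {working}x -> {final_fold_dilution(working):,}-fold final")
```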
After the pre-clearing beads were captured on the magnetic separator, pre-cleared lysate samples were transferred into 1.5 ml tubes containing the protein G-antibody complexes and incubated for 16 hrs at 4° C. on a rotatory wheel. Samples were placed on a magnetic separator and the supernatant was removed. Samples were washed twice with 1 ml high-salt buffer (PBS supplemented with 750 mM NaCl, 1% Nonidet-P-40, 0.5% sodium deoxycholate, and 0.1% SDS) and three times with 1 ml low-salt buffer (PBS supplemented with 150 mM NaCl, 1% Nonidet-P-40, 0.5% sodium deoxycholate, and 0.1% SDS), each wash for 5 min at 4° C. on a rotatory wheel, followed by supernatant removal on a magnetic separator. IP control samples received 40 μl of NuPAGE 3× LDS sample buffer (Life Technologies) or, alternatively, 40 μl of 1× SDS sample buffer (80 mM Tris pH 6.8, 2% SDS, 100 mM DTT, and 10% glycerol). In the remaining RNase-treated samples, beads were resuspended in 400 μl 1× DNase buffer (Life Technologies) and incubated for 5 min at 4° C. on a rotatory wheel, followed by supernatant removal on a magnetic separator. Beads were resuspended in 40 μl of DNase mix (1× DNase buffer, 0.1 U/μl TURBO DNase, 0.1 U/μl SUPERase•In (Life Technologies), 100-fold diluted EDTA-free protease inhibitor mix (Sigma), 0.4 U/μl Protector RNase Inhibitor) and incubated at 37° C. for 30 min. After a brief spin-down, beads were placed on a magnetic rack and the supernatant was removed. One wash with 0.5 ml low-salt wash buffer was performed for 5 min at 4° C. on a rotatory wheel, and the supernatant was removed on a magnetic separator. For phosphorylation of 5′ ends, beads were washed once in 1 ml of PNK buffer (50 mM Tris pH 7.4, 10 mM MgCl2, 5 mM DTT, 0.5% Nonidet-P-40) for 5 min at 4° C. on a rotatory wheel. Beads were then resuspended in 20 μl of PNK mix (per sample: 20 μl PNK buffer, 1 μl [γ-32P]ATP, 0.5 μl (5 U) T4 polynucleotide kinase (PNK) (NEB)) and incubated at 37° C. for 20 min. Beads were captured and the supernatant was removed on a magnetic separator. Beads were immediately washed 3 times with 0.5 ml of ice-cold PNK wash buffer (50 mM Tris pH 7.4, 150 mM NaCl, 0.5% Nonidet-P-40, 10 mM EDTA). After the last wash, beads were resuspended in 40 μl of NuPAGE 3× LDS sample buffer (Life Technologies) or, alternatively, in 40 μl of 1× SDS sample buffer (80 mM Tris pH 6.8, 2% SDS, 100 mM DTT, 10% glycerol). Samples in 3× LDS buffer were incubated at 80° C. for 10 min; samples in 1× SDS sample buffer were incubated at 95° C. for 5 min. Samples were then loaded on 1 mm NuPAGE 4%-12% Bis-Tris gradient gels (Life Technologies), and electrophoresis was performed at 200 V using NuPAGE MOPS/SDS running buffer (50 mM Tris base, 50 mM MOPS, 0.1% SDS, 1 mM EDTA, pH 7.7). Transfer onto nitrocellulose membranes was performed for 1 hr at 140 mA and 4° C. in NuPAGE 1× transfer buffer (25 mM Bicine, 25 mM Bis-Tris (free base), 1 mM EDTA, pH 7.2) supplemented with 12% methanol. The membrane was wrapped in Saran plastic wrap and exposed to a phosphorimager screen.
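When several samples are labelled at once, the per-sample PNK recipe above is usually scaled into a master mix. The sketch below assumes a 10% pipetting overage (a common convention, not stated in the protocol); the enzyme concentration of 10 U/μl is inferred from the text's "0.5 μl (5 U)", and the names are hypothetical.

```python
# Per-sample PNK mix for the small-scale CLIP protocol (volumes in ul).
PNK_MIX_PER_SAMPLE_UL = {
    "PNK buffer": 20.0,
    "[gamma-32P]ATP": 1.0,
    "T4 PNK": 0.5,  # 0.5 ul = 5 U in the text
}
T4_PNK_U_PER_UL = 5.0 / 0.5  # inferred enzyme concentration: 10 U/ul


def master_mix_ul(n_samples: int, overage: float = 0.10) -> dict[str, float]:
    """Scale the per-sample recipe to n_samples plus a fractional overage."""
    if n_samples < 1 or overage < 0:
        raise ValueError("n_samples must be >= 1 and overage >= 0")
    factor = n_samples * (1.0 + overage)
    return {name: vol * factor for name, vol in PNK_MIX_PER_SAMPLE_UL.items()}


if __name__ == "__main__":
    mix = master_mix_ul(3)  # three RNase-treated samples per cell line
    for name, vol in mix.items():
        print(f"{name}: {vol:.2f} ul")
    print("T4 PNK units in mix:", round(mix["T4 PNK"] * T4_PNK_U_PER_UL, 1))
```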
Denaturing CLIP Method—Small Scale
To perform the denaturing CLIP method, we generated several cell lines by transfection of DNA vectors followed by antibiotic selection. These lines express (1) the bacterial biotin ligase BirA from a vector carrying a neomycin-resistance marker and (2) mouse CBX7 fused to a biotinylation tag from a vector carrying a puromycin-resistance marker. Control cell lines were generated by transfecting an empty biotinylation vector instead. Cells were grown to full confluence in 15 cm tissue culture dishes. The medium was then aspirated and cells were washed with 10 ml ice-cold phosphate-buffered saline (PBS) (8.1 mM Na2HPO4, 1.45 mM KH2PO4, 137 mM NaCl, 2.7 mM KCl, pH 7.4). To covalently cross-link protein-RNA complexes in vivo, ice-cold PBS (5 ml) was added to the cells, the lid was removed, and cells were exposed to 400 mJ/cm2 of UV irradiation at a wavelength of 254 nm. Day 7 differentiated cells grown in monolayer, as well as HEK293 cells, were instead exposed to 150 mJ/cm2 at 254 nm. After adding 5 ml of ice-cold PBS, cross-linked cells were scraped and collected into 16 ml tubes. Cells were pelleted by centrifugation at 1,000×g for 5 min at 4° C. The supernatant was removed, and cell pellets were shock-frozen in liquid nitrogen and stored at −80° C. For protein pull-down, two types of magnetic beads were used, both from Life Technologies: (1) Protein G Dynabeads (for pre-clearing) and (2) Dynabeads® MyOne™ Streptavidin C1 (for pull-down of biotinylated protein). Beads were thoroughly resuspended, and a volume of beads corresponding to 20 μl × number of samples + 5 μl was transferred into a clean 1.5 ml tube. Beads were then captured on a magnetic separator. Pre-clearing beads were washed 3 times with 1 ml lysis buffer (PBS supplemented with 1 mM MgCl2, 0.1 mM CaCl2, 0.5% Nonidet-P-40, and 0.5% sodium deoxycholate). Streptavidin beads were washed 3 times with 1 ml lysis buffer containing 0.5% bovine serum albumin (BSA).
Beads were resuspended in 100 μl lysis buffer per 20 μl beads and dispensed in 100 μl portions into 1.5 ml tubes. Cell lysate was prepared in the following manner. Cell pellets (1 pellet for each cell type) were resuspended in 1.25 ml of ice-cold lysis buffer (supplemented with 1 Complete Mini EDTA-free protease inhibitor tablet (Roche), 40 U/ml Protector RNase Inhibitor (Roche), and 1 mM dithiothreitol (DTT)), transferred into 2 ml tubes, and incubated for 25 min at 4° C. on a rotatory wheel. After a brief spin-down, 25 μl (50 U) of TURBO DNase (Life Technologies) was added to each tube. The entire content of each tube was then split equally among four 1.5 ml tubes. Two dilutions of RNase I (Life Technologies) in lysis buffer containing additives were prepared: 10-fold (10 U/ml) and 100-fold (1 U/ml). For each cell line, three samples were treated with different concentrations of RNase I: (1) undiluted RNase I (×1), (2) 10-fold diluted, and (3) 100-fold diluted. The volume of RNase I solution corresponded to 1/100th of the total sample volume, so the final dilutions of RNase I were 100-fold, 1,000-fold and 10,000-fold, respectively. In parallel, a fourth sample, untreated with RNase I, was prepared and used as a pull-down control for Western blotting. Samples were thoroughly mixed and incubated for 15 min in a 37° C. water bath with gentle mixing every 5 min. After a brief spin-down, each sample received 6 μl (12 U) of SUPERase•In (Life Technologies) diluted 10-fold in lysis buffer. The SDS concentration of each sample was then brought up to 0.1% by addition of 1/100th volume of 10% SDS. After centrifugation at 21,130×g for 10 min at 4° C., the supernatant was transferred into a clean 1.5 ml tube and centrifuged for another 10 min at 21,130×g and 4° C. to remove remaining cell debris. The 1.5 ml tubes containing 100 μl of pre-clearing beads were placed on a magnetic separator and the lysis buffer was removed.
The entire supernatant from the previous step was added to the beads, and samples were incubated for 1 hr at 4° C. on a rotatory wheel. The 1.5 ml tubes containing streptavidin beads were placed on a magnetic separator and any excess lysis buffer was removed. After the pre-clearing beads were captured on the magnetic separator, pre-cleared lysate samples were transferred into the tubes containing streptavidin beads and incubated for 2 hrs at 4° C. on a rotatory wheel. Samples were placed on a magnetic separator and the supernatant was removed. Samples were washed twice with 0.5 ml wash buffer 1 (PBS containing 8 M urea and 0.1% SDS) for 5 min at room temperature on a rotatory wheel, with the supernatant removed on the magnetic separator after each wash. Samples were then washed twice with 0.5 ml Urea wash buffer (PBS + 8 M urea + 0.1% SDS) and twice with 0.5 ml SDS wash buffer (PBS + 2% SDS), each wash for 5 min at room temperature on a rotatory wheel, with the supernatant removed on the magnetic separator after each cycle. One wash was performed with 0.5 ml high-salt buffer (PBS supplemented with 750 mM NaCl, 1% Nonidet-P-40, 0.5% sodium deoxycholate, and 0.1% SDS) and one with 0.5 ml low-salt buffer (PBS supplemented with 150 mM NaCl, 1% Nonidet-P-40, 0.5% sodium deoxycholate, and 0.1% SDS), each for 5 min at 4° C. on a rotatory wheel, followed by supernatant removal on a magnetic separator. Pull-down control samples received 40 μl of NuPAGE 3× LDS sample buffer (Life Technologies) or, alternatively, 40 μl of 1× SDS sample buffer (80 mM Tris pH 6.8, 2% SDS, 100 mM DTT, and 10% glycerol). In the remaining RNase-treated samples, beads were resuspended in 400 μl 1× DNase buffer (Life Technologies) and incubated for 5 min at 4° C. on a rotatory wheel, followed by supernatant removal on a magnetic separator.
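The stringent urea and SDS washes are what distinguish the denaturing protocol from the native one, and they consume a noticeable amount of buffer. The hypothetical helper below totals the volume of each wash buffer needed for a given number of samples. It assumes every sample receives every listed wash (the untreated pull-down control actually skips the DNase buffer step), and it merges "wash buffer 1" with the Urea wash buffer because their stated compositions are identical.

```python
from collections import defaultdict

# Post-pull-down washes for the small-scale denaturing CLIP protocol, as
# (buffer, volume per wash in ml, number of washes).
UREA = "urea wash (PBS + 8 M urea + 0.1% SDS)"
WASH_STEPS = [
    (UREA, 0.5, 2),                       # "wash buffer 1"
    (UREA, 0.5, 2),                       # "Urea wash buffer", same composition
    ("SDS wash (PBS + 2% SDS)", 0.5, 2),
    ("high-salt buffer", 0.5, 1),
    ("low-salt buffer", 0.5, 1),
    ("1x DNase buffer", 0.4, 1),
]


def buffer_totals_ml(steps, n_samples: int) -> dict[str, float]:
    """Sum the volume of each buffer consumed by n_samples."""
    if n_samples < 1:
        raise ValueError("need at least one sample")
    totals: defaultdict[str, float] = defaultdict(float)
    for buffer, volume_ml, repeats in steps:
        totals[buffer] += volume_ml * repeats * n_samples
    return dict(totals)


if __name__ == "__main__":
    for buffer, total in buffer_totals_ml(WASH_STEPS, n_samples=3).items():
        print(f"{buffer}: {total:.2f} ml")
```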
Beads were resuspended in 40 μl of DNase mix (1× DNase buffer, 0.1 U/μl TURBO DNase, 0.1 U/μl SUPERase•In (Life Technologies), 100-fold diluted EDTA-free protease inhibitor mix (Sigma), 0.4 U/μl Protector RNase Inhibitor) and incubated at 37° C. for 30 min. After a brief spin-down, beads were placed on a magnetic rack and the supernatant was removed. One wash with 0.5 ml low-salt wash buffer was performed for 5 min at 4° C. on a rotatory wheel, and the supernatant was removed on a magnetic separator. For phosphorylation of 5′ ends, beads were washed once in 0.5 ml of PNK buffer (50 mM Tris pH 7.4, 10 mM MgCl2, 5 mM DTT, 0.5% Nonidet-P-40) for 5 min at 4° C. on a rotatory wheel. Beads were then resuspended in 20 μl of PNK mix (per sample: 20 μl PNK buffer, 1 μl [γ-32P]ATP, 0.5 μl (5 U) T4 polynucleotide kinase (PNK) (NEB)) and incubated at 37° C. for 20 min. Beads were captured and the supernatant was removed on a magnetic separator. Beads were immediately washed 3 times with 0.5 ml of ice-cold PNK wash buffer (50 mM Tris pH 7.4, 150 mM NaCl, 0.5% Nonidet-P-40, 10 mM EDTA). After the last wash, beads were resuspended in 40 μl of NuPAGE 3× LDS sample buffer (Life Technologies) or, alternatively, in 40 μl of 1× SDS sample buffer (80 mM Tris pH 6.8, 2% SDS, 100 mM DTT, 10% glycerol). Samples in 3× LDS buffer were incubated at 80° C. for 10 min; samples in 1× SDS sample buffer were incubated at 95° C. for 5 min. Samples were then loaded on 1 mm NuPAGE 4%-12% Bis-Tris gradient gels (Life Technologies), and electrophoresis was performed at 200 V using NuPAGE MOPS/SDS running buffer (50 mM Tris base, 50 mM MOPS, 0.1% SDS, 1 mM EDTA, pH 7.7). Transfer onto nitrocellulose membranes was performed for 1 hr at 140 mA and 4° C. in NuPAGE 1× transfer buffer (25 mM Bicine, 25 mM Bis-Tris (free base), 1 mM EDTA, pH 7.2) supplemented with 12% methanol. The membrane was wrapped in Saran plastic wrap and exposed to a phosphorimager screen.
Denaturing CLIP Method—Large Scale for Library Preparation.
For the large-scale denaturing CLIP method, UV treatment of cells was performed as described above for the small-scale dCLIP method. Two kinds of magnetic beads were used: Protein G Dynabeads (for pre-clearing) and Dynabeads® MyOne™ Streptavidin C1 (for pull-down of biotinylated protein). Beads were thoroughly resuspended, and a volume of beads corresponding to 80 μl × number of cell pellets + 5 μl was transferred into clean 2 ml tubes. Beads were captured on a magnetic separator. Pre-clearing beads were washed 3 times with 1 ml lysis buffer (PBS + 1 mM MgCl2 + 0.1 mM CaCl2 + 0.5% Nonidet-P-40 + 0.5% sodium deoxycholate). Streptavidin beads were washed 3 times with 1 ml lysis buffer + 0.5% bovine serum albumin (BSA). Beads were resuspended in 150 μl lysis buffer per 80 μl beads and dispensed in 150 μl portions into 2 ml tubes. Lysate was prepared as follows. Cell pellets (2 pellets for each cell type) were each resuspended in 1.25 ml of ice-cold lysis buffer supplemented with 1 Complete Mini EDTA-free protease inhibitor tablet (Roche) + 40 U/ml Protector RNase Inhibitor (Roche) + 1 mM dithiothreitol (DTT), transferred into 2 ml tubes, and incubated for 25 min at 4° C. on a rotatory wheel. After a brief spin-down, 25 μl (50 U) of TURBO DNase (Life Technologies) was added to every tube. The entire content of each tube was transferred to a new 2 ml tube in order to estimate the volume. Two dilutions of RNase I (Life Technologies) in lysis buffer + additives were prepared: 100-fold (1 U/ml) and 500-fold (0.2 U/ml). For each cell type, one sample received 100-fold and one sample received 500-fold diluted RNase I. The volume of RNase I solution corresponded to 1/100th of the total estimated sample volume. Samples were mixed well and incubated for 15 min in a 37° C. water bath, with mixing up and down every 5 min. After a brief spin-down, each sample received 24 μl (48 U) of SUPERase•In (Life Technologies) diluted 10-fold in lysis buffer.
In addition, the sodium dodecyl sulfate (SDS) concentration in each sample was brought up to 0.1% by addition of 1/100th volume of 10% SDS. After centrifugation at 21,130×g for 10 min at 4° C., the supernatant was transferred into clean 2 ml tubes and centrifuged for another 10 min at 21,130×g and 4° C. to remove remaining cell debris. The 2 ml tubes with 150 μl of pre-clearing beads were placed on a magnetic separator and the lysis buffer was removed. The entire supernatant from the previous step was added to the beads, and samples were incubated for 1 hr at 4° C. on a rotatory wheel. The 2 ml tubes with streptavidin beads were placed on a magnetic separator and excess lysis buffer was removed. After the pre-clearing beads were captured on the magnetic separator, pre-cleared lysate was transferred into the 2 ml tubes with streptavidin beads and incubated for 2 hrs at 4° C. on a rotatory wheel. Samples were placed on a magnetic separator and the supernatant was removed. Samples were washed 2 times with 1.2 ml Urea wash buffer (PBS + 8 M urea + 0.1% SDS) for 5 min at room temperature on a rotatory wheel, with the supernatant removed on the magnetic separator each time. Samples were then washed 2 times with 1.2 ml SDS wash buffer (PBS + 2% SDS) for 5 min at room temperature on a rotatory wheel, again removing the supernatant on the magnetic separator each time. One wash was performed with 1.2 ml high-salt buffer (PBS + 750 mM NaCl + 1% Nonidet-P-40 + 0.5% sodium deoxycholate + 0.1% SDS) and one wash with low-salt buffer (PBS + 150 mM NaCl + 1% Nonidet-P-40 + 0.5% sodium deoxycholate + 0.1% SDS), each for 5 min at 4° C. on a rotatory wheel, with subsequent supernatant removal on a magnetic separator. Beads were resuspended in 800 μl 1× DNase buffer, transferred into 1.5 ml tubes, and incubated for 5 min at 4° C. on a rotatory wheel, with subsequent supernatant removal on a magnetic separator. Beads were resuspended in 160 μl of DNase mix (1× DNase buffer, 0.1 U/μl TURBO DNase, 0.1 U/μl SUPERase•In (Life Technologies), 100-fold diluted EDTA-free protease inhibitor mix (Sigma), 0.4 U/μl Protector RNase Inhibitor) and incubated at 37° C. on a rotatory wheel for 30 min.
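Two pieces of arithmetic underlie the large-scale digestion and lysis steps above: the overall RNase I dilution (working dilution multiplied by the 1/100th-volume addition) and the SDS concentration reached by adding 1/100th volume of 10% SDS. The sketch below computes both; the function names are hypothetical. If "1/100th volume" is read as 1/100 of the starting sample volume, the final SDS concentration is about 0.099%, which the protocol rounds to 0.1%.

```python
RNASE_INTO_SAMPLE_FOLD = 100  # RNase I solution added at 1/100th of sample volume


def final_rnase_fold(working_fold: int) -> int:
    """Overall fold dilution of the RNase I stock in the digestion."""
    return working_fold * RNASE_INTO_SAMPLE_FOLD


def final_percent(stock_percent: float, sample_ul: float, added_ul: float) -> float:
    """Concentration after adding added_ul of stock to sample_ul lacking that solute."""
    return stock_percent * added_ul / (sample_ul + added_ul)


if __name__ == "__main__":
    for working in (100, 500):  # the two large-scale working dilutions
        print(f"{working}-fold working -> {final_rnase_fold(working):,}-fold final")
    # 1/100th of the sample volume of 10% SDS; the result does not depend on volume.
    print(f"{final_percent(10.0, 100.0, 1.0):.4f}% SDS")
```

Under this reading, the large-scale digestions correspond to 10,000-fold and 50,000-fold final RNase I dilutions, so the milder large-scale condition goes beyond the gentlest point of the small-scale titration.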
After a brief spin-down, beads were placed on a magnetic rack and the supernatant was removed. One wash with 1 ml low-salt wash buffer was performed for 5 min at 4° C. on a rotatory wheel, and the supernatant was removed on the magnetic separator. For dephosphorylation of 3′ ends, beads were washed once in 1 ml of Low_pH_PNK buffer (70 mM Tris pH 6.5, 10 mM MgCl2, 5 mM DTT) for 5 min at 4° C. on a rotatory wheel. Low-pH-PNK mix (per sample: 80 μl Low_pH_PNK buffer, 2 μl (20 u) T4 polynucleotide kinase (T4 PNK) (NEB), 2 μl (80 u) Protector RNAse inhibitor) was prepared. Beads were resuspended in 80 μl of Low-pH-PNK mix and incubated at 37° C. for 20 min on a Thermomixer, vortexing at 1,000 rpm for 15 sec every 2 min. For subsequent phosphorylation of 5′ ends, the supernatant was removed on the magnetic separator and beads were washed once in 1 ml of PNK buffer (50 mM Tris pH 7.4, 10 mM MgCl2, 5 mM DTT, 0.5% Nonidet-P-40) for 5 min at 4° C. on a rotatory wheel. Beads were resuspended in 80 μl of PNK mix (per sample: 80 μl PNK buffer, 4 μl 32P-gamma-ATP, 3 μl (30 u) T4 PNK, 2 μl (80 u) Protector RNAse inhibitor) and incubated at 37° C. for 10 min. After 8 μl of 10 mM ATP was added, samples were incubated for an additional 20 min at 37° C. Beads were captured and the supernatant was removed on the magnetic separator. Beads were immediately washed 3 times with 1 ml of ice-cold PNK wash buffer (50 mM Tris pH 7.4, 150 mM NaCl, 0.5% Nonidet-P-40, 10 mM EDTA). After the last wash, beads were resuspended in 85 μl of 1× SDS sample buffer (80 mM Tris pH 6.8, 2% SDS, 100 mM DTT, 10% glycerol) and incubated at 95° C. for 5 min. Samples were loaded on 1.5 mm NuPage 4%-12% Bis-Tris gradient gels (Life Technologies) at 40 μl per lane, and electrophoresis was performed at 200 V using NuPage MOPS/SDS running buffer (50 mM Tris base, 50 mM MOPS, 0.1% SDS, 1 mM EDTA, pH 7.7). Transfer onto nitrocellulose membranes was performed for 1.5 hrs at 140 mA and 4° C. in NuPage 1× transfer buffer (25 mM Bicine, 25 mM Bis-Tris (free base), 1 mM EDTA, pH 7.2)+12% methanol. 
The membrane was wrapped in Saran plastic wrap and briefly exposed to a phosphorimager screen.
Samples subjected to the bead elution protocol were treated with PNK mix supplemented with 8 μl 10 mM ATP and incubated at 37° C. for 30 min. Samples were washed twice with 1 ml urea wash buffer (PBS+8M urea+0.1% SDS) and once with Proteinase K buffer (100 mM Tris pH 8.0, 200 mM NaCl, 5 mM EDTA, 0.1% SDS) for 5 min at room temperature on a rotatory wheel. Beads were resuspended in 200 μl Proteinase K mix (100 mM Tris pH 8.0, 200 mM NaCl, 5 mM EDTA, 0.1% SDS, 1 mg/ml Proteinase K (PCR grade, Roche, 20 mg/ml)) and incubated for 30 min at 55° C. on a rotatory wheel. After a brief spin-down and bead capture on the magnetic separator, eluted RNA from 2 combined samples (400 μl total) was transferred into phase-lock gel 2 ml tubes (5 Prime) (pre-centrifuged 30 sec at 16,000×G to pellet the gel), and 400 μl acidic phenol-chloroform (Life Technologies) was added. After vigorous up-and-down shaking to mix the phases, samples were centrifuged for 5 min at 16,000×G at room temperature. Another 400 μl of acidic phenol-chloroform was added to the upper aqueous phase, followed by vigorous up-and-down shaking to mix the phases and centrifugation for 5 min at 16,000×G at room temperature. The upper phase was transferred into clean 1.5 ml tubes containing 40 μl 3M sodium acetate. After addition of 1 μl Glycoblue (Life Technologies) and 1 ml 100% ethanol, samples were mixed by up-and-down shaking and incubated for at least 16 hrs at −20° C.
Elution of Protein-Bound RNAs from Membrane
For membrane elution, PK solution (100 mM Tris pH 7.4, 50 mM NaCl, 10 mM EDTA, 4 mg/ml Proteinase K (Roche)) was prepared and pre-incubated for 10-20 min at room temperature to eliminate possible RNAse contamination. All solutions were filtered through a 0.22 μm membrane filter before Proteinase K was added. Membrane pieces were excised with a sterile scalpel, starting from the size of the protein of interest+10 kDa (corresponding to approximately 30 bases of RNA covalently linked to the protein of interest) up to the end of the visible radioactive signal specific to the protein of interest. The goal was to avoid purifying CBX7 crosslinked to RNAs shorter than 30 bases, as shorter RNAs would be more difficult to sequence and to align to the genome with a high confidence level. Membrane pieces were further cut into smaller pieces and placed into low-binding 1.5 ml tubes. After addition of 200 μl PK buffer, membrane pieces were incubated for 20 min at 55° C. in a Thermomixer with constant vortexing at 1,200 rpm. Meanwhile, PK-urea solution was prepared (100 mM Tris pH 7.4, 50 mM NaCl, 10 mM EDTA, 7M urea). All solutions except urea were filtered through a 0.22 μm membrane filter before preparation. Then 200 μl of PK-urea solution was added to the membrane pieces, and samples were incubated for 20 min at 55° C. in a Thermomixer with constant vortexing at 1,200 rpm. After a brief spin-down, the eluates were transferred into phase-lock gel 2 ml tubes (5 Prime) (pre-centrifuged 30 sec at 16,000×G to pellet the gel), and 400 μl acidic phenol-chloroform (Life Technologies) was added. After vigorous up-and-down shaking to mix the phases, samples were centrifuged for 5 min at 16,000×G at room temperature. 
Another 400 μl of acidic phenol-chloroform was added to the upper aqueous phase, followed by vigorous up-and-down shaking to mix the phases and centrifugation for 5 min at 16,000×G at room temperature. The upper phase was transferred into clean 1.5 ml tubes containing 40 μl 3M sodium acetate. After addition of 1 μl Glycoblue (Life Technologies) and 1 ml 100% ethanol, samples were mixed by up-and-down shaking and incubated for at least 16 hrs at −20° C.
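The 10 kDa offset used when excising membrane pieces follows from the average mass of a ribonucleotide residue. The Python sketch below assumes roughly 330 Da per nucleotide (an approximation; the exact value depends on base composition), so that 30 nt of RNA adds about 9.9 kDa. The 40 kDa protein in the example is hypothetical, and migration of crosslinked protein-RNA complexes on the gel is only approximately additive.

```python
AVG_NT_DA = 330.0  # assumed average mass per ribonucleotide residue, in Da


def rna_mass_kda(n_nt: int) -> float:
    """Approximate mass (kDa) of an RNA fragment of n_nt nucleotides."""
    return n_nt * AVG_NT_DA / 1000.0


def nt_per_kda_shift(kda: float) -> float:
    """Approximate number of nucleotides corresponding to a mass shift in kDa."""
    return kda * 1000.0 / AVG_NT_DA


def excision_lower_edge_kda(protein_kda: float, min_rna_nt: int = 30) -> float:
    """Lower edge of the region to excise: protein size plus the shortest RNA wanted."""
    return protein_kda + rna_mass_kda(min_rna_nt)


if __name__ == "__main__":
    print(rna_mass_kda(30))               # about 9.9 kDa
    print(round(nt_per_kda_shift(10.0)))  # about 30 nt
    print(excision_lower_edge_kda(40.0))  # hypothetical 40 kDa protein, about 49.9 kDa
```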
Library Preparation from Membrane-Eluted and Beads-Eluted Samples
Membrane-eluted or bead-eluted samples were centrifuged for 30 min at 13,523×G and 4° C., and the supernatant was removed. Pellets were washed once with 1 ml 75% ethanol in DEPC-treated water, followed by centrifugation for 10 min at 13,523×G and 4° C. After supernatant removal and a short spin-down, the remaining ethanol solution was carefully removed and pellets were air-dried for 5-10 min with the cap open at room temperature inside a PCR workstation under constant airflow. Pellets were dissolved in 25 μl DNAse mix, and the 2 samples that belonged to the same cell type but had been treated with different RNAse concentrations were combined into one sample (DNAse mix per combined sample: 43 μl nuclease-free DDW, 5 μl 10× DNAse buffer, 1 μl (40 u) Protector RNAse inhibitor, 1 μl (2 u) Turbo-DNAse (Life Technologies)). Samples were incubated for 30 min at 37° C. RNA was extracted using 950 μl Trizol reagent (Life Technologies) according to the manufacturer's instructions, with 0.5 μl Glycoblue used for precipitation. RNA pellets were dissolved in 8 μl nuclease-free DDW. Samples were incubated for 2 min at 70° C. to reduce secondary structure and then kept on ice. The Multiplex Compatible NEBNext Small RNA Library Prep Set for Illumina (NEB) was used for library preparation according to the manufacturer's instructions, with the following modifications. 7 μl of eluted RNA was used for library preparation. All adapters and primers used throughout the procedure were diluted 12-fold. SuperScript III Reverse Transcriptase (Life Technologies) and Protector RNAse inhibitor (Roche) replaced M-MuLV reverse transcriptase and Murine RNAse inhibitor, respectively. 25 PCR amplification cycles were performed on the resulting cDNA using multiplexed primers with Illumina barcodes, with a distinct barcode for every cell type. Amplification was performed with LongAmp™ Taq 2× Master Mix (NEB). Amplified PCR products were subjected to PAGE on a 6% TBE-acrylamide gel. 
The area between 160 bp and 520 bp was excised, gel pieces were crushed into a slurry with a 1 ml syringe plunger, and PCR products were eluted by overnight incubation at room temperature in 400 μl Gel elution buffer (NEB) in 1.5 ml low-binding tubes. One glass filter (Whatman, 1823010) was placed into a Costar SpinX column (Corning, 8161). The suspension from the previous step was placed on the column and centrifuged at 15,871×G at room temperature for 1 min. Eluates were ethanol precipitated following addition of 40 μl 3M sodium acetate, 1 μl Glycoblue and 1 ml 100% ethanol. After incubation for at least 30 min at −20° C., samples were centrifuged for 30 min at 13,523×G and 4° C., and the supernatant was removed. Pellets were washed once with 1 ml 70% ethanol, followed by centrifugation for 10 min at 13,523×G and 4° C. After supernatant removal and a short spin-down, the remaining ethanol solution was carefully removed and pellets were air-dried for 5-10 min with the cap open at room temperature inside a PCR workstation under constant airflow. Pellets were dissolved in 12 μl nuclease-free water. The size distribution of PCR products was determined by a Bioanalyzer run with 1 μl of each sample loaded on a High Sensitivity DNA chip (Agilent). PCR products were quantified with the Illumina Library Quantification kit (Kapa Biosystems). Equivalent amounts of multiplexed samples were pooled into the final library, at 1.5-2 nM per multiplexed sample. A total of 3 to 5 samples were pooled into one library.
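Equimolar pooling at 1.5-2 nM per multiplexed sample comes down to a volume calculation per library. The Python sketch below is illustrative only: the library names, the 20 μl final pool volume and the concentrations are hypothetical. It also includes the standard mass-to-molar conversion (about 660 g/mol per dsDNA base pair) for cases where a library concentration is available only in ng/μl, for example from the Bioanalyzer run.

```python
BP_MASS_G_PER_MOL = 660.0  # average mass of one dsDNA base pair (standard approximation)


def ng_per_ul_to_nM(conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    """Convert a dsDNA library concentration from ng/ul to nM."""
    # 1 ng/ul = 1e-3 g/l; divide by molar mass (g/mol), then mol/l -> nmol/l (x1e9)
    return conc_ng_per_ul * 1e6 / (BP_MASS_G_PER_MOL * mean_fragment_bp)


def pooling_volumes(library_nM: dict, per_sample_nM: float = 2.0, final_ul: float = 20.0):
    """Volume of each library giving `per_sample_nM` of that library in the pool,
    plus the buffer needed to bring the pool to `final_ul`."""
    volumes = {name: per_sample_nM * final_ul / conc for name, conc in library_nM.items()}
    buffer_ul = final_ul - sum(volumes.values())
    if buffer_ul < 0:
        raise ValueError("libraries too dilute for this target; lower per_sample_nM or raise final_ul")
    return volumes, buffer_ul


if __name__ == "__main__":
    vols, buf = pooling_volumes({"A": 10.0, "B": 5.0, "C": 8.0})  # hypothetical libraries, nM
    print(vols, buf)  # A: 4 ul, B: 8 ul, C: 5 ul, buffer: 3 ul
```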
Gene Expression
RNA was extracted from cells with Trizol reagent (Life Technologies) according to the manufacturer's instructions. cDNA was synthesized using SuperScript III reverse transcriptase (Life Technologies), and qPCR was performed with primers spanning exon-exon junctions. For studies involving intronic primers (
Native RNA Immunoprecipitation (Native-RIP)
EL16.7 cells were grown in a T75 flask to ~80% confluency. Cells were trypsinized and, after fresh growth medium was added, counted and pelleted by centrifugation for 5 min at 200×G. Cell pellets were resuspended in PBS and divided into aliquots of 1×10⁷ cells. After another round of centrifugation, the supernatant was removed and cells were shock-frozen in LN2. After thawing, cells from a single cell pellet were incubated in 1 ml of ice-cold hypotonic buffer A (10 mM HEPES pH 7.9, 1.5 mM MgCl2, 10 mM KCl)+1 mM AEBSF. Cells were incubated on ice for 20 min, and nuclei were pelleted by centrifugation for 15 min at 2,500×G and 4° C. The supernatant was removed and the pellet was resuspended in 1 ml of Polysomal lysis buffer (10 mM HEPES pH 7.0, 100 mM KCl, 5 mM MgCl2, 0.5% NP-40)+1 mM DTT+EDTA-free PI cocktail 1:100+100 u/ml RNAseIN (Promega). After 20 μl (40 u) of TurboDNAse (Life Technologies) was added, cell nuclei were incubated for 30 min at 4° C. on an orbital shaker. After centrifugation for 10 min at 16,000×G and 4° C., the supernatant was transferred into a 16 ml tube with 9 ml NT2 buffer (50 mM Tris pH 7.4, 150 mM NaCl, 1 mM MgCl2, 0.05% NP-40)+10 μl 1M DTT+10 μl RNAseIN (Promega)+1 tablet of Complete-mini EDTA-free protease inhibitors mix. At the same time, Protein G Dynabeads were prepared: 20 μl per sample for pre-clearing×number of samples+20 μl per sample for immunoprecipitation×number of samples. Beads were thoroughly resuspended, captured on a magnetic separator, and washed 3×1 ml NT2 buffer. After final resuspension at 20 μl beads per 100 μl NT2, beads intended for immunoprecipitation were incubated with 5 μg of specific antibody (anti-CBX7 (P-15), Santa Cruz Biotechnologies) or rabbit IgG control (Abcam) in a total volume of 500 μl NT2 buffer. To prepare pre-cleared lysates, 1 ml aliquots of lysate were transferred into 1.5 ml tubes with 100 μl bead suspension and incubated for 1 hr at 4° C. on a rotatory shaker. 
Input samples were prepared by combining 100 μl aliquots of lysate with 900 μl Trizol reagent. After the beads were captured on the magnetic separator, pre-cleared lysates were transferred into 1.5 ml tubes containing the bead-antibody complexes, from which the unbound antibody fraction had been removed on the magnetic separator. After incubation for 3 hrs at 4° C. on a rotatory shaker, the supernatant was removed and beads were washed 5×1 ml NT2 buffer. After the last wash solution was removed on the magnetic separator, beads were resuspended in 1 ml Trizol reagent. RNA was extracted according to the manufacturer's protocol and eluted in 20 μl nuclease-free water, and 2 μl of eluted RNA was reverse transcribed using SuperScript III (Life Technologies) according to the manufacturer's instructions. qPCR assays were performed on a CFX96 real-time PCR system (Bio-Rad). Specific primers are listed in Table a. Threshold cycle values were converted into initial template amounts for each sample based on a standard curve prepared from known quantities of EL16.7 cDNA. Enrichment of specific RNA species was expressed as a percentage of total input RNA for each reaction.
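The conversion from threshold cycles to template amounts and then to percent input can be sketched as follows. This Python sketch is an illustration, not the analysis software used: it fits the usual log-linear standard curve (Ct against log10 of the known quantity), inverts it for unknown samples, and scales the IP signal by the input fraction. The input fraction of 0.1 assumes that the 100 μl input aliquot is compared with a 1 ml IP aliquot and that input and IP RNA were processed identically afterwards; the dilution series is hypothetical.

```python
import math
from statistics import mean


def fit_standard_curve(quantities, cts):
    """Least-squares line Ct = slope * log10(quantity) + intercept."""
    xs = [math.log10(q) for q in quantities]
    mx, my = mean(xs), mean(cts)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, cts))
    slope = sxy / sxx
    return slope, my - slope * mx


def amplification_efficiency(slope):
    """Efficiency implied by the slope (1.0 means perfect doubling per cycle)."""
    return 10 ** (-1.0 / slope) - 1.0


def ct_to_quantity(ct, slope, intercept):
    """Invert the standard curve to obtain the initial template amount."""
    return 10 ** ((ct - intercept) / slope)


def percent_input(ip_qty, input_qty, input_fraction):
    """IP signal as a percentage of total input; input_fraction is the share of one
    IP-equivalent of lysate represented by the input sample (assumed 0.1 here)."""
    return 100.0 * ip_qty / (input_qty / input_fraction)


if __name__ == "__main__":
    # Hypothetical 10-fold dilution series of cDNA
    slope, intercept = fit_standard_curve([1, 10, 100, 1000], [30.0, 26.68, 23.36, 20.04])
    ip = ct_to_quantity(27.5, slope, intercept)
    inp = ct_to_quantity(22.0, slope, intercept)
    print(amplification_efficiency(slope), percent_input(ip, inp, 0.1))
```

The same arithmetic applies to the ChIP and FAIRE qPCR readouts below, with genomic DNA standards in place of cDNA.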
Chromatin Immunoprecipitation
Before the experiment, cells were grown on 15 cm feeder plates to 80-90% confluence. Medium was removed and cells were washed once with 20 ml PBS. After a 10 min incubation at 37° C. in 3 ml trypsin-EDTA (Gibco), cells were passed twice through a 200 μl tip in 9 ml growth medium using a 13 ml pipette, transferred into 50 ml tubes with 18 ml growth medium, and counted. Cells were centrifuged for 5 min at 200×G at room temperature. The supernatant was removed, and cells were resuspended in 40 ml fresh growth medium and split into 2×15 cm tissue culture dishes, 20 ml per dish. Cells were incubated for 45 min at 37° C. to remove feeders. Floating cells were collected into a 50 ml tube and counted. Then, 1/10th volume of cross-linking solution (50 mM HEPES-KOH pH 7.5, 100 mM NaCl, 1 mM EDTA, 0.5 mM EGTA, 11% formaldehyde) was added and cells were incubated for 20 min at room temperature on a rotatory shaker, followed by quenching with 1/20th volume of 2.5M glycine solution. After centrifugation for 5 min at 700×G and 4° C., cells were washed twice with 30 ml of ice-cold PBS and the pellet was resuspended in ice-cold PBS at a ratio of 3 ml PBS per 5×10⁶ cells. Cells were divided into 3 ml portions in 16 ml tubes and centrifuged for 5 min at 700×G and 4° C. The supernatant was aspirated, and pellets were shock-frozen in liquid nitrogen and stored at −80° C. On the day of immunoprecipitation, cell pellets were thawed at 4° C. and resuspended in 1 ml Buffer#1 (50 mM HEPES-KOH pH 7.5, 140 mM NaCl, 1 mM EDTA, 10% glycerol, 0.5% NP-40, 0.25% Triton-X-100) supplemented with protease inhibitors mix (Sigma). After a 10 min incubation at 4° C. on a rotatory shaker, cells were spun for 5 min at 1,400×G and 4° C. The supernatant was aspirated and pellets were resuspended in 1 ml Buffer#2 (10 mM Tris pH 8.0, 200 mM NaCl, 1 mM EDTA, 0.5 mM EGTA) supplemented with protease inhibitors mix (Sigma). After a 10 min incubation at 4° C. on a rotatory shaker, cells were spun for 5 min at 1,400×G and 4° C. 
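Several steps above add a fraction of a volume of concentrated stock, so the working concentration is stock×f/(1+f). The Python sketch below (hypothetical helper names) shows that adding 1/10th volume of 11% formaldehyde gives about 1% formaldehyde. The glycine figure assumes the 1/20th volume refers to the volume present at that point, which the text does not state explicitly. The cell count in the aliquot example is also hypothetical.

```python
def final_concentration(stock: float, added_volumes: float) -> float:
    """Final concentration after adding `added_volumes` volumes of stock to one volume
    of a sample that does not already contain the component."""
    return stock * added_volumes / (1.0 + added_volumes)


def n_cell_aliquots(total_cells: float, cells_per_aliquot: float = 5e6) -> int:
    """Number of full aliquots at 5x10^6 cells per 3 ml PBS portion."""
    return int(total_cells // cells_per_aliquot)


if __name__ == "__main__":
    print(final_concentration(11.0, 1 / 10))   # formaldehyde, % (about 1.0)
    print(final_concentration(2.5, 1 / 20))    # glycine, M (about 0.119, assumption above)
    print(final_concentration(10.0, 1 / 100))  # SDS, % (about 0.099)
    print(n_cell_aliquots(3.2e7))              # 6 aliquots for a hypothetical 3.2x10^7 cells
```

The same relation explains the SDS step in the dCLIP lysate, where 1/100th volume of 10% SDS gives approximately 0.1% SDS.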
The supernatant was aspirated and pellets were resuspended in 1.2 ml of Buffer#3 (10 mM Tris pH 8.0, 1 mM EDTA, 0.5 mM EGTA) supplemented with protease inhibitors mix (Sigma). After a 10 min incubation at 4° C. on a rotatory shaker, 70 μl 10% N-lauroyl-sarcosine was added, and the nuclear suspension was transferred into screw-cap 15 mm×19 mm Covaris tubes and sonicated in a Covaris E220 system under the following conditions: Duty 10%, Peak Incident Power 175, Cycles per burst 200, Duration 2400 sec (40 min). After sonication, samples were transferred into 1.5 ml tubes and centrifuged for 10 min at 14,000×G and 4° C. The supernatant was transferred into 1.5 ml tubes, 20 μg of RNAse A (Roche) was added, and samples were incubated for 30 min at 37° C. After incubation, 55 μl aliquots were taken from each sample as input controls, and the rest was divided into 2×550 μl portions in 1.5 ml tubes. Input samples were stored at −20° C. Then 275 μl of freshly prepared solution (3% Triton-X-100, 0.3% Na deoxycholate, 3 mM EDTA)+protease inhibitors mix was added to the 550 μl samples along with the specific antibody or matched isotype control, and samples were incubated for 16 hrs at 4° C. on a rotatory shaker. For recombinant CBX7-Flag-HA, 5 μl of rabbit polyclonal anti-hemagglutinin tag antibody (H6908, Sigma) was used per reaction. For endogenous CBX7, 5 μg of rabbit polyclonal anti-CBX7 antibody (ab21873, Abcam) was used per reaction. 5 μg of Ubiquityl-Histone H2A (Lys119) (D27C4) #8240 antibody (Cell Signaling) was used for pull-down of ubiquitylated histone H2A. Meanwhile, magnetic protein G Dynabeads (Life Technologies), 40 μl per reaction, were washed twice with Buffer#1 using a magnetic stand and blocked for 1 hr at 4° C. with 250 μg/ml salmon sperm DNA (Life Technologies). After two washes with Buffer#1, beads were resuspended in Buffer#1 at a ratio of 40 μl beads per 100 μl buffer and divided into 1.5 ml tubes. After removal of the buffer, immunoprecipitated samples were transferred to the 1.5 ml tubes with protein G Dynabeads for an additional 2-3 hrs of incubation at 4° C. on a rotatory shaker. Then, the supernatant was removed and beads were washed 3×0.5 ml RIPA-1 buffer (50 mM HEPES-KOH pH 7.5, 0.5 M LiCl, 0.7% Na deoxycholate, 1% NP-40, 10 mM EDTA) and 3×0.5 ml RIPA-2 buffer (50 mM HEPES-KOH pH 7.5, 0.25 M LiCl, 0.7% Na deoxycholate, 1% NP-40, 10 mM EDTA). Beads were resuspended manually at each step. After one wash with 0.5 ml TEN buffer (10 mM Tris pH 8.0, 50 mM NaCl, 1 mM EDTA), beads were resuspended in 0.2 ml TES buffer (50 mM Tris pH 8.0, 10 mM EDTA, 1% SDS) and incubated for 15 min at 65° C. with occasional vortexing, followed by a spin-down. 145 μl of TES buffer was added to the input samples. 40 μg of Proteinase K (Roche) was then added to all samples, and samples were incubated for 16 hrs at 65° C. Then, after addition of 0.2 ml of TE buffer, the entire volume was transferred to Phase-Lock Gel Heavy 2 ml tubes (5 Prime GmbH) and extracted with 0.4 ml of phenol:chloroform:isoamyl alcohol (25:24:1, USB) according to the manufacturer's protocol. The aqueous phase was collected and ethanol precipitated by adding 40 μl of 3M sodium acetate, 25 μg GlycoBlue reagent (Life Technologies) and 2.5 volumes of 100% ethanol. DNA was eluted with 50 μl TE buffer pH 8.0. qPCR assays were performed on a CFX96 real-time PCR system (Bio-Rad). Specific primers are listed in Table a. Threshold cycle values were converted into initial template amounts for each sample based on a standard curve prepared from known quantities of genomic DNA. Enrichment of specific PCR amplicons was expressed as a percentage of total input DNA for each reaction. ChIP-seq libraries were constructed using the NEBNext ChIP-Seq Library Prep Master Mix Set (NEB). Libraries were sequenced on an Illumina HiSeq 2000 according to the manufacturer's instructions. Approximately 40 million paired-end 50 bp reads were generated for every ChIP-seq sample.
LNA Nucleofection
LNA mixmers (Exiqon) were designed specifically against the CBX7 binding regions of selected genes (see Table a for the list of LNA oligomers). A total of 2×10⁶ EL16.7 cells, after feeder removal, were resuspended in 100 μl of ES cell nucleofector solution (Lonza). LNA oligos were added to a final concentration of 2 μM. The cells were transfected using program A-013 on an Amaxa Nucleofector II. 0.5 ml of culture medium was added, and the cell suspension was divided equally between two wells of a gelatinized 6-well dish. For RT-qPCR, cells were harvested in 1 ml Trizol reagent 24 hrs after nucleofection, and RNA was extracted according to the manufacturer's instructions. For Western blotting, cells were scraped into 300 μl of SDS sample buffer (50 mM Tris pH 6.8, 100 mM DTT, 2% SDS, 0.1% bromophenol blue, 10% glycerol), and the resulting extracts were boiled at 95° C. for 5 min. For ChIP and Formaldehyde-assisted Isolation of Regulatory Elements (FAIRE) assays, three nucleofection reactions were pooled into one gelatinized 10 cm dish and cells were harvested for cross-linking according to the ChIP or FAIRE protocol.
Formaldehyde-Assisted Isolation of Regulatory Elements (FAIRE) Analysis
FAIRE analysis of nucleofected cells was performed as described (Simon et al., 2012) with the following modifications. 24 hours after nucleofection, cells growing on gelatin-coated 10 cm tissue culture dishes were trypsinized with 1 ml trypsin-EDTA solution. After most of the cells had detached from the surface, 9 ml growth medium was added to the plate and cells were passed 2 times through a 200 μl pipette tip. Then, 1/10th volume of cross-linking solution (50 mM HEPES-KOH pH 7.5, 100 mM NaCl, 1 mM EDTA, 0.5 mM EGTA, 11% formaldehyde) was added and cells were incubated for 5 min at room temperature with constant rotation, followed by 5 min quenching with 1/20th volume of 2.5M glycine solution. After centrifugation for 5 min at 700×G and 4° C., cells were washed 3 times with 10 ml of ice-cold PBS. The supernatant was aspirated, and pellets were shock-frozen in liquid nitrogen and stored at −80° C. On the day of the assay, cell pellets were thawed at 4° C. and resuspended in 1 ml lysis Buffer A (10 mM HEPES-KOH pH 7.5, 100 mM NaCl, 1 mM EDTA, 1% SDS, 2% TX-100). After a 10 min incubation at 4° C. on a rotatory shaker, cells were transferred into screw-capped 1.3 ml Covaris 15 mm×19 mm tubes and sonicated in a Covaris E220 system under the following conditions: Duty 10%, Peak Incident Power 175, Cycles per burst 200, Duration 600 sec (10 min). After sonication, lysates were transferred into 1.5 ml tubes and centrifuged for 5 min at 20,000×G and 4° C. to pellet cell debris. The supernatant was transferred into 1.5 ml tubes. 100 μl aliquots of each sample were taken as input controls. Then, 2 aliquots of 300 μl from each lysate were transferred to Phase-Lock Gel Heavy 1.5 ml tubes (pre-centrifuged for 30 sec at 16,000×G) and extracted twice with 300 μl of phenol:chloroform:isoamyl alcohol (25:24:1, USB), each time with vigorous shaking and centrifugation for 5 min at 16,000×G at room temperature. 
The remaining phenol was removed by adding 150 μl of 24:1 chloroform:isoamyl alcohol and centrifuging for 5 min at 16,000×G. The upper aqueous phase was transferred to a 1.5 ml tube. Another 100 μl of EB buffer (Qiagen) was added to the Phase-Lock Gel Heavy 1.5 ml tube to collect the remaining upper phase, which was transferred to the same 1.5 ml tube with the rest of the upper phase. After adding 40 μl of 3M sodium acetate, 1.5 μl of GlycoBlue reagent (Life Technologies) and 800 μl ethanol, samples were incubated at −80° C. for at least 30 min. Samples were centrifuged for 15 min at 12,000×G and 4° C. The supernatant was removed and pellets were washed 1×0.5 ml 70% ethanol (5 min, 12,000×G). Samples were eluted with 50 μl EB buffer (Qiagen). 1 μl of DNAse-free RNAse A (Sigma, 37 mg/ml) was added to every sample, including input samples, for 30 min at 37° C. Then, 1 μl (20 μg) Proteinase K (Roche) was added and samples were incubated for 1 hr at 55° C. and 16 hrs at 65° C. to reverse cross-links. Samples were then brought to 300 μl with EB buffer. Phenol:chloroform:isoamyl alcohol extraction and ethanol precipitation were repeated exactly as in the first step. Samples were eluted with 50 μl EB buffer (100 μl EB buffer for input samples). qPCR assays were performed on a CFX96 real-time PCR system (Bio-Rad). Specific primers are listed in Table a. Threshold cycle values were converted into initial template amounts for each sample based on a standard curve prepared from known quantities of genomic DNA. Enrichment of specific PCR amplicons was expressed as a percentage of total input DNA for each reaction.
Protein Expression and Purification
Mouse CBX7 carrying Flag and HA tags at the C-terminus was expressed in Sf9 insect cells using the Bac-to-Bac system (Invitrogen). Protein extract was prepared by resuspending the cell pellet in lysis buffer F (20 mM HEPES-KOH [pH 7.9], 300 mM NaCl, 4 mM MgCl2, 1 mM DTT, 20% glycerol, and protease inhibitors mix [Sigma]). After 15 strokes with a tight pestle in a 15 ml Dounce homogenizer, the cell suspension was supplemented with 0.1% Nonidet-P-40, 0.2% Triton-X-100, 5 u/ml TurboDNAse (Invitrogen) and 12.5 μg/ml heparin. After a 30 min incubation at 4° C. on an orbital shaker, the cell lysate was subjected to 2 rounds of centrifugation for 15 min at 30,000×G and 4° C. The supernatant was collected and snap-frozen in liquid nitrogen. M2 anti-FLAG beads (Sigma) were used for all purifications. Proteins were bound to M2 beads in lysis buffer for 2 hr at 4° C. Beads were washed twice with buffer F (500 mM NaCl), twice with buffer F (300 mM NaCl) and twice with elution buffer (50 mM Tris pH 7.4, 100 mM NaCl). Proteins were eluted twice by 1 hr incubations with 0.2 μg/ml 3×-FLAG peptide (Sigma). Protein concentrations were determined by SDS-PAGE and Bradford assay using bovine serum albumin as a standard.
Electrophoretic Mobility Shift Assays
RNA-EMSA assays with CBX7 protein were performed as follows. Labeled RNAs were produced with the MEGAscript® T7 Transcription Kit (Life Technologies) and purified from 6% acrylamide TBE-urea gels. Labeled RNAs were prefolded in TE+300 mM NaCl by incubation for 2 min at 95° C., followed by 20 min on ice. Binding reactions were assembled with 20 μl of binding mix (13 mM Tris pH 8.0, 0.2 mM EDTA, 68.8 mM NaCl, 20% glycerol, 0.2 mg/ml yeast tRNA, 4 mM DTT, 4 μl 2500 cpm/μl RNA probe). LNA oligonucleotides were added to the binding mix at a final concentration of 8 μM, and samples were pre-incubated on ice for 10 min. After pre-incubation, binding mix samples were combined with 60 μl of purified protein in dialysis buffer (50 mM Tris pH 7.4, 5 mM MgCl2, 50 mM NaCl, 1 mM DTT, 10% glycerol, 4 u/μl Protector RNAse inhibitor (Roche)). Control experiments were performed with dialysis buffer only or with control proteins (Flag-GFP or GST-Flag-HA) dissolved in dialysis buffer at the highest protein concentration used in the particular experiment. After 30 min on ice, the sample was loaded onto a 5% 37:1 acrylamide (Bio-Rad) gel in 0.5× TBE buffer (45 mM Tris-borate, 1 mM disodium EDTA) and run for 90 min at 250 V at 4° C. Gels were exposed to phosphorimager screens. For validation of motif sequences, labeled RNA probes were produced with the MEGAshortscript™ T7 Transcription Kit (Life Technologies) and gel purified using 15% TBE-urea gels (Life Technologies). Similar RNA-EMSA conditions were applied, except that 8% 37:1 acrylamide (Bio-Rad) gels replaced 5% gels. Sequences of RNA probes are given in Table a.
Western Blotting
20 μl of protein extracts were resolved on 4%-20% gradient SDS-PAGE gels (Bio-Rad), and proteins were transferred for 1 hr at 100 V in transfer buffer (48 mM Tris, 39 mM glycine, 20% methanol) to Immobilon-P 0.45 μm PVDF membrane (Millipore) using a Mini Protean Tetra transfer unit (Bio-Rad). To detect CBX7 protein expression, Western blotting was performed with mouse monoclonal CBX7 antibody (G-3) (Santa Cruz Biotechnologies, sc-376274) as the primary antibody and goat-anti-mouse-HRP (Promega) as the secondary antibody. For quantitative Western blotting of DCAF12l1 protein, anti-WDR40B (Dcaf12l1) rabbit polyclonal antibody (Biorbit, orb155395) was used as the primary antibody, along with anti-Ctcf rabbit polyclonal antibody (Cell Signaling Technologies, #2899) as a loading control. Goat-anti-rabbit-HRP (Promega) was used as the secondary antibody. Protein bands were developed using the Western Lightning Plus-ECL Kit (Perkin-Elmer), and signal intensity was analyzed using a ChemiDoc MP Imaging System (Bio-Rad) and ImageLab Ver. 5.2.1 software (Bio-Rad). Exposures were captured at different times using the ChemiDoc cumulative signal option to avoid signal saturation. Standard curves were prepared using increasing amounts of cell extract (
Quantification and Statistical Analysis of qPCR Data
Data represent the average±standard deviation of at least 3 biological replicates, as stated in the figure legends. P values were determined by unpaired two-tailed Student's t-test unless otherwise stated.
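A statistics package would normally be used for this test (for example SciPy's `scipy.stats.ttest_ind`, whose default assumes equal variances). The dependency-free Python sketch below is an equivalent formulation of the unpaired, two-tailed Student's t-test with pooled variance. The p-value comes from the regularized incomplete beta function, I evaluated with the continued-fraction method described in Numerical Recipes. The helper names and replicate values are hypothetical.

```python
from math import exp, lgamma, log, sqrt
from statistics import mean, stdev

_FPMIN = 1e-300


def _nz(v):
    """Keep the continued fraction away from division by zero."""
    return v if abs(v) >= _FPMIN else _FPMIN


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 / _nz(1.0 - qab * x / qap)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))            # even step
        d = 1.0 / _nz(1.0 + aa * d)
        c = _nz(1.0 + aa / c)
        h *= d * c
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))   # odd step
        d = 1.0 / _nz(1.0 + aa * d)
        c = _nz(1.0 + aa / c)
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def _betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    bt = exp(lgamma(a + b) - lgamma(a) - lgamma(b) + a * log(x) + b * log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def student_t_test(x, y):
    """Unpaired, two-tailed Student's t-test with pooled variance; returns (t, p)."""
    n1, n2 = len(x), len(y)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * stdev(x) ** 2 + (n2 - 1) * stdev(y) ** 2) / df
    t = (mean(x) - mean(y)) / sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    p = _betai(0.5 * df, 0.5, df / (df + t * t))  # two-tailed p for t with df degrees of freedom
    return t, p


if __name__ == "__main__":
    print(student_t_test([1, 2, 3, 4, 5], [3, 4, 5, 6, 7]))  # t = -2.0, p of about 0.08
```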
Quantitative Analysis of RNA-EMSA
Gels were exposed to phosphorimager screens and scanned using a Typhoon laser scanner (GE Healthcare). Radioactive signal intensity was quantified with ImageQuant 5.2 software (GE Healthcare). The fraction of bound RNA (signal intensity of the shifted bands divided by the total signal intensity in the lane) was computed for every protein concentration and plotted against the corresponding protein concentration. To determine the dissociation constant (Kd), the resulting binding curves were fitted to sigmoidal curves by non-linear regression using Prism software (GraphPad Software Inc.).
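One common sigmoidal binding model is the Hill-type equation fraction bound = Bmax·P^h/(Kd^h+P^h). The exact Prism model used is not stated, so that choice is an assumption here. The Python sketch below does not reproduce Prism's nonlinear regression. It swaps in a simple grid search over Kd (log-spaced) and the Hill slope h, with Bmax solved in closed form by linear least squares for each candidate pair. Concentrations and fractions in the example are synthetic.

```python
import math


def fraction_bound(shifted_signal: float, lane_total_signal: float) -> float:
    """Shifted-band intensity divided by total lane intensity."""
    return shifted_signal / lane_total_signal


def fit_hill(conc, frac, n_kd=401, h_grid=None):
    """Grid-search least-squares fit of frac = Bmax * P^h / (Kd^h + P^h).

    For each (Kd, h) pair the best Bmax has a closed form, so only Kd and h
    are searched. Zero-protein lanes are ignored.
    """
    pts = [(c, y) for c, y in zip(conc, frac) if c > 0]
    if h_grid is None:
        h_grid = [0.5 + 0.05 * i for i in range(71)]   # Hill slopes 0.5 .. 4.0
    lo = math.log(min(c for c, _ in pts) / 10.0)        # search Kd 10x beyond the data range
    hi = math.log(max(c for c, _ in pts) * 10.0)
    best = None
    for i in range(n_kd):
        kd = math.exp(lo + (hi - lo) * i / (n_kd - 1))
        for h in h_grid:
            g = [c ** h / (kd ** h + c ** h) for c, _ in pts]
            bmax = sum(gi * y for gi, (_, y) in zip(g, pts)) / sum(gi * gi for gi in g)
            sse = sum((y - bmax * gi) ** 2 for gi, (_, y) in zip(g, pts))
            if best is None or sse < best[0]:
                best = (sse, bmax, kd, h)
    _, bmax, kd, h = best
    return bmax, kd, h


if __name__ == "__main__":
    conc = [0, 10, 25, 50, 100, 200, 400, 800]                   # synthetic protein concentrations
    frac = [0.9 * c ** 1.5 / (100 ** 1.5 + c ** 1.5) for c in conc]
    print(fit_hill(conc, frac))                                   # close to (0.9, 100, 1.5)
```

A grid search keeps the sketch dependency-free and deterministic. With real, noisy gels a proper nonlinear least-squares routine and confidence intervals on Kd would be preferable.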
Analysis of CLIP-Seq Data
Libraries were sequenced on an Illumina HiSeq 2000 according to the manufacturer's instructions. Approximately 40 million paired-end 50 bp reads were generated per CLIP-seq sample. Adaptor sequences were trimmed with either Trim Galore! v0.3.3 (http://www.bioinformatics.babraham.ac.uk/projects/trim_galore/) (for CLIP-seq; stringency 15 and allowed error rate 0.2) or cutadapt (v1.0) (https://pypi.python.org/pypi/cutadapt/). Identical genomic sequences (PCR duplicates) were removed by a custom program prior to alignment. To account for the M. mus (mus)/M. castaneus (cas) hybrid character of the mouse EL16.7 ES cell line employed in the CLIP-seq studies, reads were first aligned to custom mus/129 and cas genomes and then mapped back to the reference mm9 genome (Pinter et al., 2012). All alignments were performed with Tophat (v2.0.11) (Trapnell et al., 2012). Post-processing of alignments was performed with custom scripts using SAMtools (Li et al., 2009) and BEDtools v2.17.0 (Quinlan and Hall, 2010). These steps included accounting, alignment file-type conversion, read extraction and sorting (SAMtools), and generation of wig coverage files (SAMtools depth).
Fragments-per-million (fpm) wig files were then created by scaling uniquely aligned wig files to the total number of fragments (in millions) in each library, determined by SAMtools flagstat by combining reads “with itself and mate mapped” and “singletons”. CLIP-seq enriched tag density wig files were viewed in the UCSC genome browser (Kent et al., 2002) or the Integrated Genome Browser (IGB) (Nicol et al., 2009). Consecutive wig entries of equal coverage were then merged into bed files that were used for peak calling with PeakRanger (v.16) (Feng et al., 2011). Because PeakRanger requires an even distribution of Watson/Crick entries, strand-specific bed file entries were randomized per strand prior to peak calling. PeakRanger was called with arguments ranger -p 0.01 -format bed -gene_annot_file (mm9), -d experiment and -c mock-transfected control, to identify narrow peaks with a p-value of 0.01 or less.
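Two of the steps above are simple enough to sketch. First, fpm scaling multiplies each coverage value by 10^6 divided by the library's total fragment count. Second, runs of equal coverage are collapsed into bed intervals for peak calling. The Python below is illustrative, with hypothetical function names, and is not the custom scripts used in the study. It assumes per-base coverage values (as reported by SAMtools depth) for one contiguous stretch of a chromosome.

```python
def fpm_scale(coverage, total_fragments):
    """Scale per-base coverage to fragments per million (fpm)."""
    factor = 1e6 / total_fragments
    return [c * factor for c in coverage]


def merge_equal_runs(chrom, start, values):
    """Collapse consecutive positions with identical coverage into BED-like
    (chrom, start, end, value) records, 0-based and half-open.
    Zero-coverage runs are skipped."""
    records = []
    run_begin = 0
    for i in range(1, len(values) + 1):
        # close the current run at the end of the input or when the value changes
        if i == len(values) or values[i] != values[run_begin]:
            if values[run_begin] != 0:
                records.append((chrom, start + run_begin, start + i, values[run_begin]))
            run_begin = i
    return records


if __name__ == "__main__":
    raw = [0, 2, 2, 2, 5, 5, 0, 1]                       # hypothetical per-base depth from position 100
    print(merge_equal_runs("chr1", 100, raw))
    print(merge_equal_runs("chr1", 100, fpm_scale(raw, 20_000_000)))
```

Scaling before or after merging gives the same intervals, because equal raw counts remain equal after multiplication by a common factor.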
To assess dCLIP fragment footprints, PeakRanger-enriched CLIP fragments from 3 libraries were pooled and merged in a strand-specific manner to create continuous CLIP fragments where peaks overlapped. A length-frequency histogram of the enriched CLIP fragments was obtained, along with the mean, median, and SD.
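The footprint summary above can be sketched in Python as follows, assuming enriched fragments are given as (chrom, start, end, strand) tuples with 0-based, half-open coordinates and that pooling means concatenating the three libraries' lists. In this sketch only intervals that truly overlap on the same chromosome and strand are merged. Whether book-ended fragments were joined in the original analysis is not stated, so this is an assumption. The function names and example fragments are hypothetical.

```python
from collections import Counter
from statistics import mean, median, stdev


def merge_strand_specific(intervals):
    """Merge overlapping (chrom, start, end, strand) intervals on the same chromosome
    and strand; book-ended intervals (start == previous end) are kept separate."""
    merged = []
    for chrom, start, end, strand in sorted(intervals, key=lambda r: (r[0], r[3], r[1], r[2])):
        last = merged[-1] if merged else None
        if last and last[0] == chrom and last[3] == strand and start < last[2]:
            merged[-1] = (chrom, last[1], max(last[2], end), strand)
        else:
            merged.append((chrom, start, end, strand))
    return merged


def footprint_summary(intervals):
    """Length-frequency histogram plus mean, median and sample SD of merged fragment lengths."""
    lengths = [end - start for _, start, end, _ in merge_strand_specific(intervals)]
    return {
        "histogram": Counter(lengths),
        "mean": mean(lengths),
        "median": median(lengths),
        "sd": stdev(lengths),
    }


if __name__ == "__main__":
    pooled = [("chr1", 100, 150, "+"), ("chr1", 140, 180, "+"),   # overlap on + strand
              ("chr1", 140, 180, "-"), ("chr2", 10, 40, "+")]
    print(footprint_summary(pooled))  # lengths 80, 40, 30
```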
RNA-Seq Analysis
For RNA-seq, RNA was extracted from the cells used for dCLIP experiments. The starting amount of total RNA was 4 μg. RNA was depleted of ribosomal RNA using the Ribominus kit (Life Technologies). Strand-specific cDNA libraries were constructed using SuperScript III reverse transcriptase for first-strand synthesis, the NEBNext mRNA Second Strand Synthesis Module supplemented with dUTP (NEB) for second-strand synthesis, and the NEBNext ChIP-Seq Library Prep Master Mix Set for library preparation. Libraries were sequenced on an Illumina HiSeq 2000 according to the manufacturer's instructions. Approximately 40 million single-end 50 nt reads were generated for every RNA-seq sample. Data processing was performed essentially as described previously (Kung et al., 2015). Adaptor sequences were trimmed from libraries with Trim Galore! v0.3.3 (for dCLIP-seq and RNA-seq; stringency 15). PCR duplicates were removed by custom programs prior to alignment. To account for the M. mus (mus)/M. castaneus (cas) hybrid character of the ES cell lines, reads were first aligned to custom mus/129 and cas genomes and then mapped back to the reference mm9 genome. Reads were aligned with Tophat (v2.0.8 or greater). Post-processing of mm9 alignments was performed with custom C and Perl programs and bash shell scripts, SAMtools v0.1.18, and BEDtools v2.17.0.
RNA-Seq vs CLIP-Seq Analysis
Two CBX7 libraries were used for the RNA-seq analysis and three CBX7 libraries for the CLIP-seq analysis. The following analysis was performed for both RNA-seq and CLIP libraries. Using the Homer (http://homer.salk.edu/homer/motif/) suite's makeTagDirectory and makeUCSCfile programs, we converted aligned SAM files into strand-specific bedGraph files. Reads were further filtered to eliminate mappable reads assigned to ribosomal DNA and mitochondrial DNA, and for each library, read count values were normalized to the corresponding 3rd-quartile read count value. Strand-specific wig files were then binned into 100 bp windows and subjected to Piranha peak analysis (http://smithlabresearch.org/software/piranha/), yielding significantly (p<0.01) enriched peaks. Piranha CLIP peaks were further filtered to include only peaks that were also considered enriched by the PeakRanger algorithm (see “PeakRanger” peak calling under STAR Methods' “Analysis of CLIP-Seq data” section). To compare the resulting enriched CLIP signals with their corresponding enriched RNA-seq signals, Piranha peak files (of both CLIP and RNA-seq libraries) were processed with Homer's makeTagDirectory and subsequently assigned to genomic features with Homer's analyzeRNA, generating a matrix of total read counts per gene normalized to the length of each gene. Next, a matrix holding only genes with CLIP signals greater than zero in at least two of the three datasets was created. The log2-transformed normalized read values of the 2 or 3 CLIP libraries with enriched signals were averaged to give an average CLIP signal per gene. The normalized per-gene values of the corresponding RNA-seq libraries, analyzed in parallel, were averaged in the same manner to give an average RNA-seq signal per gene. This analysis resulted in a matrix of 1,333 genes with positive CBX7 CLIP signal.
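The per-library 3rd-quartile normalization and the per-gene averaging described above can be sketched in Python. This is an illustration under stated assumptions, not the Homer/Piranha pipeline used in the study. The quartile reference is computed over positive values only, using the default method of Python's `statistics.quantiles`. A gene absent from a library is treated as zero. The function names and example values are hypothetical.

```python
import math
from statistics import mean, quantiles


def upper_quartile_normalize(counts):
    """Divide each value by the library's 3rd-quartile value, computed here over
    positive values only (an assumption) so that zeros do not lower the reference."""
    positive = [v for v in counts.values() if v > 0]
    q3 = quantiles(positive, n=4)[2]
    return {k: v / q3 for k, v in counts.items()}


def averaged_log2_signal(libraries, min_libraries=2):
    """Average log2 signal over the libraries in which a gene is positive,
    keeping only genes positive in at least `min_libraries` libraries."""
    genes = set().union(*libraries)
    averaged = {}
    for gene in sorted(genes):
        positive = [lib[gene] for lib in libraries if lib.get(gene, 0) > 0]
        if len(positive) >= min_libraries:
            averaged[gene] = mean(math.log2(v) for v in positive)
    return averaged


if __name__ == "__main__":
    libs = [{"a": 4.0, "b": 1.0}, {"a": 16.0, "c": 2.0}, {"a": 0.0, "b": 4.0}]
    print(averaged_log2_signal(libs))  # gene "c" is dropped: positive in only one library
```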
To focus on a gene cohort with high CLIP and RNA-seq signals, we selected 10% of the CLIP'ed genes (135 genes circled by a green ellipse, as shown in
To determine reproducibility among dCLIP peaks, we used deepTools (Ramirez et al., 2014): for each 1-kb bin, we averaged the significance values (−log(p-value)) of strand-specific peaks enriched in at least two of the three replicates. Pairwise Pearson correlation (PPC) analysis was performed for the three replicates. Scatter plots were generated (
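The reproducibility check reduces to binning peak significance and computing Pearson's r for each replicate pair. The sketch below assumes that peaks have already been restricted to those enriched in at least two replicates and that the log is base 10; it reimplements the calculation in plain Python for illustration rather than calling deepTools, and the function names are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

# (start, end, p-value) for one strand-specific peak; half-open coordinates
Peak = Tuple[int, int, float]


def bin_significance(peaks: Sequence[Peak], chrom_length: int, bin_size: int = 1000) -> List[float]:
    """Mean -log10(p) of the peaks overlapping each bin (0.0 for empty bins)."""
    n_bins = (chrom_length + bin_size - 1) // bin_size
    totals = [0.0] * n_bins
    counts = [0] * n_bins
    for start, end, p_value in peaks:
        score = -math.log10(p_value)
        for b in range(start // bin_size, min(n_bins, (end - 1) // bin_size + 1)):
            totals[b] += score
            counts[b] += 1
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pairwise_pcc(replicates: Dict[str, List[float]]) -> Dict[Tuple[str, str], float]:
    """Pearson r for every unordered pair of replicates (binned vectors)."""
    return {(a, b): pearson(replicates[a], replicates[b])
            for a, b in combinations(sorted(replicates), 2)}
```

With deepTools itself, the comparable steps would be `multiBigwigSummary bins` followed by `plotCorrelation --corMethod pearson`.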
We further used the matrix of total read counts per gene normalized to gene length (described above) and, for all genes with CLIP signal in at least two replicates, conducted three pairwise comparisons: Replicate #1 vs. Replicate #2; Replicate #1 vs. Replicate #3; and Replicate #2 vs. Replicate #3. Normalized data points were plotted, and the correlation patterns are presented as three scatterplots (
To characterize and summarize the genome-wide occupancy pattern of peaks, we pooled and merged all Piranha peaks (overlapping PeakRanger peaks) from the three libraries into one continuous track (containing 8,578 peaks) and performed CEAS analysis (Ji et al., 2006; Shin et al., 2009) using the mm9 KnownGenes database; as the background dataset, we used a merged transcriptome coverage track obtained from two RNA-seq experiments conducted in the same cell line used for the CLIP experiments.
ChIP-Seq Analysis
Data processing was performed as described previously (Kung et al., 2015). Normalization to input libraries was performed with a window size of 500 and a step size of 100. To obtain highly significant ChIP peaks, we used macs2 (version 2.1.0.20150603) (Zhang et al., 2008) with highly stringent constraints (p-value cutoff 0.01) to identify ChIP peaks versus input for the IP. Additionally, we compared the IP with a control IP using PeakSeq (version 1.3) (Rozowsky et al., 2009) with stringent constraints (Enrichment_fragment_length 200, Enrichment_mapped_fragment_length 200, Background_model Simulated, max_Qvalue 0.1, target_FDR 0.05, N_Simulations 10, Minimum_interpeak_distance 200). We then restricted the macs2-called ChIP peaks to those that were also called against the control experiment. Finally, we limited the resulting peaks to those intersecting IP-enriched regions with six-fold enrichment over input. IP and input regions were obtained from coverage data smoothed with a 500 nucleotide window and 100 nucleotide steps. To assess the relationship between CBX7's binding to RNA and to DNA, we compared the number of highly significant ChIP peaks overlapping dCLIP-bound transcripts with that for all expressed transcripts, using a non-parametric resampling approach (1,000 random selections with replacement). Overlap was counted if a ChIP peak was located inside an open reading frame or the promoter region (2,000 nt upstream of the transcription start site) of the corresponding transcript.
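The overlap test can be sketched as follows. Assumptions not stated in the text: transcripts are given as (chrom, start, end, strand) with half-open coordinates, the whole transcript span stands in for "inside an open reading frame", the promoter is the 2,000 nt upstream of the strand-aware TSS, and each resampling draws as many expressed transcripts as there are dCLIP-bound transcripts. The one-sided p-value asks whether dCLIP-bound transcripts overlap ChIP peaks less often than randomly drawn expressed transcripts. All names are hypothetical.

```python
import random
from typing import Sequence, Tuple

Transcript = Tuple[str, int, int, str]  # chrom, start, end (half-open), strand
ChipPeak = Tuple[str, int, int]         # chrom, start, end (half-open)

PROMOTER_UPSTREAM = 2000  # nt upstream of the TSS counted as promoter


def region_with_promoter(tx: Transcript) -> Tuple[str, int, int]:
    """Transcript span extended 2,000 nt upstream of its strand-aware TSS."""
    chrom, start, end, strand = tx
    if strand == "+":
        return chrom, max(0, start - PROMOTER_UPSTREAM), end
    return chrom, start, end + PROMOTER_UPSTREAM


def has_chip_peak(tx: Transcript, peaks: Sequence[ChipPeak]) -> bool:
    """True if any ChIP peak overlaps the transcript or its promoter."""
    chrom, start, end = region_with_promoter(tx)
    return any(c == chrom and s < end and e > start for c, s, e in peaks)


def overlap_fraction(txs: Sequence[Transcript], peaks: Sequence[ChipPeak]) -> float:
    return sum(has_chip_peak(t, peaks) for t in txs) / len(txs)


def resampling_p_value(clip_txs: Sequence[Transcript], expressed_txs: Sequence[Transcript],
                       peaks: Sequence[ChipPeak], n_iter: int = 1000,
                       seed: int = 0) -> Tuple[float, float]:
    """One-sided test: are dCLIP-bound transcripts depleted of ChIP peaks?"""
    rng = random.Random(seed)
    observed = overlap_fraction(clip_txs, peaks)
    flags = [has_chip_peak(t, peaks) for t in expressed_txs]
    k = len(clip_txs)
    as_low = 0
    for _ in range(n_iter):
        draw = rng.choices(flags, k=k)  # random selection with replacement
        if sum(draw) / k <= observed:
            as_low += 1
    return observed, (as_low + 1) / (n_iter + 1)
```

The linear scan over peaks keeps the sketch short; for genome-scale data an interval index (or `bedtools intersect`) would be the practical choice.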
Motif Analysis of CLIP-Seq Data
Three separate CLIP experiments yielded three mouse CBX7 CLIP-seq datasets containing the following numbers of PeakRanger-enriched CLIP regions aligned to the positive and negative strands, respectively (after excluding rDNA and mtDNA sequences): #1: (4,262, 4,225), #2: (4,254, 3,979), #3: (5,021, 5,009). To thoroughly analyze the biological information embedded within the three independent mouse CBX7 CLIP-seq libraries, we defined three grouping categories based on the overlap of regions between the three independent libraries. We dubbed these categories: (1) “Individuals”: original, unfiltered, enriched CLIP regions; (2) “OneOL”: enriched CLIP regions whose span intersects the span of at least one enriched CLIP region from another independent library; and (3) “TwoOL”: enriched CLIP regions whose span intersects the spans of enriched CLIP regions from two other independent libraries. This approach was based on the presumption that CBX7 could have more than one consensus motif and that each library may not have sufficient depth to capture all CBX7-binding sites. Based on these three categories, we took a parallel, branched approach, classifying the enriched regions from the three independent CBX7 CLIP-seq libraries into nine datasets: Individuals #1, Individuals #2, Individuals #3, OneOL #1, OneOL #2, OneOL #3, TwoOL #1, TwoOL #2, and TwoOL #3. In addition to identifying the boundaries of enriched CLIP regions, the PeakRanger algorithm determines the summit position of each region (the position with the highest CLIP signal), which marks the strongest CBX7 binding within the region.
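The three grouping categories can be expressed as a count of how many other libraries contain an overlapping region. The sketch assumes strand-specific overlap of half-open intervals and uses hypothetical names; in this sketch a TwoOL region is also counted as OneOL, which is an assumption rather than something the text states.

```python
from typing import Dict, List, Tuple

# (chrom, start, end, strand); half-open coordinates
Region = Tuple[str, int, int, str]


def overlaps(a: Region, b: Region) -> bool:
    """Strand-specific overlap of two half-open intervals."""
    return a[0] == b[0] and a[3] == b[3] and a[1] < b[2] and b[1] < a[2]


def categorize(libraries: Dict[str, List[Region]]) -> Dict[str, Dict[str, List[Region]]]:
    """Assign each library's enriched regions to Individuals, OneOL and TwoOL."""
    result = {}
    for name, regions in libraries.items():
        others = [rs for other, rs in libraries.items() if other != name]
        groups: Dict[str, List[Region]] = {"Individuals": list(regions), "OneOL": [], "TwoOL": []}
        for region in regions:
            # number of other libraries with at least one overlapping region
            n_supporting = sum(any(overlaps(region, q) for q in rs) for rs in others)
            if n_supporting >= 1:
                groups["OneOL"].append(region)
            if n_supporting >= 2:
                groups["TwoOL"].append(region)
        result[name] = groups
    return result
```

The pairwise comparison is quadratic; with thousands of regions per library an interval tree or sorted sweep would be preferable, but the category logic is unchanged.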
To pinpoint the most significant CBX7-RNA binding events, we used the summit of each enriched region as an anchoring position and extended a 100 bp region around it (±50 bp) (Ma et al., 2014). For each of the nine datasets, the summit-based 100 bp CLIP-enriched regions of the positive and negative strands were combined into a single batch, resulting in the following (number of enriched CLIP regions in parentheses): (1) Individuals #1 (8,492); (2) Individuals #2 (8,237); (3) Individuals #3 (10,031); (4) OneOL #1 (3,422); (5) OneOL #2 (2,499); (6) OneOL #3 (3,182); (7) TwoOL #1 (1,125); (8) TwoOL #2 (1,083); (9) TwoOL #3 (1,088). In each of the nine datasets, enriched CLIP regions were sorted by their FDR significance values as defined by PeakRanger. To discover novel binding motifs enriched in each of the nine CLIP-seq datasets, we used the MEME-ChIP tool, which runs both the MEME and DREME algorithms for de novo motif identification (Bailey et al., 2009; Ma et al., 2014). Because MEME-ChIP functions most efficiently with datasets of up to 600 sequences (Ma et al., 2014), and 6 of our 9 datasets were 4-17 fold larger, we created a pipeline that receives a large dataset of enriched CLIP regions, splits it into equal-sized batches (typically 600 sequences per batch), and then, in parallel for each batch, fetches the strand-specific FASTA sequences (100 bp around the summit of each enriched region) with Bedtools (Quinlan and Hall, 2010) and executes MEME-ChIP in strand-specific mode (“-norc”). Because the enriched CLIP regions within each CLIP-seq dataset span a wide range of significance (FDR) values, each large dataset was split at equal-sized intervals across the FDR-sorted dataset, giving a balanced representation of CLIP-region significance values across all batches.
Thus, the three “Individuals” CLIP-seq datasets (#1, #2, #3) were processed as 14, 13, and 16 batches, respectively, whereas the three “OneOL” CLIP-seq datasets (#1, #2, #3) were processed as 5, 4, and 5 batches, respectively. Each of the three “TwoOL” CLIP-seq datasets (each containing approximately 1,100 enriched regions) was processed as one batch.
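The summit windows and the FDR-balanced batching can be sketched as follows. Two assumptions: “equal-sized intervals across the FDR-sorted dataset” is read as taking every k-th region from the FDR-sorted list, so that each batch spans the full significance range, and the number of batches is the dataset size divided by 600, rounded down (which reproduces the batch counts listed above). All names are hypothetical.

```python
from typing import List, Tuple

# (chrom, summit position, strand, PeakRanger FDR)
Summit = Tuple[str, int, str, float]


def summit_window(chrom: str, summit: int, strand: str, flank: int = 50) -> Tuple[str, int, int, str]:
    """100 bp window (0-based, half-open) centred on a PeakRanger summit."""
    return chrom, max(0, summit - flank), summit + flank, strand


def n_batches(n_regions: int, target: int = 600) -> int:
    """Number of MEME-ChIP batches: n // target, but at least one."""
    return max(1, n_regions // target)


def fdr_balanced_batches(regions: List[Summit], target: int = 600) -> List[List[Summit]]:
    """Split FDR-sorted regions so that every batch samples the whole significance range."""
    ordered = sorted(regions, key=lambda r: r[3])  # most significant first
    k = n_batches(len(ordered), target)
    return [ordered[i::k] for i in range(k)]  # stride k: batch i takes every k-th region
```

Each batch could then be written as BED6 and passed to `bedtools getfasta -s` before running MEME-ChIP with `-norc`. Striding, rather than cutting contiguous blocks, is what prevents one batch from holding only the most significant regions.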
For each analyzed batch of CLIP regions, MEME-ChIP identified several enriched binding motifs. All novel motifs identified within each of the three categorical groups, namely “Individuals”, “OneOL”, and “TwoOL”, were pooled, yielding motif pools of 158, 48, and 19 motifs, respectively. Next, the de novo motifs of each categorical group were subjected to multiple motif alignment with the similarity-clustering tool STAMP (Mahony and Benos, 2007). In each case, STAMP analysis, run in strand-specific mode (“-forwardonly”), generated a phylogenetic Newick tree constructed by comparing the strand-specific similarity of binding motifs. The Newick trees were then visualized using the Molecular Evolutionary Genetics Analysis (MEGA) software (Tamura et al., 2013). In addition, our pipeline used the “seqLogo” Bioconductor package (Bembom O. seqLogo: Sequence logos for DNA sequence alignments. R package version 1.34.0.) to generate a sequence logo for each enriched binding motif. Next, we examined the Newick tree of each categorical group and, based on its branch structure, grouped together neighboring motifs sharing pattern similarity. We then re-subjected each group of similar motifs to STAMP analysis, which generated a generalized FBP (Familial Binding Profile) model reflecting the general profile of all binding motifs within the group. FBP analysis was performed separately for each group: “Individuals”, “OneOL”, and “TwoOL”. The “Individuals” FBPs fell within the FBPs identified for the “OneOL” and “TwoOL” groups, strongly suggesting that the motifs from the “Individuals” datasets (obtained from a single library) resembled those arising from the OneOL and TwoOL (more inclusive) datasets. Altogether, STAMP analysis of the three categorical groups yielded 24 FBPs, namely, 10 “Individuals” FBPs, 7 “OneOL” FBPs, and 7 “TwoOL” FBPs (
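STAMP's column-comparison metrics and alignment options are more elaborate than can be shown here. As a rough illustration of how two motifs might be scored for strand-specific similarity, the sketch below averages the Pearson correlation of aligned columns over every ungapped, forward-only offset and keeps the best score. It is a simplified stand-in, not STAMP's algorithm; the column order A, C, G, U, the minimum overlap of 4 columns, and the function names are assumptions.

```python
import math
from typing import List, Sequence

Column = Sequence[float]  # probabilities in the order A, C, G, U


def column_pcc(a: Column, b: Column) -> float:
    """Pearson correlation between two motif columns (0.0 if either is flat)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den if den else 0.0


def motif_similarity(m1: List[Column], m2: List[Column], min_overlap: int = 4) -> float:
    """Best mean column PCC over ungapped, forward-only offsets of m2 against m1.

    Returns -1.0 if no offset reaches `min_overlap` aligned columns.
    Reverse complements are never considered (strand-specific comparison).
    """
    best = -1.0
    for offset in range(-(len(m2) - min_overlap), len(m1) - min_overlap + 1):
        pairs = [(m1[i], m2[i - offset]) for i in range(len(m1)) if 0 <= i - offset < len(m2)]
        if len(pairs) >= min_overlap:
            best = max(best, sum(column_pcc(a, b) for a, b in pairs) / len(pairs))
    return best
```

A matrix of such scores could feed any hierarchical clustering routine to produce a tree comparable in spirit to the Newick trees described above.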
To statistically analyze the over-representation of the 24 mouse CBX7 FBPs in each of the three original mouse CBX7 libraries, we first assembled a motif library of 24 position weight matrices (PWMs) by combining all FBPs from the three categorical groups. To allow downstream tracking of the dataset that originally yielded each de novo FBP, each FBP was labeled with a serial number and as either “Indiv.”, “OneOL”, or “TwoOL”.
Using Bedtools, we fetched the FASTA sequences of the enriched CLIP regions of each of the three CLIP libraries. Next, we used CLOVER (Frith et al., 2004) in strand-specific mode (−z=1) to detect binding motifs enriched in mouse CBX7 CLIP regions. For each of the three CLIP libraries, CLOVER determined the statistical enrichment of each mouse FBP relative to two background sets constructed from the entire transcriptome coverage of two separate RNA-seq experiments conducted in the same cell line used for the CLIP experiments. Each reported binding motif was given a score (“raw score”) based on its predicted binding energy, and two p-values, one for each background file (Frith et al., 2004). FBPs with raw scores higher than 6 and significant enrichment (p≤0.05) against both backgrounds were selected for further analysis.
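The FBP selection rule can be written as a filter over already-parsed CLOVER results. The record layout and names below are hypothetical (parsing CLOVER's text output is not shown); only the thresholds, raw score > 6 and p ≤ 0.05 against both backgrounds, come from the text.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class CloverResult:
    motif: str
    raw_score: float
    p_vs_background1: float  # p-value against RNA-seq background set 1
    p_vs_background2: float  # p-value against RNA-seq background set 2


def select_fbps(results: Iterable[CloverResult], min_raw: float = 6.0,
                alpha: float = 0.05) -> List[str]:
    """Motifs with raw score above `min_raw` and p <= alpha against both backgrounds."""
    return [r.motif for r in results
            if r.raw_score > min_raw
            and r.p_vs_background1 <= alpha
            and r.p_vs_background2 <= alpha]
```

Requiring significance against both independent backgrounds guards against enrichment that reflects a quirk of one RNA-seq library.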
In addition, we assembled a library of 1,179 Known PWMs by combining the RNA-binding motifs in the compendium of RBP recognition motifs (Ray et al., 2013) with the DNA-binding motifs in the JASPAR database (Sandelin et al., 2004) and those in recently reported sets of PWMs for mouse transcription factors (Badis et al., 2009; Wei et al., 2010; Xie et al., 2010). Applying the same parameters used for detecting FBP enrichment, we used CLOVER to identify enrichment of known motifs within the CBX7 CLIP regions. For each enriched FBP and Known motif, we counted the binding-site hits identified within each library and divided this number by the total number of CLIP regions in that library to obtain a “prevalence score”. We combined the CLOVER output for the three CLIP libraries into one database and sorted FBPs and Known motifs by four scoring criteria: (1) the number of libraries in which a motif was significantly enriched (p≤0.05); (2) the average significance score (p-value); (3) the average prevalence score; and (4) the average raw score. We discarded all FBPs and Known motifs that showed inverse significance relative to the background datasets (p≥0.95) in at least one dataset, as well as all motifs whose average prevalence score was below 5%. Based on this sorting procedure, all qualified motifs were ranked (with motif #1 representing the motif with the best scores). Altogether, 11 FBPs (of the 24 mouse FBPs originally introduced) and 80 Known motifs met our criteria and were significantly enriched in at least one of the mouse CBX7 CLIP datasets (for known motifs, a literature survey of their previously suggested roles in RNA metabolism and function was additionally used as part of the filtration procedure). Among the 11 qualified mouse FBPs, 8 were significantly enriched in all 3 CLIP libraries, while 3 were enriched in 2 libraries.
Among the 80 Known motifs, 29 (36%), 29 (36%), and 22 (27%) were enriched in three, two, and one CLIP library, respectively. Interestingly, 53 of these enriched Known motifs were RNA-binding motifs previously reported in the compendium of RBP recognition motifs (Ray et al., 2013).
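The prevalence score, the exclusion rules, and the four-criterion ranking described above can be sketched as follows. The text lists the criteria but not how they are combined; this sketch assumes a lexicographic sort (number of significant libraries first, then lower mean p-value, higher mean prevalence, and higher mean raw score) and uses hypothetical names. The literature-survey filter applied to Known motifs is not modeled.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class LibraryResult:
    hits: int          # binding-site hits of the motif in this CLIP library
    n_regions: int     # total CLIP regions in this library
    p_value: float     # CLOVER enrichment p-value
    raw_score: float   # CLOVER raw score


def rank_motifs(results: Dict[str, List[LibraryResult]], alpha: float = 0.05,
                inverse_p: float = 0.95, min_prevalence: float = 0.05
                ) -> List[Tuple[str, int, float, float, float]]:
    """Return (motif, n_significant, mean_p, mean_prevalence, mean_raw), best first."""
    ranked = []
    for motif, libraries in results.items():
        if any(r.p_value >= inverse_p for r in libraries):
            continue  # depleted relative to background in at least one library
        prevalence = mean(r.hits / r.n_regions for r in libraries)
        if prevalence < min_prevalence:
            continue
        n_significant = sum(r.p_value <= alpha for r in libraries)
        if n_significant == 0:
            continue  # not significantly enriched in any library
        ranked.append((motif, n_significant, mean(r.p_value for r in libraries),
                       prevalence, mean(r.raw_score for r in libraries)))
    # More libraries, lower mean p, higher prevalence, higher raw score rank first.
    ranked.sort(key=lambda t: (-t[1], t[2], -t[3], -t[4]))
    return ranked
```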
To determine whether specific FBPs could be grouped into higher-order motif families, which we dubbed “FAMs” (FBP Association Modules), we performed STAMP analysis on the 11 qualified mouse FBPs; the resulting phylogenetic tree identified four higher-order FAMs, which we named: FAM1 (composed of FBP2_Indiv., FBP5_TwoOL, FBP7_OneOL, FBP2_TwoOL); FAM2 (composed of FBP4_TwoOL, FBP3_TwoOL); FAM3 (composed of FBP5_OneOL, FBP9_Indiv.); FAM4 (composed of FBP7_TwoOL, FBP10_Indiv., FBP6_OneOL).
Two separate CLIP experiments yielded two human CBX7 CLIP-seq datasets containing the following numbers of PeakRanger-enriched CLIP regions aligned to the positive and negative strands, respectively (after excluding rDNA and mtDNA sequences): #1: (2,552, 2,125), #2: (399, 490). As described above, peak summits were used as anchoring positions for extending 100 bp strand-specific regions around them (±50 bp). Applying the same computational tools and similar analytic steps as described above for mouse CBX7 CLIP, we identified de novo binding motifs of human CBX7. Altogether, this analysis yielded 122 de novo binding motifs, which were subsequently subjected to STAMP similarity-clustering analysis (see above), generating 27 human FBPs. We used CLOVER to determine the statistical enrichment of each of the 27 human FBPs, in addition to the 1,179 Known PWMs (see above), relative to two background sets constructed from the transcriptome of human HEK-293 cells. After filtering out motifs that were not significantly enriched (p>0.05) or had a prevalence below 5%, we obtained 9 human FBPs and 50 Known motifs that met our criteria. Next, we used STAMP analysis (see above) to identify motif similarities between the 9 enriched human FBPs and the 11 enriched mouse FBPs. In parallel, we carried out STAMP matching analysis between the 9 enriched human FBPs and the 50 Known binding motifs (Ray et al., 2013).
To define the global distribution of each of the four FAMs, we extracted the genomic coordinates of all 11 qualified FBPs, grouped by FAM, from the CLOVER output files of each of the three CLIP libraries. We then performed CEAS analysis (Ji et al., 2006; Shin et al., 2009) using the mm9 KnownGenes database; as the background dataset, we used a merged transcriptome coverage track obtained from two RNA-seq experiments conducted in the same cell line used for the CLIP experiments.
FAM-Occupancy in CLIP Regions Compared to Their Corresponding Full-Span Genomic Features
To determine the potential contribution of each of the four FAMs to the binding of transcripts by CBX7, we first pooled the CLOVER output data of all three CLIP libraries and grouped them by FAM. Using R packages (“GenomicFeatures”, “BSgenome.Mmusculus.UCSC.mm9”) and the NCBI37/mm9 knownGenes genome assembly, we extracted the coordinates of genomic features (3′UTR, 5′UTR, coding sequences (CDS), and introns). We then annotated all FBPs (FAMs) overlapping mouse genes to their corresponding genomic features. Next, we calculated a “FAM occupancy score” for each CLIP transcript. To this end, for each gene and each genomic feature, we aggregated all FAM hits detected within each transcript obtained by CLIP. When a genomic feature consisted of multiple frames within a single gene (as for introns), FBP hits were aggregated from all frames retrieved by CLIP. Then, by dividing the total number of FAM hits identified in a given CLIP fragment by the length of the genomic feature in which the CLIP FAM resides, we generated a “CLIP-associated FAM-occupancy score” for each gene. Next, we calculated a “full-length genomic feature-associated FAM-occupancy score”. To this end, for each transcript retrieved by CLIP, we mapped all putative FAM hits across the entire span of the genomic feature. Thus, for a given genomic feature, we mapped all real FAM hits (overlapping regions obtained by CLIP) in addition to predicted FAM hits (outside regions obtained by CLIP) within the full span of the genomic feature. Finally, we calculated a “FAM-occupancy ratio” for each CLIP transcript by dividing the CLIP-associated FAM-occupancy score by the genomic feature-associated FAM-occupancy score.
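The two occupancy scores and their ratio can be sketched as follows, assuming each gene/feature pair is described by the frames of that feature together with their lengths, real (CLIP) hits, and predicted hits, and that a multi-frame feature's length is the sum of its frame lengths. If, as the text suggests, both scores are divided by the same feature length, the ratio reduces to the fraction of all FAM hits in the feature that fall within CLIP regions. Names are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class FeatureFrame:
    length: int           # nt in this frame of the genomic feature (e.g. one intron)
    clip_hits: int        # "real" FAM hits inside CLIP-retrieved regions of this frame
    predicted_hits: int   # FAM hits elsewhere in this frame (outside CLIP regions)


def occupancy_scores(frames: Sequence[FeatureFrame]) -> Tuple[float, float]:
    """Return (CLIP-associated score, full-length feature score) for one gene/feature."""
    total_length = sum(f.length for f in frames)
    clip_hits = sum(f.clip_hits for f in frames)
    all_hits = clip_hits + sum(f.predicted_hits for f in frames)
    return clip_hits / total_length, all_hits / total_length


def fam_occupancy_ratio(frames: Sequence[FeatureFrame]) -> float:
    """CLIP-associated score divided by full-length feature score."""
    clip_score, full_score = occupancy_scores(frames)
    if full_score == 0:
        raise ValueError("no FAM hits in this feature")
    return clip_score / full_score
```

For example, an intron-type feature with frames of 1,000 nt (2 real, 3 predicted hits) and 500 nt (1 real, 2 predicted hits) has 3 of 8 hits inside CLIP regions, giving a ratio of 0.375.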
We summarized the results of this analysis as a series of boxplots describing the distribution of the FAM-occupancy ratio within the four FAM groups across each of the tested genomic features. The analysis indicated that FAM2, when present within CLIP transcripts, generally confers a higher potency for transcripts to bind CBX7 than all other FAMs. This potency of FAM2 was observed across all tested genomic features.
Analysis of FAMs that Reside within the Same CLIP Fragment
We noticed that some CLIP transcripts harbor more than one FAM per fragment (as indicated by a count histogram of the number of FAMs residing adjacently on the same dCLIP fiber;
In a separate analysis, we counted, for each of the four FAMs, the number of occurrences within each genomic feature or outside any genomic feature (“No Feature”). We plotted these counts as a barplot, grouped by FAM type and genomic feature.
To assess the contribution of FAM co-clustering on the same dCLIP transcript, we split CLIP fragments into two batches: Single FAMs per CLIP fiber (FAMs with zero adjacent FAMs on the same CLIP fiber) and Multiple FAMs per CLIP fiber (FAMs with one or more adjacent FAMs on the same CLIP fiber). We then separately analyzed the FAM-likelihood ratios of each batch, for each of the four FAMs, within each genomic feature.
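Splitting FAM hits into the single and multiple batches amounts to counting hits per CLIP fragment. The tuple layout (fragment id, FAM name, motif centre) and the function name below are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (CLIP fragment id, FAM name, motif centre position)
FamHit = Tuple[str, str, int]


def split_by_cooccupancy(hits: List[FamHit]) -> Tuple[List[FamHit], List[FamHit]]:
    """Return (single, multiple): hits alone on their fragment vs. hits sharing it."""
    by_fragment: Dict[str, List[FamHit]] = defaultdict(list)
    for hit in hits:
        by_fragment[hit[0]].append(hit)
    single: List[FamHit] = []
    multiple: List[FamHit] = []
    for fragment_hits in by_fragment.values():
        (single if len(fragment_hits) == 1 else multiple).extend(fragment_hits)
    return single, multiple
```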
Metagene Analysis of FAM Pairs
To determine whether FAMs preferentially reside next to each other as pairs, we plotted, for each of the four FAMs, the distribution of distances from its center to the center of its paired FAM. We conducted this analysis at single-bp resolution across a window of ±200 bp (X-axis), presenting the count for each FAM pair on the Y-axis.
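The pair-distance metagene can be computed by collecting, for every ordered pair of FAM hits on the same fragment, the signed centre-to-centre distance within ±200 bp. This is a sketch; the input layout is a hypothetical (fragment id, FAM name, centre) tuple, and counting each pair once from each anchor's point of view is an assumption.

```python
from collections import Counter, defaultdict
from itertools import permutations
from typing import Dict, List, Tuple

FamHit = Tuple[str, str, int]  # (CLIP fragment id, FAM name, motif centre position)


def pair_distance_counts(hits: List[FamHit], window: int = 200) -> Counter:
    """Count (anchor FAM, partner FAM, signed distance) for partners within ±window bp."""
    by_fragment: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    for fragment, fam, centre in hits:
        by_fragment[fragment].append((fam, centre))
    counts: Counter = Counter()
    for fragment_hits in by_fragment.values():
        # ordered pairs: each hit serves once as anchor for every other hit
        for (anchor, a_centre), (partner, p_centre) in permutations(fragment_hits, 2):
            distance = p_centre - a_centre
            if -window <= distance <= window:
                counts[(anchor, partner, distance)] += 1
    return counts
```

Plotting the counts for a given (anchor, partner) pair against distance gives the single-bp histogram described above.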
Metagene Analysis of FAM Sites for Profiling icSHAPE Signals
To determine whether CBX7 CLIP transcripts may adopt specific secondary RNA structures, we took advantage of the publicly available RNA structural signatures established via in vivo and in vitro click selective 2′-hydroxyl acylation and profiling experiments (icSHAPE). The GRCm38/mm10 bigWig data files corresponding to in vivo and in vitro icSHAPE structural profiles were obtained from the GEO database (record GSE60034) (Spitale et al., 2015) and converted to the NCBI37/mm9 assembly using the UCSC tools bigWigToBedGraph followed by liftOver. Using in-house code, we calculated separate metagene structure profiles around an anchor position defined as the center of each of the four FAM binding motifs. Individual structural profiles of in vivo and in vitro icSHAPE scores were generated at single-nucleotide resolution by accumulating all icSHAPE scores detected within 25 nucleotides upstream and downstream of the anchor. The average icSHAPE profile was then generated by dividing the accumulated icSHAPE score at each nucleotide by the total number of ±25 bp FAM regions with a total icSHAPE score greater than zero (>0). Thus, 50 bp regions with no icSHAPE signal around the center of the FAM motif were excluded from this analysis. To contrast the profiles of FAM motifs identified in CLIP regions (“Real FAMs”) with a control cohort of FAM motifs not identified in CLIP regions (“Predicted FAMs”), we took advantage of the previously established motif binding-site database containing both real and predicted motifs. As described above, for each enriched CLIP region found by our analysis to harbor an FBP binding site, we scanned for predicted FBPs throughout the entire span of the genomic feature in which the real CLIP FBP resided.
Thus, using the database of predicted FAM binding sites, we matched the ±25 bp “Real FAM” regions with an equivalent number of “Predicted FAM” regions (±25 bp) that harbored icSHAPE signal within the 50 bp detection window. Using these analytic criteria, we contrasted the icSHAPE profile of the “Real FBPs” cohort with that of an equal-size “Predicted FBPs” cohort (
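The metagene averaging and the size-matched control can be sketched as follows. Assumptions: per-nucleotide icSHAPE scores for one strand are held in a dictionary keyed by mm9 position, positions without a score count as 0, the 50-nt window spans centre−25 to centre+24, and the control is a random draw of signal-bearing predicted windows. Profiles for in vivo and in vitro data would be computed separately with the same functions; all names are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple

HALF_WIDTH = 25  # nt on each side of the FAM centre


def window_scores(scores: Dict[int, float], center: int, half_width: int = HALF_WIDTH) -> List[float]:
    """icSHAPE scores at center-half_width .. center+half_width-1; missing positions count as 0."""
    return [scores.get(center + offset, 0.0) for offset in range(-half_width, half_width)]


def has_signal(window: Sequence[float]) -> bool:
    """A window contributes only if its total icSHAPE score is above zero."""
    return sum(window) > 0


def metagene_profile(windows: Sequence[Sequence[float]]) -> Tuple[List[float], int]:
    """Per-position sum over windows with signal, divided by the number of such windows."""
    used = [w for w in windows if has_signal(w)]
    if not used:
        raise ValueError("no window carries icSHAPE signal")
    length = len(used[0])
    totals = [sum(w[i] for w in used) for i in range(length)]
    return [t / len(used) for t in totals], len(used)


def matched_predicted(predicted: Sequence[Sequence[float]], n: int, seed: int = 0) -> List[Sequence[float]]:
    """Randomly draw n signal-bearing predicted-FAM windows (size-matched control)."""
    candidates = [w for w in predicted if has_signal(w)]
    return random.Random(seed).sample(candidates, n)
```

In use, `n` would be the number of Real FAM windows returned by `metagene_profile`, so both cohorts contribute the same number of windows to their averages.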
To further determine the contribution of FAM co-clustering within the same dCLIP transcript, we split CLIP fragments into two batches: Single FAMs per CLIP fiber (FAMs with zero adjacent FAMs on the same CLIP fiber) and Multiple FAMs per CLIP fiber (FAMs with one or more adjacent FAMs on the same CLIP fiber), and performed the distribution analysis of icSHAPE reactivity for each batch, as depicted in
The Denaturing CLIP (dCLIP) Methodology
Our original goal was to identify RNA interactomes for both canonical and non-canonical PRC1. We therefore initially used both CBX7 (canonical) and RYBP (non-canonical) as bait, using conventional CLIP methodologies with CBX7-specific or RYBP-specific antibodies for the pulldown. However, all initial attempts failed due to high background, as evidenced by multiple bands spanning the length of the SDS-PAGE gel (transferred to a CLIP membrane;
We introduced bio-tagged CBX7 and RYBP into ES cells stably expressing BirA biotinylase and performed “denaturing CLIP” or “dCLIP” with these features (
Because dCLIP permits highly stringent denaturing conditions, we asked whether it was possible to skip the SDS-PAGE and membrane transfer steps entirely, as these steps partly served to eliminate RNA-protein interactions sensitive to denaturing SDS conditions as well. Furthermore, in principle, omitting these steps could improve recovery of the extremely limited quantities of RNA typically associated with epigenetic complexes. To test this possibility, we eluted RNAs directly from streptavidin beads by proteinase K treatment. However, we found that purification by SDS-PAGE and membrane transfer was absolutely necessary in the d0 ES cell samples, as direct elution from beads resulted in high background for some cellular samples (
Elution of RYBP-interacting RNAs also produced a heterogeneous population, but lower levels of RNA were eluted overall (
Peak calling using PeakRanger software (Feng et al., 2011) revealed 8,000-10,000 statistically significant peaks in each of three biological replicates (
We compared the dCLIP tags to the expression levels of the respective RefSeq transcripts (input RNA-seq). Among transcripts without reproducible dCLIP binding, we identified a cohort of 2,078 transcripts with expression levels similar to those of the 1,333 transcripts with reproducible CLIP tags (green and black dots;
We next examined how CBX7-binding sites in RNA (dCLIP-seq) relate to CBX7's chromatin binding sites (ChIP-seq). Previous work demonstrated that CBX7 tends to bind a large number of genomic loci in mouse ES cells (Morey et al., 2012). Therefore, CBX7-RNA interactions identified by the dCLIP method might theoretically arise from non-specific cross-linking between chromatin-bound CBX7 and RNAs transcribed in the vicinity. To rule out this possibility, we performed CBX7 ChIP-seq using the same ES cells (
This percentage was significantly lower than that for bulk expressed transcripts in the ES cells (
With a median footprint of 171 nt, the short and reproducible binding sites for CBX7 raised the possibility of defining consensus motifs for CBX7-containing PRC1 complexes. To deduce consensus motifs in the RNA, we performed comparative sequence analysis of CBX7-binding peaks from three dCLIP biological replicates (
If the deduced motifs represented a true CBX7 RNA-binding consensus, we should expect to see enrichment of the FAM motifs in the 3′UTR. Indeed, consistent with CEAS analysis of the dCLIP peaks (
Another consideration is that CBX7 could have multiple contact points within one transcript, potentially contacting different faces of the RNA via different motifs. To test this possibility, we asked whether the motifs tend to congregate on the same CLIP fragment. Analysis of all pairwise combinations of the FAMs revealed that they co-clustered, creating motif pairs separated by ≤50 nt (
Given that both 5′ and 3′ UTRs are typically bound by a large number of proteins (Glisovic et al., 2008), we asked how the CBX7 motifs might be related to the binding motifs of known RNA-binding proteins. A similarity-matching analysis of the 4 FAMs against a panel of >1,000 known binding motifs uncovered significant overlap (
We also asked whether the binding sites possess structural features, taking advantage of structural profiles established in mouse ES cells via click selective 2′-hydroxyl acylation and profiling experiments (icSHAPE) (Spitale et al., 2015). icSHAPE-seq allows probing of RNA secondary structure both in vivo and in vitro and preferentially reports single-stranded or flexible RNA regions. icSHAPE-seq also offers advantages over DMS-seq and CIRS-seq (Incarnato et al., 2014; Rouskin et al., 2014), as it reacts with all four nucleotides, thereby enabling the capture of RNA secondary structures at a transcriptome-wide level at higher resolution (Spitale et al., 2015). The icSHAPE profiles of the four FAMs were markedly different from one another (
We then repeated the analysis for dCLIP fibers with clustered FAM motifs (
Next we turned to experimental systems to validate and understand the nature of the CBX7-3′UTR interactions. First, we sought to confirm select interactions using a different method of in vivo RNA pulldown and using antibodies to a different epitope of the tagged protein (as opposed to using the biotin tag to pull down CBX7). Native RIP with qPCR confirmed the enrichment of Dusp9, Calm2, and Tug1 RNAs in multiple independent biological replicates (
Second, to confirm direct interactions between CBX7 and various 3′UTRs, we performed RNA electrophoretic mobility shift assays (EMSA) using CBX7 protein purified from baculovirus and purified in vitro transcribed RNAs corresponding to dCLIP peaks. We tested three representative transcripts, Calm2, Dusp9, and Dcaf12l1 (
We next tested the relevance of the bioinformatically predicted FAM motifs. We turned to footprints with single FAMs in order to simplify the analysis. For the FAM3 motif in the 3′UTR of Nucks1 mRNA, CBX7 shifted the RNA fragment and the shift was reduced by Nucks1 cold competitors (
We explored potential functions of the CBX7-3′UTR interactions. Given that PRC1 is generally involved in gene repression (Simon and Kingston, 2013), we asked whether the RNA-binding activity of CBX7 may be involved in recruiting PRC1 to silence genes. To test this idea, we attempted to block the CBX7-3′UTR interactions by designing antisense oligonucleotides (ASOs) comprising interspersed DNA and locked nucleic acid (LNA) bases to create “LNA mixmers” that are not subject to RNase H-mediated target degradation and can therefore stably associate with target sequences (Sarma et al., 2010). For each transcript, we designed a pool of LNA mixmers targeting the corresponding 3′UTR peaks (
The repressive activity of PRC1 has been linked to both the H2AK119 ubiquitylation function and to chromatin compaction (Simon and Kingston, 2013). To understand how LNA treatment enhanced gene activity, we performed ChIP-qPCR to ask whether there were locus-specific changes to CBX7 recruitment and H2AK119Ub. Interestingly, we observed no changes in CBX7 recruitment and H2AK119 ubiquitylation at either Calm2 or Dcaf12l1 after treatment with corresponding gene-specific LNAs (
Next, we examined the effect of LNA oligomers on CBX7 binding to target RNAs in vitro. Intriguingly, while RNA EMSA showed that pre-incubating RNA with gene-specific LNAs resulted in an upward shift of the transcripts, as expected (blue arrows,
To determine whether the LNA-mediated gene upregulation depended on CBX7 in vivo, we introduced the LNAs into wildtype versus Cbx7−/− ES cells (Cheng et al., 2014; Zhen et al., 2016) (
The localization of CBX7 to the 3′UTR (
Next, we applied dCLIP methodology to human CBX7 protein to assess whether the human orthologue shares RNA binding potential and to determine whether consensus motifs can be independently deduced from the human RNA-protein interactions. Although hCBX7 and mouse CBX7 (mCBX7) share CD and PC boxes, hCBX7 is 58 amino acids longer than mCBX7 and is therefore epitopically different (
Finally, we examined the relationship between mCBX7/hCBX7 transcripts, as defined by dCLIP, and BMI1 transcripts, as defined by gradient RNA immunoprecipitation (GRIP) in human HeLa cells (Ray et al., 2016). The GRIP method involves formaldehyde cross-linking and gradient purification of the chromatin fraction, followed by immunoprecipitation of chromatin-bound RNAs using antibodies against the BMI1 subunit of PRC1. Despite substantial differences in methodology, there was considerable overlap, with 1,777 transcripts shared between hCBX7 and hBMI1 (Table C). This represented nearly half of the hCBX7-interacting transcripts, the 3′UTR of IRAK1 being one example (
Aranda, S., Mas, G., and Di Croce, L. (2015). Regulation of gene transcription by Polycomb proteins. Sci Adv 1, e1500737.
Badis, G., Berger, M. F., Philippakis, A. A., Talukder, S., Gehrke, A. R., Jaeger, S. A., Chan, E. T., Metzler, G., Vedenko, A., Chen, X., et al. (2009). Diversity and complexity in DNA recognition by transcription factors. Science 324, 1720-1723.
Bag, J., and Bhattacharjee, R. B. (2010). Multiple levels of post-transcriptional control of expression of the poly (A)-binding protein. RNA Biol 7, 5-12.
Bailey, T. L., Boden, M., Buske, F. A., Frith, M., Grant, C. E., Clementi, L., Ren, J., Li, W. W., and Noble, W. S. (2009). MEME SUITE: tools for motif discovery and searching. Nucleic Acids Res 37, W202-208.
Beltran, M., Yates, C. M., Skalska, L., Dawson, M., Reis, F. P., Viiri, K., Fisher, C. L., Sibley, C. R., Foster, B. M., Bartke, T., et al. (2016). The interaction of PRC2 with RNA or chromatin is mutually antagonistic. Genome Res 26, 896-907.
Bernstein, E., Duncan, E. M., Masui, O., Gil, J., Heard, E., and Allis, C. D. (2006). Mouse polycomb proteins bind differentially to methylated histone H3 and RNA and are enriched in facultative heterochromatin. Mol Cell Biol 26, 2560-2569.
Blackledge, N. P., Rose, N. R., and Klose, R. J. (2015). Targeting Polycomb systems to regulate gene expression: modifications to a complex story. Nat Rev Mol Cell Biol 16, 643-649.
Cheng, B., Ren, X., and Kerppola, T. K. (2014). KAP1 represses differentiation-inducible genes in embryonic stem cells through cooperative binding with PRC1 and derepresses pluripotency-associated genes. Mol Cell Biol 34, 2075-2091.
Feng, X., Grossman, R., and Stein, L. (2011). PeakRanger: a cloud-enabled peak caller for ChIP-seq data. BMC Bioinformatics 12, 139.
Frith, M. C., Fu, Y., Yu, L., Chen, J. F., Hansen, U., and Weng, Z. (2004). Detection of functional DNA motifs via statistical over-representation. Nucleic Acids Res 32, 1372-1381.
Giresi, P. G., Kim, J., McDaniell, R. M., Iyer, V. R., and Lieb, J. D. (2007). FAIRE (Formaldehyde-Assisted Isolation of Regulatory Elements) isolates active regulatory elements from human chromatin. Genome Res 17, 877-885.
Glisovic, T., Bachorik, J. L., Yong, J., and Dreyfuss, G. (2008). RNA-binding proteins and post-transcriptional gene regulation. FEBS Lett 582, 1977-1986.
Grau, D. J., Chapman, B. A., Garlick, J. D., Borowsky, M., Francis, N. J., and Kingston, R. E. (2011). Compaction of chromatin by diverse Polycomb group proteins requires localized regions of high charge. Genes & development 25, 2210-2221.
Hendrickson, D., Kelley, D. R., Tenen, D., Bernstein, B., and Rinn, J. L. (2016). Widespread RNA binding by chromatin-associated proteins. Genome Biol 17, 28.
Incarnato, D., Neri, F., Anselmi, F., and Oliviero, S. (2014). Genome-wide profiling of mouse RNA secondary structures reveals key features of the mammalian transcriptome. Genome Biol 15, 491.
Jeon, Y., and Lee, J. T. (2011). YY1 tethers Xist RNA to the inactive X nucleation center. Cell 146, 119-133.
Ji, X., Li, W., Song, J., Wei, L., and Liu, X. S. (2006). CEAS: cis-regulatory element annotation system. Nucleic acids research 34, W551-554.
Kaneko, S., Bonasio, R., Saldana-Meyer, R., Yoshida, T., Son, J., Nishino, K., Umezawa, A., and Reinberg, D. (2014a). Interactions between JARID2 and noncoding RNAs regulate PRC2 recruitment to chromatin. Molecular cell 53, 290-300.
Kaneko, S., Son, J., Bonasio, R., Shen, S. S., and Reinberg, D. (2014b). Nascent RNA interaction keeps PRC2 activity poised and in check. Genes & development 28, 1983-1988.
Kaneko, S., Son, J., Shen, S. S., Reinberg, D., and Bonasio, R. (2013). PRC2 binds active promoters and contacts nascent RNAs in embryonic stem cells. Nature structural & molecular biology.
Kent, W. J., Sugnet, C. W., Furey, T. S., Roskin, K. M., Pringle, T. H., Zahler, A. M., and Haussler, D. (2002). The human genome browser at UCSC. Genome Res 12, 996-1006.
Khalil, A. M., Guttman, M., Huarte, M., Garber, M., Raj, A., Rivea Morales, D., Thomas, K., Presser, A., Bernstein, B. E., van Oudenaarden, A., et al. (2009). Many human large intergenic noncoding RNAs associate with chromatin-modifying complexes and affect gene expression. Proceedings of the National Academy of Sciences of the United States of America 106, 11667-11672.
Kim, J., Cantor, A. B., Orkin, S. H., and Wang, J. (2009). Use of in vivo biotinylation to study protein-protein and protein-DNA interactions in mouse embryonic stem cells. Nat Protoc 4, 506-517.
Kung, J. T., Kesner, B., An, J. Y., Ahn, J. Y., Cifuentes-Rojas, C., Colognori, D., Jeon, Y., Szanto, A., del Rosario, B. C., Pinter, S. F., et al. (2015). Locus-specific targeting to the X chromosome revealed by the RNA interactome of CTCF. Molecular cell 57, 361-375.
Lee, J. T., and Lu, N. (1999). Targeted mutagenesis of Tsix leads to nonrandom X inactivation. Cell 99, 47-57.
Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis, G., Durbin, R., and 1000 Genome Project Data Processing Subgroup (2009). The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078-2079.
Ma, W., Noble, W. S., and Bailey, T. L. (2014). Motif-based analysis of large nucleotide data sets using MEME-ChIP. Nat Protoc 9, 1428-1450.
Magistri, M., Faghihi, M. A., St Laurent, G., 3rd, and Wahlestedt, C. (2012). Regulation of chromatin structure by long noncoding RNAs: focus on natural antisense transcripts. Trends in genetics : TIG 28, 389-396.
Mahony, S., and Benos, P. V. (2007). STAMP: a web tool for exploring DNA-binding motif similarities. Nucleic acids research 35, W253-258.
Marchese, D., de Groot, N. S., Lorenzo Gotor, N., Livi, C. M., and Tartaglia, G. G. (2016). Advances in the characterization of RNA-binding proteins. Wiley interdisciplinary reviews. RNA 7, 793-810.
Morey, L., Pascual, G., Cozzuto, L., Roma, G., Wutz, A., Benitah, S. A., and Di Croce, L. (2012). Nonoverlapping functions of the Polycomb group Cbx family of proteins in embryonic stem cells. Cell stem cell 10, 47-62.
Nicol, J. W., Helt, G. A., Blanchard, S. G., Jr., Raja, A., and Loraine, A. E. (2009). The Integrated Genome Browser: free software for distribution and exploration of genome-scale datasets. Bioinformatics 25, 2730-2731.
O'Loghlen, A., Munoz-Cabello, A. M., Gaspar-Maia, A., Wu, H. A., Banito, A., Kunowska, N., Racek, T., Pemberton, H. N., Beolchi, P., Lavial, F., et al. (2012). MicroRNA regulation of Cbx7 mediates a switch of Polycomb orthologs during ESC differentiation. Cell stem cell 10, 33-46.
Pinter, S. F., Sadreyev, R. I., Yildirim, E., Jeon, Y., Ohsumi, T. K., Borowsky, M., and Lee, J. T. (2012). Spreading of X chromosome inactivation via a hierarchy of defined Polycomb stations. Genome Res 22, 1864-1876.
Quinlan, A. R., and Hall, I. M. (2010). BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841-842.
Ramirez, F., Dundar, F., Diehl, S., Gruning, B. A., and Manke, T. (2014). deepTools: a flexible platform for exploring deep-sequencing data. Nucleic acids research 42, W187-191.
Ray, D., Kazan, H., Cook, K. B., Weirauch, M. T., Najafabadi, H. S., Li, X., Gueroussov, S., Albu, M., Zheng, H., Yang, A., et al. (2013). A compendium of RNA-binding motifs for decoding gene regulation. Nature 499, 172-177.
Ray, M. K., Wiskow, O., King, M. J., Ismail, N., Ergun, A., Wang, Y., Plys, A. J., Davis, C. P., Kathrein, K., Sadreyev, R., et al. (2016). CAT7 and cat71 long non-coding RNAs Tune Polycomb Repressive Complex 1 Function During Human and Zebrafish Development. J Biol Chem.
Rouskin, S., Zubradt, M., Washietl, S., Kellis, M., and Weissman, J. S. (2014). Genome-wide probing of RNA structure reveals active unfolding of mRNA structures in vivo. Nature 505, 701-705.
Rozowsky, J., Euskirchen, G., Auerbach, R. K., Zhang, Z. D., Gibson, T., Bjornson, R., Carriero, N., Snyder, M., and Gerstein, M. B. (2009). PeakSeq enables systematic scoring of ChIP-seq experiments relative to controls. Nat Biotechnol 27, 66-75.
Sandelin, A., Alkema, W., Engstrom, P., Wasserman, W. W., and Lenhard, B. (2004). JASPAR: an open-access database for eukaryotic transcription factor binding profiles. Nucleic acids research 32, D91-94.
Sarma, K., Levasseur, P., Aristarkhov, A., and Lee, J. T. (2010). Locked nucleic acids (LNAs) reveal sequence requirements and kinetics of Xist RNA localization to the X chromosome. Proc Natl Acad Sci USA 107, 22196-22201.
Shin, H., Liu, T., Manrai, A. K., and Liu, X. S. (2009). CEAS: cis-regulatory element annotation system. Bioinformatics 25, 2605-2606.
Sigova, A. A., Abraham, B. J., Ji, X., Molinie, B., Hannett, N. M., Guo, Y. E., Jangi, M., Giallourakis, C. C., Sharp, P. A., and Young, R. A. (2015). Transcription factor trapping by RNA in gene regulatory elements. Science 350, 978-981.
Simon, J. A., and Kingston, R. E. (2013). Occupying chromatin: Polycomb mechanisms for getting to genomic targets, stopping transcriptional traffic, and staying put. Molecular cell 49, 808-824.
Simon, J. M., Giresi, P. G., Davis, I. J., and Lieb, J. D. (2012). Using formaldehyde-assisted isolation of regulatory elements (FAIRE) to isolate active regulatory DNA. Nat Protoc 7, 256-267.
Spassov, D. S., and Jurecic, R. (2003). The PUF family of RNA-binding proteins: does evolutionarily conserved structure equal conserved function? IUBMB Life 55, 359-366.
Spitale, R. C., Flynn, R. A., Zhang, Q. C., Crisalli, P., Lee, B., Jung, J. W., Kuchelmeister, H. Y., Batista, P. J., Torre, E. A., Kool, E. T., et al. (2015). Structural imprints in vivo decode RNA regulatory mechanisms. Nature 519, 486-490.
Taliaferro, J. M., Lambert, N. J., Sudmant, P. H., Dominguez, D., Merkin, J. J., Alexis, M. S., Bazile, C., and Burge, C. B. (2016). RNA Sequence Context Effects Measured In Vitro Predict In Vivo Protein Binding and Regulation. Molecular cell 64, 294-306.
Tamura, K., Stecher, G., Peterson, D., Filipski, A., and Kumar, S. (2013). MEGA6: Molecular Evolutionary Genetics Analysis version 6.0. Mol Biol Evol 30, 2725-2729.
Tavares, L., Dimitrova, E., Oxley, D., Webster, J., Poot, R., Demmers, J., Bezstarosti, K., Taylor, S., Ura, H., Koide, H., et al. (2012). RYBP-PRC1 complexes mediate H2A ubiquitylation at polycomb target sites independently of PRC2 and H3K27me3. Cell 148, 664-678.
Trapnell, C., Roberts, A., Goff, L., Pertea, G., Kim, D., Kelley, D. R., Pimentel, H., Salzberg, S. L., Rinn, J. L., and Pachter, L. (2012). Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat Protoc 7, 562-578.
Van Nostrand, E. L., Pratt, G. A., Shishkin, A. A., Gelboin-Burkhart, C., Fang, M. Y., Sundararaman, B., Blue, S. M., Nguyen, T. B., Surka, C., Elkins, K., et al. (2016). Robust transcriptome-wide discovery of RNA-binding protein binding sites with enhanced CLIP (eCLIP). Nat Methods 13, 508-514.
Vierstra, J., Rynes, E., Sandstrom, R., Zhang, M., Canfield, T., Hansen, R. S., Stehling-Sun, S., Sabo, P. J., Byron, R., Humbert, R., et al. (2014). Mouse regulatory DNA landscapes reveal global principles of cis-regulatory evolution. Science 346, 1007-1012.
Wang, J., and Bell, L. R. (1994). The Sex-lethal amino terminus mediates cooperative interactions in RNA binding and is essential for splicing regulation. Genes & development 8, 2072-2085.
Wang, X., Goodrich, K. J., Gooding, A. R., Naeem, H., Archer, S., Paucek, R. D., Youmans, D. T., Cech, T. R., and Davidovich, C. (2017). Targeting of Polycomb Repressive Complex 2 to RNA by Short Repeats of Consecutive Guanines. Molecular cell 65, 1056-1067.e5.
Warzecha, C. C., Sato, T. K., Nabet, B., Hogenesch, J. B., and Carstens, R. P. (2009). ESRP1 and ESRP2 are epithelial cell-type-specific regulators of FGFR2 splicing. Molecular cell 33, 591-601.
Wei, G. H., Badis, G., Berger, M. F., Kivioja, T., Palin, K., Enge, M., Bonke, M., Jolma, A., Varjosalo, M., Gehrke, A. R., et al. (2010). Genome-wide analysis of ETS-family DNA-binding in vitro and in vivo. EMBO J 29, 2147-2160.
Woo, C. J., Maier, V. K., Davey, R., Brennan, J., Li, G., Brothers, J., 2nd, Schwartz, B., Gordo, S., Kasper, A., Okamoto, T. R., et al. (2017). Gene activation of SMN by selective disruption of lncRNA-mediated recruitment of PRC2 for the treatment of spinal muscular atrophy. Proc Natl Acad Sci USA 114, E1509-E1518.
Xie, Z., Hu, S., Blackshaw, S., Zhu, H., and Qian, J. (2010). hPDI: a database of experimental human protein-DNA interactions. Bioinformatics 26, 287-289.
Yap, K. L., Li, S., Munoz-Cabello, A. M., Raguz, S., Zeng, L., Mujtaba, S., Gil, J., Walsh, M. J., and Zhou, M. M. (2010). Molecular interplay of the noncoding RNA ANRIL and methylated histone H3 lysine 27 by polycomb CBX7 in transcriptional silencing of INK4a. Molecular cell 38, 662-674.
Zhang, Y., Liu, T., Meyer, C. A., Eeckhoute, J., Johnson, D. S., Bernstein, B. E., Nusbaum, C., Myers, R. M., Brown, M., Li, W., et al. (2008). Model-based analysis of ChIP-Seq (MACS). Genome Biol 9, R137.
Zhao, J., Ohsumi, T. K., Kung, J. T., Ogawa, Y., Grau, D. J., Sarma, K., Song, J. J., Kingston, R. E., Borowsky, M., and Lee, J. T. (2010). Genome-wide identification of polycomb-associated RNAs by RIP-seq. Molecular cell 40, 939-953.
Zhen, C. Y., Tatavosian, R., Huynh, T. N., Duc, H. N., Das, R., Kokotovic, M., Grimm, J. B., Lavis, L. D., Lee, J., Mejia, F. J., et al. (2016). Live-cell single-molecule tracking reveals co-recognition of H3K27me3 and DNA targets polycomb Cbx7-PRC1 to chromatin. Elife 5.
Zovoilis, A., Cifuentes-Rojas, C., Chu, H. P., Hernandez, A. J., and Lee, J. T. (2016). Destabilization of B2 RNA by EZH2 Activates the Stress Response. Cell 167, 1788-1802.e13.
It is to be understood that while the invention has been described in conjunction with the detailed description thereof, the foregoing description is intended to illustrate and not limit the scope of the invention, which is defined by the scope of the appended claims. Other aspects, advantages, and modifications are within the scope of the following claims.
This application claims the benefit of U.S. Provisional Patent Application Ser. No. 62/750,503, filed on Oct. 25, 2018. The entire contents of the foregoing are hereby incorporated by reference.
This invention was made with Government support under Grant No. GM090278 awarded by the National Institutes of Health. The Government has certain rights in the invention.
Number | Date | Country
---|---|---
62750503 | Oct 2018 | US