Provided herein are compositions and methods for identifying endogenous DNA-DNA interactions. In particular, compositions and methods are provided for performing Capture of Associated Targets on CHromatin (CATCH) assays which use efficient capture and enrichment of specific genomic loci of interest through hybridization and subsequent purification via complementary oligonucleotides, without the need for enzymatic digestion or ligation steps.
Chromatin architecture is a key regulator of many aspects of cell biology, including gene transcription, DNA repair processes, DNA replication, and long-term processes such as X-chromosome inactivation (ref. 1; incorporated by reference in its entirety). A host of transcription factors, enzymes, scaffolding proteins, and other factors ensure that local chromatin architecture is a dynamic environment that directly and indirectly regulates the complex cellular processes noted above (ref. 2; incorporated by reference in its entirety). This complex network of chromatin architecture is largely composed of the non-protein-coding genome. Estimates of the true functional percentage of our genome range from 10% to as much as 80% (refs. 3-4; incorporated by reference in their entireties), but despite this discrepancy, only a small fraction of the genome has been evolutionarily conserved, largely in the protein-coding regions. Transcriptional enhancers are disproportionately common among evolutionarily conserved non-protein-coding sequences (ref. 5; incorporated by reference in its entirety).
Transcriptional enhancer regions are hubs of transcription factor binding, and are thought to underlie a significant portion of the tissue-specific expression of many gene targets (refs. 5-7; incorporated by reference in their entireties). Whereas gene promoters are enriched for H3K4me3, these enhancer regions contain almost exclusively the mono-methylated version of H3K4 (H3K4me1). In addition it has been found that most active enhancers are characterized by the increased presence of H3K27ac. The epigenetic enhancer signature (ref 8; incorporated by reference in its entirety) has greatly contributed to the genome-wide prediction of transcriptional enhancer sites. It has become increasingly clear that the majority of transcriptional enhancers are not located within or directly adjacent to the genes they modulate, but are typically located at great linear distance. Despite the long-range linear distance (in base pairs) between two interacting loci—in many cases, hundreds or thousands of kilobases—the prevailing model is that of a physical interaction between the two sites. This requires long-distance genomic looping to occur, whereby two linearly distant genomic loci come into close proximity, and are held together by complexes of proteins and transcription factors (ref 9; incorporated by reference in its entirety). While the looping mechanism of DNA-DNA interaction has been postulated for nearly four decades (ref 10; incorporated by reference in its entirety), it has been notoriously difficult to study.
Existing assays rely on random end-ligation at very low DNA concentrations after restriction digestion, which reduces assay reproducibility, and results in significant data loss (ref. 12; incorporated by reference in its entirety). These assays are easy to corrupt, difficult to troubleshoot, and are impractical to the average research laboratory (ref. 13; incorporated by reference in its entirety). Thus, the field of chromatin interaction is lacking in tools to facilitate mechanistic understanding of this important process.
Experiments were conducted to develop a protocol (referred to herein as “Capture of Associated Targets on Chromatin” (“CATCH”)) that overcomes the current technological limitations (e.g., enzymatic digestion, ligation, etc.) that constrain the field of DNA-DNA interaction research. CATCH utilizes chemical crosslinking to capture naturally occurring nucleic acid-protein interactions, an unbiased sonication approach to shear DNA, enrichment of a genomic locus of interest through hybridization, and purification using a complementary labeled (e.g., biotinylated) oligonucleotide. Due to the crosslinking (e.g., formaldehyde crosslinking), this procedure purifies both the targeted DNA sequence and any interacting nucleic acid segments. Following de-crosslinking, the resulting DNA sample is subjected to analysis (e.g., PCR, sequencing, etc.) to identify interacting fragments.
Experiments were conducted during development of embodiments herein to demonstrate the effectiveness of CATCH by interrogating a downstream enhancer of the human SIAH2 gene, which had been previously analyzed using ChIA-PET. SIAH2 (3q25.1) is an E3 ubiquitin ligase whose up-regulation correlates with ER activation and has been linked to poor outcome in breast cancer patients (refs. 14-15; incorporated by reference in their entireties). Currently, SIAH2 gene transcriptional control is poorly understood: the only confirmed genomic loop within SIAH2 occurs between an intronic estrogen response element (ERE) and downstream ERE (ref. 16; incorporated by reference in its entirety), however, multiple ER binding sites are present within and around the gene. In order to resolve the interactions involved in SIAH2 regulation, as well as to demonstrate the looping events around SIAH2, next-generation sequencing was performed after CATCH of the SIAH2 downstream enhancer. In addition, experiments conducted during development of embodiments herein demonstrate the reproducibility of CATCH using distinct pull-downs near the SIAH2, EIF4A1, and MYC genes. These experiments also show that CATCH-seq peaks are overwhelmingly found overlapping with enhancers (H3K4me1 and H3K27ac enriched) and estrogen receptor (ER) binding sites. Finally, these experiments reveal unique subsets of physically interacting gene promoters that are shown to be transcriptionally co-expressed over thousands of data sets using the SEEK search system (ref. 17; incorporated by reference in its entirety).
In some embodiments, provided herein are methods comprising one or more (e.g., all) of the steps of: (a) fixing (e.g., crosslinking nucleic acids and/or protein within) a cell population to capture nucleic acid-protein-nucleic acid interactions; (b) sonicating the cell population to shear the nucleic acid into small fragments; (c) hybridizing nucleic acid target sequences to a labeled oligo; (d) separating the hybridized nucleic acid from unhybridized nucleic acid, thereby enriching for target sequences and associated protein-nucleic acid complexes; (e) de-crosslinking; and (f) analyzing target sequences and any associated nucleic acid sequences. In some embodiments, the cell population is formaldehyde fixed. In some embodiments, the labeled oligo is a biotinylated oligo. In some embodiments, the hybridized nucleic acid is separated from the unhybridized nucleic acid using streptavidin-linked magnetic beads. In some embodiments, analyzing target sequences and any associated nucleic acid sequences comprises performing PCR amplification. In some embodiments, analyzing target sequences and any associated nucleic acid sequences comprises next-generation sequencing. In some embodiments, the method results in the identification of DNA sequences physically associated with the target nucleic acid sequences (via proteins). In some embodiments, the method does not comprise an enzymatic digestion step. In some embodiments, the method does not comprise a ligation step. In some embodiments, the target nucleic acid and/or associated nucleic acid is DNA. In some embodiments, the target nucleic acid and/or associated nucleic acid is RNA.
In some embodiments, provided herein are compositions, systems, or kits for performing the methods described herein (e.g., labelled oligonucleotides, fixing reagents, amplification reagents, sequencing reagents, tags and corresponding capture reagents, etc.).
The terminology used herein is for the purpose of describing the particular embodiments only, and is not intended to limit the scope of the embodiments described herein. Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. However, in case of conflict, the present specification, including definitions, will control. Accordingly, in the context of the embodiments described herein, the following definitions apply.
As used herein and in the appended claims, the singular forms “a”, “an” and “the” include plural reference unless the context clearly dictates otherwise. Thus, for example, reference to “an oligonucleotide” is a reference to one or more oligonucleotides and equivalents thereof known to those skilled in the art, and so forth.
As used herein, the term “comprise” and linguistic variations thereof denote the presence of recited feature(s), element(s), method step(s), etc. without the exclusion of the presence of additional feature(s), element(s), method step(s), etc. Conversely, the term “consisting of” and linguistic variations thereof, denotes the presence of recited feature(s), element(s), method step(s), etc. and excludes any unrecited feature(s), element(s), method step(s), etc., except for ordinarily-associated impurities. The phrase “consisting essentially of” denotes the recited feature(s), element(s), method step(s), etc. and any additional feature(s), element(s), method step(s), etc. that do not materially affect the basic nature of the composition, system, or method. Many embodiments herein are described using open “comprising” language. Such embodiments encompass multiple closed “consisting of” and/or “consisting essentially of” embodiments, which may alternatively be claimed or described using such language.
CATCH is highly reproducible, does not require enzymatic digestion or ligation, exhibits base pair resolution, and is completed in less than 24 hours. In addition to capture and analysis of a single locus, CATCH finds use in assessing genome-wide relationships when paired with next-generation sequencing. CATCH finds use in analysis of a single chromosome, or, with deeper sequencing coverage, genome-wide CATCH is achieved.
CATCH utilizes a labeled capture oligonucleotide. In some embodiments, the label is a tag or handle that allows for capture of the oligonucleotide and a target hybridized thereto. In some embodiments, a capture oligonucleotide is a biotinylated capture oligonucleotide. In experiments conducted during development of embodiments herein, biotin was attached at the 5′ end of the oligonucleotide using a 15-atom triethylene glycol (TEG) spacer that eliminated steric hindrance between the biotin moiety and the target DNA-protein complexes, allowing full accessibility to the streptavidin magnetic beads. Other linkers and linker lengths find use in embodiments herein. In some embodiments, a desthiobiotin moiety is used, allowing for a gentler elution from the beads with the addition of excess biotin.
Experiments conducted during development of embodiments herein demonstrated that CATCH is capable of detecting previously unreported long-distance chromatin interactions. Earlier studies demonstrated an interaction between the intronic EREB and the ERE downstream of the SIAH2 gene (ref. 16; incorporated by reference in its entirety), and CATCH confirms this interaction. However, CATCH-seq also demonstrated the existence of a highly complex web of interactions between the downstream enhancer of SIAH2 and multiple enhancers and promoters spanning the entirety of chromosome 3; this finding also held true for loci on chromosomes 8 and 17. While the data presented herein support the concept that gene promoters are being physically linked, the biochemical data paint a slightly different picture. Traditionally, gene promoters are thought to span approximately 5 kbp upstream of a gene's TSS; however, CATCH-seq data demonstrate that the average chromatin looping interaction involving the TSS region of genes occurs between 50 and 200 bp downstream of the TSS. The identification of such interactions has not been demonstrated by existing techniques.
Experiments conducted using CATCH indicate that single enhancers regulate a host of genes, even at linear distances of multiple millions of base pairs. The data demonstrate that subsets of genes, spanning entire chromosomes, physically associate with the same enhancer, and that a highly significant portion of those genes are co-expressed within the cell. In relation to these transcriptionally-associated DNA-DNA interactions, the CTCF protein has been implicated in mediating such looping (ref 28; incorporated by reference in its entirety). Experiments were conducted during development of embodiments herein to assess to what degree CATCH peaks and CTCF binding sites were adjacent or overlapping in T47D cells. There was very little overlap found between CTCF binding sites and CATCH peaks (
In some embodiments, methods of fixing protein-protein, protein-DNA, protein-RNA, RNA-DNA, RNA-RNA, and/or DNA-DNA interactions are employed. In some embodiments, fixing reagents are added to cells to fix (e.g., crosslink) such interactions. In certain embodiments, a sample is fixed with formalin, formaldehyde, ethanol, methanol, picric acid, etc.
In some embodiments, methods herein comprise a step of shearing (fragmenting) nucleic acids (e.g., fixed nucleic acids). Nucleic acids may be sheared (fragmented) by physical (mechanical), enzymatic, or chemical means, for example, by sonication, mechanical shearing, enzymatic digestion, or chemical cleavage of DNA.
In some embodiments, tagged oligonucleotides are hybridized to target sequences. In some embodiments, the oligonucleotides are complementary (e.g., 100%, >95%, >90%, >85%, >80%, >75%, >70%, >65%, >60%, >55%) to target sequences. In some embodiments, the tag on the oligonucleotide facilitates capture of the hybridized complex by a complementary capture moiety. Suitable tags include biotin, glutathione, a hexahistidine tag, a FLAG tag, and digoxigenin, which can be captured by streptavidin, glutathione S-transferase, an anti-his antibody, an anti-FLAG antibody, and anti-digoxigenin, respectively. In some embodiments, the capture moiety is attached to a solid surface, bead (e.g., magnetic bead), etc.
Some embodiments herein comprise methods for analyzing target sequences and any associated nucleic acid sequences. Such processes may include nucleic acid amplification, hybridization, mass analysis, sequencing, etc.
In some embodiments, methods of detection/analysis comprise nucleic acid amplification, for example, by polymerase chain reaction (PCR). The PCR process is well known in the art (U.S. Pat. Nos. 4,683,195, 4,683,202, and 4,800,159). To briefly summarize PCR, nucleic acid primers, complementary to opposite strands of a target nucleic acid sequence, are permitted to anneal to the denatured sample. A DNA polymerase (typically heat stable) extends the DNA duplex from the hybridized primer. The process is repeated to amplify the nucleic acid target. If the nucleic acid primers do not hybridize to the sample, then there is no corresponding amplified PCR product. In this case, the PCR primer acts as a hybridization probe.
In PCR, the nucleic acid probe can be labeled with a tag as discussed before. Most preferably the detection of the duplex is done using at least one primer directed to the target nucleic acid. In yet another embodiment of PCR, the detection of the hybridized duplex comprises electrophoretic gel separation followed by dye-based visualization.
DNA amplification procedures by PCR are well known and are described in U.S. Pat. No. 4,683,202. Briefly, the primers anneal to the target nucleic acid at sites distinct from one another and in an opposite orientation. A primer annealed to the target sequence is extended by the enzymatic action of a heat stable DNA polymerase. The extension product is then denatured from the target sequence by heating, and the process is repeated. Successive cycling of this procedure on both DNA strands provides exponential amplification of the region flanked by the primers.
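The exponential amplification described above can be illustrated with a short calculation. The following Python sketch is illustrative only: the function name `pcr_copies` and the per-cycle `efficiency` parameter are assumptions introduced here, and the idealized model ignores the plateau that real reactions reach in late cycles.

```python
def pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized copy number of the primer-flanked region after PCR cycling.

    efficiency is the fraction of templates duplicated per cycle
    (1.0 = perfect doubling). This is a textbook model, not a description
    of any specific reaction used in the experiments herein.
    """
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    # Hypothetical starting amount of 100 template copies.
    for n_cycles in (10, 20, 30):
        print(n_cycles, pcr_copies(100, n_cycles), pcr_copies(100, n_cycles, 0.9))
```

With perfect doubling, each cycle multiplies the flanked region by two; lowering `efficiency` shows how quickly the yield diverges from that ideal over successive cycles.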
Amplification is then performed using a PCR-type technique, that is to say the PCR technique or any other related technique. Two primers, complementary to the target nucleic acid sequence are then added to the nucleic acid content along with a polymerase, and the polymerase amplifies the DNA region between the primers.
The expression “specifically hybridizing in stringent conditions” refers to a hybridizing step in the process of the invention where the oligonucleotide sequences selected as probes or primers are of adequate length and sufficiently unambiguous so as to minimize the amount of non-specific binding that may occur during the amplification. The oligonucleotide probes or primers herein described may be prepared by any suitable methods such as chemical synthesis methods.
Hybridization is typically accomplished by annealing the oligonucleotide probe or primer to the DNA under conditions of stringency that prevent non-specific binding but permit binding of this DNA which has a significant level of homology with the probe or primer.
Among the conditions of stringency is the melting temperature (Tm) for the amplification step using the set of primers, which is in the range of about 55° C. to about 70° C.
Typical hybridization and washing stringency conditions depend in part on the size (i.e., number of nucleotides in length) of the DNA or the oligonucleotide probe, the base composition and monovalent and divalent cation concentrations (Ausubel et al., 1997, eds Current Protocols in Molecular Biology).
In some embodiments, methods herein involve sequencing target and/or captured nucleic acid sequences. Nucleic acid molecules may be sequence analyzed by any number of techniques. The analysis may identify the sequence of all or a part of a nucleic acid. Illustrative non-limiting examples of nucleic acid sequencing techniques include, but are not limited to, chain terminator (Sanger) sequencing and dye terminator sequencing, as well as “next generation” sequencing techniques. In some embodiments, RNA is reverse transcribed to cDNA before sequencing. A number of DNA sequencing techniques are known in the art, including fluorescence-based sequencing methodologies (See, e.g., Birren et al., Genome Analysis: Analyzing DNA, 1, Cold Spring Harbor, N.Y.; herein incorporated by reference in its entirety). In some embodiments, automated sequencing techniques understood in the art are utilized. In some embodiments, the systems, devices, and methods employ parallel sequencing of partitioned amplicons (PCT Publication No: WO2006084132 to Kevin McKernan et al., herein incorporated by reference in its entirety). In some embodiments, DNA sequencing is achieved by parallel oligonucleotide extension (See, e.g., U.S. Pat. No. 5,750,341 to Macevicz et al., and U.S. Pat. No. 6,306,597 to Macevicz et al., both of which are herein incorporated by reference in their entireties). Additional examples of sequencing techniques include the Church polony technology (Mitra et al., 2003, Analytical Biochemistry 320, 55-65; Shendure et al., 2005 Science 309, 1728-1732; U.S. Pat. No. 6,432,360, U.S. Pat. No. 6,485,944, U.S. Pat. No. 6,511,803; herein incorporated by reference in their entireties), the 454 picotiter pyrosequencing technology (Margulies et al., 2005 Nature 437, 376-380; US 20050130173; herein incorporated by reference in their entireties), the Solexa single base addition technology (Bennett et al., 2005, Pharmacogenomics, 6, 373-382; U.S. Pat. No. 6,787,308; U.S. Pat. No. 6,833,246; herein incorporated by reference in their entireties), the Lynx massively parallel signature sequencing technology (Brenner et al. (2000). Nat. Biotechnol. 18:630-634; U.S. Pat. No. 5,695,934; U.S. Pat. No. 5,714,330; herein incorporated by reference in their entireties), the Adessi PCR colony technology (Adessi et al. (2000). Nucleic Acid Res. 28, E87; WO 00018957; herein incorporated by reference in its entirety), and suitable combinations or alternatives thereof.
A set of methods referred to as “next-generation sequencing” techniques have emerged as alternatives to Sanger and dye-terminator sequencing methods (Voelkerding et al., Clinical Chem., 55: 641-658, 2009; MacLean et al., Nature Rev. Microbiol., 7: 287-296; each herein incorporated by reference in their entirety). Next-generation sequencing (NGS) methods share the common feature of massively parallel, high-throughput strategies, with the goal of lower costs and higher speeds in comparison to older sequencing methods. NGS methods can be broadly divided into those that require template amplification and those that do not. Sequencing techniques that find use in embodiments herein include, for example, Helicos True Single Molecule Sequencing (tSMS) (Harris T. D. et al. (2008) Science 320:106-109; U.S. Pat. No. 7,169,560; Lapidus et al. (U.S. patent application number 2009/0191565); Quake et al. (U.S. Pat. No. 6,818,395); Harris (U.S. Pat. No. 7,282,337); Quake et al. (U.S. patent application number 2002/0164629); and Braslavsky et al., PNAS (USA), 100: 3960-3964 (2003); each of which is incorporated by reference in its entirety); 454 sequencing (Roche) (Margulies, M et al. 2005, Nature, 437, 376-380; incorporated by reference in its entirety); SOLiD technology (Applied Biosystems); Ion Torrent sequencing (U.S. patent application numbers 2009/0026082, 2009/0127589, 2010/0035252, 2010/0137143, 2010/0188073, 2010/0197507, 2010/0282617, 2010/0300559, 2010/0300895, 2010/0301398, and 2010/0304982; incorporated by reference in their entireties); Illumina sequencing; single molecule, real-time (SMRT) technology of Pacific Biosciences; nanopore sequencing (Soni G V and Meller A. (2007) Clin Chem 53: 1996-2001; incorporated by reference in its entirety); and use of a chemical-sensitive field effect transistor (chemFET) array to sequence DNA (for example, as described in US Patent Application Publication No. 20090026082; incorporated by reference in its entirety). Other sequencing techniques (e.g., NGS techniques) understood in the field, or alternatives or combinations of the above techniques, also find use in embodiments herein.
MCF-7 or T-47D cells (log-phase growth, approximately 1×10^6 cells per sample) were fixed for 6 minutes at room temperature with a final concentration of 1% formaldehyde (fresh single-use vials) at approximately 50% culture confluency. Crosslinking was quenched for 10 minutes at room temperature by the addition of Tris-HCl pH 8.0 to a final concentration of approximately 0.125 M. Next the cells were harvested via scraping in 1 ml of PBS into a 1.5 ml eppendorf tube, and then spun at 250×g for 8 minutes (centrifugation steps were carried out in a standard tabletop microfuge). The supernatant was aspirated and the cells were resuspended in 500 μl of cold Nuclear Isolation Buffer supplemented with a protease inhibitor cocktail (Calbiochem). The samples were dounce homogenized 20 times with a tight-fitting pestle and centrifuged again for 10 minutes at 750×g to pellet nuclei. The supernatant was aspirated and the nuclear pellets were resuspended in 100 μl of CATCH Buffer supplemented with protease inhibitor cocktail. The samples were sonicated for 2 cycles (HIGH, 30 s on/off) of 8 minutes each in a Diagenode BioRuptor® sonication device, and the cellular debris was pelleted by centrifugation at 24,000×g for 15 minutes at 4° C. Sonication efficiency was then assessed on a 1.5% (w/v) agarose gel. Genomic DNA fragments should largely fall between 100 and 500 bp. Next, the sheared chromatin sample was incubated at 58° C. for 5 minutes to unmask biotin on endogenous proteins. Then, 10 μl of pre-equilibrated (in CATCH Buffer) streptavidin magnetic beads (Thermo Scientific) were added to each sample. The samples were incubated for 1 hour at room temperature while gently rotating. Next, the magnetic beads were extracted and the supernatant was transferred to a clean PCR tube. To each sample, specific biotinylated oligonucleotide probe (Integrated DNA Technologies) was added to a final concentration of approximately 300 nM.
The probe was then hybridized by incubating the samples as follows: 25° C. for 2 minutes, 81° C. for 4 minutes (denaturation), 72° C.-42° C. decreasing gradient (12 seconds per degree), 42° C. for 30 minutes (hybridization), followed by storage of the sample at 25° C. During testing, denaturation temperatures below 75° C. or above 85° C. were detrimental to oligonucleotide annealing or long-range interaction detection, respectively; impact on interaction detection at 81° C. was undetectable. The hybridized sample was then transferred to a new 1.5 ml eppendorf tube and any unhybridized biotinylated oligo was removed with an Illustra Sephacryl (S-400HR) spin column, according to the manufacturer's instructions. The cleared product was again transferred to a new 1.5 ml eppendorf tube containing 300 μl of nuclease-free H2O. Next, 25 μl of pre-equilibrated (in CATCH Buffer) streptavidin magnetic beads were added to each sample. The samples were incubated at room temperature for 1 hour while gently rotating. The beads from each sample were then immobilized on a magnetic stand and washed 5 times in CATCH Buffer at 42° C. while shaking at 1000 RPM in a thermomixer. The beads were then resuspended in 150 μl of De-crosslinking Buffer supplemented with 5 μl of 20 mg/ml Proteinase K. The sample was incubated at 55° C. for 30 minutes while shaking at 1000 RPM on a thermomixer, followed by incubation at 65° C. overnight on the same thermomixer. Finally, the sample was incubated at 100° C. for 60 seconds to destroy any remaining biotin-streptavidin binding and elute the DNA from the magnetic beads. The supernatant was immediately transferred to a new 1.5 ml eppendorf tube. The DNA was then purified using phenol-chloroform-isoamyl alcohol, and then precipitated in 100% ethanol using glycogen (Thermo Scientific) as a carrier. The DNA was pelleted by spinning at 24,000×g for 25 minutes at room temperature, and resuspended in TE Buffer.
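The hybridization incubation described above can be written out as an explicit list of thermal steps, which makes the run time easy to check. The following Python sketch is illustrative only: the `Step` class and helper names are hypothetical, and the gradient is assumed to consist of discrete 12-second holds at each whole degree from 72° C. down to 43° C., with 42° C. covered by the 30-minute hybridization hold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    label: str
    temp_c: int
    seconds: int


def ramp_down(start_c: int, end_c: int, seconds_per_degree: int) -> list:
    """One hold per whole degree from start_c down to end_c + 1.

    Assumption: the decreasing gradient is modeled as one-degree holds;
    end_c itself is handled by the step that follows the ramp.
    """
    return [Step("ramp", t, seconds_per_degree) for t in range(start_c, end_c, -1)]


def build_program() -> list:
    """Thermal steps up to, but not including, the final 25 C storage hold."""
    return [
        Step("equilibrate", 25, 2 * 60),
        Step("denature", 81, 4 * 60),
        *ramp_down(72, 42, 12),
        Step("hybridize", 42, 30 * 60),
    ]


def total_seconds(program) -> int:
    return sum(step.seconds for step in program)


if __name__ == "__main__":
    program = build_program()
    minutes = total_seconds(program) / 60
    print(f"{len(program)} steps, {minutes:.0f} min before the 25 C hold")
```

Under these assumptions, the ramp contributes 30 holds of 12 seconds each, and the full program runs for 42 minutes before the sample is held at 25° C.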
An overview comparison of CATCH and other chromosome capture methods is available in Table 1 (See
Biotinylated oligonucleotides were ordered from Integrated DNA Technologies, using the TEG-Biotin modification on the 5′ end of the oligo. All oligos were designed with Primer3 version 4.0 to be between 23 and 25 nucleotides in length, with a Tm as close to 63° C. as possible. Testing multiple oligonucleotides, it was found that those biotinylated oligos targeted to regions approximately 150 bp from the targeted protein-binding site gave the most reliable data. The oligos were resuspended at 1 μg/μl in TE buffer and stored at −20° C. until use.
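The length and Tm criteria above can be applied as a simple screen over candidate capture oligos. The following Python sketch is illustrative only: it uses a basic GC-content approximation of Tm (64.9 + 41 × (G + C − 16.4) / N) in place of the calculation performed by Primer3, the ±3° C. tolerance is an assumed value, and the candidate sequences are hypothetical rather than probes used in the experiments herein.

```python
def gc_tm(seq: str) -> float:
    """Approximate Tm (deg C) from GC content, for oligos longer than ~13 nt.

    This is a rough rule of thumb, not the Primer3 calculation.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / len(seq)


def passes_design_rules(seq: str, min_len: int = 23, max_len: int = 25,
                        target_tm: float = 63.0, tolerance: float = 3.0) -> bool:
    """True if the oligo is 23-25 nt and its approximate Tm is near the target.

    tolerance is a hypothetical cutoff; the text only says "as close to
    63 C as possible".
    """
    if not min_len <= len(seq) <= max_len:
        return False
    return abs(gc_tm(seq) - target_tm) <= tolerance


def rank_candidates(candidates, target_tm: float = 63.0) -> list:
    """Keep candidates that pass the rules, closest to the target Tm first."""
    kept = [s for s in candidates if passes_design_rules(s, target_tm=target_tm)]
    return sorted(kept, key=lambda s: abs(gc_tm(s) - target_tm))


if __name__ == "__main__":
    # Hypothetical sequences for illustration only.
    candidates = [
        "ATGCGTACCTGAGGCTAGCTAGGA",
        "ATATATATATATATATATATATAT",
        "GGCCGGCCAGTCAGTC",
    ]
    for seq in rank_candidates(candidates):
        print(seq, round(gc_tm(seq), 1))
```

A screen of this kind only narrows the candidate list; the placement of the oligo relative to the protein-binding site, as noted above, still has to be evaluated separately.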
MCF-7 (ATCC; HTB-22) and T-47D (ATCC; HTB-133) cells were maintained in phenol red-free RPMI 1640 with L-glutamine supplemented with 10% (v/v) heat-inactivated fetal bovine serum and 100 U/ml penicillin-streptomycin. Cells were housed at 37° C. in 5% CO2 for a maximum of 12 passages after being purchased directly from the American Type Culture Collection (ATCC). The cells used in
Using the freely available CLOVER (zlab.bu.edu/clover) program (ref. 30; incorporated by reference in its entirety) according to the specified instructions, full-site estrogen response elements were identified within and around the SIAH2 gene ±100kb. The resulting potential binding sites were then cross-referenced to previously identified ER-binding sites within MCF-7 cells (ref 31; incorporated by reference in its entirety). The NCBI36/hg18 build of the human genome was used. Full-site EREs identified by CLOVER and/or positively correlated with previous data were then used in the subsequent ChIP and CATCH assays. These sites are detailed in
Cells were fixed with 1% formaldehyde for 10 minutes at room temperature. Reaction was quenched with glycine, cells were centrifuged to pellet, and resuspended in ChIP Lysis Buffer (10 mM Tris pH 8.0, 10 mM NaCl, 5 mM EDTA, 1% NP-40, 1% SDS, 0.5% Deoxycholate) supplemented with protease inhibitors. The cell slurry was incubated for 10 minutes on ice and then sonicated for 3 cycles (HIGH, 30s on/off) of 7 minutes each in a Diagenode BioRuptor® sonication device, and the cellular debris was pelleted by centrifugation at 24,000×g for 15 minutes at 4° C. Sonication efficiency was then assessed on a 1.5% (w/v) agarose gel. Genomic DNA fragments largely fell between 100 and 700 bp. Next, the sheared chromatin sample was diluted to 1 ml in ChIP Dilution Buffer (17 mM Tris pH 8.0, 33 mM NaCl, 1% SDS, 0.5% NP-40) supplemented with protease inhibitors. Here, 10% of total volume was taken as input. Then, 2 μg of anti-ERα (Santa Cruz Biotech, HC-20) or rabbit IgG antibody was added to each sample, and the samples were rotated overnight at 4° C. Next, magnetic protein-G Dynabeads (Invitrogen) were washed once in PBS supplemented with 5% BSA and resuspended in ChIP dilution buffer. Then 30 μl of the pre-washed beads were added to each sample, and the samples were rotated at 4° C. for 2 hours. The beads were then washed consecutively in ChIP Wash Buffer I (20 mM Tris pH 8.0, 150 mM NaCl, 2 mM EDTA, 1% NP-40, 1% SDS), ChIP Wash Buffer II (20 mM Tris pH 8.0, 500 mM NaCl, 2 mM EDTA, 1% NP-40, 1% SDS), ChIP Wash Buffer III (20 mM Tris pH 8.0, 250 mM LiCl, 1 mM EDTA, 1% NP-40, 1% Deoxycholate), and TE buffer. The beads were then resuspended in 100 μl freshly made ChIP Elution Buffer (200 μl of 10% SDS and 0.168 g of NaHCO3 in 2 ml of H2O). Next the samples were incubated at 65° C. for 15 minutes to elute the complex from the beads. That process was repeated and the eluates were combined. 
Finally, 8 μl of 5.0 M NaCl was added to each sample (including input samples) and they were incubated overnight at 65° C. Samples were incubated with RNase and Proteinase K prior to processing with the QIAquick® PCR purification kit (Qiagen) according to the manufacturer's instructions. PCR Primers for individual ChIP experiments are detailed in
PCR products were diluted in 6× Orange G loading buffer and run at 100V for 28 minutes on a 1.5% agarose gel with ethidium bromide (ladder was Bioline EasyLadder I). The resulting gel was imaged under ultraviolet light, and the individual bands were quantified via ImageJ using the measure function. First, the background value for each band was taken. Next, the value of the band itself was taken, and the background value was subtracted from the value of the band. Each resulting value was then normalized to the value of the targeted pull down in the experiment, such that the value of the pull down became 1.0. This ensured subtraction of background variation and random variation in pull down efficiency. The resulting values were then plotted as mean with error bars of SEM.
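The quantification steps above (background subtraction, normalization to the targeted pull-down, and summary as mean with SEM) can be sketched as follows. This Python example is illustrative only: the function names, band labels, and intensity values are hypothetical, and the intensities stand in for values exported from ImageJ.

```python
from math import sqrt
from statistics import mean, stdev


def normalize_lane(bands: dict, target: str) -> dict:
    """Background-subtract each band, then scale so the target band is 1.0.

    bands maps a band label to a (band_intensity, background_intensity) pair.
    """
    corrected = {name: raw - background for name, (raw, background) in bands.items()}
    reference = corrected[target]
    if reference <= 0:
        raise ValueError("target band must exceed its background")
    return {name: value / reference for name, value in corrected.items()}


def mean_and_sem(values) -> tuple:
    """Mean and standard error of the mean across replicate experiments."""
    values = list(values)
    if len(values) < 2:
        raise ValueError("SEM needs at least two replicates")
    return mean(values), stdev(values) / sqrt(len(values))


if __name__ == "__main__":
    # Hypothetical intensities from three replicate gels.
    replicates = [
        {"pull_down": (110.0, 10.0), "site_A": (60.0, 10.0)},
        {"pull_down": (210.0, 10.0), "site_A": (95.0, 15.0)},
        {"pull_down": (160.0, 10.0), "site_A": (100.0, 10.0)},
    ]
    site_a = [normalize_lane(rep, "pull_down")["site_A"] for rep in replicates]
    print(mean_and_sem(site_a))
```

Because every lane is scaled to its own pull-down band, differences in overall pull-down efficiency between experiments do not carry through to the plotted values.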
10 μl of DNA from the CATCH final elution was immediately (without freezing) put through a second strand synthesis protocol using NEBNext Module #E6111S according to the manufacturer's instructions. Nucleic acid binding beads were AMPure XP #A63881, purchased from Agencourt. Next, the DNA template was made into a sequencing library using the KAPA Biosystems library kit #KK8232 following the manufacturer's instructions. The KAPA kit was critical because it produces a library with fewer “bead swap” steps, which retains a better DNA template yield and thus allows a library to be made from less starting material. Sequencing depth for each library varied from ~15 to ~24 million reads: GRB7 replicates 1 and 2 had 16.0 and 22.9 million reads, respectively, MYC had 16.1 million reads, EIF4A1 replicates 1 and 2 had 18.6 and 24.2 million reads, respectively, SIAH2 vehicle-treated had 16.2 million reads, and SIAH2 estradiol-treated had 15.1 million reads.
The SEEK algorithm (seek.princeton.edu) is a web application stemming from research done at Princeton University (ref. 17; incorporated by reference in its entirety). SEEK takes a set of genes as input and returns a list of genes ranked by co-expression with that set. This co-expression rank is a comprehensive analysis based on over 5000 independent microarray and RNA sequencing data sets. In T-47D cells, to create each SEEK list, three “seed” input genes were selected according to the following rules: (a) the gene must have a CATCH-seq peak within 2 kb of its TSS and the peak must be above background, (b) the peak near the TSS of the gene must be one of the top 500 (in height) such peaks on the chromosome, and (c) the gene must not be the primary target of the enhancer, according to our study (e.g. SIAH2 was not used as a “seed” gene, despite being identified in the associated CATCH experiment wherein the downstream enhancer of SIAH2 was captured). Using those guidelines, the three gene promoters nearest the CATCH capture site were chosen as input to determine the list of co-expressed genes via SEEK (
Data analysis was performed using R version 3.2.2 within RStudio. ERα ChIP-seq BED file data from MCF-7 cells (
CATCH-seq Data Sets: Each CATCH-seq experiment was done alongside an unfixed control experiment using an identical capture oligo. The raw data underwent FASTQ Groomer processing before being aligned to the hg19 build of the chromosome of interest using Bowtie 2. First, the unfixed control pull-down (a CATCH experiment without any fixation), performed with the same biotinylated oligonucleotide, was used to normalize the experimental pull-down data: aligned unfixed control reads (.bam) were ‘subtracted’ directly from the experimental reads using the bamCompare function in Galaxy deepTools2 to remove background signal. The data were then subsetted in R by individual chromosome (e.g. chr3 for SIAH2, chr8 for MYC, etc.) and then by signal threshold (the signal threshold was chosen according to the number of CATCH interactions to be discovered). That threshold set the signal strength at which CATCH peaks were defined as peaks; with the threshold determined, the CATCH peaks that satisfied it in the BIGWIG file of the respective pull-down were called. Next, genes that had peaks within 2 kb of their promoters were annotated and considered as interactions with the pull-down locus. The 500 genes with the highest peaks were then assessed to determine the three CATCH-identified genes closest to the pull-down. These three genes were used as the “seed” for creating the SEEK list, which is described above. In the case of
Enhancers: To determine the location of Enhancers, ChIP-seq data for both H3K4me1 and H3K27ac were used. Peaks were called using the MACS (version 1.4.1; p-value cutoff 1e-05; MFOLD range 32, 128; fixed background lambda) function of Galaxy, and locations where H3K4me1 and H3K27ac peaks overlapped by at least 1 bp were identified. The combined span of the two overlapping peaks was merged into a single peak, denoted an Enhancer, and written to a BED file for further use.
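The Enhancer definition above (an H3K4me1 peak and an H3K27ac peak that overlap by at least 1 bp, merged into their combined span) can be illustrated with a short sketch. The function names and the tuple layout are hypothetical. Intervals are assumed to be half-open BED-style coordinates, and the peak lists are assumed to have been produced by MACS beforehand. The pairwise comparison is chosen for clarity, not speed.

```python
def call_enhancers(h3k4me1_peaks, h3k27ac_peaks, min_overlap=1):
    """Merge every overlapping H3K4me1/H3K27ac peak pair into one interval.

    Each peak is a (chrom, start, end) tuple with a half-open interval.
    Returns a sorted list of unique merged intervals.
    """
    enhancers = set()
    for chrom_a, start_a, end_a in h3k4me1_peaks:
        for chrom_b, start_b, end_b in h3k27ac_peaks:
            if chrom_a != chrom_b:
                continue
            # Number of shared base pairs; zero or negative means no overlap.
            shared = min(end_a, end_b) - max(start_a, start_b)
            if shared >= min_overlap:
                enhancers.add(
                    (chrom_a, min(start_a, start_b), max(end_a, end_b))
                )
    return sorted(enhancers)


def to_bed_lines(intervals):
    """Format intervals as tab-separated BED lines."""
    return [f"{chrom}\t{start}\t{end}" for chrom, start, end in intervals]
```

Peaks that merely abut (the end of one equal to the start of the other) share no base pairs under half-open coordinates and are therefore not merged.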
ER-binding: locations of ER binding were determined as above, instead using ERα ChIP-seq data and without combining any other data sets.
CTCF-binding: locations of CTCF binding were from CTCF-binding data from T47D cells (GEO accession: GSM803348).
CATCH-seq gene identification: to ensure that only the most significant CATCH-seq peaks were used for analysis, gene promoters with peaks within 2 kb of their TSS were identified. If multiple peaks occurred near a TSS, only the most robust peak was considered. Then, the 500 genes with the strongest CATCH-seq signal peaks were identified for use in subsequent analyses. CATCH-Enhancer adjacency/overlap: CATCH-seq signal strength (peak height) ranging from 100 to 500 was analyzed. CATCH peaks were determined by filtering based on the strength of the signal in BIGWIG files of the respective pull-down. Any CATCH-seq peak within 2 kb of an Enhancer region (as defined above) was considered adjacent or overlapping and was counted as an overlap in these analyses. TSS Density Plot: the plot(density(x)) function in R was used to plot CATCH-seq signal strength at locations within 2 kb up- and down-stream of every TSS on a chromosome (chr3 for SIAH2, chr17 for EIF4A1, and chr8 for MYC). That signal density plot was used to determine the average location of signal “peaks” near gene TSSs.
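The gene identification step above can be sketched as follows, assuming that peak positions and heights for a single chromosome have already been extracted from the BIGWIG file. The function name, the argument layout, and the optional minimum-height filter are hypothetical and serve only to illustrate the selection logic (strongest peak within 2 kb of each TSS, then the top-ranked genes).

```python
def top_promoter_genes(genes, peaks, window=2000, top_n=500, min_height=0.0):
    """Rank genes by their strongest CATCH-seq peak near the TSS.

    genes: list of (gene_name, tss_position) on one chromosome.
    peaks: list of (position, height) on the same chromosome.
    Returns up to top_n (gene_name, height) pairs, strongest first.
    """
    best = {}
    for name, tss in genes:
        # Keep only peaks inside the window and above the signal threshold.
        heights = [
            height
            for position, height in peaks
            if abs(position - tss) <= window and height >= min_height
        ]
        if heights:
            # If several peaks lie near a TSS, only the strongest is used.
            best[name] = max(heights)
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]
```

Genes with no qualifying peak near their TSS are left out of the ranking entirely, so the returned list can be shorter than `top_n`.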
A flowchart representing the CATCH methodology and process is shown in
To demonstrate CATCH compatibility with next generation sequencing, the established enhancer region downstream of the SIAH2 gene was targeted. CATCH followed by next generation sequencing (CATCH-seq) demonstrated that the oligo-targeted downstream ERE (pull down region) was highly enriched when compared to any other genomic site (
SIAH2 is an estradiol-responsive ER target gene with multiple putative EREs located within and adjacent to the gene. In a study focused on identifying functional ER binding sites, the authors predicted that the intronic region of SIAH2 contributed to the transcriptional regulation of the gene (ref. 18; incorporated by reference in its entirety). However, the promoter region of SIAH2 does not contain a recognizable ERα binding site, and chromatin immunoprecipitation (ChIP) experiments confirmed that ERα binding at the promoter was nearly undetectable and was not responsive to E2 (
In concordance with these data, ERα ChIA-PET analysis of the SIAH2 gene demonstrated interaction between a portion of the intron and an enhancer region directly downstream of the gene (ref. 16; incorporated by reference in its entirety). Additionally, the interaction between the downstream enhancer of SIAH2 and multiple other long-distance genomic loci (visualized through Washington University in St. Louis's WashU Epigenome browser; epigenomegateway.wustl.edu;
In order to demonstrate that CATCH-seq recapitulates data obtained with previously validated techniques, experiments were conducted to identify ERα-mediated genomic looping interactions attributed to the downstream enhancer of SIAH2. Each of the four long-distance interactions tested was positively demonstrated by CATCH-seq. Interaction with the intronic ERE of SIAH2 (distance: 17 kb) was demonstrated here in MCF-7 cells (
Another canonical ERE/Promoter interaction on the TFF1 gene was confirmed (
In order to demonstrate the specificity of CATCH at the level of sequencing, multiple biological replicates were sequenced separately and compared to a ‘random’ locus capture on the same chromosome (chromosome 17). It has been suggested that the promoter region of the human EIF4A1 gene is involved in multiple chromatin interactions with neighboring loci (ref. 19; incorporated by reference in its entirety). In contrast, while a number of interactions occur adjacent to the GRB7 promoter, it was used as a control pull-down because none of the interactions identified near GRB7 appeared to directly involve the promoter. While the direct capture of both regions of interest was successful (
To determine if these observations held true on a different chromosome, an enhancer downstream of the MYC gene was interrogated in similar fashion (
Estradiol is a powerful genome-wide transcriptional inducer via estrogen receptor activation. SIAH2 transcription is upregulated upon ER activation.
Despite SIAH2 transcription being activated by estradiol treatment, both vehicle and estradiol treatments showed DNA-DNA interaction between the downstream enhancer (pull-down region) and the SIAH2 intron and promoter (
CATCH-Seq Predicts Correlation with Gene Expression Using the SEEK Algorithm
In order to test whether a single enhancer is capable of interacting with, and altering the transcription of, multiple gene targets on the same chromosome, the SEEK algorithm (search-based exploration of expression compendia; seek.princeton.edu) was employed. SEEK determines gene expression correlation by weighting available gene expression datasets based on input genes of interest (ref. 17; incorporated by reference in its entirety); using this weighted correlation aggregation method, it calculates relative gene co-expression amongst those datasets. If CATCH-seq identifies DNA-DNA interactions at gene promoters that lead to altered transcriptional expression of those genes, CATCH-seq data should predict co-expression of gene cohorts significantly better than random. If the list of gene promoters identified via CATCH-seq is significantly enriched (over random) for genes that are also transcriptionally co-expressed, it would indicate that the long-distance genomic interactions of a single enhancer are capable of influencing gene expression patterns, not just the transcriptional output of a single gene. The top 500 gene promoters for each CATCH-seq experiment were identified. Next, SEEK lists were created using a “seed list”, a three-gene subset specific to each CATCH-seq experiment; each SEEK list was then sorted by co-expression value, highest first. The top 100 co-expressed genes from the pull-down chromosome of interest for each CATCH-seq experiment were denoted the SEEK list. A flow chart describing the processing of each data set can be found in
The presence of estradiol induced changes in the DNA-DNA interactions of the downstream enhancer of SIAH2 (
The following references, some of which are cited above by number, are herein incorporated by reference in their entireties.
The present invention claims priority to U.S. Provisional Patent Application 62/395,130, filed Sep. 15, 2016, which is incorporated by reference in its entirety.
This invention was made with government support under Grant Number(s) R01 CA089489 awarded by The National Institutes of Health. The government has certain rights in the invention.
Number | Date | Country |
---|---|---|
62395130 | Sep 2016 | US |