Temporal and tissue-specific gene expression in mammals depends on the cis-regulatory elements in the genome. These non-coding sequences can be divided into many classes depending on their regulatory functions [1]. Among the better-characterized elements are promoters, enhancers, silencers, and insulators. Transcription initiates from promoters, which serve as anchor points for the recruitment of the general transcriptional machinery [2,3]. Enhancers act to recruit a complex array of transcription factors and chromatin-modifying activities that facilitate gene transcription [4,5]. Repressor elements, on the other hand, bind proteins and/or modify chromatin structure to inhibit gene transcription [4,6]. Insulator elements provide additional regulation by preventing the spread of heterochromatin and restricting transcriptional enhancers from activating unrelated promoters [7]. Besides these four classes of cis-regulatory sequences, there are also locus control regions that facilitate the activation of a cluster of genes through still poorly understood mechanisms. A recent comprehensive survey of 1% of the human genome, using a combination of multiple genomic and computational methods, has identified a large number of transcripts and potential regulatory elements. However, it remains to be resolved how each class of regulatory element contributes to cell-type specific gene expression [8].
While all types of cis-regulatory elements can contribute to the cell-type specific gene expression program, recent studies have mainly focused on the role of promoters as a driving force behind tissue-specific and differential expression. These studies have revealed that many promoters contain transcription factor binding motifs for tissue-specific factors [9,10]. Indeed, some experimental evidence indicates that promoters are capable of directing certain degrees of cell-type specific expression in transient transfection assays [11]. However, it remains unclear to what extent promoters play a role in differential gene expression. On the other hand, it has long been recognized that enhancers are critical for the proper temporal and spatial expression from the gene promoter [12,13]. While the complex interplay between promoters and enhancers can occur across great distances in the genome [14,15], many enhancers have been shown to be within “close” proximity of the target promoter [13,16,17,18]. A number of studies have provided various means by which enhancers can regulate expression levels, including frequency of promoter-enhancer interaction, length of interactions [13,19], as well as strength of transcription factor binding [20,21,22].
Whether an enhancer is distal or proximal, how it determines its target promoter is unclear. One means of modulating which interactions occur is through insulator elements in the genome that act as enhancer-blockers and prevent such communication by separating enhancers from neighboring promoters [23,24,25]. Additionally, many insulator elements are thought to define blocks in which promoter-enhancer interactions can occur. Promoters and enhancers within these blocks are likely brought within close proximity to one another through chromatin looping [26]. The chromatin is organized into loops via insulator-insulator interactions or by localization to structures such as the nuclear envelope [26,27,28,29]. In this manner, insulators play a critical role in defining promoter-enhancer interactions.
In order to understand the roles of promoters, enhancers, and insulators in cell-type specific gene expression, we have systematically characterized the binding of general transcription factors, the insulator binding protein CTCF and several active chromatin modifications in 1% of the human genome in five diverse cell types. We have previously mapped chromatin modification profiles in the ENCODE regions in HeLa cells, and demonstrated that chromatin signatures are predictive of both promoters and enhancers [30]. Here, we generated maps of active promoters and enhancers, along with the insulator binding protein CTCF, in four additional cell types, including the leukemia cell line K562, immortalized lymphoblasts GM06990 (GM), undifferentiated human embryonic stem cells (ES) and BMP4-induced differentiated ES (dES). We show that the pattern of CTCF binding across all five cell types is remarkably similar, and that chromatin modifications at promoters are also largely invariant. In contrast, chromatin modifications at enhancers are highly dynamic across cell types. We also observe that differential gene expression correlates with differential enrichment of chromatin at promoters, as well as with changes in enhancer numbers. These results indicate that enhancers play an important role in cell-type dependent gene expression, and highlight the importance of identifying these sequences for understanding mechanisms of cell-type specific gene expression.
The invention is based on the discovery that characteristic chromatin signatures are associated with enhancers and, further, that within the genus of characteristic chromatin signatures associated with enhancers, the signatures differ in a cell-type specific way.
One embodiment of the invention is concerned with the general identification of enhancers based on the characteristic chromatin modifications found to be associated with this class of regulatory element. Another embodiment is concerned with the identification of differentially active and inactive genes based on the presence and distribution of enhancers. A third embodiment involves the monitoring, diagnosis and/or prognosis of diseases based on the presence and distribution of enhancer signatures associated with particular cell types and levels of expression of gene products within the cell types.
a and b show the results of ChIP-chip analysis of the amounts of several different chromatin modifications including acetylated and methylated histones in selected promoters and enhancers.
a-c show plots of differential gene expression as a function of the difference in enrichment of chromatin for three different chromatin-associated proteins.
a-f show the results of comparative analysis of enhancer clustering near genes being differentially expressed and genes not being differentially expressed.
a-g depict the results of verification studies of histone-modification-based prediction of enhancers.
a and b show the results of comparative analysis of promoter histone modifications in differentially expressed genes and repressed genes.
a-f show plots of the relationship between differential enrichment of chromatin with various chromatin-associated proteins and differential gene expression.
a and b graphically depict the results of a comparison of the observed distribution of adjacent TSS-TSS and CTCF-CTCF distances with what would be expected with random placement of sites.
a shows the results of comparative analysis of the distribution of the closest enhancer-TSS distance in genes differentially expressed and genes not being differentially expressed; 14b shows the correlation between enhancer numbers and differential gene expression.
a-f show the results of a parallel analysis to that shown in
Abbreviations: ChIP, chromatin immunoprecipitation; ChIP-chip, chromatin immunoprecipitation coupled with DNA microarrays; ChIP-Seq, chromatin immunoprecipitation coupled with high-throughput parallel sequencing; dES, BMP4 differentiated embryonic stem cells; ES, embryonic stem cells; GM, GM06990 lymphoblast cell line; H3, histone H3; H3K4Me1, histone H3 lysine 4 monomethylation; H3K4Me2, histone H3 lysine 4 dimethylation; H3K4Me3, histone H3 lysine 4 trimethylation; H3K9Ac, histone H3 lysine 9 acetylation; H3K18Ac, histone H3 lysine 18 acetylation; H3K27Ac, histone H3 lysine 27 acetylation; IMR90, fetal fibroblast cell line; K562, leukemia cell line K562; TSS(s), transcription start site(s).
Passage 32 H1 cells were grown in mTeSR1 medium [45] on Matrigel (BD Biosciences, San Jose, Calif.) for 5 passages. 15×10 cm2 dishes were grown using standard mTeSR1 culture conditions and 20×10 cm2 dishes were cultured in mTeSR1 supplemented with 200 ng/ml BMP4 (R&D Systems, Minneapolis, Minn.). 5 days post passage, when cells were approximately 70% confluent, H1 p32 cells grown in unmodified mTeSR1 were cross-linked. To cross-link, 2.5 ml of cross-linking buffer (5M NaCl, 0.5M EDTA, 0.5M EGTA, 1M HEPES pH 8, 37% fresh formaldehyde) was added to 10 ml culture medium and incubated at 37° C. for 30 minutes; 1.25 ml of 2.5M glycine was then added to stop the cross-linking reaction. Cells were removed from the culture dishes with a cell scraper, and collected by centrifugation for 10 minutes at 2500 rpm at 4° C. Cells were washed three times with cold PBS. After the final spin, cells were pelleted and flash frozen using liquid nitrogen. BMP4-treated cells were subjected to the same procedure after 6 days of exposure.
K562 (#CCL-243) cells were acquired from ATCC (www.atcc.org). K562 cells were grown to a density of 2.5×10^5 cells/mL in Iscove's modified Dulbecco's medium with 4 mM L-glutamine containing 1.5 g/L sodium bicarbonate, and 10% fetal bovine serum at 37° C., 5% CO2. GM06990 (#GM06990) B-lymphocyte cells were acquired from Coriell (www.ccr.coriell.org). GM cells were grown to a density of 2.5×10^5 cells/mL in RPMI 1640 medium with 2 mM L-glutamine containing 15% fetal bovine serum at 37° C., 5% CO2. HeLa growth conditions were previously described [30].
ChIP-chip procedure and antibodies against p300, TAF1, histone H3, H3K4Me1, H3K4Me2, H3K4Me3, and CTCF were previously described [30,39,46]. Additional antibodies are commercially available [α-H3K9Ac Abcam ab4441; α-H3K18Ac Abcam ab1191; and α-H3K27Ac Abcam ab4729]. All ChIP-chip experiments were completed in triplicate, except for those with normal and BMP4-treated ES cells. All ChIP-DNA samples were hybridized to NimbleGen ENCODE HG17 microarrays (NimbleGen Systems). DNA was labeled according to NimbleGen Systems' protocol. Samples were hybridized at 42° C. for 16 hours on a MAUI 12-bay hybridization station (BioMicro Systems). Microarrays were washed, scanned and stripped for re-use following protocols from NimbleGen Systems. Gene expression data for HeLa, K562, and GM cells were obtained using HU133 Plus 2.0 microarrays (Affymetrix).
Identification of CTCF and p300 Binding Sites
The Mpeak program can reliably detect binding sites of transcription factors, and has worked well in previous studies to identify TAF1, CTCF, and p300 binding sites [30,39,40,46]. We used the Mpeak program to determine the binding sites of CTCF [39] and p300 [30]. Specifically, we called a CTCF peak if there was a stretch of 4 probes, separated by at most 300 bp, that were at least 2.5 standard deviations above the mean. For p300, we used a simple FDR cutoff of 0.0001 to define peaks as in Heintzman et al. [30]. We used different parameters for consistency with previous publications, but swapping these parameters did not change the results significantly.
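For illustration only, the CTCF criterion stated above (a run of at least 4 probes, adjacent probes no more than 300 bp apart, each at least 2.5 standard deviations above the mean) can be sketched in Python. This is not the Mpeak program, which is not reproduced here. The function name call_peaks, the input layout, and the use of the population standard deviation over all probes are assumptions made for the sketch.

```python
from statistics import mean, pstdev


def call_peaks(probes, min_probes=4, max_gap=300, z_cutoff=2.5):
    """Return (start, end) spans of enriched probe runs.

    probes: list of (position_bp, enrichment) tuples sorted by position.
    All names and the input layout are hypothetical.
    """
    values = [value for _, value in probes]
    # Threshold: mean plus z_cutoff standard deviations over all probes.
    threshold = mean(values) + z_cutoff * pstdev(values)

    peaks, run = [], []
    for position, value in probes:
        enriched = value >= threshold
        if enriched and run and position - run[-1] <= max_gap:
            run.append(position)          # extend the current run
        else:
            if len(run) >= min_probes:    # close the previous run
                peaks.append((run[0], run[-1]))
            run = [position] if enriched else []
    if len(run) >= min_probes:            # run that reaches the last probe
        peaks.append((run[0], run[-1]))
    return peaks


# Toy data (hypothetical): 200 probes every 50 bp, four of them enriched.
probes = [(i * 50, 5.0 if 100 <= i <= 103 else 0.0) for i in range(200)]
print(call_peaks(probes))
```

In this sketch a non-enriched probe always ends a run, which is one possible reading of "a stretch of 4 probes"; a more tolerant rule would be an equally reasonable assumption.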
The procedure used to predict enhancers follows closely that in Heintzman et al. [30]. Specifically, we first binned the tiling ChIP-chip data into 100 bp bins, averaging multiple probes that fell into the same bin. Empty bins were interpolated if the distance between flanking non-empty bins was less than 1 kb, and set to 0 otherwise. We scanned this binned data, keeping only those windows 1) in the top 10% of the intensity distribution and 2) having H3K4Me1 and H3K4Me3 profiles in the top 1% of all windows using the same training set of sites as in Heintzman et al. (Figure 1a,b). We used a discriminative filter on H3K4Me1 and H3K4Me3 to keep only those sites that correlated with the averaged enhancer training set more than the promoter training set. Finally, we applied a descriptive filter on H3K4Me1 and H3K4Me3, keeping only those remaining predictions having a correlation of at least 0.5 with an averaged training set.
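The binning step and the two correlation filters described above can be sketched as follows. This is a minimal illustration, not the procedure of Heintzman et al. [30]: the function names, the use of linear interpolation, the measurement of gap size from bin indices, and the representation of training sets as averaged template vectors are all assumptions. The intensity-percentile pre-filter is omitted.

```python
import math


def bin_probes(probes, start, end, bin_size=100, max_span=1000):
    """Average probes into fixed bins over [start, end) and fill short gaps.

    probes: iterable of (position_bp, value). Hypothetical layout.
    """
    n_bins = (end - start + bin_size - 1) // bin_size
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for position, value in probes:
        index = (position - start) // bin_size
        if 0 <= index < n_bins:
            sums[index] += value
            counts[index] += 1
    binned = [s / c if c else None for s, c in zip(sums, counts)]

    filled = [i for i, v in enumerate(binned) if v is not None]
    for left, right in zip(filled, filled[1:]):
        # Interpolate (linearly, by assumption) only across gaps whose
        # flanking non-empty bins are less than max_span apart.
        if (right - left) * bin_size < max_span:
            delta = binned[right] - binned[left]
            for i in range(left + 1, right):
                binned[i] = binned[left] + delta * (i - left) / (right - left)
    # Bins that remain empty are set to 0.
    return [0.0 if v is None else v for v in binned]


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx and syy else 0.0


def passes_filters(window, enhancer_template, promoter_template, min_corr=0.5):
    """Discriminative filter (more enhancer-like than promoter-like)
    followed by the descriptive filter (correlation of at least min_corr)."""
    r_enhancer = pearson(window, enhancer_template)
    r_promoter = pearson(window, promoter_template)
    return r_enhancer > r_promoter and r_enhancer >= min_corr
```

In practice the window vector would concatenate the binned H3K4Me1 and H3K4Me3 profiles around a candidate site; how the profiles are combined is not specified here and is left to the caller.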
We used the GCRMA package [47] to normalize Affymetrix mRNA expression arrays for HeLa, GM, and K562 cell types. For every pair of these cell types, we also used GCRMA to find differentially expressed and repressed genes using a p-value cutoff of 0.01 in conjunction with a fold change cutoff of 2.0. The expression data for ES and dES cell types were generated using the NimbleGen platform, and thus were not directly comparable to the Affymetrix expression data. As such, we could only use these expression data to compare ES and dES cell types. As a conservative measure of differential expression, we used a fold-change cutoff of 2.
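The thresholds above can be applied to already-normalized values as in the following sketch. It does not reproduce GCRMA normalization or its statistical test; it only shows the cutoff logic, assuming linear-scale positive expression values and optional precomputed p-values. The function name and dictionary layout are hypothetical.

```python
import math


def call_differential(expr_a, expr_b, p_values=None,
                      fold_cutoff=2.0, p_cutoff=0.01):
    """Label each shared gene 'up', 'down' or 'unchanged' in A relative to B.

    expr_a, expr_b: dicts of gene -> linear-scale expression (assumed > 0).
    p_values: optional dict of gene -> p-value from an external test.
    """
    log_cutoff = math.log2(fold_cutoff)
    calls = {}
    for gene in sorted(expr_a.keys() & expr_b.keys()):
        log_fold = math.log2(expr_a[gene] / expr_b[gene])
        # Without p-values only the fold-change criterion is applied,
        # as for the ES versus dES comparison.
        significant = p_values is None or p_values.get(gene, 1.0) <= p_cutoff
        if significant and log_fold >= log_cutoff:
            calls[gene] = "up"
        elif significant and log_fold <= -log_cutoff:
            calls[gene] = "down"
        else:
            calls[gene] = "unchanged"
    return calls
```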
For gene expression analysis, we isolated the total RNA from H1 ES cells or BMP4-treated cells using Trizol (Invitrogen, Carlsbad, Calif.) according to the manufacturer's recommendations. PolyA RNA was then isolated using the Oligotex mRNA Mini Kit (Qiagen). The mRNAs were then reverse transcribed, labeled, mixed with differently labeled sonicated genomic DNA, and hybridized to a single array that tiled transcripts from approximately 36,000 human loci from the hg17 assembly (NimbleGen Systems). Detailed descriptions of array design, labeling, hybridization and data analysis are provided below. We set the expression level of genes in undifferentiated cells as 1 and calculated the relative fold change of individual genes in the dES cells.
Randomization and p-Values
To determine the expected distribution of adjacent element-to-element distances, we randomly placed the same number of elements into the ENCODE regions, with each base having an equal probability of being selected. To avoid complications such as repeat-masked regions, we restricted our sampling to only those regions covered by the NimbleGen tiling array.
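A minimal sketch of this randomization is given below, assuming the array-covered regions are supplied as half-open (start, end) intervals and that adjacent distances are taken only between sites falling in the same interval. Both are assumptions of the sketch, as are the function names.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def random_sites(regions, n_sites, rng):
    """Place n_sites uniformly over the bases of the given regions.

    regions: list of half-open (start, end) intervals (hypothetical layout).
    Returns a sorted list of (region_index, position) tuples.
    """
    cumulative = list(accumulate(end - start for start, end in regions))
    total = cumulative[-1]
    sites = []
    for _ in range(n_sites):
        offset = rng.randrange(total)              # every base equally likely
        index = bisect_right(cumulative, offset)   # region containing offset
        previous = cumulative[index - 1] if index else 0
        sites.append((index, regions[index][0] + offset - previous))
    return sorted(sites)


def adjacent_distances(sites):
    """Distances between neighbouring sites within the same region."""
    return [b[1] - a[1] for a, b in zip(sites, sites[1:]) if a[0] == b[0]]


# Example with toy regions; repeating this many times gives the
# expected distribution against which observed distances are compared.
rng = random.Random(0)
simulated = adjacent_distances(random_sites([(1000, 2000), (5000, 5500)], 50, rng))
```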
The p-values for correlations were obtained by using the Matlab corr function. This p-value is the probability of observing a correlation at least as extreme as the one measured if the two variables were in fact uncorrelated, tested against the alternative that the correlation is non-zero. The p-values for Wilcoxon rank sum tests were obtained from the Matlab ranksum function.
The Human Whole Genome Expression arrays containing ˜385,000 60-mer probes were manufactured by NimbleGen Systems (http://www.nimblegen.com). This array design tiles transcripts from approximately 36,000 human locus identifiers for the hg17 (UCSC) assembly with typically 10 or 11 probes per transcript.
Total RNA was enriched for the polyA fraction using Oligotex mRNA Mini Kit (Qiagen). Enriched mRNA (250 ng) was primed using random hexamers and reverse transcribed using Superscript III (Invitrogen) in the presence of 5-(3-aminoallyl)-dUTP (Ambion). The purified product was coupled to Cy5-NHS ester (Amersham). Similarly, sonicated genomic DNA (2 μg) was primed with random octamers and labeled using Klenow fragments in the presence of 5-(3-aminoallyl)-dUTP. The resulting product was coupled to Cy3-NHS ester (Amersham). Cy3-labeled genomic DNA (4.5 μg) was used as a reference and added along with the Cy5-labeled mRNA sample (2 μg) onto each array. Hybridizations were performed in 3.6×SSC buffer with 35% formamide and 0.07% SDS at 42° C. overnight. Arrays were then washed, dried, and scanned using a GenePix 4000B scanner.
Gene expression raw data were extracted using NimbleScan software v2.1. Considering that the signal distribution of the RNA sample is distinct from that of the gDNA sample, the signal intensities from RNA channels in all eight arrays were normalized with the Robust Multi-array Average (RMA) algorithm [47]. Separately, the same normalization procedure was performed on the signals from the gDNA samples. For a given gene, the median-adjusted ratio between its normalized intensity from the RNA channel and that from the gDNA channel was then calculated as follows:
Ratio=intensity from RNA channel/(intensity from gDNA channel+median intensity of all genes from the gDNA channel).
We found that this median-adjusted ratio gave the most consistent results when compared to other published human ES cell expression data, such as SAGE library information available from the Cancer Genome Anatomy Project (CGAP). Consequently, we used this median-adjusted ratio as the measurement for the gene expression level.
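As a worked illustration of the median-adjusted ratio, the following sketch computes it from per-gene normalized intensities and then expresses differentiated-cell values relative to undifferentiated cells set to 1. The dictionary layout and function names are assumptions; normalization is taken as already done.

```python
from statistics import median


def median_adjusted_ratios(rna, gdna):
    """Ratio = RNA intensity / (gDNA intensity + median gDNA intensity).

    rna, gdna: dicts of gene -> normalized intensity (hypothetical layout).
    The median is taken over all genes in the gDNA channel.
    """
    gdna_median = median(gdna.values())
    return {gene: rna[gene] / (gdna[gene] + gdna_median)
            for gene in rna if gene in gdna}


def relative_fold_change(ratios_es, ratios_des):
    """Fold change in dES cells with the ES level of each gene set to 1."""
    return {gene: ratios_des[gene] / ratios_es[gene]
            for gene in ratios_es
            if gene in ratios_des and ratios_es[gene] > 0}
```

Adding the median to the denominator damps the ratio for genes with very low gDNA signal, which would otherwise produce unstable values.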
Mapping of Chromatin Modifications, TAF1, p300, and CTCF Binding in 1% of the Human Genome in Diverse Cell Types
We performed ChIP-chip analysis [30] to determine the chromatin modification patterns along 44 human loci selected by the ENCODE consortium as common targets for genomic analysis [31], totaling 30 Mbp. We investigated the patterns of six specific histone modifications: acetylated histone H3 lysine 9, 18 and 27 (H3K9Ac, H3K18Ac and H3K27Ac), and mono-, di- and tri-methylated histone H3 lysine 4 (H3K4Me1, H3K4Me2, and H3K4Me3). We also examined binding of a component of the basal transcriptional machinery TAF1 in all five cell types to identify active promoters, along with the transcriptional coactivator p300 in HeLa, GM, and K562 cells to identify enhancers [32] (
Previously, we demonstrated that active promoters and enhancers could be determined by distinct chromatin signatures of H3K4Me1 and H3K4Me3 at these functional elements [30]. Curiously, we had not observed any consistent enrichment of acetylated histones near enhancers, even those bound by the known histone acetyltransferase p300. One possible explanation for this is the specificity of the antigen recognition of the pan-H3 and H4 acetylation antibodies used in the previous study. We hypothesized that using antibodies specific for individual acetylated histones would improve recovery of consistently acetylated histones, especially at p300 binding sites. Focusing on HeLa cells, we indeed found that three additional histone modification marks, namely H3K9Ac, H3K18Ac and H3K27Ac are also part of the chromatin patterns at promoters and enhancers. All three acetylation marks localize to active transcription start sites (TSSs), and remain absent, as do other chromatin modifications, at inactive promoters (
Most Human Promoters are Universally Associated With a Set of Active Chromatin Marks in Different Cell Types
A cell's gene expression program uniquely defines its cell type, and modulation of the chromatin state of a cell is a key component of this program [34,36]. Given the diversity of the five cell types used in this study, we hypothesized that the chromatin modifications at promoters would uniquely define each cell type. To visualize the cell-type specificity of chromatin modification patterns at promoters, we simultaneously clustered the ChIP enrichment ratios for three histone modifications associated with active promoters (H3K4Me1, H3K4Me3 and H3K27Ac) and TAF1 within 10 kb windows centered at Gencode [37] TSSs for all cell types. We expected to recover large clusters of promoters specific to each cell type. Unexpectedly, however, we found that the chromatin signatures at virtually all TSSs were remarkably similar across cell types (
Almost half (1296/2690=48.2%) of the promoters belonged to cluster G4, which generally lacks enrichment of chromatin marks typically found at active promoters. For the remaining clusters, the chromatin modification patterns appeared nearly identical across all five cell types. To quantify this, we defined a cell type's enrichment profile as the sum of the log ratio enrichment values of H3K4Me1, H3K4Me3, H3K27Ac, and TAF1 for each Gencode gene. We then calculated the Pearson correlation coefficient between enrichment profiles from different cell types (Table 1a). The enrichment profiles were highly correlated between all pairs of cell types, with an average correlation coefficient of 0.79, supporting the notion of the generally invariant nature of the chromatin marks at TSSs. Thus, this large-scale view indicated that roughly half of the promoters were consistently inactive across these five cell types, and that the remaining promoters were generally marked by a shared set of histone modification patterns.
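The enrichment-profile comparison can be illustrated with the short sketch below, which sums the log-ratio values of the four marks per gene and correlates the resulting profiles between two cell types. The data layout (one list of per-gene values for each mark, in the same gene order) and the function names are assumptions.

```python
import math


def enrichment_profile(marks):
    """Per-gene sum of log-ratio enrichment over all marks.

    marks: dict of mark name -> list of log ratios, one per gene,
    with every list in the same gene order (hypothetical layout).
    """
    return [sum(values) for values in zip(*marks.values())]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Toy values for three genes in two cell types (not real data).
cell_a = {"H3K4Me1": [1.0, 0.0, 2.0], "H3K4Me3": [2.0, 0.0, 0.5],
          "H3K27Ac": [1.5, 0.0, 1.0], "TAF1": [0.5, 0.0, 0.5]}
cell_b = {mark: [2 * v for v in values] for mark, values in cell_a.items()}
r = pearson(enrichment_profile(cell_a), enrichment_profile(cell_b))
```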
Since the cell-type specificity of epigenetic marks at promoters appears limited, we examined two other classes of cis-regulatory elements to determine if they were localized in a cell-type specific manner. Insulator elements play key roles in restricting enhancers from activating inappropriate promoters, thereby defining the boundaries of gene regulatory domains [26].
Nearly all insulator elements that have been experimentally defined in the mammalian genome require the insulator binding protein CTCF to function [38]. Our previous genome-wide location analysis of the insulator binding protein CTCF in human fibroblasts indicated that CTCF binding is closely correlated with the distribution of genes, and is highly conserved throughout evolution, consistent with its key role in insulator function [39]. It is possible that CTCF localization could vary between cell types, contributing to cell-type specific gene expression. To test this hypothesis, we performed ChIP-chip to map CTCF binding sites in the ENCODE regions in all five cell types. After loess normalization, we used the Mpeak program [40] to identify CTCF binding sites (see Methods). We used a consistent set of parameters, calling a binding site when at least 4 probes within a 300 bp window were enriched at least 2.5 standard deviations above the mean. Using this method, there was an average of 517 CTCF binding sites identified for each cell type. On average, the overlap of CTCF binding sites from different cell types was a remarkable 82.8%, supporting the notion that CTCF binding sites are indeed cell-type independent, to a degree that is much higher than previously appreciated.
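One way to compute a pairwise overlap of this kind is sketched below: the fraction of sites in one cell type that lie within a fixed distance of any site in another. The distance tolerance is an assumption, since the overlap criterion is not stated above, and the function name and single-coordinate layout are hypothetical.

```python
from bisect import bisect_left


def fraction_overlapping(query, reference, tolerance):
    """Fraction of query sites within `tolerance` bp of a reference site.

    query, reference: lists of site positions on one coordinate axis.
    """
    if not query or not reference:
        return 0.0
    ref = sorted(reference)
    hits = 0
    for position in query:
        i = bisect_left(ref, position)
        # The nearest reference site is either just before or just after.
        nearest = min(abs(position - ref[j])
                      for j in (i - 1, i) if 0 <= j < len(ref))
        if nearest <= tolerance:
            hits += 1
    return hits / len(query)
```

Note that this measure is asymmetric: the fraction of A near B need not equal the fraction of B near A, so an averaged figure would combine both directions over all pairs of cell types.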
Peak finding is not perfect, so to further assess the cell-type specificity of CTCF binding, we merged CTCF binding sites found within 2.5 kb from sites in different cell types, giving a set of 729 non-redundant sites. To visualize the cell-type specificity of CTCF, we then created a heat-map of CTCF binding centered at these sites across all five cell types (
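The merging of nearby sites into a non-redundant set can be sketched as a single pass over the pooled, sorted positions. Chaining sites whenever consecutive positions are within 2.5 kb, and reporting the integer mean of each group as its representative, are assumptions of this sketch; the cell-type keys and positions in the example are toy values.

```python
def merge_sites(sites_by_cell, window=2500):
    """Merge sites from all cell types that lie within `window` bp.

    sites_by_cell: dict of cell type -> list of positions on one
    coordinate axis (hypothetical layout).
    Returns one representative position per merged group.
    """
    pooled = sorted(p for sites in sites_by_cell.values() for p in sites)
    groups = []
    for position in pooled:
        if groups and position - groups[-1][-1] <= window:
            groups[-1].append(position)   # close enough: same group
        else:
            groups.append([position])     # start a new group
    return [sum(group) // len(group) for group in groups]


merged = merge_sites({"HeLa": [1000, 10000],
                      "GM": [1500, 20000],
                      "K562": [3500, 10100]})
```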
Not observing epigenetic cell-type specificity at promoters and insulators, we tested if enhancers were localized in a cell-type specific manner. First, using very stringent criteria, we defined active enhancers to be binding sites of p300, a histone acetyltransferase and coactivator protein. We identified a total of 411 TSS-distal p300 binding sites in HeLa, GM, and K562 cell lines. We observed that, unlike CTCF and chromatin modifications at promoters, the localization of p300 binding sites appears unique to each cell type in the three cell types where p300 ChIP-chip analysis was performed (
While the presence of p300 is sufficient to indicate an enhancer, p300 is not necessarily found at all enhancers. To obtain a more complete catalog of enhancers, we relied on the approach of Heintzman et al [30] (see Methods). Briefly, using a sliding window on H3K4Me1 and H3K4Me3, we scanned for chromatin modifications resembling a training set of enhancer patterns defined by the p300 binding sites in HeLa cells. We then kept only those predictions having a Pearson correlation of at least 0.5 with the training set and that had histone modification patterns correlating more with the enhancer training set than with promoter patterns (Tables 2-6). Consistent with the chromatin signatures of p300 binding sites, the putative enhancers were highly enriched in the chromatin modifications H3K4Me1 and H3K27Ac, but had no enrichment of H3K4Me3 (
Several lines of evidence supported the idea that the histone-modification-based predictions of enhancers are truly enhancers. First, we compared the predicted enhancers to DNase I hypersensitive (HS) sites, as hypersensitivity is a hallmark of enhancers. Using a recently published set of HS sites [40] mapped in HeLa, GM, K562, and H9 ES cells, we computed the percentage of predicted enhancers within 2.5 kb of HS sites (
Second, enhancers are by definition regions of the genome bound by transcription factors and co-activators. To verify the predicted enhancers, we compared their overlap with p300 binding sites. For every cell line where we mapped p300 binding, we observed significant enrichment of predicted enhancers at p300 binding sites (HeLa: 86.4% overlap, Z-score=27.7, p=2.9E-169; GM: 79.2% overlap, Z-score=35.7, p=4.6E-279; K562: 63.6% overlap, Z-score=23.3, p=1.7E-120) (
Next, we addressed the cell-type specificity of the predicted enhancers. As we expected the localization pattern of enhancers to resemble that of p300, we hypothesized that the predicted enhancers were also localized in a cell-type specific manner. To see if this was supported visually, we performed computational clustering on all predicted enhancers, encompassing chromatin modifications from all five cell types (
Since promoters, insulators, and enhancers are critical for regulating the expression of each gene, we expected that differences in chromatin modifications or transcription factor binding at these elements between different cell types might help explain cell-type specific gene expression programs. To better define the roles of each class of element in differential gene expression, we focused on a subset of 54 genes that show at least 2-fold differential transcription between any pair of cell types from HeLa, K562 and GM.
Changes in Promoter Chromatin Structure at Differentially Expressed Genes Correlated With Transcriptional Changes
We have observed that the histone modification patterns at promoters across all five cell types are invariant at a global level (
As described above, chromatin modifications and co-activator binding at enhancers are generally cell-type specific, supporting the notion of their role in mediating cell-type specific gene expression programs. To further understand the role of enhancers in cell-type specific gene expression, we examined the distribution of predicted enhancers in the human genome. To obtain a coarse view of the localization pattern of enhancers, we first examined the distribution of distances between adjacent enhancers. We observed that enhancers are more highly clustered than expected at random (Wilcoxon p=1.1E-27) (
Having observed clustering of both enhancers and TSSs, we hypothesized that clustering of enhancers is associated with cell-type specific gene expression. To test this, we again focused on differentially expressed genes between pairs of cell types. We counted the number of enhancers near the differentially expressed genes in the neighboring domains defined by consensus CTCF sites. We found that enhancers were enriched near differentially expressed genes as compared to the same genes that are differentially repressed in another cell type, and this enrichment was largely confined within CTCF binding sites that directly flanked the gene's TSS (
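Counting enhancers within the CTCF-flanked domain of a gene can be sketched with two binary searches, as below. The sketch assumes sorted positions on a single coordinate axis and treats a missing flanking CTCF site as an open boundary; these choices and the function names are assumptions.

```python
from bisect import bisect_left, bisect_right


def ctcf_block(tss, ctcf_sites):
    """Return the (left, right) CTCF sites that directly flank a TSS.

    ctcf_sites: sorted list of positions. A missing flank is returned
    as minus or plus infinity.
    """
    i = bisect_right(ctcf_sites, tss)
    left = ctcf_sites[i - 1] if i > 0 else float("-inf")
    right = ctcf_sites[i] if i < len(ctcf_sites) else float("inf")
    return left, right


def enhancers_in_block(tss, ctcf_sites, enhancers):
    """Number of enhancers strictly between the flanking CTCF sites.

    enhancers: sorted list of positions.
    """
    left, right = ctcf_block(tss, ctcf_sites)
    return bisect_left(enhancers, right) - bisect_right(enhancers, left)
```

Comparing these counts for the same gene between two cell types gives the change in enhancer number that is set against differential expression.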
There were 1355 enhancers identified in the HeLa, GM, and K562 cell lines (Tables 2-4), with nearly half (625; 46.1%) in a CTCF block that also contained at least one of the 426 promoters for which we have expression data. Of these 426 promoters, 54 (12.7%) were differentially expressed in either HeLa, GM, or K562, and they were next to 158 (25.3%) of the 625 enhancers. While the enhancers were present in significantly greater numbers near differentially expressed genes than would be expected for random placement (p=8.2E-17) (
The presence of multiple enhancers at differentially upregulated genes raises the possibility that enhancers may act cooperatively to regulate gene expression, and that individual enhancers are weak. If enhancers generally modulate expression weakly, we would expect genes not differentially expressed to have minimal changes in enhancer numbers. To test this, we compared the distribution of changes in enhancer numbers for differentially expressed genes to that for genes that were not. We found that the average change in enhancer counts was 1.47 for differentially expressed genes, whereas this figure was −0.05 for all other genes (t-test p=4.9E-6) (
We noticed that while some active promoters are near a single enhancer, others are near multiple enhancers. This led us to ask if there is a relationship between a gene's induction level and the number of enhancers in the gene's CTCF block. Given that enhancers are positive-acting, there are several distinct possibilities: 1) the presence of multiple enhancers can have the same effect as the presence of a single enhancer, 2) enhancers have an additive effect on gene expression, or 3) enhancers synergistically upregulate gene expression such that the output is greater than the effect of adding individual enhancers. Indeed, we found that the last of these is likely to be true: as the number of enhancers increased (
While these properties of enhancers were shared by predicted enhancers in each cell type, all of the above results also held when considering enhancers stringently defined as TSS-distal p300 binding sites (
The identity of a mammalian cell is largely defined by its unique gene-expression profile. To understand the mechanisms that determine cell-type specific transcription, we have localized the binding sites of general transcription factors, the insulator protein CTCF and a number of histone modifications in 1% of the human genome in five diverse cell types. Using a previously defined chromatin signature for enhancers, we predicted a total of 1,423 non-redundant enhancers in these genome regions (Tables 2-6). The systematic, unbiased map of transcriptional regulatory elements in five different cell types allowed us to assess the differential roles of promoters, enhancers and insulators in cell-type specific gene expression. Contrary to expectations, we found that, from a global perspective, the chromatin modifications at promoters were remarkably invariant across cell types. But differences in enrichment of chromatin modifications did occur at a small set of promoters, and these differences correlated with differential gene expression. The binding of insulator protein CTCF to the genome was also nearly identical between different cells. In contrast, the majority of enhancers appeared to be epigenetically marked in a cell-type specific manner, and were enriched near genes with cell-type specific expression. Taken together, these observations strongly indicated that enhancers play important roles in driving cell-type specific gene-expression programs.
The observation that most promoters are commonly associated with active histone modifications in diverse cell types is surprising, and implies that most human promoters adopt a similar chromatin architecture in diverse cell types and lineages. Only a small fraction of the promoters take on different chromatin modifications that correlate with transcriptional changes of these genes. If the majority of the promoters exist in a similar chromatin configuration in different cell types, then what causes each cell to express its unique transcriptome? These results can be explained by a model in which the majority of promoters remain open and competent for transcriptional initiation in diverse cell types, but the actual level of transcription is modulated by the enhancers, whose activities are usually restricted to specific cell lineages and developmental stages. Consistent with this model, the enhancers that we identified in the ENCODE regions share several general properties: first, the enhancers are highly enriched near differentially expressed genes; second, they are often located at considerable distances from active promoters and clustered together; third, there is a remarkable synergistic relationship between enhancer numbers and differential expression of a gene, implying that single enhancers are often weak and have a small influence on gene expression. This model suggests that activation of cell-type specific gene expression will likely require the action of multiple enhancers.
The complex interaction of transcriptional regulators bound to cis-regulatory elements provides the basis for regulation of gene transcription. However, determining the role of each cis-regulatory element in gene expression has been limited to individual gene studies. Our results provide a large-scale, multiple cell-type view of promoters, enhancers and insulators, revealing important aspects of regulatory mechanisms, such as invariable insulator binding and highly specific enhancers that modulate the level of expression from promoters within CTCF blocks. The highly invariant nature of CTCF binding across this diverse assortment of cell types suggests that insulator binding is likely a stable feature of all human cells. This degree of consistency is higher than expected from our previous genome-wide study [39]. The results are indicative of genome-wide trends, and will provide the basis for the expansion of studies to include additional cell types, tissues, and organisms to define their regulatory networks.
The results and observations with respect to enhancers described herein lend themselves to application to various novel methods of monitoring and analysis in connection with the genome.
One aspect of the present invention is a method for identifying enhancer elements by analyzing portions of the genome for chromatin signatures found to be particularly associated with enhancers. Particular characteristics of the signatures associated with enhancers have been found to be enrichment in histone H3 lysine 4 monomethylation (H3K4Me1) and histone H3 lysine 27 acetylation (H3K27Ac). Other characteristics are enrichment in HS sites and overlap with transcription-factor binding sites, most particularly p300 binding sites. The analysis methods for enhancer-element identification employ, inter alia, ChIP-chip and ChIP-Seq analyses; antibodies against the desired transcription factors and modified histones; and digestion with DNase I.
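By way of illustration only, the signature test described above can be sketched in Python. The `Region` record, the log2 enrichment units, and the `min_enrichment` cutoff are assumptions introduced for this example and are not values taken from the disclosure; the sketch simply flags a region when both H3K4Me1 and H3K27Ac are enriched and counts HS-site and p300 overlap as supporting evidence.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """A candidate genomic region (hypothetical record for this sketch)."""
    name: str
    h3k4me1: float    # assumed log2 ChIP enrichment for H3K4Me1
    h3k27ac: float    # assumed log2 ChIP enrichment for H3K27Ac
    dnase_hs: bool    # overlaps a DNase I hypersensitive (HS) site
    p300_bound: bool  # overlaps a p300 binding site


def is_candidate_enhancer(region: Region, min_enrichment: float = 1.0) -> bool:
    """Return True when both histone marks reach the assumed cutoff."""
    return (region.h3k4me1 >= min_enrichment
            and region.h3k27ac >= min_enrichment)


def support_score(region: Region) -> int:
    """Count the supporting features (HS site, p300 binding) present."""
    return int(region.dnase_hs) + int(region.p300_bound)


if __name__ == "__main__":
    regions = [
        Region("region_a", 2.0, 1.5, True, True),
        Region("region_b", 2.0, 0.2, False, True),
    ]
    for r in regions:
        print(r.name, is_candidate_enhancer(r), support_score(r))
```

In practice the cutoff would have to be calibrated against the ChIP-chip or ChIP-Seq data at hand; the fixed value here only keeps the example self-contained.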
In a further embodiment of the invention, the identification of enhancer elements provides for the analysis of the distribution of enhancers using computational clustering analysis. This enables the identification of differentially expressed and differentially repressed genes. This is a particularly powerful tool given our discovery that the effect of multiple enhancers is synergistic.
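One simple form such a clustering analysis could take is sketched below. It is a sketch under stated assumptions rather than the specific clustering procedure of the invention: predicted enhancers are represented by single chromosomal coordinates on one chromosome, and neighbouring enhancers are grouped whenever the gap between them does not exceed a hypothetical `max_gap` parameter.

```python
def cluster_positions(positions, max_gap):
    """Group coordinates into clusters of neighbours at most max_gap apart.

    positions: iterable of integer coordinates on a single chromosome.
    max_gap:   assumed maximum distance between adjacent cluster members.
    Returns a list of clusters, each a sorted list of coordinates.
    """
    clusters = []
    for pos in sorted(positions):
        if clusters and pos - clusters[-1][-1] <= max_gap:
            # Close enough to the previous enhancer: extend that cluster.
            clusters[-1].append(pos)
        else:
            # Too far away (or first position): start a new cluster.
            clusters.append([pos])
    return clusters


if __name__ == "__main__":
    enhancers = [100, 5000, 150, 5200, 20000]
    for cluster in cluster_positions(enhancers, max_gap=500):
        print(len(cluster), cluster)
```

The number of enhancers in each cluster can then be compared with the expression of nearby genes, in keeping with the observed relationship between enhancer number and differential expression.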
Not only have we discovered that enhancer signatures have features in common that enable the distinguishing of enhancers from promoters and other regulatory elements, but we have also discovered, as described above, that the enhancer signatures differ from each other on a cell-type specific basis within a given organism. We have further demonstrated a correlation between differential gene expression and changes in enhancer numbers.
Accordingly, another aspect of the invention is the use of these tools in the diagnosis, prognosis and monitoring of disease, particularly cancer. However, the invention is by no means confined to methods useful in connection with cancer. Using techniques described herein, the characteristic enhancer signatures for both cancer cells and cells associated with other disease states can be identified. The diagnostic, prognostic and monitoring methods enabled by the disclosure herein involve analyzing chromatin samples from subjects for their signatures. This analysis is performed using the ChIP-chip analysis procedure described previously herein. Alternatively, the analysis can be performed using a ChIP-Seq procedure, whereby chromatin immunoprecipitation is combined with ultra high-throughput massively parallel sequencing. This procedure can be carried out as described by Jothi et al. [48] and Barski et al. [49]. Enhancer signatures are identified and further characterized by comparison with previously observed signatures known to be associated with particular cell types associated with disease states and the levels of gene expression in those cell types. The consequent identification of cell types and expression affords a basis for predicting disease states, diagnosing disease states and, once a disease state has been diagnosed, monitoring the progress of the disease and determining the appropriate parameters for treatment.
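The comparison of a sample signature with previously observed signatures can be illustrated as follows. This is a minimal sketch, assuming that each signature is reduced to a set of enhancer identifiers and that similarity is measured by the Jaccard index; both the representation and the similarity measure are choices made for this example, and the labels and identifiers are hypothetical.

```python
def jaccard(a, b):
    """Jaccard index of two collections of enhancer identifiers."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_match(sample, references):
    """Return the reference label whose signature best overlaps the sample.

    sample:     collection of enhancer identifiers found in the sample.
    references: dict mapping a label to a collection of enhancer identifiers
                previously observed for that cell type or disease state.
    """
    return max(references, key=lambda label: jaccard(sample, references[label]))


if __name__ == "__main__":
    references = {
        "reference_a": {"enh1", "enh2", "enh3"},
        "reference_b": {"enh4", "enh5"},
    }
    sample = {"enh1", "enh2", "enh6"}
    print(best_match(sample, references))
```

A set overlap ignores enrichment strength; a comparison of quantitative enrichment profiles would be an equally reasonable alternative.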
More particularly, one aspect of the invention is a diagnostic method for cancer and other diseases in a patient, comprising the steps of:
This diagnostic method may well also lend itself to further diagnostic/predictive studies. The methodology described can be employed to determine if there is a significant correlation between a quantity of cancer- or other-disease-associated enhancers below a set threshold and absence of the cancer or other disease in a patient and/or a correlation between such a threshold quantity and the diminished likelihood that the patient will get the cancer or other disease.
Another aspect of the invention is a prognostic method for cancer or another disease state in a patient known already to have such a condition, comprising the steps of:
Another aspect of the invention following from the prognostic method described immediately above is a method for monitoring the progress of treatment of a patient having cancer or another disease state, comprising the steps of:
Yet another aspect of the invention is a method for the identification of differentially expressed and differentially repressed genes in a genome segment from a particular cell type of a host, which comprises employing the techniques described herein previously for finding enhancer elements in a genome segment, followed by the further steps of:
Table 1: ChIP-chip enrichment values across different cell types are much more highly correlated at Gencode promoters and CTCF binding sites than at p300 binding sites and predicted enhancers.
Table 2: Predicted enhancers in HeLa. The first column is the ENCODE region, the second column is the hg17 chromosomal coordinate of the predicted enhancer, and the third column indicates the chromosome where the enhancer is found.
Table 3: Predicted enhancers in GM. The first column is the ENCODE region, the second column is the hg17 chromosomal coordinate of the predicted enhancer, and the third column indicates the chromosome where the enhancer is found.
Table 4: Predicted enhancers in K562. The first column is the ENCODE region, the second column is the hg17 chromosomal coordinate of the predicted enhancer, and the third column indicates the chromosome where the enhancer is found.
Table 5: Predicted enhancers in ES. The first column is the ENCODE region, the second column is the hg17 chromosomal coordinate of the predicted enhancer, and the third column indicates the chromosome where the enhancer is found.
Table 6: Predicted enhancers in dES. The first column is the ENCODE region, the second column is the hg17 chromosomal coordinate of the predicted enhancer, and the third column indicates the chromosome where the enhancer is found.
This application claims the benefit of U.S. Provisional Application No. 60/982,845, filed Oct. 26, 2007.
| Filing Document | Filing Date | Country | Kind | 371c Date |
|---|---|---|---|---|
| PCT/US2008/012086 | 10/24/2008 | WO | 00 | 4/23/2010 |

| Number | Date | Country |
|---|---|---|
| 60982845 | Oct 2007 | US |