Many efforts are currently underway to identify new “biomarkers” for cancer that will facilitate more accurate diagnosis, classification, and assessment of therapeutic response. While there are many studies of specific changes in proteins, mRNAs, microRNAs, or DNA methylation in cancer, studies using repeat RNAs are essentially unknown, since repeats are usually thought of as transcriptionally inert genomic elements. It is generally not considered that specific types of repeats may be expressed in cancer, despite the abundant literature suggesting that they are commonly hypomethylated during carcinogenesis. In fact, almost all genomic studies mask out the repeat sequences from their analyses, thereby precluding the possibility of discovering aberrations in repeat expression. About half of the human genome consists of repeat sequences of varying sorts, the function of which is largely unknown.
Much attention has been focused recently on the silencing of tumor suppressor genes in cancers by hypermethylation (epigenetics) instead of DNA mutation. However, these studies recognize a major paradox: hypermethylation often occurs in the context of broader genomic hypomethylation, including at centric/pericentric satellites. Despite their abundance, satellite II (Sat II) repeats found within the pericentromere of many chromosomes have no known function in normal cells or in disease. In fact, several studies have noted hypomethylation of Sat II in cancer; however, this is not presumed to have a functional impact and may instead be considered secondary to the clearer functional implications of tumor suppressor gene hypermethylation and silencing. The hypermethylation of some regions of the nucleus in the same cell exhibiting widespread hypomethylation suggests a dramatic imbalance in the epigenome, which may not be explained by simple overexpression or reduction in a biomarker or regulatory factor.
Polycomb group (PcG) proteins are a family of master epigenetic regulators that control most early developmental pathways, primarily through repressive chromatin modifications, and are also involved in the formation and maintenance of constitutive peri/centric satellite heterochromatin. Polycomb repressive complex 2 (PRC2) includes the EZH2 protein, which introduces trimethylation of histone H3 lysine 27, whereas polycomb repressive complex 1 (PRC1) includes BMI-1, RING1B and Phc-1, and promotes histone ubiquitination, DNA compaction and other modifications. In mammalian cells, prominent PcG bodies have previously been described; however, they are widely considered to be part of normal nuclear structure and are currently studied as such, although studies are primarily conducted on cancer cell lines, which are presumed to reflect normal nuclear structure. BMI-1 is a key component of PRC1 linked to cell proliferation, senescence, self-renewal and tumor suppressor gene regulation (Ink4a/Arf), and is over-expressed in several tumor types. Although BMI-1 over-expression is linked to cancer progression and prognosis, its role is complex and currently unresolved, despite intense study.
There still exists a need for cancer biomarkers that can be used for surveillance, recognition and proper classification of different cancers and for designing/evaluating therapeutic interventions.
The invention relates to a first method of diagnosing, or providing a prognostic indicator of, cancer (e.g., metastatic cancer or a cancer selected from breast cancer (e.g., adenocarcinoma, ductal carcinoma, lobular carcinoma, metaplastic carcinoma, and papillary carcinoma), ovarian cancer (e.g., adenocarcinoma and carcinoma (metastatic)), Wilms tumor, multiple myeloma, brain cancer (e.g., glioblastoma), kidney cancer (e.g., renal cell carcinoma), lung cancer (e.g., squamous cell carcinoma), fibrosarcoma, prostate cancer (e.g., adenocarcinoma), stomach cancer (e.g., adenocarcinoma and gastrointestinal stromal tumor (GIST)), thyroid cancer (e.g., papillary carcinoma), bone cancer, colon cancer (e.g., adenocarcinoma), pancreatic cancer (e.g., serous cystadenoma (benign)), or cervical cancer) in a mammal (e.g., a human) by detecting at least one (or two or more) biomarker(s) selected from a satellite II ribonucleic acid (RNA) molecule, a cancer-associated polycomb group (CAP) body, and a cancer-associated satellite transcript (CAST) body in a sample from the mammal. In several embodiments, an increase in the level of expression of the satellite II RNA molecule in a cell of the sample, relative to the level of expression of the satellite II RNA molecule in a normal cell, or abnormal nuclear compartmentalization of the CAP body or the CAST body in a cell of the sample, relative to nuclear compartmentalization of the CAP body or the CAST body in a normal cell, indicates the sample includes at least one (or two or more) cancer cell(s). In another embodiment, the method includes detecting the level of expression of the CAP or CAST body and the satellite II ribonucleic acid (RNA) molecule in the sample.
The invention also relates to a second method for identifying an agent for the treatment of a cancer (e.g., metastatic cancer or a cancer selected from breast cancer (e.g., adenocarcinoma, ductal carcinoma, lobular carcinoma, metaplastic carcinoma, and papillary carcinoma), ovarian cancer (e.g., adenocarcinoma and carcinoma (metastatic)), Wilms tumor, multiple myeloma, brain cancer (e.g., glioblastoma), kidney cancer (e.g., renal cell carcinoma), lung cancer (e.g., squamous cell carcinoma), fibrosarcoma, prostate cancer (e.g., adenocarcinoma), stomach cancer (e.g., adenocarcinoma and gastrointestinal stromal tumor (GIST)), thyroid cancer (e.g., papillary carcinoma), bone cancer, colon cancer (e.g., adenocarcinoma), pancreatic cancer (e.g., serous cystadenoma (benign)), or cervical cancer) in a mammal (e.g., a human) by contacting a cancer cell that includes at least one (or two or more) biomarker(s) selected from a cancer-associated polycomb group (CAP) body, a cancer-associated satellite transcript (CAST) body, or a satellite II RNA molecule with a test agent and determining whether the test agent reduces the level of the biomarker. In an embodiment, the method includes detecting a reduction in the formation of the CAP body or CAST body, or a reduction in expression of the satellite II RNA molecule, in the cancer cell following contact with the test agent, in which a reduction in the level of the biomarker in the cancer cell, relative to the level of the biomarker in a cancer cell not contacted with the test agent, indicates that the test agent is suitable for the treatment of the cancer.
The invention also relates to a third method for determining whether a chemotherapeutic agent increases epigenetic imbalance in a cell(s) of a mammal (e.g., a human) by contacting a sample that includes the cell(s) with a chemotherapeutic agent and determining a level of one (or two or more) biomarker(s) selected from a cancer-associated polycomb group (CAP) body, a cancer-associated satellite transcript (CAST) body, and a satellite II RNA molecule in the cell. In an embodiment, an increase in the level of the biomarker(s) in the cell(s), relative to the level of the biomarker in a cell(s) not contacted with the chemotherapeutic agent, indicates that the chemotherapeutic agent increases epigenetic imbalance in the cell(s). In another embodiment, the increase in the level of the biomarker(s) indicates the chemotherapeutic agent increases a risk of cancer in the mammal (e.g., the increase in the level of the biomarker(s) indicates an increased risk the cancer will become more aggressive).
The invention also relates to a fourth method for diagnosing, or providing a prognostic indicator of, cancer (e.g., metastatic cancer or a cancer selected from breast cancer (e.g., adenocarcinoma, ductal carcinoma, lobular carcinoma, metaplastic carcinoma, and papillary carcinoma), ovarian cancer (e.g., adenocarcinoma and carcinoma (metastatic)), Wilms tumor, multiple myeloma, brain cancer (e.g., glioblastoma), kidney cancer (e.g., renal cell carcinoma), lung cancer (e.g., squamous cell carcinoma), fibrosarcoma, prostate cancer (e.g., adenocarcinoma), stomach cancer (e.g., adenocarcinoma and gastrointestinal stromal tumor (GIST)), thyroid cancer (e.g., papillary carcinoma), bone cancer, colon cancer (e.g., adenocarcinoma), pancreatic cancer (e.g., serous cystadenoma (benign)), or cervical cancer) in a mammal (e.g., a human) by detecting, in a cell present in a sample from the mammal, one or more of a change in the ubiquitination status of histone H2A, the presence of a biomarker selected from a mutant BRCA1 protein that exhibits an impaired ability to monoubiquitylate histone H2A, relative to wild-type BRCA1 protein, or a mutant BRCA1 gene that encodes the mutant BRCA1 protein, or an altered distribution of UbH2A or PRC1 complex, each of which is relative to a normal cell. In preferred embodiments, the change in histone H2A ubiquitination status is altered (e.g., unbalanced) distribution of ubiquitinated histone H2A (UbH2A) relative to a normal cell (e.g., an increase in UbH2A foci relative to UbH2A foci in a normal cell). In another embodiment, the altered distribution of UbH2A is caused by a perturbed distribution of PRC1 complex (or one or more proteins of the PRC1 complex or its associated proteins, such as BMI-1, RING 1B, Phc1, Phc2, CBX4, CBX8, RNF2, GLI1, MYC, CDKN2A, and HST2H2AC), which is known to mediate recruitment of UbH2A to heterochromatin.
The invention also relates to a fifth method for screening an agent for efficacy in a treatment of a cancer in a mammal (e.g., a human) by contacting the agent to either: a) a cell (e.g., a cancer cell) that includes a biomarker selected from a mutant BRCA1 protein that exhibits an impaired ability to monoubiquitylate histone H2A, relative to wild-type BRCA1 protein, or a mutant BRCA1 gene that encodes the mutant BRCA1 protein; or b) a cell (e.g., a cancer cell) that exhibits, as a biomarker, a decreased level of monoubiquitylated histone H2A, relative to, e.g., a wild-type BRCA1-expressing cell, and determining whether the agent increases the monoubiquitylation of histone H2A in the cell.
The invention also relates to a sixth method for determining whether a chemotherapeutic agent increases epigenetic imbalance in a cell (e.g., a non-cancer cell) of a mammal (e.g., a human) by contacting the cell with the chemotherapeutic agent and determining a level of monoubiquitylation of histone H2A as a biomarker in the cell. A determination that the chemotherapeutic agent decreases the level of monoubiquitylation of histone H2A in the cell, relative to a cell not contacted with the chemotherapeutic agent, indicates that the chemotherapeutic agent causes an increase in epigenetic imbalance and should not be administered to the mammal as a treatment of cancer.
The invention further relates to a seventh method for diagnosing, or providing a prognostic indicator of, cancer (e.g., metastatic cancer or a cancer selected from breast cancer (e.g., adenocarcinoma, ductal carcinoma, lobular carcinoma, metaplastic carcinoma, and papillary carcinoma), ovarian cancer (e.g., adenocarcinoma and carcinoma (metastatic)), Wilms tumor, multiple myeloma, brain cancer (e.g., glioblastoma), kidney cancer (e.g., renal cell carcinoma), lung cancer (e.g., squamous cell carcinoma), fibrosarcoma, prostate cancer (e.g., adenocarcinoma), stomach cancer (e.g., adenocarcinoma and gastrointestinal stromal tumor (GIST)), thyroid cancer (e.g., papillary carcinoma), bone cancer, colon cancer (e.g., adenocarcinoma), pancreatic cancer (e.g., serous cystadenoma (benign)), or cervical cancer) in a mammal (e.g., a human) by detecting, as a biomarker, the ubiquitination status of histone H2A and/or the distribution of a heterochromatic marker (e.g., ubiquitinated histone H2A (UbH2A), H3K27me, H3K9me2, HP1, H4K20me, loss of H3K4me, loss of H4Ac, DNA methylation (5-mC), and macroH2A) in a cell of the mammal. In an embodiment, the distribution of the heterochromatic marker is unbalanced (e.g., prominent foci of the heterochromatic marker (e.g., one or more of UbH2A, H3K27me, H3K9me2, HP1, H4K20me, loss of H3K4me, loss of H4Ac, DNA methylation (5-mC), and macroH2A) are apparent in a cell of the mammal suspected of being a cancer cell (e.g., within the same nucleus some regions exhibit prominent foci of the heterochromatic marker (e.g., one or more of UbH2A, H3K27me, H3K9me2, HP1, H4K20me, loss of H3K4me, loss of H4Ac, DNA methylation (5-mC), and macroH2A) and other regions exhibit little to no foci), but not in normal cells). In yet another embodiment, an unbalanced distribution of the heterochromatic marker can be determined upon visual detection using, e.g., a microscope, or using an automated system (e.g., quantification using an automated platform). 
The method can be performed using, e.g., chromatin immunoprecipitation (ChIP) or a ChIP sequencing (ChIP-Seq) assay. The presence of a cancer cell in the sample can be based upon the observation of a characteristic “patchy” (much less evenly distributed) pattern in the nucleus of the cell. Thus, the overall distribution of a heterochromatic marker (e.g., one or more of UbH2A, H3K27me, H3K9me2, HP1, H4K20me, loss of H3K4me, loss of H4Ac, DNA methylation (5-mC), and macroH2A) shows “imbalance” in the nucleus, which may impact a variety of other genes and regulator proteins (tumor suppressors, oncogenes, etc.) in the cell. In another embodiment, the unbalanced heterochromatic marker (e.g., one or more of UbH2A, H3K27me, H3K9me2, HP1, H4K20me, loss of H3K4me, loss of H4Ac, DNA methylation (5-mC), and macroH2A) is present on Sat II at 1q12 and/or 16q11. In still other embodiments, detection of an imbalance of a heterochromatic marker (e.g., one or more of UbH2A, H3K27me, H3K9me2, HP1, H4K20me, loss of H3K4me, loss of H4Ac, DNA methylation (5-mC), and macroH2A) in the nucleus indicates the likelihood of a cancer cell (e.g., a cell that exhibits uncontrolled growth, metastasis, drug resistance, etc.) in the sample or the likelihood that a cell in the patient will progress to a cancer state (e.g., an aggressive cancer state). In another embodiment, the method is performed using a sample that includes at least one cell from a subject at risk of cancer. In a preferred embodiment, the method includes the use of a microarray to detect the ubiquitination status of H2A and/or the distribution of the heterochromatic marker (e.g., one or more of UbH2A, H3K27me, H3K9me2, HP1, H4K20me, loss of H3K4me, loss of H4Ac, DNA methylation (5-mC), and macroH2A) in a cell of the subject.
In yet another embodiment, the detection of the heterochromatic marker (e.g., one or more of UbH2A, H3K27me, H3K9me2, HP1, H4K20me, loss of H3K4me, loss of H4Ac, DNA methylation (5-mC), and macroH2A) in a cell of a subject, relative to the distribution of the heterochromatic marker (e.g., one or more of UbH2A, H3K27me, H3K9me2, HP1, H4K20me, loss of H3K4me, loss of H4Ac, DNA methylation (5-mC), and macroH2A) in a normal cell, is determined using an antibody that specifically binds to the heterochromatic marker (e.g., one or more of UbH2A, H3K27me, H3K9me2, HP1, H4K20me, loss of H3K4me, loss of H4Ac, DNA methylation (5-mC), and macroH2A). In another embodiment, detection of a “patchy” distribution of the heterochromatic marker (e.g., one or more of UbH2A, H3K27me, H3K9me2, HP1, H4K20me, loss of H3K4me, loss of H4Ac, DNA methylation (5-mC), and macroH2A), as seen by, e.g., ChIP, in a cell of a subject, relative to the distribution of the heterochromatic marker (e.g., one or more of UbH2A, H3K27me, H3K9me2, HP1, H4K20me, loss of H3K4me, loss of H4Ac, DNA methylation (5-mC), and macroH2A) in a normal cell, indicates the mammal has a cancer.
The invention also relates to an eighth method for detecting epigenetic imbalance in a cell present in a sample from a mammal (e.g., a human) by determining a copy number of a satellite II DNA locus at chromosome 1q12 in the cell or the level of polycomb proteins on a satellite II DNA locus at chromosome 1q12 in the cell. In an embodiment, an increase in the copy number of, or the amount of polycomb protein on, the satellite II DNA locus at chromosome 1q12 in the cell indicates the cell has epigenetic imbalance. In another embodiment, detection of the epigenetic imbalance in the cell indicates an increased risk of cancer in the mammal.
The invention also relates to a ninth method for diagnosing, or providing a prognostic indicator of, immunodeficiency, centromeric region instability, and facial anomalies syndrome (ICF), which is a rare chromosome breakage disease caused by mutations in the methyl transferase DNMT3B enzyme. The diagnostic characteristics of ICF are agammaglobulinemia with B cells as well as DNA rearrangements targeted to the centromere-adjacent heterochromatic region (qh) of chromosomes 1, 16, and sometimes 9 in mitogen-stimulated lymphocytes. These rearrangement-prone regions show DNA hypomethylation in all examined ICF cell populations. The method includes detecting CAP body formation, as a biomarker, in a cell present in a sample from a mammal (e.g., a human). In an embodiment, CAP body formation is due to demethylation of Sat II DNA on 1q12. In another embodiment, detection of CAP body formation in a cell of the mammal indicates that the mammal has ICF.
In embodiments of the first, second, third, seventh, eighth, and ninth methods, the method further includes detecting, in a cell of the sample, a biomarker selected from one or more of a) an unbalanced distribution of one or more polycomb proteins (resulting in, e.g., an impaired ability to monoubiquitylate histone H2A or an unbalanced distribution of heterochromatic markers), relative to the distribution in a normal cell; b) an unbalanced distribution of a heterochromatic marker (e.g., one or more of monoubiquitylated histone H2A, H3K27me, H3K9me2, HP1, H4K20me, loss of H3K4me, loss of H4Ac, DNA methylation (5-mC), and macroH2A) in the nucleus of a cell in the sample, relative to the distribution in a normal cell (e.g., an increase or decrease in the amount of the heterochromatic marker present in the nucleus, or a redistribution of the heterochromatic marker into prominent foci that are, e.g., largely absent in normal cells); and c) a mutant BRCA1 protein that exhibits an impaired ability to monoubiquitylate histone H2A, relative to wild-type BRCA1 protein, or a mutant BRCA1 gene that encodes the mutant BRCA1 protein, relative to a normal cell. In an embodiment, the detecting step includes, e.g., detecting the distribution, level, or presence of the biomarker(s).
In embodiments of the first, second, third, and ninth methods, the CAP body includes a satellite II deoxyribonucleic acid (DNA) molecule and/or the CAP body includes a polycomb group protein (e.g., the polycomb group protein is a PRC1 or PRC2 complex protein; in particular, the PRC1 complex protein is selected from BMI-1, RING 1B, Phc1, Phc2, CBX4, CBX8, and RNF2 or the PRC2 complex protein is one or more of SUZ12, EED, RBBP4, JARID2, EZH2, EZH1, and RBBP7) or a protein that interacts with the PRC1 complex (e.g., GLI1, MYC, CDKN2A, and HST2H2AC). In other embodiments, the CAP body is present at the 1q12 or 16q11 DNA locus in the nucleus of cell(s) of the sample.
In an embodiment of the first, second, and third methods, the detection of satellite II RNA is by direct visual analysis of cell(s) by microscopy following binding of a detection reagent (e.g., a labeled nucleic acid or LNA probe) to satellite II RNA in the cell(s) of the sample. In another embodiment, the detection of satellite II RNA includes quantifying the amount present in the nucleus of a cell(s) of the sample or its distribution within the nucleus. In still other embodiments, the satellite II RNA is quantified by digital microfluorimetry. In yet other embodiments, the amount of satellite II RNA detected in a cancer cell is at least 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 fold higher than in a normal cell, more preferably 15, 20, 25, 30, 35, 40, 45, or 50 fold higher than in a normal cell, and most preferably 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 250, 300, or 350 fold or more higher than in a normal cell (e.g., about 175 fold higher than in a normal cell). In an embodiment, the prominent aberrant foci of satellite II RNA are a unique “signature” of cancer cells, which can mark even a single cancer cell as distinct from normal, by direct visual analysis or quantitative digital microscopy.
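The fold-increase comparison described above can be illustrated with a minimal sketch. The function name, the use of a mean over nuclei, and the intensity values are assumptions made for illustration only; the disclosure does not prescribe a particular computation.

```python
from statistics import mean


def fold_increase(sample_signals, normal_signals):
    """Return the mean sample signal divided by the mean normal signal.

    Inputs are hypothetical per-nucleus satellite II RNA intensities
    (arbitrary units), e.g., from digital microfluorimetry.
    """
    if not sample_signals or not normal_signals:
        raise ValueError("both groups need at least one measurement")
    baseline = mean(normal_signals)
    if baseline <= 0:
        raise ValueError("normal baseline must be positive")
    return mean(sample_signals) / baseline


# Hypothetical values chosen only to illustrate the arithmetic.
normal = [2.0, 2.0, 2.0]
sample = [350.0, 350.0, 350.0]
ratio = fold_increase(sample, normal)
print(f"satellite II RNA signal is {ratio:.0f} fold higher than normal")
```

A threshold on the returned ratio (for example, one of the fold values recited above) could then be applied by the user; the choice of threshold is left open here.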
In other embodiments of the first, second, third, fourth, fifth, sixth, seventh, eighth, and ninth methods, the difference in signal (CAP, CAST and UbH2A) between cancer and normal cells can be reduced to two parameters that are clearly visible by eye and/or can be easily quantified by one with skill in the art: “distribution” and “intensity.” The distribution of these biomarkers is visibly different in cancer cells and readily differentiates cancer cells from normal cells (e.g., in in vitro, in situ, and ChIP results). The highest intensity signal (pixel intensity by microscopy, and peak height for ChIP) in a cancer nucleus is higher than any signal in a normal cell for these marks and can be quantified (as discussed above).
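One way to picture the two parameters is the following sketch, which summarizes the pixel intensities of a single nucleus. It assumes that peak pixel value stands in for “intensity” and that the coefficient of variation stands in for “distribution” (patchiness); both choices, the function name, and the toy pixel grids are illustrative assumptions rather than the method of the disclosure.

```python
from statistics import mean, pstdev


def nucleus_summary(pixels):
    """Return (peak intensity, coefficient of variation) for nuclear pixels.

    `pixels` is a 2-D list of non-negative intensities inside one nucleus.
    A higher coefficient of variation means a less even ("patchy") signal.
    """
    values = [v for row in pixels for v in row]
    if not values:
        raise ValueError("no pixels supplied")
    avg = mean(values)
    peak = max(values)
    cv = pstdev(values) / avg if avg > 0 else 0.0
    return peak, cv


# Two hypothetical nuclei with the same mean signal but different patterns.
even = [[10, 10], [10, 10]]     # evenly distributed signal
patchy = [[0, 0], [0, 40]]      # same total signal concentrated in one focus

even_peak, even_cv = nucleus_summary(even)
patchy_peak, patchy_cv = nucleus_summary(patchy)
print(even_peak, even_cv, patchy_peak, round(patchy_cv, 2))
```

The two toy nuclei have identical mean signal, so only the peak and the spread separate them, which is the point of reporting distribution and intensity rather than a single average.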
In other embodiments of the first, second, and third methods, the CAST body includes the satellite II ribonucleic acid (RNA) molecule, e.g., a cytosine methylated satellite II RNA molecule, and/or the CAST body includes proteins containing an RNA binding domain and/or proteins that are involved in RNA metabolism, such as a methyl DNA binding protein (e.g., the methyl DNA binding protein is methyl CpG (cytosine phosphate guanine) binding protein 2 (MeCP2)), a protein known to interact with MeCP2 (e.g., one or more of SIN3A, CDKL5, DNMT1, HDAC1, ATRX, DNMT3B, SMARCA2, DLX5, BDNF, and UBE3A), or a protein known to become sequestered on similar repeat RNA aggregates in microsatellite repeat diseases (e.g., one or more of MBNL 1, 2, and 3, hnRNP H, G, A, and K, proteasome 20Sα, 11Sγ and 11Sα subunits, Y12, Y14, 9G8, snRNP Sm antigen, SAM68, SLM 1 and 2, Tra2β, Purα, and CPEB proteins).
In other embodiments of the first, second, and third methods, the CAST body includes an alpha-satellite RNA.
In embodiments of the first, second, third, fourth, fifth, sixth, seventh, eighth, and ninth methods, the method may include detecting the biomarker(s) using a serum screen or detecting one or more of the biomarker(s) (e.g., the satellite II RNA molecule or the UbH2A) using reverse transcriptase polymerase chain reaction (RT-PCR; e.g., quantitative real-time PCR), a microarray, a deep sequencing assay (e.g., a ChIP-Seq assay), or microscopy. The satellite II RNA molecule detection assay may utilize a nucleic acid molecule or a locked-nucleic acid (LNA) oligo as a probe (unbound or bound to a solid support). In other embodiments, the method may involve detecting the satellite II RNA molecule using a probe having at least 50% (e.g., 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 99%, or 100%) sequence identity (preferably 80% or more sequence identity) over at least 20 or more (e.g., 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 or more) consecutive nucleotides of one or more of SEQ ID NOs: 14 to 28. In an embodiment, the probe is capable of specifically hybridizing under stringent conditions to a nucleic acid molecule having the sequence of one or more of SEQ ID NOs: 14-28. In an embodiment, the detecting step includes, e.g., detecting one or more of the distribution, level, or presence of the biomarker(s) in the nucleus of at least one cell in the sample.
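The identity criterion over a run of consecutive nucleotides can be sketched as a simple sliding-window comparison. The sketch assumes an ungapped, same-strand comparison of a probe at least 20 nucleotides long; the function name is hypothetical, and the sequences are made-up placeholders, not any of SEQ ID NOs: 14 to 28.

```python
def best_window_identity(probe, target, min_len=20):
    """Best ungapped percent identity of `probe` against any window of
    `target` that has the same length as the probe."""
    probe, target = probe.upper(), target.upper()
    n = len(probe)
    if n < min_len:
        raise ValueError(f"probe must be at least {min_len} nt")
    if len(target) < n:
        raise ValueError("target is shorter than the probe")
    best = 0.0
    for start in range(len(target) - n + 1):
        window = target[start:start + n]
        matches = sum(a == b for a, b in zip(probe, window))
        best = max(best, 100.0 * matches / n)
    return best


# Placeholder sequences for illustration only.
target = "GGATTCCATTCCATTCCATTCCATTCCTT"
probe = "ATTCCATTCCATTCCATTCG"  # 20 nt, one mismatch at the last position

identity = best_window_identity(probe, target)
print(f"best identity over {len(probe)} consecutive nt: {identity:.1f}%")
```

A real comparison would normally use a gapped alignment program and would also consider the complementary strand; this sketch only shows how a percentage over a fixed-length window is obtained.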
In still other embodiments of the first, second, third, fourth, fifth, sixth, seventh, eighth, and ninth methods, the method may include detecting the biomarker(s) (e.g., detecting one or more of the distribution, level, or presence of the biomarker(s)) using radioimmunoassay (RIA), enzyme-linked immunosorbent assay (ELISA), immunoblotting, immunoprecipitation, or microscopy (e.g., the microscopy is in situ fluorescence microscopy, such as immunofluorescence microscopy, indirect-immunofluorescence, immunocytochemistry, or immunohistochemistry). In another embodiment, the method may include detecting the CAP body using microscopy (e.g., the microscopy is in situ fluorescence microscopy, such as immunofluorescence microscopy, indirect-immunofluorescence, immunocytochemistry, or immunohistochemistry). Immunoprecipitation used in either method may be chromatin immunoprecipitation (e.g., the chromatin immunoprecipitation may include one or more of the following steps: digesting the genome of the cell(s) in the sample, contacting an antibody that specifically binds one or more proteins of the CAP body to the digested genome in the sample, separating an antibody/CAP body/chromatin complex that includes DNA from the sample, and/or sequencing the DNA from the antibody/CAP body/chromatin complex, in which the presence of a satellite II DNA sequence within the antibody/CAP body/chromatin complex indicates the sample includes the cancer cell(s)).
In still other embodiments, the immunoprecipitation used in the method may include one or more of the following steps: digesting the genome of the cell(s) in the sample, contacting a nucleic acid molecule complementary to and specific for a satellite II DNA sequence to the digested genome to form a hybridization complex, separating the hybridization complex from the sample, and/or contacting one or more components of the hybridization complex with an antibody that specifically binds to one or more proteins of the CAP body (e.g., binding of the antibody to one or more of the proteins of said CAP body indicates the sample includes the cancer cell(s)). The methods can also include quantification of the amount of the biomarker(s), e.g., using an automated pathology platform. The quantification may be digital quantification.
In other embodiments of the first, second, and third methods, the method may include detecting the satellite II RNA molecule or the alpha-satellite RNA molecule in the sample using a method selected from a microarray, RNA fluorescence in situ hybridization (FISH), northern blot, polymerase chain reaction (PCR), RNA sequencing, and microscopy. In still other embodiments of the first, second, third, and ninth methods, detecting the satellite II DNA molecule in the sample may include a method selected from a microarray, DNA fluorescence in situ hybridization (FISH), Southern blot, polymerase chain reaction (PCR), DNA sequencing, and microscopy. In an embodiment, the detecting step includes, e.g., one or more of detecting the distribution, level, or presence of the biomarker(s).
In yet other embodiments of the first, second, third, fourth, fifth, sixth, seventh, and ninth methods, the biomarker(s) is detected with one or more antibodies (e.g., one or more antibodies to at least one CAP body protein, at least one CAST body protein, or at least one heterochromatic marker (e.g., one or more of histone H2A, H3K27me, H3K9me2, HP1, H4K20me, loss of H3K4me, loss of H4Ac, DNA methylation (5-mC), and macroH2A)). In other embodiments, the methods include detection of at least two proteins (e.g., three, four, five or more proteins) of the CAP or CAST bodies using two antibodies (or a number of antibodies commensurate with the number of proteins to be detected), each of which is capable of specifically binding to a different CAP or CAST body protein. For example, detection of the CAP or CAST bodies may include the use of a first antibody that is capable of specifically binding to a first protein in the CAP or CAST body, and a second antibody that is capable of specifically binding to a second, different protein in the CAP or CAST body. 
In particular embodiments, the methods include the use of, e.g., one or more (e.g., two, three, four, five, or more) antibodies that specifically bind one or more of the polycomb group protein(s) of the CAP body, such as the PRC1 or PRC2 complex protein(s) or their associated protein(s) (for example, one or more of BMI-1, RING 1B, Phc1, Phc2, CBX4, CBX8, RNF2, SUZ12, EED, RBBP4, JARID2, EZH2, EZH1, RBBP7, GLI1, MYC, CDKN2A, or HST2H2AC), or one or more (e.g., two, three, four, five, or more) antibodies that specifically bind one or more proteins of the CAST body (for example, one or more of MeCP2, SIN3A, CDKL5, DNMT1, HDAC1, ATRX, DNMT3B, SMARCA2, DLX5, BDNF, UBE3A, MBNL 1, 2, and 3, hnRNP H, G, A, and K, proteasome 20Sα, 11Sγ and 11Sα subunits, Y12, Y14, 9G8, snRNP Sm antigen, SAM68, SLM 1 and 2, Tra2β, Purα, or CPEB proteins), or one or more (e.g., two, three, four, five, or more) antibodies that specifically bind histone H2A.
In other embodiments of the first, second, and third methods, the satellite II RNA molecule or the alpha-satellite RNA molecule is detected using a probe (e.g., a probe having a sequence with at least 50% (e.g., 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 99%, or 100%) sequence identity (preferably 80% or more sequence identity) to a sequence that is complementary to, and specific for, a Sat II RNA, such as a probe selected from Sat2-24 nt LNA, Sat2-24 nt, Sat2-59 nt, and Sat2-169 bp, or a probe having a sequence with at least 50% (e.g., 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 99%, or 100%) sequence identity (preferably 80% or more sequence identity) to a sequence that is complementary to, and specific for, an alpha-satellite RNA, such as HuAlphaSat). In other embodiments, the probe has a sequence with at least 80% sequence identity to the sequence of SEQ ID NOs: 2 to 10, or its complement. In still other embodiments, the probe includes a sequence having at least 50% (e.g., 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 99%, or 100%) sequence identity (preferably 80% or more sequence identity) to a sequence of at least 20 consecutive nucleotides (e.g., at least 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, or more, or the entire sequence) set forth in SEQ ID NOs: 14 to 28. In another embodiment, the probe is capable of specifically hybridizing under stringent conditions to a nucleic acid molecule having the sequence of one or more of SEQ ID NOs: 14-28. In yet another embodiment, the probe is an LNA probe. The LNA probe optionally has at least 50% (e.g., 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 99%, or 100%) sequence identity to the complement of the target nucleic acid molecule sequence. In other embodiments, hybridization of the probe to the satellite II RNA molecule or the alpha-satellite RNA molecule is detected by microscopy.
In other embodiments of the first, second, third, fourth, fifth, sixth, seventh, eighth, and ninth methods, the sample includes an organ, tissue, cell, bodily fluid (e.g., saliva, serum, plasma, blood, urine, mucus, gastric juices, pancreatic juices, semen, products of lactation or menstruation, tears, or lymph), lavage (e.g., a bronchoalveolar lavage, a gastric lavage, a peritoneal lavage, a vaginal lavage, a colonic or rectal lavage, an arthroscopic lavage, a ductal lavage, or an ear lavage), skin, hair, or fecal matter from the mammal.
By “sequence identity” or “sequence similarity” is meant that the identity or similarity between two or more amino acid sequences, or two or more nucleotide sequences, is expressed in terms of the identity or similarity between the sequences. Sequence identity can be measured in terms of percentage identity; the higher the percentage, the more identical the sequences are. Sequence similarity can be measured in terms of percentage similarity (which takes into account conservative amino acid substitutions); the higher the percentage, the more similar the sequences are. Homologs or orthologs of nucleic acid or amino acid sequences possess a relatively high degree of sequence identity/similarity when aligned using standard methods.
Methods of alignment of sequences for comparison are well known in the art. Various programs and alignment algorithms are described in: Smith & Waterman, Adv. Appl. Math. 2:482, 1981; Needleman & Wunsch, J. Mol. Biol. 48:443, 1970; Pearson & Lipman, Proc. Natl. Acad. Sci. USA 85:2444, 1988; Higgins & Sharp, Gene, 73:237-44, 1988; Higgins & Sharp, CABIOS 5:151-3, 1989; Corpet et al., Nuc. Acids Res. 16:10881-90, 1988; Huang et al. Computer Appls. in the Biosciences 8, 155-65, 1992; and Pearson et al., Meth. Mol. Bio. 24:307-31, 1994. Altschul et al., J. Mol. Biol. 215:403-10, 1990, presents a detailed consideration of sequence alignment methods and homology calculations.
The NCBI Basic Local Alignment Search Tool (BLAST) (Altschul et al., J. Mol. Biol. 215:403-10, 1990) is available from several sources, including the National Center for Biotechnology Information (NCBI, National Library of Medicine, Building 38A, Room 8N805, Bethesda, Md. 20894) and on the Internet, for use in connection with the sequence analysis programs blastp, blastn, blastx, tblastn and tblastx. These software programs match similar sequences by assigning degrees of homology to various substitutions, deletions, and other modifications. Conservative substitutions typically include substitutions within the following groups: glycine, alanine; valine, isoleucine, leucine; aspartic acid, glutamic acid, asparagine, glutamine; serine, threonine; lysine, arginine; and phenylalanine, tyrosine. Additional information can be found at the NCBI web site.
BLASTN is used to compare nucleic acid sequences, while BLASTP is used to compare amino acid sequences. To compare two nucleic acid sequences, the options of Bl2seq can be set as follows: −i is set to a file containing the first nucleic acid sequence to be compared (such as C:\seq1.txt); −j is set to a file containing the second nucleic acid sequence to be compared (such as C:\seq2.txt); −p is set to blastn; −o is set to any desired file name (such as C:\output.txt); −q is set to −1; −r is set to 2; and all other options are left at their default setting. For example, the following command can be used to generate an output file containing a comparison between two sequences: C:\Bl2seq −i c:\seq1.txt −j c:\seq2.txt −p blastn −o c:\output.txt −q −1 −r 2.
To compare two amino acid sequences, the options of Bl2seq can be set as follows: −i is set to a file containing the first amino acid sequence to be compared (such as C:\seq1.txt); −j is set to a file containing the second amino acid sequence to be compared (such as C:\seq2.txt); −p is set to blastp; −o is set to any desired file name (such as C:\output.txt); and all other options are left at their default setting. For example, the following command can be used to generate an output file containing a comparison between two amino acid sequences: C:\Bl2seq −i c:\seq1.txt −j c:\seq2.txt −p blastp −o c:\output.txt. If the two compared sequences share homology, then the designated output file will present those regions of homology as aligned sequences. If the two compared sequences do not share homology, then the designated output file will not present aligned sequences.
Once aligned, the number of matches is determined by counting the number of positions where an identical amino acid or nucleotide residue is presented in both sequences. The percent sequence identity is determined by dividing the number of matches either by the length of the sequence set forth in the identified sequence, or by an articulated length (such as 100 consecutive nucleotides or amino acid residues from a sequence set forth in an identified sequence), followed by multiplying the resulting value by 100. For example, a nucleic acid sequence that has 1166 matches when aligned with a test sequence having 1554 nucleotides is 75.0 percent identical to the test sequence (i.e., 1166÷1554*100=75.0). The length value will always be an integer. For polypeptides, the length of comparison sequences will generally be at least 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 50, 75, 90, 100, 150, 200, 250, 300, or 350 contiguous amino acids. For nucleic acids, the length of comparison sequences will generally be at least 5 contiguous nucleotides, preferably at least 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25 contiguous nucleotides, and most preferably the full length nucleotide sequence.
By “specifically binds” is meant the preferential association of a binding moiety (e.g., an antibody or fragment thereof) to a target molecule (e.g., a polycomb group protein of the CAP body, such as a PRC1 or PRC2 complex protein or an associated protein (for example, BMI-1, RING 1B, Phc1, Phc2, CBX4, CBX8, RNF2, SUZ12, EED, RBBP4, JARID2, EZH2, EZH1, RBBP7, GLI1, MYC, CDKN2A, and HST2H2AC), a protein of the CAST body (for example, MeCP2, SIN3A, CDKL5, DNMT1, HDAC1, ATRX, DNMT3B, SMARCA2, DLX5, BDNF, UBE3A, MBNL 1, 2, and 3, hnRNP H, G, A, and K, proteasome 20Sα, 11Sγ and 11Sα subunits, Y12, Y14, 9G8, snRNP Sm antigen, SAM68, SLM 1 and 2, Tra2β, Purα, and CPEB protein), or histone H2A) in a sample (e.g., a biological sample) or in vivo or ex vivo.
It is recognized that a certain degree of non-specific interaction may occur between a binding moiety and a non-target molecule. Nevertheless, specific binding may be distinguished as mediated through specific recognition of the target molecule. Specific binding results in a stronger association between the binding moiety (e.g., an antibody or fragment thereof) and, e.g., an antigen (e.g., a CAP body protein, a CAST body protein, or histone H2A) than between the binding moiety and, e.g., a non-target molecule (e.g., a non-CAP body protein, a non-CAST body protein, or non-histone H2A protein). For example, an antibody specifically binds if it has, e.g., at least 2-fold greater affinity (e.g., 2-, 3-, 4-, 5-, 6-, 7-, 8-, 9-, 10-, 10²-, 10³-, 10⁴-, 10⁵-, 10⁶-, 10⁷-, 10⁸-, 10⁹-, or 10¹⁰-fold greater affinity) to an epitope of a CAP body protein, a CAST body protein, or histone H2A than to polypeptides other than a CAP body protein, a CAST body protein, or histone H2A.
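The percent sequence identity arithmetic defined above (matches divided by a length, multiplied by 100) can be illustrated with a minimal Python sketch. The function names `count_matches` and `percent_identity` are hypothetical, the inputs are assumed to be already-aligned strings of equal length with `-` as the gap character, and rounding to one decimal place is an assumption made here for display only; none of this is prescribed by the text.

```python
def count_matches(aligned_a: str, aligned_b: str) -> int:
    """Count aligned positions holding the same residue in both sequences.

    Assumes both strings come from an existing alignment (equal length,
    '-' marks a gap). Gap positions are never counted as matches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    return sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-")


def percent_identity(matches: int, length: int) -> float:
    """Percent identity = matches / length * 100.

    `length` is either the length of the identified sequence or an
    articulated length; it must be a positive integer.
    Rounding to one decimal place is an illustrative assumption.
    """
    if length <= 0:
        raise ValueError("length must be a positive integer")
    if matches < 0 or matches > length:
        raise ValueError("matches must lie between 0 and length")
    return round(matches / length * 100, 1)


# 1166 matches over a 1554-nucleotide sequence
example = percent_identity(1166, 1554)
```

For short probes the same two functions apply directly, for instance `percent_identity(count_matches("ATTCC", "ATTGC"), 5)` for a 5-nucleotide comparison.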
By “stringent conditions” is meant conditions under which an oligonucleotide probe will selectively or specifically hybridize to its target sequence (e.g., a satellite II RNA or DNA sequence), typically in a complex mixture of nucleic acids, but to no other sequences. Stringent conditions are sequence-dependent and length-dependent. Generally, stringent conditions are selected to be about 5° C. to about 25° C. lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH. Stringent conditions may also include destabilizing agents, such as formamide. For selective or specific hybridization, a positive signal is at least two times background, preferably 10 times background hybridization. Exemplary stringent conditions include: 50% formamide, 4×SSC, and 1% SDS, incubating at 42° C.; and 4×SSC, 1% SDS, incubating at 65° C., with wash in 0.2×SSC, and 0.1% SDS at 65° C. Hybridization techniques are generally described in Nucleic Acid Hybridization, A Practical Approach (eds. B. D. Hames and S. J. Higgins, IRL Press, 1985); Tijssen, “Overview of principles of hybridization and the strategy of nucleic acid assays” in Laboratory Techniques in Biochemistry and Molecular Biology: Hybridization with Nucleic Probes (ed. P. C. van der Vliet, Elsevier Science Publishers B.V., 1993); PCR Protocols, A Guide to Methods and Applications (eds. M. A. Innis et al., Academic Press, Inc., New York, 1990); Gall and Pardue, Proc. Natl. Acad. Sci., USA 63:378-383, 1969; and John et al., Nature 223:582-587, 1969.
Other features and advantages of the invention will be apparent from the following Detailed Description, the drawings, and the claims.
Currently pathologists rely on changes in nuclear morphology to facilitate diagnosis of many cancers, but this is a relatively crude assay. Our discovery is that prominent nuclear accumulations of Sat II RNA are a common property of cancer cells in vitro and in vivo, reflecting compromised heterochromatic silencing in cancer cells, and that these RNA accumulations are capable of sequestering large amounts of regulatory proteins, which may further affect the cancer epigenome. Thus, we discovered that the mis-regulation of satellite RNAs is a characteristic “signature” of cancer cells. Our discovery suggests that gross over-expression of certain repeat RNAs is a common and robust manifestation of cancer cells, which differentiates them from normal cells. This usually involves the over-expression of satellite II (Sat II) RNA primarily, but there are also indications that other satellite sequences may be mis-regulated in cancers as well, such as alpha-satellite RNA.
Thus, a first aspect of the invention features the use of Sat II RNA as a biomarker for diagnosing cancer (e.g., metastatic cancer) in a mammal (e.g., a human).
The abundant Sat II repeat transcripts seen in cancer cells are not just inert by-products of epigenetic dysregulation, but can contribute to further imbalance of the epigenome. We find that Sat II RNA foci are associated with large amounts of the methyl-DNA binding protein MeCP2 in cancer cells. This suggestion that abnormal conglomerations of repeat RNAs could “compartmentalize” nuclear factors, and thereby potentially impact expression of other genes, has strong precedent in the “toxic repeat RNAs” of certain triplet repeat diseases. Nuclear accumulations of mRNA containing CUG repeats sequester MBNL1, an alternative splicing factor, causing inappropriate splicing patterns that generate the Myotonic Dystrophy (DM1) phenotype. It is also notable that MeCP2, like MBNL1, is implicated in alternative splicing, and is also frequently altered in cancer. We reason that the abundant Sat II RNAs in cancer nuclei may have as much or more capacity to “soak up” regulatory factors as do the repeat-containing RNAs in DM1.
We conducted a broad survey of Cot-1 repeat RNA expression and distribution in human interphase nuclei. While competition with unlabelled Cot-1 DNA (repetitive genomic fraction) is often used to suppress hybridization to repeats, instead we labeled human Cot-1 DNA as a probe to examine the distribution of transcripts from the repeat genome by RNA FISH. In 2002 we were the first to publish that hybridization to Cot-1 RNA provides a convenient assay to evaluate chromosome inactivation within nuclei, and in 2007 used it in a manuscript to reveal breakdown of the peripheral heterochromatic compartment in cancer cells. However, the discovery that repeat RNAs were aberrantly expressed in cancer began when we initially observed large localized foci of Cot-1 RNA in several cancer cell lines in 2002, which were largely absent in normal cells. Because Cot-1 DNA is a complex probe containing several major classes of repeats, in 2005 we began to use probes to specific repeats to better define the content of these large Cot-1 RNA foci.
We found that interspersed repeats like long interspersed elements (LINEs) or short interspersed elements (SINEs) were not responsible for the large repeat RNA foci, and alpha-satellite accounted for some foci in only a few lines, but the majority of Cot-1 RNA foci in most cancer cell lines are composed primarily of Sat II RNAs. A survey of cell lines shows that several cancer lines, representing different types of cancers (see Tables 2-6 below), exhibit prominent foci of Sat II RNA in the vast majority (70-100%) of cells, while none of the normal lines did. Prominent foci of alpha-sat RNA were also observed in some of the cancer tissues (see, e.g., Tables 3 and 4 below), but not in matched normal tissue. Evidence also suggests this is single-stranded and non-polyadenylated RNA, and shows some expression from the “reverse” strand. Similarly, although RNA preservation was often compromised in human primary samples, we also find large Sat II RNA foci in 5 of 6 malignant human effusions and 0 of 3 benign effusions, and in 5 of 6 solid human tumor samples (from breast, kidney, ovary and pancreas), while neither the 3 matched normal samples nor the normal cell types present in the tumor samples had them. Several cancer tissues tested also exhibited prominent foci of Sat II DNA and its associated proteins (see Table 4 below). Thus, we find that gross over-expression of satellite RNAs, and the presence of bodies associated with Sat II DNA, is a common and previously unrecognized “hallmark” of many cancers.
The Sat II RNA over-expression itself provides a potentially useful biomarker, and indicator of heterochromatic instability, but these repeat RNAs would clearly have additional significance if they actually impact the cell and/or epigenome in some way, like the “toxic repeat RNAs” in certain triplet repeat expansion diseases (see above). We find that the DNA methyl binding protein, MeCP2, which plays a role in mRNA processing and splice site recognition and shows altered expression in cancer, sharply accumulates in several bright nuclear foci in cancer cells, distinct from the more dispersed and punctate distribution in normal cells. Co-staining showed that MeCP2 foci do not overlap the Sat II DNA, but rather strictly co-localize with Sat II RNA. In most cells every Sat II RNA focus coincides precisely with an MeCP2 focus both in vitro and in vivo. The MeCP2 foci in primary tumor samples are particularly striking. We find many cells exhibit a pattern of one or a few large, round, bright MeCP2 “bodies”, often contrasting with a much darker nucleoplasm, while matched normal tissue showed a higher nucleoplasmic stain with a somewhat variable punctate pattern, but not large bodies against a dark nucleoplasm. Thus, we refer to these dramatic accumulations of MeCP2 at just a few sites as “cancer-associated satellite transcript” (CAST) bodies; their presence in primary tumors further corroborates the results suggesting that MeCP2 becomes sequestered with Sat II repeat RNAs in cancer lines. Thus, the aberrant accumulations of Sat II repeat RNAs are not without impact on epigenetic factors in the cell, and MeCP2 “CAST” bodies are another potential biomarker that reflects a highly abnormal cancer epigenome.
Thus, a second aspect of the invention features the use of CAST bodies as a biomarker for diagnosing cancer (e.g., metastatic cancer) in a mammal (e.g., a human).
The presence of satellite RNA and MeCP2 foci provides a readout of cancer cell epigenetics, and may provide robust biomarkers for cancer in general with potential diagnostic value. An important challenge in cancer biology is to identify specific, readily assayed changes that occur in neoplastic progression, which may be common to many cancers, specific to particular types, or indicators of progression level (grade). Knowledge of these changes and how to detect them will be vital for surveillance, recognition and proper classification of different cancers and for designing/evaluating therapeutic interventions. A biomarker could be a cellular, genetic or epigenetic change, such as p53 mutations common in many cancers or a marker such as CYP2W1 that is highly expressed in colorectal tumors. While biomarker discovery is an active area of research, we believe the use of “repeat RNA signatures” or MeCP2 “CAST” bodies as a biomarker for cancer would provide further information on the cancer biology and its aberrant epigenome.
Our studies also show that in cancer nuclei, but not normal nuclei, aberrant aggregations of certain PcG proteins are common (in vitro and in vivo), and form on specific Sat II DNA domains, possibly due to changes in their DNA methylation status. We refer to these aggregations as “cancer associated PcG” bodies (CAP bodies). A third aspect of the invention features the use of CAP bodies as a biomarker for diagnosing cancer. Our discovery provides the first evidence that changes in global methylation (a common hallmark of cancer) particularly at satellite repeats can trigger the dramatic redistribution of epigenetic factors in these cells. The sequestering of these important regulatory factors away from the remaining nucleoplasm is important, and could play a role in the activation of other previously silent genomic loci, like oncogenes or the pericentric satellites (Satellite II) (see above).
Our discovery finds that repeats in the genome (DNA and RNA) organize the distribution of important epigenetic regulators in the nucleus and this goes awry in cancer. We demonstrate that a common feature of cancer nuclei, in vitro and in vivo, is a grossly abnormal nuclear compartmentalization of master epigenetic regulators controlled by changes in methylation of satellite repeats. The hypermethylation and silencing of tumor suppressor genes is a critical mechanistic event in cancer which paradoxically often co-occurs with global hypomethylation, for reasons that are not at all understood. The grossly imbalanced nuclear distribution of master regulatory factors and their link to global demethylating events shown here provides a new way to think about what generates this epigenetic imbalance. In addition to this significance for understanding cancer epigenetics and human satellites, the cancer-specific Sat II RNA and MeCP2 “CAST” bodies (see above) as well as these important related PcG “CAP” bodies, provide new candidate cancer biomarkers, that offer a readout of the “heterochromatic instability” in cancer cells.
Thus, a third aspect of the invention features the use of CAP bodies as a biomarker for diagnosing cancer (e.g., metastatic cancer) in a mammal (e.g., a human).
The invention also features a method for identifying an agent for the treatment of a cancer in a mammal by contacting a cancer cell having a biomarker selected from a cancer-associated polycomb group (CAP) body, a cancer-associated satellite transcript (CAST) body, and a satellite II RNA molecule with a test agent and determining whether the test agent reduces the level of the biomarker by detecting a reduction in the formation of the CAP body or CAST body, or a reduction in expression of the satellite II RNA molecule, in the cancer cell, wherein a reduction in the level of the biomarker in the cancer cell relative to the level of the biomarker in a cancer cell not contacted with the test agent, indicates that the test agent is suitable for the treatment of the cancer.
At minimum, we believe the unusual foci (Sat II RNA and CAST and CAP bodies) that we detect in cancer cells are large and bright enough to provide a useful diagnostic adjunct to the pathologist. The methods of the invention can be used alone or can be used in conjunction with other assays, e.g., cytological assays, for detecting cancer in a subject. Sat II RNA is particularly attractive as a biomarker because it is essentially negative in normal cells, making this a sensitive assay that would also be amenable to extraction-based methodologies like RNA microarrays or a deep-sequencing approach, and possibly through serum screens as well. We also find that these bright foci lend themselves easily to simple digital quantification, which can be utilized in automated pathology platforms currently being designed by many companies (e.g., GE Global). For example, quantifying the single brightest pixel per nucleus clearly differentiated normal cells from cancer cells, and suggested a 175-fold difference between normal and cancer cells (see Example I below). This direct visualization of epigenetic regulatory factors within the nucleus of single cells can overcome the limitations of extraction-based methodologies that may be “contaminated” by normal cells in the tumor sample.
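The "single brightest pixel per nucleus" measurement mentioned above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function name `brightest_pixel_per_nucleus` is hypothetical, the image is assumed to be a 2D grid of intensities, and nuclei are assumed to have been segmented beforehand into a label grid of the same shape (0 for background, a positive integer for each nucleus). The text does not specify how segmentation or normalization was performed.

```python
def brightest_pixel_per_nucleus(image, labels):
    """Return {nucleus_id: maximum pixel intensity inside that nucleus}.

    image  : 2D list of numeric intensities (e.g., one fluorescence channel)
    labels : 2D list of ints, same shape; 0 = background, k > 0 = nucleus k
             (the segmentation itself is assumed to exist already)
    """
    if len(image) != len(labels):
        raise ValueError("image and labels must have the same number of rows")
    peaks = {}
    for img_row, lab_row in zip(image, labels):
        if len(img_row) != len(lab_row):
            raise ValueError("image and labels rows must have equal length")
        for value, nucleus_id in zip(img_row, lab_row):
            if nucleus_id == 0:
                continue  # skip background pixels
            if nucleus_id not in peaks or value > peaks[nucleus_id]:
                peaks[nucleus_id] = value
    return peaks


# Toy 2x3 field with two nuclei (ids 1 and 2); values are made up.
toy_image = [[1, 2, 50],
             [3, 4, 9]]
toy_labels = [[1, 1, 2],
              [1, 0, 2]]
toy_peaks = brightest_pixel_per_nucleus(toy_image, toy_labels)
```

Per-nucleus peak values obtained this way could then be compared between normal and cancer samples; how the values are aggregated into a fold difference is left open here because the text does not state it.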
In addition, the methods described herein can be used to diagnose cancer by detecting aberrant localization of at least one (or two or more) protein(s) (e.g., one or more of MeCP2, SIN3A, CDKL5, DNMT1, HDAC1, ATRX, DNMT3B, SMARCA2, DLX5, BDNF, UBE3A, MBNL 1, 2, and 3, hnRNP H, G, A, and K, proteasome 20Sα, 11Sγ and 11Sα subunits, Y12, Y14, 9G8, snRNP Sm antigen, SAM68, SLM 1 and 2, Tra2β, Purα, or CPEB proteins in CAST bodies or one or more of BMI-1, RING 1B, Phc1, Phc2, CBX4, CBX8, RNF2, SUZ12, EED, RBBP4, JARID2, EZH2, EZH1, RBBP7, GLI1, MYC, CDKN2A, or HST2H2AC in CAP bodies) that may not exhibit altered expression in the cancer cell (e.g., the protein levels of the biomarkers in the cancer cell may remain normal relative to a normal, non-cancer cell, but the distribution of the biomarkers across the nucleus in the cancer cell is not “normal” relative to a non-cancer cell). The presence of aberrant accumulations and mis-compartmentalization of key regulatory components of the nucleus in cancer cells provides a robust assay for gross epigenetic mis-regulation in cancer cells and facilitates the evaluation of the tumor or therapy.
These new cancer properties (Sat II RNA and CAST and CAP bodies) are potential “red flags” for cancers in which failed maintenance of chromatin regulation is prominent. Such epigenetic biomarkers are particularly relevant in light of current new chemotherapeutics being tested that target histone modifications or DNA methylation of tumor suppressor genes, but which will likely have unintended consequences on pericentric satellite heterochromatin. Cytopathological changes in nuclear morphology, particularly heterochromatin patterns, are important diagnostic indicators of many cancers, however the distinctions can be subtle and difficult to accurately identify. Since excised tumors often contain just a sub-set of tumor cells mixed with normal, extraction-based assays will dilute the mark present in a small fraction of cells, and, in addition, do not allow direct correlation with the specific diagnostic structural changes upon which the pathologist relies. Thus, an advantage of the biomarkers and approach shown here is that it retains important cytopathology by overlaying these epigenetic hallmarks with cancer morphology at the single cell level, and highlights that epigenomic changes will be more fully understood if the cancer genome is considered as a complex three dimensional entity within a highly subcompartmentalized nuclear structure.
We initially observed the mis-regulation of heterochromatic satellite repeats in cancer cell lines and observed that prominent nuclear accumulations of Sat II RNA were common in many cancer samples in vitro and in vivo, and largely absent in normal cells. Thus, cancer cells show highly aberrant expression of a very abundant satellite repeat which reflects compromised heterochromatic silencing in cancer cells.
To understand why these satellites were being aberrantly expressed in cancer, we examined the proteins known to regulate satellite heterochromatin, the repressive Polycomb Group (PcG) proteins. The PRC1 complex proteins, BMI-1, RING 1B and Phc-1, were of particular interest since these were reported to form Polycomb bodies (PcG bodies) and localize to Sat II DNA domains, particularly the very large (6 Mb) Sat II block at 1q12, which is commonly hypomethylated in cancer. Although PcG bodies are described as normal nuclear structures, we see a dramatic difference between cancer and normal cells. We observed that the PcG proteins are found in a few very prominent nuclear bodies in most cells (70-100%) in 7 of 8 cancer lines, and 4 breast cancer samples, while non-neoplastic cell lines and matched normal tissue samples have a more uniform granular or particulate distribution throughout the nucleoplasm. Digital quantification of the high contrast ratio between the PcG bodies versus the nucleoplasm in cancer cells (and normal cells) makes the point that this is a markedly different distribution in cancer, not just higher overall levels. And even if the overall level of the protein is higher in the cancer cell, BMI-1 piles up sharply at a few sites, while the nucleoplasm (where most chromatin resides) has lower levels. Thus, some regions of the cancer nucleus have abundant access to repressive factors, while other regions do not.
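One way to picture the body-versus-nucleoplasm contrast ratio described above is the short Python sketch below. The definition used here is an assumption chosen for illustration, not the quantification actually used: the peak intensity within a nucleus stands in for the body, and the median intensity of the same nucleus stands in for the nucleoplasm. The function name `contrast_ratio` is hypothetical.

```python
from statistics import median


def contrast_ratio(nucleus_pixels):
    """Ratio of peak intensity to median intensity within one nucleus.

    nucleus_pixels : iterable of numeric intensities from a single nucleus.
    Peak-over-median is an illustrative stand-in for 'body versus
    nucleoplasm'; the source does not define the ratio it measured.
    """
    values = list(nucleus_pixels)
    if not values:
        raise ValueError("nucleus contains no pixels")
    background = median(values)
    if background <= 0:
        raise ValueError("median intensity must be positive")
    return max(values) / background


# Made-up intensities: a few very bright pixels against a dim
# nucleoplasm versus a fairly uniform, granular distribution.
focal_nucleus = [10, 10, 10, 10, 200]
uniform_nucleus = [40, 50, 60]
focal_ratio = contrast_ratio(focal_nucleus)
uniform_ratio = contrast_ratio(uniform_nucleus)
```

Because the ratio is computed within each nucleus, it reflects how unevenly the protein is distributed rather than how much protein is present overall, which is the distinction the paragraph draws.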
We also find that these large aberrant PcG bodies are the same “PcG bodies” that had been previously reported to localize to the large Sat II block on 1q12 (studied in HT1080 cells, a fibrosarcoma cell line). They clearly and consistently (˜100%) co-localize with the 1q12 DNA locus in cancer cells, suggesting a direct relationship between these nuclear elements. Thus, these prominent PcG accumulations which exhibit a high contrast ratio with the nucleoplasm and preferentially “cap” the Sat II locus at 1q12, are a hallmark of cancer cells, and are not a normal nuclear structure. To avoid confusion with the smaller, more numerous and widely dispersed particulate PcG foci in normal human cells, often referred to as “PcG bodies” we refer to the less numerous and larger conglomerations of PcG proteins at 1q12 in cancer cells as “CAP” bodies, for “cancer associated PcG” bodies. Importantly, Sat II DNA loci in other regions of the same nucleus, which contain significantly less PcG proteins than 1q12, are where the aberrant Sat II RNA expression is occurring. This suggests that the mis-compartmentalization of the repressive PRC1 complex from the rest of the nucleus may result in abnormal expression in some areas (e.g. Sat II expression and possibly oncogenes) and abnormal repression in others (possibly at tumor suppressor genes).
The large (6 Mb) Sat II domain on 1q12 is also commonly found hypomethylated in many cancers, and has been reported to be the region most sensitive to changes in methylation. 5-aza-2′-deoxycytidine is a pharmacologic inhibitor of DNA methylation in clinical trials as a chemotherapeutic agent for certain cancers and has also been shown to effectively demethylate Sat II on Chromosome 1. We find that when normal human fibroblasts are treated with this chemotherapeutic agent to hypomethylate 1q12, the PcG proteins of the PRC1 complex re-distribute into large accumulations at 1q12 similar to the CAP bodies seen in cancer cells. Prolonged treatment with this drug (8 days) eventually results in the aberrant expression of Sat II RNA in these normal cells similar to that seen in cancer. This suggests that the abundant satellite repeats have enormous capacity to “soak-up” large quantities of regulatory proteins if the conditions are right (e.g. global demethylation especially at 1q12), resulting in abnormal repression in certain regions of the nucleus, while other regions are abnormally de-repressed.
In addition to the use of CAST bodies and their associated Sat II RNA foci or CAP bodies and their associated Sat II DNA foci as biomarkers for the detection of many cancers (see Tables 3 and 4), we have also discovered that cancers can be detected by assaying the unbalanced distribution of heterochromatic markers (e.g., one or more of ubiquitylated histone H2A, H3K27me, H3K9me2, HP1, H4K20me, loss of H3K4me, loss of H4Ac, DNA methylation (5-mC), and macroH2A) in the nucleus of a cell. Our molecular cytology indicates that the cancer nuclear genome has imbalanced (less homogenous) distribution of chromatin regulatory factors due to demethylation of Sat II on 1q12 and its subsequent recruitment of chromatin regulators. Screening for this “unbalanced epigenome” can be done as described above (e.g., by assaying for the presence of cancer associated bodies) or by using a whole genome ChIP-Seq approach (using, e.g., the repressive mark ubiquitin H2A). As shown in
Our discovery uses the visualization of important epigenetic regulatory proteins to provide a low resolution but “whole genome” synoptic view of their nuclear and genomic distribution, the dramatic nature of which may be less apparent by extraction-based analyses, and provides information on their function even in situations where these key regulatory proteins may not show altered expression levels or functional mutations.
Importantly, many new compounds being investigated as chemotherapy agents (e.g. 5-aza-2′-deoxycytidine and HDAC inhibitors) are known to affect gross epigenetic regulation across the nucleus, and not only at the targeted tumor suppressor gene. It is highly likely that more of these chemotherapeutic agents will produce imbalanced epigenomes in cancer and possibly non-cancer cells, similar to the effect of 5-aza-2′-deoxycytidine seen here. Reports suggest that although many patients initially respond well to many of these agents, there are high recurrence rates. We believe that these hallmarks of an imbalanced epigenome will be key in evaluating the effect of these broad range epigenetic inhibitors on normal and cancer cells, the therapeutic outcomes of treatment and recurrence after treatment.
Thus, the presence of large conglomerations of regulatory proteins in cancer cells, such as CAST bodies and their associated Sat II RNA foci or CAP bodies and their associated Sat II DNA foci, as well as changes in the distribution of heterochromatic markers (e.g., ubiquitinated proteins, such as histone H2A, H3K27me, H3K9me2, HP1, H4K20me, loss of H3K4me, loss of H4Ac, DNA methylation (5-mC), and macroH2A), across the genome, are not only common and previously unrecognized “hallmarks” of many cancers, but are robust biomarkers indicative of gross imbalance of epigenetic regulation in the cell. The methods described herein utilize robust biomarkers that can be used not only to diagnose the presence of cancer in a sample from a subject (and thus cancer in the subject), but also to assess whether the cancer is an aggressive cancer. A common thread in the methods described herein is the imbalanced distribution of key chromatin regulators (e.g., PcG proteins and/or MeCP2 proteins, etc.), which is in turn reflected in imbalanced distribution of epigenetic chromatin marks (heterochromatin versus euchromatin), as we demonstrate directly for UbH2A. Knowledge of these changes and how to detect them can be used to provide surveillance, recognition, and proper classification of different cancers, and for designing/evaluating appropriate therapeutic interventions (e.g., avoiding the use of chemotherapeutic agents, such as 5-aza-2′-deoxycytidine, known to produce imbalanced epigenomes).
The following examples are to illustrate the invention. They are not meant to limit the invention in any way.
Epigenomic changes in cancer involve paradoxical gains and losses of heterochromatin within the same nucleus. We report that failed nuclear compartmentalization of polycomb proteins, master regulators of heterochromatin, is prevalent in cancer, and links to locus-specific over-expression of human Satellite II. In cancer, BMI-1 and Ring 1B aggregate in prominent Cancer-Associated PcG (CAP) bodies on the large ˜6 Mb locus at 1q12, which remains silent. In the nucleoplasm low in BMI-1, other Sat II loci express abundant RNA foci; these repeat RNAs accumulate methyl-cytosine binding protein, forming Cancer-Associated Satellite Transcript (CAST) bodies (previously referred to in U.S. 61/507,937 as Cancer-Associated MeCP2 (CAM) bodies). BMI-1 body formation on 1q12, a region commonly hypomethylated in cancer, is induced in normal cells by a DNA demethylating chemotherapeutic. All of these hallmarks of epigenetic dysregulation were readily apparent in vivo, in several breast and other tumors. This study connects novel biology of poorly studied Satellite II, DNA and RNA, to mis-regulation of epigenetic factors in cancer, linked to DNA demethylation at 1q12.
In recent years changes in the epigenome have been increasingly recognized as important to tumorigenesis (reviewed in Feinberg and Tycko, 2004; Fraga and Esteller, 2005; Jones and Baylin, 2007). While most attention has focused on silencing of tumor suppressor genes, recent studies recognize a major paradox: this often occurs in the context of broader genomic hypomethylation and/or loss of heterochromatin marks at centric/pericentric satellites (reviewed in Ehrlich, 2009). Centric/pericentric heterochromatin is populated by several classes of satellite sequences. The human satellites (alpha, beta, Sat I, II, III) are comprised of high copy tandem repeats packaged in constitutive heterochromatin and comprise ˜15% of the genome (Richard et al., 2008). In contrast to the 171 bp alpha-satellite repeat at the centromere proper of all human chromosomes, classical Sat II and III are comprised of highly repeated shorter Sat 2 and Sat 3 sequences, respectively, which form larger pericentric blocks on only a subset of human chromosomes. The largest Sat II DNA blocks on chr. 1 and 16 span several megabases of Sat 2 repeats. Sat II is a ˜26 bp degenerate form (Jeanpierre, 1994) of the more conserved 5 bp Sat 3 motif (ATTCC; SEQ ID NO: 1), which comprises the singular large Sat III locus on Chr 9 (Prosser et al., 1986). While a few reports have linked expression of Sat III on Chr 9 to the heat shock response and nuclear “stress bodies” (Jolly et al., 2004; Rizzi et al., 2004), Sat II has long received little attention and remains one of the most poorly-studied prominent features of the human genome.
Despite its abundance Sat II has no known function in normal cells or in disease, although several studies have noted common hypomethylation of Sat II in cancer (Cadieux et al., 2006; Ehrlich, 2009). Satellite heterochromatin has long been believed silent, but recent evidence indicates that certain murine satellites can be expressed at low levels, possibly linked to stress or cell-cycle changes (reviewed in Lu and Gilbert, 2007; Probst and Almouzni, 2007; Vourc'h and Biamonti, 2011). The fact that satellite sequences were so long considered transcriptionally silent is testimony to the fact that their expression has been difficult to detect using standard molecular techniques. However, RNAs tightly associated with chromatin or nuclear structure may be more amenable to analysis in situ; this also preserves molecular information in chromosomal and structural context, which proved key to most findings presented here.
Polycomb group (PcG) proteins are a family of master epigenetic regulators that control most early developmental pathways, primarily through repressive chromatin modifications (reviewed in Sparmann and van Lohuizen, 2006), and also function in the formation and maintenance of constitutive peri/centric satellite heterochromatin. Polycomb repressive complex 2 (PRC2) includes the EZH2 protein, which introduces trimethylation of histone H3 lysine 27 (reviewed in Valk-Lingbeek et al., 2004), whereas PRC1 includes BMI-1 and RING1B and promotes histone ubiquitination (reviewed in Niessen et al., 2009), DNA compaction (Eskeland et al., 2010) and other modifications. In Drosophila embryos “PcG bodies” are believed to contribute to gene silencing via differential organization and access of gene loci to these concentrated repressive factors (Bantignies et al., 2011). In mammalian cells, prominent PcG bodies (with BMI-1 and RING1B) have also been described and are widely considered to be part of normal nuclear structure. BMI-1 is a key component of PRC1 and is essential for self-renewal of neuronal and hematopoietic stem cells, as well as suppression of the tumor suppressor locus Ink4a/Arf (Jacobs et al., 1999). Although BMI-1 over-expression has been linked to cancer progression (reviewed in Valk-Lingbeek et al., 2004), other evidence indicates a more complex relationship such that over-expression can correlate with a good prognosis in breast cancer (Pietersen et al., 2008). Thus the role of BMI-1 in cancer is currently intensively studied but unresolved (Glinsky, 2008; Lukacs et al., 2010; Riis et al., 2010).
The dichotomy between tumor suppressor (TS) gene silencing and the broader breakdown of heterochromatin components (Pageau et al., 2007) suggests to us an imbalanced nuclear epigenome, the basis for which is unknown. Studies from our lab and others have shown that in normal somatic cells, specific genomic loci reside in distinct nuclear sub-compartments enriched for specific metabolic and regulatory factors (Hall et al., 2006; Misteli, 2000, 2004). This nuclear compartmentalization is increasingly recognized as an important contributor to the overall epigenetic program of particular cell types. Non-coding RNAs are being recognized for their normal role in recruitment of epigenetic regulators (Hall and Lawrence, 2011; Koziol and Rinn, 2010; Masui and Heard, 2006) as well as the structural underpinning for nuclear bodies (Clemson et al., 2009; Wilusz et al., 2009). In addition, repeat RNAs have been shown to underlie pathology in certain triplet repeat diseases (Osborne and Thornton, 2006). In this study, we provide evidence that key epigenetic regulators show aberrant compartmentalization within cancer nuclei that is intimately connected to localization on certain Sat II loci and to inappropriate expression of Sat II RNA from others.
This study began with a broad survey of Cot-1 repeat RNA expression and distribution in human interphase nuclei. While competition with unlabelled Cot-1 DNA (repetitive genomic fraction) (Britten and Kohne, 1968) is often used to suppress hybridization to repeats, here we labeled human Cot-1 DNA as a probe to examine the distribution of transcripts from the repeat genome by RNA FISH. We previously showed that hybridization to Cot-1 RNA provides a convenient assay to evaluate chromosome inactivation within nuclei (Clemson et al., 2006; Hall et al., 2002), and also reveals breakdown of the peripheral heterochromatic compartment in cancer (Pageau et al., 2007b). However, this study began when large localized foci of Cot-1 RNA were initially observed and then shown to be exclusive to cancer cells.
Expression of the Cot-1 Genomic Fraction Reveals Large Nuclear Foci of Repeat RNAs in Cancer but not Normal Cells:
In situ hybridization to repeat RNAs using a Cot-1 probe consistently produces a substantial disperse nucleoplasmic signal in all mammalian cells examined with essentially no cytoplasmic signal (
Cot-1 RNA Nuclear Foci are Primarily Satellite II RNA, which is Undetectable or Negligible in Normal Cells:
Cot-1 DNA is a complex probe containing several major classes of repeats. Therefore we used probes to specific repeats to better define the content of these large Cot-1 RNA foci. RNA hybridization with probes for LINE (L1) and SINE (Alu) repeats generally did not detect localized concentrations of RNA (
It was important to address whether normal human cells show significant expression of Sat II and alpha-Sat RNA using specific probes that are more sensitive for a given sequence. In fact, we surprisingly found that nuclear foci of alpha-satellite RNA are readily detected by FISH in normal cells (
The difference between Sat II RNA expression in cancer versus normal cells was easily discerned by eye, was scored consistently by multiple investigators, and moreover, could be quantified by digital microfluorimetry (
Cancer Associated Polycomb (CAP) Bodies Form on Sat II Loci at 1q12 in Neoplastic but not Normal Cells:
To gain insight into potential causes of Sat II expression in cancer cells we examined the proteins known to regulate satellite heterochromatin, the repressive Polycomb Group (PcG) proteins. PcG proteins, including BMI-1, are also linked broadly to developmental gene regulation and stem-cell self-renewal, and are increasingly implicated in cancer pathogenesis. The PRC1 complex proteins, BMI-1 and Ring1B, were of particular interest since these were reported to form Polycomb bodies (PcG bodies) and localize to Sat II loci, particularly the very large (6 Mb) Sat II block at 1q12 (Saurin et al., 1998). Mammalian PcG bodies were initially described as normal nuclear structures (Saurin et al., 1998) and are currently considered and studied as such (reviewed in (Bernardi and Pandolfi, 2007; Spector, 2006)). However, when we initially examined BMI-1 staining in a panel of various cell types, there was a key difference between normal and cancer cells.
We found that BMI-1 staining brightly labeled a few very prominent nuclear bodies in most cells in 7 of 8 neoplastic lines, which were not seen in non-neoplastic cells. For example, as seen in
As mentioned above, PcG bodies, which contain repressive proteins, have been reported to localize to the large Sat II block on 1q12 (initially studied in HT1080 cells, a fibrosarcoma cell line) (Saurin et al., 1998). We confirm that the PcG bodies previously reported to localize to 1q12 are the same large aberrant PcG bodies studied here. Using dual labeling with 1q12 specific probes (puC 1.77 kb and Sat2-160 bp) and BMI-1 in U2OS and PC3 cells, we show that these large PcG bodies clearly and consistently (~100%) co-localize with the 1q12 DNA signal in these cancer cells (
While most of our analyses utilized BMI-1 staining, we confirmed that RING 1B and Phc1, also in the PRC1 complex, concentrate sharply in the same CAP bodies (
Imbalanced Expression of Sat II Loci on Different Chromosomes Inversely Correlates with Aberrant Compartmentalization and Sequestration of PcG Proteins:
Sat II RNA over-expression could reflect failed maintenance of Sat II heterochromatin throughout the entire cancer genome. However, given the imbalanced nuclear compartmentalization of repressive polycomb proteins shown above, it was important to assess if all Sat II loci express RNA, and if not, determine if there was a random or non-random relationship between locus expression and PcG protein nuclear distribution. A priori we considered two alternate possibilities for a potential relationship between PcG proteins and Sat II RNA distributions. Since ncRNAs can recruit PcG proteins including BMI-1, Sat II RNA foci might emanate from the largest Sat II loci in the pericentromeres of Chrs 1 and 16, and induce PcG proteins to form CAP bodies there. Alternatively, the abundant PRC1 factors in CAP bodies on 1q12 and 16q11 may maintain repression of Sat II at these loci, while in the same nucleus relative depletion of these repressive factors from the rest of the nucleoplasm could contribute to aberrant expression from other Sat II loci.
The number of Sat II RNA foci varied in a manner characteristic for a given line (see Table 2), but this did not correlate with ploidy differences (see legend, Table 2), suggesting that only a subset of Sat II loci are expressed. To determine this directly, we used a sequential hybridization strategy to RNA and then to DNA (Smith et al., 2007; Xing et al., 1995) to visualize these simultaneously in two different colors (using the same Sat 2-24 sequence as probe) (see methods). As apparent even in U2OS, which has the most RNA foci of any tumor line, not all Sat II DNA loci are associated with an RNA signal, whereas RNA foci usually abut or partially overlap a DNA signal (
MeCP2 Accumulates with the Sat II RNA Foci and not with Sat II DNA at 1q12 Associated with CAP Bodies:
While this aberrant compartmentalization of epigenetic factors was previously unknown in cancer, abnormal DNA methylation has been intensely studied, and it would be important if our studies would reveal a link between these two major areas of epigenetic regulation. Given that the 1q12 Sat II locus accumulates PRC1 and is repressed, we considered it may be hypermethylated. On the other hand, substantial literature reports that Sat II at Chr 1 and 16 is commonly hypomethylated in many cancers. Thus we examined whether antibodies to MeCP2, a methyl-DNA binding protein, labeled the 1q12 domain associated with the PRC1 CAP bodies in cancer cells. Staining in U2OS cells revealed that MeCP2 sharply accumulates in several bright nuclear foci (
CAP Bodies Accumulate on 1q12 in Normal Fibroblasts Treated with a Global DNA Demethylating Agent in Development as a Chemotherapeutic:
The fact that MeCP2 does not localize to 1q12 is consistent with reported Sat II hypomethylation in many cancers, particularly breast, ovarian, Wilms tumor, multiple myeloma, and glioblastoma, among others (reviewed in Ehrlich, 2009). In fact, it has been reported that the 1q12 satellite is the region most susceptible to hypo-methylation in tumors, although it is not clear that the assays used could discriminate Sat II at 1q12 from other Sat II loci. Since DNA methylation changes are extensively documented in cancer, it would be important if these had an impact on the distribution of PcG proteins. To investigate this, we treated normal human fibroblasts with 5-aza-2′-deoxycytidine (5-aza-2′d, or decitabine), a pharmacologic inhibitor of DNA cytosine methylation, in limited clinical use and in trials as a chemotherapeutic for other cancers (reviewed in Kelly et al., 2010). 5-aza-2′d has also been shown to effectively demethylate Sat II on Chromosome 1 (Ji et al., 1997), allowing us to test the possibility that this would in turn impact BMI-1 distribution in normal cells. Remarkably, within 24 hours of a single treatment, a marked accumulation of PRC1 components (BMI-1 and Ring1B) was seen at two large “bodies” within nuclei of ~15% of primary human fibroblasts (consistent with the effect requiring transition through S-phase); these were similar in size and shape to the 1q12 DNA signal (
While hypomethylation at 1q12 leads to BMI-1 body formation there, a related question arises as to why PRC1 proteins do not aggregate proportionally on the other Sat II loci, since RNA expression from other loci indicates that they are likely also hypomethylated. In the course of these experiments we tested four different Sat II probes (see methods), which suggested that distinct sub-types of Sat II DNA correlate with CAP body formation. While details are in the methods, in sum the results suggest enrichment for different Sat 2 sequence sub-types on different Sat II chromosomal loci, which correspond to the distribution of CAP bodies. Sat 2 probes derived from the 1q12 sequence, which have a more restricted distribution on Chrs. 1 and 16 (
Aberrant Satellite RNA Foci, CAP Bodies, and “CAST” Bodies in Tumors In Vivo:
Since Sat II RNA foci are not present in normal cultured cells, they cannot arise only as a consequence of cell culture. Nonetheless, a key question is whether these changes arise in vivo and would be detectable directly in tumor tissues. We began with abdominal and pleural effusions from 10 patients. Despite a high auto-fluorescence of these initial preparations, Sat II RNA foci were evident in five of nine samples examined by two blinded investigators (
Next, we examined several primary solid tumors in cryostat sections (which are readily amenable to fluorescence analyses), obtained through the UMMS tissue bank, with some matched normals. Given that RNA preservation in such pathology samples can be a challenge, we used FISH to poly A RNA as a positive control and tested three different fixation protocols to determine the most effective one (see Methods). The poly A RNA preservation varied with the sample and was generally poor to moderate as compared to cultured cells. Nonetheless, the first tumor sample examined (Block #2334T) displayed remarkably robust and prevalent Sat II RNA foci (
Based on the above results with cultured cells, we hypothesized that CAP bodies would be in the same tumor cell nuclei with Sat II RNA foci, but in separate nuclear locations. As illustrated in
Finally, we also confirmed that the aberrant MeCP2 foci shown above in several cancer lines also occur in vivo. As shown in
As summarized in the model in
Nuclear Re-Distribution of Chromatin Regulators and Epigenetic Imbalance in Cancer:
Tumor suppressor (TS) gene silencing paradoxically often co-occurs with the more global loss of repressive chromatin marks, particularly on repeats throughout the genome (Fraga et al., 2005). The grossly imbalanced nuclear distribution of master epigenetic regulators shown here, including polycomb proteins (PRC1) and methyl-binding proteins (MeCP2), provides a new way to think about how this epigenomic imbalance evolves in cancer cells. In a sense, visualization of these key regulatory factors and Sat II DNA/RNA provides a low resolution but “whole genome” synoptic view of their changed nuclear distribution and expression patterns, which may be less apparent by extraction-based analyses, particularly if repeats are excluded or if the protein levels are normal and believed to be unaltered.
Since mammalian PcG bodies have been studied almost exclusively using cell lines with tumor origins (Hernandez-Munoz et al., 2005; Saurin et al., 1998), our conclusion that prominent PcG bodies are aberrations of cancer is not inconsistent with prior studies (Voncken et al., 1999). Our results demonstrate that cancer nuclei commonly aggregate PcG proteins on particular Sat II domains that remain silent, while other Sat II loci in regions relatively depleted of these repressive factors now aberrantly express RNA. The fact that Sat II RNA mis-regulation is locus specific and co-occurs with, and is inversely related to, the marked redistribution of BMI-1 (and other PRC1 proteins) provides evidence for a functional relationship between the aberrant nuclear compartmentalization of regulatory factors and changes in locus-specific expression in cancer cells. Studies in Drosophila embryos demonstrate that access of specific genes to concentrated accumulations of PcG proteins is important to their regulation (Bantignies et al., 2011; Grimaud et al., 2006), supporting the importance of our findings that some regions of the cancer nuclear genome have dramatically higher access to PcG proteins than others. Our results predict that some regions of the cancer genome will contain hot spots of repression, whereas other regions will show wide scale reduction in repression, consistent with the loss of the silent peripheral heterochromatic compartment (Pageau et al., 2007). We demonstrate that this relates to locus-specific misregulation of Sat II loci, but it also could play a role in TS gene silencing or oncogene upregulation. Our findings would predict that many aberrantly expressed loci may be BMI-1 regulated, such as stem cell and neuronal genes (as well as Sat II loci).
Importantly, our results further show that the abnormal satellite RNA accumulations have an impact on the distribution of MeCP2 (and possibly other epigenetic factors), which we suggest likely further contributes to a downward spiral of the cancer methylome and to epigenomic imbalance.
Additionally, our results demonstrate an important new finding that links the nuclear distribution of these key cellular regulatory proteins of the PRC1 complex to the vast literature on DNA methylation changes in cancer, particularly at the Sat II locus on 1q12. As further discussed below, the fact that chemotherapeutic demethylating agents rapidly induce PRC1 capping of 1q12 in normal cells is consistent with reports that Sat II at 1q12 is especially sensitive to de-methylation, and suggests that this may reflect an early event in the evolution of the cancer epigenome. Interestingly, the demethylation at 1q12 does not result in its expression when bound with PRC1 complexes; instead, nuclear repeat RNA foci subsequently emanate from other de-repressed Sat II loci. Thus, it is the presence of the repressive PRC1 CAP bodies that counteracts the effects of demethylation at this region. Notably, cellular demethylation through the use of 5-aza-2′d is assumed to be responsible for the aberrant expression of numerous genes across the nucleoplasm in treated cells (Fabiani et al., 2010); however, the fact that Polycomb target genes are overrepresented in this group suggests that the redistribution of PcGs to CAP bodies may play a major role. Thus, methylation changes that result in the failed nuclear compartmentalization of repressive factors can promote broad heterochromatic instability (including further methylation changes); this in turn would generate an array of diverse expression profiles, any one of which might be selected for if it promoted neoplastic cell growth (Pageau et al., 2007).
Implications for the Biology of Human Satellite II:
Study of the abundant Sat II repeats in all human genomes has lagged far behind the rest of the genome; however, lack of known function is not evidence for no function. This study now implicates this repeat family as both reflecting and contributing to the epigenomic imbalance in cancer. Work presented here suggests new avenues of investigation for the potential biological import of Sat II DNA (and RNAs, below), based on the capacity of high copy simple repeats to underlie abnormal compartmentalization and sequestration of chromatin regulatory factors. This is most apparent for the very large pericentromere at 1q12, which is a universal but unexplained component of all human genomes. Theoretically, if each 26 bp Sat 2 repeat in two ~6 Mb 1q12 loci could bind BMI-1 or a PRC1 complex, this locus alone could corral roughly 5×10^5 such factors. Interestingly, BMI-1 proteins within PcG bodies have been shown to have low mobility (Hernandez-Munoz et al., 2005); since that study used U2OS osteosarcoma cells, our interpretation is that in cancer BMI-1 accumulates stably on 1q12.
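The order-of-magnitude estimate above can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: it assumes exactly 6 Mb per 1q12 locus, two loci per nucleus, and one bound factor per 26 bp repeat unit, which are the round figures quoted in the text and not measured values. The function and constant names are hypothetical.

```python
# Back-of-the-envelope estimate of how many factors the 1q12 Sat II
# blocks could bind. All inputs are the round figures from the text;
# they are assumptions, not measurements.
LOCUS_SIZE_BP = 6_000_000   # ~6 Mb per 1q12 locus (assumed exact here)
LOCI_PER_NUCLEUS = 2        # two homologous 1q12 loci (assumed diploid)
REPEAT_UNIT_BP = 26         # one Sat 2 repeat unit


def max_bound_factors(locus_bp=LOCUS_SIZE_BP,
                      loci=LOCI_PER_NUCLEUS,
                      repeat_bp=REPEAT_UNIT_BP):
    """Number of whole repeat units, assuming one factor per unit."""
    return (locus_bp * loci) // repeat_bp


estimate = max_bound_factors()
print(f"{estimate:,} repeat units")
```

Under these assumptions the result is about 4.6×10^5 repeat units, in line with the "roughly 5×10^5" figure. Aneuploid cancer cells with extra copies of 1q would scale the estimate by the `loci` argument.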
Why PRC1 factors “pile up” on particular Sat II loci (primarily 1q12) in cancer nuclei remains an open question, but our results clearly link this to cytosine demethylation, which is the “switch” that promotes abnormal PRC1 binding to repeats across this huge locus. Results further suggest that this likely involves a distinct Sat 2 sequence sub-type at these loci, which BMI-1 may preferentially bind when demethylated. It is possible that the 1q12 locus undergoes similar changes during early development linked to some role in nuclear remodeling, since Sat II hypomethylation is reported in extra embryonic tissue (Zagradisnik and Kokalj-Vokac, 2000), although this remains speculative. Several earlier studies pointed out that 1q12 changes (breaks, amplifications and gains of 1q) are unusually prominent in many cancers, with 1q gains in breast carcinoma long noted as particularly striking (Mertens et al., 1997). The findings here provide a clear path for further studies to understand how Sat II and DNA methylation changes relate to the abnormal compartmentalization of epigenetic factors shown here.
Sat II RNA, MeCP2, and the Concept of “Toxic Repeat RNAs”:
Another surprising aspect of our findings is the accumulations of MeCP2, which were clearly coincident with Sat II RNA foci in cancer cells. Of dozens of RNAs studied in our lab, such precisely overlapping RNA/protein signals (with same size and shape) were seen previously only for mutant CUG repeat RNAs, which we confirmed sequester MBNL1 in Myotonic Dystrophy (DM1) (reviewed in Osborne and Thornton, 2006; Smith et al., 2007), and NEAT1 RNA, which we showed is the structural scaffold for paraspeckle proteins (Clemson et al., 2009). Thus, this precise co-localization of an RNA and protein is significant, and suggests that they interact in some way. As noted above, these findings lend support for other evidence that MeCP2 can bind RNA (Hite et al., 2009), and acknowledge that the role(s) of methyl-binding proteins are not well explained by existing paradigms (Joulie et al., 2010). Another implication is that the satellite RNAs may themselves be cytosine methylated, as is known to occur for tRNAs and rRNA (Motorin et al., 2010). Since cytosine methylation can increase RNA stability, we note that aspects of our results hint that the accumulated Sat II transcripts are likely quite stable.
The accumulation of MeCP2 with Sat II RNA can be so marked in some tumor samples that just one or a few prominent “CAST” bodies are present in an otherwise dark nucleoplasm. As mentioned above, this suggests that these abundant repeat transcripts are not merely inert by-products of epigenetic dysregulation, but can also impact the distribution of cellular factors and possibly contribute to further epigenetic imbalance. The potential for repeat RNAs to impact the distribution and availability of nuclear regulatory factors, and thereby impact expression of other genes, has strong precedent based on toxic repeat RNAs in certain triplet repeat diseases (Kanadia et al., 2003). Nuclear RNA accumulations of DMPK mRNA containing expanded CUG repeats sequester MBNL1, an alternative splicing factor, causing inappropriate splicing patterns that generate the Myotonic Dystrophy (DM1) phenotype (Osborne and Thornton, 2006). While neither Sat II RNA foci nor PcG bodies co-localize with MBNL1 or the “PNC compartment” linked to breast cancer (Kamath et al., 2005) (
It is interesting to consider that Sat II RNA may also have a normal role during some developmental or cell cycle stage, which we think plausible despite the negative or negligible levels in normal cycling cells. For example, repeat RNAs may be involved in maintaining heterochromatin structure (Probst and Almouzni, 2007), and our results suggest that Sat II transcripts could recruit methyl-binding proteins.
Potential New Biomarkers Indicative of Heterochromatin Instability in Single Cells:
Finally, this study provides evidence for new epigenetic biomarkers in cancer, each visible in as little as a single cell in pathology sections of primary tumors. Sat II RNA is particularly attractive as a biomarker because it is essentially negative in normal cells, making this a sensitive assay that would also be amenable to extraction-based methodologies. While more extensive studies of tumor samples will be required, the case for Sat II RNA as a candidate biomarker is strengthened by a wholly independent study (Ting et al., 2011). Using deep sequencing, Ting et al. investigated over-expression of repeat RNAs, and found Satellite II most clearly different from normal, in ten pancreatic cancers and in a few other tumor samples. Although neither study examined a large tumor sample, both came to similar conclusions about Sat II RNA over-expression using completely different approaches and tumor types, and found similar levels of Sat II up-regulation (130 fold in Ting et al. and 175 fold here). While we strongly detect satellite over-expression in most human cancer lines in vitro, Ting et al. concluded that this RNA was not over-expressed in cultured cancer cells (in three mouse tumor lines examined). This may either reflect a species difference or greater sensitivity of the fluorescence in situ assay. However, our study extends well beyond the initial discovery of satellite over-expression to investigate the basic biology behind it, leading to several novel and fundamental insights regarding nuclear compartmentalization and the imbalanced cancer genome. Ting et al. speculate that general de-repression of genomic repeats could arise by some common mechanism, but state that the concomitant “upregulation of diverse mRNAs is less readily explained” (Ting et al., 2011).
Our findings not only provide an explanation for what we show is locus-specific de-repression of Sat II loci, but potentially why there would be broader de-repression of mRNA encoding genes, involving the sequestration of BMI-1 and MeCP2 on some genomic sites, at the expense of others. In support of this concept, we note that Ting et al. report that the mRNAs over-expressed were predominantly neuronal (which BMI-1 has been strongly linked to). In addition, inappropriate expression of neuroendocrine markers is common in many epithelial cancers and linked to aggressiveness (Cindolo et al., 2007).
Thus, Sat II RNA, CAP bodies, and CAST bodies are all potential “red flags” for major epigenetic dysregulation in cancer, which may prove to be a poor prognostic indicator. Cytopathological changes in nuclear and heterochromatin morphology are important diagnostic indicators of many cancers (Fischer et al., 2010); however, the distinctions can be subtle and difficult to identify accurately. An advantage of the biomarkers and approach shown here is the potential to directly correlate these specific molecular signatures with the cytological diagnostic structural changes upon which the pathologist relies. In addition, our finding that 5-aza-2′-deoxycytidine (decitabine) can induce prominent BMI-1 bodies on 1q12 is revealing not only mechanistically but also in terms of the often high toxicity of this drug (Gore et al., 2006), which likely will have unintended consequences on satellite and other heterochromatin (Jones and Baylin, 2007).
In conclusion, this study highlights that epigenomic changes will be more fully understood if the cancer genome is considered as a complex three dimensional entity within a highly sub-compartmentalized nuclear structure. As illustrated here, it will be necessary to examine DNA, RNA, and protein in precise relation within nuclear structure to uncover potentially key aspects of cancer biology. While many questions remain, these findings provide a foundation for new avenues of research bridging cancer epigenetics, nuclear structure, and the novel biology of DNA and RNAs from the repeat genome.
Cell Lines, Growth Conditions & Fixation:
Twenty-two cell lines were examined in this study (list in Supplement), and grown in conditions recommended by suppliers (ATCC, Cambrex, and Coriell). 5-azacytidine (6 mM) and 5-aza-2′-deoxycytidine (0.2 µg/ml) were added fresh daily to asynchronously growing cultures. Our standard fixation protocols have been detailed previously (Johnson et al., 1991; Tam et al., 2002), and are summarized in the Supplement. Human effusions were fixed as for cultured cells, and tissue blocks were cryosectioned onto cold glass slides (HistoBond+) and stored briefly at −80 °C until fixation. Of four fixations tested (Supplement), the one that gave the best results was brief Triton extraction followed by paraformaldehyde fixation and storage in ETOH.
FISH and IF:
Probes: L1 ORF2 (gift from J. Moran), XIST pG1A (from H. Willard & C. Brown), and human Cot-1 DNA (Roche). Information on the Sat 2 probes used (Sat2-24 nt oligo, Sat2-59 nt oligo, Sat2-169 bp, & puc 1.77 kb), as well as Sat3 & HuAlphaSat (59 nt & 33 nt) oligos, is provided below.
Hybridization:
Sat2-24 nt LNA was used for most images unless otherwise indicated. Several methods of Sat 2 probe labeling and detection were tested (see below). RNA-specific hybridization was carried out under non-denaturing conditions where the DNA was not accessible. Oligos were usually hybridized at 15% formamide conditions, but were also compared to higher stringency hybridizations at 40% and 50% formamide.
Antibodies:
BMI-1 (from Dr. David Weaver, Upstate & Abcam), Ring 1B and EZH2 (Active Motif), MeCP2 and PTBP1 (Abcam), and MBNL (from Dr. Charles Thornton).
Microscopy and Quantitative Digital Imaging:
Digital imaging was performed using an Axiovert 200 or an Axiophot Zeiss microscope equipped with a 100× PlanApo objective (NA 1.4) and Chroma 83000 multi-bandpass dichroic and emission filter sets (Brattleboro, Vt.), set up in a wheel to prevent optical shift. Images were captured with the Zeiss AxioVision software, and an Orca-ER camera (Hamamatsu, N.J.) or a Photometrics 200 series CCD camera. Digital imaging software (Metamorph) was used to quantify signals (see below for details). Where required, care was taken to eliminate any bleed-through of Texas-red fluorescence into the fluorescein channel. Most experiments were carried out a minimum of 3 times, and scored by at least two independent investigators. All findings were easily visible by eye through the microscope (unless otherwise noted), and images were minimally enhanced for brightness and contrast in Photoshop for publication (unless otherwise noted).
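As a rough illustration of what quantifying a focal fluorescence signal can involve, the sketch below sums background-subtracted pixel intensity above a threshold within one nucleus. It is not the Metamorph procedure used in this study. The median-based background estimate, the threshold factor, and all names are assumptions chosen for the example, and the pixel values are invented.

```python
from statistics import median


def integrated_focus_intensity(image, threshold_factor=3.0):
    """Sum background-subtracted intensity of bright pixels.

    image: 2D list of non-negative pixel intensities for one nucleus.
    Background is estimated as the median pixel value (an assumption);
    a pixel counts as part of a focus if it exceeds
    threshold_factor * background.
    """
    pixels = [p for row in image for p in row]
    background = median(pixels)
    cutoff = background * threshold_factor
    return sum(p - background for p in pixels if p > cutoff)


# Hypothetical 3 x 4 pixel "nucleus" with one two-pixel focus.
toy_nucleus = [
    [10, 10, 10, 10],
    [10, 200, 180, 10],
    [10, 10, 10, 10],
]
print(integrated_focus_intensity(toy_nucleus))
```

Comparing this integrated value between a cancer nucleus and a normal nucleus imaged under identical exposure settings would give a fold difference in focal signal, which is the kind of comparison a microfluorimetric measurement supports.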
Human Cell Lines:
1) HSMM: Skeletal Myoblasts (Cambrex)
2) SUM 149PT: Inflammatory Breast Cancer (Asterand)
3) TIG-1: Fetal Lung Fibroblast (Coriell)
4) HCC1937: Breast Ductal Carcinoma (ATCC)
5) HCT: Colon Adenocarcinoma (ATCC)
6) HeLa: Cervical Adenocarcinoma (ATCC)
7) Hep-G2: Hepatocellular carcinoma (ATCC)
8) HFF: Foreskin Fibroblast (ATCC)
9) HT1080: Fibrosarcoma (ATCC)
10) IMR-90: Lung Fibroblast (ATCC)
11) JAR: Choriocarcinoma (ATCC)
12) MCF7: Breast Adenocarcinoma (ATCC)
13) MCF-10A: Breast Fibrocystic Disease (ATCC)
14) MDA-MB-231: Breast Adenocarcinoma (ATCC)
15) MDA-MB-436: Breast Adenocarcinoma (ATCC)
16) PC3: Prostate Adenocarcinoma (ATCC)
17) hTERT RPE-1: Telomerase immortalized retinal epithelial (ATCC)
18) SAOS-2: Osteosarcoma (ATCC)
19) T-47D: Breast Ductal Carcinoma (ATCC)
20) U2OS: Osteosarcoma (ATCC)
21) Wi38: Fetal Lung Fibroblast (ATCC)
22) WS-1: Embryonic Skin Fibroblast (ATCC)
Probe Sequences:
Sat 2 probes (Sat2-24 nt, Sat2-59 nt, Sat2-169 bp, & puc 1.77 kb) are distinct from one another (probes would not cross-hybridize), and appear to detect different “families” of Sat II. Sat II sequences contain degenerate forms of the 5 bp (ATTCC) Sat III motif, and consistent with this close relationship, the Sat 3 probe overlapped some Sat II RNA foci when used for RNA hybridizations (
Sat II probes can be used to detect different “families” of Sat II that show differential affinity for PcG proteins and for expression.
A highly sensitive 24 nt LNA oligo (Sat 2-24) was designed to maximize detection of Sat 2 family sequences. Hybridization to metaphase chromosomes with this LNA oligo detects Sat II loci on several chromosomes (including 1 and 16), consistent with a prior report (Silahtaroglu et al., 2004). This probe (under low stringency conditions) is also capable of detecting the more conserved Sat III locus on Chr 9. It also detects the highest number of expressed Sat II sequences in CAST bodies in cancer nuclei.
The 59 nt standard oligo to Sat II (Sat 2-59), described by Prosser et al. (1986), detects Sat II on fewer chromosomes than Sat 2-24 (e.g. Chr 1, 16, 2, and 15), does not detect the Sat III locus on Chr 9, and detects CAST bodies less robustly than Sat 2-24.
The PCR probe (Sat2-7) detects a smaller subset of CAST bodies emanating from Chromosome 7 in some cancer samples, representing 4 different organ systems, suggesting that this locus may be susceptible to misregulation in a number of cancers.
Other Sat 2 probes (Sat2-160 bp, Sat2-16, and puc 1.77 kb) have the most restricted distribution on Chrs. 1 and 16. These sequences correlate best with PcG distribution and do not detect appreciable RNA.
Because Sat II sequences are degenerate versions of the more conserved 5 bp Sat 3 sequence and often contain these sequences, the Sat 3 oligo (see table above), under low stringency, can also detect the same Sat II RNA foci as the Sat 2-24 LNA oligo.
Cell Fixation:
For our standard fixation conditions used in most experiments (Tam et al., 2002), cultured cells were grown on glass coverslips and extracted in CSK buffer, 5% Triton, and VRC (vanadyl ribonucleoside complex) for 1-3 min. Cells were fixed in 4% paraformaldehyde for 10 min, then stored in 1×PBS or 70% ETOH. Four fixations were tested on frozen tissue sections: 1) our standard fixation protocol summarized above (this produced the best results); 2) fixed first, extracted second, and stored in ETOH; 3) fixed (4% paraformaldehyde) for 10 min, no extraction, and stored in ETOH; and 4) 10 min incubation in PreservCyt (Cytyc Corp) at room temperature and storage in ETOH.
RNA and DNA FISH & IF:
Hybridization for RNA, DNA, simultaneous DNA/RNA, and simultaneous DNA/IF or RNA/IF detection was performed under our standard conditions as previously described (Johnson, Singer et al. 1991; Tam, Shopland et al. 2002) and is briefly described below.
Oligo hybridizations were done overnight at 37 C, in 2×SSC, 1 U/ul RNasin and 15% formamide, with 5 pmol oligo or 0.1 pmol LNA oligo as indicated for lower stringency, or at 40-50% formamide for higher stringency.
Larger probe hybridizations were overnight at 37 C, in 2×SSC, 1 U/ul RNasin and 50% formamide, with 2.5 ug/ml of DNA probe. Cells were washed: 15% formamide/2×SSC at 37 C (20 min); 2×SSC at 37 C (20 min); 1×SSC at RT (20 min); and 4×SSC at RT (5 min).
Labeling and detection: Four methods of labeling and detection were used: 1) larger (non-oligo) DNA probes were nick translated with biotin-11-dUTP or digoxigenin-16-dUTP (Roche Diagnostics, Indianapolis, Ind.); 2) the LNA oligo was end-labeled with either biotin or dig; 3) Sat2-59 nt was end-labeled with direct fluorochrome (FITC) or biotin; and 4) the PCR generated probe (Sat2-169 bp) used biotin. Detection utilized Alexa 488 or Alexa 594 Streptavidin (Invitrogen) in 1% BSA/4×SSC for 1 hr at 37 C. Postdetection washes: 4×SSC; 4×SSC with 0.1% Triton; and 4×SSC, each for 10 min at RT, in the dark.
For simultaneous RNA/DNA hybridizations, RNA hybridization was performed first (as above) and the signal was fixed in 4% paraformaldehyde for 10 min, followed by NaOH treatment, DNA denaturation and DNA hybridization. Briefly, the cells were treated with 0.2N NaOH in 70% ETOH for 5 min, rinsed with 70% ETOH, then denatured in 70% formamide, 2×SSC, at 75 C for 2 min, before ethanol dehydration and air-drying. Hybridization and detection were carried out as described above.
Simultaneous DNA/RNA and antibody detection: Most antibodies were used prior to RNA or DNA hybridization. Briefly, slides were incubated in the appropriate dilution of primary antibody in 1% BSA, 1×PBS and 1 U/ul RNasin, for 1 hour at 37 C. Slides were washed, and immunodetection was performed using a 1:500 dilution of appropriately conjugated (Alexa 488 or Alexa 594, Invitrogen) secondary (anti-goat, mouse or rabbit) antibody, in 1×PBS with 1% BSA. The antibody signal was fixed in 4% paraformaldehyde for 10 min prior to hybridization (performed as detailed above), and all slides were counterstained with DAPI. Vectashield (Vector Labs) was used as mounting medium for all fluorescence imaging.
Digital Quantification:
All images compared or quantified for signal intensity were taken with the same exposure on the same day with the same microscope and fluorochrome.
Linescans: The Linescan function in the Metamorph Image analysis software (Molecular Devices, Inc.) was used to measure relative signal intensities for each channel of a 3 color digital image of cell nuclei. Line regions were drawn across the entire nucleus of individual cells (unless otherwise noted) and pixel intensity along the line measured. Y-axis is intensity of each pixel across the length of the line (X-axis).
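The linescan measurement described above can be illustrated with a minimal sketch. This is not the Metamorph implementation; the function name `linescan`, the nearest-pixel sampling, and the toy image values are assumptions introduced only for illustration.

```python
def linescan(image, start, end, n_samples=None):
    """Sample pixel intensities along a straight line from start to end.

    image: 2D list indexed as image[row][col] (one color channel).
    start, end: (row, col) endpoints of the line region.
    Nearest-pixel sampling is an assumption of this sketch.
    """
    (r0, c0), (r1, c1) = start, end
    if n_samples is None:
        # One sample per pixel step along the longer axis of the line.
        n_samples = max(abs(r1 - r0), abs(c1 - c0)) + 1
    profile = []
    for i in range(n_samples):
        t = i / (n_samples - 1) if n_samples > 1 else 0.0
        r = round(r0 + t * (r1 - r0))
        c = round(c0 + t * (c1 - c0))
        profile.append(image[r][c])
    return profile


# Hypothetical single-channel image with one bright focus at row 1, column 2.
channel = [
    [10, 10, 10, 10],
    [10, 12, 200, 11],
    [10, 10, 10, 10],
]
profile = linescan(channel, (1, 0), (1, 3))
```

Plotting `profile` against sample index would give the intensity (Y-axis) versus position along the line (X-axis); repeating the call for each color channel would give one trace per channel.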
Maximum pixel intensity vs. threshold: Metamorph software was used to measure the single maximum pixel intensity of each cell nucleus. Three color images were used and the color channels separated. The regions outlining the nuclei on the DNA color channel were transferred to the channel containing the RNA signals. The single brightest pixel in each nuclear region was measured. This was then plotted against a threshold calculated for each cell line using 3× the average lowest intensity pixel in each nucleus for that cell line.
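The threshold comparison described above can be sketched as follows. The function names and the pixel values are hypothetical; each nucleus is represented simply as a list of RNA-channel pixel intensities inside the region transferred from the DNA channel.

```python
def threshold_for_cell_line(nuclei, factor=3.0):
    """Threshold = factor x the average of the lowest pixel in each nucleus."""
    minima = [min(pixels) for pixels in nuclei]
    return factor * sum(minima) / len(minima)


def score_nuclei(nuclei, factor=3.0):
    """Return the threshold and, per nucleus, (brightest pixel, above threshold?)."""
    threshold = threshold_for_cell_line(nuclei, factor)
    results = [(max(p), max(p) > threshold) for p in nuclei]
    return threshold, results


# Hypothetical RNA-channel pixel values inside three nuclear regions.
nuclei = [
    [10, 12, 11, 250],  # contains a bright focus
    [9, 10, 13, 14],    # background only
    [11, 12, 15, 90],
]
threshold, results = score_nuclei(nuclei)
```

With these illustrative values the per-nucleus minima are 10, 9 and 11, so the threshold is 3 × 10 = 30, and the first and third nuclei exceed it.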
Total Sat RNA signal/cell: Metamorph software was used, and color channels separated for 3 color images. Computer generated regions were drawn around all RNA signals in each nucleus. The average pixel intensity for each region was multiplied by the area of each region, and then all regions in each nucleus were added to give the integrated intensity (area and brightness) for each nucleus.
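The integrated intensity calculation described above can be written as a short sketch. The function name and the region values are hypothetical, and region area is taken here as the pixel count of the region.

```python
def integrated_intensity(regions):
    """Sum of (average pixel intensity x area) over all regions in one nucleus.

    regions: list of regions, each a list of pixel intensities.
    Area is taken as the number of pixels in the region (an assumption).
    """
    total = 0.0
    for pixels in regions:
        area = len(pixels)
        mean_intensity = sum(pixels) / area
        total += mean_intensity * area
    return total


# Two hypothetical RNA signal regions within a single nucleus.
regions = [[100, 120, 110], [200, 220]]
total = integrated_intensity(regions)
```

Because the average intensity multiplied by the area equals the sum of the pixel values in a region, the result reflects both the size and the brightness of the signals in each nucleus.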
Human Pericentromeric Satellite II Repeats are Aberrantly and Grossly Expressed in Cancer:
Almost 50% of the human genome consists of repetitive sequence elements with high-copy tandem satellite repeats associated with centromeric regions, such as Satellite II, representing a major portion of the repeat fraction. While alpha-satellite (α-Sat) is at the centromere proper of all human chromosomes, Satellite II (Sat II) defines the pericentromere of several chromosomes, the largest (˜6 Mb) on Chr 1q12 and also Chr 16, and smaller Sat II on several other chromosomes. Sat II is composed of thousands of ˜25 bp repeats, evolved from the 5 bp more conserved Sat III repeat on Chr. 9 (Richard et al. 2008). While long thought to be silent and have no known function (reviewed in Richard et al. 2007, Plohl et al. 2008), in yeast centromeric satellite siRNAs are implicated in heterochromatin maintenance (Volpe et al. 2002), although it is not clear these findings apply to mammalian satellites (reviewed in Probst et al. 2007). We have discovered that in many cancer cells there is over-expression of “COT-1” RNA, which represents the broad repetitive fraction. After a comprehensive analysis of numerous repeat types, including SINEs, LINEs, alpha-Sat, Sat III, and Sat II, we discovered that grossly aberrant Sat II RNA expression is linked to cancer. Importantly, this robust Sat II expression is negative or negligible in normal cells, suggesting a highly sensitive and potentially specific marker. Moreover, it is readily visualized in single cells in a pathology section, indicating this assay can be both qualitative and quantitative.
Polycomb Proteins and Satellite Heterochromatin:
More recently we have uncovered an exciting connection between Sat II mis-regulation and the exceptionally important polycomb group (PcG) proteins which control much of the epigenome and are intensely studied for their strong links to cancer. PcG proteins induce repressive chromatin modifications on heterochromatin, thereby controlling most key developmental pathways in ES cells and embryos (Lee et al. 2006; Muyrers-Chen et al. 2004). BMI-1 is a key component of the PRC1 complex necessary for self-renewal of stem cells and suppression of the tumor suppressor locus Ink4a/Arf in stem cells and cancer (O'Carroll et al. 2001; Valk-Lingbeek et al. 2004). While over-expression of BMI-1 has been described in several cancers including breast (Pietersen et al. 2008), colorectal, liver, and lung (reviewed in Valk-Lingbeek et al. 2004) other results find its down-regulation is a poor prognostic indicator in breast cancer; thus its role in cancer progression and prognosis is currently unresolved but intensively studied (Glinsky et al. 2005; Pietersen et al. 2008).
We have discovered in cancer a gross perturbation in the nuclear organization of PcG proteins (e.g., one or more of BMI-1, RING 1B, Phc1, Phc2, CBX4, CBX8, RNF2, SUZ12, EED, RBBP4, JARID2, EZH2, EZH1, RBBP7, GLI1, MYC, CDKN2A, and HIST2H2AC) into prominent “Cancer-Associated Polycomb” (CAP) bodies. These CAP bodies form on the large 1q12 Sat II locus which remains silent, whereas PcG proteins are sequestered from the rest of the nucleoplasm, where other loci are inappropriately expressed.
Satellite RNA Misregulation is a Hallmark of Epigenomic and Heterochromatic Instability in Cancer:
Inappropriate expression of satellite repeat RNAs, coupled with aggregation of polycomb heterochromatin regulators into abnormal bodies, is an indicator of “heterochromatic instability”, which may be more common in cancers than realized, and has unexplored but important implications for cancer etiology, and potentially diagnostics. Given that this involves defective centromere associated heterochromatin, it has implications for chromosome segregation and for genetic as well as epigenetic instability. And while satellite over-expression may arise during cancer progression, it is likely linked to abnormal mitosis and epigenetic regulation and thus may contribute to progression.
Biomarkers and Breast Cancer:
An important challenge in cancer medicine is to identify specific changes that occur in neoplastic progression, which may be common to many cancers, specific to particular types, or indicators of progression level (grade), aggressiveness or response to therapy. This will be vital for surveillance, recognition and proper classification of different cancer sub-types and for designing/evaluating therapeutic interventions. The cancer biomarkers described herein are “red flags” for major aberrations in epigenetic state, increasingly recognized as important to cancer progression and aggressiveness. The Sat II RNA promises high sensitivity, assayable in pathology tissue or extraction based methods, including potentially in blood or other bodily fluids, which would be extremely valuable. While cytopathological changes in nuclear morphology are important diagnostic indicators of many cancers, the distinctions can be subtle and would benefit from biomarkers that confirm cancer cell diagnosis in as little as a single cell. While the PcG protein sequestration requires immunohistochemical analysis, the Sat II RNA assay can be done rapidly on tissue with LNA oligos, or RT-PCR or microarray of lysates or blood.
A biomarker may be useful if it enhances detection of many cancers, or if it discriminates certain cancer sub-types or grades, or correlates with response to therapy. For example, in breast cancer there is a strong need for more biomarkers (Hinestrosa et al., 2007) to determine which in situ cancers or occult metastases are more prone to invasive progression. Improved biomarkers have potential to spare some patients unnecessary treatments and discriminate those who require more aggressive therapies. In fact, these may constitute “red flags” for a category of more “epigenetic cancers”, in which failed maintenance of chromatin state (defective chromatin remodeling) is particularly prominent or an early contributor to cancer development. As a biomarker, epigenetic instability has important implications for treatment, given the availability of newer pharmacologic agents that modulate histone modifications or DNA methylation state, and many have unintended impact on pericentric satellite heterochromatin. Compared to chromosomal instability, epigenetic alterations are also theoretically reversible.
Bridging Molecular and Cellular Information:
Studies on epigenetic components in cancer usually employ molecular analyses of extracted tissues, such as DNA methylation. Sat II RNA expression can be studied by, e.g., RT-PCR, while FISH and PcG (BMI-1 antibody) assays can be used to provide the advantage of epigenetic markers overlaid with key tissue and cell context for the pathologist.
As illustrated in
We have discovered that Sat II RNA can be used as a biomarker to provide a “black and white” difference between normal cells and cancer cells. Our results in cell lines and a limited sample of tumors suggest a high incidence of Sat II RNA expression in breast cancer, which impacts 1 in 9 women (Tables 5 and 6). Both RT-PCR and molecular cytology, as well as other RNA biomarker assays (see, e.g., Tafe et al., 2010), can be used to assay the presence of Sat II RNA, which is expected to provide higher sensitivity than other biomarkers, in a panel of breast cancer sentinel lymph nodes (SLN) and other available well characterized tumors. Sat II RNA can also be detected in other bodily fluids, such as blood, using approaches similar to those currently pursued for microRNAs (see, e.g., Gao et al., 2011), which tend to have much less marked expression differences compared to Sat II RNA.
Sat II RNA Expression and CAP Bodies as Biomarkers in a Panel of Primary Breast Tumor Samples of Different Types and Grades.
Sat II RNA and CAP bodies are epigenetic “signatures” that can be used as robust cytological biomarkers of particular sub-types or stages of breast cancer, and these biomarkers can be used for cancer diagnosis and prognosis. Results in cell lines and several tumor samples predict Sat II RNA expression (and PcG bodies) will be seen in many breast tumors.
Sat II RNA Expression Detection by RT-PCR in a Panel of 59 Breast Cancer Sentinel Lymph Nodes.
Sat II RNA as a biomarker for breast cancer detection can be confirmed by using RT-PCR in already available lysates for comparison as a biomarker of occult metastasis and/or poor prognostic indicator. Analysis of pathology sections of nodes could also be used to determine if micrometastases differ in expression of “epigenetic biomarkers” and whether this links to known survival and clinical pathology data.
Satellite II is Very Commonly Aberrantly Expressed in Cancer Lines and is Absent or Negligible in Normal Cells.
Use of a number of oligonucleotide probes for Sat II has revealed that prominent, aberrant foci of Sat II RNA are seen in eight of twelve cancer cell lines, whereas Sat II RNA is absent or negligible in all six normal somatic cell lines (Table 5). The clear difference between cancer and normal cells was very distinct (
Accumulation of Polycomb Proteins into Polycomb “Bodies” (PcG Bodies) is not a Feature of Normal Cells, but is Commonly Seen Only in Cancer Cells.
We find that PcG bodies are almost exclusively found in cancer cells (7 out of 8 cancer lines were positive) and not in normal cells (none of 5 non-neoplastic lines examined). Thus, we believe that PcG bodies are a hallmark of human cancer cells and are not structures of normal nuclei.
PcG Bodies are Associated with the Large Accumulations of Sat II DNA on Chromosome 1, Which are not Expressing RNA.
PcG bodies form on the huge Sat II block on Chr 1q12 which remains transcriptionally silent. We find that PcG bodies and Sat II RNA appear to be mutually exclusive. Thus, Sat II RNA appears to be expressed only from loci that are not associated with accumulations of repressive PcG proteins. (Rather, PcG proteins may be sequestered away from loci that now inappropriately express Sat II.)
Aberrant Sat II RNA Foci and PcG Bodies are Also Observed in Solid Human Tumor Tissue and Not Normal Tissue.
Although aberrant satellite RNA and PcG bodies are not found in cultured normal cells, suggesting they did not arise as a consequence of cell culture, the question remained whether these foci can be seen in vivo (human tumors). We have also examined Sat II RNA over-expression and PcG protein distribution in frozen sections of 6 tumors from the UMass Tissue Bank and some of their matched normals. After working out proper fixation protocols that adequately preserved poly-A RNA (our positive control), we found that both PcG bodies and aberrant Sat II foci are commonly seen in human tumor tissue sections (5 of 6 tumors were positive) and not in matched normal tissue sections (
Evidence of Sequestration of PcG Proteins from the Rest of the Nucleus.
The presence of one or more prominent PcG bodies was often accompanied by marked sequestration of BMI-1 from the rest of the nucleoplasm (
We have demonstrated that Sat II RNA is expressed in cancer but not normal cells, and co-occurs with formation of aberrant cancer-associated PcG bodies. This was shown in numerous cancer cell lines as well as a small sample of primary tumors and ascites, including three breast ductal carcinomas and one ovarian tumor, all of which showed these hallmarks. We believe that Sat II RNA, as a biomarker of cancer, can be used as a hallmark to determine the sub-type, grade and/or clinical outcome (prognosis) of cancer (e.g., primary breast tumor). We also believe that Sat II RNA can be used as a sensitive indicator of metastatic cells in sentinel lymph nodes, and that Sat II expression can be correlated with clinical outcome. Sat II RNA can also be assayed from a patient's bodily fluid to detect metastatic disease.
Sat II RNA is negative in normal cells and thus can be used as a highly sensitive indicator for the presence of at least some types of cancers (e.g., breast cancer and pancreatic cancer), assayable by a number of methods. Very recently a study appeared in Science reporting over-expression of Sat II RNA in ten of ten pancreatic tumors examined and proposing it should be pursued as a potential biomarker (Ting et al., 2011). Our data show that Sat II RNA over-expression is linked to sequestration of essential epigenetic regulators (PcG proteins) into aberrant nuclear bodies, and thus both Sat II RNA and PcG bodies indicate major epigenetic dysregulation; the presence of one or both biomarkers in a cell of a patient likely indicates a poor prognosis.
Sat II Expression and CAP Bodies can be Used to Type and Grade Primary Breast Tumor Samples
The presence of Sat II RNA and PcG foci is common in many breast tumors and may be linked to cancer sub-type, aggressiveness, or grade. The prevalence of Sat II RNA over-expression and PcG mislocalization in a large number of primary breast tumors may be related to clinicopathologic data. As explained above, since Sat II and PcG bodies often co-occur and reinforce one another as indicators of epigenetic instability (
All UMass specimens are registered with the North American Assoc. of Central Cancer Registries (NAACCR), and NCI's SEER program and have long term clinical outcome data available. OCT blocks have about 5-10 years of outcome data, while the archival paraffin samples are longer. Although we will initially use frozen OCT specimens from The Tissue Bank, we will seek to expand this into archival paraffin specimens using antibodies to BMI-1 to mark PcG bodies and in situ hybridization to probes for Sat II RNA. Poly-A RNA hybridization will provide an internal control for RNA preservation in every sample.
The “epigenetic markers” described herein may be used to discriminate a specific known (or unknown) sub-type of breast cancer. Mis-regulation of Sat II and PcGs may be a feature of many or all types of breast cancer. Thus, the biomarkers described herein may be used to identify cancer sub-types and clinical/pathological parameters, including grade, lymph node and distant metastases (stage), ductal vs lobular type, the presence of lymphatic or vascular invasion, estrogen and progesterone receptor status, ploidy, growth fraction by Ki 67 immunostaining, Her2 status, BRCA1 mutation status, complete response to neo-adjuvant chemotherapy, and occurrence of triple negative and basal phenotypes.
The biomarkers identified herein may also be used for early tumor detection or to discriminate a progression-prone cancer. About 40% of samples available through the tissue bank will contain non-invasive carcinoma in situ and varying degrees of pre-cancerous hyperplastic changes, and we can ascertain the stage in the multistep process of breast cancer development at which Sat II RNA or PcG bodies develop. The Sat II RNA fluorescence signal can also be quantified by microfluorimetry, and shows good agreement with extraction based methodologies.
Statistical analysis: Differences between tumor categories can be evaluated by analysis of variance (ANOVA), and pairwise comparisons made using Tukey's HSD multiple comparisons procedure. The strength of correlation between the new biomarkers (Sat II RNA, CAP bodies, and CAST bodies) with each other and with the other clinically-significant descriptors of the tumor can be determined to assess relationships between biomarkers and clinical and pathologic variables, using Pearson product moment correlations for continuous normally distributed variables or Spearman's Rank Correlation Coefficient for non-normally distributed or rank order variables.
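The two correlation measures named above can be illustrated with a minimal sketch. It covers only the Pearson and Spearman coefficients (not ANOVA or Tukey's HSD), the function names are illustrative, and the signal and grade values are hypothetical, not data from this study.

```python
import math


def pearson(x, y):
    """Pearson product moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical per-tumor integrated Sat II RNA signal and tumor grade.
sat2_signal = [5.0, 20.0, 35.0, 400.0, 90.0]
tumor_grade = [1, 1, 2, 3, 3]
rho = spearman(sat2_signal, tumor_grade)
```

Spearman's coefficient is used in the example because grade is a rank order variable; for two continuous, normally distributed measurements `pearson` would be called directly.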
Primary tumor samples can be characterized for their Sat II RNA/CAST/CAP signatures, thereby identifying which primary tumor types exhibit these aberrant marks, similar to that performed for cancer cell lines and tumor samples (Tables 5 and 6). While initial scoring can be done through the microscope, quantitative digital microfluorimetry can also be used to quantify differences (e.g.,
Sat II RNA can be Used as a Sensitive Detector or Prognostic Indicator of Metastases in Breast Sentinel Lymph Node by RT-PCR and Cytology and Initial Tests in Blood:
We have shown Sat II RNA over-expression in primary breast tumors using in situ hybridization (
SAT II RNA can be detected in breast sentinel lymph nodes via RT-PCR. Primers have already been made based on consensus sequences targeting all SAT II RNA elements as well as others specifically for the SAT II locus on Chr. 7, which analysis of available RNA sequence data indicates is particularly over-expressed. These primers can be used for specific detection of SAT II RNA, e.g., in U2OS osteosarcoma cells, which highly express SAT II RNA relative to normal fibroblasts, which show no expression. We will first do Trizol extractions of the RNA, treat the samples with RNase-free DNase, followed by RT-PCR with our SAT II primers with an RT-minus control, then visualize products by semi-quantitative gel electrophoresis. If initial results indicate a significant difference in expression levels of Sat II RNA, as predicted, we will perform quantitative Real Time RT-PCR. We will initially compare the U2OS Sat II expression level with that of TIG-1 (fetal lung fibroblast) cells. Expression levels will be normalized to that of a housekeeping gene.
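Normalization to a housekeeping gene in quantitative Real Time RT-PCR is commonly expressed with the comparative Ct (2^-ΔΔCt) calculation. The sketch below assumes that method, which is not specified in the text, and assumes approximately equal and near-complete amplification efficiency for both primer sets. The function name and all Ct values are hypothetical.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of a target RNA in a sample relative to a control.

    Each Ct is a threshold cycle; 'ref' is the housekeeping gene.
    Assumes the 2^-(delta delta Ct) method with ~100% efficiency.
    """
    delta_sample = ct_target_sample - ct_ref_sample      # normalize sample
    delta_control = ct_target_control - ct_ref_control   # normalize control
    return 2.0 ** -(delta_sample - delta_control)


# Hypothetical Ct values: cancer line versus normal fibroblast line.
fold = relative_expression(
    ct_target_sample=20.0, ct_ref_sample=18.0,
    ct_target_control=30.0, ct_ref_control=18.0,
)
```

With these illustrative values the normalized difference is 10 cycles, corresponding to a 1024-fold higher signal in the sample. If the control shows no amplification at all, a fold change cannot be computed this way and the result would instead be reported as undetected in the control.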
The primers can also be used to detect Sat II RNA in clinical samples, with emphasis on the 59 RNA lysates of breast sentinel lymph node biopsies. An appropriate normal mRNA can be included as a control for RNA preservation. The Sat II RNA assay can be used as a sensitive assay for the detection of micro-metastases.
The presence or absence of Sat II RNA in micro-metastases correlates with clinical outcome. We believe that whether a sub-type of breast tumor expresses or does not express Sat II RNA may correspond with aggressiveness. The absence of this hallmark of epigenetic instability may correlate with better outcome, e.g., if nodes known to contain metastatic cells differ with respect to whether they contain Sat II RNA.
SAT II RNA detection could be used for non-invasive testing. Currently, breast sentinel node biopsies are the standard for detecting invasive cancer, but clearly it would be enormously important if Sat II RNA could be detected in bodily fluids of women with metastatic or more localized disease. Because Sat II RNA appears to be unusually stable, possibly due to methylation, this biomarker could be used in a non-invasive assay to diagnose cancer. Current studies in various fields indicate the presence of cell-free RNA in the blood, which can potentially be used diagnostically. To test this approach, RT-PCR can be performed on U2OS cell culture media, and the presence of cell-free SAT II RNA can be detected in the filtered culture media. This approach could be used to test blood or lymph samples of women known to have breast tumors for the presence of SAT II RNA.
All normal human cells have just two copies of the largest (6 Mb) satellite II locus on Chr 1q12, one on each of the two homologous chromosomes (illustrated in Example 1,
As shown in
As noted in Example 1, an earlier survey of chromosome aberrations in cancer (Mehrtens et al., 1997) noted that there is an unexplained correlation between increased copy number of the long arm of Chr 1q (over 100 Mb of DNA) and certain cancers, as was prominent in breast cancer. However, this finding was not useful diagnostically because such a broad and non-specific region of the largest human chromosome was examined, and it was unknown if any particular region of 1q might have an involvement in cancer. Our findings show for the first time that the 1q12 satellite locus is directly involved in the highly aberrant distribution of master epigenetic regulators in the cancer epigenome. Thus, either the formation of cancer-associated polycomb bodies (which form on 1q12) or the increased copy number of 1q12 satellite DNA can be assayed as an indicator of epigenetic dysregulation linked to cancer.
As shown in Example I,
The BRCA1 protein contains a RING finger domain in the amino terminus with ubiquitin E3 ligase activity and two BRCT repeats in the carboxy terminus. BRCA1 is highly expressed in proliferative cells and its loss leads most prominently to genetic instability and growth arrest. BRCA1 is responsible for the monoubiquitylation of histone H2A and disruption in this process impairs the integrity of constitutive heterochromatin, which leads to a disruption of gene silencing at tandemly repeated DNA regions, in particular in regions containing satellite DNA.
Defects in BRCA1 increase the risk of cancer in patients, in particular breast and ovarian cancer. As is known, a diagnosis of cancer in a mammal (e.g., a human) can be made by detecting a mutation in a BRCA1 gene or in a BRCA1 protein that prevents the monoubiquitylation of histone H2A (see Zhu et al., Nature 477:179, 2011). Also, a diagnosis of cancer in a mammal can be made by detecting a decrease in the monoubiquitylation of histone H2A. Furthermore, mutations that prevent BRCA1 from ubiquitylating histone H2A produce an imbalance in the epigenome that results in an increase in the expression of satellite II RNA and the formation of CAP and CAST bodies. Thus, the methods of this application, such as the detection of an increase in the expression of satellite II RNA and detection of the formation of CAP and CAST bodies, can be performed in combination with the detection of mutations in a BRCA1 gene or in a BRCA1 protein or a detection of the decrease in the monoubiquitylation of histone H2A using a sample from a patient having, or at risk of, cancer.
In addition, in view of the role that mutations in a BRCA1 gene or in a BRCA1 protein that prevent the monoubiquitylation of histone H2A play in producing epigenetic imbalance, it is now possible to screen agents for their suitability in the treatment of a cancer in a mammal (e.g., a human) by contacting a cancer cell that includes a mutation in a BRCA1 gene or in a BRCA1 protein that prevents the monoubiquitylation of histone H2A, or a cell that exhibits a decrease in monoubiquitylated histone H2A, with the agent in order to determine whether the agent increases the monoubiquitylation of histone H2A in the cell. This assay can be performed as the sole assay or it can be performed by also determining the effect of the agent on other biomarkers, such as satellite II RNA molecules and CAP and CAST bodies, in the cancer cell, as is discussed herein.
Finally, increases in epigenetic imbalances caused by a chemotherapeutic agent can also be determined by contacting a cell (e.g., a non-cancer cell) with the chemotherapeutic agent and determining the level of monoubiquitylation of histone H2A in the cell. A determination that the chemotherapeutic agent decreases the monoubiquitylation of histone H2A in the cell (i.e., causes an increase in epigenetic imbalance) indicates that the chemotherapeutic agent should not be administered for the treatment of cancer.
An imbalance in the distribution of UbH2A has also been correlated with a cancer genome. As shown in
The distribution of UbH2A (as seen in
ChIP is a powerful method to selectively enrich for DNA sequences bound by a particular protein in living cells, in this case UbH2A. The ChIP process enriches specific crosslinked DNA-protein complexes using an antibody against a protein of interest. After size selection, all of the resulting ChIP-DNA fragments are sequenced simultaneously using a genome sequencer. A single sequencing run can scan for genome-wide associations with high resolution, meaning that features can be located precisely on the chromosomes.
Methods can also be used that analyze the sequences by using cluster amplification of adapter-ligated ChIP DNA fragments on a solid flow cell substrate to create clusters of approximately 1000 clonal copies each. The resulting high density array of template clusters on the flow cell surface can be sequenced by a Genome analyzing program. Each template cluster undergoes sequencing-by-synthesis in parallel using novel fluorescently labelled reversible terminator nucleotides. Templates are sequenced base-by-base during each read. Then, the data collection and analysis software aligns sample sequences to a known genomic sequence to identify the ChIP-DNA fragments.
Sensitivity of this technology depends on the depth of the sequencing run (i.e. the number of mapped sequence tags), the size of the genome and the distribution of the target factor. Unlike microarray-based ChIP methods, the precision of the ChIP-Seq assay is not limited by the spacing of predetermined probes. By integrating a large number of short reads, highly precise binding site localization is obtained. In contrast to ChIP-chip, ChIP-Seq data can be used to locate the binding site within a few tens of base pairs of the actual protein binding site. Tag densities at the binding sites are a good indicator of protein-DNA binding affinity, which makes it easier to quantify and compare binding affinities of a protein to different DNA sites.
Methods
ChIP-seq was performed as previously described (Yildirim et al., 2011) with some modification. Approximately 1×10⁶ cells were crosslinked with formaldehyde to a final concentration of 1% for 10 minutes at room temperature and the reaction was stopped by the addition of 125 mM glycine. Cells were washed twice with 1×PBS containing protease inhibitors (Roche complete Mini protease inhibitor tablets) and pelleted at 100 rpm at 4° C. for 5 min. Cell pellets were resuspended in SDS lysis buffer (1% SDS, 10 mM EDTA, 50 mM Tris-Cl pH 8.1) with protease inhibitors and incubated on ice for 10 min. Cells were then sonicated at 10% duty, setting 2 for 10 minutes to a fragment size of 150-500 nt followed by centrifugation at 3000 rpm for 10 min at 4° C. Supernatant was collected and 100 uL chromatin was incubated with an antibody against Ubiquityl Histone H2A (UbH2A, Cell Signaling #8240) as per manufacturer's recommended concentrations at 4° C. overnight with rotation in IP Buffer (0.01% SDS, 1.1% Triton X-100, 1.2 mM EDTA, 16.7 mM Tris-Cl, pH 8.1, 167 mM NaCl)+0.5% BSA. 50 uL protein G magnetic beads (Cell Signaling, #9006) were added to the antibody-chromatin complex for 4 hours at 4° C. with rotation. ChIP washes were as follows: 2×IP Buffer, 2×RIPA buffer (0.1% SDS, 10 mM Tris, pH 7.6, 1 mM EDTA, 0.1% Na-deoxycholate, 1% Triton X-100), 2×RIPA buffer+0.3M NaCl, 1× LiCl Buffer (0.25M LiCl, 0.5% NP-40, 0.5% Na-deoxycholate), 1×TE. Crosslinks were reversed overnight at 65° C. in 1×TE with the addition of 3% SDS, 1 mg/mL proteinase K, 200 mM NaCl. DNA was extracted with phenol:chloroform and precipitated with 0.1× volume 3M NaOAc, pH 5.2 and 2.5× volume 100% EtOH overnight at −20° C.
Preparation of Illumina paired-end deep sequencing ChIP libraries was performed as described (Yildirim et al., 2011). Deep sequencing data were mapped to human genome build hg19 using Bowtie (Langmead, Trapnell, Pop, & Salzberg, 2009). Data normalization and peak calling were performed over a 10 kb sliding window using SeqMonk (Babraham Bioinformatics, Babraham Institute, Cambridge, UK).
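The windowed analysis described above can be sketched in Python as follows. This is a minimal illustration of counting mapped read positions in 10 kb sliding windows and scaling the counts to reads per million mapped reads; it does not reproduce SeqMonk's internal normalization or peak-calling procedure. The function names, the 5 kb default step and the simple threshold used to flag windows are assumptions for illustration only.

```python
from bisect import bisect_left


def windowed_rpm(read_starts, chrom_length, total_mapped,
                 window=10_000, step=5_000):
    """Return (start, end, reads-per-million) for windows along one chromosome."""
    if total_mapped <= 0:
        raise ValueError("total_mapped must be positive")
    positions = sorted(read_starts)
    scale = 1_000_000 / total_mapped  # normalizes for sequencing depth
    result = []
    for start in range(0, chrom_length, step):
        end = min(start + window, chrom_length)
        # Number of reads whose start falls in the half-open interval [start, end)
        n = bisect_left(positions, end) - bisect_left(positions, start)
        result.append((start, end, n * scale))
    return result


def call_windows(windows, min_rpm):
    """Keep windows at or above a hypothetical reads-per-million threshold."""
    return [w for w in windows if w[2] >= min_rpm]
```

In practice the read positions would come from the Bowtie alignments for one chromosome, and `total_mapped` would be the number of mapped reads in the whole library, so that samples sequenced to different depths can be compared window by window.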
The BRCA1 tumor suppressor, a ubiquitin ligase, is implicated in multiple nuclear functions, including DNA repair and recombination. In irradiated nuclei, BRCA1 foci localize to sites of DNA repair with other repair proteins. While the link to DNA repair has been extensively studied, the potential role of BRCA1 foci in normal S-phase nuclei has been relatively ignored. The typical 5-15 foci consistently present in S-phase nuclei are widely presumed to be just storage sites or sites of endogenous repair. However, these foci could actually reflect an undiscovered aspect of BRCA1 function; key to this question is whether they form at specific genomic sites. In the course of studying BRCA1 in relation to XIST RNA and X-inactivation, we recently discovered that many BRCA1 foci directly abut or overlap markers of the interphase centromere/kinetochore complex. Mouse nuclei have prominent chromocenters reflecting a defined organization of centric and pericentric heterochromatin; the association of BRCA1 foci with these can be striking, particularly in a subset of cells that label with PCNA, a replication marker (see
BRCA1 has a fundamental but previously unrecognized role in centromere structure and function; this in turn may impact chromosome segregation and maintenance of genomic stability. Our findings show that BRCA1 foci have a substantial though incomplete association with interphase centromere-linked structures.
BRCA1 functions routinely during S-phase. Rather than being required for segregation of sister chromatids, BRCA1's role may be more focused at centric or pericentromeric DNA, the highly repetitive nature of which may pose special requirements for decatenation and/or chromatin modification. The BRCA1 S-phase pattern does not simply mirror that of replicating DNA, but may reflect a subset of replicating DNA.
BRCA1 mutations may impact the structure and function of centromeres and/or pericentric heterochromatin. A host of chromatin modifications that characterize centric heterochromatin can be examined, and a comparison of BRCA1-deficient breast cancer cells (e.g., human HCC1937) with normal control cells or BRCA1+ breast cancer cells can be used to show the effect of BRCA1 on centromere and heterochromatin structure and function. Chromatin modifications include biochemical hallmarks, such as lysK9, methK27, HP1, as well as structural condensation and nuclear organization of centromeres.
We have found that centromeres are markedly ubiquitinated in a subset of cells, and we believe that BRCA1 (a ubiquitin ligase) plays a role in ubiquitination at the centromere, including ubiquitination of Topo II and histone H2A. In addition, the loss of BRCA1 causes defects in mitotic chromosome segregation. BRCA1 status is believed to be linked to defective centromere segregation or microtubule association. DNA “bridges” seen in mitotic or early G1 cells lacking BRCA1 may be composed of centromeric satellite DNA. Other factors, in addition to known BRCA1-associated proteins or chromatin remodeling or DNA repair factors, may localize with BRCA1 at constitutive heterochromatin.
BRCA1 is believed to function at chromosomal centromeres, structures critical for proper chromosome segregation. This constitutes a fundamentally new paradigm for how BRCA1 defects cause genomic instability and cancer.
All publications, patents, and patent applications mentioned in the above specification are hereby incorporated by reference. Various modifications and variations of the described methods of the invention will be apparent to those skilled in the art without departing from the scope and spirit of the invention. Although the invention has been described in connection with specific embodiments, it should be understood that the invention as claimed should not be unduly limited to such specific embodiments. Indeed, various modifications of the described modes for carrying out the invention that are obvious to those skilled in the art are intended to be within the scope of the invention.
Other embodiments are in the claims.
This application claims benefit of U.S. Provisional Application No. 61/507,937, filed Jul. 14, 2011, the contents of which are hereby incorporated by reference in their entirety.
This invention was made with government support under grant number R37 GM053234 awarded by the NIH. The government has certain rights in this invention.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/US2012/046959 | 7/16/2012 | WO | 00 | 3/11/2014 |
Number | Date | Country |
---|---|---|
61507937 | Jul 2011 | US |