The invention relates to DNA methylation signatures in human DNA, particularly in the field of molecular diagnostics.
Hepatocellular Carcinoma (HCC) is the fifth most common cancer world-wide (1). It is particularly prevalent in Asia, and its occurrence is highest in areas where hepatitis B is prevalent, indicating a possible causal relationship (2). Follow up of high-risk populations such as chronic hepatitis patients and early diagnosis of transitions from chronic hepatitis to HCC would improve cure rates. The survival rate of hepatocellular carcinoma is currently extremely low because it is almost always diagnosed at the late stages. Liver cancer could be effectively treated with cure rates of >80% if diagnosed early1. Advances in imaging have improved noninvasive detection of HCC (3, 4). However, current diagnostic methods, which include imaging and immunoassays with single proteins such as alpha-fetoprotein often fail to diagnose HCC early (2). These challenges are not limited to HCC but common to other cancers as well. Molecular diagnosis of cancer is focused on tumors and biomaterial originating in tumor including tumor DNA in plasma (5, 6), circulating tumor cells (7) and the tumor-host microenvironment (8, 9). The prevailing and widely accepted hypothesis is that molecular changes that drive cancer initiation and progression originate primarily in the tumor itself and that relevant changes in the host occur primarily in the tumor microenvironment. The identity of immune cells in the tumor microenvironment has attracted therefore significant attention (10, 11).
DNA methylation, a covalent modification of DNA, which is a primary mechanism of epigenetic regulation of genome function is ubiquitously altered in tumors (12-15) including HCC (16). DNA methylation profiles of tumors distinguish different stages of tumor progression and are potentially robust tools for tumor classification, prognosis and prediction of response to chemotherapy (17). The major drawback for using tumor DNA methylation in early diagnosis is that it requires invasive procedures and anatomical visualization of the suspected tumor. Circulating tumor cells are a noninvasive source of tumor DNA and are used for measuring DNA methylation in tumor suppressor genes (18). Hypomethylation of HCC DNA is detectable in patients' blood (19) and genome wide bisulfite sequencing was recently applied to detect hypomethylated DNA in plasma from HCC patients (20). However, this source is limited, particularly at early stages of cancer and the DNA methylation profiles are confounded by host DNA methylation profiles.
The idea that host immuno-surveillance plays an important role in tumorigenesis by eliminating tumor cells and suppressing tumor growth has been proposed by Paul Ehrlich (21, 22) more than a century ago and has fallen out of favor since. However, accumulating data from both animal and human clinical studies suggest that the host immune system plays an important role in tumorigenesis through “immuno-editing” which involves three stages: elimination, equilibrium and escape (23-25). Presence of tumor infiltrating cytotoxic CD8+ T cells associated with better prognosis in several clinical studies of human regressive melanoma (26-31), esophageal (32), ovarian (33, 34), and colorectal cancer (35-37). The immune system is believed to be responsible for the phenomenon of cancer dormancy when circulating cancer cells are detectable in the absence of clinical symptoms (15, 38). Interestingly, recent DNA methylation and transcriptome analysis of tumors revealed tumor stage specific immune signatures of infiltrating lymphocytes (39, 40). However, these signatures represent targeted immune cells in the tumor microenvironment and utilization of such signatures for early diagnosis requires invasive procedures. The tumor-infiltrating immune cells represent only a minor fraction of peripheral blood cells (41-44). Global DNA methylation changes were previously reported in leukocytes and EWAS studies revealed differences in DNA methylation in leukocytes from bladder, head and neck and ovarian cancer and these differences were independent of differences in white blood cell distribution (45). These studies were mainly aimed at identifying underlying DNA methylation changes in cancer genes that might serve as surrogate markers for changes in DNA methylation in the tumor. However, the question of whether the peripheral host immune system exhibits a distinct DNA methylation response to the cancer state that correlates with cancer progression has not been addressed.
Inventors of this invention find that cancer progression is associated with distinct DNA methylation profiles in the host peripheral immune cells. The present inventions also show that these DNA methylation markers differentiate between cancer and the underlying chronic inflammatory liver disease.
The present inventions illustrate these DNA methylation profiles in a discovery set of 69 people from the Beijing area of China (10 controls and 10 patients for each of the following groups Hepatitis B, C, stages 1-3, and 9 patients for stage 4) of HCC staged using the EASL-EORTC Clinical Practice Guidelines for HCC (Table 1). The present invention used a whole genome approach (Illumina 450k arrays) to delineate DNA methylation profiles without preconceived bias on the type of genes that might be involved. This invention demonstrates for the first time specific DNA methylation profiles of Hepatitis B and C that are distinct from HCC as well as DNA methylation profiles for each of the different stages of HCC in peripheral blood mononuclear cells. These profiles do not show a significant overlap with the DNA methylation profiles of HCC tumors that have been previously described (16), suggesting that they reflect changes in peripheral blood mononuclear cells genomic functions and are not surrogates of changes in tumor DNA methylation. Thus, this invention reveals the DNA methylation changes in the host immune system in cancer. This invention also reveals a DNA methylation signature in host T cells in people suffering from cancer. The present invention also shows that there is a significant overlap between DNA methylation profiles delineated in PBMCs and T cells. The present invention validates 4 genes that were differentially methylated in T cells from HCC patients in the discovery cohort by pyrosequencing of T cells DNA in a separate cohort of patients (n=79).
The present invention demonstrates the utility of this invention in predicting cancer and stage of cancer of unknown samples using statistical models based on these DNA methylation signatures. This invention has important implications for understanding of the mechanisms of the disease and its treatment and provides noninvasive diagnostics of cancer in peripheral blood mononuclear cells DNA. This invention could be used by any person skilled in the art to derive DNA methylation signatures in the immune system of any cancer using any method for genome wide methylation mapping that are available to those skilled in the art such as for example genome wide bisulfite sequencing, capture sequencing, methylated DNA Immunoprecipitation (MeDIP) sequencing and any other method of genome wide methylation mapping that becomes available.
Preferred embodiments of the present invention are as follows.
In the first aspect, the present invention provides DNA methylation signature of cancer in peripheral blood mononuclear cells (PBMC) for predicting cancer, said DNA methylation signature is derived using genome wide DNA methylation mapping methods, such as Illumina 450K or 850K arrays, genome wide bisulfite sequencing, methylated DNA Immunoprecipitation (MeDIP) sequencing or hybridization with oligonucleotide arrays.
In one embodiment, the DNA methylation signature is CG IDs derived from PBMC DNA listed below for predicting hepatocellular carcinoma (HCC) stages and chronic hepatitis using either PBMC or T cells DNA methylation levels of said CG IDs.
In one embodiment, the DNA methylation signature is CG IDs derived from T cells listed below for predicting HCC stages and chronic hepatitis using PBMC or T cells DNA methylation levels of said CG IDs.
In one embodiment, the DNA methylation signature is CG IDs listed below for predicting different stages of HCC using DNA methylation measurements of said CG IDs in T cells or PBMC obtained by using statistical models such as penalized regression or clustering analysis.
Target CG IDs for separating HCC stage 1 from controls: cg14983135, cg10203922, cg05941376, cg14762436, cg12019814, cg14426660, cg18882449, cg02914652;
Target CG IDs for separating HCC stage 2 from controls: cg05941376, cg15188939, cg12344600, cg03496780, cg12019814;
Target CG IDs for separating HCC stage 3 from controls: cg05941376, cg02782634, cg27284331, cg12019814, cg23981150;
Target CG IDs for separating HCC stage 4 from controls: cg02782634, cg05941376, cg10203922, cg12019814, cg14914552, cg21164050, cg23981150;
Target CG IDs for separating HCC stage 1 from hepatitis B: cg05941376, cg10203922, cg11767757, cg04398282, cg11151251, cg24742520, cg14711743;
Target CG IDs for separating HCC stage 1 from stage 2-4: cg03252499, cg03481488, cg04398282, cg10203922, cg11783497, cg13710613, cg14762436, cg23486701;
Target CG IDs for separating HCC stage 2 from stage 3-4: cg02914652, cg03252499, cg11783497, cg11911769, cg12019814, cg14711743, cg15607708, cg20956548, cg22876402, cg24958366;
Target CG IDs for separating HCC stage 1-3 from stage 4: cg02782634, cg11151251, cg24958366, cg06874640, cg27284331, cg16476382, cg14711743.
In one embodiment, the DNA methylation signature is CG IDs listed below for predicting stages of HCC using DNA methylation measurements of said CG IDs in T cells or PBMC obtained by using statistical models such as penalized regression or clustering analysis,
In the second aspect, the present invention provides a kit for predicting cancer, comprising means and reagents for detecting DNA methylation measurements of the DNA methylation signature.
In one embodiment, the present invention provides a kit for predicting hepatocellular carcinoma (HCC) stages and chronic hepatitis, comprising means and reagents for detecting DNA methylation measurements of the CG IDs of table 3 in embodiment.
In one embodiment, the present invention provides a kit for predicting HCC stages and chronic hepatitis, comprising means and reagents for detecting DNA methylation measurements of the CG IDs of table 6 in embodiment.
In one embodiment, the present invention provides a kit for predicting different stages of HCC, comprising means and reagents for detecting DNA methylation measurements of the CG IDs of table 4 in embodiment.
In one embodiment, the present invention provides a kit for predicting stages of HCC, comprising means and reagents for detecting DNA methylation measurements of the CG IDs of table 5 in embodiment.
In the third aspect, the present invention provides gene pathways that are epigenetically regulated in cancer in peripheral immune system.
In the fourth aspect, the present invention provides use of CG IDs disclosed in the present invention. In one embodiment, present invention provides use of DNA pyrosequencing methylation assays for predicting HCC by using CG IDs listed above, for example using the below disclosed primers for AHNAK (outside forward; GGATGTGTCGAGTAGTAGGGT, outside reverse CCTATCATCTCCACACTAACGCT, nested forward TGTTAGGGGTGATTTTTAGAGG, nested reverse ATTAACCCCATTTCCATCCTAACTATCTT, and sequencing primer TTTTAGAGGAGTTTTTTTTTTTTA);
SLFN2L (outside forward GTGATYTTGGTYAYTGTAAYYT, Outside reverse TCTCATCTTTCCATARACATTTATTTAR, forward nested AGGGTTTYAYTATATTAGYYAGGTTGG, reverse nested ATRCAAACCATRCARCCCTTTTRC, sequencing primer YYYAAAATAYTGAGATTATAGGTGT);
AKAP7 (outside forward TAGGAGAAAGGGTTTATTGTGGT, outside reverse ACACACCCTACCTTTTTCACTCCA, nested forward GGTATTGATTTATGGTTAGGGATTTATAG, nested reverse AAACAAAAAAAACTCCACCTCCAATCC, sequencing primer GGGATTTATAGTTTTGTGAGA); and
STAP1 (outside forward AGTYATGTYTTYTGYAAATAAAAATGGAYAYY, outside reverse, TTRCTTTTTAACCACCAACACTACC nested forward YYGTTTYTTTYATYTTYTGGTGATGTTAA, nested reverse ARARRRCAATCTCTRRRTAATCCACATRTR, sequencing primer GGTGATGTTAATYTTYTGTTTA).
In one embodiment, present invention provides use of Receiver operating characteristics (ROC) assays for predicting HCC by using CG IDs listed above, for example STAP1 (cg04398282). In one embodiment, present invention provides use of hierarchical Clustering analysis for predicting HCC by using CG IDs listed above.
In the fifth aspect, the present invention provides method for identifying DNA methylation signature for predicting disease, comprising the step of performing statistical analysis on DNA methylation measurements obtained from samples.
In one embodiment, the method comprises the step of performing statistical analysis on DNA methylation measurements obtained from samples, said DNA methylation measurements are obtained by performing Illumina Beadchip 450K or 850K assay of DNA extracted from sample. In one embodiment, said DNA methylation measurements are obtained by performing DNA pyrosequencing, mass spectrometry based (Epityper™) or PCR based methylation assays of DNA extracted from sample.
In one embodiment, the method comprises the step of performing statistical analysis on DNA methylation measurements obtained from samples; said statistical analysis includes Pearson correlation.
In one embodiment, said statistical analysis includes Receiver operating characteristics (ROC) assays.
In one embodiment, said statistical analysis includes hierarchical clustering analysis assays.
As used herein, the term “CG” refers to a di-nucleotide sequence in DNA containing cytosine and guanosine bases. These di-nucleotide sequences could become methylated in human and other animal DNA. The CG ID reveals its position in the human genome as defined by the Illlumina 450K manifest ((The annotation of the CGs listed herein is publicly available at https://bioconductor.org/packages/release/data/annotation/html/IlluminaHumanMethylation450k.db.html and installed as an R package IlluminaHumanMethylation450k.db as described in Triche T and Jr. IlluminaHumanMethylation450k.db: Illumina Human Methylation 450k annotation data. R package version 2.0.9.).
As used herein, the term “penalized regression” refers to a statistical method aimed at identifying the smallest number of predictors required to predict an outcome out of a larger list of biomarkers as implemented for example in the R statistical package “penalized” as described in Goeman, J. J., L1 penalized estimation in the Cox proportional hazards model. Biometrical Journal 52(1), 70-84.
As used herein, the term “clustering” refers to the grouping of a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters).
As used herein, the term “Hierarchical clustering” refers to a statistical method that builds a hierarchy of “clusters” based on how similar (close) or dissimilar (distant) are the clusters from each other as described for example in Kaufman, L.; Rousseeuw, P. J. (1990). Finding Groups in Data: An Introduction to Cluster Analysis (1 ed). New York: John Wiley. ISBN 0-471-87876-6.
As used herein, the term “gene pathways” refers to a group of genes that encode proteins that are known to interact with each other in physiological pathways or processes. These pathways are characterized using bio-computational methods such as Ingenuity Pathway Analysis: http://www.ingenuity.com/products/ipa.
As used herein, the term “Receiver operating characteristics (ROC) assay” refers to a statistical method that creates a graphical plot that illustrates the performance of a predictor. The true positive rate of prediction is plotted against the false positive rate at various threshold settings for the predictor (i.e. different % of methylation) as described for example in Hanley, James A.; McNeil, Barbara J. (1982). “The Meaning and Use of the Area under a Receiver Operating Characteristic (ROC) Curve”. Radiology 143 (1): 29-36.
As used herein, the term “Multivariate linear regression” refers to a statistical method that estimates the relationship between multiple “independent variables” or “predictors” such as percentage of methylation, age, sex etc. and an “outcome” or a “dependent variable” such as cancer or stage of cancer. This method determines the statistical significance of each “predictor” (independent variable) in predicting the “outcome” (dependent variable) when several “independent variables” are included in the model.
Patient Samples
HCC staging was diagnosed according to EASL-EORTC Clinical Practice Guidelines: Management of hepatocellular carcinoma. The patients were divided into four groups, including Stage 0 (1), stage A (2), stage B (3) and stage C+D (4). For simplicity, the present invention refers to stages 1-4 in the figures and embodiments. Chronic hepatitis B diagnosing was confirmed using AASLD practice guideline for chronic Hepatitis B, and chronic hepatitis C diagnosing was according to AASLD recommendations for testing, managing and treating Hepatitis C. A strict exclusion criterion was any other known inflammatory disease (bacterial or viral infection with the exception of hepatitis B or C, diabetes, asthma, autoimmune disease, active thyroid disease) which could alter T cells and monocytes characteristics. Clinical characteristics of patients are provided in Table 1 and 2. The participants in the study provided consent according to the regulations of the Capital Medical School. The study received ethical approval from The Capital Medical School in Beijing and McGill University (IRB Study Number A02-M34-13B).
E+04
E+04
+ RFA
E+08
E+05
E+03
E+04
E+04
+ + RFA
E+05
E+02
+ RFA
E+03
E+05
E+04
E+04
E+04
E+03
E+02
indicates data missing or illegible when filed
years
indicates data missing or illegible when filed
Illumina Beadchip 450K Analysis
Blood was drawn from patients into EDTA coated tubes and peripheral blood mononuclear cells were isolated using standard protocols by centrifugation on Ficoll-Hypaque density gradient and mononuclear cells were collected on top of the Ficoll-Hypaque layer because they have a lower density using routine lab procedures, mononuclear cells were separated from platelets by washing (46). DNA was extracted from the cells using commercial human DNA extraction kits (Qiagen), DNA was bisulfite converted and subjected to Illumina HumanMethyaltion450k BeadChip hybridization and scanning using standard protocols recommended by the manufacturer. Samples were randomized with respect to slide and position on arrays and all samples were hybridized and scanned concurrently to mitigate batch effects as recommended by McGill Genome Quebec innovation center according to Illumina Infinum HD technology user guide. Illumina arrays hybridizations and scanning were performed by the McGill Genome Quebec Innovation center according to the manufacturer guidelines. Illumina arrays were analyzed using the ChAMP Bioconductor package in R (47). IDAT files were used as input in the champ.load function using minfi quality control and normalization options. Raw data were filtered for probes with a detection value of P>0.01 in at least one sample. Probes on the X or Y chromosome are filtered out to mitigate sex effects and probes with SNPs as identified in (48), as well as probes that align to multiple locations as identified in (48). Batch effects were analyzed on the non-normalized data using the function champ.svd. Five out of the first 6 principal components were associated with group and batch (slides). Intra-array normalization to adjust the data for bias introduced by the Infinium type 2 probe design was performed using beta-mixture quantile normalization (BMIQ) with function champ.norm (norm=“BMIQ”) (47). Batch effects are corrected after BMIQ normalization using champ.runcombat function.
Cell count analysis for peripheral blood mononuclear cells distribution in samples of this invention was performed according to the Houseman algorithm (49) using the function estimateCellCounts and FlowSorted.Blood.450k data as reference. The Beta values of the batch corrected normalized data are used for downstream statistical analyses.
To compute linear correlation between HCC stages and quantitative distribution of DNA methylation at the 450K CG sites, Pearson correlation between the normalized DNA methylation values and stages of HCC (with stage codes of 0 for control 1 and 2 for hepatitis B and C respectively and 3-6 for the 4 stages of HCC) is performed using the pearson con function in R and correcting for multiple testing using the method “fdr” of Benjamini Hochberg (adjusted P value (Q) of <0.05) as well as the conservative Bonferroni correction (Q<1×10−7). A similar approach could be used utilizing new generations of Illumina arrays such as Illumina 850K arrays.
Correlation Between Quantitative Distribution of Site-Specific DNA Methylation Levels and Progression of HCC
The analysis reveals a broad signature of DNA methylation that correlates with progression of HCC (160,904 sites). The analysis of this invention focus on 3924 sites with the most robust changes (r>0.8;r<−0.8; delta beta >0.2/, delta beta>−0.2, p<10−7). A genome wide view of the intensifying changes in DNA methylation of these sites during HCC progression relative to chronic hepatitis B and C and control is shown in
Utility of DNA Methylation Signature of HCC in Peripheral Blood Mononuclear Cells for Differentiating Cancer Samples from Controls
These DNA methylation signatures have therefore the utility of classifying the stage of HCC in patient sample. The heat map in
Inventors of the present invention delineated differentially methylated CGs between healthy controls and each of the HCC stages independently using the Bioconductor package Limma (50) as implemented in ChAMP. The number of differentially methylated CG sites (p<1×10−7) between each stage of HCC and healthy controls increases with advance in stages; 14375 for stage 1, 22018 stage 2, 30709, stage 3 and 54580 for stage 4. Significance of overlap between two groups was determined using hypergeometric Fisher exact test in R. There is a significant overlap between the stages of cancer (
The fraction of sites that are hypomethylated relative to hypermethylated sites in HCC increases as well from 26% in stage 1 to 57% in stage 4 (Figure. 3B). This increase in number of hypomethylated sites with progression of HCC was observed as well in the results of the Pearson correlation analysis (
HCC patients in the study and in clinical setting are a heterogeneous group with respect to alcohol, smoking (52-55), sex (56) and age (57) and each of these factors are known to affect DNA methylation. In addition, peripheral mononuclear cells are a heterogeneous mixture of cells and alterations in cell distribution between individuals might affect DNA methylation as well. This invention first determined the cell count distribution for each case using the Houseman algorithm (49). Two-way ANOVA followed by pairwise comparisons and correction for multiple testing found no significant difference in cell count between the groups. Multifactorial ANOVA with group, sex and age as cofactors was performed for CGs that were short listed for association with HCC using loop_anova lmFit function with Bonferoni adjustment for multiple testing. Multivariate linear regression was performed on the shortlisted CG sites that were found to associate with HCC to test whether these associations will survive if cell counts, sex, age, and alcohol abuse are used as covariates in the linear regression model using the lmFit function in R. Comparison of differentially methylated (relative to control) gene lists in different groups was performed using Venny (http://bioinfogp.cnb.csic.es/tools/venny/). Hierarchical clustering was performed using One minus Pearson correlation and heatmaps were generated in the Broad institute GeneE application (http://www.broadinstitute.org/cancer/software/GENE-E/).
Then, a multivariate linear regression on the normalized beta values of the 350 CG sites is performed that differentiate HCC from all other groups using group (HCC versus non HCC), sex, alcohol, smoking, age, and cell-count as covariates. All CG sites remained highly significant for the group covariate even after including the other covariates in the model. Following Bonferroni corrections for 350 measurements, 342 CG sites remained highly significant for group (HCC versus non HCC). A multifactorial ANOVA analysis is performed on the beta values of the 350 sites as dependent variables and group (HCC versus non-HCC), sex and age as independent variables to determine whether there are possible interactions between either sex and group, age and group and between sex+age and group on DNA methylation.
While group remained significant for all 350 CGs no significant interactions with sex or age were found after Bonferroni corrections. In summary, these data show robust DNA methylation differences in PBMC DNA between HCC and other non-HCC patients including Hepatitis B and Hepatitis C.
The differentially methylated sites for each of the HCC stages were derived by comparing 10 healthy control and 10 stage specific HCCs. Other stages and the Hepatitis B and C samples were not “trained” (“trained” is used by the model to derive the differentially methylated sites) for these differentially methylated CGs and served as “cross-validation” sets of “unknown” samples to address the following questions: First, would the markers derived for one stage of cancer cluster correctly HCC samples that were not “trained” by these markers? Second, would DNA methylation markers that were “trained” to differentiate HCC from healthy controls also differentiate HCC from Hepatitis B and hepatitis C. Differentiating HCC from chronic hepatitis is a critical challenge for early diagnosis of HCC since a notable fraction of HCC patient progress from chronic hepatitis to HCC.
Hierarchical clustering is performed by one minus Pearson correlation for all HCC and hepatitis samples using for each individual analysis a set of CG methylation markers that were “discovered” by testing only one stage of HCC and controls. All other stages were “naïve” to these markers and served as “cross-validation”. Cross validation refers to a statistical strategy whereby a small subset of samples in the study is used to “discover” a list of markers (predictors) that differentiate two groups from each other (i.e. “cancer” and “control”). These “discovered” markers are then tested as predictors in other “new” samples in the study. As demonstrated in
The overlap between independently derived CG markers that differentiate each of the HCC stages (
Although there is a large overlap between CGs that are differentially methylated at the different stages of cancer, the overlap is partial. The present invention demonstrates here that one could utilize the 350 CG list (described above) (Table 3) to differentiate HCC stages from each other. Hierarchical clustering by one minus Pearson correlation of all samples using these 350 CGs correctly clustered the HCC cases by stage while hepatitis B and C cases were clustered with healthy controls. Although there is a large overlap between sites that are differentially methylated from healthy controls at different stages of HCC, the intensity of differential methylation is enhanced with progression of HCC. Thus, the level of methylation of these 350 CG sites could be also used to differentiate stages of HCC. A kit, comprising means and reagents for detecting DNA methylation measurements of the CG IDs of table 3, could be used for predicting hepatocellular carcinoma (HCC) stages and chronic hepatitis. Note that the DNA methylation markers list was derived by comparing only healthy controls and single stages of HCC, nevertheless this list could correctly predict other “new” hepatitis B and C cases as non-HCC (
The disclosure of this invention reveals differentially methylated CGs in PBMC from HCC patients that can be used to distinguish particular stages of HCC from controls and from chronic hepatitis patients.
Data suggest that PBMC DNA methylation markers differentiate stages of HCC. The present invention then defined a list of the minimal number of CG sites that are required to differentiate stages of HCC from each other. “Penalized regression” of the 350 CG sites is performed between stage samples using the R package “penalized” for fitting penalized regression models (51). The penalized R package uses likelihood cross-validation and predictions are made on each left-out subject. The fitted model identified 8 CGs that predict stage 1 versus control, 5CGs that predict stage 2 versus control, 5 CGs that differentiate stage 3 versus control, 7 CGs that differentiate Stage 4 versus control and 7 CGs that are sufficient to differentiate stage 1 from hepatitis B (Table 4). 8 CGs are selected that differentiate between stage 1 and later stages 2-4, 10CGs that differentiate stage 1 and 2 from later stages 3-4 and 7 CGs that differentiate stage 4 from all earlier stages (stages 1-3) (Table 4). DNA methylation measurements in PBMC of the combined list of 31 CG stage-separators (after removing duplicates, table 5) accurately predicted all HCC cases and their stages using One minus Pearson clustering (
The penalized models derived for differentiating the specific stages using CGs listed in Table 4 were then used on other “naïve” (new samples that were not used for the discovery of the markers) HCC cases and hepatitis B and C controls to predict likelihood of each case being at different stages of HCC. The results of these analyses are shown in
Multivariate analysis suggests that the differences in PBMC DNA methylation between HCC and other groups (control and chronic hepatitis) remain even when differences in cell count are taken into account. Further, to determine whether differences in DNA methylation between cancer and control would disappear once the complexity of cell composition is reduced by isolation of a specific cell type (although heterogeneity in T cell subtypes remains), the differences in DNA methylation profiles between T cells isolated from 10 of the 39 HCC patients included in the study (samples from each of the HCC stages, indicated in the legend to table 1) and all healthy controls (n=10) were analyzed to determine whether differences in DNA methylation between cancer and control would disappear once the complexity of cell composition is partly reduced by isolation of a specific cell type.
T cells were isolated using antiCD3 immuno-magnetic beads (Dynabed Life technologies), Linear (mixed effects) regression using the ChAMP package on normalized DNA methylation values between HCC and healthy controls revealed 24863 differentially methylated sites at a threshold of p<1×10−7. 370 robust differentially methylated CGs are shortlisted at a threshold of p<1×10−7 and delta beta >0.3, <−0.3 (Table 6) and hierarchical clustering of the healthy control and HCC T cell DNA by One minus Pearson correlation was performed (
These 370 CG sites that differentiate T cells from HCC and healthy controls (Table 6) could be used to cluster “untrained” different chronic hepatitis and healthy control PBMC samples (n=69). The clustering analysis presented in
The 350 CGs that were derived by analysis of PBMC DNA clustered the T cell healthy controls and HCC samples correctly (
The present invention also shows that the shortlisted 31 CGs derived by penalized regression from PBMC DNA methylation measures (Table 5) also cluster and stage accurately T cell DNA methylation measurements from HCC patients and controls using One minus Pearson correlations (
Progression of HCC has a broad footprint in the methylome (the genome-wide DNA methylation profile) (
A comparative IPA analysis between PBMC and T cells differentially methylated genes revealed NFKB, TNF, VEGF and IL4 and NFAT as common upstream regulators. Overall, the DNA methylation alterations in HCC PBMC and T cell show a strong signature in immune modulation functions. Differentially methylated promoters between HCC and noncancerous liver tissue were previously delineated (16, 58). The present invention determined whether there was an overlap between the promoters that are differentially methylated in HCC in the cancer biopsies (1983 promoters) and peripheral blood mononuclear cells (545 promoters) and found an overlap of 44 promoters which was not statistically significant as determined by Fisher hypergeometric test (p=0.76). These data show that the changes in DNA methylation seen in peripheral blood mononuclear cells reflect changes in the immune system in HCC and that these differentially methylated CGs are most probably not a footprint of circulating DNA from tumors or “surrogates” of DNA methylation changes occurring in the tumor. The utility of these pathways is by providing new targets for cancer therapeutics in the peripheral immune system.
Pyrosequencing was performed using the PyroMark Q24 machine and results were analyzed with PyroMark® Q24 Software (Qiagen). All data were expressed as mean±standard error of the mean (SEM). The statistical analysis was undertaken using R. Primers used for the analysis are listed in Table 7.
For the replication set, this invention uses T cells DNA to reduce cell composition issues. The replication set included 79 people, 10 healthy controls and 10 individuals from each of the hepatitis B and C and 3 cancer stages and 19 stage 1 samples (Table 2). Following genes are examined that were found to be significantly differentially methylated in T cells in comparison with HCC in the discovery set: STAP1 (cg04398282) (also included in table 6), AKAP7 (cg12700074), SLFNL2 (cg00974761), and included 1 additional hypomethylated gene in HCC: Neuroblast differentiation-associated protein (AHNAK) (cg14171514). Linear regression between all controls (healthy and hepatitis B and C) and HCC stage 1,2 (0+A) revealed significant association with HCC stage 1,2 for all 4 CGs after correction for multiple testing (STAP1 p=4.04×10−7; AKAP7 p=0.046; SLFNL2 p=0.012; AHNAK p=0.003436). Linear regression between all controls and all stages of HCC revealed significant association for STAP1 (p=6.6×10−6) and AHNAK with HCC (p=0.026) after correction for multiple testing.
ANOVA analysis revealed a significant difference in methylation between the control group (healthy controls and hepatitis B and C) and the group of early HCC (stages 0+A; 1,2) in all 4 CGs that were validated. A group comparison between all controls and all HCC revealed a significant difference in methylation for STAP1 (p=1.7×10−6), AKAP7 (p=0.042), AHNAK (p=0.0062) but the difference for SLFNL2 was trendy but not significant (p=0.071). ANOVA revealed significant effect for diagnosis (F=10.017; p=7.49×10−6) on STAP1 methylation.
Pairwise analysis after correction for multiple testing on the 5 different diagnosis subgroups of controls (healthy controls, chronic hepatitis B and chronic hepatitis C) and early HCC (stages 1 and 2 or 0 and A) revealed significant differences between stage 1 (BCLC 0) HCC and either healthy controls (p=0.00037), chronic hepatitis B (p=0.00849) or hepatitis C (p=0.00698) and between stage 2 (BCLC A) and either healthy controls (p=0.00018), hepatitis B (p=0.00670) or hepatitis C (p=0.00534). While there was also an effect of diagnosis on SLFN2L methylation (F=3.9376; p=0.00810) AHNAK (F=3.0219; p=0.02809) and AKAP7 (F=3.4; p=0.01633), pairwise comparisons between the different diagnosis subgroups were not significant.
These data illustrates that these 4 CG sites could be used to predict early stages of HCC and differentiate them from controls (
A measure of the diagnostic value of a biomarker is the Receiver Operating Characteristic (ROC) which measures “sensitivity” (fraction of true discoveries) as a function of “specificity” (fraction of false discoveries). The ROC test determines a threshold value (ie. percentage of methylation at a particular CG) that provides the most accurate prediction (the highest fraction of “true discoveries” and the least number of “false discoveries”) (59) (
The methods used here to measure DNA methylation provide only an example and do not exclude measurements of DNA methylation by other acceptable methods. It should be noted that any person skilled in the art could measure DNA methylation of STAP1 and other differentially methylated sites using a number of accepted and available methods that are well documented in the public domain including for example, Illumina 850K arrays, mass spectrometry based methods such as Epityper (Seqenom), PCR amplification using methylation specific primers (MS-PCR), high resolution melting (HRM), DNA methylation sensitive restriction enzymes and bisulfite sequencing.
Applications of this Invention
The applications of this invention are in the field of molecular diagnostics of HCC and cancer in general. Any person skilled in the art could use this invention to derive similar biomarkers for other cancers. Moreover, the genes and the pathways derived from the genes can guide new drugs that focus on the peripheral immune system using the targets listed in embodiment 9. The focus in DNA methylation studies in cancer to date has been on the tumor, tumor microenvironment (8, 9) and circulating tumor DNA (5, 6) and major advances were made in this respect. However, the question remains of whether there are DNA methylation changes in host systems that could instruct us on the system wide mechanisms of the disease and/or serve as noninvasive predictors of cancer. HCC is a very interesting example since it frequently progresses from preexisting chronic hepatitis and liver cirrhosis (2) and could provide a tractable clinical paradigm for addressing this question. This invention reveals that the qualities of the host immune system might define the clinical emergence and trajectory of cancer.
Importantly, the present invention shows a sharp boundary between stage 1 of HCC and chronic hepatitis B and C that could be used to diagnose early transition from chronic hepatitis to HCC as illustrated in the embodiments of this invention. The present invention also reveals how this invention could be used to separate stages of cancer from each other. All assays will require a set of known samples with methylation values for the CG IDs disclosed in this invention to train the models using hierarchical clustering, ROC or penalized regression and unknown samples will then be analyzed using these models as illustrated in the embodiments of this invention.
The fact that the present invention is mentioning different dependent claims does not mean that one cannot use a combination of these claims for predicting cancer. The examples disclosed here for measuring and statistically analyzing and predicting cancer, stages of cancer and chronic hepatitis should not be considered limiting. Various other modifications will be apparent to those skilled in the art to measure DNA methylation in cancer patients such as Illumina 850K arrays, capture array sequencing, next generation sequencing, methylation specific PCR, epityper, restriction enzyme based analyses and other methods found in the public domain. Similarly, there are numerous statistical methods in the public domain in addition to those listed here to use this invention for prediction of cancer in patient samples.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/CN2016/086845 | 6/23/2016 | WO | 00 |