METHYLATION MARKERS FOR DIAGNOSING CANCER

Information

  • Patent Application
  • Publication Number
    20200277677
  • Date Filed
    October 05, 2018
  • Date Published
    September 03, 2020
Abstract
Disclosed herein are methods, probes, and kits for diagnosing the presence of cancer and/or a cancer type in a subject.
Description
BACKGROUND OF THE DISCLOSURE

Cancer is a leading cause of death worldwide, with annual cases expected to increase from 14 million in 2012 to 22 million during the next two decades (WHO). Diagnostic procedures for liver cancer, in some cases, begin only after a patient already presents with symptoms, leading to costly, invasive, and sometimes time-consuming procedures. In addition, inaccessible anatomical sites sometimes prevent an accurate diagnosis. Further, late diagnosis is associated with high cancer morbidity and mortality.


SUMMARY OF THE DISCLOSURE

In certain embodiments, disclosed herein is a method of selecting a subject suspected of having cancer for treatment, comprising: (a) contacting treated DNA with at least one probe from a probe panel to generate an amplified product, wherein the at least one probe hybridizes under high-stringency conditions to a target sequence of a cg marker selected from Table 1, Table 2, Table 7, Table 8, or Table 13, and wherein the treated DNA is processed from a biological sample obtained from the subject; (b) analyzing the amplified product to generate a methylation profile of the cg marker; (c) comparing the methylation profile to a reference model relating methylation profiles of cg markers from Tables 1, 2, 7, 8, and 13 to a set of cancers; (d) based on the comparison of step (c), determining: (i) whether the subject has cancer; and (ii) which cancer type the subject has; and (e) administering an effective amount of a therapeutic agent to the subject if the subject is determined to have cancer and the cancer type is determined.
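The claims do not fix the form of the "reference model" in step (c). The sketch below shows one minimal way to read steps (c) and (d). It assumes the model stores a mean methylation (beta) value per cg marker for each class, and it assigns the subject to the nearest class centroid by Euclidean distance. The class labels, the three markers, and every number in `REFERENCE_MODEL` are hypothetical placeholders, not values from this disclosure. Nearest-centroid matching only stands in for whatever trained classifier an implementation would actually use.

```python
import math

# Hypothetical reference model: mean beta value per cg marker for each class.
# Labels and numbers are illustrative placeholders, not data from the disclosure.
REFERENCE_MODEL = {
    "normal": {"cg19516279": 0.10, "cg25945732": 0.12, "cg04072843": 0.08},
    "breast": {"cg19516279": 0.65, "cg25945732": 0.15, "cg04072843": 0.10},
    "liver": {"cg19516279": 0.12, "cg25945732": 0.70, "cg04072843": 0.09},
    "ovarian": {"cg19516279": 0.11, "cg25945732": 0.14, "cg04072843": 0.60},
}


def distance(profile, centroid):
    """Euclidean distance over the markers measured in both profile and centroid."""
    shared = profile.keys() & centroid.keys()
    if not shared:
        raise ValueError("profile shares no markers with the reference model")
    return math.sqrt(sum((profile[m] - centroid[m]) ** 2 for m in shared))


def classify(profile, model=REFERENCE_MODEL):
    """Steps (c)-(d): return (has_cancer, cancer_type) by nearest centroid."""
    best = min(model, key=lambda label: distance(profile, model[label]))
    if best == "normal":
        return False, None
    return True, best


print(classify({"cg19516279": 0.62, "cg25945732": 0.13, "cg04072843": 0.10}))
# (True, 'breast')
```

If the nearest centroid is "normal", the function returns `(False, None)`. Any other nearest centroid returns both the cancer call and the cancer type, mirroring items (i) and (ii) of step (d).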


In certain embodiments, disclosed herein is a method of detecting the methylation status of a set of cg markers, comprising: (a) processing a biological sample obtained from a subject with a deaminating agent to generate treated DNA comprising deaminated nucleotides; (b) contacting the treated DNA with at least one probe that hybridizes under high-stringency conditions to a target sequence of a cg marker from Table 1, Table 2, Table 7, Table 8, Table 13, Table 14, or Table 20; and (c) quantitatively detecting the methylation status of the cg marker, wherein said detection comprises a real-time quantitative probe-based PCR or a digital probe-based PCR.
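The claim does not say how a digital probe-based PCR readout becomes a quantitative methylation status. A common approach, assumed here, has two steps. First, Poisson-correct the fraction p of positive partitions (droplets) for each probe, λ = -ln(1 - p). Second, report the methylated share of total copies. The sketch assumes a hypothetical duplex assay: one probe detects the methylated (C-retaining) sequence and a second detects the unmethylated (C-to-T converted) sequence, both read from the same partitions. The partition counts in the usage line are made up.

```python
import math


def copies_per_partition(positive, total):
    """Poisson-corrected mean copies per partition: lambda = -ln(1 - p)."""
    if total <= 0:
        raise ValueError("total partitions must be positive")
    if not 0 <= positive < total:
        raise ValueError("positive count must be in [0, total); all-positive runs are saturated")
    return -math.log1p(-positive / total)


def methylated_fraction(meth_positive, unmeth_positive, total):
    """Share of target copies that are methylated in a duplex digital PCR assay.

    Assumes one probe reports the methylated (C-retaining) sequence and a second
    probe the unmethylated (C->T converted) sequence in the same partitions.
    """
    lam_m = copies_per_partition(meth_positive, total)
    lam_u = copies_per_partition(unmeth_positive, total)
    if lam_m + lam_u == 0:
        raise ValueError("no target copies detected")
    return lam_m / (lam_m + lam_u)


print(methylated_fraction(1200, 3400, 15000))
```

The Poisson correction matters at high occupancy. Once many droplets hold more than one copy, the raw fraction of positive droplets underestimates the number of copies.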


Disclosed herein, in certain embodiments, is a method of detecting a methylation pattern of a set of biomarkers in a subject suspected of having a cancer, the method comprising: (a) processing an extracted genomic DNA with a deaminating agent to generate a genomic DNA sample comprising deaminated nucleotides, wherein the extracted genomic DNA is obtained from a biological sample from the subject suspected of having a cancer; and (b) detecting the methylation pattern of one or more biomarkers selected from Table 1, Table 2, Table 7, Table 8, Table 13, Table 14, or Table 20 from the extracted genomic DNA by contacting the extracted genomic DNA with a set of probes, wherein the set of probes hybridizes to the one or more biomarkers, and performing a DNA sequencing analysis to determine the methylation pattern of the one or more biomarkers. In some embodiments, said detecting comprises a real-time quantitative probe-based PCR or a digital probe-based PCR. In some embodiments, the digital probe-based PCR is a digital droplet PCR. In some embodiments, the set of probes comprises a set of padlock probes. In some embodiments, step b) comprises detecting the methylation pattern of one or more biomarkers selected from Table 2, Table 13, Table 14, or Table 20. 
In some embodiments, step b) comprises detecting the methylation pattern of one or more biomarkers selected from cg19516279, cg06100368, cg25945732, cg19155007, cg17952661, cg04072843, cg01250961, cg08131100, cg03788131, cg17528648, cg07784526, cg18948743, cg23986470, cg00846300, cg01029638, cg08350814, cg05098590, cg18085998, cg06532037, cg15313226, cg16232979, cg26149167, cg01237565, cg16561543, cg13771313, cg13771313, cg08169020, cg08169020, cg21153697, cg07326648, cg14309384, cg20923716, cg09095222, cg22220310, cg21950459, cg13332729, cg10802543, cg20707333, cg13169641, cg25352342, cg09921682, cg02504622, cg17373759, cg06547203, cg06826710, cg00902147, cg17609887, cg15721142, cg08116711, cg00736681, cg18834029, cg06969479, cg24630516, cg16901821, cg20349803, cg23610994, cg19313373, cg16508600, cg24096323, cg24746106, cg12288267, cg10430690, cg24408776, cg05630192, cg12028674, cg24820270, cg12028674, cg26718707, cg10349880, cg09921682, cg25934700, cg14164596, cg24461337, cg23041410, cg07366553, cg26859666, cg06405341, cg08557188, cg00690392, cg03421440, cg07077277, or cg20702527. In some embodiments, the subject is suspected of having a breast cancer and step b) comprises detecting the methylation pattern of one or more biomarkers selected from cg19516279, cg06100368, cg20349803, cg23610994, cg19313373, cg16508600, and cg24096323. In some embodiments, the subject is determined to have a breast cancer if: at least one of the cg markers cg19516279 and cg06100368 is hypermethylated; at least one of the cg markers cg20349803, cg23610994, cg19313373, cg16508600, and cg24096323 is hypomethylated; or a combination thereof. In some embodiments, the subject is suspected of having a liver cancer and step b) comprises detecting the methylation pattern of one or more biomarkers selected from cg25945732, cg19155007, cg17952661, cg25934700, cg14164596, cg24461337, cg23041410, cg07366553, cg26859666, and cg00456086. 
In some embodiments, the subject is determined to have a liver cancer if: at least one of the cg markers cg25945732, cg19155007, or cg17952661 is hypermethylated; at least one of the cg markers cg25934700, cg14164596, cg24461337, cg23041410, cg07366553, cg26859666, or cg00456086 is hypomethylated; or a combination thereof. In some embodiments, the subject is suspected of having a liver cancer and step b) comprises detecting the methylation pattern of one or more biomarkers selected from 3-49757316, 8-27183116, 8-141607252, 17-29297711, 3-49757306, 19-43979341, 8-141607236, 5-176829755, 18-13382140, 15-65341965, 3-13152305, 17-29297770, 8-27183316, 5-176829740, 19-41316693, 18-43830649, 15-65341957, 20-44539531, 7-30265625, 2-131129567, 5-176829665, 3-13152273, 8-27183348, 3-49757302, 19-41316697, 8-61821442, 20-44539525, 10-102883105, 11-65849129, 5-176829639, 15-91129457, 2-1625431, 6-151373292, 6-151373294, 20-25027093, 6-14284198, 10-4049295, 19-59023222, 1-184197132, 2-131004117, 2-8995417, 12-10782319, 20-25027033, 6-151373256, 8-86100970, 9-4839459, 17-41221574, 1-153926715, 20-25027044, 20-20177325, 2-1625443, 20-25027085, 11-69420728, 1-229234865, 6-13408877, 22-50643735, 6-151373308, 1-232119750, 8-134361508, or 6-13408858. 
In some embodiments, the subject is determined to have a liver cancer if: at least one of the markers 3-49757316, 8-27183116, 8-141607252, 17-29297711, 3-49757306, 19-43979341, 8-141607236, 5-176829755, 18-13382140, 15-65341965, 3-13152305, 17-29297770, 8-27183316, 5-176829740, 19-41316693, 18-43830649, 15-65341957, 20-44539531, 7-30265625, 2-131129567, 5-176829665, 3-13152273, 8-27183348, 3-49757302, 19-41316697, 8-61821442, 20-44539525, 10-102883105, 11-65849129, or 5-176829639 is hypermethylated; at least one of the markers 15-91129457, 2-1625431, 6-151373292, 6-151373294, 20-25027093, 6-14284198, 10-4049295, 19-59023222, 1-184197132, 2-131004117, 2-8995417, 12-10782319, 20-25027033, 6-151373256, 8-86100970, 9-4839459, 17-41221574, 1-153926715, 20-25027044, 20-20177325, 2-1625443, 20-25027085, 11-69420728, 1-229234865, 6-13408877, 22-50643735, 6-151373308, 1-232119750, 8-134361508, or 6-13408858 is hypomethylated; or a combination thereof. In some embodiments, the subject is suspected of having an ovarian cancer and step b) comprises detecting the methylation pattern of one or more biomarkers selected from cg04072843, cg01250961, cg24746106, cg12288267, and cg10430690. In some embodiments, the subject is determined to have an ovarian cancer if: at least one of the cg markers cg04072843 and cg01250961 is hypermethylated; at least one of the cg markers cg24746106, cg12288267, and cg10430690 is hypomethylated; or a combination thereof. In some embodiments, the subject is suspected of having a colorectal cancer and step b) comprises detecting the methylation pattern of one or more biomarkers selected from cg08131100, cg03788131, cg17528648, cg07784526, cg18948743, cg23986470, cg00846300, cg25352342, cg09921682, cg02504622, cg17373759, cg12028674, cg24820270, cg12028674, cg26718707, cg10349880, and cg09921682. 
In some embodiments, the subject is determined to have a colorectal cancer if: at least one of the cg markers cg08131100, cg03788131, cg17528648, cg07784526, cg18948743, cg23986470, or cg00846300 is hypermethylated; at least one of the cg markers cg25352342, cg09921682, cg02504622, cg17373759, cg12028674, cg24820270, cg12028674, cg26718707, cg10349880, or cg09921682 is hypomethylated; or a combination thereof. In some embodiments, the subject is suspected of having a colorectal cancer and step b) comprises detecting the methylation pattern of one or more biomarkers selected from cg10673833, cg10493436, cg10428836, cg27284288, cg16959747, cg17494199, cg23678254, cg24067911, or cg25459300. In some embodiments, the subject is suspected of having a colorectal cancer and step b) comprises detecting the methylation pattern of one or more biomarkers selected from cg05205843, cg11841704, cg06699564, cg08924619, cg11959316, cg08924619, cg06699564, cg01824933, cg08924619, cg05205842, cg08924619, cg04049981, cg09026722, cg03616722, cg08924619, cg05928904, cg08704934, cg09776772, cg17494199, cg01824933, cg16296417, cg09776772, cg09776772, cg05338167, cg10493436, cg011251410, cg16391792, cg06393830, cg09366118, cg22513455, cg17583432, cg23881926, cg09638208, cg12441066, cg27284288, cg04441857, cg17583432, cg10673833, cg19757176, cg08670281, cg17583432, cg04460364, cg16959747, cg15011734, or cg25754195. 
In some embodiments, the subject is determined to have a colorectal cancer if: at least one of the cg markers cg06393830, cg09366118, cg22513455, cg17583432, cg23881926, cg09638208, cg12441066, cg27284288, cg04441857, cg17583432, cg10673833, cg19757176, cg08670281, cg17583432, cg04460364, cg16959747, cg15011734, or cg25754195 is hypermethylated; at least one of the cg markers cg05205843, cg11841704, cg06699564, cg08924619, cg11959316, cg08924619, cg06699564, cg01824933, cg08924619, cg05205842, cg08924619, cg04049981, cg09026722, cg03616722, cg08924619, cg05928904, cg08704934, cg09776772, cg17494199, cg01824933, cg16296417, cg09776772, cg09776772, cg05338167, cg10493436, cg011251410, or cg16391792 is hypomethylated; or a combination thereof. In some embodiments, the subject is suspected of having a prostate cancer and step b) comprises detecting the methylation pattern of one or more biomarkers selected from cg01029638, cg08350814, cg05098590, cg18085998, cg06532037, cg15313226, cg16232979, cg26149167, cg06547203, cg06826710, cg00902147, cg17609887, and cg15721142. In some embodiments, the subject is determined to have a prostate cancer if: at least one of the cg markers cg01029638, cg08350814, cg05098590, cg18085998, cg06532037, cg15313226, cg16232979, or cg26149167 is hypermethylated; at least one of the cg markers cg06547203, cg06826710, cg00902147, cg17609887, or cg15721142 is hypomethylated; or a combination thereof. In some embodiments, the subject is suspected of having a pancreatic cancer and step b) comprises detecting the methylation pattern of one or more biomarkers selected from cg01237565, cg16561543, and cg08116711. In some embodiments, the subject is determined to have a pancreatic cancer if: at least one of the cg markers cg01237565 or cg16561543 is hypermethylated; cg marker cg08116711 is hypomethylated; or a combination thereof. 
In some embodiments, the subject is suspected of having acute myeloid leukemia and step b) comprises detecting the methylation pattern of one or more biomarkers selected from cg13771313, cg13771313, and cg08169020. In some embodiments, the subject is suspected of having cervical cancer and step b) comprises detecting the methylation pattern of one or more biomarkers selected from cg08169020, cg21153697, cg07326648, cg14309384, cg20923716, cg22220310, cg21950459, cg13332729, cg10802543, cg20707333, and cg13169641. In some embodiments, the subject is determined to have cervical cancer if: at least one of the cg markers cg08169020, cg21153697, cg07326648, cg14309384, or cg20923716 is hypermethylated; at least one of the cg markers cg22220310, cg21950459, cg13332729, cg10802543, cg20707333, or cg13169641 is hypomethylated; or a combination thereof. In some embodiments, the subject is suspected of having sarcoma and step b) comprises detecting the methylation pattern of one or more biomarkers selected from cg09095222. In some embodiments, the subject is determined to have sarcoma if cg marker cg09095222 is hypermethylated. In some embodiments, the subject is suspected of having stomach cancer and step b) comprises detecting the methylation pattern of one or more biomarkers selected from cg00736681 and cg18834029. In some embodiments, the subject is determined to have stomach cancer if at least one of the cg markers cg00736681 or cg18834029 is hypomethylated. In some embodiments, the subject is suspected of having thyroid cancer and step b) comprises detecting the methylation pattern of one or more biomarkers selected from cg06969479, cg24630516, and cg16901821. In some embodiments, the subject is determined to have thyroid cancer if at least one of the cg markers cg06969479, cg24630516, or cg16901821 is hypomethylated. 
In some embodiments, the subject is suspected of having mesothelioma and step b) comprises detecting the methylation pattern of one or more biomarkers selected from cg05630192. In some embodiments, the subject is determined to have mesothelioma if cg marker cg05630192 is hypomethylated. In some embodiments, the subject is suspected of having glioblastoma and step b) comprises detecting the methylation pattern of one or more biomarkers selected from cg06405341. In some embodiments, the subject is suspected of having lung cancer and step b) comprises detecting the methylation pattern of one or more biomarkers selected from cg08557188, cg00690392, cg03421440, and cg07077277. In some embodiments, the subject is determined to have lung cancer if at least one of the cg markers cg08557188, cg00690392, cg03421440, or cg07077277 is hypomethylated. In some embodiments, the biological sample is a blood sample, a urine sample, a saliva sample, a sweat sample, or a tear sample. In some embodiments, the biological sample is a cell-free DNA sample. In some embodiments, the biological sample comprises circulating tumor cells.
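The embodiments above call a cancer type when any of its hypermethylated markers is above a reference level, or any of its hypomethylated markers is below one. They do not fix those reference levels. The sketch below assumes a sequencing readout. After deamination, a C call indicates a methylated cytosine and a T call an unmethylated one, so the per-read base calls at each marker's CpG give a beta value. The breast cancer rule is then applied to those values. The `NORMAL_RANGE` cutoffs of 0.2 and 0.6 and the read calls are hypothetical placeholders; real cutoffs would come from a reference cohort.

```python
BREAST_HYPER = ("cg19516279", "cg06100368")
BREAST_HYPO = ("cg20349803", "cg23610994", "cg19313373", "cg16508600", "cg24096323")

# Hypothetical normal range (low, high) of beta values for each marker.
NORMAL_RANGE = {m: (0.2, 0.6) for m in BREAST_HYPER + BREAST_HYPO}


def beta_from_calls(calls):
    """Beta value at one CpG from per-read base calls at its cytosine.

    After deamination a C call indicates 5-methylcytosine and a T call an
    unmethylated cytosine; any other call is uninformative and skipped.
    """
    informative = [c for c in calls if c in ("C", "T")]
    if not informative:
        return None
    return informative.count("C") / len(informative)


def meets_rule(profile, hyper_markers, hypo_markers, normal_range):
    """True if any hyper-marker is above, or any hypo-marker below, its normal range.

    Markers missing from `profile` or without informative reads are skipped.
    """
    def measured(m):
        return profile.get(m) is not None

    hyper = any(profile[m] > normal_range[m][1] for m in hyper_markers if measured(m))
    hypo = any(profile[m] < normal_range[m][0] for m in hypo_markers if measured(m))
    return hyper or hypo


read_calls = {"cg19516279": "CCCCTCCCCT", "cg20349803": "CTCCCTTCCC"}
profile = {m: beta_from_calls(c) for m, c in read_calls.items()}
print(profile, meets_rule(profile, BREAST_HYPER, BREAST_HYPO, NORMAL_RANGE))
```

The same `meets_rule` applies to the other cancer types by swapping in their marker lists. For example, the ovarian rule would use cg04072843 and cg01250961 as the hypermethylated markers.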


In certain embodiments, disclosed herein is a kit comprising a set of nucleic acid probes that hybridizes to target sequences of cg markers listed in Table 1, Table 2, Table 7, Table 8, Table 13, Table 14, Table 20, or a combination thereof. In some embodiments, the set of nucleic acid probes hybridizes to target sequences of cg markers selected from Table 1. In some embodiments, the set of nucleic acid probes hybridizes to target sequences of cg markers selected from Table 2. In some embodiments, the set of nucleic acid probes hybridizes to target sequences of cg markers selected from Table 7. In some embodiments, the set of nucleic acid probes hybridizes to target sequences of cg markers selected from Table 8. In some embodiments, the set of nucleic acid probes hybridizes to target sequences of cg markers selected from Table 13. In some embodiments, the set of nucleic acid probes hybridizes to target sequences of cg markers selected from Table 14. In some embodiments, the set of nucleic acid probes hybridizes to target sequences of cg markers selected from Table 20. 
In some embodiments, the set of nucleic acid probes hybridizes to target sequences of cg markers selected from cg19516279, cg06100368, cg25945732, cg19155007, cg17952661, cg04072843, cg01250961, cg08131100, cg03788131, cg17528648, cg07784526, cg18948743, cg23986470, cg00846300, cg01029638, cg08350814, cg05098590, cg18085998, cg06532037, cg15313226, cg16232979, cg26149167, cg01237565, cg16561543, cg13771313, cg13771313, cg08169020, cg08169020, cg21153697, cg07326648, cg14309384, cg20923716, cg09095222, cg22220310, cg21950459, cg13332729, cg10802543, cg20707333, cg13169641, cg25352342, cg09921682, cg02504622, cg17373759, cg06547203, cg06826710, cg00902147, cg17609887, cg15721142, cg08116711, cg00736681, cg18834029, cg06969479, cg24630516, cg16901821, cg20349803, cg23610994, cg19313373, cg16508600, cg24096323, cg24746106, cg12288267, cg10430690, cg24408776, cg05630192, cg12028674, cg24820270, cg12028674, cg26718707, cg10349880, cg09921682, cg25934700, cg14164596, cg24461337, cg23041410, cg07366553, cg26859666, cg06405341, cg08557188, cg00690392, cg03421440, cg07077277, cg00456086, and cg20702527. In some embodiments, the set of nucleic acid probes hybridizes to target sequences of cg10673833, cg10493436, cg10428836, cg27284288, cg16959747, cg17494199, cg23678254, cg24067911, or cg25459300. In some embodiments, the set of nucleic acid probes hybridizes to target sequences of cg10673833 or cg25462303. In some embodiments, the set of nucleic acid probes hybridizes to target sequences of cg05205843, cg11841704, cg06699564, cg08924619, cg11959316, cg08924619, cg06699564, cg01824933, cg08924619, cg05205842, cg08924619, cg04049981, cg09026722, cg03616722, cg08924619, cg05928904, cg08704934, cg09776772, cg17494199, cg01824933, cg16296417, cg09776772, cg09776772, cg05338167, cg10493436, cg011251410, or cg16391792. 
In some embodiments, the set of nucleic acid probes hybridizes to target sequences of cg06393830, cg09366118, cg22513455, cg17583432, cg23881926, cg09638208, cg12441066, cg27284288, cg04441857, cg17583432, cg10673833, cg19757176, cg08670281, cg17583432, cg04460364, cg16959747, cg15011734, or cg25754195. In some embodiments, the set of nucleic acid probes comprises a set of padlock probes.
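In a gap-fill padlock design, both terminal arms of the probe anneal to the sequences flanking a short gap. A polymerase then extends the probe's 3' end across the gap, and a ligase closes the circle, so the captured gap carries the CpG being read out. The sketch rests on four assumptions:

- The probe anneals to the deaminated top strand.
- Cytosines in the arm regions are unmethylated. Arms overlapping a CpG are rejected, because their conversion would depend on methylation.
- The backbone is a placeholder run of `N`.
- `design_padlock`, its arm length, and the backbone are all hypothetical.

A real design would also balance arm melting temperatures.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def convert_unmethylated(seq):
    """In silico deamination of a CpG-free stretch: every C reads as T."""
    return seq.upper().replace("C", "T")


def design_padlock(reference, gap_start, gap_end, arm_len=20, backbone="N" * 30):
    """Gap-fill padlock probe, written 5'->3', capturing reference[gap_start:gap_end].

    The 5' (ligation) arm pairs with the converted flank upstream of the gap and
    the 3' (extension) arm with the converted flank downstream, so both probe
    ends abut the gap. `backbone` is a placeholder linker, not a validated sequence.
    """
    ref = reference.upper()
    if gap_start < arm_len or gap_end + arm_len > len(ref):
        raise ValueError("not enough flanking sequence for the requested arm length")
    # Look one base past each flank so a flank-final C followed by G is caught.
    up_ctx = ref[gap_start - arm_len:gap_start + 1]
    down_ctx = ref[gap_end:gap_end + arm_len + 1]
    if "CG" in up_ctx or "CG" in down_ctx:
        raise ValueError("an arm overlaps a CpG whose conversion depends on methylation")
    upstream = convert_unmethylated(ref[gap_start - arm_len:gap_start])
    downstream = convert_unmethylated(ref[gap_end:gap_end + arm_len])
    return revcomp(upstream) + backbone + revcomp(downstream)


print(design_padlock("TACACGATCA", 4, 6, arm_len=3, backbone="NNN"))  # TATNNNAAT
```

In the usage line, the gap covers the CpG at positions 4 and 5. After C-to-T conversion, the two 3-base arms are the reverse complements of the flanks.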





BRIEF DESCRIPTION OF THE DRAWINGS

Various aspects of the disclosure are set forth with particularity in the appended claims. The file of this patent contains at least one drawing/photograph executed in color. Copies of this patent with color drawing(s)/photograph(s) will be provided by the Office upon request and payment of the necessary fee. A better understanding of the features and advantages of the present disclosure will be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles of the disclosure are utilized, and the accompanying drawings of which:



FIG. 1 illustrates the methylation status of biomarker 7-1577016.



FIG. 2 illustrates the methylation status of biomarker 11-67177103.



FIG. 3 illustrates the methylation status of biomarker 19-10445516 (cg17126555).



FIG. 4 illustrates the methylation status of biomarker 12-122277360.



FIG. 5 illustrates the methylation status of biomarker 6-72130742 (cg24772267).



FIG. 6 illustrates the methylation status of biomarker 3-15369681.



FIG. 7 illustrates the methylation status of biomarker 3-131081177.



FIG. 8 illustrates a workflow chart of data generation and analysis. Whole genome methylation data on HCC and normal lymphocytes were used to identify 401 candidate markers. Diagnostic marker selection: LASSO and random-forest analyses were applied to a training cohort of 715 HCC and 560 normal patients to identify a final selection of 10 markers. These ten markers were applied to a validation cohort of 383 HCC and 275 normal patients. Prognostic marker selection: univariate Cox and LASSO-Cox analyses were applied to a training cohort of 680 HCC patients with survival data to identify a final selection of eight markers. These eight markers were applied to a validation cohort of 369 HCC patients with survival data.
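FIG. 8 names LASSO and random-forest analyses for diagnostic marker selection but gives no settings. The sketch below illustrates only the LASSO idea: an L1-penalized logistic regression, fitted here by proximal gradient descent in plain Python. The penalty drives the weights of uninformative markers to exactly zero, so the nonzero weights form the selected panel. The following are all assumptions for illustration:

- the penalty `lam`
- the step size `lr`
- the iteration count
- the synthetic cohort (six markers, two of them informative)

In practice the penalty would be tuned by cross-validation on the training cohort, and the selected panel would then be checked on the validation cohort.

```python
import math
import random


def soft_threshold(x, t):
    """Proximal operator of the L1 penalty: shrink x toward zero by t."""
    return math.copysign(max(abs(x) - t, 0.0), x)


def l1_logistic(X, y, lam=0.05, lr=0.5, n_iter=300):
    """L1-penalized (LASSO) logistic regression fitted by proximal gradient descent.

    X: list of samples, each a list of marker values; y: 0/1 labels.
    Returns (intercept, weights); the intercept is not penalized.
    """
    n, p = len(X), len(X[0])
    b, w = 0.0, [0.0] * p
    for _ in range(n_iter):
        grad_b, grad_w = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            z = max(-35.0, min(35.0, z))  # keep exp() in range
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * xi[j] / n
        b -= lr * grad_b
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
    return b, w


def make_synthetic(n=200, p=6, seed=0):
    """Synthetic cohort: marker 0 is higher and marker 1 lower in cases (label 1)."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(p)]
        row[0] += 1.5 * label
        row[1] -= 1.5 * label
        X.append(row)
        y.append(label)
    return X, y


X, y = make_synthetic()
intercept, weights = l1_logistic(X, y)
selected = [j for j, wj in enumerate(weights) if wj != 0.0]
print("selected markers:", selected)
```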



FIG. 9A-FIG. 9H illustrate cfDNA methylation analysis for diagnosis of HCC. FIG. 9A shows a heatmap of the methylation of 28 pairs of matched HCC tumor DNA and plasma cfDNA, with a mean methylation value threshold of 0.1 as a cutoff. FIG. 9B shows the methylation values and standard deviations of ten diagnostic markers in normal plasma, HCC tumor DNA, and HCC patient cfDNA. FIG. 9C and FIG. 9D show the confusion tables of binary results of the diagnostic prediction model in the training (FIG. 9C) and validation (FIG. 9D) datasets. FIG. 9E and FIG. 9F illustrate ROC curves of the diagnostic prediction model with methylation markers in the training (FIG. 9E) and validation (FIG. 9F) datasets. FIG. 9G and FIG. 9H show the unsupervised hierarchical clustering of ten methylation markers selected for use in the diagnostic prediction model in the training (FIG. 9G) and validation (FIG. 9H) datasets.
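Confusion tables such as those in FIG. 9C and FIG. 9D summarize binary calls against true labels. The sketch below assumes the model outputs a score in [0, 1] that is thresholded at a hypothetical cutoff of 0.5. The labels and scores in the usage lines are made up.

```python
def predict(scores, cutoff=0.5):
    """Binary call from model scores: 1 (HCC) if score >= cutoff, else 0 (normal)."""
    return [1 if s >= cutoff else 0 for s in scores]


def confusion_table(y_true, y_pred):
    """2x2 counts with 1 = HCC and 0 = normal."""
    pairs = list(zip(y_true, y_pred))
    return {
        "TP": pairs.count((1, 1)),
        "FN": pairs.count((1, 0)),
        "FP": pairs.count((0, 1)),
        "TN": pairs.count((0, 0)),
    }


def summarize(table):
    """Sensitivity, specificity, and accuracy from a confusion table."""
    tp, fn, fp, tn = table["TP"], table["FN"], table["FP"], table["TN"]
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
    }


y_true = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.3, 0.2, 0.6, 0.1, 0.05]
table = confusion_table(y_true, predict(scores))
print(table, summarize(table))
```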



FIG. 10A-FIG. 10K illustrate cfDNA methylation analysis in relation to tumor burden, treatment response, and staging. FIG. 10A and FIG. 10B show the combined diagnosis score (cd-score) and AFP, respectively, in healthy controls, individuals with liver diseases (HBV/HCV infection, cirrhosis, and fatty liver), and HCC patients. FIG. 10C shows the cd-score in normal controls and HCC patients with and without detectable tumor burden. FIG. 10D shows the cd-score in normal controls and in HCC patients before treatment, with treatment response, and with progression. FIG. 10E shows the cd-score in normal controls and HCC patients before surgery, after surgery, and with recurrence. FIG. 10F shows the cd-score in normal controls and HCC patients with stage I-IV disease. FIG. 10G shows the ROC curves of cd-score and AFP for HCC diagnosis in the whole HCC cohort. FIG. 10H and FIG. 10I show the cd-score and AFP, respectively, in HCC patients at initial diagnosis (before surgery or other treatment), with treatment response, with progression, and with recurrence. FIG. 10J and FIG. 10K show the cd-score and AFP, respectively, in HCC patients with stage I-IV disease.
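When the ROC curves of the cd-score and AFP are compared (FIG. 10G), the area under each curve reduces to a rank statistic. The AUC equals the probability that a randomly chosen HCC patient scores higher than a randomly chosen control, with ties counted as one half (the Mann-Whitney formulation). The sketch below computes it directly. The score lists are made-up toy values, not study data.

```python
def roc_auc(case_scores, control_scores):
    """Area under the ROC curve via the Mann-Whitney statistic.

    Equals the probability that a randomly chosen case scores higher than a
    randomly chosen control; ties count as one half.
    """
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Made-up toy scores for the same subjects, for illustration only.
cd_case, cd_ctrl = [0.81, 0.66, 0.92, 0.45], [0.12, 0.30, 0.51]
afp_case, afp_ctrl = [420.0, 8.0, 35.0, 3.1], [2.5, 6.0, 4.2]
print(roc_auc(cd_case, cd_ctrl), roc_auc(afp_case, afp_ctrl))
```

The pairwise loop is quadratic in cohort size. That is fine for illustration; a rank-based formula gives the same value more efficiently.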



FIG. 11A-FIG. 11G illustrate cfDNA methylation analysis for prognostic prediction of HCC survival. FIG. 11A and FIG. 11B show the overall survival curves of HCC patients with low or high risk of death at 6 months, according to the combined prognosis score (cp-score), in the training (FIG. 11A) and validation (FIG. 11B) datasets. FIG. 11C and FIG. 11D show the survival curves of HCC patients with stage I/II and stage III/IV disease in the training (FIG. 11C) and validation (FIG. 11D) datasets. FIG. 11E and FIG. 11F show the ROC curves for the cp-score, stage, and cp-score combined with stage in the training (FIG. 11E) and validation (FIG. 11F) datasets. FIG. 11G shows the survival curves of HCC patients with combinations of cp-score risk and stage in the whole HCC cohort.
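Overall survival curves such as those in FIG. 11A-FIG. 11D are conventionally Kaplan-Meier estimates. That is an assumption here, since this section does not name the method. At each distinct time with at least one death, the estimate multiplies the running survival by (1 - deaths / patients at risk). Censored patients leave the risk set without producing a step. Running the function separately on the low-risk and high-risk cp-score groups gives one curve per group. The follow-up times in the usage lines are made up.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times: follow-up time per patient; events: 1 if death was observed, 0 if censored.
    Returns [(t, S(t))] at each distinct time with at least one death.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed  # deaths and censored patients both leave the risk set
    return curve


times = [1, 2, 2, 3, 4]   # made-up follow-up times (months)
events = [1, 1, 0, 1, 0]  # 1 = death observed, 0 = censored
print(kaplan_meier(times, events))
```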



FIG. 12 illustrates an unsupervised hierarchical clustering of the top 1000 methylation markers differentially methylated between HCC tumor DNA and normal blood. Each column represents an individual patient and each row represents a CpG marker.
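The linkage method and distance behind the clustering in FIG. 12 are not stated. The sketch below assumes agglomerative clustering of patients with distance 1 - Pearson r between their methylation profiles and average linkage. It repeatedly merges the two closest clusters until `n_clusters` remain. It runs in O(n^3) time and only shows the idea; a library routine would be used for 1000 markers across full cohorts. The four toy profiles are made up.

```python
import math


def pearson(a, b):
    """Pearson correlation; raises ZeroDivisionError for a constant profile."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def average_linkage(profiles, n_clusters):
    """Cluster samples (rows of `profiles`) into n_clusters groups.

    Distance is 1 - Pearson r; the distance between two clusters is the mean
    pairwise distance between their members (average linkage).
    """
    n = len(profiles)
    dist = [[1.0 - pearson(profiles[i], profiles[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


profiles = [[0.9, 0.8, 0.1, 0.2], [0.85, 0.9, 0.15, 0.1],
            [0.1, 0.2, 0.9, 0.8], [0.2, 0.1, 0.8, 0.9]]
print(average_linkage(profiles, 2))  # [[0, 1], [2, 3]]
```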



FIG. 13A-FIG. 13B illustrate an exemplary region encompassing two Blocks of Correlated Methylation (BCM) in cfDNA samples from HCC patients and normal controls. FIG. 13A shows the genomic neighborhood of the BCM displayed within the UCSC genome browser (the Pearson correlation track shows correlation data obtained by summing r values for a marker within a BCM). The cg marker names below the Pearson correlation graph (cg14999168, cg14088196, cg25574765) are methylation markers from TCGA. Gene names and common SNPs are also listed. FIG. 13B shows a not-to-scale representation of a set of analyzed cg markers belonging to two BCMs in this region. Boundaries between blocks are indicated by a black rectangle, whereas red squares indicate correlated methylation (r>0.5) between two nearby markers. Correlation between any two markers is represented by a square at the intersection of (virtual) perpendicular lines originating from these two markers. White color indicates no significant correlation. The 10 newly identified methylation markers in the left MCB anchored by marker cg14999168 and the 11 newly identified methylation markers in the right MCB anchored by cg14088196/cg25574765 were highly consistent and correlated among HCC ctDNA, normal cfDNA, and HCC tissue DNA. Using markers within the same MCB can significantly enhance allele calling accuracy. Vertical lines at the bottom of FIG. 13B mark the genomic coordinates of the boundaries of the two MCBs.
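FIG. 13B marks correlated methylation as r > 0.5 between nearby markers. A simplified way to turn that into blocks, assumed here rather than taken from the disclosure, takes two steps. First, walk the markers in genomic order and start a new block whenever the Pearson r between a marker and its predecessor (computed across the same samples) is at or below 0.5. Second, average the markers within each block to get one block-level value per sample, which is one way to pool evidence across a block. The marker names and values are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation; raises ZeroDivisionError for a constant column."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def correlated_blocks(marker_ids, columns, r_min=0.5):
    """Split position-ordered markers into blocks of consecutive correlated markers.

    columns maps marker ID -> methylation values across the same samples; a new
    block starts whenever r with the preceding marker is not above r_min.
    """
    blocks = [[marker_ids[0]]]
    for prev, cur in zip(marker_ids, marker_ids[1:]):
        if pearson(columns[prev], columns[cur]) > r_min:
            blocks[-1].append(cur)
        else:
            blocks.append([cur])
    return blocks


def block_mean(columns, block):
    """Per-sample mean methylation across the markers of one block."""
    return [sum(vals) / len(vals) for vals in zip(*(columns[m] for m in block))]


columns = {
    "m1": [0.10, 0.50, 0.90, 0.30],
    "m2": [0.15, 0.55, 0.85, 0.35],
    "m3": [0.90, 0.10, 0.40, 0.60],
    "m4": [0.85, 0.15, 0.45, 0.55],
}
blocks = correlated_blocks(["m1", "m2", "m3", "m4"], columns)
print(blocks, [block_mean(columns, b) for b in blocks])
```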



FIG. 14 illustrates an unsupervised hierarchical clustering of exemplary methylation markers for Stage I-Stage IV HCC tumors.



FIG. 15 shows methylation values correlated with treatment outcomes in HCC patients with serial plasma samples. FIG. 15A shows a change in cd-score comparing patients after surgery, with clinical response, and with disease progression (***p<0.001). FIG. 15B shows cd-score trends in individual patients after complete surgical resection, with treatment response, and with disease progression. PRE: pre-treatment; POST: after-treatment.



FIG. 16 illustrates a dynamic monitoring of treatment outcomes in individual patients with cd-score and AFP. Dates of treatments are indicated by vertical blue arrows. PD, progressive disease; PR, partial response; SD, stable disease; TACE, transcatheter arterial chemoembolization.



FIG. 17A-FIG. 17C illustrate data analysis of an exemplary marker cg10673833.



FIG. 18 illustrates a workflow for building the diagnostic and prognostic models. Whole genome methylation data on HCC, LUNC, and normal blood were used to identify candidate markers for probe design. Left panel: diagnostic marker selection: LASSO analysis was applied to a training cohort of 444 HCC, 299 LUNC, and 1123 normal patients to identify a final selection of 77 markers. These 77 markers were applied to a validation cohort of 445 HCC, 300 LUNC, and 1124 normal patients. Right panel: prognostic marker selection: LASSO-Cox analysis was applied to a training cohort of 433 HCC and 299 LUNC patients with survival data to identify a final selection of 20 markers. These 20 markers were applied to a validation cohort of 434 HCC and 300 LUNC patients with survival data.



FIG. 19A-FIG. 19D illustrate cfDNA methylation analysis for diagnosis of LUNC and HCC. FIG. 19A shows receiver operating characteristic (ROC) curves and the associated Area Under Curves (AUCs) of the diagnostic prediction model (cd-score) using cfDNA methylation analysis in the validation cohort. FIG. 19B shows box plots of composite scores used to classify normal and cancer patients (left), and LUNC and HCC patients (right). FIG. 19C and FIG. 19D show unsupervised hierarchical clustering of methylation markers differentially methylated between cancer (HCC and LUNC) and normal (FIG. 19C) and between HCC and LUNC (FIG. 19D). Each row represents an individual patient and each column represents an MCB marker.



FIG. 20A-FIG. 20D illustrate methylation profiling in healthy controls, high-risk patients, and cancer patients. FIG. 20A shows that methylation profiling differentiates HCC from high-risk liver disease patients or normal controls. High-risk liver diseases were defined as hepatitis, liver cirrhosis, and fatty liver disease. FIG. 20B shows the differentiation of HCC from high-risk liver disease patients or normal controls by serum AFP. FIG. 20C shows that methylation profiling differentiates LUNC from patients who smoke and from normal controls. FIG. 20D shows the differentiation of LUNC from high-risk (smoking) patients by serum CEA.



FIG. 21A-FIG. 21R illustrate that cfDNA methylation analysis using a composite diagnosis score could predict tumor burden, staging, and treatment response in LUNC and HCC patients. cd-score in patients with and without detectable tumor burden in LUNC (FIG. 21A), compared to CEA (FIG. 21I), and in HCC (FIG. 21E), compared to AFP (FIG. 21M); cd-score of patients with stage I/II and stage III/IV disease in LUNC (FIG. 21B), compared to CEA (FIG. 21J), and in HCC (FIG. 21F), compared to AFP (FIG. 21N); cd-score in patients before intervention, after surgery, and with recurrence in LUNC (FIG. 21C), compared to CEA (FIG. 21K), and in HCC (FIG. 21G), compared to AFP (FIG. 21O); cd-score in patients before intervention, with treatment response, and with disease progression in LUNC (FIG. 21D), compared to CEA (FIG. 21L), and in HCC (FIG. 21H), compared to AFP (FIG. 21P). FIG. 21Q: the ROC curve and the AUC of cd-score and AFP for LUNC diagnosis in the entire LUNC cohort. FIG. 21R: the ROC curve and the AUC of cd-score and AFP for HCC diagnosis in the entire HCC cohort.



FIG. 22A-FIG. 22F illustrate prognostic prediction of HCC and LUNC survival based on cfDNA methylation profiling. FIG. 22A shows the overall survival curves of HCC patients with low or high risk of death, according to the combined prognosis score (cp-score), in the validation cohort. FIG. 22B shows the overall survival curves of LUNC patients with low or high risk of death, according to the cp-score, in the validation cohort. FIG. 22C shows the survival curves of HCC patients with stage I/II and stage III/IV disease in the validation cohort. FIG. 22D shows the survival curves of patients with stage I/II and stage III/IV LUNC in the validation cohort. FIG. 22E and FIG. 22F show the ROC curves for 12-month survival predicted by cp-score, CEA, AFP, stage, and cp-score combined with stage in HCC (FIG. 22E) and LUNC (FIG. 22F) in the validation cohort.



FIG. 23A-FIG. 23B illustrate early detection of LUNC using a cfDNA methylation panel. A total of 208 smokers with lung nodules between 10 mm and 30 mm in size were enrolled in a prospective trial, and a cfDNA LUNC methylation panel was measured. Patients were divided into a training and a testing cohort (FIG. 23A). Receiver operating characteristic (ROC) curves and the associated Area Under Curves (AUCs) of the prediction of Stage I LUNC versus benign lung nodules in the validation cohort, with 91.4% accuracy (FIG. 23B); table showing prediction results for Stage I LUNC versus benign lung nodules, with high sensitivity and specificity in the validation cohort.



FIG. 24A-FIG. 24D illustrate that methylation markers can differentiate between HCC and liver cirrhosis and detect progression from liver cirrhosis to HCC. A prediction model was first built using 217 HCC and 241 cirrhosis patients, who were divided into a training and a testing cohort (FIG. 24A); receiver operating characteristic (ROC) curves and the associated Area Under Curves (AUCs) of the prediction of Stage I HCC versus liver cirrhosis in the validation cohort, with 89.9% accuracy (FIG. 24B); table showing prediction results for Stage I HCC versus liver cirrhosis in a validation cohort (FIG. 24C); table showing prediction results on progression from liver cirrhosis to stage I HCC with high sensitivity (89.5%) and specificity (98%) (FIG. 24D).



FIG. 25A illustrates unsupervised hierarchical clustering of top 1000 methylation markers differentially methylated in DNA in HCC and LUNC primary tissues versus normal blood.



FIG. 25B shows unsupervised hierarchical clustering of the top 1000 methylation markers differentially methylated between HCC and LUNC tissue DNA. Each column represents an individual patient and each row represents a CpG marker.



FIG. 25C shows a global view of supervised hierarchical clustering of all 888 MCBs in the entire cfDNA dataset.



FIG. 26 illustrates boxplots showing the features of MCBs in the cohorts. Top plot: mean values and deviations of LASSO MCBs in each one-versus-rest comparison.



FIG. 27 illustrates methylation values correlated with treatment outcomes in HCC and LUNC patients with serial plasma samples. Summary graphs of the change in methylation value comparing patients after surgery, with clinical response (partial remission (PR) or stable disease (SD)), or with disease progression/recurrence (PD).



FIG. 28A shows dynamic monitoring of treatment outcomes using the total methylation copy numbers of an MCB in LUNC patients.



FIG. 28B shows dynamic monitoring of treatment outcomes with the methylation value of an MCB in LUNC patients. PD, progressive disease; PR, partial response; SD, stable disease; chemo, chemotherapy.



FIG. 29 illustrates dynamic monitoring of treatment outcomes using the total methylation copy numbers of an MCB and CEA in HCC patients.



FIG. 30 shows dynamic monitoring of treatment outcomes with the methylation rate of an MCB in HCC patients. Dates of treatments are indicated in the figure. PD, progressive disease; PR, partial response; SD, stable disease; chemo, chemotherapy; TACE, transcatheter arterial chemoembolization.



FIG. 31A-FIG. 31B illustrate workflow charts described in Example 5. FIG. 31A illustrates an exemplary workflow for building the diagnostic model and the prognostic model and for generating subtypes based on ctDNA methylation. FIG. 31B shows the enrollment and outcomes of the prospective screening cohort study.



FIG. 32A-FIG. 32H illustrate cfDNA methylation analysis for diagnosis of CRC. FIG. 32A: an exemplary workflow for building the diagnostic models. FIG. 32B and FIG. 32C: unsupervised hierarchical clustering of methylation markers differentially methylated between cancer (CRC) and normal samples in the training cohort (FIG. 32B) and the validation cohort (FIG. 32C). Each row represents an individual patient and each column represents a CpG marker. FIG. 32D and FIG. 32E: receiver operating characteristic (ROC) curves and the associated areas under the curve (AUCs) of the diagnostic prediction model (cd-score) using cfDNA methylation analysis in the training cohort (FIG. 32D) and the validation cohort (FIG. 32E). FIG. 32F: ROC curves and corresponding AUCs of the cd-score and CEA for CRC diagnosis. FIG. 32G and FIG. 32H: confusion matrices built from diagnostic model predictions in the training cohort (FIG. 32G) and the validation cohort (FIG. 32H).



FIG. 33A-FIG. 33E illustrate prognostic prediction of CRC survival based on cfDNA methylation profiling. FIG. 33A: an exemplary workflow for building the prognostic models. FIG. 33B: overall survival curves of CRC patients with a low or high risk of death according to the combined prognosis score (cp-score) in the training cohort. FIG. 33C: overall survival curves of CRC patients with a low or high risk of death according to the cp-score in the validation cohort. FIG. 33D and FIG. 33E: ROC curves and corresponding AUCs for 12-month survival predicted by the cp-score, primary tumor location, TNM stage, CEA status, and all factors combined in the training cohort (FIG. 33D) and the validation cohort (FIG. 33E).



FIG. 34A illustrates a nomogram for predicting one-year overall survival of CRC patients using the cp-score and other clinical factors.



FIG. 34B illustrates a calibration plot of the nomogram in external validation.



FIG. 35A-FIG. 35E illustrate cfDNA methylation subtyping analysis in 801 patients with CRC. FIG. 35A: a schematic diagram showing the core algorithm used in sample clustering. FIG. 35B: iterative unsupervised clustering of cfDNA methylation markers identified two subtypes/clusters in the training data. Clinical and molecular features are indicated by the annotation bars above the heatmap; patients without such information are colored white. Mutation status was defined by a mutation detected in one of the following genes: BRAF, KRAS, NRAS, and PIK3CA. FIG. 35C: silhouette analysis of the clusters in the last iteration. FIG. 35D: predicted subtypes/clusters of the validation cohort using the 45 markers. FIG. 35E: upper panel, overall survival of patients in each cfDNA methylation subtype (log-rank test, p<0.05); lower panel, proportion of stage III-IV CRC patients in the two subtypes (chi-squared test, **P<0.01, *P<0.05; left, training cohort; right, validation cohort).



FIG. 36A-FIG. 36B present a list of Methylation Correlated Blocks (MCBs) used for cd-score generation. FIG. 36A: MCB markers selected by multi-class LASSO. FIG. 36B: diagnostic marker selection: LASSO-based feature selection identified 13 markers and Random Forest-based feature selection identified 22 markers for discriminating cancer versus normal, with 9 markers overlapping between the two methods.



FIG. 37A-FIG. 37F show that cfDNA methylation analysis using a cd-score could predict tumor burden, staging, and treatment response in CRC patients. FIG. 37A: cd-score in patients with and without detectable tumor burden; FIG. 37B: cd-score of patients with stage I/II and stage III/IV disease; FIG. 37C: cd-score in patients with the primary tumor located on the left or on the right; FIG. 37D: CEA in patients with stage I/II and stage III/IV CRC; FIG. 37E: cd-score in patients before treatment, after surgery, and with tumor recurrence; FIG. 37F: CEA in CRC patients before treatment, after surgery, and with tumor recurrence. Recurrence was defined as a tumor that initially disappeared after treatment/surgery but recurred after a defined period.



FIG. 38A-FIG. 38C illustrate a comparison of subtype markers, diagnosis markers, and prognosis markers. FIG. 38A: Venn diagram showing the intersections of the three marker lists. Patients in cluster 2 had higher cp-scores than those in cluster 1 in both the training cohort (FIG. 38B) and the validation cohort (FIG. 38C).



FIG. 39 illustrates patient treatment monitoring with marker methylation level: dynamic monitoring of treatment outcomes with the methylation value of CpG site cg10673833 (upper panel) and CEA (lower panel) in CRC patients #1-6. Dates of treatments are indicated in the figure. PD, progressive disease; PR, partial response; SD, stable disease; chemo, chemotherapy.



FIG. 40A-FIG. 40B illustrate that methylation values correlated with treatment outcomes in CRC patients with serial plasma samples. FIG. 40A: summary graphs comparing the change in methylation value among patients after surgery, patients with clinical response (Partial Remission (PR) or Stable Disease (SD)), and patients with disease progression/recurrence (PD). FIG. 40B: methylation value trends in individual patients after complete surgical resection, with treatment response, and with disease progression. Delta methylation rate denotes the difference in methylation value between before and after treatment. PRE: pre-treatment; POST: post-treatment.



FIG. 41 illustrates the methylation status of CpG marker cg00456086.



FIG. 42 illustrates the methylation status of biomarkers 3-49757316, 8-27183116, 8-141607252, 17-29297711, and 3-49757306.



FIG. 43 illustrates the methylation status of biomarkers 19-43979341, 8-141607236, 5-176829755, 18-13382140, and 15-65341965.



FIG. 44 illustrates the methylation status of biomarkers 15-91129457, 2-1625431, 6-151373292, 6-151373294, and 20-25027093.



FIG. 45 illustrates the methylation status of biomarkers 6-14284198, 10-4049295, 19-59023222, 1-184197132, and 2-131004117.



FIG. 46 illustrates the methylation status of biomarkers 3-13152305, 17-29297770, 8-27183316, 5-176829740, and 19-41316693.



FIG. 47 illustrates the methylation status of biomarkers 18-43830649, 15-65341957, 20-44539531, 7-30265625, and 2-131129567.



FIG. 48 illustrates the methylation status of biomarkers 2-8995417, 12-10782319, 20-25027033, 6-151373256, and 8-86100970.



FIG. 49 illustrates the methylation status of biomarkers 9-4839459, 17-41221574, 1-153926715, 20-25027044, and 20-20177325.



FIG. 50 illustrates the methylation status of biomarkers 5-176829665, 3-13152273, 8-27183348, 3-49757302, and 19-41316697.



FIG. 51 illustrates the methylation status of biomarkers 8-61821442, 20-44539525, 10-102883105, 11-65849129, and 5-176829639.



FIG. 52 illustrates the methylation status of biomarkers 2-1625443, 20-25027085, 11-69420728, 1-229234865, and 6-13408877.



FIG. 53 illustrates the methylation status of biomarkers 22-50643735, 6-151373308, 1-232119750, 8-134361508, and 6-13408858.





DETAILED DESCRIPTION OF THE DISCLOSURE

Cancer is characterized by abnormal cell growth caused by one or more mutations or modifications of a gene, leading to a dysregulated balance between cell proliferation and cell death. DNA methylation silences the expression of tumor suppressor genes and presents itself as one of the first neoplastic changes. Methylation patterns found in neoplastic tissue and plasma demonstrate homogeneity and, in some instances, are used as sensitive diagnostic markers. For example, the cMethDNA assay has been shown in one study to be about 91% sensitive and about 96% specific when used to diagnose metastatic breast cancer. In another study, circulating tumor DNA (ctDNA) was about 87.2% sensitive and about 99.2% specific when used to identify KRAS gene mutations in a large cohort of patients with metastatic colon cancer (Bettegowda et al., Detection of Circulating Tumor DNA in Early- and Late-Stage Human Malignancies. Sci. Transl. Med. 6(224):224ra24, 2014). The same study further demonstrated that ctDNA is detectable in >75% of patients with advanced pancreatic, ovarian, colorectal, bladder, gastroesophageal, breast, melanoma, hepatocellular, and head and neck cancers (Bettegowda et al.).


Additional studies have demonstrated that CpG methylation patterns correlate with neoplastic progression. For example, in one study of breast cancer methylation patterns, P16 hypermethylation was found to correlate with early-stage breast cancer, while TIMP3 promoter hypermethylation was correlated with late-stage breast cancer. In addition, BMP6, CST6, and TIMP3 promoter hypermethylation have been shown to be associated with lymph node metastasis in breast cancer.


In some embodiments, DNA methylation profiling provides higher clinical sensitivity and dynamic range than somatic mutation analysis for cancer detection. In other instances, altered DNA methylation signatures have been shown to correlate with prognosis and treatment response in certain cancers. For example, one study showed that, in a group of patients with advanced rectal cancer, ten differentially methylated regions could be used to predict patient prognosis. Likewise, in a different study, RASSF1A DNA methylation measured in serum was used to predict poor outcome in breast cancer patients undergoing adjuvant therapy. In yet another study, SRBC gene hypermethylation was associated with poor outcome in patients with colorectal cancer treated with oxaliplatin. A further study demonstrated that ESR1 gene methylation correlates with clinical response in breast cancer patients receiving tamoxifen. Additionally, ARM gene promoter hypermethylation was shown to be a predictor of long-term survival in breast cancer patients not treated with tamoxifen.


In some embodiments, disclosed herein are methods, probes, and kits for diagnosing the presence of cancer and/or a cancer type. In some instances, described herein is a method of profiling the methylation status of a set of CpG markers (or cg markers). In other instances, described herein is a method of selecting a patient for treatment based on the methylation status of a set of CpG markers (or cg markers).


Methods of Use

DNA methylation is the attachment of a methyl group at the C5 position of the nucleotide base cytosine or at the N6 position of adenine. Methylation of adenine occurs primarily in prokaryotes, while methylation of cytosine occurs in both prokaryotes and eukaryotes. In some instances, methylation of cytosine occurs in the CpG dinucleotide motif. In other instances, cytosine methylation occurs in, for example, CHG and CHH motifs, where H is adenine, cytosine, or thymine. In some instances, a cluster of CpG dinucleotide motifs, or CpG sites, forms a CpG island, a short DNA sequence rich in CpG dinucleotides. In some instances, a CpG island is present in the 5′ region of about one half of all human genes. CpG islands are typically, but not always, between about 0.2 and about 1 kb in length. Methylated cytosine includes 5-methylcytosine (5-mCyt) and 5-hydroxymethylcytosine.
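The island criteria can be made concrete with a short sketch. The thresholds below (a window of at least 200 bp, GC content above 50%, and an observed/expected CpG ratio above 0.6) are the widely used Gardiner-Garden and Frommer heuristics, assumed here for illustration; they are not a definition taken from this disclosure, and the function names are hypothetical.

```python
def cpg_island_stats(seq: str) -> dict:
    """Length, GC fraction, and observed/expected CpG ratio of a DNA window."""
    seq = seq.upper()
    n = len(seq)
    c_count = seq.count("C")
    g_count = seq.count("G")
    cpg_count = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_fraction = (c_count + g_count) / n if n else 0.0
    # Expected CpG count if C and G occurred independently is C * G / length
    obs_exp = cpg_count * n / (c_count * g_count) if c_count and g_count else 0.0
    return {"length": n, "gc_fraction": gc_fraction, "obs_exp_cpg": obs_exp}


def looks_like_cpg_island(seq: str, min_len: int = 200,
                          min_gc: float = 0.5, min_obs_exp: float = 0.6) -> bool:
    """Apply Gardiner-Garden and Frommer style thresholds (assumed, not from the disclosure)."""
    stats = cpg_island_stats(seq)
    return (stats["length"] >= min_len
            and stats["gc_fraction"] > min_gc
            and stats["obs_exp_cpg"] > min_obs_exp)


print(looks_like_cpg_island("CG" * 100))  # True: 200 bp of CpG repeats
print(looks_like_cpg_island("AT" * 150))  # False: AT-rich, no CpG
```

In practice such criteria are applied in a sliding window along a genomic sequence rather than to a single fixed string.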


The CpG (cytosine-phosphate-guanine) or CG motif refers to a region of a DNA molecule where a cytosine nucleotide is immediately followed by a guanine nucleotide in the 5′ to 3′ direction of the linear strand. In some instances, a cytosine in a CpG dinucleotide is methylated to form 5-methylcytosine. In other instances, a cytosine in a CpG dinucleotide is modified to form 5-hydroxymethylcytosine.
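As a minimal sketch, each cytosine on one strand can be assigned to the CG, CHG, or CHH context by inspecting the next one or two bases. The function name `cytosine_contexts` is hypothetical, and only the given strand is scanned; cytosines on the complementary strand would require scanning the reverse complement.

```python
H = {"A", "C", "T"}  # H denotes any base other than G


def cytosine_contexts(seq: str) -> list[tuple[int, str]]:
    """Return (0-based position, context) for each C on the given strand."""
    seq = seq.upper()
    contexts = []
    for i, base in enumerate(seq):
        if base != "C":
            continue
        nxt = seq[i + 1] if i + 1 < len(seq) else ""
        nxt2 = seq[i + 2] if i + 2 < len(seq) else ""
        if nxt == "G":
            contexts.append((i, "CG"))
        elif nxt in H and nxt2 == "G":
            contexts.append((i, "CHG"))
        elif nxt in H and nxt2 in H:
            contexts.append((i, "CHH"))
        # otherwise the context is truncated at the sequence end or contains N
    return contexts


print(cytosine_contexts("ACGTCAGTCTTA"))
# [(1, 'CG'), (4, 'CHG'), (8, 'CHH')]
```

CpG sites are then simply the positions labeled "CG".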


In some instances, one or more DNA regions are hypermethylated. In such cases, hypermethylation refers to an increase in methylation of a region relative to a reference region. In some cases, hypermethylation is observed in one or more cancer types and is useful, for example, as a diagnostic marker and/or a prognostic marker.


In some instances, one or more DNA regions are hypomethylated. In some cases, hypomethylation refers to a loss of methyl groups from 5-methylcytosine nucleotides in a first region relative to a reference region. In some cases, hypomethylation is observed in one or more cancer types and is useful, for example, as a diagnostic marker and/or a prognostic marker.
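A common way to quantify such calls is the beta value, the fraction of methylated reads or copies at a site. The sketch below labels a site as hypermethylated or hypomethylated relative to a reference value; the 0.2 difference threshold and the function names are hypothetical, and the disclosure does not specify this rule.

```python
def beta_value(methylated: int, unmethylated: int) -> float:
    """Fraction of methylated reads (or copies) at a CpG site."""
    total = methylated + unmethylated
    if total == 0:
        raise ValueError("no reads or copies observed at this site")
    return methylated / total


def call_status(sample_beta: float, reference_beta: float, delta: float = 0.2) -> str:
    """Label a site relative to a reference; delta is a hypothetical cutoff."""
    diff = sample_beta - reference_beta
    if diff >= delta:
        return "hypermethylated"
    if diff <= -delta:
        return "hypomethylated"
    return "not different"


print(call_status(beta_value(30, 10), reference_beta=0.2))  # hypermethylated
```

A fixed difference cutoff is the simplest choice; a statistical test across replicate reference samples would be a more robust alternative.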


In some embodiments, disclosed herein are CpG methylation markers for diagnosis of a cancer in a subject. In some instances, also disclosed herein is a method of selecting a subject suspected of having cancer for treatment. In some instances, the method comprises (a) contacting treated DNA with at least one probe from a probe panel to generate an amplified product, wherein the at least one probe hybridizes under high stringency conditions to a target sequence of a cg marker selected from Table 1, Table 2, Table 7, Table 8, or Table 13, and wherein the treated DNA is processed from a biological sample obtained from the subject; (b) analyzing the amplified product to generate a methylation profile of the cg marker; (c) comparing the methylation profile to a reference model relating methylation profiles of cg markers from Tables 1, 2, 7, 8, and 13 to a set of cancers; (d) based on the comparison of step (c), determining: (i) whether the subject has cancer; and (ii) which cancer type the subject has; and (e) administering an effective amount of a therapeutic agent to the subject if the subject is determined to have cancer and the cancer type is determined.


In some instances, the method comprises (a) contacting treated DNA with the probe panel to generate amplified products, wherein each probe of the probe panel hybridizes under high stringency conditions to a target sequence of a cg marker selected from Table 1, Table 2, Table 7, or Table 8; (b) analyzing the amplified products to generate a methylation profile of the cg markers targeted by the probe panel; (c) comparing the methylation profile to the reference model relating methylation profiles of cg markers from Tables 1, 2, 7, and 8 to a set of cancers; (d) evaluating an output from the model to determine: (i) whether the subject has cancer; and (ii) which cancer type the subject has; and (e) administering an effective amount of a therapeutic agent to the subject if the subject is determined to have cancer and the cancer type is determined.


In some cases, the biological sample is treated with a deaminating agent to generate the treated DNA.
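The effect of deamination can be illustrated in silico. Sodium bisulfite is a commonly used deaminating agent (an assumption here, since this passage does not name one): unmethylated cytosine is converted to uracil, which is read as thymine after amplification, while 5-methylcytosine and 5-hydroxymethylcytosine are protected. The hypothetical function below applies that rule to one strand, given a set of methylated positions.

```python
def bisulfite_convert(seq: str, methylated_positions: set[int]) -> str:
    """Simulate deamination on one strand: unmethylated C -> T (via U); methylated C stays C."""
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq.upper())
    )


# The CpG cytosine at position 1 is methylated; the CpG cytosine at position 5 is not.
print(bisulfite_convert("ACGTCCGA", {1}))  # ACGTTTGA
```

Because only methylated cytosines survive as C, a probe designed against the converted sequence can discriminate the two states at each CpG it covers.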


In some cases, the at least one probe from the probe panel is a padlock probe.
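Padlock probes are linear oligonucleotides whose two ends hybridize to sequences flanking a target region; a polymerase fills the gap from the probe's 3′ end and a ligase then circularizes the probe. The sketch below derives the two arms from a target written 5′ to 3′. The arm length, the placeholder backbone, and all names are hypothetical, and for methylation assays the target would be the deaminated (converted) sequence rather than the native one.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def design_padlock(target: str, gap_start: int, gap_end: int,
                   arm_len: int = 20, backbone: str = "N" * 40) -> str:
    """Return a padlock probe (5'->3') capturing target[gap_start:gap_end].

    target is written 5'->3'. The probe's 5' (ligation) arm pairs with the
    flank upstream of the gap; its 3' (extension) arm pairs with the flank
    downstream of the gap, so extension from the 3' end copies the gap and
    stops next to the 5' arm, where ligation closes the circle.
    """
    if gap_start > gap_end or gap_start < arm_len or gap_end + arm_len > len(target):
        raise ValueError("target is too short for the requested arms")
    upstream = target[gap_start - arm_len:gap_start]
    downstream = target[gap_end:gap_end + arm_len]
    ligation_arm = reverse_complement(upstream)     # probe 5' end
    extension_arm = reverse_complement(downstream)  # probe 3' end
    return ligation_arm + backbone + extension_arm


print(design_padlock("ACGTTGCAGGCCATTA", 5, 11, arm_len=4, backbone="NNNN"))
# AACGNNNNAATG
```

Real designs would also balance arm melting temperatures and avoid CpG sites within the arms so that capture is independent of methylation state.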


In some instances, the at least one probe hybridizes under high stringency conditions to a target sequence of a cg marker selected from Table 1.


In some instances, the at least one probe hybridizes under high stringency conditions to a target sequence of a cg marker selected from Table 2.


In some instances, the at least one probe hybridizes under high stringency conditions to a target sequence of a cg marker selected from Table 4.


In some instances, the at least one probe hybridizes under high stringency conditions to a target sequence of a cg marker selected from Table 5.


In some instances, the at least one probe hybridizes under high stringency conditions to a target sequence of a cg marker selected from Table 7.


In some instances, the at least one probe hybridizes under high stringency conditions to a target sequence of a cg marker selected from Table 8.


In some instances, the at least one probe hybridizes under high stringency conditions to a target sequence of a cg marker selected from Table 13.


In some instances, the at least one probe hybridizes under high stringency conditions to a target sequence of a cg marker selected from cg19516279, cg06100368, cg25945732, cg19155007, cg17952661, cg04072843, cg01250961, cg08131100, cg03788131, cg17528648, cg07784526, cg18948743, cg23986470, cg00846300, cg01029638, cg08350814, cg05098590, cg18085998, cg06532037, cg15313226, cg16232979, cg26149167, cg01237565, cg16561543, cg13771313, cg13771313, cg08169020, cg08169020, cg21153697, cg07326648, cg14309384, cg20923716, cg09095222, cg22220310, cg21950459, cg13332729, cg10802543, cg20707333, cg13169641, cg25352342, cg09921682, cg02504622, cg17373759, cg06547203, cg06826710, cg00902147, cg17609887, cg15721142, cg08116711, cg00736681, cg18834029, cg06969479, cg24630516, cg16901821, cg20349803, cg23610994, cg19313373, cg16508600, cg24096323, cg24746106, cg12288267, cg10430690, cg24408776, cg05630192, cg12028674, cg24820270, cg12028674, cg26718707, cg10349880, cg09921682, cg25934700, cg14164596, cg24461337, cg23041410, cg07366553, cg26859666, cg06405341, cg08557188, cg00690392, cg03421440, cg07077277, cg00456086, or cg20702527. In some instances, the at least one probe hybridizes under high stringency conditions to a target sequence of a cg marker selected from cg19516279, cg06100368, cg20349803, cg23610994, cg19313373, cg16508600, or cg24096323. In some instances, the at least one probe hybridizes under high stringency conditions to a target sequence of a cg marker selected from cg25945732, cg19155007, cg17952661, cg25934700, cg14164596, cg24461337, cg23041410, cg07366553, cg26859666, or cg00456086. In some instances, the at least one probe hybridizes under high stringency conditions to a target sequence of a cg marker selected from cg04072843, cg01250961, cg24746106, cg12288267, or cg10430690. 
In some instances, the at least one probe hybridizes under high stringency conditions to a target sequence of a cg marker selected from cg08131100, cg03788131, cg17528648, cg07784526, cg18948743, cg23986470, cg00846300, cg25352342, cg09921682, cg02504622, cg17373759, cg12028674, cg24820270, cg12028674, cg26718707, cg10349880, or cg09921682. In some instances, the at least one probe hybridizes under high stringency conditions to a target sequence of a cg marker selected from cg01029638, cg08350814, cg05098590, cg18085998, cg06532037, cg15313226, cg16232979, cg26149167, cg06547203, cg06826710, cg00902147, cg17609887, or cg15721142. In some instances, the at least one probe hybridizes under high stringency conditions to a target sequence of a cg marker selected from cg01237565, cg16561543, or cg08116711. In some instances, the at least one probe hybridizes under high stringency conditions to a target sequence of a cg marker selected from cg13771313, cg13771313, or cg08169020. In some instances, the at least one probe hybridizes under high stringency conditions to a target sequence of a cg marker selected from cg08169020, cg21153697, cg07326648, cg14309384, cg20923716, cg22220310, cg21950459, cg13332729, cg10802543, cg20707333, or cg13169641. In some instances, the at least one probe hybridizes under high stringency conditions to a target sequence of a cg marker cg09095222. In some instances, the at least one probe hybridizes under high stringency conditions to a target sequence of a cg marker selected from cg00736681 or cg18834029. In some instances, the at least one probe hybridizes under high stringency conditions to a target sequence of a cg marker selected from cg06969479, cg24630516, or cg16901821. In some instances, the at least one probe hybridizes under high stringency conditions to a target sequence of a cg marker cg24408776. In some instances, the at least one probe hybridizes under high stringency conditions to a target sequence of a cg marker cg05630192. 
In some instances, the at least one probe hybridizes under high stringency conditions to a target sequence of a cg marker cg06405341. In some instances, the at least one probe hybridizes under high stringency conditions to a target sequence of a cg marker selected from cg08557188, cg00690392, cg03421440, or cg07077277. In some instances, the at least one probe hybridizes under high stringency conditions to a target sequence of a cg marker selected from cg10673833, cg10493436, cg10428836, cg27284288, cg16959747, cg17494199, cg23678254, cg24067911, or cg25459300.


In some instances, the at least one probe hybridizes under high stringency conditions to a target sequence of a gene selected from a gene panel consisting of BMPR1A, PSD, ARHGAP25, KLF3, PLAC8, ATXN1, Chromosome 6:170, Chromosome 6:3, ATAD2, and Chromosome 8:20.


In some instances, the at least one probe hybridizes under high stringency conditions to a target sequence of a gene selected from a gene panel consisting of MYO1G, ADAMTS4, BMPR1A, CD6, RBP5, Chr 13:10, LGAP5, ATXN1, and Chr 8:20.


In some instances, the reference model comprises methylation profiles of cg markers from Tables 1 and 2 generated from samples of known cancer types. In some cases, the reference model further comprises methylation profiles of cg markers from Tables 1 and 2 generated from normal samples. In some cases, the reference model comprises methylation profiles of cg markers from Tables 1 and 2 generated from tissue samples.


In some instances, the reference model comprises methylation profiles of cg markers from Tables 7 and 8 generated from samples of known cancer types. In some cases, the reference model further comprises methylation profiles of cg markers from Tables 7 and 8 generated from normal samples. In some cases, the reference model comprises methylation profiles of cg markers from Tables 7 and 8 generated from tissue samples.


In some instances, the reference model comprises methylation profiles of cg markers from Table 13 generated from samples of known cancer types. In some cases, the reference model further comprises methylation profiles of cg markers from Table 13 generated from normal samples. In some cases, the reference model comprises methylation profiles of cg markers from Table 13 generated from tissue samples.


In some cases, the reference model is developed using an algorithm selected from one or more of the following: a principal component analysis, a logistic regression analysis, a nearest neighbor analysis, a support vector machine, and a neural network model.
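Of the algorithms listed, nearest neighbor analysis is the simplest to sketch. The example below is not the reference model of this disclosure: it assigns a query methylation profile the majority label among its k closest reference profiles, where a label of "normal" answers whether the subject has cancer and any other label names the cancer type. The reference data, labels, k=3, and function names are hypothetical.

```python
import math
from collections import Counter


def euclidean(a: list[float], b: list[float]) -> float:
    if len(a) != len(b):
        raise ValueError("profiles must cover the same markers")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(profile, reference, k=3):
    """reference: list of (label, beta_vector); label is "normal" or a cancer type.

    Returns (has_cancer, cancer_type or None). Ties in the vote go to the tied
    label encountered first, i.e. the one with the nearest member.
    """
    ranked = sorted(reference, key=lambda item: euclidean(profile, item[1]))
    votes = Counter(label for label, _ in ranked[:k])
    label = votes.most_common(1)[0][0]
    return (label != "normal", None if label == "normal" else label)


# Hypothetical two-marker reference profiles (beta values), not data from this disclosure.
REFERENCE = [
    ("normal", [0.10, 0.10]), ("normal", [0.15, 0.05]), ("normal", [0.05, 0.12]),
    ("HCC", [0.80, 0.20]), ("HCC", [0.85, 0.25]), ("HCC", [0.75, 0.15]),
    ("LUNC", [0.20, 0.90]), ("LUNC", [0.25, 0.85]), ("LUNC", [0.15, 0.80]),
]
print(knn_predict([0.82, 0.18], REFERENCE))  # (True, 'HCC')
print(knn_predict([0.10, 0.08], REFERENCE))  # (False, None)
```

A single multi-class classifier keeps both determinations consistent; a two-stage design (cancer versus normal, then type) is an equally reasonable alternative.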


In some embodiments, the analyzing described above comprises quantitatively detecting the methylation status of the amplified product. In some cases, the detection comprises a real-time quantitative probe-based PCR or a digital probe-based PCR. In some cases, the detection comprises a real-time quantitative probe-based PCR. In other cases, the detection comprises a digital probe-based PCR, optionally a droplet digital PCR (ddPCR).
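For digital PCR, the fraction of positive partitions p is usually converted to mean copies per partition with a Poisson correction, λ = -ln(1 - p). The sketch below assumes a hypothetical two-probe assay in which methylated and unmethylated templates are counted across the same set of partitions, and reports the methylated fraction; partition counts and function names are illustrative only.

```python
import math


def copies_per_partition(positive: int, total: int) -> float:
    """Poisson-corrected mean template copies per partition: -ln(1 - p)."""
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total; all-positive runs are saturated")
    return -math.log(1.0 - positive / total)


def methylated_fraction(meth_positive: int, unmeth_positive: int, total: int) -> float:
    """Methylated copies / (methylated + unmethylated copies) from one dPCR run."""
    lam_meth = copies_per_partition(meth_positive, total)
    lam_unmeth = copies_per_partition(unmeth_positive, total)
    if lam_meth + lam_unmeth == 0:
        raise ValueError("no template detected")
    return lam_meth / (lam_meth + lam_unmeth)


# Hypothetical run: 20,000 partitions, 2,000 methylated-probe positive, 500 unmethylated-probe positive.
print(round(methylated_fraction(2000, 500, 20000), 3))  # 0.806
```

Without the Poisson correction the naive ratio 2000/2500 = 0.8 would slightly underestimate the methylated fraction, because partitions holding more than one methylated template are counted only once.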


In some embodiments, the treatment comprises a chemotherapeutic agent or an agent for a targeted therapy. Exemplary chemotherapeutic agents include, but are not limited to, cisplatin, doxorubicin, fluoropyrimidine, gemcitabine, irinotecan, mitoxantrone, oxaliplatin, thalidomide, or a combination thereof. In some cases, the chemotherapeutic agent comprises cisplatin, doxorubicin, fluoropyrimidine, gemcitabine, irinotecan, mitoxantrone, oxaliplatin, thalidomide, or a combination thereof. In some instances, the treatment comprises an agent for a targeted therapy. In additional instances, the treatment comprises surgery.


In some instances, the biological sample is a blood sample, a urine sample, a saliva sample, a sweat sample, or a tear sample. In some cases, the biological sample is a blood sample or a urine sample. In some cases, the biological sample is a tissue biopsy sample. In some cases, the biological sample is a cell-free DNA sample. In some cases, the biological sample comprises circulating tumor cells.


In some embodiments, also disclosed herein is a method of detecting the methylation status of a set of cg markers. In some embodiments, the method comprises (a) processing a biological sample obtained from a subject with a deaminating agent to generate treated DNA comprising deaminated nucleotides; (b) contacting the treated DNA with at least one probe that hybridizes under high stringency conditions to a target sequence of a cg marker from Table 1, Table 2, Table 7, Table 8, Table 13, Table 14, or Table 20; and (c) quantitatively detecting the methylation status of the cg marker, wherein said detection comprises a real-time quantitative probe-based PCR or a digital probe-based PCR.


In some embodiments, the method of detecting the methylation status of a set of cg markers comprises (a) processing a biological sample obtained from a subject with a deaminating agent to generate treated DNA comprising deaminated nucleotides; (b) contacting the treated DNA with at least one probe that hybridizes under high stringency conditions to a target sequence of a cg marker from Table 1 or Table 2; and (c) quantitatively detecting the methylation status of the cg marker, wherein said detection comprises a real-time quantitative probe-based PCR or a digital probe-based PCR.


In some cases, the detection comprises a real-time quantitative probe-based PCR or a digital probe-based PCR. In some cases, the detection comprises a real-time quantitative probe-based PCR. In other cases, the detection comprises a digital probe-based PCR, optionally a droplet digital PCR (ddPCR).
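For real-time quantitative PCR, one commonly used summary is the percentage of methylated reference (PMR), which normalizes the methylation-specific signal to a methylation-independent reference amplicon and to a fully methylated calibrator DNA. The sketch below assumes perfect amplification efficiency (a doubling per cycle) unless `efficiency` is changed; the Ct values and function names are hypothetical.

```python
def relative_quantity(ct_target: float, ct_reference: float, efficiency: float = 1.0) -> float:
    """Target amount relative to reference, assuming (1 + efficiency)-fold growth per cycle."""
    return (1.0 + efficiency) ** -(ct_target - ct_reference)


def pmr(ct_meth_sample: float, ct_ref_sample: float,
        ct_meth_calibrator: float, ct_ref_calibrator: float,
        efficiency: float = 1.0) -> float:
    """Percentage of methylated reference: (sample ratio / fully methylated calibrator ratio) x 100."""
    sample = relative_quantity(ct_meth_sample, ct_ref_sample, efficiency)
    calibrator = relative_quantity(ct_meth_calibrator, ct_ref_calibrator, efficiency)
    return 100.0 * sample / calibrator


print(pmr(28.0, 24.0, 26.0, 24.0))  # 25.0
```

Normalizing to the reference amplicon corrects for input DNA amount, while normalizing to the calibrator corrects for run-to-run differences in the methylation-specific assay.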


In some cases, the at least one probe is a padlock probe.


In some instances, the at least one probe hybridizes under high stringency conditions to a target sequence of a cg marker selected from Table 1.


In some instances, the at least one probe hybridizes under high stringency conditions to a target sequence of a cg marker selected from Table 2.


In some instances, the at least one probe hybridizes under high stringency conditions to a target sequence of a cg marker selected from Table 4.


In some instances, the at least one probe hybridizes under high stringency conditions to a target sequence of a cg marker selected from Table 5.


In some instances, the at least one probe hybridizes under high stringency conditions to a target sequence of a cg marker selected from Table 7.


In some instances, the at least one probe hybridizes under high stringency conditions to a target sequence of a cg marker selected from Table 8.


In some instances, the at least one probe hybridizes under high stringency conditions to a target sequence of a cg marker selected from Table 13.


In some instances, the at least one probe hybridizes under high stringency conditions to a target sequence of a cg marker selected from Table 14.


In some instances, the at least one probe hybridizes under high stringency conditions to a target sequence of a cg marker selected from Table 20.


In some instances, the at least one probe hybridizes under high stringency conditions to a target sequence of a cg marker selected from cg19516279, cg06100368, cg25945732, cg19155007, cg17952661, cg04072843, cg01250961, cg08131100, cg03788131, cg17528648, cg07784526, cg18948743, cg23986470, cg00846300, cg01029638, cg08350814, cg05098590, cg18085998, cg06532037, cg15313226, cg16232979, cg26149167, cg01237565, cg16561543, cg13771313, cg13771313, cg08169020, cg08169020, cg21153697, cg07326648, cg14309384, cg20923716, cg09095222, cg22220310, cg21950459, cg13332729, cg10802543, cg20707333, cg13169641, cg25352342, cg09921682, cg02504622, cg17373759, cg06547203, cg06826710, cg00902147, cg17609887, cg15721142, cg08116711, cg00736681, cg18834029, cg06969479, cg24630516, cg16901821, cg20349803, cg23610994, cg19313373, cg16508600, cg24096323, cg24746106, cg12288267, cg10430690, cg24408776, cg05630192, cg12028674, cg24820270, cg12028674, cg26718707, cg10349880, cg09921682, cg25934700, cg14164596, cg24461337, cg23041410, cg07366553, cg26859666, cg06405341, cg08557188, cg00690392, cg03421440, cg07077277, cg00456086, or cg20702527. In some instances, the at least one probe hybridizes under high stringency conditions to a target sequence of a cg marker selected from cg19516279, cg06100368, cg20349803, cg23610994, cg19313373, cg16508600, or cg24096323. In some instances, the at least one probe hybridizes under high stringency conditions to a target sequence of a cg marker selected from cg25945732, cg19155007, cg17952661, cg25934700, cg14164596, cg24461337, cg23041410, cg07366553, cg26859666, or cg00456086. In some instances, the at least one probe hybridizes under high stringency conditions to a target sequence of a cg marker selected from cg04072843, cg01250961, cg24746106, cg12288267, or cg10430690. 
In some instances, the at least one probe hybridizes under high stringency conditions to a target sequence of a cg marker selected from cg08131100, cg03788131, cg17528648, cg07784526, cg18948743, cg23986470, cg00846300, cg25352342, cg09921682, cg02504622, cg17373759, cg12028674, cg24820270, cg12028674, cg26718707, cg10349880, or cg09921682. In some instances, the at least one probe hybridizes under high stringency conditions to a target sequence of a cg marker selected from cg01029638, cg08350814, cg05098590, cg18085998, cg06532037, cg15313226, cg16232979, cg26149167, cg06547203, cg06826710, cg00902147, cg17609887, or cg15721142. In some instances, the at least one probe hybridizes under high stringency conditions to a target sequence of a cg marker selected from cg01237565, cg16561543, or cg08116711. In some instances, the at least one probe hybridizes under high stringency conditions to a target sequence of a cg marker selected from cg13771313, cg13771313, or cg08169020. In some instances, the at least one probe hybridizes under high stringency conditions to a target sequence of a cg marker selected from cg08169020, cg21153697, cg07326648, cg14309384, cg20923716, cg22220310, cg21950459, cg13332729, cg10802543, cg20707333, or cg13169641. In some instances, the at least one probe hybridizes under high stringency conditions to a target sequence of a cg marker cg09095222. In some instances, the at least one probe hybridizes under high stringency conditions to a target sequence of a cg marker selected from cg00736681 or cg18834029. In some instances, the at least one probe hybridizes under high stringency conditions to a target sequence of a cg marker selected from cg06969479, cg24630516, or cg16901821. In some instances, the at least one probe hybridizes under high stringency conditions to a target sequence of a cg marker cg24408776. In some instances, the at least one probe hybridizes under high stringency conditions to a target sequence of a cg marker cg05630192. 
In some instances, the at least one probe hybridizes under high stringency conditions to a target sequence of a cg marker cg06405341. In some instances, the at least one probe hybridizes under high stringency conditions to a target sequence of a cg marker selected from cg08557188, cg00690392, cg03421440, or cg07077277. In some instances, the at least one probe hybridizes under high stringency conditions to a target sequence of a cg marker selected from cg10673833, cg10493436, cg10428836, cg27284288, cg16959747, cg17494199, cg23678254, cg24067911, or cg25459300. In some instances, the at least one probe hybridizes under high stringency conditions to a target sequence of a cg marker selected from cg05205843, cg11841704, cg06699564, cg08924619, cg11959316, cg08924619, cg06699564, cg01824933, cg08924619, cg05205842, cg08924619, cg04049981, cg09026722, cg03616722, cg08924619, cg05928904, cg08704934, cg09776772, cg17494199, cg01824933, cg16296417, cg09776772, cg09776772, cg05338167, cg10493436, cg011251410, cg16391792, cg06393830, cg09366118, cg22513455, cg17583432, cg23881926, cg09638208, cg12441066, cg27284288, cg04441857, cg17583432, cg10673833, cg19757176, cg08670281, cg17583432, cg04460364, cg16959747, cg15011734, or cg25754195.


In some embodiments, the methylation status or pattern of a set of cg markers is further used to determine whether the subject has a cancer. For example, the methylation status or pattern of at least one cg marker selected from cg19516279, cg06100368, cg20349803, cg23610994, cg19313373, cg16508600, and cg24096323 is used to determine whether the subject has a breast cancer. In some cases, the subject is determined to have a breast cancer if at least one of the cg markers cg19516279 and cg06100368 is hypermethylated. In other cases, the subject is determined to have a breast cancer if at least one of the cg markers cg20349803, cg23610994, cg19313373, cg16508600, and cg24096323 is hypomethylated.


In some instances, the methylation status or pattern of at least one cg marker selected from cg25945732, cg19155007, cg17952661, cg25934700, cg14164596, cg24461337, cg23041410, cg07366553, cg00456086, and cg26859666 is used to determine whether the subject has a liver cancer. In some cases, the subject is determined to have a liver cancer if at least one of the cg markers cg25945732, cg19155007, or cg17952661 is hypermethylated. In other cases, the subject is determined to have a liver cancer if at least one of the cg markers cg25934700, cg14164596, cg24461337, cg23041410, cg07366553, cg26859666, or cg00456086 is hypomethylated.


In some instances, the methylation status or pattern of at least one marker selected from 3-49757316, 8-27183116, 8-141607252, 17-29297711, 3-49757306, 19-43979341, 8-141607236, 5-176829755, 18-13382140, 15-65341965, 3-13152305, 17-29297770, 8-27183316, 5-176829740, 19-41316693, 18-43830649, 15-65341957, 20-44539531, 7-30265625, 2-131129567, 5-176829665, 3-13152273, 8-27183348, 3-49757302, 19-41316697, 8-61821442, 20-44539525, 10-102883105, 11-65849129, 5-176829639, 15-91129457, 2-1625431, 6-151373292, 6-151373294, 20-25027093, 6-14284198, 10-4049295, 19-59023222, 1-184197132, 2-131004117, 2-8995417, 12-10782319, 20-25027033, 6-151373256, 8-86100970, 9-4839459, 17-41221574, 1-153926715, 20-25027044, 20-20177325, 2-1625443, 20-25027085, 11-69420728, 1-229234865, 6-13408877, 22-50643735, 6-151373308, 1-232119750, 8-134361508, or 6-13408858 is used to determine whether the subject has a liver cancer. In some instances, the subject is determined to have a liver cancer if at least one of the markers 3-49757316, 8-27183116, 8-141607252, 17-29297711, 3-49757306, 19-43979341, 8-141607236, 5-176829755, 18-13382140, 15-65341965, 3-13152305, 17-29297770, 8-27183316, 5-176829740, 19-41316693, 18-43830649, 15-65341957, 20-44539531, 7-30265625, 2-131129567, 5-176829665, 3-13152273, 8-27183348, 3-49757302, 19-41316697, 8-61821442, 20-44539525, 10-102883105, 11-65849129, or 5-176829639 is hypermethylated. In some cases, the subject is determined to have a liver cancer if at least one of the markers 15-91129457, 2-1625431, 6-151373292, 6-151373294, 20-25027093, 6-14284198, 10-4049295, 19-59023222, 1-184197132, 2-131004117, 2-8995417, 12-10782319, 20-25027033, 6-151373256, 8-86100970, 9-4839459, 17-41221574, 1-153926715, 20-25027044, 20-20177325, 2-1625443, 20-25027085, 11-69420728, 1-229234865, 6-13408877, 22-50643735, 6-151373308, 1-232119750, 8-134361508, or 6-13408858 is hypomethylated.
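The liver cancer markers above are written as two numbers joined by a hyphen rather than as cg identifiers. The sketch below reads each label as `<chromosome>-<position>`, which is an assumption (this section does not state the genome build or coordinate convention), and groups a few of the markers by the direction of change described above. `GenomicSite`, `parse_site`, and the subsets chosen are hypothetical and for illustration only.

```python
from typing import NamedTuple


class GenomicSite(NamedTuple):
    """A CpG site written as '<chromosome>-<position>' (assumed notation)."""
    chromosome: str
    position: int


def parse_site(label: str) -> GenomicSite:
    """Split a label such as '3-49757316' into chromosome and position."""
    chrom, pos = label.split("-", 1)
    return GenomicSite(chromosome=chrom, position=int(pos))


# A few of the liver cancer markers above, grouped by the direction of change
# the text associates with liver cancer (subset for illustration only).
HYPER = ["3-49757316", "8-27183116", "19-43979341"]
HYPO = ["15-91129457", "2-1625431", "6-151373292"]

direction = {parse_site(s): "hyper" for s in HYPER}
direction.update({parse_site(s): "hypo" for s in HYPO})

# Sort numerically by chromosome, then position; int() assumes autosomes only
# (labels such as 'X-...' would need a different sort key).
for site in sorted(direction, key=lambda s: (int(s.chromosome), s.position)):
    print(f"chr{site.chromosome}:{site.position}\t{direction[site]}")
```

Keeping chromosome and position as separate fields makes it straightforward to join these labels against a coordinate-based methylation table, once the coordinate convention has been confirmed.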


In some instances, the methylation status or pattern of at least one cg marker selected from cg04072843, cg01250961, cg24746106, cg12288267, and cg10430690 is used to determine whether the subject has an ovarian cancer. In some cases, the subject is determined to have an ovarian cancer if at least one of the cg markers cg04072843 and cg01250961 is hypermethylated. In other cases, the subject is determined to have an ovarian cancer if at least one of the cg markers cg24746106, cg12288267, and cg10430690 is hypomethylated.


In some instances, the methylation status or pattern of at least one cg marker selected from cg08131100, cg03788131, cg17528648, cg07784526, cg18948743, cg23986470, cg00846300, cg25352342, cg09921682, cg02504622, cg17373759, cg12028674, cg24820270, cg26718707, and cg10349880 is used to determine whether the subject has a colorectal cancer. In some cases, the subject is determined to have a colorectal cancer if at least one of the cg markers cg08131100, cg03788131, cg17528648, cg07784526, cg18948743, cg23986470, or cg00846300 is hypermethylated. In other cases, the subject is determined to have a colorectal cancer if at least one of the cg markers cg25352342, cg09921682, cg02504622, cg17373759, cg12028674, cg24820270, cg26718707, or cg10349880 is hypomethylated.


In some instances, the methylation status or pattern of at least one cg marker selected from cg10673833, cg10493436, cg10428836, cg27284288, cg16959747, cg17494199, cg23678254, cg24067911, or cg25459300 is used to determine whether the subject has a colorectal cancer.


In some instances, the methylation status or pattern of at least one cg marker selected from cg05205843, cg11841704, cg06699564, cg08924619, cg11959316, cg01824933, cg05205842, cg04049981, cg09026722, cg03616722, cg05928904, cg08704934, cg09776772, cg17494199, cg16296417, cg05338167, cg10493436, cg011251410, cg16391792, cg06393830, cg09366118, cg22513455, cg17583432, cg23881926, cg09638208, cg12441066, cg27284288, cg04441857, cg10673833, cg19757176, cg08670281, cg04460364, cg16959747, cg15011734, or cg25754195 is used to determine whether the subject has a colorectal cancer. In some cases, the subject is determined to have a colorectal cancer if at least one of the cg markers cg06393830, cg09366118, cg22513455, cg17583432, cg23881926, cg09638208, cg12441066, cg27284288, cg04441857, cg10673833, cg19757176, cg08670281, cg04460364, cg16959747, cg15011734, or cg25754195 is hypermethylated. In other cases, the subject is determined to have a colorectal cancer if at least one of the cg markers cg05205843, cg11841704, cg06699564, cg08924619, cg11959316, cg01824933, cg05205842, cg04049981, cg09026722, cg03616722, cg05928904, cg08704934, cg09776772, cg17494199, cg16296417, cg05338167, cg10493436, cg011251410, or cg16391792 is hypomethylated.


In some instances, the methylation status or pattern of at least one cg marker selected from cg01029638, cg08350814, cg05098590, cg18085998, cg06532037, cg15313226, cg16232979, cg26149167, cg06547203, cg06826710, cg00902147, cg17609887, and cg15721142 is used to determine whether the subject has a prostate cancer. In some cases, the subject is determined to have a prostate cancer if at least one of the cg markers cg01029638, cg08350814, cg05098590, cg18085998, cg06532037, cg15313226, cg16232979, or cg26149167 is hypermethylated. In other cases, the subject is determined to have a prostate cancer if at least one of the cg markers cg06547203, cg06826710, cg00902147, cg17609887, or cg15721142 is hypomethylated.


In some instances, the methylation status or pattern of at least one cg marker selected from cg01237565, cg16561543, and cg08116711 is used to determine whether the subject has a pancreatic cancer. In some cases, the subject is determined to have a pancreatic cancer if at least one of the cg markers cg01237565 or cg16561543 is hypermethylated. In other cases, the subject is determined to have a pancreatic cancer if cg marker cg08116711 is hypomethylated.


In some instances, the methylation status or pattern of at least one cg marker selected from cg13771313 and cg08169020 is used to determine whether the subject has acute myeloid leukemia.


In some instances, the methylation status or pattern of at least one cg marker selected from cg08169020, cg21153697, cg07326648, cg14309384, cg20923716, cg22220310, cg21950459, cg13332729, cg10802543, cg20707333, and cg13169641 is used to determine whether the subject has cervical cancer. In some instances, the subject is determined to have cervical cancer if at least one of the cg markers cg08169020, cg21153697, cg07326648, cg14309384, or cg20923716 is hypermethylated. In other instances, the subject is determined to have cervical cancer if at least one of the cg markers cg22220310, cg21950459, cg13332729, cg10802543, cg20707333, or cg13169641 is hypomethylated.


In some instances, the methylation status or pattern of the cg marker cg09095222 is used to determine whether the subject has sarcoma. In some cases, the subject is determined to have sarcoma if the cg marker cg09095222 is hypermethylated.


In some instances, the methylation status or pattern of at least one cg marker selected from cg00736681 and cg18834029 is used to determine whether the subject has stomach cancer. In some cases, the subject is determined to have stomach cancer if at least one of the cg markers cg00736681 or cg18834029 is hypomethylated.


In some instances, the methylation status or pattern of at least one cg marker selected from cg06969479, cg24630516, and cg16901821 is used to determine if the subject has thyroid cancer. In some cases, the subject is determined to have thyroid cancer if at least one of the cg markers cg06969479, cg24630516, or cg16901821 is hypomethylated.


In some instances, the methylation status or pattern of cg marker cg05630192 is used to determine whether the subject has mesothelioma. In some cases, the subject is determined to have mesothelioma if cg marker cg05630192 is hypomethylated.


In some instances, the methylation status or pattern of cg marker cg06405341 is used to determine whether the subject has glioblastoma.


In some instances, the methylation status or pattern of at least one cg marker selected from cg08557188, cg00690392, cg03421440, and cg07077277 is used to determine whether the subject has lung cancer. In some cases, the subject is determined to have lung cancer if at least one of the cg markers cg08557188, cg00690392, cg03421440, or cg07077277 is hypomethylated.
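The marker-direction rules above can be sketched as a simple lookup. This assumes methylation is measured as beta values (fraction methylated, 0 to 1) and that "hypermethylated" or "hypomethylated" means a shift of at least a hypothetical `delta` relative to a per-marker reference value. The text presents the hypermethylation and hypomethylation criteria as alternative embodiments; the sketch flags a cancer type if either is met. Only four of the panels are included, and `Panel`, `PANELS`, `flagged_types`, the reference values, and `delta=0.2` are illustrative, not validated cutoffs.

```python
from dataclasses import dataclass, field


@dataclass
class Panel:
    """Markers expected to gain (hyper) or lose (hypo) methylation in one cancer type."""
    hyper: list[str] = field(default_factory=list)
    hypo: list[str] = field(default_factory=list)


# Marker directions copied from the passages above (four cancer types only).
PANELS = {
    "breast": Panel(hyper=["cg19516279", "cg06100368"],
                    hypo=["cg20349803", "cg23610994", "cg19313373",
                          "cg16508600", "cg24096323"]),
    "ovarian": Panel(hyper=["cg04072843", "cg01250961"],
                     hypo=["cg24746106", "cg12288267", "cg10430690"]),
    "pancreatic": Panel(hyper=["cg01237565", "cg16561543"],
                        hypo=["cg08116711"]),
    "lung": Panel(hypo=["cg08557188", "cg00690392", "cg03421440", "cg07077277"]),
}


def flagged_types(sample: dict[str, float], reference: dict[str, float],
                  delta: float = 0.2) -> list[str]:
    """Return cancer types with at least one marker shifted in its expected direction.

    sample and reference map marker IDs to beta values (0 = unmethylated,
    1 = fully methylated). delta is a hypothetical minimum shift, not a
    validated cutoff. Markers missing from either mapping are skipped.
    """
    hits = []
    for cancer, panel in PANELS.items():
        shared_hyper = [m for m in panel.hyper if m in sample and m in reference]
        shared_hypo = [m for m in panel.hypo if m in sample and m in reference]
        up = any(sample[m] - reference[m] >= delta for m in shared_hyper)
        down = any(reference[m] - sample[m] >= delta for m in shared_hypo)
        if up or down:
            hits.append(cancer)
    return hits


reference = {"cg19516279": 0.10, "cg08116711": 0.80, "cg08557188": 0.70}
sample = {"cg19516279": 0.55, "cg08116711": 0.75, "cg08557188": 0.72}
print(flagged_types(sample, reference))  # ['breast']
```

A deployed classifier would more likely weigh all markers jointly against a reference model, as the summary describes; the any-marker rule here only mirrors the wording of the individual embodiments.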


In some embodiments, the methylation status or pattern of one or more genes selected from MYO1G, ADAMTS4, BMPR1A, CD6, RBP5, Chr 13:10, LGAP5, ATXN1, Chr 8:20, or a combination thereof is further used to determine whether the subject has a cancer. In some instances, the methylation status of one or more genes selected from MYO1G, ADAMTS4, BMPR1A, CD6, RBP5, Chr 13:10, LGAP5, ATXN1, Chr 8:20, or a combination thereof is further used to determine whether the subject has a colorectal cancer.


In some embodiments, also disclosed herein is a method of determining the prognosis of a cancer in a subject in need thereof based on the methylation status of a set of cg markers. In some embodiments, the method comprises (a) processing a biological sample obtained from the subject with a deaminating agent to generate treated DNA comprising deaminated nucleotides; (b) contacting the treated DNA with at least one probe that hybridizes under high stringency conditions to a target sequence of cg10673833 or cg25462303; and (c) quantitatively detecting the methylation status of the cg marker, wherein said detection comprises a real-time quantitative probe-based PCR or a digital probe-based PCR. In some instances, the cancer is a colorectal cancer, and the method of determining the prognosis of the colorectal cancer comprises steps (a) to (c) as described above. In some cases, the methylation status of cg10673833, cg25462303, or a combination thereof is used for monitoring treatment progression in a subject in need thereof. In additional cases, the methylation status of cg10673833, cg25462303, or a combination thereof is used as an early predictor of developing a cancer (e.g., colorectal cancer (CRC)) in a subject in need thereof.
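Step (c) allows a digital probe-based PCR readout. The sketch below applies the standard Poisson correction used in digital PCR to turn positive-partition counts into copy estimates. It assumes a hypothetical two-probe design in which one probe reports the methylated (bisulfite-retained C) allele and the other the unmethylated (C converted to T) allele of the same cg marker; the function names and partition counts are illustrative.

```python
import math


def copies_per_partition(positive: int, total: int) -> float:
    """Poisson-corrected mean target copies per partition: lambda = -ln(1 - p)."""
    if not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total partitions")
    return -math.log(1.0 - positive / total)


def methylated_fraction(meth_pos: int, unmeth_pos: int, total: int) -> float:
    """Share of target copies reported by the methylation-specific probe.

    meth_pos and unmeth_pos count partitions positive for the methylated- and
    unmethylated-allele probes; total is the number of accepted partitions.
    """
    lam_m = copies_per_partition(meth_pos, total)
    lam_u = copies_per_partition(unmeth_pos, total)
    if lam_m + lam_u == 0:
        raise ValueError("no target detected in either channel")
    return lam_m / (lam_m + lam_u)


# Hypothetical run: 20,000 partitions, 1,200 methylated-probe positives,
# 4,000 unmethylated-probe positives.
print(round(methylated_fraction(1200, 4000, 20000), 3))
```

The correction matters most when many partitions are positive, because a single positive partition can then hold more than one template copy.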


In some embodiments, additionally disclosed herein is a method of determining the prognosis of a cancer in a subject in need thereof, comprising (a) processing a biological sample obtained from the subject with a deaminating agent to generate treated DNA comprising deaminated nucleotides; (b) contacting the treated DNA with at least one probe that hybridizes under high stringency conditions to a target sequence of cg05205843, cg11841704, cg06699564, cg08924619, cg11959316, cg01824933, cg05205842, cg04049981, cg09026722, cg03616722, cg05928904, cg08704934, cg09776772, cg17494199, cg16296417, cg05338167, cg10493436, cg011251410, cg16391792, cg06393830, cg09366118, cg22513455, cg17583432, cg23881926, cg09638208, cg12441066, cg27284288, cg04441857, cg10673833, cg19757176, cg08670281, cg04460364, cg16959747, cg15011734, or cg25754195; and (c) quantitatively detecting the methylation status of the cg marker, wherein said detection comprises a real-time quantitative probe-based PCR or a digital probe-based PCR. In some instances, the cancer is a colorectal cancer.
In some cases, the method of determining the prognosis of a colorectal cancer in a subject in need thereof comprises steps (a) to (c) as described above.


In some instances, if one or more of the cg markers cg06393830, cg09366118, cg22513455, cg17583432, cg23881926, cg09638208, cg12441066, cg27284288, cg04441857, cg10673833, cg19757176, cg08670281, cg04460364, cg16959747, cg15011734, or cg25754195 are hypermethylated, the prognosis of the cancer correlates with advanced tumor stage and poor survival.


In some cases, the methylation status or pattern of cg05205843, cg11841704, cg06699564, cg08924619, cg11959316, cg01824933, cg05205842, cg04049981, cg09026722, cg03616722, cg05928904, cg08704934, cg09776772, cg17494199, cg16296417, cg05338167, cg10493436, cg011251410, cg16391792, cg06393830, cg09366118, cg22513455, cg17583432, cg23881926, cg09638208, cg12441066, cg27284288, cg04441857, cg10673833, cg19757176, cg08670281, cg04460364, cg16959747, cg15011734, cg25754195, or a combination thereof is used for monitoring treatment progression in a subject in need thereof. In additional cases, the methylation status of one or more of the foregoing cg markers is used as an early predictor of developing a cancer (e.g., CRC) in a subject in need thereof.


In some embodiments, the methylation status or pattern of one or more biomarkers selected from 3-49757316, 8-27183116, 8-141607252, 17-29297711, 3-49757306, 19-43979341, 8-141607236, 5-176829755, 18-13382140, 15-65341965, 3-13152305, 17-29297770, 8-27183316, 5-176829740, 19-41316693, 18-43830649, 15-65341957, 20-44539531, 7-30265625, 2-131129567, 5-176829665, 3-13152273, 8-27183348, 3-49757302, 19-41316697, 8-61821442, 20-44539525, 10-102883105, 11-65849129, 5-176829639, 15-91129457, 2-1625431, 6-151373292, 6-151373294, 20-25027093, 6-14284198, 10-4049295, 19-59023222, 1-184197132, 2-131004117, 2-8995417, 12-10782319, 20-25027033, 6-151373256, 8-86100970, 9-4839459, 17-41221574, 1-153926715, 20-25027044, 20-20177325, 2-1625443, 20-25027085, 11-69420728, 1-229234865, 6-13408877, 22-50643735, 6-151373308, 1-232119750, 8-134361508, or 6-13408858 is used for determining the prognosis of a subject having a liver cancer. In additional cases, the methylation status or pattern of one or more of the foregoing biomarkers is used for monitoring treatment progression in a subject having a liver cancer.


In some instances, the biological sample is a blood sample, a urine sample, a saliva sample, a sweat sample, or a tear sample. In some cases, the biological sample is a blood sample or a urine sample. In some cases, the biological sample is a tissue biopsy sample. In some cases, the biological sample is a cell-free DNA sample. In some cases, the biological sample comprises circulating tumor cells.


Detection Methods

In some embodiments, a number of methods are utilized to measure, detect, determine, identify, and characterize the methylation status or level of a gene or a biomarker (e.g., a CpG island-containing region or fragment) in identifying a subject as having liver cancer, determining the liver cancer subtype, determining the prognosis of a subject having liver cancer, and monitoring the progression or regression of liver cancer in a subject in the presence of a therapeutic agent.


In some instances, the methylation profile is generated from a biological sample isolated from an individual. In some embodiments, the biological sample is a biopsy. In some instances, the biological sample is a tissue sample. In some instances, the biological sample is a tissue biopsy sample. In some instances, the biological sample is a blood sample. In other instances, the biological sample is a cell-free biological sample. In other instances, the biological sample is a circulating tumor DNA sample. In one embodiment, the biological sample is a cell-free biological sample containing circulating tumor DNA.


In some embodiments, a biomarker (or an epigenetic marker) is obtained from a liquid sample. In some embodiments, the liquid sample comprises blood or another liquid sample of biological origin (including, but not limited to, peripheral blood, sera, plasma, ascites, urine, cerebrospinal fluid (CSF), sputum, saliva, bone marrow, synovial fluid, aqueous humor, amniotic fluid, cerumen, breast milk, bronchoalveolar lavage fluid, semen, prostatic fluid, Cowper's fluid or pre-ejaculatory fluid, female ejaculate, sweat, tears, cyst fluid, pleural and peritoneal fluid, pericardial fluid, lymph, chyme, chyle, bile, interstitial fluid, menses, pus, sebum, vomit, vaginal secretions/flushing, mucosal secretion, stool water, pancreatic juice, lavage fluids from sinus cavities, bronchopulmonary aspirates, blastocoel cavity fluid, or umbilical cord blood). In some embodiments, the biological fluid is blood, a blood derivative, or a blood fraction, e.g., serum or plasma. In a specific embodiment, a sample comprises a blood sample. In another embodiment, a serum sample is used. In another embodiment, a sample comprises urine. In some embodiments, the liquid sample also encompasses a sample that has been manipulated in any way after its procurement, such as by centrifugation, filtration, precipitation, dialysis, chromatography, treatment with reagents, washing, or enrichment for certain cell populations.


In some embodiments, a biomarker (or an epigenetic marker) is obtained from a tissue sample. In some instances, a tissue corresponds to any cell or cells. Different types of tissue may correspond to different types of cells (e.g., liver, lung, blood, connective tissue, and the like), and also to healthy cells versus tumor cells, to tumor cells at various stages of neoplasia, or to displaced malignant tumor cells. In some embodiments, a tissue sample further encompasses a clinical sample, and also includes cells in culture, cell supernatants, organs, and the like. Samples also comprise fresh-frozen and/or formalin-fixed, paraffin-embedded tissue blocks, such as blocks prepared from clinical or pathological biopsies for pathological analysis or for study by immunohistochemistry.


In some embodiments, a biomarker (or an epigenetic marker) is methylated or unmethylated in a normal sample (e.g., normal or control tissue without disease, or normal or control body fluid, stool, blood, serum, or amniotic fluid), particularly in healthy stool, blood, serum, amniotic fluid, or other body fluid. In other embodiments, a biomarker (or an epigenetic marker) is hypomethylated or hypermethylated in a sample from a patient having or at risk of a disease (e.g., one or more indications described herein); for example, at a decreased or increased (respectively) methylation frequency of at least about 50%, at least about 60%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, or about 100% in comparison to a normal sample. In one embodiment, a biomarker is also hypomethylated or hypermethylated in comparison to a previously obtained sample from the same patient having or at risk of a disease (e.g., one or more indications described herein), particularly to monitor disease progression.
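The comparison above can be sketched as a threshold on the change in methylation frequency between a test sample and a baseline (a normal sample, or an earlier sample from the same patient). The passage does not say whether "a methylation frequency of at least about 50%" is an absolute level or a change; this sketch assumes a relative change, and `methylation_call` and `min_change` are hypothetical names.

```python
def methylation_call(sample_freq: float, baseline_freq: float,
                     min_change: float = 0.5) -> str:
    """Classify one marker by its relative change in methylation frequency.

    Frequencies are fractions of methylated copies (0 to 1). baseline_freq can
    be a normal sample or an earlier sample from the same patient. Reading the
    "at least about 50%" figure as a relative change is an assumption.
    """
    if baseline_freq <= 0:
        raise ValueError("baseline frequency must be positive")
    change = (sample_freq - baseline_freq) / baseline_freq
    if change >= min_change:
        return "hypermethylated"
    if change <= -min_change:
        return "hypomethylated"
    return "no call"


print(methylation_call(0.30, 0.10))  # hypermethylated (+200%)
print(methylation_call(0.20, 0.60))  # hypomethylated (about -67%)
print(methylation_call(0.12, 0.10))  # no call (+20%)
```

Raising `min_change` toward 1.0 corresponds to the stricter thresholds listed in the text (e.g., at least about 90% or 95%).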


In some embodiments, a methylome comprises a set of epigenetic markers or biomarkers, such as a biomarker described above. In some instances, a methylome that corresponds to the methylome of a tumor of an organism (e.g., a human) is classified as a tumor methylome. In some cases, a tumor methylome is determined using tumor tissue or cell-free (or protein-free) tumor DNA in a biological sample. Other examples of methylomes of interest include the methylomes of organs that contribute DNA into a bodily fluid (e.g., methylomes of tissues such as brain, breast, lung, prostate, and kidney, or of plasma).


In some embodiments, a plasma methylome is the methylome determined from the plasma or serum of an animal (e.g., a human). In some instances, the plasma methylome is an example of a cell-free or protein-free methylome since plasma and serum include cell-free DNA. The plasma methylome is also an example of a mixed methylome since it is a mixture of tumor and other methylomes of interest. In some instances, the urine methylome is determined from the urine sample of a subject. In some cases, a cellular methylome corresponds to the methylome determined from cells (e.g., blood cells) of the patient. The methylome of the blood cells is called the blood cell methylome (or blood methylome).
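Because the plasma methylome is described as a mixture of tumor and other methylomes, a single marker's plasma methylation level can, under a simple two-component linear mixing assumption, be turned into a rough tumor-fraction estimate. The tumor and background (e.g., blood cell methylome) beta values are assumed inputs, and `tumor_fraction` is an illustrative name; this calculation is not stated in the text.

```python
def tumor_fraction(beta_plasma: float, beta_tumor: float,
                   beta_background: float) -> float:
    """Estimate the tumor-derived share of cell-free DNA at one CpG marker.

    Assumes a two-component linear mixture,
        beta_plasma = f * beta_tumor + (1 - f) * beta_background,
    and solves for f, clipping to [0, 1] because measurement noise can push
    the raw estimate outside that range.
    """
    if beta_tumor == beta_background:
        raise ValueError("marker does not separate tumor from background")
    f = (beta_plasma - beta_background) / (beta_tumor - beta_background)
    return min(1.0, max(0.0, f))


# Hypermethylated tumor marker: background 0.05, tumor 0.85, plasma 0.13.
print(round(tumor_fraction(0.13, 0.85, 0.05), 3))  # 0.1
# Hypomethylated tumor marker: background 0.90, tumor 0.20, plasma 0.83.
print(round(tumor_fraction(0.83, 0.20, 0.90), 3))  # 0.1
```

The same formula handles hypermethylated and hypomethylated tumor markers, since the sign of the numerator and denominator flip together.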


In some embodiments, DNA (e.g., genomic DNA such as extracted genomic DNA or treated genomic DNA) is isolated by any means standard in the art, including the use of commercially available kits. Briefly, where the DNA of interest is encapsulated by a cellular membrane, the biological sample is disrupted and lysed by enzymatic, chemical, or mechanical means. In some cases, the DNA solution is then cleared of proteins and other contaminants, e.g., by digestion with proteinase K. The DNA is then recovered from the solution by any of a variety of methods, including salting out, organic extraction, or binding of the DNA to a solid-phase support. In some instances, the choice of method is affected by several factors, including time, expense, and the required quantity of DNA.


Where the sample DNA is not enclosed in a membrane (e.g., circulating DNA from a cell-free sample such as blood or urine), methods standard in the art for the isolation and/or purification of DNA are optionally employed (see, for example, Bettegowda et al. Detection of Circulating Tumor DNA in Early- and Late-Stage Human Malignancies. Sci. Transl. Med. 6(224):224ra24, 2014). Such methods include the use of a protein-denaturing reagent, e.g., a chaotropic salt such as guanidine hydrochloride or urea; a detergent, e.g., sodium dodecyl sulphate (SDS); or cyanogen bromide. Alternative methods include, but are not limited to, ethanol or propanol precipitation and vacuum concentration, e.g., by means of a centrifuge. In some cases, the person skilled in the art also makes use of devices such as filter devices (e.g., for ultrafiltration), silica surfaces or membranes, magnetic particles, polystyrene particles, polystyrene surfaces, positively charged surfaces, positively charged membranes, charged membranes, charged surfaces, charge-switch membranes, and charge-switch surfaces.


In some instances, once the nucleic acids have been extracted, methylation analysis is carried out by any means known in the art. A variety of methylation analysis procedures are known in the art and may be used to practice the methods disclosed herein. These assays allow for determination of the methylation state of one or a plurality of CpG sites within a tissue sample. In addition, these methods may be used for absolute or relative quantification of methylated nucleic acids. Such methylation assays generally involve two major steps. The first step is a methylation-specific reaction or separation, such as (i) bisulfite treatment, (ii) methylation-specific binding, or (iii) methylation-specific restriction enzymes. The second major step involves (i) amplification and detection, or (ii) direct detection, by a variety of methods such as (a) PCR (sequence-specific amplification) such as TaqMan®, (b) DNA sequencing of untreated and bisulfite-treated DNA, (c) sequencing by ligation of dye-modified probes (including cyclic ligation and cleavage), (d) pyrosequencing, (e) single-molecule sequencing, (f) mass spectrometry, or (g) Southern blot analysis.
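Bisulfite treatment, the most common first step listed above, can be illustrated in silico. The sketch assumes complete conversion of a single top strand and takes the positions of methylated cytosines as input; `bisulfite_convert` and `cpg_positions` are hypothetical helper names, not part of any library.

```python
def bisulfite_convert(seq: str, methylated: set[int]) -> str:
    """Return the post-PCR sequence of one bisulfite-treated top strand.

    Unmethylated cytosines are deaminated to uracil and read as thymine after
    amplification; cytosines at the 0-based positions in `methylated`
    (5-methylcytosine) are left unchanged. Complete conversion is assumed.
    """
    out = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated:
            out.append("T")
        else:
            out.append(base)
    return "".join(out)


def cpg_positions(seq: str) -> list[int]:
    """0-based indices of the C in every CpG dinucleotide."""
    s = seq.upper()
    return [i for i in range(len(s) - 1) if s[i:i + 2] == "CG"]


ref = "ACGTCCGA"
sites = cpg_positions(ref)                # [1, 5]
print(bisulfite_convert(ref, set(sites)))  # ACGTTCGA (CpGs methylated)
print(bisulfite_convert(ref, set()))       # ATGTTTGA (fully unmethylated)
```

The two outputs differ only at CpG positions, which is the methylation-dependent sequence difference that the downstream detection methods read out.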


Additionally, restriction enzyme digestion of PCR products amplified from bisulfite-converted DNA may be used, e.g., the method described by Sadri and Hornsby (1996, Nucl. Acids Res. 24:5058-5059), or COBRA (Combined Bisulfite Restriction Analysis) (Xiong and Laird, 1997, Nucleic Acids Res. 25:2532-2534). COBRA analysis is a quantitative methylation assay useful for determining DNA methylation levels at specific gene loci in small amounts of genomic DNA. Briefly, restriction enzyme digestion is used to reveal methylation-dependent sequence differences in PCR products of sodium bisulfite-treated DNA. Methylation-dependent sequence differences are first introduced into the genomic DNA by standard bisulfite treatment according to the procedure described by Frommer et al. (Frommer et al, 1992, Proc. Nat. Acad. Sci. USA, 89, 1827-1831). PCR amplification of the bisulfite converted DNA is then performed using primers specific for the CpG sites of interest, followed by restriction endonuclease digestion, gel electrophoresis, and detection using specific, labeled hybridization probes. Methylation levels in the original DNA sample are represented by the relative amounts of digested and undigested PCR product in a linearly quantitative fashion across a wide spectrum of DNA methylation levels. In addition, this technique can be reliably applied to DNA obtained from micro-dissected paraffin-embedded tissue samples. Typical reagents (e.g., as might be found in a typical COBRA-based kit) for COBRA analysis may include, but are not limited to: PCR primers for specific gene (or methylation-altered DNA sequence or CpG island); restriction enzyme and appropriate buffer; gene-hybridization oligo; control hybridization oligo; kinase labeling kit for oligo probe; and radioactive nucleotides. 
Additionally, bisulfite conversion reagents may include: DNA denaturation buffer; sulfonation buffer; DNA recovery reagents or kits (e.g., precipitation, ultrafiltration, affinity column); desulfonation buffer; and DNA recovery components.
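COBRA reads methylation from whether a restriction site survives bisulfite conversion and from the relative amounts of digested and undigested product. The sketch assumes BstUI (recognition site CGCG) as the enzyme, a common choice for COBRA but not one named in the text, and takes already-converted amplicon sequences and band signals as inputs; `has_site` and `cobra_methylation_level` are hypothetical names.

```python
def has_site(converted_amplicon: str, site: str = "CGCG") -> bool:
    """True if the restriction site survives in a bisulfite-converted amplicon.

    CGCG (the BstUI site) is used as an example: after conversion it remains
    only where both of its CpG cytosines were methylated, because an
    unmethylated C reads as T (e.g. TGCG, CGTG, or TGTG).
    """
    return site in converted_amplicon.upper()


def cobra_methylation_level(digested: float, undigested: float) -> float:
    """Methylation level as the digested share of total PCR product signal."""
    total = digested + undigested
    if total <= 0:
        raise ValueError("no product signal")
    return digested / total


print(has_site("TTCGCGAT"))  # True: methylated allele keeps the site
print(has_site("TTTGTGAT"))  # False: unmethylated allele loses it
print(cobra_methylation_level(digested=300.0, undigested=700.0))  # 0.3
```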


In an embodiment, the methylation profile of selected CpG sites is determined using methylation-specific PCR (MSP). MSP allows for assessing the methylation status of virtually any group of CpG sites within a CpG island, independent of the use of methylation-sensitive restriction enzymes (Herman et al, 1996, Proc. Nat. Acad. Sci. USA, 93, 9821-9826; U.S. Pat. Nos. 5,786,146, 6,017,704, 6,200,756, 6,265,171 (Herman and Baylin); U.S. Pat. Pub. No. 2010/0144836 (Van Engeland et al); which are hereby incorporated by reference in their entirety). Briefly, DNA is modified by a deaminating agent such as sodium bisulfite to convert unmethylated, but not methylated, cytosines to uracil, and is subsequently amplified with primers specific for methylated versus unmethylated DNA. In some instances, typical reagents (e.g., as might be found in a typical MSP-based kit) for MSP analysis include, but are not limited to: methylated and unmethylated PCR primers for a specific gene (or methylation-altered DNA sequence or CpG island), optimized PCR buffers and deoxynucleotides, and specific probes. Quantitative multiplexed methylation-specific PCR (QM-PCR) may also be used, as described by Fackler et al. (Fackler et al, 2004, Cancer Res. 64(13) 4442-4452; Fackler et al, 2006, Clin. Cancer Res. 12(11 Pt 1) 3306-3310).
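The difference between the methylated- and unmethylated-specific MSP primers can be sketched from a genomic primer-binding region. This assumes complete bisulfite conversion of the top strand and ignores practical design constraints (primer length, melting temperature, placing CpGs near the 3' end); `msp_primer_variants` is a hypothetical helper.

```python
def msp_primer_variants(region: str) -> tuple[str, str]:
    """Return (methylated, unmethylated) forward-primer sequences for a region.

    Assumes complete bisulfite conversion of the top strand. Non-CpG cytosines
    become T in both versions; CpG cytosines stay C in the methylated-specific
    primer and become T in the unmethylated-specific primer. A C in the last
    position is treated as non-CpG because the following base is unknown.
    """
    s = region.upper()
    meth, unmeth = [], []
    for i, base in enumerate(s):
        if base == "C":
            in_cpg = i + 1 < len(s) and s[i + 1] == "G"
            meth.append("C" if in_cpg else "T")
            unmeth.append("T")
        else:
            meth.append(base)
            unmeth.append(base)
    return "".join(meth), "".join(unmeth)


m_primer, u_primer = msp_primer_variants("GCGTTCGGCA")
print(m_primer)  # GCGTTCGGTA
print(u_primer)  # GTGTTTGGTA
```

Converting non-CpG cytosines in both primers is what restricts amplification to bisulfite-treated DNA; only the CpG positions distinguish the two alleles.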


In an embodiment, the methylation profile of selected CpG sites is determined using MethyLight and/or HeavyMethyl methods. The MethyLight and HeavyMethyl assays are high-throughput quantitative methylation assays that utilize fluorescence-based real-time PCR (TaqMan®) technology and require no further manipulations after the PCR step (Eads, C. A. et al, 2000, Nucleic Acids Res. 28, e32; Cottrell et al, 2007, J. Urology 177, 1753; U.S. Pat. No. 6,331,393 (Laird et al), the contents of which are hereby incorporated by reference in their entirety). Briefly, the MethyLight process begins with a mixed sample of genomic DNA that is converted, in a sodium bisulfite reaction, to a mixed pool of methylation-dependent sequence differences according to standard procedures (the bisulfite process converts unmethylated cytosine residues to uracil). Fluorescence-based PCR is then performed either in an “unbiased” reaction (with primers that do not overlap known CpG methylation sites) or in a “biased” reaction (with PCR primers that overlap known CpG dinucleotides). In some cases, sequence discrimination occurs at the level of the amplification process, at the level of the fluorescence detection process, or both. In some cases, the MethyLight assay is used as a quantitative test for methylation patterns in the genomic DNA sample, wherein sequence discrimination occurs at the level of probe hybridization. In this quantitative version, the PCR reaction provides for unbiased amplification in the presence of a fluorescent probe that overlaps a particular putative methylation site. An unbiased control for the amount of input DNA is provided by a reaction in which neither the primers nor the probe overlie any CpG dinucleotides.
Alternatively, a qualitative test for genomic methylation is achieved by probing of the biased PCR pool with either control oligonucleotides that do not “cover” known methylation sites (a fluorescence-based version of the “MSP” technique), or with oligonucleotides covering potential methylation sites. Typical reagents (e.g., as might be found in a typical MethyLight-based kit) for MethyLight analysis may include, but are not limited to: PCR primers for specific gene (or methylation-altered DNA sequence or CpG island); TaqMan(R) probes; optimized PCR buffers and deoxynucleotides; and Taq polymerase.
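Quantitative MethyLight results are commonly expressed as a percentage of methylated reference (PMR): the methylated-target to input-control ratio in the sample, divided by the same ratio in a fully methylated reference DNA, times 100. The sketch below derives those ratios from threshold cycle (Ct) values under the assumption that target and control reactions amplify with the same known efficiency; in practice the quantities are often read from standard curves instead. The function names are hypothetical.

```python
def relative_quantity(ct_target: float, ct_control: float, efficiency: float = 1.0) -> float:
    """Target-to-control template ratio from Ct values.

    `efficiency` is the fractional gain per cycle (1.0 = perfect doubling) and is
    assumed to be identical for both reactions.
    """
    return (1.0 + efficiency) ** (-(ct_target - ct_control))


def pmr(sample_target_ct: float, sample_control_ct: float,
        ref_target_ct: float, ref_control_ct: float,
        efficiency: float = 1.0) -> float:
    """Percentage of methylated reference for one marker."""
    sample_ratio = relative_quantity(sample_target_ct, sample_control_ct, efficiency)
    reference_ratio = relative_quantity(ref_target_ct, ref_control_ct, efficiency)
    return 100.0 * sample_ratio / reference_ratio


if __name__ == "__main__":
    # Hypothetical Ct values: the sample lags the fully methylated reference by one cycle.
    print(pmr(30.0, 25.0, 28.0, 24.0))  # 50.0
```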


Quantitative MethyLight uses bisulfite to convert genomic DNA, and the methylated sites are amplified using PCR with methylation-independent primers. Detection probes specific for the methylated and unmethylated sites, labeled with two different fluorophores, provide simultaneous quantitative measurement of the methylation. The Heavy Methyl technique begins with bisulfite conversion of DNA. Next, specific blockers prevent the amplification of unmethylated DNA. Methylated genomic DNA does not bind the blockers, and its sequences are amplified. The amplified sequences are detected with a methylation-specific probe (Cottrell et al, 2004, Nuc. Acids Res. 32:e10, the contents of which are hereby incorporated by reference in their entirety).
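With separately labeled probes for the methylated and unmethylated sequences, the methylated fraction at the probed site can be estimated from the two signals. The minimal sketch below assumes both probes amplify with perfect doubling and equal detection efficiency, and treats a channel with no threshold crossing (`None`) as zero signal; these are simplifying assumptions, and the function name is hypothetical.

```python
from typing import Optional


def methylated_fraction(ct_methylated: Optional[float],
                        ct_unmethylated: Optional[float]) -> float:
    """Estimate M / (M + U) from the Ct values of the two probe channels."""
    def signal(ct: Optional[float]) -> float:
        # Relative starting quantity, assuming perfect doubling per cycle.
        return 0.0 if ct is None else 2.0 ** (-ct)

    m, u = signal(ct_methylated), signal(ct_unmethylated)
    if m + u == 0.0:
        raise ValueError("neither probe channel amplified")
    return m / (m + u)


if __name__ == "__main__":
    print(methylated_fraction(26.0, 26.0))  # 0.5
    print(methylated_fraction(24.0, None))  # 1.0
```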


The Ms-SNuPE technique is a quantitative method for assessing methylation differences at specific CpG sites based on bisulfite treatment of DNA, followed by single-nucleotide primer extension (Gonzalgo and Jones, 1997, Nucleic Acids Res. 25, 2529-2531). Briefly, genomic DNA is reacted with sodium bisulfite to convert unmethylated cytosine to uracil while leaving 5-methylcytosine unchanged. Amplification of the desired target sequence is then performed using PCR primers specific for bisulfite-converted DNA, and the resulting product is isolated and used as a template for methylation analysis at the CpG site(s) of interest. In some cases, small amounts of DNA are analyzed (e.g., micro-dissected pathology sections), and the method avoids utilization of restriction enzymes for determining the methylation status at CpG sites. Typical reagents (e.g., as might be found in a typical Ms-SNuPE-based kit) for Ms-SNuPE analysis include, but are not limited to: PCR primers for a specific gene (or methylation-altered DNA sequence or CpG island); optimized PCR buffers and deoxynucleotides; gel extraction kit; positive control primers; Ms-SNuPE primers for a specific gene; reaction buffer (for the Ms-SNuPE reaction); and radioactive nucleotides. Additionally, bisulfite conversion reagents may include: DNA denaturation buffer; sulfonation buffer; DNA recovery reagents or kit (e.g., precipitation, ultrafiltration, affinity column); desulfonation buffer; and DNA recovery components.
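Ms-SNuPE results are typically read as the proportion of C (methylated) versus T (unmethylated) incorporation at each interrogated CpG. A minimal sketch, assuming the two signals are already background-corrected and labeled with equal efficiency; the function names are hypothetical.

```python
def snupe_methylation(c_signal: float, t_signal: float) -> float:
    """Fraction methylated at one CpG: C incorporation / (C + T incorporation)."""
    total = c_signal + t_signal
    if total <= 0:
        raise ValueError("no primer-extension signal")
    return c_signal / total


def mean_methylation(signals: list[tuple[float, float]]) -> float:
    """Average methylation over several CpGs, given (C, T) signal pairs."""
    values = [snupe_methylation(c, t) for c, t in signals]
    return sum(values) / len(values)


if __name__ == "__main__":
    # Hypothetical counts for two CpG sites in one amplicon.
    print(snupe_methylation(300, 100))                 # 0.75
    print(mean_methylation([(300, 100), (100, 300)]))  # 0.5
```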


In another embodiment, the methylation status of selected CpG sites is determined using differential binding-based methylation detection methods. For identification of differentially methylated regions, one approach is to capture methylated DNA. This approach uses a protein in which the methyl-binding domain of MBD2 is fused to the Fc fragment of an antibody (MBD-FC) (Gebhard et al, 2006, Cancer Res. 66:6118-6128; and PCT Pub. No. WO 2006/056480 A2 (Relhi), the contents of which are hereby incorporated by reference in their entirety). This fusion protein has several advantages over conventional methylation-specific antibodies. The MBD-FC has a higher affinity to methylated DNA, and it binds double-stranded DNA. Most importantly, the two proteins differ in the way they bind DNA. Methylation-specific antibodies bind DNA stochastically, which means that only a binary answer can be obtained. The methyl-binding domain of MBD-FC, on the other hand, binds DNA molecules regardless of their methylation status. The strength of this protein-DNA interaction is defined by the level of DNA methylation. After binding genomic DNA, eluate solutions of increasing salt concentrations can be used to fractionate non-methylated and methylated DNA, allowing for a more controlled separation (Gebhard et al, 2006, Nucleic Acids Res. 34: e82). Consequently, this method, called Methyl-CpG immunoprecipitation (MCIP), not only enriches but also fractionates genomic DNA according to methylation level, which is particularly helpful when the unmethylated DNA fraction should be investigated as well.


In an alternative embodiment, a 5-methylcytidine antibody is used to bind and precipitate methylated DNA. Antibodies are available from Abcam (Cambridge, Mass.), Diagenode (Sparta, N.J.) or Eurogentec (c/o AnaSpec, Fremont, Calif.). Once the methylated fragments have been separated, they may be analyzed using microarray-based techniques such as methylated CpG-island recovery assay (MIRA) or methylated DNA immunoprecipitation (MeDIP) (Pelizzola et al, 2008, Genome Res. 18, 1652-1659; O'Geen et al, 2006, BioTechniques 41(5), 577-580; Weber et al, 2005, Nat. Genet. 37, 853-862; Horak and Snyder, 2002, Methods Enzymol, 350, 469-83; Lieb, 2003, Methods Mol Biol, 224, 99-109). Another technique is methyl-CpG binding domain column/segregation of partly melted molecules (MBD/SPM, Shiraishi et al, 1999, Proc. Natl. Acad. Sci. USA 96(6):2913-2918).


In some embodiments, methods for detecting methylation include randomly shearing or randomly fragmenting the genomic DNA, cutting the DNA with a methylation-dependent or methylation-sensitive restriction enzyme and subsequently selectively identifying and/or analyzing the cut or uncut DNA. Selective identification can include, for example, separating cut and uncut DNA (e.g., by size) and quantifying a sequence of interest that was cut or, alternatively, that was not cut. See, e.g., U.S. Pat. No. 7,186,512. Alternatively, the method can encompass amplifying intact DNA after restriction enzyme digestion, thereby only amplifying DNA that was not cleaved by the restriction enzyme in the area amplified. See, e.g., U.S. Pat. Nos. 7,910,296; 7,901,880; and 7,459,274. In some embodiments, amplification can be performed using primers that are gene specific.


For example, there are methyl-sensitive enzymes that preferentially or substantially cleave or digest at their DNA recognition sequence if it is non-methylated. Thus, an unmethylated DNA sample is cut into smaller fragments than a methylated DNA sample. Similarly, a hypermethylated DNA sample is not cleaved. In contrast, there are methyl-sensitive enzymes that cleave at their DNA recognition sequence only if it is methylated. Methyl-sensitive enzymes that digest unmethylated DNA suitable for use in methods of the technology include, but are not limited to, HpaII, HhaI, MaeII, BstUI and AciI. In some instances, the enzyme used is HpaII, which cuts only the unmethylated sequence CCGG. In other instances, the enzyme used is HhaI, which cuts only the unmethylated sequence GCGC. Both enzymes are available from New England BioLabs(R), Inc. Combinations of two or more methyl-sensitive enzymes that digest only unmethylated DNA are also used. Suitable enzymes that digest only methylated DNA include, but are not limited to, DpnI, which only cuts at fully methylated 5′-GATC sequences, and McrBC, an endonuclease that cuts DNA containing modified cytosines (5-methylcytosine, 5-hydroxymethylcytosine or N4-methylcytosine) and recognizes the site 5′...PumC(N40-3000)PumC...3′ (New England BioLabs, Inc., Beverly, Mass.). Cleavage methods and procedures for selected restriction enzymes for cutting DNA at specific sites are well known to the skilled artisan. For example, many suppliers of restriction enzymes, including New England BioLabs, Promega Biochems, Boehringer-Mannheim, and the like, provide information on conditions and types of DNA sequences cut by specific restriction enzymes. Sambrook et al. (Molecular Cloning: A Laboratory Manual, Cold Spring Harbor, N.Y. 1989) provide a general description of methods for using restriction enzymes and other enzymes.
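Whether a methylation-sensitive enzyme such as HpaII (CCGG) or HhaI (GCGC) is expected to cut a given sequence can be predicted by scanning for its recognition site and checking the methylation state of the CpG inside it. This is an illustrative sketch built on stated assumptions: it scans only the top strand, treats CpG methylation as symmetric on both strands, ignores hemi-methylation, and uses hypothetical function names.

```python
# Recognition sequences taken from the text above; each contains one CpG.
SITES = {"HpaII": "CCGG", "HhaI": "GCGC"}


def find_sites(seq: str, motif: str) -> list[int]:
    """0-based start positions of every occurrence of `motif` (overlaps allowed)."""
    s = seq.upper()
    return [i for i in range(len(s) - len(motif) + 1) if s.startswith(motif, i)]


def predicted_cuts(seq: str, enzyme: str, methylated: set[int]) -> list[int]:
    """Sites that a methylation-sensitive enzyme is predicted to cleave.

    A site is treated as blocked when the cytosine of its internal CpG
    (0-based position in `seq`) is listed in `methylated`.
    """
    motif = SITES[enzyme]
    offset = motif.find("CG")  # position of the CpG cytosine within the site
    return [i for i in find_sites(seq, motif) if i + offset not in methylated]


if __name__ == "__main__":
    seq = "AACCGGTTGCGCAA"  # hypothetical fragment with one HpaII and one HhaI site
    print(predicted_cuts(seq, "HpaII", set()))  # [2]
    print(predicted_cuts(seq, "HpaII", {3}))    # []
    print(predicted_cuts(seq, "HhaI", {3}))     # [8]
```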


In some instances, a methylation-dependent restriction enzyme is a restriction enzyme that cleaves or digests DNA at or in proximity to a methylated recognition sequence, but does not cleave DNA at or near the same sequence when the recognition sequence is not methylated. Methylation-dependent restriction enzymes include those that cut at a methylated recognition sequence (e.g., DpnI) and enzymes that cut at a sequence near but not at the recognition sequence (e.g., McrBC). For example, McrBC's recognition sequence is 5′ RmC (N40-3000) RmC 3′, where “R” is a purine, “mC” is a methylated cytosine, and “N40-3000” indicates the distance between the two RmC half-sites for which a restriction event has been observed. McrBC generally cuts close to one half-site or the other, but cleavage positions are typically distributed over several base pairs, approximately 30 base pairs from the methylated base. McrBC sometimes cuts 3′ of both half-sites, sometimes 5′ of both half-sites, and sometimes between the two sites. Exemplary methylation-dependent restriction enzymes include, e.g., McrBC, McrA, MrrA, BisI, GlaI and DpnI. One of skill in the art will appreciate that any methylation-dependent restriction enzyme, including homologs and orthologs of the restriction enzymes described herein, is also suitable for use with one or more methods described herein.
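The McrBC requirement for two RmC half-sites separated by roughly 40 to 3000 bases can be checked in silico. In this sketch the gap is counted as the number of bases strictly between the two RmC dinucleotides, only the top strand is scanned, and the function names are hypothetical; all three are assumptions made for illustration, and the sketch does not model where cleavage actually occurs.

```python
def rmc_half_sites(seq: str, methylated: set[int]) -> list[int]:
    """Positions of methylated cytosines preceded by a purine (R = A or G)."""
    s = seq.upper()
    return sorted(
        j for j in methylated
        if 0 < j < len(s) and s[j] == "C" and s[j - 1] in "AG"
    )


def mcrbc_site_pairs(seq: str, methylated: set[int],
                     min_gap: int = 40, max_gap: int = 3000) -> list[tuple[int, int]]:
    """Pairs of RmC half-sites whose spacing falls within [min_gap, max_gap]."""
    halves = rmc_half_sites(seq, methylated)
    pairs = []
    for idx, a in enumerate(halves):
        for b in halves[idx + 1:]:
            gap = b - a - 2  # bases strictly between the two R-mC dinucleotides
            if gap > max_gap:
                break  # half-sites are sorted, so later ones are even farther away
            if gap >= min_gap:
                pairs.append((a, b))
    return pairs


if __name__ == "__main__":
    seq = "AC" + "T" * 50 + "GC"  # hypothetical: two RmC half-sites 50 bases apart
    print(mcrbc_site_pairs(seq, {1, 53}))  # [(1, 53)]
```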


In some cases, a methylation-sensitive restriction enzyme is a restriction enzyme that cleaves DNA at or in proximity to an unmethylated recognition sequence but does not cleave at or in proximity to the same sequence when the recognition sequence is methylated. Exemplary methylation-sensitive restriction enzymes are described in, e.g., McClelland et al, 22(17) NUCLEIC ACIDS RES. 3640-59 (1994). Suitable methylation-sensitive restriction enzymes that do not cleave DNA at or near their recognition sequence when a cytosine within the recognition sequence is methylated at position C5 include, e.g., Aat II, Aci I, Acd I, Age I, Alu I, Asc I, Ase I, AsiS I, Bbe I, BsaA I, BsaH I, BsiE I, BsiW I, BsrF I, BssH II, BssK I, BstB I, BstN I, BstU I, Cla I, Eae I, Eag I, Fau I, Fse I, Hha I, HinP1 I, HinC II, Hpa II, Hpy99 I, HpyCH4 IV, Kas I, Mbo I, Mlu I, MspA1 I, Msp I, Nae I, Nar I, Not I, Pml I, Pst I, Pvu I, Rsr II, Sac II, Sap I, Sau3A I, Sfl I, Sfo I, SgrA I, Sma I, SnaB I, Tsc I, Xma I, and Zra I. Suitable methylation-sensitive restriction enzymes that do not cleave DNA at or near their recognition sequence when an adenosine within the recognition sequence is methylated at position N6 include, e.g., Mbo I. One of skill in the art will appreciate that any methylation-sensitive restriction enzyme, including homologs and orthologs of the restriction enzymes described herein, is also suitable for use with one or more of the methods described herein. One of skill in the art will further appreciate that a methylation-sensitive restriction enzyme that fails to cut in the presence of methylation of a cytosine at or near its recognition sequence may be insensitive to the presence of methylation of an adenosine at or near its recognition sequence.
Likewise, a methylation-sensitive restriction enzyme that fails to cut in the presence of methylation of an adenosine at or near its recognition sequence may be insensitive to the presence of methylation of a cytosine at or near its recognition sequence. For example, Sau3AI is sensitive to (i.e., fails to cut in) the presence of a methylated cytosine at or near its recognition sequence, but is insensitive to (i.e., cuts in) the presence of a methylated adenosine at or near its recognition sequence. One of skill in the art will also appreciate that some methylation-sensitive restriction enzymes are blocked by methylation of bases on one or both strands of DNA encompassing their recognition sequence, while other methylation-sensitive restriction enzymes are blocked only by methylation on both strands, but can cut if a recognition site is hemi-methylated.


In alternative embodiments, adaptors are optionally added to the ends of the randomly fragmented DNA, the DNA is then digested with a methylation-dependent or methylation-sensitive restriction enzyme, and intact DNA is subsequently amplified using primers that hybridize to the adaptor sequences. In this case, a second step is performed to determine the presence, absence or quantity of a particular gene in an amplified pool of DNA. In some embodiments, the DNA is amplified using real-time, quantitative PCR.


In other embodiments, the methods comprise quantifying the average methylation density in a target sequence within a population of genomic DNA. In some embodiments, the method comprises contacting genomic DNA with a methylation-dependent restriction enzyme or methylation-sensitive restriction enzyme under conditions that allow for at least some copies of potential restriction enzyme cleavage sites in the locus to remain uncleaved; quantifying intact copies of the locus; and comparing the quantity of amplified product to a control value representing the quantity of methylation of control DNA, thereby quantifying the average methylation density in the locus compared to the methylation density of the control DNA.


In some instances, the quantity of methylation of a locus of DNA is determined by providing a sample of genomic DNA comprising the locus, cleaving the DNA with a restriction enzyme that is either methylation-sensitive or methylation-dependent, and then quantifying the amount of intact DNA or quantifying the amount of cut DNA at the DNA locus of interest. The amount of intact or cut DNA will depend on the initial amount of genomic DNA containing the locus, the amount of methylation in the locus, and the number (i.e., the fraction) of nucleotides in the locus that are methylated in the genomic DNA. The amount of methylation in a DNA locus can be determined by comparing the quantity of intact DNA or cut DNA to a control value representing the quantity of intact DNA or cut DNA in a similarly-treated DNA sample. The control value can represent a known or predicted number of methylated nucleotides. Alternatively, the control value can represent the quantity of intact or cut DNA from the same locus in another (e.g., normal, non-diseased) cell or a second locus.


By using at least one methylation-sensitive or methylation-dependent restriction enzyme under conditions that allow for at least some copies of potential restriction enzyme cleavage sites in the locus to remain uncleaved and subsequently quantifying the remaining intact copies and comparing the quantity to a control, average methylation density of a locus can be determined. If the methylation-sensitive restriction enzyme is contacted to copies of a DNA locus under conditions that allow for at least some copies of potential restriction enzyme cleavage sites in the locus to remain uncleaved, then the remaining intact DNA will be directly proportional to the methylation density, and thus may be compared to a control to determine the relative methylation density of the locus in the sample. Similarly, if a methylation-dependent restriction enzyme is contacted to copies of a DNA locus under conditions that allow for at least some copies of potential restriction enzyme cleavage sites in the locus to remain uncleaved, then the remaining intact DNA will be inversely proportional to the methylation density, and thus may be compared to a control to determine the relative methylation density of the locus in the sample. Such assays are disclosed in, e.g., U.S. Pat. No. 7,910,296.
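The comparison described above can be sketched numerically. Here the fraction of intact copies is estimated by qPCR as the ratio of digested to mock-digested template, assuming equal amplification efficiency in both reactions; for a methylation-dependent enzyme the cut fraction (one minus the intact fraction) is used as the methylation index. That treatment of the inverse relationship, and all function names, are simplifying assumptions rather than the method of the cited patent.

```python
def intact_fraction(ct_digested: float, ct_mock: float, efficiency: float = 1.0) -> float:
    """Fraction of locus copies surviving digestion, capped at 1.0."""
    return min(1.0, (1.0 + efficiency) ** (-(ct_digested - ct_mock)))


def methylation_index(ct_digested: float, ct_mock: float, enzyme_type: str) -> float:
    """Quantity that increases with methylation density for either enzyme class."""
    f = intact_fraction(ct_digested, ct_mock)
    if enzyme_type == "sensitive":  # methylation protects the site: intact copies track methylation
        return f
    if enzyme_type == "dependent":  # methylation enables cutting: cut copies track methylation
        return 1.0 - f
    raise ValueError("enzyme_type must be 'sensitive' or 'dependent'")


def relative_density(sample: tuple[float, float], control: tuple[float, float],
                     enzyme_type: str) -> float:
    """Methylation density of the sample locus relative to control DNA.

    `sample` and `control` are (Ct of digested DNA, Ct of mock-digested DNA).
    """
    return methylation_index(*sample, enzyme_type) / methylation_index(*control, enzyme_type)


if __name__ == "__main__":
    # Hypothetical Cts: half of the sample copies survive digestion versus a quarter of control copies.
    print(relative_density((25.0, 24.0), (26.0, 24.0), "sensitive"))  # 2.0
```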


The methylated CpG island amplification (MCA) technique is a method that can be used to screen for altered methylation patterns in genomic DNA, and to isolate specific sequences associated with these changes (Toyota et al, 1999, Cancer Res. 59, 2307-2312; U.S. Pat. No. 7,700,324 (Issa et al), the contents of which are hereby incorporated by reference in their entirety). Briefly, restriction enzymes with different sensitivities to cytosine methylation in their recognition sites are used to digest genomic DNAs from primary tumors, cell lines, and normal tissues prior to arbitrarily primed PCR amplification. Fragments that show differential methylation are cloned and sequenced after resolving the PCR products on high-resolution polyacrylamide gels. The cloned fragments are then used as probes for Southern analysis to confirm differential methylation of these regions. Typical reagents (e.g., as might be found in a typical MCA-based kit) for MCA analysis may include, but are not limited to: PCR primers for arbitrary priming of genomic DNA; PCR buffers and nucleotides; restriction enzymes and appropriate buffers; gene-hybridization oligos or probes; and control hybridization oligos or probes.


Additional methylation detection methods include those methods described in, e.g., U.S. Pat. Nos. 7,553,627; 6,331,393; U.S. patent Ser. No. 12/476,981; U.S. Patent Publication No. 2005/0069879; Rein, et al, 26(10) NUCLEIC ACIDS RES. 2255-64 (1998); and Olek et al, 17(3) NAT. GENET. 275-6 (1997).


In another embodiment, the methylation status of selected CpG sites is determined using Methylation-Sensitive High Resolution Melting (HRM). Recently, Wojdacz et al. reported methylation-sensitive high resolution melting as a technique to assess methylation. (Wojdacz and Dobrovic, 2007, Nuc. Acids Res. 35(6) e41; Wojdacz et al. 2008, Nat. Prot. 3(12) 1903-1908; Balic et al, 2009 J. Mol. Diagn. 11 102-108; and US Pat. Pub. No. 2009/0155791 (Wojdacz et al), the contents of which are hereby incorporated by reference in their entirety). A variety of commercially available real time PCR machines have HRM systems including the Roche LightCycler480, Corbett Research RotorGene6000, and the Applied Biosystems 7500. HRM may also be combined with other amplification techniques such as pyrosequencing as described by Candiloro et al. (Candiloro et al, 2011, Epigenetics 6(4) 500-507).
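Methylation-sensitive HRM exploits the fact that, after bisulfite conversion, methylated templates retain more C:G pairs and therefore melt at a higher temperature than unmethylated templates. Samples are usually judged against standards of known methylation; the sketch below reduces that comparison to linear interpolation of a single melting temperature (Tm) between standards. Real analyses compare whole normalized melting curves, so this single-Tm version, the function name, and the example values are simplifying assumptions.

```python
def estimate_methylation_from_tm(sample_tm: float,
                                 standards: list[tuple[float, float]]) -> float:
    """Interpolate percent methylation from (percent_methylated, tm) standards.

    Values outside the range of the standards are clamped to the nearest standard.
    """
    if len(standards) < 2:
        raise ValueError("at least two standards are required")
    pts = sorted(standards, key=lambda p: p[1])  # order by melting temperature
    if sample_tm <= pts[0][1]:
        return pts[0][0]
    if sample_tm >= pts[-1][1]:
        return pts[-1][0]
    for (m0, t0), (m1, t1) in zip(pts, pts[1:]):
        if t0 <= sample_tm <= t1:
            return m0 + (m1 - m0) * (sample_tm - t0) / (t1 - t0)
    raise ValueError("sample Tm not bracketed by standards")  # not reached for valid input


if __name__ == "__main__":
    standards = [(0.0, 78.0), (50.0, 80.0), (100.0, 82.0)]  # hypothetical standards
    print(estimate_methylation_from_tm(79.0, standards))  # 25.0
```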


In another embodiment, the methylation status of selected CpG loci is determined using a primer extension assay, including an optimized PCR amplification reaction that produces amplified targets for analysis using mass spectrometry. The assay can also be done in multiplex. Mass spectrometry is a particularly effective method for the detection of polynucleotides associated with the differentially methylated regulatory elements. The presence of the polynucleotide sequence is verified by comparing the mass of the detected signal with the expected mass of the polynucleotide of interest. The relative signal strength, e.g., a mass peak on a spectrum, for a particular polynucleotide sequence indicates the relative population of a specific allele, thus enabling calculation of the allele ratio directly from the data. This method is described in detail in PCT Pub. No. WO 2005/012578A1 (Beaulieu et al), which is hereby incorporated by reference in its entirety. For methylation analysis, the assay can be adapted to detect bisulfite-introduced, methylation-dependent C-to-T sequence changes. These methods are particularly useful for performing multiplexed amplification reactions and multiplexed primer extension reactions (e.g., multiplexed homogeneous primer mass extension (hME) assays) in a single well to further increase the throughput and reduce the cost per reaction for primer extension reactions.


Other methods for DNA methylation analysis include restriction landmark genomic scanning (RLGS, Costello et al, 2002, Meth. Mol Biol, 200, 53-70) and methylation-sensitive representational difference analysis (MS-RDA, Ushijima and Yamashita, 2009, Methods Mol Biol 507, 117-130). Comprehensive high-throughput arrays for relative methylation (CHARM) techniques are described in WO 2009/021141 (Feinberg and Irizarry). Roche(R) NimbleGen(R) microarrays, including Chromatin Immunoprecipitation-on-chip (ChIP-chip) or methylated DNA immunoprecipitation-on-chip (MeDIP-chip), may also be used. These tools have been used for a variety of cancer applications, including melanoma, liver cancer and lung cancer (Koga et al, 2009, Genome Res., 19, 1462-1470; Acevedo et al, 2008, Cancer Res., 68, 2641-2651; Rauch et al, 2008, Proc. Nat. Acad. Sci. USA, 105, 252-257). Others have reported bisulfite conversion, padlock probe hybridization, circularization, amplification and next-generation or multiplexed sequencing for high-throughput detection of methylation (Deng et al, 2009, Nat. Biotechnol 27, 353-360; Ball et al, 2009, Nat. Biotechnol 27, 361-368; U.S. Pat. No. 7,611,869 (Fan)). As an alternative to bisulfite conversion, Bayeyt et al. have reported selective oxidants that oxidize 5-methylcytosine without reacting with thymidine, followed by PCR or pyrosequencing (WO 2009/049916 (Bayeyt et al)). The references for these techniques are hereby incorporated by reference in their entirety.


In some instances, quantitative amplification methods (e.g., quantitative PCR or quantitative linear amplification) are used to quantify the amount of intact DNA within a locus flanked by amplification primers following restriction digestion. Methods of quantitative amplification are disclosed in, e.g., U.S. Pat. Nos. 6,180,349; 6,033,854; and 5,972,602, as well as in, e.g., DeGraves, et al, 34(1) BIOTECHNIQUES 106-15 (2003); Deiman B, et al., 20(2) MOL. BIOTECHNOL. 163-79 (2002); and Gibson et al, 6 GENOME RESEARCH 995-1001 (1996).


Following reaction or separation of nucleic acid in a methylation-specific manner, the nucleic acid in some cases is subjected to sequence-based analysis. For example, once it is determined that one particular genomic sequence from a sample is hypermethylated or hypomethylated compared to its counterpart, the amount of this genomic sequence can be determined. Subsequently, this amount can be compared to a standard control value and used to determine the presence of liver cancer in the sample. In many instances, it is desirable to amplify a nucleic acid sequence using any of several nucleic acid amplification procedures that are well known in the art. Specifically, nucleic acid amplification is the chemical or enzymatic synthesis of nucleic acid copies that contain a sequence complementary to a nucleic acid sequence being amplified (template). The methods and kits may use any nucleic acid amplification or detection methods known to one skilled in the art, such as those described in U.S. Pat. No. 5,525,462 (Takarada et al); U.S. Pat. No. 6,114,117 (Hepp et al); U.S. Pat. No. 6,127,120 (Graham et al); U.S. Pat. No. 6,344,317 (Urnovitz); U.S. Pat. No. 6,448,001 (Oku); U.S. Pat. No. 6,528,632 (Catanzariti et al); and PCT Pub. No. WO 2005/111209 (Nakajima et al); all of which are incorporated herein by reference in their entirety.


In some embodiments, the nucleic acids are amplified by PCR using methodologies known to one skilled in the art. One skilled in the art will recognize, however, that amplification can be accomplished by any known method, such as ligase chain reaction (LCR), Qβ replicase amplification, rolling circle amplification, transcription amplification, self-sustained sequence replication, or nucleic acid sequence-based amplification (NASBA), each of which provides sufficient amplification. Branched-DNA technology is also optionally used to qualitatively demonstrate the presence of a sequence of the technology, which represents a particular methylation pattern, or to quantitatively determine the amount of this particular genomic sequence in a sample. Nolte reviews branched-DNA signal amplification for direct quantitation of nucleic acid sequences in clinical samples (Nolte, 1998, Adv. Clin. Chem. 33:201-235).


The PCR process is well known in the art, and its variants include, for example, reverse transcription PCR, ligation-mediated PCR, digital PCR (dPCR), and droplet digital PCR (ddPCR). For a review of PCR methods and protocols, see, e.g., Innis et al, eds., PCR Protocols: A Guide to Methods and Applications, Academic Press, Inc., San Diego, Calif. 1990; U.S. Pat. No. 4,683,202 (Mullis). PCR reagents and protocols are also available from commercial vendors, such as Roche Molecular Systems. In some instances, PCR is carried out as an automated process with a thermostable enzyme. In this process, the temperature of the reaction mixture is automatically cycled through a denaturing region, a primer annealing region, and an extension reaction region. Machines specifically adapted for this purpose are commercially available.
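For the digital PCR formats mentioned above, target concentration is usually derived from the fraction p of positive partitions with a Poisson correction, because a positive partition may hold more than one template copy: the mean number of copies per partition is lambda = -ln(1 - p). The sketch assumes equally sized partitions and a caller-supplied partition volume (no instrument-specific value is assumed), and the function names are hypothetical.

```python
import math


def copies_per_partition(positive: int, total: int) -> float:
    """Poisson-corrected mean template copies per partition, lambda = -ln(1 - p)."""
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total (all-positive runs are saturated)")
    return -math.log(1.0 - positive / total)


def copies_per_microliter(positive: int, total: int, partition_volume_nl: float) -> float:
    """Concentration in the reaction, given the volume of one partition in nanoliters."""
    return copies_per_partition(positive, total) / (partition_volume_nl * 1e-3)


def dpcr_methylated_fraction(positive_m: int, positive_u: int, total: int) -> float:
    """Methylated fraction from methylated and unmethylated assays read over the same partitions."""
    lam_m = copies_per_partition(positive_m, total)
    lam_u = copies_per_partition(positive_u, total)
    if lam_m + lam_u == 0.0:
        raise ValueError("no positive partitions in either assay")
    return lam_m / (lam_m + lam_u)


if __name__ == "__main__":
    # Hypothetical run: half of 10,000 partitions positive.
    print(round(copies_per_partition(5000, 10000), 4))  # 0.6931
```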


In some embodiments, amplified sequences are also measured using invasive cleavage reactions such as the Invader(R) technology (Zou et al, 2010, Association of Clinical Chemistry (AACC) poster presentation on Jul. 28, 2010, “Sensitive Quantification of Methylated Markers with a Novel Methylation Specific Technology”; and U.S. Pat. No. 7,011,944 (Prudent et al)).


Suitable next generation sequencing technologies are widely available. Examples include the 454 Life Sciences platform (Roche, Branford, Conn.) (Margulies et al. 2005 Nature, 437, 376-380); Illumina's Genome Analyzer, GoldenGate Methylation Assay, or Infinium Methylation Assays, i.e., Infinium HumanMethylation 27K BeadArray or VeraCode GoldenGate methylation array (Illumina, San Diego, Calif.; Bibikova et al, 2006, Genome Res. 16, 383-393; U.S. Pat. Nos. 6,306,597 and 7,598,035 (Macevicz); U.S. Pat. No. 7,232,656 (Balasubramanian et al.)); QX200™ Droplet Digital™ PCR System from Bio-Rad; DNA Sequencing by Ligation, SOLiD System (Applied Biosystems/Life Technologies; U.S. Pat. Nos. 6,797,470, 7,083,917, 7,166,434, 7,320,865, 7,332,285, 7,364,858, and 7,429,453 (Barany et al)); the Helicos True Single Molecule DNA sequencing technology (Harris et al, 2008 Science, 320, 106-109; U.S. Pat. Nos. 7,037,687 and 7,645,596 (Williams et al); U.S. Pat. No. 7,169,560 (Lapidus et al); U.S. Pat. No. 7,769,400 (Harris)); the single molecule, real-time (SMRT™) technology of Pacific Biosciences; nanopore sequencing (Soni and Meller, 2007, Clin. Chem. 53, 1996-2001); semiconductor sequencing (Ion Torrent; Personal Genome Machine); DNA nanoball sequencing; sequencing using technology from Dover Systems (Polonator); and technologies that do not require amplification or otherwise transform native DNA prior to sequencing (e.g., Pacific Biosciences and Helicos), such as nanopore-based strategies (e.g., Oxford Nanopore, Genia Technologies, and Nabsys). These systems allow the sequencing of many nucleic acid molecules isolated from a specimen at high orders of multiplexing in a parallel fashion. Each of these platforms allows sequencing of clonally expanded or non-amplified single molecules of nucleic acid fragments. Certain platforms involve, for example, (i) sequencing by ligation of dye-modified probes (including cyclic ligation and cleavage), (ii) pyrosequencing, and (iii) single-molecule sequencing.


Pyrosequencing is a nucleic acid sequencing method based on sequencing by synthesis, which relies on detection of a pyrophosphate released on nucleotide incorporation. Generally, sequencing by synthesis involves synthesizing, one nucleotide at a time, a DNA strand complementary to the strand whose sequence is being sought. Study nucleic acids may be immobilized to a solid support, hybridized with a sequencing primer, and incubated with DNA polymerase, ATP sulfurylase, luciferase, apyrase, adenosine 5′ phosphosulfate and luciferin. Nucleotide solutions are sequentially added and removed. Correct incorporation of a nucleotide releases a pyrophosphate, which ATP sulfurylase converts to ATP in the presence of adenosine 5′ phosphosulfate; the ATP fuels the luciferase reaction, which produces a chemiluminescent signal allowing sequence determination. Machines for pyrosequencing and methylation-specific reagents are available from Qiagen, Inc. (Valencia, Calif.). See also Tost and Gut, 2007, Nat. Prot. 2 2265-2275. An example of a system that can be used by a person of ordinary skill based on pyrosequencing generally involves the following steps: ligating an adaptor nucleic acid to a study nucleic acid and hybridizing the study nucleic acid to a bead; amplifying a nucleotide sequence in the study nucleic acid in an emulsion; sorting beads using a picoliter multiwell solid support; and sequencing amplified nucleotide sequences by pyrosequencing methodology (e.g., Nakano et al, 2003, J. Biotech. 102, 117-124). Such a system can be used to exponentially amplify amplification products generated by a process described herein, e.g., by ligating a heterologous nucleic acid to the first amplification product generated by a process described herein.
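Because each dispensed nucleotide is incorporated across an entire homopolymer run, the expected light signal for each dispensation is proportional to the number of identical bases added. The idealized sketch below predicts peak heights for a given dispensation order; it assumes complete, error-free extension with no background signal, and the function name is hypothetical.

```python
def simulate_pyrogram(synthesized: str, dispensation: str) -> list[tuple[str, int]]:
    """Expected relative signal for each nucleotide dispensation.

    `synthesized` is the sequence added to the sequencing primer, read 5'->3'.
    A dispensed nucleotide extends through the whole matching homopolymer run,
    so its signal equals the run length; a non-matching dispensation gives 0.
    """
    seq = synthesized.upper()
    pos = 0
    peaks = []
    for nt in dispensation.upper():
        run = 0
        while pos < len(seq) and seq[pos] == nt:
            run += 1
            pos += 1
        peaks.append((nt, run))
    return peaks


if __name__ == "__main__":
    print(simulate_pyrogram("AACGT", "ACGT"))  # [('A', 2), ('C', 1), ('G', 1), ('T', 1)]
```

In methylation analysis, a bisulfite-converted CpG appears as a C/T variable position, and the relative heights of the C and T peaks at that position are commonly used as the methylated fraction.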


CpG Methylation Data Analysis Methods

In certain embodiments, the methylation values measured for biomarkers of a biomarker panel are mathematically combined, and the combined value is correlated to the underlying diagnostic question. In some instances, methylated biomarker values are combined by any appropriate state-of-the-art mathematical method. Well-known mathematical methods for correlating a biomarker combination to a disease status include discriminant analysis (DA) (e.g., linear, quadratic, or regularized DA), Discriminant Functional Analysis (DFA), Kernel Methods (e.g., SVM), Multidimensional Scaling (MDS), Nonparametric Methods (e.g., k-Nearest-Neighbor Classifiers), PLS (Partial Least Squares), Tree-Based Methods (e.g., Logic Regression, CART, Random Forest Methods, Boosting/Bagging Methods), Generalized Linear Models (e.g., Logistic Regression), Principal Components based Methods (e.g., SIMCA), Generalized Additive Models, Fuzzy Logic based Methods, Neural Networks and Genetic Algorithms based Methods. The skilled artisan will have no problem in selecting an appropriate method to evaluate an epigenetic marker or biomarker combination described herein. In one embodiment, the method used in correlating the methylation status of an epigenetic marker or biomarker combination, e.g., to diagnose liver cancer or a liver cancer subtype, is selected from DA (e.g., Linear, Quadratic, or Regularized Discriminant Analysis), DFA, Kernel Methods (e.g., SVM), MDS, Nonparametric Methods (e.g., k-Nearest-Neighbor Classifiers), PLS (Partial Least Squares), Tree-Based Methods (e.g., Logic Regression, CART, Random Forest Methods, Boosting Methods), Generalized Linear Models (e.g., Logistic Regression), or Principal Components Analysis. Details relating to these statistical methods are found in the following references: Ruczinski et al., 12 J. OF COMPUTATIONAL AND GRAPHICAL STATISTICS 475-511 (2003); Friedman, J. H., 84 J. OF THE AMERICAN STATISTICAL ASSOCIATION 165-75 (1989); Hastie, Trevor, Tibshirani, Robert, Friedman, Jerome, The Elements of Statistical Learning, Springer Series in Statistics (2001); Breiman, L., Friedman, J. H., Olshen, R. A., Stone, C. J. Classification and regression trees, California: Wadsworth (1984); Breiman, L., 45 MACHINE LEARNING 5-32 (2001); Pepe, M. S., The Statistical Evaluation of Medical Tests for Classification and Prediction, Oxford Statistical Science Series, 28 (2003); and Duda, R. O., Hart, P. E., Stork, D. G., Pattern Classification, Wiley Interscience, 2nd Edition (2001).
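As one concrete instance of the nonparametric methods listed above, a k-nearest-neighbor classifier assigns a sample the majority class among the k reference samples with the most similar methylation profiles. The sketch assumes profiles are vectors of per-marker methylation fractions (beta values) and uses Euclidean distance; the function name, class labels, and values in the example are hypothetical, and a real model would be trained and validated on the reference data described herein.

```python
import math
from collections import Counter


def knn_predict(train_x: list[list[float]], train_y: list[str],
                x: list[float], k: int = 3) -> str:
    """Majority label among the k training profiles closest to `x`.

    Ties in the vote go to the tied label whose member is nearest to `x`.
    """
    ranked = sorted(zip((math.dist(xi, x) for xi in train_x), train_y))
    nearest = [label for _, label in ranked[:k]]
    votes = Counter(nearest)
    best = max(votes.values())
    return next(label for label in nearest if votes[label] == best)


if __name__ == "__main__":
    train_x = [[0.10, 0.20], [0.15, 0.10], [0.80, 0.90], [0.90, 0.85], [0.85, 0.80]]
    train_y = ["normal", "normal", "cancer", "cancer", "cancer"]  # hypothetical labels
    print(knn_predict(train_x, train_y, [0.82, 0.88]))  # cancer
```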


In one embodiment, the correlated results for each methylation panel are rated by their correlation to the disease or tumor type positive state, for example, by a p-value test, t-value test or F-test. Rated biomarkers (best first, e.g., lowest p-value or highest absolute t-value) are then selected and added to the methylation panel until a certain diagnostic value is reached. Such methods include identification of methylation panels, or more broadly, of genes that are differentially methylated among several classes using, for example, a random-variance t-test (Wright G. W. and Simon R, Bioinformatics 19:2448-2455, 2003). Other methods include the step of specifying a significance level to be used for determining the epigenetic markers that will be included in the biomarker panel. Epigenetic markers that are differentially methylated between the classes at a univariate parametric significance level less than the specified threshold are included in the panel. The specified significance level need not be small enough to exclude most false discoveries, because in some problems better prediction is achieved by being more liberal about the biomarker panels used as features. In some cases, however, panels are more biologically interpretable and clinically applicable if fewer markers are included. As part of cross-validation, biomarker selection is repeated for each training set created in the cross-validation process, in order to provide an unbiased estimate of prediction error. The methylation panel for use with new patient sample data is the one resulting from application of the marker selection and classifier to the “known” methylation information, or control methylation panel.
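Ranking markers by a per-marker two-sample statistic and keeping the best-ranked ones can be sketched as follows. For simplicity the sketch uses Welch's t-statistic in place of the random-variance t-test cited above (a substitution, not the same test), and the function names are hypothetical. Consistent with the text, any such selection should be rerun inside each cross-validation training set rather than once on all samples.

```python
import math
from statistics import mean, variance


def welch_t(a: list[float], b: list[float]) -> float:
    """Welch two-sample t-statistic (each group needs at least two values)."""
    diff = mean(a) - mean(b)
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def rank_markers(x: list[list[float]], y: list[str], positive: str) -> list[int]:
    """Marker indices ordered from largest to smallest |t| (positive class vs. rest)."""
    scores = []
    for j in range(len(x[0])):
        a = [row[j] for row, label in zip(x, y) if label == positive]
        b = [row[j] for row, label in zip(x, y) if label != positive]
        scores.append((abs(welch_t(a, b)), j))
    return [j for _, j in sorted(scores, reverse=True)]


def select_panel(x: list[list[float]], y: list[str], positive: str, n_markers: int) -> list[int]:
    """Keep the n_markers best-ranked markers."""
    return rank_markers(x, y, positive)[:n_markers]


if __name__ == "__main__":
    x = [[0.90, 0.50], [0.85, 0.40], [0.80, 0.60],
         [0.10, 0.45], [0.15, 0.55], [0.20, 0.50]]
    y = ["tumor"] * 3 + ["normal"] * 3  # hypothetical labels
    print(select_panel(x, y, "tumor", 1))  # [0]
```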


Models that use methylation profiles to predict the class of future samples can also be used. These models may be based on the Compound Covariate Predictor (Radmacher et al. Journal of Computational Biology 9:505-511, 2002), Diagonal Linear Discriminant Analysis (Dudoit et al. Journal of the American Statistical Association 97:77-87, 2002), Nearest Neighbor Classification (also Dudoit et al.), and Support Vector Machines with a linear kernel (Ramaswamy et al. PNAS USA 98:15149-54, 2001). The models incorporate markers that are differentially methylated at a given significance level (e.g., 0.01, 0.05, or 0.1) as assessed by the random-variance t-test (Wright G. W. and Simon R. Bioinformatics 19:2448-2455, 2003). The prediction error of each model can be estimated using cross-validation, preferably leave-one-out cross-validation (Simon et al. Journal of the National Cancer Institute 95:14-18, 2003). For each leave-one-out cross-validation training set, the entire model-building process is repeated, including the epigenetic marker selection process. In some instances, it is also evaluated whether the cross-validated error rate estimate for a model is significantly less than would be expected from random prediction. To do so, in some cases, the class labels are randomly permuted and the entire leave-one-out cross-validation process is then repeated. The significance level is the proportion of the random permutations that gives a cross-validated error rate no greater than the cross-validated error rate obtained with the real methylation data.
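The leave-one-out and permutation procedure above can be sketched as follows, assuming two classes labelled 0 and 1 and at least three samples per class. A simple nearest-centroid classifier and an ordinary Welch t-statistic stand in for the cited models and the random-variance t-test; the points illustrated are that marker selection is repeated inside every fold, and that the permutation significance level is the fraction of label permutations whose cross-validated error is no greater than the observed one.

```python
import random
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic (0.0 when both groups have zero variance)."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def select_markers(X, y, k):
    """Indices of the k markers with the largest |t| between classes 0 and 1."""
    def score(j):
        a = [x[j] for x, lab in zip(X, y) if lab == 0]
        b = [x[j] for x, lab in zip(X, y) if lab == 1]
        return abs(welch_t(a, b))
    return sorted(range(len(X[0])), key=score, reverse=True)[:k]


def nearest_centroid(X, y, x_new, markers):
    """Predict the class whose centroid (over the selected markers) is closest."""
    best, best_d = None, float("inf")
    for label in sorted(set(y)):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroid = [mean(r[j] for r in rows) for j in markers]
        d = sum((x_new[j] - c) ** 2 for j, c in zip(markers, centroid))
        if d < best_d:
            best, best_d = label, d
    return best


def loocv_error(X, y, k):
    """Leave-one-out error rate; marker selection is redone inside every fold."""
    errors = 0
    for i in range(len(X)):
        X_tr, y_tr = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        markers = select_markers(X_tr, y_tr, k)
        if nearest_centroid(X_tr, y_tr, X[i], markers) != y[i]:
            errors += 1
    return errors / len(X)


def permutation_p_value(X, y, k, n_perm=100, seed=0):
    """Observed LOOCV error and the fraction of label permutations doing no worse."""
    rng = random.Random(seed)
    observed = loocv_error(X, y, k)
    hits = 0
    for _ in range(n_perm):
        y_perm = y[:]
        rng.shuffle(y_perm)
        if loocv_error(X, y_perm, k) <= observed:
            hits += 1
    return observed, hits / n_perm


# Hypothetical toy data: marker 0 separates the classes, marker 1 is noise.
X = [[0.10, 0.50], [0.12, 0.40], [0.11, 0.60], [0.09, 0.50], [0.13, 0.45], [0.10, 0.55],
     [0.90, 0.50], [0.88, 0.40], [0.91, 0.60], [0.87, 0.50], [0.92, 0.45], [0.89, 0.55]]
y = [0] * 6 + [1] * 6
err, p = permutation_p_value(X, y, k=1, n_perm=20)
print(err, p)
```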


Another classification method is the greedy-pairs method described by Bo and Jonassen (Genome Biology 3(4):research0017.1-0017.11, 2002). The greedy-pairs approach starts with ranking all markers based on their individual t-scores on the training set. This method attempts to select pairs of markers that work well together to discriminate the classes.
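A simplified sketch of the greedy-pairs idea follows. Bo and Jonassen score a pair by projecting the two markers onto a discriminant direction; here, as an assumption made for brevity, a pair is scored by the |t| of the sum of the two markers, each oriented by the sign of its own t-statistic. The outer loop follows the description: take the best individually ranked marker still available, find the partner that works best with it, remove both, and repeat.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic (0.0 when both groups have zero variance)."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def t_of(values, y):
    """Welch t of class 1 versus class 0 for one list of per-sample values."""
    a = [v for v, lab in zip(values, y) if lab == 1]
    b = [v for v, lab in zip(values, y) if lab == 0]
    return welch_t(a, b)


def marker_t(X, y, j):
    return t_of([x[j] for x in X], y)


def pair_score(X, y, i, j):
    """Simplified pair score: |t| of the sign-oriented sum of markers i and j."""
    si = 1 if marker_t(X, y, i) >= 0 else -1
    sj = 1 if marker_t(X, y, j) >= 0 else -1
    return abs(t_of([si * x[i] + sj * x[j] for x in X], y))


def greedy_pairs(X, y, n_pairs):
    """Select up to n_pairs disjoint marker pairs, best individual marker first."""
    remaining = sorted(range(len(X[0])), key=lambda j: abs(marker_t(X, y, j)), reverse=True)
    pairs = []
    while len(pairs) < n_pairs and len(remaining) >= 2:
        best = remaining[0]  # best individually ranked marker still available
        partner = max(remaining[1:], key=lambda j: pair_score(X, y, best, j))
        pairs.append((best, partner))
        remaining = [j for j in remaining if j not in (best, partner)]
    return pairs


# Hypothetical toy training set: 6 samples x 4 markers.
X = [[0.10, 0.40, 0.30, 0.70], [0.20, 0.50, 0.35, 0.60], [0.15, 0.45, 0.32, 0.65],
     [0.80, 0.55, 0.36, 0.50], [0.90, 0.60, 0.31, 0.45], [0.85, 0.50, 0.33, 0.55]]
y = [0, 0, 0, 1, 1, 1]
pairs = greedy_pairs(X, y, n_pairs=2)
print(pairs)
```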


Furthermore, a binary tree classifier that uses methylation profiles is optionally used to predict the class of future samples. The first node of the tree incorporates a binary classifier that distinguishes two subsets of the total set of classes. The individual binary classifiers are based on Support Vector Machines incorporating markers that are differentially methylated at a given significance level (e.g., 0.01, 0.05, or 0.1) as assessed by the random-variance t-test (Wright G. W. and Simon R. Bioinformatics 19:2448-2455, 2003). Classifiers for all possible binary partitions are evaluated, and the partition selected is the one for which the cross-validated prediction error is minimal. The process is then repeated successively for the two subsets of classes determined by the previous binary split. The prediction error of the binary tree classifier can be estimated by cross-validating the entire tree-building process. This overall cross-validation includes re-selection of the optimal partitions at each node and re-selection of the markers used for each cross-validated training set, as described by Simon et al. (Journal of the National Cancer Institute 95:14-18, 2003). Several-fold cross-validation can also be used, in which a fraction of the samples is withheld, a binary tree is developed on the remaining samples, and class membership is then predicted for the withheld samples. This is repeated several times, each time withholding a different portion of the samples. The samples are randomly partitioned into fractional test sets (Simon R and Lam A. BRB-ArrayTools User Guide, version 3.2. Biometric Research Branch, National Cancer Institute).
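The partition search at each node of the binary tree can be sketched as below. The function `cv_error` is a hypothetical caller-supplied callback that would train and cross-validate a binary classifier (an SVM in the text) for a given split; only the enumeration of binary partitions and the recursive choice of the lowest-error split are shown.

```python
from itertools import combinations


def binary_partitions(classes):
    """Yield every split of a class set into two non-empty subsets, each split once."""
    classes = sorted(classes)
    first, rest = classes[0], classes[1:]
    # Keeping `first` on the left side avoids emitting mirror-image duplicates.
    for r in range(len(rest)):
        for combo in combinations(rest, r):
            left = (first,) + combo
            right = tuple(c for c in rest if c not in combo)
            yield left, right


def build_tree(classes, cv_error):
    """Recursively split classes, choosing the partition with minimum cv_error.

    Returns a nested tuple tree whose leaves are single class labels.
    """
    classes = tuple(sorted(classes))
    if len(classes) == 1:
        return classes[0]
    left, right = min(binary_partitions(classes), key=lambda split: cv_error(*split))
    return (build_tree(left, cv_error), build_tree(right, cv_error))


def toy_cv_error(left, right):
    # Hypothetical error table: pretend that separating class "A" from the rest is easiest.
    return 0.05 if left == ("A",) else 0.20


tree = build_tree(["A", "B", "C"], toy_cv_error)
print(tree)  # ('A', ('B', 'C'))
```

For n classes there are 2^(n-1) - 1 candidate splits at the root, which is why the text evaluates all of them only at each node of a modest class hierarchy.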


Thus, in one embodiment, the correlated results for each marker (step b) are rated by the strength of their correlation to the disease, preferably by a p-value test. A step can also be included in which the markers are selected (step d) in order of their rating.


In additional embodiments, factors such as the value, level, feature, characteristic, or property of a transcription rate, mRNA level, translation rate, protein level, biological activity, cellular characteristic or property, genotype, phenotype, etc. can additionally be used before, during, or after administering a therapy to a patient to enable further analysis of the patient's cancer status.


In some embodiments, the ability of a diagnostic test to correctly predict status is measured as the sensitivity of the assay, the specificity of the assay, or the area under a receiver operating characteristic (“ROC”) curve. In some instances, sensitivity is the percentage of true positives that are predicted by a test to be positive, while specificity is the percentage of true negatives that are predicted by a test to be negative. In some cases, an ROC curve plots the sensitivity of a test as a function of 1 - specificity. In general, the greater the area under the ROC curve, the more accurate or powerful the predictive value of the test. Other useful measures of the utility of a test include positive predictive value and negative predictive value. Positive predictive value is the percentage of people who test positive that are actually positive. Negative predictive value is the percentage of people who test negative that are actually negative.
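The measures defined above can be computed directly from predictions, as in the following sketch. It expresses them as fractions rather than percentages, and computes the ROC area as the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one (ties counted as one half), which equals the area under the empirical ROC curve.

```python
def confusion_counts(y_true, y_pred):
    """(TP, TN, FP, FN) for boolean-like labels where truthy means 'cancer'."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    return tp, tn, fp, fn


def diagnostic_metrics(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "sensitivity": tp / (tp + fn),  # true positives called positive
        "specificity": tn / (tn + fp),  # true negatives called negative
        "ppv": tp / (tp + fp),          # test positives that are truly positive
        "npv": tn / (tn + fn),          # test negatives that are truly negative
    }


def roc_auc(pos_scores, neg_scores):
    """Probability that a random positive outscores a random negative (ties = 1/2)."""
    wins = 0.0
    for sp in pos_scores:
        for sn in neg_scores:
            wins += 1.0 if sp > sn else (0.5 if sp == sn else 0.0)
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical calls for 3 cancer and 4 non-cancer samples.
metrics = diagnostic_metrics([1, 1, 1, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0, 1])
auc = roc_auc([0.9, 0.8, 0.4], [0.3, 0.2, 0.5])
print(metrics, auc)
```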


In some embodiments, one or more of the biomarkers disclosed herein show a statistical difference in different samples of at least p<0.05, p<10^-2, p<10^-3, p<10^-4, or p<10^-5. Diagnostic tests that use these biomarkers may show an area under the ROC curve of at least 0.6, at least about 0.7, at least about 0.8, or at least about 0.9. In some instances, the biomarkers are differentially methylated between subjects with and without liver cancer. In additional instances, the biomarkers for different subtypes of liver cancer are differentially methylated. In certain embodiments, the biomarkers are measured in a patient sample using the methods described herein, compared, for example, to predefined biomarker levels, and used to determine whether the patient has liver cancer, which liver cancer subtype the patient has, and/or the prognosis of a patient having liver cancer. In other embodiments, the correlation of a combination of biomarkers in a patient sample is compared, for example, to a predefined set of biomarkers. In some embodiments, the measurement(s) are then compared with a relevant diagnostic amount(s), cut-off(s), or multivariate model scores that distinguish between the presence and absence of liver cancer, between liver cancer subtypes, and between a “good” and a “poor” prognosis. As is well understood in the art, by adjusting the particular diagnostic cut-off(s) used in an assay, one can increase the sensitivity or the specificity of the diagnostic assay, depending on the preference of the diagnostician. In some embodiments, the particular diagnostic cut-off is determined, for example, by measuring the amount of biomarker hypermethylation or hypomethylation in a statistically significant number of samples from patients with and without liver cancer and from patients with different liver cancer subtypes, and drawing the cut-off to suit the desired levels of specificity and sensitivity.
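A minimal sketch of drawing a cut-off to meet a desired specificity follows, assuming a single score per sample in which higher values indicate cancer (as for a hypermethylated marker; for a hypomethylated marker the score could be negated). The target specificities and scores are illustrative values, not data from this disclosure.

```python
def cutoff_for_specificity(neg_scores, target_spec):
    """Smallest observed cut-off c such that calling 'cancer' when score > c
    keeps specificity (fraction of non-cancer scores <= c) at or above target_spec."""
    for c in sorted(set(neg_scores)):
        spec = sum(s <= c for s in neg_scores) / len(neg_scores)
        if spec >= target_spec:
            return c
    raise ValueError("target_spec must be at most 1")


def sensitivity_at(pos_scores, cutoff):
    """Fraction of cancer samples whose score exceeds the cut-off."""
    return sum(s > cutoff for s in pos_scores) / len(pos_scores)


neg = [0.1, 0.2, 0.3, 0.4, 0.5]   # hypothetical non-cancer methylation scores
pos = [0.35, 0.45, 0.6, 0.7]      # hypothetical cancer methylation scores
for target in (0.8, 1.0):
    c = cutoff_for_specificity(neg, target)
    print(target, c, sensitivity_at(pos, c))
```

Raising the target specificity from 0.8 to 1.0 moves the cut-off from 0.4 to 0.5 and lowers sensitivity from 0.75 to 0.5, which is the trade-off described above.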


Kits/Article of Manufacture

In some embodiments, provided herein are kits for detecting and/or characterizing the methylation profile of a biomarker described herein. In some instances, the kit comprises a plurality of primers or probes to detect or measure the methylation status/levels of one or more samples. Such kits comprise, in some instances, at least one polynucleotide that hybridizes to at least one of the methylation marker sequences described herein and at least one reagent for detection of gene methylation. Reagents for detection of methylation include, e.g., sodium bisulfite, polynucleotides designed to hybridize to a sequence that is the product of a marker sequence if the marker sequence is not methylated (e.g., containing at least one C-to-U conversion), and/or a methylation-sensitive or methylation-dependent restriction enzyme. In some cases, the kits provide solid supports in the form of an assay apparatus that is adapted for use in the assay. In some instances, the kits further comprise detectable labels, optionally linked to a polynucleotide, e.g., a probe, in the kit.
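To illustrate what a probe designed to hybridize to the product of an unmethylated marker sequence targets, the following sketch computes the expected sense-strand sequence after bisulfite treatment and PCR. It assumes complete conversion, treats only the cytosines listed as methylated as protected, and ignores strand and primer-design details.

```python
def cpg_positions(seq):
    """0-based positions of the C in every CpG dinucleotide."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def bisulfite_convert(seq, methylated_positions):
    """Expected sense-strand sequence after bisulfite treatment and amplification.

    Unmethylated cytosines are deaminated to uracil and read as T after PCR;
    cytosines at the 0-based positions in methylated_positions stay C.
    """
    seq = seq.upper()
    out = []
    for i, base in enumerate(seq):
        if base == "C" and i not in methylated_positions:
            out.append("T")
        else:
            out.append(base)
    return "".join(out)


seq = "ACGTCCGA"
print(cpg_positions(seq))                              # [1, 5]
print(bisulfite_convert(seq, set(cpg_positions(seq)))) # ACGTTCGA (CpGs methylated)
print(bisulfite_convert(seq, set()))                   # ATGTTTGA (fully unmethylated)
```

A probe matching the second product would report the methylated state, and one matching the third the unmethylated state.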


In some embodiments, the kits comprise one or more (e.g., 1, 2, 3, 4, or more) different polynucleotides (e.g., primers and/or probes) capable of specifically amplifying at least a portion of a DNA region of a biomarker described herein. Optionally, one or more detectably-labeled polynucleotides capable of hybridizing to the amplified portion are also included in the kit. In some embodiments, the kits comprise sufficient primers to amplify 2, 3, 4, 5, 6, 7, 8, 9, 10, or more different DNA regions or portions thereof, and optionally include detectably-labeled polynucleotides capable of hybridizing to each amplified DNA region or portion thereof. The kits can further comprise a methylation-dependent or methylation-sensitive restriction enzyme and/or sodium bisulfite.


In some embodiments, the kits comprise sodium bisulfite, primers and adapters (e.g., oligonucleotides that can be ligated or otherwise linked to genomic fragments) for whole genome amplification, and polynucleotides (e.g., detectably-labeled polynucleotides) to quantify the presence of the converted methylated and/or the converted unmethylated sequence of at least one cytosine from a DNA region of an epigenetic marker described herein.


In some embodiments, the kits comprise methylation-sensing restriction enzymes (e.g., a methylation-dependent restriction enzyme and/or a methylation-sensitive restriction enzyme), primers and adapters for whole genome amplification, and polynucleotides to quantify the number of copies of at least a portion of a DNA region of an epigenetic marker described herein.
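As an illustration of how a methylation-sensitive restriction enzyme distinguishes methylated from unmethylated DNA, the sketch below uses the HpaII/MspI isoschizomer pair, chosen here as a familiar example rather than taken from this disclosure. Both recognize CCGG; HpaII is blocked when the internal CpG cytosine is methylated, whereas MspI is not, so a region that remains intact (and amplifiable) after HpaII digestion suggests methylation. The model is simplified and ignores hemimethylation and methylation of the outer cytosine.

```python
def ccgg_sites(seq):
    """0-based start positions of every CCGG site (the HpaII/MspI recognition sequence)."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 3) if seq[i:i + 4] == "CCGG"]


def cut_sites(seq, methylated_cpg, enzyme):
    """CCGG sites expected to be cut, given 0-based positions of methylated CpG cytosines.

    HpaII is blocked when the internal CpG cytosine (site + 1) is methylated;
    MspI cuts regardless of methylation at that cytosine.
    """
    sites = ccgg_sites(seq)
    if enzyme == "MspI":
        return sites
    if enzyme == "HpaII":
        return [s for s in sites if s + 1 not in methylated_cpg]
    raise ValueError("enzyme must be 'HpaII' or 'MspI'")


seq = "ACCGGTTCCGGA"
methylated = {2}  # hypothetical: only the first site's internal CpG is methylated
print(ccgg_sites(seq), cut_sites(seq, methylated, "HpaII"), cut_sites(seq, methylated, "MspI"))
```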


In some embodiments, the kits comprise a methylation binding moiety and one or more polynucleotides to quantify the number of copies of at least a portion of a DNA region of a marker described herein. A methylation binding moiety refers to a molecule (e.g., a polypeptide) that specifically binds to methyl-cytosine. Examples include restriction enzymes or fragments thereof that lack DNA cutting activity but retain the ability to bind methylated DNA, antibodies that specifically bind to methylated DNA, etc.


In some embodiments, the kit includes a packaging material. As used herein, the term “packaging material” can refer to a physical structure housing the components of the kit. In some instances, the packaging material maintains sterility of the kit components, and is made of material commonly used for such purposes (e.g., paper, corrugated fiber, glass, plastic, foil, ampules, etc.). Other materials useful in the performance of the assays are included in the kits, including test tubes, transfer pipettes, and the like. In some cases, the kits also include written instructions for the use of one or more of these reagents in any of the assays described herein.


In some embodiments, kits also include a buffering agent, a preservative, or a protein/nucleic acid stabilizing agent. In some cases, kits also include other components of a reaction mixture as described herein. For example, kits include one or more aliquots of thermostable DNA polymerase as described herein, and/or one or more aliquots of dNTPs. In some cases, kits also include control samples of known amounts of template DNA molecules harboring the individual alleles of a locus. In some embodiments, the kit includes a negative control sample, e.g., a sample that does not contain DNA molecules harboring the individual alleles of a locus. In some embodiments, the kit includes a positive control sample, e.g., a sample containing known amounts of one or more of the individual alleles of a locus.


Certain Terminology

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as is commonly understood by one of skill in the art to which the claimed subject matter belongs. It is to be understood that the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of any subject matter claimed. In this application, the use of the singular includes the plural unless specifically stated otherwise. It must be noted that, as used in the specification and the appended claims, the singular forms “a,” “an” and “the” include plural referents unless the context clearly dictates otherwise. In this application, the use of “or” means “and/or” unless stated otherwise. Furthermore, use of the term “including” as well as other forms, such as “include”, “includes,” and “included,” is not limiting.


As used herein, ranges and amounts can be expressed as “about” a particular value or range. About also includes the exact amount. Hence “about 5 μL” means “about 5 μL” and also “5 μL.” Generally, the term “about” includes an amount that would be expected to be within experimental error.


The section headings used herein are for organizational purposes only and are not to be construed as limiting the subject matter described.


As used herein, the terms “individual(s)”, “subject(s)” and “patient(s)” mean any mammal. In some embodiments, the mammal is a human. In some embodiments, the mammal is a non-human. None of the terms require or are limited to situations characterized by the supervision (e.g. constant or intermittent) of a health care worker (e.g. a doctor, a registered nurse, a nurse practitioner, a physician's assistant, an orderly or a hospice worker).


A “site” corresponds to a single site, which in some cases is a single base position or a group of correlated base positions, e.g., a CpG site. A “locus” corresponds to a region that includes multiple sites. In some instances, a locus includes one site.


EXAMPLES

These examples are provided for illustrative purposes only and not to limit the scope of the claims provided herein.


Example 1

Table 1 lists the top 100 cg markers per cancer type, subdivided by tissue comparison category. Table 1 is included at the end of the Examples section.


Table 2 lists 20 exemplary cg markers per cancer type. In each list, the CpG column gives the corresponding cg identifier where one is listed and is blank otherwise.

BLCA, hypermethylated
Marker | CpG
17-75447478 | cg14517217
6-106958474 | cg09184502
17-75447504 | cg18104645
17-17686141 | cg19872299
17-17685407 | cg25927164
17-75447785 | cg11991516
22-21319661 | cg07960806
17-17685582 | cg15906409
17-48619308 | cg00698906
6-28911474 | cg05127899
2-20866242 | cg19113641
17-48619672 | cg27437806
6-106958645 | cg22231056
17-8055756 | cg23363911
2-20866231 | cg21039778
17-46671298 | cg02293936
6-106958303 | cg04067276
22-21319668 | cg11782550
20-34189411 | cg13544006
6-106958640 | cg25944720

BLCA, hypomethylated
Marker | CpG
20-50419274 | cg12288473
16-1576146 | cg27316811
7-1932321 | cg23618217
17-767334 | cg25236028
4-141188341 | cg06775361
8-123966105 | cg01886570
13-114187990 | cg13241681
2-25427108 | cg20422417
17-767301 | cg13476078
6-18368796 | cg23221431
6-106582669 | cg25581448
6-106582638 | cg17356964
12-90313611 | cg20490773
1-8384659 | cg10508347
5-180219707 | cg18114890
19-8670086 | cg11745755
3-12332780 | cg16827534
2-219256018 | cg23212579
17-25619946 | cg07564598
2-219256101 | cg16926316

BRCA, hypermethylated
Marker | CpG
6-33739558 | cg18005901
9-129445599 | cg14100888
6-33739406 | cg07979401
1-6508954 | cg25097074
1-6508592 | cg15274864
2-241034966 | cg18663033
9-129373499 | cg19337180
2-100175805 | cg17165836
11-65405492 | cg27068330
2-100175704 | cg18261069
9-129388668 | cg04693928
21-46352817 | cg17648210
1-155043322 |
11-128392102 | cg18565473
9-36037454 |
6-160769754 | cg07237939
11-65405760 | cg01685242
2-100175768 | cg24797187
19-11354132 |
9-36037340 |

BRCA, hypomethylated
Marker | CpG
8-128403369 | cg11118198
6-32135186 | cg11108115
19-45575023 | cg15732530
2-121036440 | cg20349803
2-121036760 | cg25950520
12-132286017 | cg13251533
1-204531767 | cg00827973
2-85622094 | cg27090078
9-108424905 | cg06119575
6-33288804 | cg15070677
2-121036705 | cg27505472
1-245925961 | cg23610994
8-126082720 |
14-81865007 | cg10334928
4-141564017 | cg03135535
1-168250628 | cg17095936
1-162694184 | cg19313373
9-97856079 | cg13763287
7-127984393 | cg08309135
15-39472102 | cg13182816

CHOL, hypermethylated
Marker | CpG
6-32547019 | cg07984380
1-198904195 | cg09517106
5-174150712 | cg26092471
12-108733275 | cg12142354
1-113498106 | cg12551029
12-102872937 | cg11612617
6-27782628 | cg05444312
3-120627406 | cg05293820
6-27840105 | cg05511483
2-158114424 |
2-136743460 | cg18110444
6-32522683 | cg25140213
6-27782484 | cg16347724
1-113497932 | cg16251079
3-157824209 | cg14961949
4-147443095 | cg20910303
1-113498174 | cg17020695
5-111115809 | cg11657018
1-113497999 | cg07423363
12-4381788 | cg15993083

CHOL, hypomethylated
Marker | CpG
13-111553101 | cg23052386
1-23418405 | cg15897613
6-36237037 | cg27319730
7-1946447 | cg25786640
6-105854768 | cg18879177
1-117489843 | cg23093116
13-101241430 |
3-24213704 | cg25087352
1-202559755 | cg11143857
20-36033352 | cg14637027
10-5542941 | cg26552733
6-34231501 | cg15495717
11-134118834 | cg22756211
6-16147669 | cg15404791
10-135097477 | cg27633010
10-76801843 | cg23077498
15-102261251 | cg14588686
14-65210900 | cg11849213
10-119101310 | cg17182036
11-126072352 | cg26351663

COAD, hypermethylated
Marker | CpG
2-29338432 | cg08808128
2-66803033 | cg02764245
15-28351906 | cg04803843
4-55991782 | cg07893544
1-76080727 |
5-38556435 | cg12587766
20-61051039 | cg14980983
7-149411652 | cg08131100
20-34894648 |
15-28352098 | cg03061682
20-61051032 | cg20265733
12-45444895 |
8-97506675 | cg04261408
1-115880865 | cg09008705
6-127440000 | cg19365062
9-707166 | cg14399863
9-707387 | cg13877502
4-55991860 | cg00989765
4-157997554 | cg11163620
1-57889035 |

COAD, hypomethylated
Marker | CpG
19-39306244 | cg12028674
3-150238257 | cg02504622
10-518370 | cg26718707
20-57616191 | cg20726575
4-187629336 | cg24820270
4-187629387 | cg25352342
4-187629328 | cg21399040
17-48845009 | cg25518968
7-150069026 | cg17677524
4-6945745 | cg16574807
5-179393917 | cg26722544
2-106959257 | cg24419520
17-80535428 | cg03454705
1-1095607 | cg02971920
2-8850020 | cg00240378
7-73408058 | cg16233515
11-2287734 | cg14907310
7-73408101 | cg18780627
2-106959384 | cg13656404
2-112808038 | cg25890678

ESCA, hypermethylated
Marker | CpG
19-57078765 |
19-56915732 |
19-56915650 |
19-56915658 |
19-56915655 |
19-56915653 |
7-37960873 |
19-57078780 |
19-12175935 |
19-57078783 |
19-53696642 |
19-12175631 |
15-48470516 |
19-40315143 | cg00498691
7-37960317 |
7-37960731 |
2-80531483 |
12-94543449 |
14-60097247 |
19-53696649 | cg01620580

ESCA, hypomethylated
Marker | CpG
1-1563500 | cg09159050
5-131725325 | cg23475112
X-2836027 |
7-157204364 | cg18362281
7-112431576 | cg04876978
1-19280186 | cg11697440
19-57105832 | cg19641839
7-63772811 |
9-139913436 |
1-156232301 | cg10199857
19-57082536 |
5-138032270 |
21-44195410 |
16-84446919 | cg00838040
11-47213196 | cg05584439
8-19026658 |
6-163730283 | cg10584587
1-228362509 | cg18277682
9-139907691 | cg14173587
1-43971496 | cg25211006

GBM, hypermethylated
Marker | CpG
6-43237330 | cg23514619
19-19336240 | cg25814383
17-55938871 | cg22451358
12-57504948 | cg01063813
19-1468943 | cg10565187
6-43237511 | cg20434586
4-81123613 | cg22054918
19-35531817 | cg24139898
14-70038925 | cg02380334
6-74233510 | cg09681335
17-55938748 | cg00023001
19-35531859 | cg21484586
19-7937107 | cg00438215
1-27894955 | cg12610832
11-65314161 |
19-2614086 | cg19250790
1-26690537 | cg12813724
16-744328 | cg05542681
16-4233246 | cg05006755
2-58273552 | cg18998670

GBM, hypomethylated
Marker | CpG
12-131452297 | cg23617848
1-10240017 | cg27508545
7-4747216 | cg01275887
2-30041338 | cg00073794
18-42325086 | cg07015525
4-146804010 | cg01578875
7-101849120 | cg06010390
1-10240025 | cg23661000
4-3372206 |
1-6390793 | cg00081799
6-24140899 | cg08908855
7-4747261 | cg07076109
16-50897271 | cg00900231
5-140865433 | cg11830096
10-405225 | cg20595750
17-76220929 | cg07366188
17-63535223 | cg06405341
22-42611379 | cg26799416
8-9537322 | cg10419849
18-42324965 | cg03722909

KICH, hypermethylated
Marker | CpG
1-184357556 | cg10061342
16-4103492 | cg10177032
15-70767570 | cg16973527
4-74570525 | cg23099587
5-43772975 |
X-78201182 |
7-28726216 | cg05083033
X-107020894 |
1-9224198 | cg22373770
15-96957617 |
16-474271 | cg09555736
10-34320548 | cg10772185
6-168134760 | cg01797450
16-30347333 |
10-17273187 | cg26063719
12-10022688 | cg12591315
6-166970727 | cg11387340
16-15766942 |
16-474430 | cg02102075
11-7728814 | cg13563074

KICH, hypomethylated
Marker | CpG
4-185362534 | cg02354388
16-87873837 | cg06665333
22-35694370 | cg27436324
8-6616240 | cg24485696
20-57471672 |
11-66494177 | cg22015277
5-173307471 | cg13384849
20-57471660 |
4-38006787 | cg23289854
17-81045501 | cg14521546
16-479994 | cg10357909
10-126314680 | cg04426802
7-98667581 | cg09221667
7-76871793 |
3-196693809 | cg21429725
19-45316509 | cg21978694
7-23723014 | cg20200811
19-45316493 | cg12249345
14-95585023 | cg23488578
13-33768013 |

KIRC, hypermethylated
Marker | CpG
13-113623844 | cg09678836
5-66299786 | cg02632185
5-66300503 | cg27116842
13-113623639 | cg00127894
13-113623659 | cg15179805
5-66300406 | cg17449851
5-66300433 | cg11831286
12-54427636 | cg16983211
12-52214119 | cg19495013
13-113623300 | cg16737670
12-54422350 | cg27441225
8-107669856 | cg17176732
5-66299875 | cg18118497
8-107669783 | cg17136799
13-113623233 | cg02223067
8-107669787 | cg26622232
17-18908235 | cg14807365
2-73151879 | cg00486564
12-54427700 | cg15700739
12-54422488 | cg22378817

KIRC, hypomethylated
Marker | CpG
13-96086098 | cg20278383
2-239169591 |
8-30209391 | cg14895559
16-81565917 | cg03212183
13-111880998 | cg21980364
4-87993109 | cg27000934
16-15127193 | cg06978461
1-217724849 | cg10325088
20-6025272 | cg21209310
7-101556684 | cg15600935
17-78700928 | cg06975080
11-64323816 | cg21039563
17-18855313 | cg00306284
3-149051159 | cg11011913
14-89628141 | cg02212698
15-39958882 |
17-40064155 | cg03950716
1-101094117 | cg00902464
7-101566857 | cg02307880
5-111017556 | cg01885839

LIHC, hypermethylated
Marker | CpG
2-264178 | cg20749741
1-145395753 | cg17952661
19-54369556 |
3-183543564 | cg19155007
2-264199 | cg00108164
19-50096610 | cg15032314
6-43044771 | cg21663580
1-145395846 | cg02395363
6-56819429 |
3-183543587 | cg09738156
19-54369576 | cg23305567
2-233792394 | cg08165971
2-264204 | cg25945732
19-6740952 | cg03705926
5-176831297 | cg14126493
11-75917844 | cg17583449
5-176831638 | cg05876864
1-160370208 | cg13428480
2-264164 | cg03468349
1-21616619 | cg17216478

LIHC, hypomethylated
Marker | CpG
22-50644755 | cg25934700
5-16794882 | cg12852139
15-99427016 | cg03310087
10-1334207 | cg04992974
16-87970367 | cg05640992
11-46741373 | cg26453360
7-1197113 | cg03923535
12-69199037 | cg10635494
8-142263802 | cg00971369
11-6462110 | cg14164596
5-17275783 | cg01860297
16-87970376 | cg03550864
7-20269648 | cg19086110
1-167882440 | cg24461337
2-44065725 | cg20926720
7-4012372 | cg04609841
11-102638470 | cg03367136
17-26694939 | cg18536123
10-93390347 | cg24288527
22-21128411 | cg09634469

LUAD, hypermethylated
Marker | CpG
6-152623304 | cg05917732
10-8092165 | cg15803869
6-152623479 | cg12023625
1-23668691 | cg00907427
10-119304102 |
1-23668792 | cg08718097
21-40984780 | cg08448665
10-119304104 | cg12894449
7-2271722 | cg03694580
21-40984777 | cg17900854
11-31832246 | cg26848086
17-70119120 | cg21126344
10-8096370 | cg15187550
21-40984886 | cg03701992
1-166134346 |
11-31832353 | cg05840031
10-8092264 | cg16710894
6-50680997 | cg17489695
10-8091753 | cg22783180
6-152623387 | cg02682457

LUAD, hypomethylated
Marker | CpG
7-2473529 | cg17907628
7-2473859 | cg18259487
5-141722174 | cg02084814
10-1120831 |
6-25726912 | cg07077277
16-3139060 | cg20364187
2-232113244 | cg27273946
14-74957673 |
6-25727291 |
14-91691190 | cg14304073
13-113445870 |
6-25727107 |
13-98869844 | cg16689193
14-91691129 | cg16996571
4-37962455 |
6-25726953 |
10-52156632 |
6-25726956 |
1-207975699 |
16-3641320 |

LUSC, hypermethylated
Marker | CpG
19-10406145 | cg23936587

LUSC, hypomethylated
Marker | CpG
2-132348705 | cg11467141
15-73889528 | cg06332339
2-239970152 | cg08781549
2-239970075 | cg26126749
1-183204836 | cg14420230
12-21547892 | cg14191244
12-63195796 | cg27185063
1-183204789 | cg22684969
5-14160853 |
15-73889561 | cg08435157
3-185704206 | cg09562655
12-11286360 | cg24741148
12-132643897 | cg16860971
12-93101181 | cg17344080
7-148481030 | cg25477176
22-30572326 | cg24211826
19-5789699 | cg22961623
7-101603335 | cg05855116
1-64992433 |
15-73524906 | cg08177743

OV, hypermethylated
Marker | CpG
19-58281450 | cg14038484
19-58281016 | cg18575209
19-58280832 | cg03584535
19-58281117 | cg22956410
19-58280927 | cg13636880
19-58280891 | cg07685728
19-58280801 | cg27046034
19-58281019 | cg20751795
19-58280994 | cg24298255
11-133582402 | cg11368617
12-130529951 | cg16236851
10-95327305 | cg05613447
10-95327292 | cg10905401
10-74870311 | cg01701415
17-36104218 |
8-54569668 | cg05483021
10-74870240 | cg05898953
1-205399945 | cg06849719
13-110899996 | cg15416440
1-205400197 | cg06968878

PAAD, hypermethylated
Marker | CpG
14-105715025 | cg01237565
16-1202468 | cg16427096
14-105714843 | cg16561543
14-105714589 | cg23034818
20-55204593 | cg12521353
10-11059577 | cg23858040
13-32605406 | cg22219254
13-32605611 | cg16941656
2-177053345 | cg19001226
4-141293985 |
16-86231948 | cg07989221
5-172756558 | cg16443866
5-172756550 | cg08711858
1-182584341 | cg00153856
14-105714833 | cg15474367
11-61276678 |
4-141294009 |
22-17849639 | cg26796679
9-126773879 | cg10755973
2-107503672 |

PRAD, hypermethylated
Marker | CpG
19-51416098 | cg17355294
11-58940866 | cg05415131
19-16187631 | cg16232979
6-146136749 | cg09094393
7-32981826 | cg08946731
10-74020976 | cg12799885
2-191045309 | cg08350814
7-120628874 | cg01029638
5-43193681 | cg10641714
19-16187889 | cg26149167
9-27529339 | cg21244846
15-35014270 | cg24922143
17-46830260 | cg13132370
12-89744701 | cg05769889
19-51416517 | cg20115266
6-146136563 | cg23095615
10-111767379 | cg05098590
1-116711077 | cg17933722
2-191045668 | cg15313226
1-33938224 | cg18085998

SKCM, hypermethylated
Marker | CpG
7-5468436 |
2-166650606 |
11-19798742 |
17-77815779 |
6-10419927 |
11-441986 |
12-112204829 |
14-61188431 |
12-48398374 | cg17054969
6-1311342 |
10-25241427 |
6-10420079 |
1-111747228 |
6-26332204 |
21-38630728 |
10-25241462 |
7-5469535 |
11-19798914 |
1-6265894 |
4-186732926 | cg04392082

UCEC, hypermethylated
Marker | CpG
17-42092403 | cg06511389
17-42092315 | cg27116819
17-42092187 | cg18801599
19-50836561 | cg01017355
17-42092370 | cg08372947
17-42092247 | cg24638647
17-42092450 | cg13847066
18-5238310 | cg20757519
17-42091909 | cg04879755
18-5238589 | cg10885961
18-5238219 | cg15258847
17-42092488 | cg25868286
17-42092472 | cg01635193
3-169482895 | cg15494117
6-26045663 | cg25438963
17-42092431 | cg12259256
19-50836910 | cg03251287
3-169482358 | cg27020690
3-10206731 | cg11029301
3-164914575 |

Normal blood, hypermethylated
Marker | CpG
3-188012918 | cg02524983
1-43814358 | cg18856478
22-51136325 | cg18982286
12-58129855 | cg16402452
19-15568360 | cg08846870
15-22798986 |
19-39086923 | cg05258935
4-159969703 | cg10393744
16-67687754 | cg09316954
11-57090226 | cg11180921
6-31510729 |
10-17391003 | cg05576619
1-3135712 |
2-30737741 |
1-221613658 | cg24686918
1-164581169 | cg20812370
10-63809073 | cg14789659
17-43318735 |
6-12718370 |
5-10551547 | cg24100671

Table 3 lists each cancer name and its abbreviation.

Abbreviation | Name
LAML | Acute Myeloid Leukemia
ACC | Adrenocortical carcinoma
BLCA | Bladder Urothelial Carcinoma
LGG | Brain Lower Grade Glioma
BRCA | Breast invasive carcinoma
CESC | Cervical squamous cell carcinoma and endocervical adenocarcinoma
CHOL | Cholangiocarcinoma
LCML | Chronic Myelogenous Leukemia
COAD | Colon adenocarcinoma
CNTL | Controls
ESCA | Esophageal carcinoma
FPPP | FFPE Pilot Phase II
GBM | Glioblastoma multiforme
HNSC | Head and Neck squamous cell carcinoma
KICH | Kidney Chromophobe
KIRC | Kidney renal clear cell carcinoma
KIRP | Kidney renal papillary cell carcinoma
LIHC | Liver hepatocellular carcinoma
LUAD | Lung adenocarcinoma
LUSC | Lung squamous cell carcinoma
DLBC | Lymphoid Neoplasm Diffuse Large B-cell Lymphoma
MESO | Mesothelioma
MISC | Miscellaneous
OV | Ovarian serous cystadenocarcinoma
PAAD | Pancreatic adenocarcinoma
PCPG | Pheochromocytoma and Paraganglioma
PRAD | Prostate adenocarcinoma
READ | Rectum adenocarcinoma
SARC | Sarcoma
SKCM | Skin Cutaneous Melanoma
STAD | Stomach adenocarcinoma
TGCT | Testicular Germ Cell Tumors
THYM | Thymoma
THCA | Thyroid carcinoma
UCS | Uterine Carcinosarcoma
UCEC | Uterine Corpus Endometrial Carcinoma
UVM | Uveal Melanoma

FIG. 1 illustrates the methylation status of marker 7-1577016. In some instances, marker 7-1577016 is hypomethylated. In some cases, marker 7-1577016 is used as a pan cancer marker.



FIG. 2 illustrates the methylation status of marker 11-67177103. In some cases, marker 11-67177103 is used as a pan cancer marker.



FIG. 3 illustrates the methylation status of marker 19-10445516 (cg17126555). In some cases, marker 19-10445516 (cg17126555) is used as a pan cancer marker.



FIG. 4 illustrates the methylation status of marker 12-122277360. In some cases, marker 12-122277360 is used as a liver cancer diagnostic marker.



FIG. 5 illustrates the methylation status of marker 6-72130742 (cg24772267). In some cases, marker 6-72130742 (cg24772267) is used as a colon cancer diagnostic marker.



FIG. 6 illustrates the methylation status of marker 3-15369681. In some cases, marker 3-15369681 is used as a liver cancer diagnostic marker.



FIG. 7 illustrates the methylation status of marker 3-131081177. In some cases, marker 3-131081177 is used as a breast cancer diagnostic marker.


Example 2—Generating a Reference Cg Marker Panel

Data Sources


DNA methylation data for the initial training set and the first testing set were obtained from The Cancer Genome Atlas (TCGA).


Generating a Reference Cg Marker Set


A cancer-type-specific signature was identified by comparing pairwise methylation differences between a particular cancer type and its corresponding normal tissue, between two different cancer types, and between two different normal tissues, across a total of 12 tissue groups (6 tumor groups and 6 normal tissue groups). Patient samples from TCGA representing 9 cancer types from 6 different tissues, with matched adjacent-normal tissue, were randomly divided into training and validation cohorts. In total, 12*11/2 = 66 unique pairwise comparisons were performed. For each comparison, the approximately 450,000 CpG markers of the Illumina 450K methylation microarray were compared between the two groups using the colttests() (column t-test) function in the R genefilter package. Markers with the lowest t-test p-values and the largest difference in mean methylation fraction in each comparison were ranked, and the top ten markers in each comparison were selected for further validation analysis.
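The bookkeeping in this step can be sketched as below. The pairwise count follows directly from the 12 groups (12*11/2 = 66). How the p-value and the mean-difference criteria are combined is not specified above, so the `min_delta` filter followed by a p-value sort is an assumption; in the described workflow the per-marker statistics would come from genefilter's colttests() in R, whereas this sketch takes them as precomputed input, and the group names are hypothetical.

```python
from itertools import combinations


def pairwise_comparisons(groups):
    """All unordered pairs of groups: n * (n - 1) / 2 comparisons."""
    return list(combinations(groups, 2))


def top_markers(stats, n=10, min_delta=0.0):
    """Pick the n best markers for one comparison.

    stats: iterable of (marker, p_value, delta_beta) tuples, where delta_beta is
    the difference in mean methylation fraction between the two groups.
    Markers with |delta_beta| < min_delta are dropped; the rest are sorted by
    ascending p-value, breaking ties by larger |delta_beta|.
    """
    kept = [s for s in stats if abs(s[2]) >= min_delta]
    kept.sort(key=lambda s: (s[1], -abs(s[2])))
    return [s[0] for s in kept[:n]]


groups = [f"tumor_{i}" for i in range(1, 7)] + [f"normal_{i}" for i in range(1, 7)]
comparisons = pairwise_comparisons(groups)
print(len(comparisons))  # 66

# Hypothetical per-marker statistics for one comparison.
stats = [("a", 1e-5, 0.3), ("b", 1e-8, 0.1), ("c", 1e-8, 0.4), ("d", 0.2, 0.5)]
print(top_markers(stats, n=2, min_delta=0.2))  # ['c', 'a']
```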


Calculate Weights for Top Ten Markers in Each Comparison


Principal component analysis was applied to the top ten markers in each comparison group using the prcomp() function in the R stats package, and the weights (loadings) of the first principal component of each group were extracted and matched with the ten corresponding markers in that group. There were 45 groupings of weights with markers. These markers were used to classify the samples with several algorithms, including Neural Networks, Logistic Regression, Nearest Neighbor (NN), and Support Vector Machines (SVM), all of which generated consistent results. Analyses using SVM were found to be the most robust and were therefore used in all subsequent analyses.
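For readers without R, the first-principal-component weights that prcomp() reports (with its default centring and no scaling) can be approximated with power iteration on the sample covariance matrix, as sketched below. This is an illustrative stand-in, not the code used in the study; principal component signs are arbitrary, so the sketch fixes the sign so that the largest-magnitude weight is positive, which may differ from prcomp()'s output.

```python
import random
from math import sqrt


def first_pc_weights(rows, n_iter=500, seed=0):
    """Approximate first-principal-component weights of column-centred data.

    rows: list of samples, each a list of beta values for the same markers.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centred = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Sample covariance matrix (p x p).
    cov = [[sum(c[a] * c[b] for c in centred) / (n - 1) for b in range(p)]
           for a in range(p)]
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(p)]  # positive random start vector
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = sqrt(sum(x * x for x in w))
        if norm == 0:
            raise ValueError("data have zero variance")
        v = [x / norm for x in w]
    # PC signs are arbitrary; make the largest-magnitude weight positive.
    k = max(range(p), key=lambda j: abs(v[j]))
    return v if v[k] > 0 else [-x for x in v]


weights = first_pc_weights([[1, 2], [2, 4], [3, 6], [4, 8]])
print(weights)  # approximately [0.447, 0.894], i.e. (1, 2) / sqrt(5)
```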


For each tumor type, samples were divided into two groups based on the resulting methylation signatures, and their survival was plotted using Kaplan-Meier curves. Subgroups based on tumor stage and on the presence of residual tumor following treatment were also analyzed. These methylation profiles predicted highly statistically significant differences in survival in all tumor types and in most subgroups examined.
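The survival comparison relies on the Kaplan-Meier product-limit estimator, sketched below for one group of samples. Inputs are follow-up times and event indicators (1 for an observed death, 0 for censoring); the grouping by methylation signature and any statistical test between the two curves (e.g., a log-rank test) are not shown.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one observed event.

    times: follow-up times; events: 1 if the event (death) was observed, 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Gather every sample that leaves the risk set at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk  # product-limit step
            curve.append((t, surv))
        at_risk -= removed  # censored samples leave the risk set after time t
    return curve


curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
print(curve)  # survival approximately 0.8, 0.6, 0.3 at times 1, 2, 3
```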


Generate Variables


Forty-five variables were generated for each sample in the data. Using the weight/marker combinations, each variable V was calculated with the following equation:






$$V = \sum_{i=1}^{10} \left( W_i \times M_i \right)$$

where W_i is the weight and M_i is the methylation beta value (between 0 and 1) of the corresponding marker i. A matrix was generated with dimensions (1) the number of samples by (2) 190 variables.
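The composite variable V can be computed directly from the first-PC weights and the beta values. In this sketch the data layout (a dict of per-sample beta values and a list of (weights, markers) pairs, one per comparison) and the function names are illustrative assumptions.

```python
def composite_variable(weights, betas):
    """V = sum over markers of W_i * M_i for one comparison group."""
    if len(weights) != len(betas):
        raise ValueError("weights and betas must have the same length")
    return sum(w * m for w, m in zip(weights, betas))


def feature_matrix(samples, groupings):
    """samples: {sample_id: {marker: beta}};
    groupings: list of (weights, markers) pairs, one per comparison.

    Returns {sample_id: [V for each grouping]}, i.e. one matrix row per sample.
    """
    return {
        sid: [composite_variable(w, [betas[m] for m in markers])
              for w, markers in groupings]
        for sid, betas in samples.items()
    }
```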


Classifying Samples


The above-mentioned matrix was used to classify the samples. Several classification algorithms were used, including Logistic Regression, Nearest Neighbor (NN), and Support Vector Machines (SVM); SVM was used in all subsequent analyses.


The Kernel-Based Machine Learning Lab (kernlab) library for R was used to generate the Support Vector Machines. The best results were obtained with the radial basis function (“RBF”) kernel. The Crammer-Singer algorithm gave slightly better results than the Weston-Watson algorithm. Four types of classification error were observed in the analysis:

    • 1. Incorrect tissue; e.g., colon tissue is identified as lung tissue.
    • 2. False negative; e.g., lung cancer is identified as normal lung.
    • 3. False positive; e.g., normal colon is identified as colon cancer.
    • 4. Correct tissue, incorrect cancer type; e.g., kidney renal clear cell carcinoma is identified as kidney renal papillary cell carcinoma.
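The four error categories can be assigned mechanically if each label records a tissue and, for tumors, a cancer type. The sketch below assumes such a two-field label (with None for normal tissue); the TCGA-style cancer codes in the usage check are illustrative only.

```python
from typing import NamedTuple, Optional


class Label(NamedTuple):
    tissue: str                 # e.g. "colon", "lung", "kidney"
    cancer_type: Optional[str]  # None for normal tissue


def classify_error(true: Label, pred: Label) -> Optional[str]:
    """Return None for a correct call, otherwise one of the four error types."""
    if true == pred:
        return None
    if true.tissue != pred.tissue:
        return "incorrect tissue"
    if true.cancer_type is not None and pred.cancer_type is None:
        return "false negative"
    if true.cancer_type is None and pred.cancer_type is not None:
        return "false positive"
    return "correct tissue, incorrect cancer type"
```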


Three methods were used to validate the results:

    • 1. The samples were divided into five equal parts; four of the parts were used for training and the fifth part was used to test the results.
    • 2. Leave-one-out: all samples except one were used for training, and the sample left out was used for testing. This was repeated for each sample until all samples had been tested.
    • 3. Two-stage replication study: the samples were divided into a training set and a test set at the beginning of the process. In the training set, the 10 markers with the highest t-test scores in each comparison were identified. These markers were used to generate principal components, and the resulting variables were used to build an SVM. The same markers were then applied to the test set, and principal components and SVM results were generated.
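Index splits for the first two schemes can be generated as below. The sketch assumes that, as in standard k-fold cross-validation, each of the five parts is used once as the test set. The text itself only specifies a single 4:1 split, and the function names and fixed seed are assumptions. For the two-stage scheme, marker selection and PCA have to be fitted on the training indices only, so that no information from the test set leaks into the model.

```python
import random


def k_fold_splits(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) for k roughly equal folds.

    Every sample appears in exactly one test fold.
    """
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(j for m, fold in enumerate(folds) if m != i for j in fold)
        yield train, test


def leave_one_out(n_samples):
    """Leave-one-out is the special case k = n_samples."""
    for i in range(n_samples):
        yield [j for j in range(n_samples) if j != i], [i]
```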


Tumor DNA Extraction


Genomic DNA extraction from pieces of freshly frozen healthy or cancer tissues was performed with the QIAamp DNA Mini Kit (Qiagen) according to the manufacturer's recommendations. Roughly 0.5 mg of tissue was used to obtain on average 5 μg of genomic DNA. DNA was stored at −20° C. and analyzed within one week of preparation.


DNA Extraction from FFPE Samples


Genomic DNA from frozen FFPE samples was extracted using QIAamp DNA FFPE Tissue Kit with several modifications. DNA was stored at −20° C. for further analysis.


Bisulfite Conversion of Genomic DNA


1 μg of genomic DNA was converted to bis-DNA using the EZ DNA Methylation-Lightning™ Kit (Zymo Research) according to the manufacturer's protocol. The resulting bis-DNA had a size distribution of ~200-3000 bp, with a peak at ~500-1000 bp. The efficiency of bisulfite conversion was >99.8%, as verified by deep-sequencing of bis-DNA and analyzing the ratio of C to T conversion of CH (non-CG) dinucleotides.
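The conversion-efficiency check relies on cytosines outside CpG context (CH) being largely unmethylated in most somatic tissues, so an unconverted CH cytosine indicates a conversion failure. A minimal sketch, assuming gap-free reads that start at the same position as a top-strand reference (real pipelines work from alignments); the function name is hypothetical.

```python
def ch_conversion_rate(reference, reads):
    """Fraction of reference CH cytosines (C not followed by G) read as T."""
    converted = observed = 0
    for read in reads:
        for i in range(len(reference) - 1):  # last base has no known 3' context
            if reference[i] != "C" or reference[i + 1] == "G":
                continue  # keep only CH (non-CpG) cytosines
            if read[i] == "T":      # converted: unmethylated C became T
                converted += 1
                observed += 1
            elif read[i] == "C":    # unconverted
                observed += 1
    return converted / observed if observed else float("nan")
```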


Padlock Probe Designs


CpG markers whose methylation levels differed significantly in any of the comparisons between a cancer tissue and a normal tissue were used to design padlock probes for sequencing.


Padlock probes were designed using the ppDesigner software. The average length of the captured region was 70 bp, with the CpG marker located in the central portion of the captured region. To prevent bias introduced by unknown methylation status of CpG markers, capturing arms were positioned exclusively within sequences devoid of CG dinucleotides. Linker sequence between arms contained binding sequences for amplification primers separated by a variable stretch of Cs to produce probes of equal length. The average length of probes was 91 bp. Probes incorporated a 6-bp unique molecular identifier (UMI) sequence to allow for the identification of individual molecular capture events and accurate scoring of DNA methylation levels.


Probes were synthesized as separate oligonucleotides using standard commercial synthesis methods. For capture experiments, probes were mixed, in-vitro phosphorylated with T4 PNK (NEB) according to manufacturer's recommendations and purified using P-30 Micro Bio-Spin columns (Bio-Rad).


Bis-DNA Capture


20 ng of bisulfite-converted DNA was mixed with a defined molar ratio of padlock probes in 20 μl reactions containing 1× Ampligase buffer (Epicentre). The optimal molar ratio of probes to DNA was determined experimentally to be 20,000:1. Reactions were covered with 50 μl of mineral oil to prevent evaporation. To anneal probes to DNA, a 30-second denaturation at 95° C. was followed by slow cooling to 55° C. at a rate of 0.02° C. per second. Hybridization was allowed to proceed for 15 hrs at 55° C. To fill gaps between annealed arms, 5 μl of the following mixture was added to each reaction: 2 U of PfuTurboCx polymerase (Agilent; pre-activated for 3 min at 95° C.), 0.5 U of Ampligase (Epicentre), and 250 pmol of each dNTP in 1× Ampligase buffer. After a 5-hour incubation at 55° C., reactions were denatured for 2 minutes at 94° C. and snap-cooled on ice. 5 μl of exonuclease mix (20 U of Exo I and 100 U of Exo III, both from Epicentre) was added, and single-stranded DNA degradation was carried out at 37° C. for 2 hours, followed by enzyme inactivation for 2 minutes at 94° C.


Circular products of site-specific capture were amplified by PCR with concomitant barcoding of separate samples. Amplification was carried out using primers specific to the linker DNA within the padlock probes, one of which contained a sample-specific 6-bp barcode. Both primers contained Illumina next-generation sequencing adaptor sequences. PCR was done as follows: 1× Phusion Flash Master Mix, 3 μl of captured DNA, and 200 nM final concentration of primers, using the following cycle: 10s @ 98° C., 8× of (1s @ 98° C., 5s @ 58° C., 10s @ 72° C.), 25× of (1s @ 98° C., 15s @ 72° C.), 60s @ 72° C. PCR reactions were pooled, and the resulting library was size-selected to include effective captures (~230 bp) and exclude “empty” captures (~150 bp) using Agencourt AMPure XP beads (Beckman Coulter). Purity of the libraries was verified by PCR using Illumina flowcell adaptor primers (P5 and P7), and concentrations were determined using the Qubit dsDNA HS assay (Thermo Fisher). Libraries were sequenced using MiSeq and HiSeq2500 systems (Illumina).


Optimization of Capture Coverage Uniformity


Deep sequencing of the original pilot capture experiments showed significant differences between the numbers of reads captured by the most efficient and the least efficient probes (60-65% of captured regions had coverage >0.2× of average). To ameliorate this, relative efficiencies were calculated from the sequencing data, and probes were mixed at adjusted molar ratios. This increased capture uniformity to 85% of regions at >0.2× of average coverage.
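The uniformity metric and the rebalancing step can be illustrated as follows. The text does not state how the adjusted molar ratios were derived; this sketch assumes, hypothetically, that each probe's ratio is scaled inversely to its relative capture efficiency, with a cap so that nearly dead probes are not boosted without limit.

```python
from statistics import mean


def uniformity(coverages, fraction=0.2):
    """Share of captured regions whose read count exceeds fraction x mean coverage."""
    threshold = fraction * mean(coverages)
    return sum(c > threshold for c in coverages) / len(coverages)


def adjusted_ratios(coverages, base_ratio=1.0, max_boost=10.0):
    """Hypothetical rebalancing: scale each probe's molar ratio by
    1 / (relative efficiency), where relative efficiency = coverage / mean,
    capped at max_boost."""
    avg = mean(coverages)
    ratios = []
    for c in coverages:
        rel_eff = c / avg
        boost = max_boost if rel_eff == 0 else min(max_boost, 1.0 / rel_eff)
        ratios.append(base_ratio * boost)
    return ratios
```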


Sequencing Data Analysis


Sequencing reads were mapped using the software tool bisReadMapper (Diep, D, Nat Methods. 2012 Feb. 5; 9(3):270-272) with some modifications. First, UMIs were extracted from each sequencing read and appended to read headers within FASTQ files using a custom script generously provided by D. D. Reads were converted on the fly as if all Cs were unmethylated and mapped to in-silico converted DNA strands of the human genome, also converted as if all Cs were unmethylated, using Bowtie2 (Langmead B, Salzberg S. Fast gapped-read alignment with Bowtie 2. Nature Methods. 2012, 9:357-359). Original reads were then merged and filtered for single UMIs, i.e., of the reads carrying the same UMI, only one was kept. Methylation frequencies were extracted for all CpG markers for which padlock probes were designed. Markers with fewer than 20 reads in any sample were excluded from analysis.
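UMI-based deduplication and the 20-read depth filter can be sketched as below. The tuple layout of the reads, keying duplicates by (capture region, UMI) rather than by UMI alone, and the function names are assumptions for illustration; bisReadMapper's actual implementation may differ.

```python
def deduplicate_by_umi(reads):
    """reads: iterable of (region_id, umi, sequence) tuples.

    Keeps the first read seen for each (region, UMI) pair, so that each
    molecular capture event is counted once.
    """
    seen = set()
    unique = []
    for region, umi, seq in reads:
        if (region, umi) not in seen:
            seen.add((region, umi))
            unique.append((region, umi, seq))
    return unique


def passing_markers(depth_by_sample, min_reads=20):
    """depth_by_sample: {sample: {marker: unique read count}}.

    A marker is kept only if every sample has >= min_reads for it; a marker
    absent from any sample is treated as having zero reads and dropped.
    """
    shared = set.intersection(*(set(d) for d in depth_by_sample.values()))
    return sorted(m for m in shared
                  if all(d[m] >= min_reads for d in depth_by_sample.values()))
```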


Example 3—Generating a Cg Marker Panel for Diagnosis and Prognosis of Hepatocellular Carcinoma from cfDNA

Patient Data


Tissue DNA methylation data were obtained from The Cancer Genome Atlas (TCGA). Complete clinical, molecular, and histopathological datasets are available at the TCGA website. Individual institutions that contributed samples coordinated the consent process and obtained informed written consent from each patient in accordance with their respective institutional review boards.


A second, independent Chinese cohort consisted of HCC patients at Sun Yat-sen University Cancer Center in Guangzhou, Xijing Hospital in Xi'an, and West China Hospital in Chengdu, China. Patients presenting with stage I-IV HCC were selected and enrolled in this study. Patient characteristics and tumor features are summarized in Supplementary Table 1. HCC was staged according to the TNM classification of the 7th edition of the AJCC Cancer Staging Manual. The TNM Staging System is one of the most commonly used tumor staging systems. It was developed and is maintained by the American Joint Committee on Cancer (AJCC) and has been adopted by the Union for International Cancer Control (UICC). The TNM classification system was developed as a tool for oncologists to stage different types of cancer based on standard criteria: the extent of the tumor (T), the extent of spread to the lymph nodes (N), and the presence of metastasis (M). This project was approved by the IRBs of Sun Yat-sen University Cancer Center, Xijing Hospital, and West China Hospital. Informed consent was obtained from all patients. Tumor and normal tissues were obtained as clinically indicated for patient care and were retained for this study. Human blood samples were collected by venipuncture; plasma was obtained as the supernatant after centrifugation and stored at −80° C. before cfDNA extraction. The key raw data were verified and uploaded onto the Research Data Deposit public platform under approval number RDDB2017000132.


Tumor DNA Extraction


Genomic DNA extraction from freshly frozen healthy or cancer tissues was performed with QIAamp DNA Mini Kit (Qiagen) according to manufacturer's recommendations. Roughly 0.5 mg of tissue was used to obtain on average 5 μg of genomic DNA. DNA was stored at −20° C. and analyzed within one week of preparation.


DNA Extraction from FFPE Samples


Genomic DNA from frozen FFPE samples was extracted using the QIAamp DNA FFPE Tissue Kit with several modifications. DNA was stored at −20° C. for further analysis.


Cell-Free DNA Extraction from Plasma Samples


cfDNA extraction from 1.5 ml of plasma samples was performed with QIAamp cfDNA Kit (Qiagen) according to manufacturer's recommendations.


Bisulfite Conversion of Genomic DNA


About 10 ng of cfDNA was converted to bis-DNA using the EZ DNA Methylation-Lightning™ Kit (Zymo Research) according to the manufacturer's protocol. The resulting bis-DNA had a size distribution of ~200-3000 bp, with a peak at ~500-1000 bp. The efficiency of bisulfite conversion was >99.8%, as verified by deep-sequencing of bis-DNA and analyzing the ratio of C to T conversion of CH (non-CG) dinucleotides.


Determination of DNA Methylation Levels by Deep Sequencing of Bis-DNA Captured with Molecular-Inversion (Padlock) Probes


CpG markers whose methylation levels differed significantly in any of the comparisons between any cancer tissue and any normal tissue were used to design padlock probes for capture and sequencing of cfDNA. Padlock capture of bis-DNA was based on published methods, with modifications.


Probe Design and Synthesis


Padlock probes were designed using the ppDesigner software. The average length of the captured region was 100 bp, with the CpG marker located in the central portion of the captured region. The linker sequence between the arms contained binding sequences for amplification primers, separated by a variable stretch of Cs to produce probes of equal length. A 6-bp unique molecular identifier (UMI) sequence was incorporated in the probe design to allow identification of unique individual molecular capture events and accurate scoring of DNA methylation levels.


Probes were synthesized as separate oligonucleotides using standard commercial synthesis methods (IDT). For capture experiments, probes were mixed, in-vitro phosphorylated with T4 PNK (NEB) according to the manufacturer's recommendations, and purified using P-30 Micro Bio-Spin columns (Bio-Rad).


Bis-DNA Capture


About 10 ng of bisulfite-converted DNA was mixed with padlock probes in 20 μl reactions containing 1× Ampligase buffer (Epicentre). To anneal probes to DNA, a 30-second denaturation at 95° C. was followed by slow cooling to 55° C. at a rate of 0.02° C. per second. Hybridization was allowed to proceed for 15 hrs at 55° C. To fill gaps between annealed arms, 5 μl of the following mixture was added to each reaction: 2 U of PfuTurboCx polymerase (Agilent), 0.5 U of Ampligase (Epicentre), and 250 pmol of each dNTP in 1× Ampligase buffer. After a 5-hour incubation at 55° C., reactions were denatured for 2 minutes at 94° C. 5 μl of exonuclease mix (20 U of Exo I and 100 U of Exo III, both from Epicentre) was added, and single-stranded DNA degradation was carried out at 37° C. for 2 hours, followed by enzyme inactivation for 2 minutes at 94° C.


Circular products of site-specific capture were amplified by PCR with concomitant barcoding of separate samples. Amplification was carried out using primers specific to the linker DNA within the padlock probes, one of which contained a sample-specific 6-bp barcode. Both primers contained Illumina next-generation sequencing adaptor sequences. PCR was done as follows: 1× Phusion Flash Master Mix, 3 μl of captured DNA, and 200 nM primers, using the following cycle: 10s @ 98° C., 8× of (1s @ 98° C., 5s @ 58° C., 10s @ 72° C.), 25× of (1s @ 98° C., 15s @ 72° C.), 60s @ 72° C. PCR reactions were pooled, and the resulting library was size-selected to include effective captures (~230 bp) and exclude “empty” captures (~150 bp) using Agencourt AMPure XP beads (Beckman Coulter). Purity of the libraries was verified by PCR using Illumina flowcell adaptor primers (P5 and P7), and concentrations were determined using the Qubit dsDNA HS assay (Thermo Fisher). Libraries were sequenced using MiSeq and HiSeq2500 systems (Illumina).


Optimization of Capture Coverage Uniformity


Deep sequencing of the original pilot capture experiments showed significant differences between the numbers of reads captured by the most efficient and the least efficient probes (60-65% of captured regions had coverage >0.2× of average). To ameliorate this, relative efficiencies were calculated from the sequencing data, and probes were mixed at adjusted molar ratios. This increased capture uniformity to 85% of regions at >0.2× of average coverage.


Sequencing Data Analysis


Sequencing reads were mapped using the software tool bisReadMapper with some modifications. First, UMIs were extracted from each sequencing read and appended to read headers within FASTQ files using a custom script. Reads were converted on the fly as if all Cs were unmethylated and mapped to in-silico converted DNA strands of the human genome, also converted as if all Cs were unmethylated, using Bowtie2 [28]. Original reads were merged and filtered for single UMIs, i.e., reads carrying the same UMI were collapsed to a single, unique read. Methylation frequencies were calculated for all CpG dinucleotides contained within the regions captured by padlock probes by dividing the number of unique reads carrying a C at the interrogated position by the total number of reads covering that position.
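At a single interrogated CpG, the frequency calculation reduces to counting C and T calls among the deduplicated reads. This sketch assumes per-read base calls are already available for the position, and it uses C + T as the denominator (discarding other bases as sequencing errors), which is an assumption about how "reads covering the position" is counted. The function name is hypothetical.

```python
def methylation_frequency(calls):
    """calls: base observed at one CpG cytosine, one entry per unique read.

    'C' = protected from conversion (methylated), 'T' = converted (unmethylated).
    Any other base is ignored.
    """
    c = sum(b == "C" for b in calls)
    t = sum(b == "T" for b in calls)
    return c / (c + t) if c + t else float("nan")
```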


Identification of Blocks of Correlated Methylation (BCM)


Pearson correlation coefficients between the methylation frequencies of each pair of CpG markers separated by no more than 200 bp were calculated separately across 50 cfDNA samples from each of the two diagnostic categories, i.e., normal healthy blood and HCC. A Pearson's r<0.5 was used to identify transition spots (boundaries) between any two adjacent markers, indicating uncorrelated methylation. Markers not separated by a boundary were combined into Blocks of Correlated Methylation (BCMs). This procedure identified a total of 1550 BCMs in each diagnostic category within our padlock data, combining between 2 and 22 CpG positions in each block. Methylation frequencies for entire BCMs were calculated by summing the numbers of Cs at all interrogated CpG positions within a BCM and dividing by the total number of Cs and Ts at those positions.
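The block-calling rule can be sketched as a single pass over adjacent CpGs: a boundary is placed wherever two neighbours are more than 200 bp apart or their correlation across samples falls below 0.5. The restriction to adjacent pairs, the data layout, and the function names are assumptions. In the study this was done separately within each diagnostic category.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy) if sxx and syy else 0.0


def call_blocks(positions, freqs, r_min=0.5, max_gap=200):
    """positions: sorted CpG coordinates on one chromosome.
    freqs[i]: methylation frequencies of CpG i across samples.

    Returns lists of CpG indices; only blocks of at least two CpGs are kept.
    """
    if not positions:
        return []
    blocks = [[0]]
    for i in range(1, len(positions)):
        close = positions[i] - positions[i - 1] <= max_gap
        if close and pearson(freqs[i - 1], freqs[i]) >= r_min:
            blocks[-1].append(i)   # no boundary: extend the current block
        else:
            blocks.append([i])     # boundary: start a new block
    return [b for b in blocks if len(b) >= 2]


def block_frequency(c_counts, t_counts):
    """Block-level methylation: total C over total C + T across the block."""
    total = sum(c_counts) + sum(t_counts)
    return sum(c_counts) / total if total else float("nan")
```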


DNA Isolation and Digital Quantitative PCR


Tumor and corresponding plasma samples were obtained from patients undergoing surgical tumor resection; samples were frozen and stored at −80° C. until use. DNA and RNA were isolated from tumor samples using the AllPrep DNA/RNA Mini Kit, and cfDNA was isolated from plasma using a cfDNA extraction kit (Qiagen, Valencia, Calif.).


Data Sources


DNA methylation data for 485,000 sites generated using the Infinium 450K Methylation Array were obtained from TCGA and from a dataset generated in our previous study (GSE40279), in which DNA methylation profiles for HCC and blood were analyzed. IDAT format files of the methylation data were generated containing the ratio values of each scanned bead. Using the minfi package from Bioconductor, these data files were converted into a score referred to as the beta value. Methylation values of the Chinese cohort were obtained by targeted bisulfite sequencing using molecular inversion probes and analyzed as described above.
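The beta value is the fraction of methylated signal at a probe, computed from the methylated (M) and unmethylated (U) intensities with a small offset that stabilizes low-intensity probes. This sketch assumes the Illumina convention of an offset of 100; to our understanding, minfi's getBeta( ) exposes the offset as an argument, and the value actually used in the study is not stated. The function name is hypothetical.

```python
def beta_value(methylated, unmethylated, offset=100.0):
    """Beta = M / (M + U + offset); negative intensities are clipped to 0."""
    m = max(methylated, 0.0)
    u = max(unmethylated, 0.0)
    denom = m + u + offset
    return m / denom if denom > 0 else float("nan")
```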


Statistical Analysis—DNA Methylation Marker Pre-Selection for Diagnostic and Prognostic Analysis


A differential methylation analysis was first performed on the TCGA data using a “moderated t-statistics shrinking” approach, and the P-value for each marker was then corrected for multiple testing by the Benjamini-Hochberg procedure to control the FDR at a significance level of 0.05. The list was ranked by adjusted P-value, and the top 1000 markers were selected for designing padlock probes. cfDNA samples of low quality or with fewer than 20,000 reads per sample were eliminated. The methylation value of each marker was defined as the number of methylated reads divided by the total read count. Methylation markers with a range of methylation values less than 0.1 in matched tumor tissue and tumor blood samples were eliminated.
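The Benjamini-Hochberg step-up adjustment and the methylation-range filter can be sketched in a few lines. The adjustment follows the same definition as R's p.adjust(method = "BH"); the function names and data layout are assumptions.

```python
def benjamini_hochberg(pvalues):
    """BH-adjusted p-values (FDR), returned in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def keep_dynamic(markers, betas, min_range=0.1):
    """Drop markers whose methylation range (max - min) is below min_range."""
    return [mk for mk in markers if max(betas[mk]) - min(betas[mk]) >= min_range]
```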


Building a Cg Marker Panel from cfDNA


The cfDNA dataset was randomly split into training and validation cohorts at a 2:1 ratio. Two variable selection methods suited to high-dimensional data were applied to the prescreened training dataset: the Least Absolute Shrinkage and Selection Operator (LASSO) and a Random Forest-based variable selection method using the out-of-bag (OOB) error. Because results can depend strongly on the arbitrary choice of a random sample split for sparse high-dimensional data, the “multi-split” method was adopted, which improves variable selection consistency while controlling finite-sample error. For LASSO selection, 75 percent of the dataset was subsampled without replacement 500 times, and markers selected in more than 450 of the 500 subsamples were retained. The tuning parameter was determined from the expected generalization error estimated by 10-fold cross-validation and the information-based criteria AIC/BIC, and the largest value of lambda for which the error was within one standard error of the minimum (the “1-se” lambda) was adopted. For the Random Forest analysis, with the OOB error as the minimization criterion, variables were eliminated from the random forest with a dropping fraction of 0.3 at each iteration. The ten markers selected by both methods were used to build a binary prediction model. A logistic regression model was fitted with these 10 markers as covariates, and a combined diagnosis score (designated cd-score) was obtained by multiplying the unbiased coefficient estimates by the marker methylation value matrix in both the training and validation datasets. The predictive ability of the model was evaluated by the area under the ROC curve (AUC, also known as the C-index), which is the proportion of concordant pairs among all pairs of observations, with 1.0 indicating perfect prediction accuracy. Confusion tables were generated using the cd-score cutoff that maximized Youden's index.
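Once the logistic regression coefficients are fitted, the cd-score, the concordance-based AUC, and the Youden-optimal cutoff are straightforward to compute. The sketch below follows the text in scoring each sample as the product of its methylation values and the coefficients; no intercept term is mentioned, so none is added. Treating a score at or above the cutoff as HCC, counting ties as half-concordant, and the function names are assumptions.

```python
def cd_scores(coefs, matrix):
    """One score per sample: marker methylation values times the coefficients."""
    return [sum(c * x for c, x in zip(coefs, row)) for row in matrix]


def auc_concordance(case_scores, control_scores):
    """AUC as the proportion of concordant case/control pairs (ties count 1/2)."""
    pairs = len(case_scores) * len(control_scores)
    concordant = sum(1.0 if a > b else 0.5 if a == b else 0.0
                     for a in case_scores for b in control_scores)
    return concordant / pairs


def youden_cutoff(case_scores, control_scores):
    """Cutoff maximising J = sensitivity + specificity - 1 (score >= cutoff = HCC).

    Returns (cutoff, J); among tied J values the smallest cutoff is kept.
    """
    best_cut, best_j = None, -1.0
    for cut in sorted(set(case_scores) | set(control_scores)):
        sens = sum(s >= cut for s in case_scores) / len(case_scores)
        spec = sum(s < cut for s in control_scores) / len(control_scores)
        j = sens + spec - 1
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j
```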


The pre-treatment (initial) methylation level was evaluated at baseline, and the post-treatment methylation level was evaluated approximately 2 months after treatment, where treatment was either chemotherapy or surgical resection of the tumor. The primary endpoint (response to treatment: progressive disease (PD), partial response (PR), and stable disease (SD)) was defined according to the RECIST guideline. Patients treated with surgical removal and without recurrence at the time of evaluation were assumed to have a complete response (CR). Differences in cd-score distribution between clinical categories were examined with the Wilcoxon rank-sum test, because the cd-score was found to be non-normally distributed by the Shapiro-Wilk test.
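A minimal rank-sum test using the large-sample normal approximation is sketched below. It assigns average ranks to ties but applies no tie or continuity correction, so p-values can differ slightly from those of R's wilcox.test( ), which uses an exact distribution for small samples without ties. The function name is hypothetical.

```python
from math import erfc, sqrt


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation).

    Returns (W, z, p), where W is the rank sum of x.
    """
    values = sorted(list(x) + list(y))
    avg_rank = {}
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and values[j + 1] == values[i]:
            j += 1
        avg_rank[values[i]] = (i + j) / 2 + 1  # mean of 1-based ranks i+1..j+1
        i = j + 1
    n1, n2 = len(x), len(y)
    w = sum(avg_rank[v] for v in x)
    mu = n1 * (n1 + n2 + 1) / 2
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mu) / sigma
    p = erfc(abs(z) / sqrt(2))  # two-sided: 2 * (1 - Phi(|z|))
    return w, z, p
```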


Patient and Sample Characteristics


Clinical characteristics and molecular profiling including methylation data for comparison between HCC and blood lymphocytes were assembled from sources including 377 HCC tumor samples from The Cancer Genome Atlas (TCGA) and 754 blood leukocyte samples of healthy control individuals from a dataset on aging (GSE40279). To study ctDNA in HCC, plasma samples were obtained from Chinese patients with HCC and randomly selected healthy controls undergoing routine health care maintenance, resulting in a training cohort of 715 HCC patients and 560 normal healthy controls and a validation cohort of 383 HCC patients and 275 healthy controls. All participants provided written informed consent.


Identification of Methylation Markers Differentiating HCC and Blood


It was hypothesized that CpG markers with a maximal difference in methylation between HCC and blood leukocytes of normal individuals would be the most likely to show detectable methylation differences between the cfDNA of HCC patients and that of normal controls. The “moderated t-statistics” method with Empirical Bayes variance shrinkage, together with the Benjamini-Hochberg procedure to control the FDR at a significance level of 0.05, was used to identify the top 1000 markers with the most significantly different rates of methylation (i.e., those with the lowest p-values) between HCC and blood. Unsupervised hierarchical clustering of these top 1000 markers distinguished HCC from blood leukocytes of normal individuals (FIG. 12). Molecular inversion (padlock) probes corresponding to these 1000 markers were designed and tested in 28 pairs of HCC tissue DNA and matched plasma ctDNA from the same patients. The methylation profiles in HCC tumor DNA and matched plasma ctDNA were consistent (FIG. 9A, FIG. 9B). 401 markers with a good experimental amplification profile and dynamic methylation range were selected for further analysis.


Methylation Block Structure for Improved Allele Calling Accuracy


The concept of genetic linkage disequilibrium (LD blocks) was employed to study the degree of co-methylation among different DNA strands, under the assumption that DNA sites in close proximity are more likely to be co-methylated than distant sites. Paired-end Illumina sequencing reads were used to identify each individual methylation block (mBlock), and co-methylation was quantified with the Pearson correlation method. All common mBlocks of a region were compiled by calculating the different mBlock fractions. The genome was then partitioned into blocks of tightly co-methylated CpG sites, termed methylation correlated blocks (MCBs), using an r2 cutoff of 0.5. MCBs were surveyed in cfDNA from 500 normal samples and found to be highly consistent. Methylation levels within MCBs were next determined in cfDNA from 500 HCC samples. MCBs showed a highly consistent methylation pattern when normal and HCC cfDNA samples were compared, which significantly enhanced allele-calling accuracy (FIG. 13). This technique was employed in all subsequent sequencing analyses.


cfDNA Diagnostic Prediction for HCC


The methylation values of the 401 selected markers that showed good methylation ranges in cfDNA samples were analyzed by Random Forest and Least Absolute Shrinkage and Selection Operator (LASSO) methods to further reduce the number of markers, by modeling them in 715 HCC ctDNA and 560 normal cfDNA samples (FIG. 8). The Random Forest analysis yielded 24 markers, and the LASSO analysis, in which selected markers were required to appear in more than 450 of 500 repetitions, yielded 30 markers. Ten markers overlapped between these two methods (Table 4). A diagnostic prediction model was constructed with these 10 markers using logistic regression. Applying the model yielded a sensitivity of 94.3% and a specificity of 85.7% for HCC in the training dataset of 715 HCC and 560 normal samples (FIG. 9C), and a sensitivity of 90.5% and a specificity of 83.2% in the validation dataset of 383 HCC and 275 normal samples (FIG. 9D). The model also differentiated HCC from normal controls in both the training dataset (AUC=0.966) and the validation dataset (AUC=0.944) (FIG. 9E, FIG. 9F). Unsupervised hierarchical clustering of these 10 markers distinguished HCC from normal controls with high specificity and sensitivity (FIG. 9G, FIG. 10H, FIG. 14).














TABLE 4

Markers      Ref Gene    Coefficients   SE      z value   p value
                         15.595         2.395   6.513     <0.001
cg10428836   BMPRIA      11.543         0.885   −13.040   <0.001
cg26668608   PSD         4.557          0.889   5.129     <0.001
cg25754195   ARHGAP25    2.519          0.722   3.487     <0.001
cg05205842   KLF3        −3.612         0.954   −3.785    <0.001
cg11606215   PLAC8       6.865          1.095   6.271     <0.001
cg24067911   ATXN1       −5.439         0.868   −6.265    <0.001
cg18196829   Chr 6:170   −9.078         1.355   −6.698    <0.001
cg23211949   Chr 6:3     −5.209         1.081   −4.819    <0.001
cg17213048   ATAD2       6.660          1.422   4.683     <0.001
cg25459300   Chr 8:20    1.994          1.029   1.938     0.053

SE: standard errors of coefficient;
z value: Wald z-statistic value
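Each z value in Table 4 is the Wald statistic (the coefficient divided by its standard error), and the p value is its two-sided tail probability under a standard normal distribution. The check below, with a hypothetical function name, reproduces the last row of the table to the reported precision.

```python
from math import erfc, sqrt


def wald_test(coef, se):
    """Wald z = coefficient / SE, with its two-sided normal p value."""
    z = coef / se
    p = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p


z, p = wald_test(1.994, 1.029)  # row cg25459300 of Table 4
```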






The ability of the model's combined diagnostic score (cd-score) to differentiate HCC from liver diseases (HBV/HCV infection and fatty liver) was next assessed, since these liver diseases are known major risk factors for HCC. The cd-score differentiated HCC patients from those with liver diseases and from healthy controls (FIG. 10A). These results were consistent and comparable with those predicted by AFP levels (FIG. 10B).


Methylation Markers Predicted Tumor Load, Treatment Response, and Staging


The utility of the cd-score in assessing treatment response, the presence of residual tumor following treatment, and staging of HCC was next studied. Clinical and demographic characteristics, such as age, gender, race, and AJCC stage, were included in the analysis. The cd-scores of patients with detectable residual tumor following treatment (n=828) were significantly higher than those of patients with no detectable tumor (n=270), and both were significantly greater than those of normal controls (n=835) (p<0.0001, FIG. 10C). Similarly, cd-scores were significantly higher in patients before treatment (n=109) or with progression (n=381) than in those with treatment response (n=248) (p<0.0001, FIG. 10D). In addition, cd-scores were significantly lower in patients with complete tumor resection after surgery (n=170) than in those before surgery (n=109), and were higher in patients with recurrence (n=155) (p<0.0001, FIG. 10E). Furthermore, there was good correlation between the cd-scores and tumor stage: patients with early-stage disease (I, II) had substantially lower cd-scores than those with advanced-stage disease (III, IV) (p<0.05, FIG. 10F). Collectively, these results suggest that the cd-score (i.e., the amount of ctDNA in plasma) correlates well with tumor burden and may have utility in predicting tumor response and in surveillance for recurrence.


Utility of ctDNA Diagnostic Prediction and AFP


In some instances, serum AFP is the blood biomarker used for risk assessment and surveillance of HCC. However, its low sensitivity makes it inadequate for detecting all patients who will develop HCC and severely limits its clinical utility. In fact, many cirrhotic patients develop HCC without any increase in AFP levels. Strikingly, 40% of patients in the HCC study cohort had a normal serum AFP (<25 ng/ml).


In biopsy-proven HCC patients, the cd-score demonstrated superior sensitivity and specificity compared with AFP for HCC diagnosis (AUC 0.969 vs 0.816, FIG. 10G). In patients with treatment response, tumor recurrence, or progression, the cd-score showed more significant changes than AFP relative to testing at initial diagnosis (FIG. 10H, FIG. 10I). In patients with serial samples, those with a positive treatment response had a concomitant significant decrease in cd-score compared with that prior to treatment, and there was an even further decrease in patients after surgery. By contrast, all of our patients with progressive or recurrent disease had an increase in cd-score (FIG. 15). AFP was less sensitive for assessing treatment efficacy in individual patients (FIG. 16). In addition, while the cd-score correlated well with tumor stage (FIG. 10J), particularly among patients with stage I, II, and III disease, there was no significant difference in AFP values between stages, except between stage III and IV (FIG. 10K), indicating an advantage of the cd-score over AFP in differentiating early-stage HCC.


ctDNA Prognostic Prediction for HCC


The potential of methylation markers in ctDNA to predict prognosis in HCC, in combination with clinical and demographic characteristics including age, gender, race, and AJCC stage, was then investigated. The 1049 HCC patients with complete survival information were randomly split into training and validation datasets at a 2:1 ratio. Univariate Cox (UniCox) and LASSO-Cox methods were implemented to reduce dimensionality, and a Cox model was constructed to predict prognosis with an 8-marker panel (Table 5). Kaplan-Meier curves were generated in the training and validation datasets using a combined prognosis score (cp-score) based on these markers. The high-risk group (cp-score >−0.24) had 341 observations with 53 events in the training dataset and 197 observations with 26 events in the validation dataset; the low-risk group (cp-score ≤−0.24) had 339 observations with 7 events in the training dataset and 172 observations with 9 events in the validation dataset. Median survival was significantly different in both the training set (p<0.0001) and the validation set (p=0.0014) by log-rank test (FIG. 11A, FIG. 11B).
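Risk stratification by the cp-score, and the Kaplan-Meier estimate behind the survival curves, can be sketched as follows. The cutoff of −0.24 is taken from the text; the linear form of the score (coefficients times methylation values, as in a Cox linear predictor), the data layout, and the function names are assumptions.

```python
def cp_score(coefs, betas):
    """Combined prognosis score: Cox coefficients times marker methylation values."""
    return sum(c * b for c, b in zip(coefs, betas))


def risk_group(score, cutoff=-0.24):
    """High risk if cp-score > cutoff, low risk otherwise."""
    return "high" if score > cutoff else "low"


def kaplan_meier(times, events):
    """Product-limit survival estimate.

    events[i] is 1 for death and 0 for censoring. Returns (time, survival)
    at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:  # group tied times
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed  # deaths and censored leave the risk set
    return curve
```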
















TABLE 5

Markers      Ref Gene    Coefficients   HR        CI                 SE       z value   p value
cg23461741   SH3PXD2A    −1.264         0.282     0.024-3.340        1.2604   −1.003    0.316
cg06482904   C11orf9     −0.247         0.781     0.067-9.100        1.2530   −0.197    0.844
cg25574765   PPFIA1      1.026          2.790     0.488-15.900       0.8894   1.153     0.249
cg07459019   Chr 17:78   −8.156         0.000     0.000-0.012        1.9112   −4.267    <0.001
cg20490031   SERPINB5    6.082          438.000   13.200-14600.000   1.7885   3.400     0.001
cg01643250   NOTCH3      −5.368         0.005     0.000-0.140        1.7357   −3.093    0.002
cg11397370   GRHL2       1.497          4.470     1.030-19.400       0.7506   1.994     0.046
cg11825899   TMEM8B      2.094          8.120     0.957-68.900       1.0909   1.920     0.055

HR: Hazard Ratio;
CI: 95.0% confidence interval;
SE: standard errors of coefficients;
z value: Wald z-statistic value
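In a Cox model, the hazard ratio is the exponentiated coefficient, and the 95% confidence interval is exp(coefficient ± 1.96 × SE). The sketch below, with a hypothetical function name, reproduces the first row of Table 5 to the reported precision.

```python
from math import exp


def hazard_ratio(coef, se, z_crit=1.959964):
    """Return (HR, lower 95% CI, upper 95% CI) from a Cox coefficient and its SE."""
    return exp(coef), exp(coef - z_crit * se), exp(coef + z_crit * se)


hr, lo, hi = hazard_ratio(-1.264, 1.2604)  # row cg23461741 of Table 5
```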






Multivariable analysis showed that the cp-score was significantly correlated with risk of death in both the training and validation datasets and was an independent risk factor for survival (hazard ratio [HR]: 2.512; 95% confidence interval [CI]: 1.966-3.210; p<0.001 in the training set; HR: 1.553; CI: 1.240-1.944; p<0.001 in the validation set). Interestingly, AFP was no longer significant as a risk factor when the cp-score and other clinical characteristics were taken into account.


As expected, TNM stage predicted the prognosis of patients in the training and validation datasets (FIG. 11C, FIG. 11D). Moreover, combining the cp-score with TNM staging improved the ability to predict prognosis in both the training (AUC 0.7935, FIG. 11E) and validation datasets (AUC 0.7586, FIG. 11F). Kaplan-Meier curves also showed that patients stratified by both cp-score and stage had different prognoses (p<0.0001, FIG. 11G). These results demonstrate that ctDNA methylation analysis may contribute to risk stratification and prediction of prognosis in patients with HCC.


In this study, CpG sites differentially methylated between HCC tumor samples and blood leukocytes from normal individuals were first determined to build an HCC-specific panel. A diagnostic prediction model was then constructed using a 10-methylation-marker panel (cd-score) for use in cfDNA. The cd-score effectively discriminated patients with HCC from individuals with HBV/HCV infection, cirrhosis, or fatty liver, as well as from healthy controls. Because patients with these liver diseases are the target screening population under current guidelines, it is important that a serum test reliably distinguish these disease states from HCC. In this study, the sensitivity of the cd-score for HCC was comparable to that of liver ultrasound, the current standard for HCC screening, and markedly superior to that of AFP. The cd-score may therefore represent a more cost-effective and less resource-intensive approach. Furthermore, the cd-score correlated highly with HCC tumor burden, treatment response, and stage, and it outperformed AFP in the instant cohort. In some cases, the cd-score is useful for assessing treatment response and for surveillance for recurrence.


Example 4—Cell-Free DNA Methylation Markers for Diagnosis and Prognosis of Lung Cancer and Hepatocellular Carcinoma

Primary Tissue Patient Data


Both primary solid tissues from cancer patients and blood from healthy donors were measured using the Illumina Infinium 450K BeadChip. Primary tumor DNA methylation data for 485,000 sites were obtained from The Cancer Genome Atlas (TCGA). Complete clinical, molecular, and histopathological datasets are available at the TCGA website. The individual institutions that contributed samples coordinated the consent process and obtained informed written consent from each patient in accordance with their respective institutional review boards. Blood DNA methylation data from healthy donors were obtained from the study of Hannum et al., 2013, Mol Cell 49, 359-367 (GSE40279), in which DNA methylation profiles for HCC and blood were analyzed.


Serum Sample Patient Data


A second, independent Chinese cohort consisted of LUNC and HCC patients at the Sun Yat-sen University Cancer Center in Guangzhou, Xijing Hospital in Xi'an, and the West China Hospital in Chengdu, China. Patients who presented with stage I-IV LUNC or HCC were selected and enrolled in this study. Patient characteristics and tumor features are summarized in Table 9. The TNM staging classification for LUNC and HCC followed the 7th edition of the AJCC Cancer Staging Manual. This project was approved by the IRBs of Sun Yat-sen University Cancer Center, Xijing Hospital, and West China Hospital. Informed consent was obtained from all patients. Two prospective trials on early detection of LUNC and HCC were conducted, using methylation markers to predict cancer occurrence in high-risk populations. In the first trial, patients were recruited from a group of smokers undergoing CT scan-based lung cancer screening from December 2015 to December 2016. Patients presenting with lung nodules (<10 mm, n=232, Table 12) were selected to undergo methylation profiling at the time of screening. They were subsequently followed through secondary testing (tissue biopsy and pathology verification) to determine whether the nodules were due to cancer or to inflammatory or infectious conditions. In the second trial, high-risk patients with liver cirrhosis were enrolled (n=242).


Tumor and normal tissues were obtained as clinically indicated for patient care and were retained for this study. Human blood samples were collected by venipuncture, and plasma samples were obtained by taking the supernatant after centrifugation and stored at −80° C. before cfDNA extraction.


The pre-treatment serum samples were obtained at initial diagnosis, and the post-treatment serum samples were collected approximately 2 months after treatment. Here, treatment refers to either chemotherapy or surgical resection of the tumor. The primary endpoint was response to treatment (progressive disease (PD), partial response (PR), or stable disease (SD)), defined according to the RECIST guideline. Patients treated with surgical removal and without recurrence at the time of evaluation were assumed to have a complete response (CR).


Extraction of cfDNA from Plasma


The minimal volume of plasma required to obtain consistent amounts of cfDNA for targeted sequencing was determined. As a rough guide, the target was ~20× coverage at 90% of the markers covered by the padlock probe panel (see below). It was observed that 20,000 or more total unique reads per sample fulfilled this criterion, and that 1.5 ml or more of plasma reliably yielded enough cfDNA to produce >20,000 unique reads. The relationship between the amount of cfDNA in 1.5 ml of plasma and detected copy numbers was further investigated using digital droplet PCR. 1.5 ml of plasma yielded >10 ng of cfDNA, which produced at least 140 copies of detected amplicons in each digital droplet PCR assay. A cutoff of 15 ng/1.5 ml was therefore used in all experiments to obtain consistent and reliable measurements of DNA methylation.
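The sample-level targets described above can be expressed as a simple QC check:

- about 20× coverage at 90% of panel markers
- at least 20,000 unique reads
- the 15 ng/1.5 ml input cutoff

This is an illustrative sketch. The per-marker depth list and the function name are assumptions, not part of the described pipeline.

```python
from typing import Sequence


def sample_meets_targets(marker_depths: Sequence[int],
                         unique_reads: int,
                         cfdna_ng: float,
                         min_depth: int = 20,
                         min_fraction: float = 0.90,
                         min_reads: int = 20_000,
                         min_cfdna_ng: float = 15.0) -> bool:
    """Check one sample against the coverage and input targets described in the text."""
    if not marker_depths:
        return False
    # Fraction of panel markers reaching the target depth (~20x at 90% of markers).
    covered = sum(1 for depth in marker_depths if depth >= min_depth)
    enough_coverage = covered / len(marker_depths) >= min_fraction
    return enough_coverage and unique_reads >= min_reads and cfdna_ng >= min_cfdna_ng
```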


cfDNA from 1.5 ml of plasma was extracted using the EliteHealth cfDNA extraction Kit (EliteHealth, Guangzhou Youze, China) according to the manufacturer's recommendations.


Bisulfite Conversion of Genomic or cfDNA


10 ng of DNA was converted to bis-DNA using the EZ DNA Methylation-Lightning™ Kit (Zymo Research) according to the manufacturer's protocol. The resulting bis-DNA had a size distribution of ~200-3000 bp, with a peak around ~500-1000 bp. Bisulfite conversion efficiency was >99.8%, as verified by deep sequencing of bis-DNA and analysis of the C-to-T conversion ratio at CH (non-CG) dinucleotides.
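Conversion efficiency, as described here, is the fraction of non-CpG (CH) cytosines read as T after bisulfite treatment, since CH cytosines are assumed to be essentially unmethylated. Below is a minimal sketch. It assumes gap-free alignments of reads to the unconverted top-strand reference. Real pipelines also handle the bottom strand and indels, and the function name is hypothetical.

```python
from typing import Iterable, Tuple


def ch_conversion_rate(pairs: Iterable[Tuple[str, str]]) -> float:
    """Pooled C->T conversion rate at non-CpG (CH) cytosines.

    pairs -- (reference, read) strings aligned without gaps and of equal length,
             with the reference given as the unconverted top strand.
    """
    converted = unconverted = 0
    for reference, read in pairs:
        for i, base in enumerate(reference):
            # Only CH cytosines: skip CpG sites and a terminal C with unknown context.
            if base != "C" or i + 1 >= len(reference) or reference[i + 1] == "G":
                continue
            if read[i] == "T":
                converted += 1
            elif read[i] == "C":
                unconverted += 1
    total = converted + unconverted
    return converted / total if total else float("nan")
```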


Marker Selection for Padlock Probe Panel Design


To identify markers that differentiate HCC, LUNC, and normal blood methylation signatures, the “moderated t-statistics shrinking” approach was applied to 450k methylation data using pairwise comparisons. The comparisons involved 377 HCC samples and 827 LUNC samples (TCGA) and 754 normal blood samples (GSE40279; Hannum et al., 2013). The Benjamini-Hochberg procedure was used to control the FDR at a significance level of 0.05. The lists were ranked by adjusted p-value, and two groups of top markers were selected for designing padlock probes (FIG. 25):

- the top 1000 markers for differentiating cancer (both LUNC and HCC) from normal samples
- a separate group of 1000 markers for differentiating LUNC from HCC
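The Benjamini-Hochberg step and the ranking by adjusted p-value can be sketched as follows. This assumes the per-marker p-values have already been computed upstream, for example from moderated t-statistics such as those produced by limma. The function names are hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Return BH-adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone and <= 1.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def top_markers(names: Sequence[str], pvalues: Sequence[float],
                k: int = 1000, fdr: float = 0.05) -> List[str]:
    """Markers passing the FDR threshold, ranked by adjusted p-value, truncated to k."""
    adjusted = benjamini_hochberg(pvalues)
    passing = sorted((q, name) for name, q in zip(names, adjusted) if q <= fdr)
    return [name for _, name in passing[:k]]
```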


All 2000 markers were used to design padlock probes for capture and sequencing of cfDNA. Padlock capture of bis-DNA was based on the methods of Deng, et al., 2009, Nature Biotechnology 27, 353-360; Diep, et al., 2012, Nature Methods 9, 270-272; and Porreca, et al., 2007, Nature Methods 4, 931-936, with further modifications. Because the total size of the captured regions/cg markers is relatively modest, this approach offers much lower sequencing costs than current methods, including whole-methylome sequencing, which enabled the evaluation of a large number of samples. Furthermore, the direct targeted sequencing approach offers a digital readout. It also requires much less starting cfDNA material (10-15 ng) than more traditional methods based on hybridization to a chip (e.g., Infinium, Illumina) or target enrichment by hybridization (e.g., SureSelect, Agilent). The approach is also less sensitive to unequal amplification because it uses unique molecular identifiers (UMIs).


Padlock Probe Design, Synthesis and Validation


All probes were designed using the ppDesigner software. The average length of the captured region was 70 bp, with the CpG marker located in the central 80% of the captured region. A 6-bp unique molecular identifier (UMI) flanked the capture arms to help eliminate amplification bias in the determination of DNA methylation frequencies. The linker sequence between the arms contained binding sequences for amplification primers, separated by a variable stretch of Cs to produce probes of equal length. Probes were synthesized as separate oligonucleotides (IDT). For capture experiments, probes were mixed in equimolar quantities and purified on Qiagen columns.


Targeted Methylation Sequencing Using Bis-DNA Padlock Probe Capture


10 ng of bisulfite-converted DNA was mixed with padlock probes in 20 μl reactions containing 1× Ampligase buffer (Epicentre). To anneal probes to DNA, a 30-second denaturation at 95° C. was followed by slow cooling to 55° C. at a rate of 0.02° C. per second and incubation for 15 hours at 55° C. To fill gaps between the annealed arms, 5 μl of the following mixture was added to each reaction: 2 U of PfuTurboCx polymerase (Agilent), 0.5 U of Ampligase (Epicentre), and 250 pmol of each dNTP in 1× Ampligase buffer. After a 5-hour incubation at 55° C., reactions were denatured for 2 minutes at 94° C. Then 5 μl of exonuclease mix (20 U of Exo I and 100 U of Exo III, Epicentre) was added, and single-stranded DNA degradation was carried out at 37° C. for 2 hours, followed by enzyme inactivation for 2 minutes at 94° C.


Circular capture products were amplified by PCR using primers specific to the linker DNA within the padlock probes. Both primers contained 10-bp barcodes for unique dual-index multiplexing and Illumina next-generation sequencing adaptor sequences. PCR was performed with 1× Phusion Flash Master Mix, 3 μl of captured DNA, and 200 nM primers, using the following cycle: 10s @ 98° C., 8× of (1s @ 98° C., 5s @ 58° C., 10s @ 72° C.), 25× of (1s @ 98° C., 15s @ 72° C.), 60s @ 72° C. PCR reactions were pooled, and the resulting library was size-selected on 2.5% agarose gels to include effective captures (~230 bp) and exclude “empty” captures (~150 bp). Purity of the libraries was verified by TapeStation (Agilent) and by PCR using Illumina flowcell adaptor primers (P5 and P7). Concentrations were determined using the Qubit dsDNA HS assay (Thermo Fisher). Libraries were sequenced on MiSeq and HiSeq2500 systems (Illumina) using PE100 reads. The median total read count per sample was 500,000, with an on-target mappability of 25% (~125,000 on-target non-unique reads).


Optimization of Capture Coverage Uniformity


Deep sequencing of the original pilot capture experiments showed significant differences between the numbers of reads captured by the most efficient and the least efficient probes (60-65% of captured regions with coverage >0.2× of average). To ameliorate this, relative efficiencies were calculated from the sequencing data, and probes were mixed at adjusted molar ratios. This increased capture uniformity to 85% of regions at >0.5× of average coverage.
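One way to read the rebalancing step is to scale each probe's molar amount by the inverse of its relative efficiency (region coverage divided by mean coverage). Uniformity can then be tracked as the fraction of regions above 0.5× mean coverage. The inverse-efficiency rule is an assumption, since the text does not give the exact adjustment formula. The function names are hypothetical.

```python
from statistics import mean
from typing import Dict


def uniformity(coverage: Dict[str, float], fraction_of_mean: float = 0.5) -> float:
    """Fraction of captured regions with coverage above fraction_of_mean x mean coverage."""
    values = list(coverage.values())
    threshold = fraction_of_mean * mean(values)
    return sum(v > threshold for v in values) / len(values)


def rebalanced_ratios(coverage: Dict[str, float]) -> Dict[str, float]:
    """Relative molar amount per probe, inversely proportional to its relative efficiency.

    Probes with zero coverage are left out because their efficiency cannot be estimated.
    Ratios are normalized to average 1 (1 = the original equimolar amount).
    """
    avg = mean(list(coverage.values()))
    inverse = {probe: avg / c for probe, c in coverage.items() if c > 0}
    scale = mean(list(inverse.values()))
    return {probe: r / scale for probe, r in inverse.items()}
```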


Sequencing Data Analysis


Mapping of sequencing reads was performed using the software tool bisReadMapper with some modifications. First, UMIs were extracted from each sequencing read and appended to the read headers within FASTQ files using a custom script. Reads were converted on the fly as if all Cs were unmethylated. They were then mapped with Bowtie2 to in-silico converted strands of the human genome, also converted as if all Cs were unmethylated. Original reads were then merged and filtered for a single UMI, i.e., duplicate reads carrying the same UMI were discarded, leaving a single unique read. Methylation frequencies were calculated for all CpG dinucleotides within the regions captured by padlock probes. Each frequency is the number of unique reads carrying a C at the interrogated position divided by the total number of reads covering that position.
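The in-silico conversion, UMI collapsing, and methylation-frequency steps can be sketched as below. The sketch assumes each read has already been mapped and reduced to a (region, UMI, base-at-CpG) record. Keeping the first read per UMI is a simplification. bisReadMapper and Bowtie2 are not called here, and all names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (region id, UMI, base observed at the interrogated CpG cytosine on the read)
Read = Tuple[str, str, str]


def c_to_t(seq: str) -> str:
    """In-silico conversion as if every C were unmethylated (used for mapping only)."""
    return seq.replace("C", "T")


def methylation_frequency(reads: Iterable[Read]) -> Dict[str, float]:
    """Collapse reads by UMI within each region, then compute the methylated fraction."""
    unique: Dict[Tuple[str, str], str] = {}
    for region, umi, base in reads:
        # Keep the first read seen for each (region, UMI); later duplicates are discarded.
        unique.setdefault((region, umi), base)
    methylated: Dict[str, int] = defaultdict(int)
    covered: Dict[str, int] = defaultdict(int)
    for (region, _umi), base in unique.items():
        if base in ("C", "T"):  # C: protected from conversion (methylated); T: unmethylated
            covered[region] += 1
            methylated[region] += base == "C"
    return {region: methylated[region] / covered[region] for region in covered}
```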


DNA Isolation and Digital Quantitative PCR


Tumor and corresponding plasma samples were obtained from patients undergoing surgical tumor resection. Samples were frozen and preserved at −80° C. until use. DNA and RNA were isolated from samples using the DNA/RNA MiniPrep kit and a cfDNA extraction kit, respectively (EliteHealth, Guangzhou Youze, China). To estimate tumor cfDNA fractions, mixing experiments were performed with various fractions of normal cfDNA and HCC tumor genomic DNA (gDNA), and methylation values and copy numbers were assayed by dPCR (see next section for details). Digital droplet PCR (ddPCR) was performed according to the manufacturer's specifications (Bio-Rad, Hercules, Calif.). The following ddPCR assay was used in this study for cg10590292:

- forward primer: 5′-TGTTAGTTTTTATGGAAGTTT-3′
- reverse primer: 5′-AAACIAACAAAATACTCAAA-3′
- fluorescent probe for methylated allele detection: 5′/6-FAM/TGGGAGAGCGGGAGAT/BHQ1/-3′
- probe for unmethylated allele detection: 5′/HEX/TTTGGGAGAGTGGGAGATTT/BHQ1/-3′

The following cycling conditions were used: 1× of 10 mins @ 98° C., 40× of (30s @ 98° C., 60s @ 53° C.), 1× of 10 mins @ 98° C.
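In this assay, methylated and unmethylated alleles are read out in separate channels (6-FAM and HEX probes). A common way to turn droplet counts into a methylated fraction is to apply a Poisson correction, λ = −ln(1 − positive/total), to each channel. The sketch below uses hypothetical droplet counts. The manufacturer's software performs its own quantification, and this is not presented as the authors' analysis.

```python
import math


def copies_per_droplet(positive: int, total: int) -> float:
    """Poisson-corrected mean target copies per droplet: lambda = -ln(1 - positive/total)."""
    if not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total droplets")
    return -math.log(1.0 - positive / total)


def methylated_fraction(fam_positive: int, hex_positive: int, total: int) -> float:
    """Methylated copies (FAM channel) as a fraction of all copies (FAM + HEX channels)."""
    lam_m = copies_per_droplet(fam_positive, total)
    lam_u = copies_per_droplet(hex_positive, total)
    return lam_m / (lam_m + lam_u) if lam_m + lam_u > 0 else float("nan")
```

The correction matters once droplets start carrying more than one copy. For example, 3,000 FAM-positive and 1,000 HEX-positive droplets out of 15,000 give about 0.764, rather than the uncorrected 0.75.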


Calculation of Tumor cfDNA Fraction


It was assumed that the methylation value observed for an HCC cfDNA sample results from the combined contribution of normal and tumor cfDNA. The fraction of cfDNA originating from the tumor was estimated using the following formula: fraction contributed from tumor DNA in sample i = [methylation value in HCC cfDNA in sample i − mean methylation value of normal cfDNA]/[mean methylation value of tumor DNA − mean methylation value of normal cfDNA]. Using this approach, the tumor fraction was estimated to average around 23% in HCC cfDNA samples. Samples were then grouped according to factors that reflect tumor load, such as advanced stage and pre-treatment status, since these factors are expected to affect the tumor fraction in cfDNA. Indeed, conditions associated with higher tumor stage and severity also tended to have a larger tumor fraction. To further vet this approach, a mixing experiment with different fractions of normal cfDNA (0-100%) and tumor genomic DNA (0-100%) was performed, and methylation values were assayed using digital PCR. Incremental addition of tumor genomic DNA increased the methylation fraction up to the values observed in the HCC patient samples. Specifically, addition of 10%, 20%, 40%, 60%, or 100% tumor genomic DNA could be predicted by the above formula when using methylation values obtained from the experiment.
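The tumor-fraction formula above translates directly into code. The sketch also includes the linear mixing model implied by the mixing experiment, so that a known tumor fraction can be recovered from its expected methylation value. Values are assumed to be methylation fractions on a 0-1 scale. No clipping to [0, 1] is applied, because the text does not specify one. The function names are hypothetical.

```python
from statistics import mean
from typing import Sequence


def tumor_fraction(sample_value: float,
                   normal_cfdna_values: Sequence[float],
                   tumor_dna_values: Sequence[float]) -> float:
    """(sample - mean normal cfDNA) / (mean tumor DNA - mean normal cfDNA)."""
    normal = mean(normal_cfdna_values)
    tumor = mean(tumor_dna_values)
    if tumor == normal:
        raise ValueError("marker does not separate tumor DNA from normal cfDNA")
    return (sample_value - normal) / (tumor - normal)


def expected_mixture_value(fraction: float, normal: float, tumor: float) -> float:
    """Methylation value expected for a linear mix of tumor and normal DNA."""
    return fraction * tumor + (1 - fraction) * normal
```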


Statistical Analysis


DNA Methylation Marker Selection for Diagnostic and Prognostic Analysis


Of the 2000 initially designed padlock probes, only 1673 were informative, i.e., able to give positive and specific PCR amplification signals. These were used as capture probes in subsequent experiments with cfDNA samples. Sequencing depth was used as a sample inclusion criterion: samples in which fewer than 100 MCBs (see below) showed 10× read coverage were excluded from further analysis. Since each MCB incorporated on average ~3 CG markers, 10× coverage ensured at least 30 methylation measurements per MCB. Using these criteria, 73% of all samples, with a median of 34K mapped reads per sample, were included.


After DNA methylation data were obtained for 1673 CG markers, proximal CpG markers were merged into methylation correlated blocks (MCBs), resulting in a total of 888 MCBs. For each MCB, the MCB-specific methylation value was quantified with two numbers: log10(total methylated read count + 1) and log10(total unmethylated read count + 1). The log transform was used to reduce outlier effects.
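The per-MCB log transform and the sample inclusion rule can be sketched as follows. The depth threshold is left as a parameter, because the text states both a 10× and a >30× coverage criterion. The function names and the dictionary input are hypothetical.

```python
import math
from typing import Dict, Tuple


def mcb_features(methylated_reads: int, unmethylated_reads: int) -> Tuple[float, float]:
    """Two features per MCB: log10(methylated + 1) and log10(unmethylated + 1)."""
    return math.log10(methylated_reads + 1), math.log10(unmethylated_reads + 1)


def sample_passes(mcb_depths: Dict[str, int], min_depth: int = 10, min_mcbs: int = 100) -> bool:
    """Include a sample only if at least `min_mcbs` MCBs reach `min_depth` reads."""
    return sum(depth >= min_depth for depth in mcb_depths.values()) >= min_mcbs
```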


As noted above, 1673 informative padlock probes that gave positive and specific PCR amplification signals were obtained and used as capture probes in subsequent experiments with cfDNA samples. cfDNA samples with fewer than 100 MCBs at >30× coverage were also eliminated. Methylated reads for each marker were defined as total unique methylated reads. The methylation value for each marker was defined as the number of methylated reads divided by the total read count.


cfDNA-Based Diagnostic Classifier Construction Using MCBs (Cd-Score)


cfDNA sample data obtained from patients diagnosed with liver cancer (HCC) or lung cancer (LUNC) and from normal controls were divided into training and validation cohorts by randomly splitting the full dataset with a 1:1 ratio.

Marker selection: within the training cohort, the “randomized lasso” scheme was adopted to reduce sampling dependency and stabilize variable selection, so that biomarkers could be selected with high confidence. The training set was first randomly divided with a 1:1 ratio. The variable selection procedure was conducted on two thirds of the samples, and one third of the samples was withheld to evaluate the performance of the feature selection process. The feature selection process consisted of two steps repeated 50 times. MCBs were included for training the final model if they were selected in 40 out of 50 feature selection iterations. A multi-class prediction system based on Friedman et al., 2010, J Stat Softw 33, 1-22 was constructed to predict the group membership of samples in the test data using the selected panel of MCBs. A confusion matrix and ROC curves were also provided to evaluate sensitivity and specificity, in addition to prediction accuracy based on the held-out partition of the training set.
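The final aggregation step of the randomized-lasso scheme, which keeps MCBs selected in 40 of the 50 resampled fits, is a simple vote count. The sketch assumes the per-iteration LASSO fits (for example, with glmnet) have already produced one set of selected MCB names each. It treats "40 out of 50" as a minimum count. The MCB names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Set


def stable_features(selections: Iterable[Set[str]], min_count: int = 40) -> List[str]:
    """Return MCBs selected in at least `min_count` of the resampled LASSO fits."""
    counts: Counter = Counter()
    for selected in selections:
        counts.update(selected)  # each MCB counted at most once per iteration
    return sorted(mcb for mcb, n in counts.items() if n >= min_count)


# Hypothetical output of 50 resampled fits.
runs = [{"MCB_A", "MCB_B"}] * 40 + [{"MCB_A", "MCB_C"}] * 10
```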


Classification process: a two-step classification process (cancer vs normal, then LUNC vs HCC) was employed by building two binary multinomial logistic regression models. Multinomial logistic regression has the advantage of yielding an intuitive probability score that is easy to interpret. For example, if the cancer-vs-normal model yields a probability score of 70% for a given methylation profile, this suggests that the patient has a 70% chance of having cancer. To minimize the number of false cancer predictions, the cancer prediction confidence threshold was set to 80%. For patients with at least an 80% chance of cancer, the cancer-vs-cancer regression model was applied to classify between LUNC and HCC. This model made a call only if the classified sample had a confidence of over 55%.
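The two-step decision rule can be written as a small function. It assumes the two fitted models output P(cancer) and P(HCC | cancer). The text does not state how sub-threshold samples are labeled, so the return labels here are hypothetical.

```python
def two_step_call(p_cancer: float, p_hcc: float,
                  cancer_threshold: float = 0.80,
                  type_threshold: float = 0.55) -> str:
    """Step 1: cancer vs normal; step 2: HCC vs LUNC, only for confident cancer calls.

    p_cancer -- probability of cancer from the cancer-vs-normal model
    p_hcc    -- probability of HCC (vs LUNC) from the cancer-vs-cancer model
    """
    if p_cancer < cancer_threshold:
        return "no cancer call"
    p_lunc = 1.0 - p_hcc
    if p_hcc > type_threshold:
        return "HCC"
    if p_lunc > type_threshold:
        return "LUNC"
    return "cancer, type undetermined"
```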


Building a Predictive Model for Prognosis and Survival


The potential of a combined prognosis score (cp-score) system to predict prognosis in LUNC and HCC was investigated. The score was based on both methylated and non-methylated reads for each MCB in cfDNA, combined with clinical and demographic characteristics including age, gender, and AJCC stage. For each type of cancer, a cp-score model was built and validated by randomly selecting half of the observations from the full dataset as the training cohort and treating the rest as the validation cohort. Variable selection was conducted on the training cohort, and the composite score was built on the validation cohort. Within the training cohort, the “randomized lasso” scheme was adopted to reduce sampling dependency and stabilize variable selection in order to select biomarkers with high confidence. The entire cohort was randomly divided with a 1:1 ratio. The variable selection procedure was conducted on two thirds of the training cohort. LASSO was implemented with an optimal tuning parameter determined by either the expected generalization error from 10-fold cross-validation or the information-based criteria AIC/BIC, whichever yielded the highest proportion of explained randomness with the selected biomarkers. The 10 most recurring features in HCC and in LUNC (Table 8) were then aggregated. To evaluate the predictive ability of each panel externally, a composite score was obtained for each patient in the validation cohort by multiplying the unbiased coefficient estimates from the Cox regression by the methylation reads. Kaplan-Meier curves and log-rank tests were generated using the dichotomized composite score, which assigned patients to high-risk and low-risk groups according to its median. This segmentation was compatible with that formed by AJCC stage.
Time-dependent ROC analysis was used to summarize the discrimination potential of the composite score, AJCC stage, and the combination of the two, with ROC curves varying as a function of time and accommodating censored data. Finally, a multivariate Cox regression model was fitted to assess the significance of potential risk factors.
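The composite score is a Cox linear predictor: each marker value is multiplied by its coefficient and the products are summed. Patients are then dichotomized at the median, or, as in the Table 5 analysis, at a fixed cutoff such as −0.24, with scores above the cutoff labeled high risk. Below is a minimal sketch with hypothetical marker names and values.

```python
from statistics import median
from typing import Dict, List, Optional, Sequence


def composite_scores(coefs: Dict[str, float],
                     samples: Sequence[Dict[str, float]]) -> List[float]:
    """Composite (linear-predictor) score: sum of Cox coefficient x marker value."""
    return [sum(beta * sample[marker] for marker, beta in coefs.items())
            for sample in samples]


def risk_groups(scores: Sequence[float], cutoff: Optional[float] = None) -> List[str]:
    """Dichotomize scores at the median (default) or at a fixed cutoff."""
    cut = median(scores) if cutoff is None else cutoff
    return ["high" if s > cut else "low" for s in scores]


coefs = {"mcb_a": 1.0, "mcb_b": -2.0}  # hypothetical coefficients
samples = [{"mcb_a": 1.0, "mcb_b": 0.0},
           {"mcb_a": 0.0, "mcb_b": 1.0},
           {"mcb_a": 2.0, "mcb_b": 1.0}]
scores = composite_scores(coefs, samples)
```

The Table 5 coefficients could be substituted for the toy ones. However, the scaling of the marker values used in that model is not stated here, so no Table 5 scores are computed.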


All analyses were conducted in R (version 3.2.3) and Python (version 2.7.13) using the following packages: ‘glmnet’, ‘limma’, ‘survival’, ‘sklearn’, ‘lifelines’, ‘survivalROC’, and ‘survcomp’.


All hypothesis tests were two-sided, with p<0.05 considered statistically significant unless otherwise stated.


Patient and Sample Characteristics


Clinical characteristics and molecular DNA methylation profiles were collected for 827 LUNC and 377 HCC tumor samples from The Cancer Genome Atlas (TCGA) and 754 normal samples from a dataset used in our previous methylation study on aging (GSE40279) (Hannum et al., 2013). Two cohorts of patients were studied. The first cohort was from solid tumor samples from TCGA and the second cohort was from plasma samples from China. To study cfDNA in LUNC and HCC, plasma samples were obtained from 2,396 Chinese patients with HCC or LUNC, and from randomly selected, population-matched healthy controls undergoing routine health care maintenance, resulting in a cohort of 892 LUNC and 1504 HCC patients and 2247 normal healthy controls. Informed written consent was obtained from each study participant. Clinical characteristics of all patients and controls are listed in Table 9.


Identification of Methylation Markers Differentiating LUNC and HCC and Blood


Previous reports indicate that plasma contains DNA released from tissues throughout the body. cfDNA originating from tumor cells can be detected against a background of cfDNA predominantly released from leukocytes. It was therefore hypothesized that CpG markers with maximal differences in methylation values between LUNC or HCC and normal leukocytes would be the most likely to show detectable methylation differences in the cfDNA of HCC or LUNC patients compared with that of normal controls. To identify putative markers, methylation data from TCGA cancer tissue DNA and from normal blood were compared, comprising 827 LUNC, 377 HCC, and 754 blood samples from healthy controls. To identify DNA sites with significantly different methylation rates between LUNC or HCC and normal blood, a t-statistic with empirical Bayes variance shrinkage was used. The top 1000 significant markers were selected, using the Benjamini-Hochberg procedure to control the FDR at a significance level of 0.05. Unsupervised hierarchical clustering of these top 1000 markers distinguished LUNC, HCC, and normal blood, and also separated LUNC from HCC (FIG. 25). About 2,000 molecular inversion (padlock) probes corresponding to these 2000 markers (1000 for cancer versus normal and 1000 for LUNC versus HCC) were then designed for capture-sequencing of cfDNA from plasma.


cfDNA Diagnostic Prediction Model for LUNC and HCC


The methylation data of the 888 Methylation Correlated Blocks (MCBs) that showed good methylation ranges in cfDNA samples were further analyzed. The goal was to identify MCBs with significantly different methylation between cancer samples (LUNC and HCC) and normal control samples. Unsupervised hierarchical clustering of these selected MCBs using methylated reads across samples is shown in FIG. 25C, and distributions of MCB methylated read values for normal, LUNC, and HCC samples are shown in FIG. 26. The entire methylation dataset of 888 MCBs was then analyzed with the Least Absolute Shrinkage and Selection Operator (LASSO) method to further reduce the number of MCBs. LASSO-based feature selection identified 77 unique markers (5 MCBs overlap between models):

- 28 MCBs for discriminating LUNC from HCC and normal
- 27 MCBs for discriminating HCC from LUNC and normal
- 22 MCBs for discriminating normal from HCC and LUNC

This approach combined the information captured by the MCBs into a composite cfDNA-based score (composite diagnostic score: cd-score). The utility of this score for predicting the presence of LUNC or HCC was evaluated using a hold-out strategy, in which samples were randomly assigned to a training set and a validation set with a 1:1 ratio. The scoring system was trained using 229 LUNC, 444 HCC, and 1123 normal control cfDNA samples and then validated on 300 LUNC, 445 HCC, and 1124 normal samples. Applying the fitted model to the validation set yielded a sensitivity of 92.4% for HCC and 85.8% for LUNC, and a specificity of 99.0% for normal controls in a multi-classification scheme (Table 6A). The model successfully differentiated LUNC and HCC samples from normal controls in the validation cohort (AUC cancer vs normal=0.979; AUC LUNC vs HCC=0.924; FIG. 19A, Table 6B, Table 6C).
Unsupervised hierarchical clustering of the 77 MCBs was able to distinguish HCC and LUNC from normal controls with high specificity and sensitivity (FIG. 19C and FIG. 19D).
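The AUC values reported here can be read as the probability that a randomly chosen cancer sample receives a higher score than a randomly chosen normal sample. This is the Mann-Whitney formulation of ROC AUC. Below is a brute-force sketch with hypothetical scores. It is O(n·m) and intended only to make the definition concrete.

```python
from typing import Sequence


def auc(scores_pos: Sequence[float], scores_neg: Sequence[float]) -> float:
    """ROC AUC as P(positive score > negative score), counting ties as 1/2."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))
```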


Liver diseases, such as cirrhosis and fatty liver, are major risk factors for HCC. Thus, the ability of the cd-score to differentiate between liver diseases and HCC was assessed. The cd-score was able to differentiate HCC patients from those with liver diseases or healthy controls (FIG. 20A). These results were consistent and comparable with those predicted by AFP levels in HCC (FIG. 20B). The cd-score could also differentiate LUNC patients from non-LUNC patients with a smoking history (>1 pack/day for ten years) who were at increased risk of LUNC (FIG. 20C). These results were consistent and comparable with those predicted by AFP levels in HCC (FIG. 20D).


Methylation Profiles Predicted Tumor Burden, Treatment Response and Staging


Next, the utility of the cd-score in assessing treatment response, the presence of residual tumor following treatment, and the staging of LUNC and HCC was studied.

In LUNC, the cd-scores of patients with detectable residual tumor following treatment (n=559) were significantly higher than those of patients with no detectable tumor (n=160) (p<0.001, FIG. 21A). Similarly, there was good correlation between cd-scores and tumor stage: patients with early-stage disease (I, II) had substantially lower cd-scores than those with advanced-stage disease (III, IV) (p<0.005, FIG. 21B). In addition, the cd-scores were significantly lower in patients with complete tumor resection after surgery (n=158) than in those before surgery (n=67), yet became higher in patients with recurrence (n=56) (p<0.01, FIG. 21C). Furthermore, the cd-scores were significantly higher in patients before treatment (n=67) or with progression (n=136) than in those with a positive treatment response (n=328) (p<0.001, FIG. 21D).

In HCC, the cd-scores of patients with detectable residual tumor following treatment (n=889) were significantly higher than those of patients with no detectable tumor (n=314) (p<0.0001, FIG. 21E). Similarly, there was a highly positive correlation between cd-scores and tumor stage: patients with early-stage disease (I, II) had substantially lower cd-scores than those with advanced-stage disease (III, IV) (p<0.001, FIG. 21F). In addition, the cd-scores were significantly lower in patients with complete tumor resection after surgery (n=293) than in those before intervention (n=109), yet became higher in patients with recurrence (n=155) (p<0.01, FIG. 21G). Furthermore, the cd-scores were significantly higher in patients before treatment (n=109) or with progression (n=381) than in those with treatment response (n=249) (p<0.001, FIG. 21H).
Serial longitudinal changes in the methylation values of CpG site cg10673833 were obtained in several individual LUNC or HCC patients to monitor treatment response. A high correlation was found between methylation values and treatment outcomes (FIG. 28, FIG. 29, and FIG. 30). Collectively, the results showed a significant correlation between the cd-score (i.e., the amount of cfDNA in plasma) and tumor burden, demonstrating its utility for predicting tumor response and for surveillance to detect recurrence.


Diagnostic Utility of cfDNA as Compared with AFP and CEA


Despite enormous efforts, an effective noninvasive blood-based biomarker for surveillance and diagnosis of LUNC and HCC is still lacking. CEA (carcinoembryonic antigen) and AFP have filled this role for lung cancer and HCC, respectively, for decades, but their sensitivity and specificity are inadequate. Moreover, some patients with squamous cell carcinoma or small cell lung cancer do not have increased blood CEA levels. AFP has a low sensitivity of 60%, which makes it inadequate for detecting all patients who will develop HCC and severely limits its clinical utility. In fact, it is common for cirrhotic patients with HCC to show no increase in AFP levels. Strikingly, 30% of patients in the HCC study cohort had a normal AFP value (<25 ng/ml).

In biopsy-proven LUNC patients of the entire cohort, the cd-score demonstrated superior sensitivity and specificity to CEA for LUNC diagnosis (AUC 0.977 (cd-score) vs 0.856 (CEA), FIG. 21Q). Both cd-score and CEA values were highly correlated with tumor stage (FIG. 21J, FIG. 21B). Likewise, in biopsy-proven HCC patients, the cd-score demonstrated superior sensitivity and specificity to AFP for HCC diagnosis (AUC 0.993 vs 0.835, FIG. 4R). Both cd-score and AFP values were highly correlated with tumor stage (FIG. 21F and FIG. 21N). In HCC patients with treatment response, tumor recurrence, or progression, the cd-score showed more changes from initial diagnosis than AFP did (FIG. 21G and FIG. 21H, FIG. 21O and FIG. 21P). In LUNC patients with treatment response, tumor recurrence, or progression, the cd-score showed more significant changes from initial diagnosis than CEA did (FIG. 21C and FIG. 21D, FIG. 21K and FIG. 21L).

In LUNC and HCC patients with serial samples, there was a concomitant and significant decrease in cd-score in patients with a positive treatment response compared with patients prior to treatment. There was an even further reduction in cd-score in patients after surgery.
In contrast, there was an increase in methylation rate in patients with progressive or recurrent disease (FIG. 28). By comparison, CEA and AFP were less sensitive for assessing treatment efficacy in individual patients (FIG. 29 and FIG. 30).


cfDNA Prognostic Model for HCC and LUNC


The potential of a combined prognosis score (cp-score) based on cfDNA methylation analysis to predict prognosis in LUNC and HCC was investigated, in combination with clinical and demographic characteristics including age, gender, AJCC stage, and AFP value. In total, 599 LUNC patients and 867 HCC patients were enrolled in the prognosis analysis (patients without tumor burden were excluded). The median follow-up time was 9.5 months (range 0.6-26 months) in the LUNC cohort and 6.7 months (range 1.2-21.0 months) in the HCC cohort.

In the HCC cohort, the training dataset contained 433 observations with 41 events and the validation dataset contained 434 observations with 58 events. Using statistical learning methods, a predictive model was constructed with 10 CpG MCBs (Table 8) that separated the HCC cohort into high- and low-risk groups. Median survival was significantly greater in the low-risk group than in the high-risk group (log-rank test=24.323, df=1, p<0.001) (FIG. 22A).

In the LUNC cohort, the training dataset contained 299 observations with 61 events and the validation dataset contained 434 observations with 58 events. A panel of 10 CpG markers (Table 8) divided the LUNC cohort into high- and low-risk groups, with median survival significantly greater in the low-risk group than in the high-risk group (log-rank test=6.697, df=1, p<0.001) (FIG. 22B).
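The log-rank statistic reported with df=1 compares observed and expected deaths in one group at each event time. Below is a minimal sketch of the standard two-group test. It assumes 0/1 group labels (for example, low/high risk). The study used R's survival tools, so this function is illustrative only and its name is hypothetical.

```python
import math
from typing import Sequence, Tuple


def log_rank(times: Sequence[float], events: Sequence[int],
             groups: Sequence[int]) -> Tuple[float, float]:
    """Two-group log-rank test (group labels 0/1); returns (chi-square, p) with df = 1."""
    data = list(zip(times, events, groups))
    event_times = sorted({t for t, e, _ in data if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        at_risk = [(tt, e, g) for tt, e, g in data if tt >= t]
        n = len(at_risk)
        n1 = sum(1 for _, _, g in at_risk if g == 1)
        d = sum(1 for tt, e, _ in at_risk if tt == t and e)
        d1 = sum(1 for tt, e, g in at_risk if tt == t and e and g == 1)
        # Observed minus expected deaths in group 1 at this event time.
        observed_minus_expected += d1 - d * n1 / n
        # Hypergeometric variance of d1 given the risk-set margins.
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = observed_minus_expected ** 2 / variance if variance > 0 else 0.0
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p
```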


A multivariate Cox regression model showed that the cp-score was significantly correlated with mortality in both HCC and LUNC. The cp-score was an independent risk factor for survival in the HCC validation cohort and was borderline in the LUNC validation cohort (hazard ratio=2.4881, p=0.000721 in HCC; hazard ratio=1.74, p=0.068 in LUNC; p=0.0017 in LUNC; Table 10). Interestingly, when the cp-score and other clinical characteristics were taken into account in HCC, AFP was no longer a significant risk factor (Table 10). As expected, TNM stage (as defined by AJCC guidelines) predicted the prognosis of patients in both HCC (FIG. 22C) and LUNC (FIG. 22D). Combining the cp-score with TNM staging improved prognostic prediction in both the HCC (AUC 0.867, FIG. 22E) and LUNC cohorts (AUC 0.825, FIG. 22F).
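The hazard ratios in Table 10 follow from the Cox coefficients as exp(coef). A minimal sketch (the helper name is hypothetical) reproduces the HCC and LUNC cp-score rows from coef and se(coef) alone. Doing so shows that the "lower 0.95" and "upper 0.95" columns of Table 10 match the 95% interval on the coefficient (log hazard ratio) scale; exponentiating those bounds gives the interval for the hazard ratio itself.

```python
import math


def cox_term_summary(coef, se, z_crit=1.959964):
    """Wald summary of one Cox regression term from its coefficient and SE."""
    z = coef / se
    lo, hi = coef - z_crit * se, coef + z_crit * se
    return {
        "hr": math.exp(coef),                         # hazard ratio = exp(coef)
        "z": z,
        "p": math.erfc(abs(z) / math.sqrt(2)),        # two-sided normal p-value
        "coef_ci": (lo, hi),                          # 95% CI, log-HR scale
        "hr_ci": (math.exp(lo), math.exp(hi)),        # 95% CI, HR scale
    }


# cp-score rows of Table 10 (validation cohorts)
hcc = cox_term_summary(0.9115, 0.2696)
lunc = cox_term_summary(0.5577, 0.3056)
for name, s in (("HCC", hcc), ("LUNC", lunc)):
    print(name, round(s["hr"], 4), round(s["p"], 4),
          tuple(round(x, 4) for x in s["hr_ci"]))
```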


Methylation Markers in Early Diagnosis of LUNC and HCC


Because LUNC and HCC are very aggressive cancers with poor prognosis and survival, and surgical removal of cancer at stage I carries a much more favorable prognosis, early detection is a key strategy for reducing morbidity and mortality. The use of methylation markers for predicting cancer occurrence in high-risk populations was investigated in two prospective studies. In the first study, consecutive patients were recruited from a group of patients with solid lung nodules >10 mm identified on chest CT scans. These patients were enrolled in a study of early lung cancer detection in smokers and underwent CT scan-based lung cancer screening. Patients presenting with a solid lung nodule between 10 mm and 30 mm in size (n=208, Table 11) underwent methylation profiling at the time of screening. These patients were subsequently followed through secondary testing, with tissue biopsy and pathology verification, to determine whether their nodules were due to LUNC or to a benign inflammatory or infectious condition. The methylation profile was sufficient to differentiate patients with biopsy-proven stage I LUNC lesions from patients with benign nodules due to inflammatory or infectious conditions (FIG. 23, Table 11). Among the patients with at least 59% confidence for diagnosis, the positive predictive value (PPV) for stage I cancer was 95.9% and the negative predictive value (NPV) was 97.4%. Similarly, high-risk patients with liver cirrhosis (n=236, Table 12) were prospectively enrolled. Among the patients with at least 58% confidence for diagnosis, the methylation profile predicted progression to stage I HCC with 89.5% sensitivity and 98.2% specificity (FIG. 24, Table 12); the PPV was 80.9% and the NPV was 99.1%.
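Unlike sensitivity and specificity, PPV and NPV depend on how common the disease is in the tested group. The sketch below applies Bayes' rule to illustrate that dependence. The prevalence values are hypothetical, because the text does not state the prevalence behind its PPV and NPV figures; the 0.895 and 0.982 inputs are simply the HCC sensitivity and specificity quoted above.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    tp = sensitivity * prevalence              # true positives per subject tested
    fn = (1 - sensitivity) * prevalence        # false negatives
    tn = specificity * (1 - prevalence)        # true negatives
    fp = (1 - specificity) * (1 - prevalence)  # false positives
    return tp / (tp + fp), tn / (tn + fn)


# Hypothetical prevalences; the source does not report the one it used.
for prev in (0.05, 0.20, 0.50):
    ppv, npv = predictive_values(0.895, 0.982, prev)
    print(f"prevalence {prev:.2f}: PPV {ppv:.3f}, NPV {npv:.3f}")
```

As prevalence falls, PPV drops and NPV rises, which is why PPV and NPV from an enriched study cohort may not transfer directly to a screening population.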


In this study, differentially methylated CpG sites were first determined in LUNC and HCC tumor samples versus normal blood. These markers were then interrogated in the cfDNA of a large cohort of LUNC and HCC patients as well as normal controls. A diagnostic model (cd-score) was developed using cfDNA methylation to predict the presence of cancer while at the same time differentiating between LUNC and HCC.


The cd-score discriminated patients with HCC from individuals with HBV/HCV infection, cirrhosis, or fatty liver disease, as well as from healthy controls. In some instances, it is important that a serum test reliably distinguish these disease states from HCC. According to the results, the sensitivity of the cd-score for HCC is comparable to that of liver ultrasound, the current standard for HCC screening, and in some instances it is superior to that of AFP, the only clinically used biomarker for HCC, which may make the cd-score a more cost-effective and less resource-intensive approach. Furthermore, the cd-score correlated strongly with HCC tumor burden, treatment response, and stage, and it performed better than AFP in the instant cohort (AFP values were within the normal range for 40% of our HCC patients during the entire course of their disease). In some cases, the cd-score may be particularly useful for assessing treatment response and for surveillance for recurrence in HCC. Because nearly all of the HCC patients in the study had hepatitis (most likely hepatitis B), HCC arising from other etiologies may have different cfDNA methylation patterns. Similar to HCC, screening for lung cancer has a high cost, involving CT imaging of the chest, which carries radiation exposure and a high false-positive rate. In some cases, the cd-score reliably distinguished smokers from patients with lung cancer, and it may also have utility in improving screening and surveillance.


Prognostic prediction models were also constructed for HCC and LUNC from the cp-score. The cp-score effectively distinguished HCC and LUNC patients with different prognoses and was validated as an independent prognostic risk factor in a multivariable analysis in our cohorts. Of note, for predicting prognosis in HCC, cfDNA analysis was again superior to AFP. In some cases, this type of analysis is helpful for identifying patients who need more or less aggressive treatment and surveillance.

TABLE 6A

Contingency table of multi-classification diagnosis in validation cohort

Prediction            HCC    LUNC  Normal
HCC                   329      19       6
LUNC                   23     145       6
Normal                  4       5     921
Undecided              89     131     191
Totals                354     174     930
Correct               329     145     921
Sensitivity (%)      92.4    85.8
Specificity (%)                      99.0

TABLE 6B

Contingency table of binary classification diagnosis between HCC and normal

Prediction            HCC  Normal
Cancer                371      16
Normal                  4     921
Undecided              70     187
Totals                375     937
Correct               371     921
Sensitivity (%)      98.9
Specificity (%)              98.3

TABLE 6C

Contingency table of binary classification diagnosis between LUNC and normal

Prediction           LUNC  Normal
Cancer                188      16
Normal                  5     921
Undecided             107     187
Totals                193     937
Correct               188     921
Sensitivity (%)      97.4
Specificity (%)              98.3

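In Table 6B the columns are the true groups (the "Totals" row equals the column sums of the definite calls) and the rows are the predictions. A short check, assuming undecided samples are excluded from the denominators, reproduces the reported 98.9% sensitivity and 98.3% specificity; counting undecided samples as misses instead gives a more conservative figure. The helper names are hypothetical.

```python
def decided_metrics(tp, fn, fp, tn):
    """Sensitivity and specificity over samples that received a definite call."""
    return tp / (tp + fn), tn / (tn + fp)


def all_sample_metrics(tp, fn, fp, tn, undecided_pos, undecided_neg):
    """Same metrics if every undecided sample is counted as an incorrect call."""
    return tp / (tp + fn + undecided_pos), tn / (tn + fp + undecided_neg)


# Counts read from Table 6B (columns = true HCC / true normal).
sens, spec = decided_metrics(tp=371, fn=4, fp=16, tn=921)
sens_all, spec_all = all_sample_metrics(371, 4, 16, 921,
                                        undecided_pos=70, undecided_neg=187)
print(f"decided only: {sens:.1%} / {spec:.1%}")
print(f"undecided counted as wrong: {sens_all:.1%} / {spec_all:.1%}")
```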
TABLE 7

List of MCBs selected by multi-class LASSO and used for cd-score generation. A, normal; B, LUNC; C, HCC.

                                                Logistic regression coefficients
MCB         Read counts  Target ID   RefGene    LUNC vs HCC    cancer vs normal

A: Normal

1-1693966   mc           cg00100121  C1orf114    0.058289236   −0.460201686
            non_mc       cg00100121  C1orf114   −0.264294546   −0.362527983
11-474165   mc           cg13912307  SLC39A13   −0.019821436   −0.342748675
            non_mc       cg13912307  SLC39A13   −0.119495297   −0.097830945
11-791245   mc           cg08794954  ODZ4       −0.502459754   −0.119001342
            non_mc       cg08794954  ODZ4       −0.050537884   −0.237868292
12-561356   mc           cg00344358  GDF11       0.013443025   −0.152292508
            non_mc       cg00344358  GDF11       0.210631385   −0.24781796
16-7128     mc           cg05773599  WDR90      −0.220759943   −0.438656833
            non_mc       cg05773599  WDR90       0.123092944   −0.14690629
16-856199   mc           cg10174683              0.05314766     0.335012824
            non_mc       cg10174683             −0.182838798   −0.279797616
17-579157   mc           cg12054453  TMEM49      0.062499156   −0.478242327
            non_mc       cg12054453  TMEM49      0.549358501    0.925660526
17-742484   mc           cg24166450              0.240682421    0.067853997
            non_mc       cg24166450              0.443449771    0.566346654
2-1139315   mc           cg09366118  PSD4        0.098868936    0.269196214
            non_mc       cg09366118  PSD4        0.039975437   −0.005683265
2-293390    mc           cg21972382  CLIP4      −0.099097051   −0.61331161
            non_mc       cg21972382  CLIP4       0.074745373    0.2650865
22-321497   mc           cg03550506  DEPDC5     −0.068645581   −0.755902786
            non_mc       cg03550506  DEPDC5     −0.046222783   −0.124629981
3-1138223   mc           cg06722069             −0.427422216   −0.26910555
            non_mc       cg06722069              0.089047399   −0.071525276
4-13248     mc           cg07748255  MAEA        0.217866978   −0.457591409
            non_mc       cg07748255  MAEA       −0.011474644   −0.340582424
5-1489298   mc           cg22928002  CSNK1A1    −0.19633718    −0.006367323
            non_mc       cg22928002  CSNK1A1    −0.131881054    0.459975236
5-429520    mc           cg17757602              0.041309712    0.029262464
            non_mc       cg17757602             −0.133713183   −0.355503949
5-429521    mc           cg17757602              0.127773182    0.169659022
            non_mc       cg17757602             −0.3026458     −0.526377272
6-276491    mc           cg03161803             −0.134554024    0.040255772
            non_mc       cg03161803             −0.076945207   −0.517353281
6-912971    mc           cg01087382  MAP3K7      0.132083258   −0.263202543
            non_mc       cg01087382  MAP3K7      0.023651284    0.314783378
7-1017626   mc           cg06721601  CUX1        0.305704307    0.347231675
            non_mc       cg06721601  CUX1       −0.061678409    0.340362292
7-1577443   mc           cg27104173  PTPRN2      0.029132702   −0.207025939
            non_mc       cg27104173  PTPRN2      0.004420766   −0.098341178
9-885141    mc           cg13740515              0.028390415   −0.269846773
            non_mc       cg13740515             −0.208939054   −0.193859618
X-27307     mc           cg13176022  XG          0.511857283    0.042177783
            non_mc       cg13176022  XG          0.048888339   −0.232292001

B: LUNC

1-1693966   mc           cg00100121  C1orf114    0.058289236   −0.460201686
            non_mc       cg00100121  C1orf114   −0.264294546   −0.362527983
11-1169673  mc           cg16858415  SIK3        0.374145093    0.446746019
            non_mc       cg16858415  SIK3        0.297361321    0.42281836
11-476248   mc           cg05585544              0.570677409    0.329004001
            non_mc       cg05585544              0.127193074   −0.357434967
11-646423   mc           cg18518074  EHD1        0.305179647    0.273679057
            non_mc       cg18518074  EHD1       −0.170377122   −0.416439448
11-779080   mc           cg03423942  USP35       0.049737225    0.05832888
            non_mc       cg03423942  USP35      −0.152714594    0.278866832
11-791245   mc           cg08794954  ODZ4       −0.502459754   −0.119001342
            non_mc       cg08794954  ODZ4       −0.050537884   −0.237868292
12-687588   mc           cg20323175              0.148739937    0.171957693
            non_mc       cg20323175             −0.249738102    0.397951832
12-72767    mc           cg16959747  RBP5        0.09713964    −0.079430598
            non_mc       cg16959747  RBP5        0.055654584    0.2763291
13-256210   mc           cg25366582              0.168798263   −0.174946818
            non_mc       cg25366582              0.031754087    0.303827389
16-571473   mc           cg06880930  CPNE2       0.235333764    0.168616245
            non_mc       cg06880930  CPNE2      −0.072680275   −0.290080025
17-48361    mc           cg25526759  GP1BA      −0.01244657     0.090902282
            non_mc       cg25526759  GP1BA       0.14744695     0.344718909
17-579157   mc           cg12054453  TMEM49      0.062499156   −0.478242327
            non_mc       cg12054453  TMEM49      0.549358501    0.925660526
17-742484   mc           cg24166450              0.240682421    0.067853997
            non_mc       cg24166450              0.443449771    0.566346654
19-460568   mc           cg27391679  OPA3        0.044319948    0.082373402
            non_mc       cg27391679  OPA3       −0.243746298   −0.096052603
2-293390    mc           cg21972382  CLIP4      −0.099097051   −0.61331161
            non_mc       cg21972382  CLIP4       0.074745373    0.2650865
2-382010    mc           cg20626840  FAM82A1     0.237250003    0.741677389
            non_mc       cg20626840  FAM82A1     0.272198397    0.04877136
2-95267     mc           cg15545942  ASAP2      −0.557516889   −0.185093352
            non_mc       cg15545942  ASAP2      −0.11738118    −0.171014921
3-1138223   mc           cg06722069             −0.427422216   −0.26910555
            non_mc       cg06722069              0.089047399   −0.071525276
5-1767847   mc           cg04466840  RGS14      −0.206022964   −0.032074773
            non_mc       cg04466840  RGS14      −0.029935092    0.110871368
5-429520    mc           cg17757602              0.041309712    0.029262464
            non_mc       cg17757602             −0.133713183   −0.355503949
6-1574304   mc           cg17475813  ARID1B      0.129475454    0.045380298
            non_mc       cg17475813  ARID1B     −0.398696708   −0.14398793
6-283040    mc           cg08343881  ZNF323      0.002401379    0.00651448
            non_mc       cg08343881  ZNF323      0.255186684    0.345026825
6-352655    mc           cg02919168  DEF6       −0.372390965   −0.054727228
            non_mc       cg02919168  DEF6        0.070785826   −0.098822441
6-912971    mc           cg01087382  MAP3K7      0.132083258   −0.263202543
            non_mc       cg01087382  MAP3K7      0.023651284    0.314783378
7-1017626   mc           cg06721601  CUX1        0.305704307    0.347231675
            non_mc       cg06721601  CUX1       −0.061678409    0.340362292
7-1228399   mc           cg22024657  SLC13A1     0.013826947    0.222664259
            non_mc       cg22024657  SLC13A1    −0.056854833    0.037841165
8-1025044   mc           cg18004756  GRHL2      −0.293288901    0.034342458
            non_mc       cg18004756  GRHL2      −0.043777572   −0.203017346
8-1444164   mc           cg12188860  TOP1MT      0.094522888   −0.083700307
            non_mc       cg12188860  TOP1MT     −0.083425117   −0.026557506

C: HCC

1-1695560   mc           cg16054275  F5         −0.214291458   −0.606905543
            non_mc       cg16054275  F5          0.524710881   −0.648261504
1-2035950   mc           cg06637618  ATP2B4     −0.198233007   −0.368854771
            non_mc       cg06637618  ATP2B4      0.023774113    0.157886336
1-2130901   mc           cg04607844             −0.293894301   −0.012727778
            non_mc       cg04607844              0.119395918    0.246150856
10-80958    mc           cg18187680  FLJ45983   −0.028534778    0.369256194
            non_mc       cg18187680  FLJ45983   −0.130812947   −0.247877838
12-1222773  mc           cg26386472  HPD        −0.472436655    0.005424793
            non_mc       cg26386472  HPD         0.131065837   −0.084249928
15-555695   mc           cg02712036  RAB27A     −0.178526212    0.136059681
            non_mc       cg02712036  RAB27A     −0.136916211    0.13904572
16-724595   mc           cg07864976              0.159138061    0.220936294
            non_mc       cg07864976             −0.174358026    0.041182013
17-579157   mc           cg12054453  TMEM49      0.062499156   −0.478242327
            non_mc       cg12054453  TMEM49      0.549358501    0.925660526
17-800195   mc           cg20651080  DUS1L       0.16564247     0.179098878
            non_mc       cg20651080  DUS1L      −0.072821844    0.405200855
17-803588   mc           cg11252953  C17orf101   0.009872935    0.197906181
            non_mc       cg11252953  C17orf101  −0.088619833   −0.047902639
19-459097   mc           cg06663668  CD3EAP      0.026258194    0.077247979
            non_mc       cg06663668  CD3EAP     −0.209537326    0.55413999
19-546460   mc           cg11441617  CNOT3      −0.272150647    0.31583956
            non_mc       cg11441617  CNOT3      −0.485831577    0.011388271
19-546461   mc           cg11441617  CNOT3       0.345456601    0.601027916
            non_mc       cg11441617  CNOT3       0.252775143    0.151478291
2-1139315   mc           cg09366118  PSD4        0.098868936    0.269196214
            non_mc       cg09366118  PSD4        0.039975437   −0.005683265
21-364214   mc           cg01519261  RUNX1       0.350276252    0.086780025
            non_mc       cg01519261  RUNX1       0.205448988   −0.230292808
21-364215   mc           cg01519261  RUNX1      −0.068910744    0.175787941
            non_mc       cg01519261  RUNX1      −0.470630509   −0.686309747
22-185277   mc           cg02415779              0.056697461    0.313717112
            non_mc       cg02415779              0.122239689   −0.243806777
22-378130   mc           cg00107982  ELFN2      −0.03021978     0.143360514
            non_mc       cg00107982  ELFN2      −0.063348947    0.490504707
4-13248     mc           cg07748255  MAEA        0.217866978   −0.457591409
            non_mc       cg07748255  MAEA       −0.011474644   −0.340582424
4-840359    mc           cg19255783  PLAC8       0.024851345    0.211399591
            non_mc       cg19255783  PLAC8      −0.154998544   −0.217029094
5-1715385   mc           cg25650256  STK10      −0.296078256    0.069437129
            non_mc       cg25650256  STK10      −0.134013643    0.178025221
6-262504    mc           cg05414338  HIST1H3F   −0.291906307    0.385669376
            non_mc       cg05414338  HIST1H3F   −0.188285503    0.279553121
6-315278    mc           cg06393830  NFKBIL1     0.120351939    0.122094253
            non_mc       cg06393830  NFKBIL1     0.206309569   −0.082505936
6-329093    mc           cg00862588  HLA-DMB    −0.322857861    0.005678509
            non_mc       cg00862588  HLA-DMB     0.382920486   −0.054366936
6-912971    mc           cg01087382  MAP3K7      0.132083258   −0.263202543
            non_mc       cg01087382  MAP3K7      0.023651284    0.314783378
7-1000913   mc           cg03113878  C7orf51    −0.173651065   −0.291203774
            non_mc       cg03113878  C7orf51    −0.009576582    0.090427416
7-759325    mc           cg21217886  HSPB1      −0.135144273   −0.558395663
            non_mc       cg21217886  HSPB1      −0.163997602   −0.133445148

TABLE 8

Characteristics of 10 MCBs in LUNC prognosis prediction and 10 MCBs in HCC prognosis prediction

        Features            Target ID   RefGene
LUNC    mc_17-742484        cg24166450
        non_mc_2-741532     cg02478828  DGUOK
        mc_2-2355288        cg08436738
        mc_1-295863         cg04933208  PTPRU
        non_mc_22-358223    cg20146967  MCM5
        non_mc_16-900927    cg07860918  GAS8
        mc_20-374337        cg16119522  PPP1R16B
        non_mc_3-1960650    cg05556202  TM4SF19
        mc_12-687588        cg20323175
        non_mc_10-299484    cg13324103  SVIL
HCC     mc_6-262503         cg05414338  HIST1H3F
        mc_6-733300         cg17126142  KCNQ5
        non_mc_7-450187     cg06787669  MYO1G
        mc_19-185898        cg06747543  ELL
        mc_12-1222773       cg26386472  HPD
        non_mc_1-20665      cg00866690  PRKCZ
        mc_12-939663        cg11225410  SOCS2
        mc_7-450187         cg06787669  MYO1G
        mc_6-283040         cg08343881  ZNF323
        mc_6-733299         cg17126142  KCNQ5

TABLE 9

Clinical characteristics of study cohort

                                  TCGA HCC    TCGA LUNC   GSE         HCC         LUNC        Normal
Characteristic                    tissue      tissue      Normal      blood       blood       blood
Total (n)                         377         827         754         1504        892         2247
Gender
  Female-no. (%)                  122(32.4)   340(41.1)   401(53.2)   146(9.7)    263(29.5)   507(22.6)
  Male-no. (%)                    255(67.6)   487(58.9)   353(46.8)   991(65.9)   487(54.6)   480(21.4)
  NA                              0           0           0           367(24.4)   142(15.9)   1260(56.1)
Age (years)
  Mean                            61          68          63          54          58          48
  Range                           16-90       33-90       19-101      11-85       19-85       19-90
Pathology
  Hepatocellular carcinoma (%)    367(97.3)   0           NA          1504(100)   0           NA
  Adenocarcinoma (%)              0           458(55.4)   NA          0           402(45.1)   NA
  Squamous cell carcinoma (%)     0           369(44.6)   NA          0           138(15.5)   NA
  Small cell lung cancer (%)      0           0           NA          0           79(8.9)     NA
  Others (%)                      10(2.7)     0           NA          0           273(30.6)   NA
Stage
  I (%)                           175(46.4)   424(51.3)   NA          206(13.7)   58(6.5)     NA
  II (%)                          87(23.1)    115(13.9)   NA          202(13.4)   52(5.8)     NA
  III (%)                         86(22.8)    261(31.6)   NA          612(40.7)   148(16.6)   NA
  IV (%)                          6(1.6)      25(3.0)     NA          134(8.9)    463(51.9)   NA
  NA (%)                          23(6.1)     2(0.2)      NA          350(23.3)   171(19.2)   NA
Tumor burden
  Tumor free (%)                  236(62.6)   503(60.8)   NA          314(20.9)   160(17.9)   NA
  With tumor (%)                  114(30.2)   159(19.2)   NA          889(59.1)   599(67.2)   NA
  NA (%)                          27(7.2)     165(20.0)   NA          301(20.0)   133(14.9)   NA
EGFR status
  Wild type (%)                   NA          400(48.4)   NA          NA          102(11.4)   NA
  Mutation (%)                    NA          100(12.1)   NA          NA          69(7.7)     NA
  NA (%)                          NA          327(39.5)   NA          NA          721(80.8)   NA
Hepatitis
  Positive (%)                    120(31.8)   NA          NA          623(95.3)   NA          379(16.9)
  Negative (%)                    119(31.6)   NA          NA          10(1.5)     NA          571(25.4)
  NA (%)                          138(36.6)   NA          NA          21(3.2)     NA          1297(57.7)
Smoking
  Current smoker (%)              NA          725(87.7)   NA          NA          NA          192(8.5)
  Non-smoker (%)                  NA          80(9.7)     NA          NA          NA          670(29.8)
  NA (%)                          NA          22(2.6)     NA          NA          NA          1385(61.6)

TABLE 10

Multivariate survival analysis for HCC patients and LUNC patients with composite-score of methylation markers (cp-score) and relevant variables in validation cohorts

                   coef  exp(coef)   se(coef)          z          p lower 0.95 upper 0.95
HCC
  cp-score       0.9115     2.4881     0.2696     3.3813     0.0007     0.3830     1.4400
  stage          0.3336     1.3960     0.3034     1.0995     0.2715    −0.2612     0.9284
  AFP           −0.0861     0.9175     0.2115    −0.4069     0.6841    −0.5008     0.3286
  age         −330.8028     0.0000    75.7050    −4.3696     0.0000  −479.2145  −182.3911
  Gender        −0.0885     0.9153     0.2361    −0.3748     0.7078    −0.5512     0.3743
LUNC
  cp-score       0.5577     1.7467     0.3056     1.8249     0.0680    −0.0414     1.1569
  stage          0.4647     1.5916     0.8422     0.5518     0.5811    −1.1863     2.1157
  CEA            0.2695     1.3093     0.6035     0.4466     0.6552    −0.9135     1.4525
  age         −321.0835     0.0000    94.6796    −3.3913     0.0007  −506.6928  −135.4742
  Gender         0.0897     1.0938     0.3858     0.2325     0.8162    −0.6666     0.8460

TABLE 11

Clinical characteristics and sensitivity/specificity for detection of stage I LUNC and benign lung nodules

Sensitivity and Specificity for the Detection of Lung Cancer

                                          Stage I lung cancer   Benign nodules
Characteristic                            N = 116               N = 116
Age-yr
  Median                                  61                    52
  Range                                   29-83                 26-86
Gender
  Female                                  80                    53
  Male                                    36                    63
Pathology
  Adenocarcinoma                          100                   NA
  Squamous cell carcinoma                 6                     NA
  Small Cell Lung Cancer                  0                     NA
  Others                                  0                     NA
Nodule size-mm
  Mean                                    14.6 ± 6.1            7.2 ± 2.1
  Median                                  15.0                  6.4
  Range                                   3-58                  2.7-18.2
Stage
  AIS                                     7                     NA
  MIA                                     8                     NA
  IA                                      78                    NA
  IB                                      13                    NA
Sensitivity - % (95% CI)                  77.8
Specificity - % (95% CI)                  85.3
Positive predictive value - % (95% CI)    81.0
Negative predictive value - % (95% CI)    84.1

TABLE 12

Clinical characteristics and sensitivity/specificity for detection of progression to stage I HCC from liver cirrhosis

Sensitivity and Specificity for the Detection of Liver Cancer

                                          Stage I liver cancer  Cirrhosis
Characteristic                            N = 204               N = 242
Age-yr
  Median                                  62                    52
  Range                                   21-81                 25-82
Gender
  Female                                  32                    47
  Male                                    172                   195
Hepatitis B
  Positive                                198                   193
  Negative                                4                     13
  NA                                      2                     36
AFP
  <25 ng/ml                               113                   196
  >25 ng/ml                               43                    29
  NA                                      48                    17
Sensitivity - % (95% CI)                  94.9
Specificity - % (95% CI)                  92.6
Positive predictive value - % (95% CI)    94.9
Negative predictive value - % (95% CI)    92.8

Example 5—Circulating Tumor DNA Methylation Profiles for Diagnosis and Prognosis of Colorectal Cancer

Colorectal cancer (CRC) is one of the most common cancers in the world. In some instances, detection of CRC at an early and/or intermediate stage provides a better prognosis than detection at an advanced stage. Serum carcinoembryonic antigen (CEA) quantification is a non-invasive tool for the detection of cancer. However, in some instances, the CEA test has low sensitivity for the detection of CRC.


Circulating tumor DNA (ctDNA) is fragmented, tumor-derived DNA in the circulatory system that originates, e.g., from tumor cells that die through necrosis or apoptosis. In this study, the DNA methylation status of genes in ctDNA samples was determined and further used for the detection of CRC.


Patient Data


Tissue DNA methylation data were obtained from The Cancer Genome Atlas (TCGA). Complete clinical, molecular, and histopathological datasets are available at the TCGA website. Whole blood DNA methylation profiles from healthy donors were generated in an aging study (GSE40279) in which DNA methylation profiles for CRC and blood were analyzed. Individual institutions that contributed samples coordinated the consent process and obtained informed written consent from each patient in accordance with their respective institutional review boards.


The cfDNA cohort consisted of 801 CRC patients and 1021 normal controls from the Sun Yat-sen University Cancer Center in Guangzhou, Xijing Hospital in Xi'an, and the West China Hospital in Chengdu, China. Patients presenting with stage I-IV CRC were selected and enrolled in this study. Patient characteristics and tumor features are summarized in Table 18. The TNM staging classification for CRC followed the 7th edition of the AJCC Cancer Staging Manual. This project was approved by the institutional review boards of Sun Yat-sen University Cancer Center, Xijing Hospital, and West China Hospital. Informed consent was obtained from all patients. Human blood samples were collected by venipuncture; plasma was obtained as the supernatant after centrifugation and stored at −80° C. until cfDNA extraction.


Prospective CRC Screening Cohort Study


A prospective study of screening and early detection in a high-risk population was conducted using plasma samples, in order to assess the feasibility of using methylation markers to predict CRC occurrence in high-risk populations. The cohort included individuals who were defined as being at high risk for CRC on the basis of questionnaires, and asymptomatic individuals aged 45 or above who were scheduled for screening colonoscopy between September 2015 and December 2017. A total of 1450 subjects at high risk of CRC were scheduled for colonoscopy and the cfDNA methylation test, with the following characteristics: (i) age >45 years; (ii) a history of smoking, alcohol consumption, or diabetes mellitus; (iii) a family history of CRC (two or more first-degree relatives with CRC, or one or more relatives diagnosed with CRC at age 50 years or younger; or known Lynch syndrome or familial adenomatous polyposis); (iv) a positive fecal blood test or a change in bowel habits. Excluded were subjects who had a personal history of colorectal neoplasia, digestive cancer, or inflammatory bowel disease; had undergone colonoscopy within the previous 10 years, or a barium enema, computed tomographic colonography, or sigmoidoscopy within the previous 5 years; had undergone colorectal resection for any reason other than sigmoid diverticula; or had overt rectal bleeding within the previous 30 days. The present study prospectively recruited screening subjects who gave informed consent and underwent colonoscopy in this program. This project was approved by the IRBs of Sun Yat-sen University Cancer Center, Xijing Hospital, and West China Hospital.


Statistical Analysis


Solid Tumor and Whole Blood Methylation Profiles


DNA methylation data for 485,000 sites, generated using the Infinium 450K Methylation Array, were obtained from TCGA and from a dataset generated in an aging study (GSE40279) in which DNA methylation profiles for CRC and blood were analyzed. Both primary solid tissues from cancer patients and whole blood from healthy donors were measured with the Illumina Infinium HumanMethylation450 BeadChip. IDAT format files of the methylation data were generated, containing the intensity values of each scanned bead. Using the minfi package from Bioconductor, these data files were converted into a score referred to as a Beta value. Methylation values of the Chinese cohort were obtained by targeted bisulfite sequencing using molecular inversion probes and analyzed as described below.
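The Beta value is the fraction of methylated signal at a probe. As a minimal sketch, assuming methylated (M) and unmethylated (U) intensities have already been extracted from the IDAT files, it can be computed as below. The offset of 100 follows Illumina's convention for stabilizing low-intensity probes; minfi lets the caller choose the offset, and the source does not say which setting was used. The M-value function is an optional, commonly used alternative and is not mentioned in the text.

```python
import math


def beta_value(meth, unmeth, offset=100.0):
    """Beta = M / (M + U + offset); lies in [0, 1)."""
    return meth / (meth + unmeth + offset)


def m_value(meth, unmeth, alpha=1.0):
    """M-value = log2((M + alpha) / (U + alpha)); unbounded logit-like scale."""
    return math.log2((meth + alpha) / (unmeth + alpha))


# Hypothetical probe intensities
print(beta_value(900, 0), beta_value(4000, 1000), m_value(3, 1))
```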


DNA Methylation Marker Pre-Selection for Diagnostic and Prognostic Analysis


A differential methylation analysis was performed on the TCGA data using a “moderated t-statistics shrinking” approach, and the p-value for each marker was corrected for multiple testing with the Benjamini-Hochberg procedure to control the FDR at a significance level of 0.05. The list was ranked by adjusted p-value, and the top 1000 markers were selected for designing padlock probes for differentiating CRC from normal samples. 1,000 molecular inversion (padlock) probes corresponding to these 1000 markers were then designed for capture-sequencing of cfDNA in CRC plasma. All padlock probe design and capture of bis-DNA were based on published techniques with some modification. Only 544 padlock probes gave positive and specific PCR amplification signals, and these were therefore used as capture probes in subsequent experiments on cfDNA samples. cfDNA samples of low quality or with fewer than 30,000 reads per sample were also eliminated. A total of 1822 cfDNA samples were included in the subsequent study (801 CRC blood samples and 1021 normal blood samples). Methylated reads for each marker were defined as total unique methylated reads, and the methylation value for each marker was defined as the number of methylated reads divided by the total number of reads. For methylation markers with fewer than 20 unique reads, the mean methylation value of the CRC group or of the normal healthy controls was imputed.
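Two steps above lend themselves to a short sketch: Benjamini-Hochberg adjustment of per-marker p-values, and the read-count methylation value with mean imputation for low-coverage markers. The code below is a plain-Python illustration under stated assumptions, not the study's implementation (which used R). The function names are hypothetical, and `group_mean` stands for whichever CRC or normal-control mean the text says is imputed.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def methylation_level(methylated_reads, total_reads, group_mean, min_reads=20):
    """Methylated / total unique reads; imputes group_mean below min_reads."""
    if total_reads < min_reads:
        return group_mean
    return methylated_reads / total_reads


q = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
significant = [qi < 0.05 for qi in q]  # FDR controlled at 0.05
print(q, significant)
print(methylation_level(15, 30, group_mean=0.4), methylation_level(5, 10, group_mean=0.4))
```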


Building a Diagnostic Model


cfDNA sample data from patients diagnosed with CRC (n=801) and healthy controls (n=1021) were randomly split into training and validation cohorts at a 2:1 ratio. Next, two variable selection methods suited to high-dimensional data were applied to the prescreened training dataset: the Least Absolute Shrinkage and Selection Operator (LASSO) and a Random Forest-based variable selection method using the out-of-bag (OOB) error. Because results can depend strongly on the arbitrary choice of a random sample split for sparse high-dimensional data, the “multi-split” method was adopted, which improves variable selection consistency while controlling finite-sample error. For LASSO, 75 percent of the dataset was subsampled with replacement 500 times, and markers with a repeat occurrence frequency of more than 450 were selected. The tuning parameter was determined according to the expected generalization error estimated from 10-fold cross-validation and the information-based criteria AIC/BIC, and the largest value of lambda such that the error was within one standard error of the minimum (known as the “1-se” lambda) was adopted. For the random forest analysis, using the OOB error as the minimization criterion, variables were eliminated from the random forest by dropping a fraction of 0.3 of the variables at each iteration. Markers selected by both methods were then used to build a binary prediction model. A logistic regression model was fitted using the resulting 9 markers as covariates, and a combined diagnosis score (designated cd-score) was obtained by multiplying the unbiased coefficient estimates by the marker methylation value matrix in both the training and validation datasets. The predictive ability of the model was evaluated by the area under the ROC curve (AUC, also known as the C-index), which is the proportion of concordant pairs among all pairs of observations, with 1.0 indicating perfect prediction accuracy.
Confusion tables were generated using the cd-score cutoff that maximized Youden's index.
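The selection and scoring steps above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the study's code (which used R packages such as 'glmnet' and 'pROC'): `select_fn` is a hypothetical stand-in for one LASSO or random-forest selection run on a subsample, the marker names are invented, and the intercept is optional because the text only describes multiplying coefficients by methylation values. Youden's index is sensitivity + specificity − 1.

```python
import random
from collections import Counter


def selection_frequency(n_samples, select_fn, n_rounds=500, frac=0.75, seed=0):
    """Count how often each marker is chosen across subsamples drawn with replacement.

    select_fn(indices) -> iterable of marker names selected on that subsample
    (hypothetical stand-in, e.g. non-zero coefficients of a 1-se LASSO fit).
    """
    rng = random.Random(seed)
    k = int(frac * n_samples)
    counts = Counter()
    for _ in range(n_rounds):
        idx = [rng.randrange(n_samples) for _ in range(k)]
        counts.update(set(select_fn(idx)))
    return counts


def cd_score(coefs, methylation, intercept=0.0):
    """Linear score: intercept + sum of coefficient * methylation value."""
    return intercept + sum(b * methylation[m] for m, b in coefs.items())


def auc(scores_pos, scores_neg):
    """AUC as the fraction of concordant (case, control) pairs; ties count 1/2."""
    total = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            total += 1.0 if sp > sn else 0.5 if sp == sn else 0.0
    return total / (len(scores_pos) * len(scores_neg))


def youden_cutoff(scores, labels):
    """Cutoff c (score >= c called positive) maximizing sens + spec - 1."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_cut, best_j = None, -1.0
    for cut in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cut)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cut)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j


# Toy selector: "m1" is always kept, "m2" only when the first drawn index is even,
# so "m2" appears in roughly half the rounds and fails the >450/500 cutoff.
counts = selection_frequency(100, lambda idx: ["m1"] + (["m2"] if idx[0] % 2 == 0 else []))
stable = sorted(m for m, c in counts.items() if c > 450)
print(stable, youden_cutoff([0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1]))
```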


The pre-treatment (initial) methylation level was obtained at initial diagnosis, and the post-treatment level was evaluated approximately 2 months after treatment, where treatment referred to either chemotherapy or surgical resection of the tumor. The primary endpoint (including response to treatment: progressive disease (PD), partial response (PR), and stable disease (SD)) was defined according to the RECIST guideline. Patients treated with surgical removal and without recurrence at the time of evaluation were assumed to have a complete response (CR). The difference in cd-score distribution between clinical categories was examined by a two-sided t-test, as the cd-score was shown to be non-normally distributed using a Shapiro-Wilk test.


Building a Predictive Model for Prognosis and Survival


The potential of a combined prognosis score (cp-score) system, in combination with clinical and demographic characteristics including age, gender, and AJCC stage, for predicting prognosis in CRC was investigated. Tumor samples from the diagnostic training cohort were used as the training dataset, and tumor samples from the validation cohort as the validation dataset. Variable selection was conducted on the training dataset, and the composite score was built on the validation cohort. A univariate pre-screening procedure was first performed to remove excessive noise and facilitate the computational analysis, which is generally recommended before applying any variable selection method; markers with a Wald-statistic p-value <0.05 were retained in the dataset. Second, a subsampling strategy similar to that used for diagnostic marker selection, but based on the LASSO-Cox method, was used to reduce the number of markers to a reasonable range (fewer than the number of events). In a slight departure from the binary classifier, subsampling with replacement was carried out in the training dataset in case the event proportion was too low for model construction. The frequency cutoff was set at 50 to retain approximately 1/10th of total events. This analysis yielded 2 final markers for constructing a prognostic signature (Table 14). The cp-score was then calculated from the linear predictor of a multivariate Cox regression model with the 2 markers. A Kaplan-Meier curve and log-rank test were generated using the dichotomized cp-score, which assigned patients to high-risk and low-risk groups according to its median. This segmentation was compatible with that formed by AJCC stage. Time-dependent ROC analysis, with ROC curves varying as a function of time and accommodating censored data, was used to summarize the discrimination potential of the cp-score, AJCC stage, CEA level, primary tumor location, and the combination of all factors.
Finally, a multivariate Cox regression model was fitted to assess the significance of potential risk factors.
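The cp-score and the median split described above can be sketched as follows, assuming the Cox coefficients of the two prognostic markers are already available. The marker names and coefficient values are hypothetical placeholders (Table 14 is not reproduced in this section). R's predict() for coxph objects centers the linear predictor by default, but centering subtracts a constant and does not change which patients fall above the median.

```python
import statistics


def cp_score(cox_coefs, methylation):
    """Cox linear predictor: sum of coefficient * marker methylation value."""
    return sum(b * methylation[m] for m, b in cox_coefs.items())


def risk_groups(scores):
    """Dichotomize at the median; scores above the median are 'high' risk."""
    cut = statistics.median(scores)
    return ["high" if s > cut else "low" for s in scores]


# Hypothetical coefficients and per-patient methylation values
coefs = {"marker_1": 0.8, "marker_2": -1.2}
patients = [{"marker_1": 0.2, "marker_2": 0.5}, {"marker_1": 0.9, "marker_2": 0.1},
            {"marker_1": 0.5, "marker_2": 0.4}, {"marker_1": 0.7, "marker_2": 0.2}]
scores = [cp_score(coefs, p) for p in patients]
print(risk_groups(scores))
```

Ties at the median fall into the low-risk group here; that is a design choice of the sketch, not something the text specifies.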


All hypothesis tests in the prognostic analysis were two-sided, with p<0.05 considered statistically significant. All analyses were conducted in R version 3.4.3 using the following packages: ‘glmnet’, ‘pROC’, ‘limma’, ‘survival’, ‘survivalROC’, ‘survcomp’.


Unsupervised Discovery of ctDNA Methylation-Based Subtypes


The training dataset (n=528) was used to discover CRC groups/subtypes. To narrow down the markers and obtain meaningful clusters from the methylation information itself, an algorithm with iterative refinement of key features was used, modified to better represent each cluster as follows. Briefly, markers with low variability (median absolute deviation <0.5) were first filtered out. The matrix was then mean-centered on markers and samples in the training dataset. Second, an iterative procedure was used to classify samples and obtain a marker list for predicting subtypes: (i) Consensus Clustering was used to initially cluster the dataset, and the number of clusters was determined by the relative change in area under the CDF curve (AA, cutoff=0.05); (ii) the centroid of each cluster was calculated, and a correlation coefficient vector was obtained for each sample by multiplying the methylation matrix by the centroid matrix; (iii) samples were re-clustered based on their correlation coefficient vectors, with the number of new clusters determined by the maximum average silhouette width; (iv) differentially methylated markers among the new clusters were identified using a moderated F-test (from the ‘limma’ package), and the submatrix of differentially methylated markers was fed into the next iteration. The iteration stopped when the differentially methylated marker list did not change from the previous run, and this marker set was finally used as the subtype signature for predicting CRC subtypes.
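The preprocessing before the iterative clustering (variability filter, then centering on markers and on samples) can be sketched as below. This is an assumption-laden illustration: the text does not say whether the median absolute deviation was scaled (R's mad() multiplies by 1.4826 by default), so the scale is a parameter, and the function names are hypothetical.

```python
import statistics


def mad(values, constant=1.0):
    """Median absolute deviation; R's mad() uses constant=1.4826 by default."""
    med = statistics.median(values)
    return constant * statistics.median([abs(v - med) for v in values])


def filter_and_center(matrix, threshold=0.5, constant=1.0):
    """matrix: dict marker -> list of values, one per sample (same sample order).

    Drops markers with MAD < threshold, mean-centers each marker (row),
    then mean-centers each sample (column) of the remaining markers.
    """
    kept = {m: v for m, v in matrix.items() if mad(v, constant) >= threshold}
    if not kept:
        return {}
    rows = {}
    for m, v in kept.items():
        mu = statistics.mean(v)
        rows[m] = [x - mu for x in v]
    n = len(next(iter(rows.values())))
    col_means = [statistics.mean([r[j] for r in rows.values()]) for j in range(n)]
    return {m: [r[j] - col_means[j] for j in range(n)] for m, r in rows.items()}


# Hypothetical markers: "b" has no variability and is removed.
out = filter_and_center({"a": [0, 1, 2, 3], "b": [1, 1, 1, 1], "c": [3, 2, 1, 0]})
print(out)
```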


Validation was performed on the predefined validation dataset (n=273). To assign subtypes to the validation samples, the centroid methylation value of each cluster was first calculated from the methylation levels of the selected subtyping signature in the training dataset. Here, the centroid methylation value was defined as a representative methylation value of a cluster, obtained by taking the mean of each signature marker across the samples in that cluster. Second, each sample from the validation cohort was assigned to the cluster whose centroid had the maximum Pearson correlation coefficient with that sample.
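The centroid-based assignment of validation samples can be sketched as a nearest-centroid classifier that uses Pearson correlation as the similarity measure. This is an illustrative sketch, assuming each sample is a list of signature-marker methylation values in a fixed marker order; the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)  # undefined for a constant input vector


def cluster_centroids(profiles, labels):
    """Mean methylation of each signature marker across the samples of each cluster."""
    sums, counts = {}, {}
    for profile, label in zip(profiles, labels):
        acc = sums.setdefault(label, [0.0] * len(profile))
        for k, value in enumerate(profile):
            acc[k] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def assign_subtype(profile, centroids):
    """Assign a sample to the cluster whose centroid it correlates with most strongly."""
    return max(centroids, key=lambda label: pearson(profile, centroids[label]))
```

Centroids are computed once from the training clusters and then reused unchanged for every validation sample, so the validation labels do not influence the cluster definitions.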


Cell-Free DNA Extraction from Plasma Samples


Plasma samples were frozen and stored at −80° C. until use. 20,000 or more total unique reads per sample were observed to fulfill this criterion, and a plasma volume of 1.5 ml or more was required to reliably produce >20,000 unique reads. This relationship between the amount of cfDNA in plasma and the detected copy number was further assessed using digital droplet PCR. In addition, 1.5 ml of plasma was found to yield >10 ng of cfDNA, which produced at least 140 copies of detected amplicons in each digital droplet PCR assay. Based on these findings, 15 ng/1.5 ml was used as a cutoff to obtain reliable measurements of DNA methylation for all experiments in this study. For all cfDNA extractions, the EliteHealth cfDNA extraction Kit was used (EliteHealth, Guangzhou, China) according to the manufacturer's recommendations.


Bisulfite Conversion of Genomic or cfDNA


10 ng of DNA was converted to bis-DNA using the EZ DNA Methylation-Lightning™ Kit (Zymo Research) according to the manufacturer's protocol. The resulting bis-DNA had a size distribution of ~200-3000 bp, with a peak at ~500-1000 bp. The efficiency of bisulfite conversion was >99.8%, as verified by deep sequencing of bis-DNA and analysis of the C-to-T conversion ratio at CH (non-CG) dinucleotides.
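The conversion-efficiency check described above can be sketched by counting, at reference cytosines in CH context, how often the aligned read shows T (converted) versus C (unconverted). This is a simplified sketch, assuming reads are already aligned to the top reference strand with equal length and no indels; the function names are hypothetical.

```python
def ch_conversion_counts(reference, read):
    """Count converted (T) and informative (C or T) calls at CH cytosines.

    Assumes `read` is already aligned to `reference` on the same strand,
    with equal length and no insertions or deletions.
    """
    if len(reference) != len(read):
        raise ValueError("reference and read must have the same length")
    converted = total = 0
    for i, ref_base in enumerate(reference):
        if ref_base != "C":
            continue
        # Skip CpG cytosines (methylation-dependent) and a trailing C whose context is unknown.
        if i + 1 >= len(reference) or reference[i + 1] == "G":
            continue
        if read[i] == "T":
            converted += 1
            total += 1
        elif read[i] == "C":
            total += 1  # unconverted C at a CH position
    return converted, total


def conversion_efficiency(pairs):
    """Pool CH counts over (reference, read) pairs and return T / (C + T)."""
    converted = total = 0
    for reference, read in pairs:
        c, t = ch_conversion_counts(reference, read)
        converted += c
        total += t
    return converted / total if total else None
```

CH cytosines are used because they are expected to be essentially unmethylated in human cells, so any remaining C at those positions approximates incomplete conversion.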


Determination of DNA Methylation Levels by Deep Sequencing of Bis-DNA Captured with Molecular-Inversion (Padlock) Probes


In order to identify DNA sites with significantly different rates of methylation between CRC and normal blood, a moderated t-statistic with empirical Bayes shrinkage of the variance [36] was used, and the top 1000 significant markers were selected with the Benjamini-Hochberg procedure [37], controlling the FDR at a significance level of 0.05. CRC and normal blood were successfully distinguished by unsupervised hierarchical clustering of these top 1000 markers. 1,000 molecular inversion (padlock) probes corresponding to these 1000 markers were then designed for capture-sequencing of cfDNA. Capture and sequencing were performed on bisulfite-converted cfDNA samples.
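The FDR-controlled selection step can be sketched with a plain implementation of the Benjamini-Hochberg adjustment. The empirical Bayes moderated t-statistic itself (as in the ‘limma’ package) is not reproduced; the sketch assumes per-marker p-values are already available, and `select_top_markers` is a hypothetical helper.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_top_markers(marker_ids, p_values, alpha=0.05, top_n=1000):
    """Keep markers with BH-adjusted p <= alpha, ranked by raw p-value, up to top_n."""
    q_values = benjamini_hochberg(p_values)
    significant = [i for i, q in enumerate(q_values) if q <= alpha]
    significant.sort(key=lambda i: p_values[i])
    return [marker_ids[i] for i in significant[:top_n]]
```

Capping the list at `top_n` mirrors the choice of the "top 1000" markers: FDR control decides which markers are eligible, and the raw p-value ranking decides which of them are kept.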


Probe Design, Synthesis and Validation


All probes were designed using the ppDesigner software. The average length of the captured region was 100 bp, with the CpG marker located in the central portion of the captured region. The linker sequence between the arms contained binding sequences for amplification primers, separated by a variable stretch of Cs so that all probes had equal length. A 6-bp unique molecular identifier (UMI) sequence was incorporated into the probe design to allow identification of unique individual molecular capture events and accurate scoring of DNA methylation levels. For capture experiments, probes were synthesized as separate oligonucleotides using standard commercial synthesis methods (IDT), mixed in equimolar quantities, and purified using Qiagen columns.




Bis-DNA Capture


To anneal probes to DNA, 10 ng of bisulfite-converted DNA was mixed with padlock probes in 20 μl reactions containing 1× Ampligase buffer (Epicentre), denatured for 30 seconds at 95° C., and slowly cooled to 55° C. at a rate of 0.02° C. per second. Hybridization was allowed to proceed for 15 hours at 55° C. To fill gaps between the annealed arms, 5 μl of the following mixture was added to each reaction: 2 U of PfuTurboCx polymerase (Agilent), 0.5 U of Ampligase (Epicentre), and 250 pmol of each dNTP in 1× Ampligase buffer. After a 5-hour incubation at 55° C., reactions were denatured for 2 minutes at 94° C. Then, 5 μl of exonuclease mix (20 U of Exo I and 100 U of Exo III, both from Epicentre) was added, and single-stranded DNA degradation was carried out at 37° C. for 2 hours, followed by enzyme inactivation for 2 minutes at 94° C.


Circular products of site-specific capture were amplified by PCR with concomitant barcoding of separate samples. Amplification was carried out using primers specific to the linker DNA within the padlock probes, one of which contained a sample-specific 6-bp barcode. Both primers contained Illumina next-generation sequencing adaptor sequences. PCR was performed with 1× Phusion Flash Master Mix, 3 μl of captured DNA, and 200 nM primers, using the following cycle: 10s @ 98° C., 8× of (1s @ 98° C., 5s @ 58° C., 10s @ 72° C.), 25× of (1s @ 98° C., 15s @ 72° C.), 60s @ 72° C. PCR reactions were pooled, and the resulting library was size-selected to include effective captures (~230 bp) and exclude "empty" captures (~150 bp) using Agencourt AMPure XP beads (Beckman Coulter). Purity of the libraries was verified by PCR using Illumina flowcell adaptor primers (P5 and P7), and concentrations were determined using the Qubit dsDNA HS assay (Thermo Fisher). Libraries were sequenced using MiSeq and HiSeq2500 systems (Illumina).


Optimization of Capture Coverage Uniformity


Deep sequencing of the original pilot capture experiments showed significant differences between the numbers of reads captured by the most efficient and the least efficient probes (only 60-65% of captured regions had coverage >0.2× of average). To ameliorate this, relative efficiencies were calculated from the sequencing data, and probes were mixed at adjusted molar ratios. This increased capture uniformity to 85% of regions at >0.2× of average coverage.
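The uniformity metric and the molar-ratio adjustment can be sketched as follows. The source does not state how the adjusted ratios were derived from the relative efficiencies; this sketch assumes, for illustration only, that each probe's amount is made inversely proportional to its relative efficiency (reads divided by mean reads). Both function names are hypothetical.

```python
def fraction_above(coverages, fraction_of_mean=0.2):
    """Fraction of captured regions whose coverage exceeds fraction_of_mean x mean coverage."""
    mean_cov = sum(coverages) / len(coverages)
    return sum(1 for c in coverages if c > fraction_of_mean * mean_cov) / len(coverages)


def adjusted_molar_ratios(coverages, min_reads=1):
    """Relative amount of each probe to add, inversely proportional to its capture efficiency.

    Relative efficiency is reads / mean reads; the inverse weights are rescaled to a
    mean of 1, so a weight of 1 corresponds to the original equimolar amount.
    Probes with zero reads are floored at `min_reads` to avoid division by zero.
    """
    mean_cov = sum(coverages) / len(coverages)
    inverse = [mean_cov / max(c, min_reads) for c in coverages]
    scale = sum(inverse) / len(inverse)
    return [w / scale for w in inverse]
```

Pure inverse weighting can over-correct probes that capture almost nothing, so capping the maximum weight may be a reasonable refinement when applying this idea.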


Sequencing Data Analysis


Sequencing reads were mapped using the software tool bisReadMapper with some modifications. First, UMIs were extracted from each sequencing read and appended to the read headers within FASTQ files using a custom script. Reads were converted on the fly as if all Cs were unmethylated and mapped with Bowtie2 to in-silico converted DNA strands of the human genome, also converted as if all Cs were unmethylated. Original reads were merged and filtered for a single UMI, i.e., duplicate reads carrying the same UMI were discarded, leaving a single unique read. Methylation frequencies were calculated for all CpG dinucleotides within the regions captured by padlock probes by dividing the number of unique reads carrying a C at the interrogated position by the total number of reads covering that position.
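UMI collapsing and per-CpG methylation frequency can be sketched as below. This sketch assumes reads have already been mapped and reduced to a hypothetical record `(umi, {cpg_position: base})` within one capture region. It keeps the first read seen per UMI and counts only unique reads in both the numerator and the denominator.

```python
from collections import defaultdict


def deduplicate_by_umi(reads):
    """Keep the first read seen for each UMI and drop the rest."""
    seen, unique = set(), []
    for umi, calls in reads:
        if umi not in seen:
            seen.add(umi)
            unique.append((umi, calls))
    return unique


def methylation_frequencies(reads):
    """Per-CpG methylation frequency = unique reads with C / unique reads with C or T."""
    c_reads = defaultdict(int)
    covering = defaultdict(int)
    for _, calls in deduplicate_by_umi(reads):
        for position, base in calls.items():
            if base not in ("C", "T"):
                continue  # ignore sequencing errors or non-informative calls
            covering[position] += 1
            if base == "C":
                c_reads[position] += 1
    return {pos: c_reads[pos] / covering[pos] for pos in covering}
```

After bisulfite conversion, a C retained at a CpG indicates a methylated cytosine, while a T indicates an unmethylated one, which is why the frequency is simply the C share of informative calls.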


Identification of Methylation Correlated Blocks (MCB)


To maximize the ability to measure small differences in DNA methylation, advantage was taken of the notion that closely positioned CpGs tend to have similar methylation levels, which is believed to result from the processivity and lack of sequence specificity of DNA methyltransferases and demethylases, as well as of the concept of haplotype blocks in genetic linkage analysis. Sequencing data were used to generate an MCB map using CRC and normal cfDNA samples. Pearson correlation coefficients between the methylation frequencies of each pair of CpG markers separated by no more than 200 bp were calculated separately across 50 cfDNA samples from each of the two diagnostic categories, i.e., normal blood and CRC. A Pearson's r <0.5 was used to identify boundaries between adjacent markers with uncorrelated methylation. Markers not separated by a boundary were combined into Methylation Correlated Blocks (MCBs). This procedure identified a total of ~1550 MCBs in each diagnostic category within the padlock data, each combining between 2 and 22 CpG positions. Methylation frequencies for entire MCBs were calculated by summing the numbers of Cs at all interrogated CpG positions within an MCB and dividing by the total number of Cs and Ts at those positions.
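The block-building rule can be sketched as a single pass over position-sorted CpGs. This is a simplified sketch under stated assumptions: it only compares neighbouring CpGs (not every pair within 200 bp), it treats a zero-variance CpG as uncorrelated (r = 0), and it drops single-CpG blocks, consistent with MCBs containing 2 to 22 CpGs. The function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either vector has zero variance (a sketch choice)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def find_mcbs(positions, freqs, max_gap=200, r_min=0.5, min_size=2):
    """Group adjacent CpGs into methylation correlated blocks.

    positions: sorted CpG coordinates on one chromosome.
    freqs[k]: methylation frequencies of CpG k across the samples of one category.
    A boundary is placed between neighbours more than max_gap bp apart or whose
    Pearson r < r_min. Blocks with fewer than min_size CpGs are discarded.
    """
    blocks = [[positions[0]]]
    for k in range(1, len(positions)):
        close = positions[k] - positions[k - 1] <= max_gap
        if close and pearson(freqs[k - 1], freqs[k]) >= r_min:
            blocks[-1].append(positions[k])
        else:
            blocks.append([positions[k]])
    return [block for block in blocks if len(block) >= min_size]


def mcb_methylation(c_counts, t_counts):
    """Block-level methylation: total C calls / total C+T calls over the block's CpGs."""
    total = sum(c_counts) + sum(t_counts)
    return sum(c_counts) / total if total else None
```

Pooling C and T counts across a block, rather than averaging per-CpG frequencies, weights each CpG by its coverage and reduces the noise from low-coverage positions.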


Droplet Digital PCR


The methylation status of cg10673833 was determined for each of these samples using a droplet digital PCR paradigm featuring a Bio-Rad (Carlsbad, Calif.) QX-200 Droplet Reader and an Automated Droplet Generator (AutoDG). In brief, 10 ng of DNA from each subject was bisulfite converted using the EZ DNA Methylation-Lightning™ Kit (Zymo Research). An aliquot of each sample was pre-amplified, diluted 1:3,000, and then PCR amplified using fluorescent, dual-labeled primer-probe sets specific for cg10673833 from Behavioral Diagnostics and Universal Digital PCR reagents and protocols from Bio-Rad. The number of droplets was determined using a QX-200 droplet counter and analyzed using QuantaSoft software. The results were expressed as percent methylation. Reactions were excluded if fewer than 10,000 droplets were counted.
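Percent methylation from droplet counts can be sketched as follows. The source states only that results were expressed as percent methylation and that reactions with fewer than 10,000 droplets were excluded. The sketch therefore assumes a two-channel readout (one probe for the methylated and one for the unmethylated allele) and applies the standard Poisson correction, copies per droplet = −ln(1 − positive/total). It is an illustration, not the assay's actual analysis pipeline.

```python
import math

MIN_DROPLETS = 10_000  # reactions with fewer droplets were excluded


def copies_per_droplet(positive, total):
    """Poisson estimate of mean target copies per droplet: -ln(1 - positive/total)."""
    if positive >= total:
        raise ValueError("all droplets positive: concentration is not quantifiable")
    return -math.log1p(-positive / total)


def percent_methylation(meth_positive, unmeth_positive, total_droplets):
    """Percent methylation from an assumed methylated/unmethylated two-probe assay.

    Returns None when the reaction fails the droplet-count rule or has no template.
    """
    if total_droplets < MIN_DROPLETS:
        return None
    lam_meth = copies_per_droplet(meth_positive, total_droplets)
    lam_unmeth = copies_per_droplet(unmeth_positive, total_droplets)
    if lam_meth + lam_unmeth == 0:
        return None
    return 100.0 * lam_meth / (lam_meth + lam_unmeth)
```

The Poisson correction matters at higher template loads, where some droplets contain more than one copy and raw positive-droplet fractions would underestimate the majority allele.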


Patient and Sample Characteristics


To identify CRC-specific methylation markers, clinical characteristics and methylation profiles were collected for 459 CRC tumor samples from The Cancer Genome Atlas (TCGA) and for 754 normal samples from a dataset used in a previous methylation study on aging (GSE40279). To study cfDNA in CRC, plasma samples were obtained from 801 Chinese patients with CRC and from 1021 randomly selected, population-matched healthy controls undergoing routine health care maintenance (FIG. 31A). Written informed consent was obtained from each participant. Clinical characteristics of all patients and controls are listed in Table 18.


A total of 1450 participants at high risk of CRC were enrolled in the prospective screening cohort study and underwent colonoscopy and cfDNA methylation testing (FIG. 31B). Among them, 18 participants were found to have colorectal cancer on colonoscopy (prevalence, 1.2%), and 78 participants had advanced precancerous lesions (prevalence, 5.3%). Clinical characteristics of all participants in this cohort are listed in Table 19.


Identification of Methylation Markers Differentiating CRC and Blood


To identify putative markers that differentiate CRC from normal blood samples, methylation data derived from CRC tissue DNA from the TCGA were compared with those from normal blood. The methylation data were obtained from 459 CRC samples and 754 blood samples from healthy controls. A "moderated t-statistic" analysis with empirical Bayes shrinkage of the variance was used, and the top 1000 significant markers were selected by controlling the false discovery rate (FDR) at a significance level of 0.05 using the Benjamini-Hochberg procedure. Molecular-inversion (padlock) probes corresponding to these 1000 markers were designed for capture-sequencing of cfDNA from plasma, and 544 markers with a good experimental amplification profile and dynamic methylation range were selected for further analysis. The genetic linkage disequilibrium (LD block) concept was used to study the degree of co-methylation among different DNA strands, with the underlying assumption that DNA sites in close proximity are more likely to be co-methylated than distant sites.


cfDNA Diagnostic Prediction Model for CRC


The entire methylation dataset of 544 markers was analyzed by Least Absolute Shrinkage and Selection Operator (LASSO) and Random Forest to reduce the number of markers (FIG. 36). The 801 CRC samples and 1021 normal control samples were randomly assigned to a training set and a validation set at a 2:1 ratio (FIG. 32A). LASSO-based feature selection identified 13 markers and Random Forest-based feature selection identified 22 markers for discriminating CRC from normal samples, with 9 markers shared between the two methods (Table 13). A diagnostic prediction model was constructed with these 9 markers, and a combined diagnostic score (cd-score) system was formulated from the coefficients of the multinomial logistic regression. Applying this model, high consistency was observed between the predicted results and the pathological diagnosis in both the training and validation datasets (FIGS. 32B and 32C). The AUC was 0.96 (95% CI 0.95-0.97) in the training dataset (FIG. 32D) and 0.96 (95% CI 0.94-0.97) in the validation dataset (FIG. 32E). Using the optimal cutoff value determined by the Youden index method, the model yielded a sensitivity of 87.5% and a specificity of 89.9% for discriminating CRC from normal controls in the training dataset, and a sensitivity of 87.9% and a specificity of 89.6% in the validation dataset (FIGS. 32G and 32H). These results demonstrate the potential of ctDNA methylation markers for predicting the presence of CRC.
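The cd-score and the Youden-index cutoff can be sketched as below. This is an illustration under stated assumptions: it treats the intercept and coefficients in Table 13 as a binary logistic model on methylation fractions between 0 and 1 and reports the score as a probability. The reported cd-score may instead be on the linear-predictor scale, which would not change the ranking or the chosen cutoff position. The function names are hypothetical.

```python
import math

# Intercept and coefficients of the nine diagnostic markers listed in Table 13.
CD_INTERCEPT = -2.76
CD_COEFFICIENTS = {
    "cg10673833": 15.98, "cg10493436": 1.88, "cg10428836": -6.05,
    "cg27284288": 3.29, "cg16959747": 1.45, "cg17494199": -2.24,
    "cg23678254": 2.71, "cg24067911": -3.41, "cg25459300": 4.06,
}


def cd_score(methylation):
    """Assumed logistic-model probability of CRC from the nine marker methylation levels."""
    linear = CD_INTERCEPT + sum(c * methylation[m] for m, c in CD_COEFFICIENTS.items())
    return 1.0 / (1.0 + math.exp(-linear))


def youden_cutoff(scores, labels):
    """Return (cutoff, sensitivity, specificity) maximizing J = sens + spec - 1.

    A sample is called positive when score >= cutoff; labels are 1 (CRC) or 0 (normal).
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < cutoff and y == 0)
        sens, spec = tp / n_pos, tn / n_neg
        if best is None or sens + spec - 1 > best[1] + best[2] - 1:
            best = (cutoff, sens, spec)
    return best
```

The cutoff would be chosen on the training dataset and then applied unchanged to the validation dataset, which is how a sensitivity and specificity can be reported separately for each set.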


The utility of the cd-score in assessing CRC stage, the presence of residual tumor after treatment, and the response to treatment (such as surgery or chemotherapy) was then examined. The cd-scores of patients with detectable residual tumor following treatment were significantly higher than those of patients without detectable tumor (p<0.001, FIG. 37A). Similarly, the cd-scores correlated well with tumor stage: patients with early-stage disease (I, II) had substantially lower cd-scores than those with advanced-stage disease (III, IV) (p<0.001, FIG. 37B). Furthermore, cd-scores were significantly higher in patients before treatment than in those who had received surgery (p<0.001, FIG. 37E). When the tumor recurred, the cd-score increased again (FIG. 37E).


CEA has been used in the diagnosis and surveillance of CRC for decades. However, its sensitivity and specificity are unsatisfactory, which makes invasive approaches such as colonoscopy necessary for most patients with suspected CRC. In biopsy-proven CRC patients of the cohort, the cd-score demonstrated superior sensitivity and specificity to CEA for CRC diagnosis (AUC 0.96 vs 0.72, FIG. 37F). Both cd-score and CEA values were highly correlated with tumor stage (FIG. 37B and FIG. 37D). In patients with treatment response or tumor recurrence, the cd-score changed more significantly from its value at initial diagnosis than CEA did (FIG. 37E and FIG. 37F).


cfDNA Prognostic Prediction Model for CRC


Next, the potential of a combined prognosis score (cp-score) based on ctDNA methylation analysis, in combination with clinical and demographic characteristics including age, gender, primary tumor site, and AJCC stage, was investigated for predicting the prognosis of CRC. Follow-up information was completed for the 801 CRC patients included in the prognostic score analysis. The median follow-up time was 11.9 months (range, 0.5-25.7 months). The same training dataset as in the diagnostic analysis (528 observations with 73 events) and the same validation dataset (273 observations with 32 events) were used. Variable selection was conducted on the training set, and the composite score was built on the validation set. UniCox and LASSO-Cox methods were implemented to reduce dimensionality, and a Cox model was constructed to predict prognosis with a two-marker panel (FIG. 33A and Table 14). Kaplan-Meier curves and a log-rank test were generated using the composite score dichotomized at its median, which assigned patients to high-risk and low-risk groups. Median survival time in the low-risk group was significantly longer than in the high-risk group (p<0.001) (FIG. 33B and FIG. 33C).


Time-dependent ROC analysis was used to summarize the discrimination potential of the composite score, AJCC stage, CEA level, primary tumor location, and the combination of all the existing biomarkers. Multivariate Cox regression indicated that the cp-score was significantly correlated with risk of death and was an independent risk factor for survival in both the training and validation sets (Table 15). TNM stage (as defined by AJCC guidelines), CEA level, and primary tumor location also predicted 12-month survival of patients with CRC (Table 15). Furthermore, combining the cp-score with clinical characteristics improved the ability to predict prognosis (AUC 0.79, 95% CI 0.70-0.88 in the training cohort; AUC 0.85, 95% CI 0.75-0.96 in the validation cohort) (FIG. 33D and FIG. 33E).


Nomograms provide a simple graphical representation of a statistical predictive model that generates a numerical probability of a clinical event, reducing the model to a single numerical estimate of the probability of death. Multivariate Cox regression analysis identified four variables as independent predictive factors (cp-score, CEA level, TNM stage, and primary tumor location; Table 15) in both the training and validation cohorts. Accordingly, a nomogram with point scales for these 4 variables was developed to predict overall survival for CRC patients (FIG. 34A). The points for each variable were summed and plotted on the total-point axis, and the estimated median 1- and 2-year overall survival rates were obtained by drawing a vertical line from the total-point axis straight down to the outcome axis. In the validation cohort, the c-index of this model was 0.839, indicating good discrimination. FIG. 34B shows the calibration graph for the nomogram, in which the probability of 1-year overall survival predicted by the nomogram is plotted against the corresponding observed survival rates obtained by the Kaplan-Meier method.
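The c-index reported above measures how often the model ranks patients' risks in the same order as their observed survival. The source does not say which c-index estimator was used. The sketch below assumes Harrell's version for right-censored data and simplifies it by skipping pairs with tied follow-up times; `harrell_c_index` is a hypothetical name.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when subject i has an observed event and a
    strictly shorter follow-up time than j. It is concordant when the
    higher-risk subject (by score) fails first; tied risk scores count 0.5.
    Pairs with tied times are skipped in this simplified sketch.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is why a c-index of 0.839 is read as good discrimination.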


cfDNA Methylation Based Subtyping of CRC


To generate cfDNA methylation-based subtypes of CRC, an unsupervised clustering method was used. The method applied an iterative strategy that progressively removed markers contributing little to the initial clusters (FIG. 35A). Using the same training dataset as in building the prognosis model, two clusters of CRC samples were obtained, defined by a combined total of 45 markers that were differentially methylated between the clusters (FIG. 35B). Silhouette analysis of the last round of iteration showed a high separation distance (FIG. 35C). The validation set was also classified by computing the correlation coefficients of its samples to the centroid profiles of the training-set clusters.
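The average silhouette width, used both to choose the number of re-clustered groups and to assess separation in the final iteration, can be sketched as follows. The distance metric used in the original analysis is not stated; this sketch assumes Euclidean distance between sample profiles and requires Python 3.8 or later for `math.dist`. The function name is hypothetical.

```python
import math


def average_silhouette_width(points, labels):
    """Mean silhouette width s(i) = (b - a) / max(a, b) using Euclidean distance.

    a: mean distance from point i to the other members of its own cluster;
    b: smallest mean distance from i to the members of any other cluster.
    Points in singleton clusters get s(i) = 0, following the usual convention.
    """
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    widths = []
    for i, p in enumerate(points):
        own = [q for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not own:
            widths.append(0.0)
            continue
        a = sum(math.dist(p, q) for q in own) / len(own)
        b = min(
            sum(math.dist(p, q) for j, q in enumerate(points) if labels[j] == c)
            / labels.count(c)
            for c in clusters
            if c != labels[i]
        )
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(widths) / len(widths)
```

To pick the number of clusters, this value can be computed for each candidate partition and the partition with the largest average width kept. Values near 1 indicate well-separated clusters.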


The validation set was divided into two groups that show distinctly different methylation profiles of the 45 markers (FIG. 35D). To explore the clinical relevance of the two subtypes, associations between subtype and all available clinical factors, including TNM stage, tumor site, MMR status, MSS status, tumor burden, sex, mutation status of a limited gene panel, and survival outcomes, were systematically tested. In both the training and validation datasets, cluster 2 had significantly poorer survival than cluster 1 (FIG. 35E, upper panel, both p<0.01, log-rank test). The proportion of high-stage CRC was also significantly higher in cluster 2 than in cluster 1 (FIG. 35E, lower panel, both p<0.05, chi-squared test). Given the potential multicollinearity between subtype and TNM stage in discriminating patients with poor survival, multivariate Cox regression analysis was performed, which showed that both factors were independent predictors of overall survival (Table 16).


The 45 subtyping markers were classified into two groups according to their hypo- or hyper-methylation in the clusters (27 hypo and 18 hyper in cluster 2, Table 20). Among these markers, three were also identified in the list of diagnostic markers, and one was identified in both the diagnostic and prognostic marker lists (FIG. 38). Further analysis showed that cp-scores in cluster 2 were significantly higher than in cluster 1 in both datasets (FIG. 40B, p<0.001, Wilcoxon test). Given that 7-45018848 was identified in all three marker lists and was hyper-methylated in cluster 2, it may contribute to the differing survival outcomes between the two subtypes.


As CpG site cg10673833 (7-45018848) was the only marker shared by the diagnosis, prognosis, and molecular subtype analyses, its utility as a potential marker for monitoring treatment response was investigated. Serial longitudinal changes in cg10673833 methylation values were obtained in a group of CRC patients. The results showed a high correlation between cg10673833 methylation values and treatment outcomes, greater than that observed for CEA (FIG. 39). In patients with serial samples, those with a positive treatment response had a concomitant and significant decrease in cg10673833 methylation values compared with pre-treatment values, and methylation values decreased even further after surgery. In contrast, patients with progressive or recurrent disease had increased methylation (FIG. 40).


Methylation Markers for Screening and Early Diagnosis of CRC in High-Risk Populations Using 7-45018848


The potential of cg10673833 as a plasma-based methylation marker for detecting CRC and precancerous lesions in a high-risk population was investigated. From January 2015 through June 2017, 1450 participants identified as being at high risk of CRC were scheduled to undergo screening colonoscopy and cg10673833 methylation testing (FIG. 31B).


Table 17 shows the screening results of colonoscopy and cg10673833 methylation testing. cg10673833 methylation testing identified 9/10 participants with CRC and 7/8 participants with CRC in situ, for a sensitivity of 88.9% (16/18; 95% CI, 0.74-1.00) and a specificity of 86.5% (95% CI, 0.85-0.88). The positive predictive value (PPV) and negative predictive value (NPV) were 0.077 (95% CI, 0.041-0.113) and 0.998 (95% CI, 0.996-1.00), respectively (Table 21). For advanced precancerous lesions, the sensitivity was 33.3% (95% CI, 0.229-0.438), significantly higher than the positivity rate in subjects without any pathology (12.3%, 95% CI, 0.106-0.141).
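The screening metrics can be recomputed from the counts in Table 17. This sketch treats all colorectal cancers (stage I-III plus high-grade dysplasia) as the target condition and every other participant as non-diseased, and it uses a normal-approximation (Wald) interval not truncated at 1, which is an assumption about how the intervals were obtained. The helper names are hypothetical.

```python
import math


def wald_interval(successes, n, z=1.96):
    """Normal-approximation 95% interval for a proportion, not truncated to [0, 1]."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


def screening_metrics(tp, fn, fp, tn):
    """Sensitivity, specificity, PPV and NPV from a 2 x 2 table."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Counts from Table 17.
TP = 9 + 7                      # test-positive colorectal cancer
FN = 1 + 1                      # test-negative colorectal cancer
FP = 26 + 24 + 20 + 0 + 123     # test-positive, no colorectal cancer
TN = 52 + 76 + 230 + 10 + 871   # test-negative, no colorectal cancer

if __name__ == "__main__":
    for name, value in screening_metrics(TP, FN, FP, TN).items():
        print(f"{name}: {value:.3f}")
    low, high = wald_interval(TP, TP + FN)
    print(f"sensitivity 95% CI (Wald): {low:.3f}-{high:.3f}")
```

With these counts the sketch gives sensitivity 0.889, specificity 0.865, PPV 0.077 and NPV 0.998, matching the values above. Its Wald interval for sensitivity (about 0.744-1.034) appears consistent with the 74.4-103 interval listed in Table 17.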









TABLE 13
Characteristics of nine methylation markers and their coefficients in diagnosis.

MCBs | Target ID | Ref Gene | Coefficients
Intercept | | | −2.76
7-45018848 | cg10673833 | MYO1G | 15.98
1-161169007 | cg10493436 | ADAMTS4 | 1.88
10-88684020 | cg10428836 | BMPR1A | −6.05
11-60738995 | cg27284288 | CD6 | 3.29
12-7276714 | cg16959747 | RBP5 | 1.45
13-106834900 | cg17494199 | Chr 13:10 | −2.24
14-55647474 | cg23678254 | LGAP5 | 2.71
6-16729606 | cg24067911 | ATXN1 | −3.41
8-20375578 | cg25459300 | Chr 8:20 | 4.06

TABLE 14
Characteristics of two methylation markers and their coefficients in prognosis.

MCBs | Target ID | Ref Gene | Coefficients | HR | CI (lower)
7-45018848 | cg10673833 | MYO1G | 2.06 | 7.81 | 2.11
3-111852156 | cg25462303 | GCET2 | 3.05 | 21.13 | 5.98

TABLE 15
Multivariable Cox regression analysis with covariates including cp-score, gender, age, tumor location, TNM stage, and CEA for overall survival.

Training dataset
Factor | coef | exp(coef) | se(coef) | z | p | coef
cp-score: H vs L | 0.75 | 2.11 | 0.13 | 5.88 | <0.001 | 0.72
Gender: M vs F | −0.31 | 0.73 | 0.26 | −1.19 | 0.232 | −0.15
Age: >=65 vs <65 | 0.02 | 1.02 | 0.01 | 1.91 | 0.056 | −0.01
Stage: III/IV vs I/II | 0.92 | 2.51 | 0.27 | 3.45 | 0.001 | 1.95
Tumor site: R vs L | 0.62 | 1.85 | 0.26 | 2.36 | 0.018 | 0.86
CEA: >=100 vs <100 | 0.7 | 2.02 | 0.31 | 2.26 | 0.024 | 1.33

TABLE 16
Association between cfDNA methylation-based CRC subtypes and CRC prognosis in both the training and validation sets (the same cohort as the prognosis model analysis).

Set / Factor | Univariate HR (95% CI) | Univariate P | Multivariate HR (95% CI) | Multivariate P
Training set
Subtype: 2 vs 1 | 6.36 (3.98-10.17) | <0.001 | 5.24 (3.26-8.4) | <0.001
TNM stage: III-IV vs I-II | 7.46 (2.35-23.7) | <0.001 | 4.95 (1.54-15.9) | 0.007
Validation set
Subtype: 2 vs 1 | 2.61 (1.25-5.14) | 0.01 | 2.28 (1.1-4.73) | 0.027
TNM stage: III-IV vs I-II | 10.65 (1.45-78.02) | 0.02 | 9.54 (1.3-70.12) | 0.027

TABLE 17
Sensitivity and specificity of the ctDNA methylation test for the findings on colonoscopy.

Colonoscopy finding | Persons with finding (n = 1450), no. | ctDNA methylation test positive, no. | ctDNA methylation test negative, no. | Sensitivity, % (95% CI) | Specificity, % (95% CI)
Colorectal cancer
Stage I-III colorectal cancer | 10 | 9 | 1 | 90.0 (71.4-108) |
High grade dysplasia | 8 | 7 | 1 | 87.5 (64.6-110) |
All colorectal cancer | | | | 88.9 (74.4-103) |
Advanced precancerous lesion* | 78 | 26 | 52 | 33.3 (22.9-43.8) | 66.7 (56-77)
Non-advanced adenoma | 100 | 24 | 76 | 24.0 (15.6-32.2) | 76.0 (68-84)
Polyps | 250 | 20 | 230 | 8.0 (4.6-11.4) | 92.0 (88.6-95.4)
Inflammatory bowel disease | 10 | 0 | 10 | | 100
Negative on colonoscopy | 994 | 123 | 871 | | 87.6 (85.6-89.7)
All non-advanced adenoma, non-neoplastic finding, and negative results on colonoscopy | 1354 | 167 | 1187 | | 87.7 (85.9-89.4)

TABLE 18
Clinical characteristics of study cohort.

Characteristic | TCGA CRC tissue | GSE normal blood | CRC serum | Normal control
Total (n) | 459 | 754 | 801 | 1021
Gender
Female, no. (%) | 216 (47.1) | 401 (53.2) | 305 (38.1) | 486 (47.6)
Male, no. (%) | 243 (52.9) | 353 (46.8) | 496 (61.9) | 470 (46.0)
NA | 0 | 0 | 0 | 65 (6.4)
Age (years)
Mean | 68 | 63 | 58 | 47
Range | 33-90 | 19-101 | 24-85 | 19-90
Stage
I | 76 (16.6) | NA | 38 (4.7) | NA
II | 179 (39.0) | NA | 139 (17.4) | NA
III | 131 (28.5) | NA | 209 (26.1) | NA
IV | 65 (14.2) | NA | 406 (50.7) | NA
NA | 8 (1.7) | NA | 9 (1.1) | NA
Tumor burden
Tumor free | 199 (43.4) | NA | 290 (36.2) | NA
With tumor | 215 (46.8) | NA | 511 (63.8) | NA
NA | 45 (9.8) | NA | 0 | NA
RAS status
Wild type | 24 (5.3) | NA | 122 (15.2) | NA
Mutation | 23 (5.0) | NA | 78 (9.8) | NA
NA | 412 (89.7) | NA | 601 (75.0) | NA
MMR status
Proficient | 81 (17.7) | NA | 476 (59.4) | NA
Deficient | 12 (2.6) | NA | 35 (4.4) | NA
NA | 366 (79.7) | NA | 290 (36.2) | NA
Tumor site
Right colon | 257 (55.9) | NA | 197 (24.6) | NA
Left colon | 182 (39.7) | NA | 593 (74.0) | NA
NA | 20 (4.4) | NA | 11 (1.4) | NA

TABLE 19
Clinical characteristics of screening study cohort.

Characteristic | CRC (N = 18) | AA (N = 78) | Others (N = 1354)
Age (mean [SD]) | | |
45-49 yr, no. (%) | 0 | 4 (5.1) | 338 (25.0)
50-59 yr, no. (%) | 8 (44.4) | 18 (23.1) | 446 (32.9)
60-69 yr, no. (%) | 10 (55.6) | 42 (53.9) | 472 (34.9)
>70 yr, no. (%) | 0 | 14 (17.9) | 98 (7.2)
Sex
Female, no. (%) | 8 (44.4) | 30 (38.5) | 698 (51.6)
Male, no. (%) | 10 (55.6) | 48 (61.5) | 656 (48.4)

TABLE 20
Characteristics of 45 methylation markers in ctDNA methylation-based subtyping of CRC.

MCBs | Target ID | Methylation status | Ref Gene
4-38673144 | cg05205843 | Hypo in cluster 1 | KLF3
17-75539913 | cg11841704 | Hypo in cluster 1 | NA
8-95651048 | cg06699564 | Hypo in cluster 1 | NA
13-111160399 | cg08924619 | Hypo in cluster 1 | COL4A2
1-57001742 | cg11959316 | Hypo in cluster 1 | PPAP2B
13-111160365 | cg08924619 | Hypo in cluster 1 | COL4A2
8-95651086 | cg06699564 | Hypo in cluster 1 | NA
17-75539901 | cg01824933 | Hypo in cluster 1 | NA
13-111160418 | cg08924619 | Hypo in cluster 1 | COL4A2
4-3867313 | cg05205842 | Hypo in cluster 1 | KLF3
13-111160424 | cg08924619 | Hypo in cluster 1 | COL4A2
17-48894963 | cg04049981 | Hypo in cluster 1 | NA
1-15686708 | cg09026722 | Hypo in cluster 1 | PEAR1
13-112602444 | cg03616722 | Hypo in cluster 1 | NA
13-11116043 | cg08924619 | Hypo in cluster 1 | COL4A2
17-16924597 | cg05928904 | Hypo in cluster 1 | NA
3-194826585 | cg08704934 | Hypo in cluster 1 | C3orf21
7-2150548 | cg09776772 | Hypo in cluster 1 | MAD1L1
13-106834942 | cg17494199 | Hypo in cluster 1 | NA
17-75539895 | cg01824933 | Hypo in cluster 1 | NA
8-126285443 | cg16296417 | Hypo in cluster 1 | NSMCE2
7-2150534 | cg09776772 | Hypo in cluster 1 | MAD1L1
7-215055 | cg09776772 | Hypo in cluster 1 | MAD1L1
15-68498251 | cg05338167 | Hypo in cluster 1 | CA1ML4
1-161169007 | cg10493436 | Hypo in cluster 1 | ADAMTS4
13-112612344 | cg011251410 | Hypo in cluster 1 | NA
15-58723675 | cg16391792 | Hypo in cluster 1 | LIPC
6-31527889 | cg06393830 | Hyper in cluster 2 | NA
2-113931518 | cg09366118 | Hyper in cluster 2 | PSD4
3-118955835 | cg22513455 | Hyper in cluster 2 | B4GALT4
8-129005567 | cg17583432 | Hyper in cluster 2 | PVT1
8-59058648 | cg23881926 | Hyper in cluster 2 | FAM110B
16-29757318 | cg09638208 | Hyper in cluster 2 | C16orf54
17-55456535 | cg12441066 | Hyper in cluster 2 | MSI2
11-60738995 | cg27284288 | Hyper in cluster 2 | CD6
10-14701815 | cg04441857 | Hyper in cluster 2 | FAM107B
8-129005599 | cg17583432 | Hyper in cluster 2 | PVT1
7-45018848 | cg10673833 | Hyper in cluster 2 | MYO1G
1-154128002 | cg19757176 | Hyper in cluster 2 | TPM3
19-1602335 | cg08670281 | Hyper in cluster 2 | CYP4F11
8-129005583 | cg17583432 | Hyper in cluster 2 | PVT1
17-8370004 | cg04460364 | Hyper in cluster 2 | NDEL1
12-7276714 | cg16959747 | Hyper in cluster 2 | RBP5
12-132269608 | cg15011734 | Hyper in cluster 2 | SFRS8
2-69027039 | cg25754195 | Hyper in cluster 2 | ARHGAP25

TABLE 21
Positive and negative predictive values of cfDNA methylation test.

Outcome | N of participants | Positive predictive value (95% CI) | Negative predictive value (95% CI)
Colorectal cancer | 18/1450 | 0.077 (0.041-0.113) | 0.998 (0.996-1.00)
Advanced precancerous lesion | 78/1450 | 0.134 (0.087-0.183) | 0.042 (0.031-0.053)

While preferred embodiments of the present disclosure have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the disclosure. It should be understood that various alternatives to the embodiments of the disclosure described herein may be employed in practicing the disclosure. It is intended that the following claims define the scope of the disclosure and that methods and structures within the scope of these claims and their equivalents be covered thereby.










Lengthy table referenced here




US20200277677A1-20200903-T00001


Please refer to the end of the specification for access instructions.














LENGTHY TABLES




The patent application contains a lengthy table section. A copy of the table is available in electronic form from the USPTO web site (). An electronic copy of the table will also be available from the USPTO upon request and payment of the fee set forth in 37 CFR 1.19(b)(3).





Claims
  • 1. A method of detecting a methylation pattern of a set of biomarkers in a subject suspected of having a cancer, the method comprising: a) processing an extracted genomic DNA with a deaminating agent to generate a genomic DNA sample comprising deaminated nucleotides, wherein the extracted genomic DNA is obtained from a biological sample from the subject suspected of having a cancer; and b) detecting the methylation pattern of one or more biomarkers selected from Table 1, Table 2, Table 7, Table 8, Table 13, Table 14, or Table 20 from the extracted genomic DNA by contacting the extracted genomic DNA with a set of probes, wherein the set of probes hybridizes to the one or more biomarkers, and performing a DNA sequencing analysis to determine the methylation pattern of the one or more biomarkers.
  • 2. The method of claim 1, wherein said detecting comprises a real-time quantitative probe-based PCR or a digital probe-based PCR.
  • 3. The method of claim 2, wherein the digital probe-based PCR is a digital droplet PCR.
  • 4. The method of claim 1, wherein the set of probes comprises a set of padlock probes.
  • 5. The method of claim 1, wherein step b) comprises detecting the methylation pattern of one or more biomarkers selected from Table 2, Table 13, Table 14, or Table 20.
  • 6. The method of claim 1, wherein step b) comprises detecting the methylation pattern of one or more biomarkers selected from cg19516279, cg06100368, cg25945732, cg19155007, cg17952661, cg04072843, cg01250961, cg08131100, cg03788131, cg17528648, cg07784526, cg18948743, cg23986470, cg00846300, cg01029638, cg08350814, cg05098590, cg18085998, cg06532037, cg15313226, cg16232979, cg26149167, cg01237565, cg16561543, cg13771313, cg13771313, cg08169020, cg08169020, cg21153697, cg07326648, cg14309384, cg20923716, cg09095222, cg22220310, cg21950459, cg13332729, cg10802543, cg20707333, cg13169641, cg25352342, cg09921682, cg02504622, cg17373759, cg06547203, cg06826710, cg00902147, cg17609887, cg15721142, cg08116711, cg00736681, cg18834029, cg06969479, cg24630516, cg16901821, cg20349803, cg23610994, cg19313373, cg16508600, cg24096323, cg24746106, cg12288267, cg10430690, cg24408776, cg05630192, cg12028674, cg24820270, cg12028674, cg26718707, cg10349880, cg09921682, cg25934700, cg14164596, cg24461337, cg23041410, cg07366553, cg26859666, cg06405341, cg08557188, cg00690392, cg03421440, cg07077277, or cg20702527.
  • 7. The method of claim 1, wherein the subject is suspected of having a breast cancer and step b) comprises detecting the methylation pattern of one or more biomarkers selected from cg19516279, cg06100368, cg20349803, cg23610994, cg19313373, cg16508600, or cg24096323.
  • 8. The method of claim 7, wherein the subject is determined to have a breast cancer if: at least one of the cg markers cg19516279 and cg06100368 is hypermethylated; at least one of the cg markers cg20349803, cg23610994, cg19313373, cg16508600, and cg24096323 is hypomethylated; or a combination thereof.
  • 9. The method of claim 1, wherein the subject is suspected of having a liver cancer and step b) comprises detecting the methylation pattern of one or more biomarkers selected from cg25945732, cg19155007, cg17952661, cg25934700, cg14164596, cg24461337, cg23041410, cg07366553, cg26859666, or cg00456086.
  • 10. The method of claim 9, wherein the subject is determined to have a liver cancer if: at least one of the cg markers cg25945732, cg19155007, or cg17952661 is hypermethylated; at least one of the cg markers cg25934700, cg14164596, cg24461337, cg23041410, cg07366553, cg26859666, or cg00456086 is hypomethylated; or a combination thereof.
  • 11. The method of claim 1, wherein the subject is suspected of having a liver cancer and step b) comprises detecting the methylation pattern of one or more biomarkers selected from 3-49757316, 8-27183116, 8-141607252, 17-29297711, 3-49757306, 19-43979341, 8-141607236, 5-176829755, 18-13382140, 15-65341965, 3-13152305, 17-29297770, 8-27183316, 5-176829740, 19-41316693, 18-43830649, 15-65341957, 20-44539531, 7-30265625, 2-131129567, 5-176829665, 3-13152273, 8-27183348, 3-49757302, 19-41316697, 8-61821442, 20-44539525, 10-102883105, 11-65849129, 5-176829639, 15-91129457, 2-1625431, 6-151373292, 6-151373294, 20-25027093, 6-14284198, 10-4049295, 19-59023222, 1-184197132, 2-131004117, 2-8995417, 12-10782319, 20-25027033, 6-151373256, 8-86100970, 9-4839459, 17-41221574, 1-153926715, 20-25027044, 20-20177325, 2-1625443, 20-25027085, 11-69420728, 1-229234865, 6-13408877, 22-50643735, 6-151373308, 1-232119750, 8-134361508, or 6-13408858.
  • 12. The method of claim 11, wherein the subject is determined to have a liver cancer if: at least one of the markers 3-49757316, 8-27183116, 8-141607252, 17-29297711, 3-49757306, 19-43979341, 8-141607236, 5-176829755, 18-13382140, 15-65341965, 3-13152305, 17-29297770, 8-27183316, 5-176829740, 19-41316693, 18-43830649, 15-65341957, 20-44539531, 7-30265625, 2-131129567, 5-176829665, 3-13152273, 8-27183348, 3-49757302, 19-41316697, 8-61821442, 20-44539525, 10-102883105, 11-65849129, or 5-176829639 is hypermethylated; at least one of the markers 15-91129457, 2-1625431, 6-151373292, 6-151373294, 20-25027093, 6-14284198, 10-4049295, 19-59023222, 1-184197132, 2-131004117, 2-8995417, 12-10782319, 20-25027033, 6-151373256, 8-86100970, 9-4839459, 17-41221574, 1-153926715, 20-25027044, 20-20177325, 2-1625443, 20-25027085, 11-69420728, 1-229234865, 6-13408877, 22-50643735, 6-151373308, 1-232119750, 8-134361508, or 6-13408858 is hypomethylated; or a combination thereof.
  • 13. The method of claim 1, wherein the subject is suspected of having an ovarian cancer and step b) comprises detecting the methylation pattern of one or more biomarkers selected from cg04072843, cg01250961, cg24746106, cg12288267, or cg10430690.
  • 14. The method of claim 13, wherein the subject is determined to have an ovarian cancer if: at least one of the cg markers cg04072843 and cg01250961 is hypermethylated; at least one of the cg markers cg24746106, cg12288267, and cg10430690 is hypomethylated; or a combination thereof.
  • 15. The method of claim 1, wherein the subject is suspected of having a colorectal cancer and step b) comprises detecting the methylation pattern of one or more biomarkers selected from cg08131100, cg03788131, cg17528648, cg07784526, cg18948743, cg23986470, cg00846300, cg25352342, cg09921682, cg02504622, cg17373759, cg12028674, cg24820270, cg12028674, cg26718707, cg10349880, and cg09921682.
  • 16. The method of claim 15, wherein the subject is determined to have a colorectal cancer if: at least one of the cg markers cg08131100, cg03788131, cg17528648, cg07784526, cg18948743, cg23986470, or cg00846300 is hypermethylated; at least one of the cg markers cg25352342, cg09921682, cg02504622, cg17373759, cg12028674, cg24820270, cg12028674, cg26718707, cg10349880, or cg09921682 is hypomethylated; or a combination thereof.
  • 17. The method of claim 1, wherein the subject is suspected of having a colorectal cancer and step b) comprises detecting the methylation pattern of one or more biomarkers selected from cg10673833, cg10493436, cg10428836, cg27284288, cg16959747, cg17494199, cg23678254, cg24067911, or cg25459300.
  • 18. The method of claim 1, wherein the subject is suspected of having a colorectal cancer and step b) comprises detecting the methylation pattern of one or more biomarkers selected from cg05205843, cg11841704, cg06699564, cg08924619, cg11959316, cg08924619, cg06699564, cg01824933, cg08924619, cg05205842, cg08924619, cg04049981, cg09026722, cg03616722, cg08924619, cg05928904, cg08704934, cg09776772, cg17494199, cg01824933, cg16296417, cg09776772, cg09776772, cg05338167, cg10493436, cg011251410, cg16391792, cg06393830, cg09366118, cg22513455, cg17583432, cg23881926, cg09638208, cg12441066, cg27284288, cg04441857, cg17583432, cg10673833, cg19757176, cg08670281, cg17583432, cg04460364, cg16959747, cg15011734, or cg25754195.
  • 19. The method of claim 18, wherein the subject is determined to have a colorectal cancer if: at least one of the cg markers cg06393830, cg09366118, cg22513455, cg17583432, cg23881926, cg09638208, cg12441066, cg27284288, cg04441857, cg17583432, cg10673833, cg19757176, cg08670281, cg17583432, cg04460364, cg16959747, cg15011734, or cg25754195 is hypermethylated; at least one of the cg markers cg05205843, cg11841704, cg06699564, cg08924619, cg11959316, cg08924619, cg06699564, cg01824933, cg08924619, cg05205842, cg08924619, cg04049981, cg09026722, cg03616722, cg08924619, cg05928904, cg08704934, cg09776772, cg17494199, cg01824933, cg16296417, cg09776772, cg09776772, cg05338167, cg10493436, cg011251410, or cg16391792 is hypomethylated; or a combination thereof.
  • 20. The method of claim 1, wherein the subject is suspected of having a prostate cancer and step b) comprises detecting the methylation pattern of one or more biomarkers selected from cg01029638, cg08350814, cg05098590, cg18085998, cg06532037, cg15313226, cg16232979, cg26149167, cg06547203, cg06826710, cg00902147, cg17609887, or cg15721142.
  • 21. The method of claim 20, wherein the subject is determined to have a prostate cancer if: at least one of the cg markers cg01029638, cg08350814, cg05098590, cg18085998, cg06532037, cg15313226, cg16232979, or cg26149167 is hypermethylated; at least one of the cg markers cg06547203, cg06826710, cg00902147, cg17609887, or cg15721142 is hypomethylated; or a combination thereof.
  • 22. The method of claim 1, wherein the subject is suspected of having a pancreatic cancer and step b) comprises detecting the methylation pattern of one or more biomarkers selected from cg01237565, cg16561543, and cg08116711.
  • 23. The method of claim 22, wherein the subject is determined to have a pancreatic cancer if: at least one of the cg markers cg01237565 or cg16561543 is hypermethylated; cg marker cg08116711 is hypomethylated; or a combination thereof.
  • 24. The method of claim 1, wherein the subject is suspected of having acute myeloid leukemia and step b) comprises detecting the methylation pattern of one or more biomarkers selected from cg13771313, cg13771313, and cg08169020.
  • 25. The method of claim 1, wherein the subject is suspected of having cervical cancer and step b) comprises detecting the methylation pattern of one or more biomarkers selected from cg08169020, cg21153697, cg07326648, cg14309384, cg20923716, cg22220310, cg21950459, cg13332729, cg10802543, cg20707333, or cg13169641.
  • 26. The method of claim 25, wherein the subject is determined to have cervical cancer if: at least one of the cg markers cg08169020, cg21153697, cg07326648, cg14309384, or cg20923716 is hypermethylated; at least one of the cg markers cg22220310, cg21950459, cg13332729, cg10802543, cg20707333, or cg13169641 is hypomethylated; or a combination thereof.
  • 27. The method of claim 1, wherein the subject is suspected of having sarcoma and step b) comprises detecting the methylation pattern of one or more biomarkers selected from cg09095222.
  • 28. The method of claim 27, wherein the subject is determined to have sarcoma if at least the cg marker cg09095222 is hypermethylated.
  • 29. The method of claim 1, wherein the subject is suspected of having stomach cancer and step b) comprises detecting the methylation pattern of one or more biomarkers selected from cg00736681 and cg18834029.
  • 30. The method of claim 29, wherein the subject is determined to have stomach cancer if at least one of the cg markers cg00736681 or cg18834029 is hypomethylated.
  • 31. The method of claim 1, wherein the subject is suspected of having thyroid cancer and step b) comprises detecting the methylation pattern of one or more biomarkers selected from cg06969479, cg24630516, and cg16901821.
  • 32. The method of claim 31, wherein the subject is determined to have thyroid cancer if at least one of the cg markers cg06969479, cg24630516, or cg16901821 is hypomethylated.
  • 33. The method of claim 1, wherein the subject is suspected of having mesothelioma and step b) comprises detecting the methylation pattern of one or more biomarkers selected from cg05630192.
  • 34. The method of claim 33, wherein the subject is determined to have mesothelioma if cg marker cg05630192 is hypomethylated.
  • 35. The method of claim 1, wherein the subject is suspected of having glioblastoma and step b) comprises detecting the methylation pattern of one or more biomarkers selected from cg06405341.
  • 36. The method of claim 1, wherein the subject is suspected of having lung cancer and step b) comprises detecting the methylation pattern of one or more biomarkers selected from cg08557188, cg00690392, cg03421440, or cg07077277.
  • 37. The method of claim 36, wherein the subject is determined to have lung cancer if at least one of the cg markers cg08557188, cg00690392, cg03421440, or cg07077277 is hypomethylated.
  • 38. The method of claim 1, wherein the biological sample is a blood sample, a urine sample, a saliva sample, a sweat sample, or a tear sample.
  • 39. The method of claim 1, wherein the biological sample is a cell-free DNA sample.
  • 40. The method of claim 1, wherein the biological sample comprises circulating tumor cells.
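The dependent claims express each cancer-type call as a two-sided rule: at least one listed marker is hypermethylated, at least one other listed marker is hypomethylated, or both. The Python sketch below illustrates that logic for the marker sets of claims 8, 14 and 23, together with the read-level principle behind claim 1's deamination step (a methylated cytosine resists deamination and still reads as C, while an unmethylated cytosine reads as T). It is not the claimed method or the specification's reference model. It assumes that methylation is summarized as a beta value in [0, 1] from C/T read counts at a CpG, that each marker has a reference beta value (for example from non-cancer samples), and that a shift of at least `DELTA = 0.2` counts as hyper- or hypomethylation. None of these quantities are defined in the claims. The names `MarkerRule`, `RULES`, `beta_value`, `call_status` and `meets_rule` are hypothetical.

```python
from dataclasses import dataclass
from typing import Mapping

# Hypothetical margin (not from the claims): minimum beta-value shift from the
# reference level for a marker to be called hyper- or hypomethylated.
DELTA = 0.2


@dataclass(frozen=True)
class MarkerRule:
    """Marker sets for one cancer type, copied from a dependent claim."""
    hyper: frozenset  # markers whose hypermethylation supports the call
    hypo: frozenset   # markers whose hypomethylation supports the call


RULES = {
    # claim 8
    "breast": MarkerRule(
        hyper=frozenset({"cg19516279", "cg06100368"}),
        hypo=frozenset({"cg20349803", "cg23610994", "cg19313373",
                        "cg16508600", "cg24096323"}),
    ),
    # claim 14
    "ovarian": MarkerRule(
        hyper=frozenset({"cg04072843", "cg01250961"}),
        hypo=frozenset({"cg24746106", "cg12288267", "cg10430690"}),
    ),
    # claim 23
    "pancreatic": MarkerRule(
        hyper=frozenset({"cg01237565", "cg16561543"}),
        hypo=frozenset({"cg08116711"}),
    ),
}


def beta_value(c_reads: int, t_reads: int) -> float:
    """Methylated fraction at one CpG after deamination.

    A methylated cytosine is protected and reads as C; an unmethylated
    cytosine is deaminated and reads as T after amplification.
    """
    total = c_reads + t_reads
    if total == 0:
        raise ValueError("no informative reads at this CpG")
    return c_reads / total


def call_status(beta: float, reference: float, delta: float = DELTA) -> str:
    """Return 'hyper', 'hypo' or 'normal' relative to a reference beta value."""
    if beta - reference >= delta:
        return "hyper"
    if reference - beta >= delta:
        return "hypo"
    return "normal"


def meets_rule(rule: MarkerRule,
               sample: Mapping[str, float],
               reference: Mapping[str, float],
               delta: float = DELTA) -> bool:
    """True if any hyper-marker is hypermethylated or any hypo-marker is
    hypomethylated ("or a combination thereof"). Markers missing from
    either mapping are skipped."""
    for expected, markers in (("hyper", rule.hyper), ("hypo", rule.hypo)):
        for marker in markers:
            if marker in sample and marker in reference:
                if call_status(sample[marker], reference[marker], delta) == expected:
                    return True
    return False


# Usage with made-up read counts and reference levels (illustration only).
reference = {"cg19516279": 0.30, "cg06100368": 0.25, "cg20349803": 0.80}
sample = {
    "cg19516279": beta_value(7, 13),   # 0.35, within DELTA of reference
    "cg06100368": beta_value(5, 15),   # 0.25, unchanged
    "cg20349803": beta_value(8, 12),   # 0.40, 0.40 below reference
}
print(meets_rule(RULES["breast"], sample, reference))  # True (cg20349803 is hypomethylated)
```

The same `MarkerRule` structure could hold the marker sets of the other two-sided claims (for example claims 10, 16, 21 and 26). A practical classifier would replace the fixed margin with whatever calling procedure or reference model the specification actually uses.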
CROSS-REFERENCE

This application claims the benefit of U.S. Provisional Application No. 62/569,459, filed Oct. 6, 2017, and U.S. Provisional Application No. 62/673,593, filed May 18, 2018, each of which is incorporated herein by reference in its entirety.

PCT Information
  • Filing Document: PCT/US2018/054660
  • Filing Date: 10/5/2018
  • Country: WO
  • Kind: 00
Provisional Applications (2)
  • Number: 62673593; Date: May 2018; Country: US
  • Number: 62569459; Date: Oct 2017; Country: US