The present invention relates to a method for assisting in diagnosing breast cancer and/or monitoring breast cancer progression in a given sample based on the analysis of differential DNA methylation patterns. More particularly, the method is directed to the identification of one or more epigenetic markers that derive from the application of a variety of statistical methods in order to point out the prognostic significance of the difference in methylation states at one or more genomic loci and predict whether the sample analyzed has a good or bad prognosis following treatment.
DNA methylation is found in the genomes of diverse organisms including both prokaryotes and eukaryotes. In prokaryotes, DNA methylation occurs on both cytosine and adenine bases and encompasses part of the host restriction system. In multicellular eukaryotes, however, methylation seems to be confined to cytosine bases and is associated with a repressed chromatin state and inhibition of gene expression (reviewed, for example, in Wilson, G. G. and Murray, N. E. (1991) Annu. Rev. Genet. 25, 585-627).
In mammalian cells, DNA methylation predominantly occurs at CpG dinucleotides, which are distributed unevenly and are underrepresented in the genome. Clusters of usually unmethylated CpGs (also referred to as CpG islands) are found in many promoter regions (reviewed, e.g., in Li, E. (2002) Nat. Rev. Genet. 3, 662-673). Changes in DNA methylation leading to aberrant gene silencing have been demonstrated in several human cancers such as colorectal and prostate cancer (reviewed, e.g., in Robertson, K. D. and Wolffe, A. P. (2000) Nat. Rev. Genet. 1, 11-19). Hypermethylation of promoters was demonstrated to be a frequent mechanism leading to the inactivation of tumor suppressor genes. On the other hand, promoter hypomethylation often correlates with DNA breaks and genome instability, and thus with the severity of some cancers (Bird, A. P. (2002) Genes Dev. 16, 6-21).
Various methods exist for experimentally determining differential methylation in individual genes (reviewed, e.g., in Rein, T. et al. (1998) Nucleic Acids Res. 26, 2255-2264). These techniques include inter alia bisulfite sequencing, methylation-specific PCR (MSP), MethyLight and pyrosequencing.
Breast cancer affects 1.2 million people worldwide and is one of the leading causes of death in women, with approximately 400,000 new cases being diagnosed in the USA and Western Europe each year. Therefore, breast cancer diagnostics remains a high opportunity market.
Differential methylation patterns of several target genes have been associated with the outcome of breast cancer (see, e.g., Zrihan-Licht, S. et al. (1995) Int. J. Cancer 62, 245-251; Mancini, D. N. et al. (1998) Oncogene 16, 1161-1169). However, many different clinical types of breast cancer exist, some of which are not well characterized on a molecular level at all. Furthermore, available diagnostic assays for analyzing breast cancer are also hampered by the fact that they are typically based on the analysis of only a single molecular marker, which might affect reliability and/or accuracy of detection. In addition, a single marker normally does not enable detailed predictions concerning latency stages, tumor progression, and the like.
Thus, there is still a need for the identification of alternative molecular markers and assay formats for assisting in diagnosing breast cancer and/or monitoring breast cancer progression that overcome these limitations. The most useful biomarkers from a clinical standpoint are predictive markers that can predict the response to any treatment regimen at the time of diagnosis. Prognostic markers that can identify a patient's risk of relapse with breast cancer after surgery are also useful, especially if they can identify patients who are at low risk for relapse and thus can be exempted from highly toxic chemotherapy.
It is an objective of the present invention to provide novel approaches for assisting in diagnosing breast cancer and/or monitoring breast cancer progression based on the analysis of differential DNA methylation patterns.
More specifically, it is an objective to provide panels of epigenetic markers that derive from the application of a variety of statistical methods in order to point out the significance of the difference in methylation states at one or more genomic loci analyzed, thus enabling the prediction of whether a given sample has a good or bad prognosis following treatment.
Furthermore, it is an objective to provide a diagnostic approach enabling a reliable and accurate breast cancer prognosis independent of other pathological parameters than the methylation state.
These objectives as well as others, which will become apparent from the ensuing description, are attained by the subject matter of the independent claims. Some of the preferred embodiments of the present invention are defined by the subject matter of the dependent claims.
In one aspect, the present invention relates to a method for assisting in diagnosing breast cancer and/or monitoring breast cancer progression, comprising:
In a specific embodiment of the method, the breast cancer is estrogen receptor positive breast cancer.
In a preferred embodiment, the method is used for assisting in diagnosing breast cancer and/or monitoring breast cancer progression in a patient, and further comprises:
providing a genomic DNA sample from the patient to be analyzed,
wherein the method is performed in vitro.
In another specific embodiment, the method further comprises:
classifying the one or more genomic loci according to its/their methylation state as unmethylated, partially methylated, and methylated prior to performing step (c).
In a preferred embodiment, the statistical survival analysis performed in step (c) comprises generating Kaplan-Meier survival estimates for the respective methylation states (that is, the samples belonging to the respective methylation state) of each of the one or more genomic loci and calculating the differences between the Kaplan-Meier survival estimates generated for each of the loci.
In a further preferred embodiment, determining the statistical significance of the data obtained in the survival analysis comprises applying the log-rank or Mantel-Haenszel test. Particularly preferably, determining the statistical significance further comprises a permutation testing method.
In another specific embodiment, the method further comprises:
determining whether the prognostic value of the one or more genetic loci selected is independent of other pathological parameters than the methylation state.
Particularly preferably, the method is performed using a computing device.
In another aspect, the present invention relates to a panel of genetic markers for assisting in diagnosing breast cancer and/or monitoring breast cancer progression in a patient, wherein the panel comprises any one or more, or preferably all, of the genetic markers listed in Table 1.
In yet another aspect, the present invention relates to a panel of genetic markers for assisting in diagnosing estrogen receptor positive breast cancer and/or monitoring estrogen receptor positive breast cancer progression in a patient, wherein the panel comprises any one or more, or preferably all, of the genetic markers listed in Table 2.
Preferably, the panels of genetic markers are determined by the method as defined herein.
In a further aspect, the present invention relates to the use of the panels of genetic markers as defined herein for assisting in diagnosing breast cancer and/or monitoring breast cancer progression in a patient.
In a preferred embodiment, the monitoring of breast cancer progression comprises stratification of breast cancer patients into good or poor prognosis groups. Particularly preferably, the monitoring of breast cancer progression comprises predicting relapse free survival at five years from diagnosis.
The present invention is based on the unexpected finding that combining analysis of differential DNA methylation in a sample with a variety of statistical and machine learning methods in order to point out the significance of the difference in methylation states results in the identification of panels of epigenetic markers having independent prognostic value for assisting in diagnosing and/or monitoring the progression of breast cancer.
The present invention illustratively described in the following may suitably be practiced in the absence of any element or elements, limitation or limitations, not specifically disclosed herein.
The present invention will be described with respect to particular embodiments and with reference to certain drawings but the invention is not limited thereto but only by the claims. The drawings described are only schematic and are to be considered non-limiting.
Where the term “comprising” is used in the present description and claims, it does not exclude other elements or steps. For the purposes of the present invention, the term “consisting of” is considered to be a preferred embodiment of the term “comprising”. If hereinafter a group is defined to comprise at least a certain number of embodiments, this is also to be understood to disclose a group, which preferably consists only of these embodiments.
Where an indefinite or definite article is used when referring to a singular noun e.g. “a” or “an”, “the”, this includes a plural of that noun unless something else is specifically stated.
The term “about” in the context of the present invention denotes an interval of accuracy that the person skilled in the art will understand to still ensure the technical effect of the feature in question. The term typically indicates deviation from the indicated numerical value of ±10%, and preferably ±5%.
Furthermore, the terms first, second, third, (a), (b), (c), and the like in the description and in the claims, are used for distinguishing between similar elements and not necessarily for describing a sequential or chronological order. It is to be understood that the terms so used are interchangeable under appropriate circumstances and that the embodiments of the invention described herein are capable of operation in other sequences than described or illustrated herein.
Further definitions of terms will be given in the following in the context in which the terms are used.
The following terms or definitions are provided solely to aid in the understanding of the invention. These definitions should not be construed to have a scope less than understood by a person of ordinary skill in the art.
In one aspect, the present invention relates to a method for assisting in diagnosing breast cancer and/or monitoring breast cancer progression, comprising:
The term “cancer”, as used herein, generally denotes any type of malignant neoplasm, that is, any morphological and/or physiological alterations (based on genetic re-programming) of target cells exhibiting or having a predisposition to develop characteristics of a carcinoma as compared to unaffected (healthy) wild-type control cells. Examples of such alterations may relate inter alia to cell size and shape (enlargement or reduction), cell proliferation (increase in cell number), cell differentiation (change in physiological state), apoptosis (programmed cell death) or cell survival. Hence, the term “breast cancer” refers to cancerous growths in breast tissue.
In one embodiment of the method according to the present invention, the breast cancer is estrogen receptor positive breast cancer.
In a preferred embodiment, the method is used for assisting in diagnosing breast cancer and/or monitoring breast cancer progression in a patient, and further comprises:
The term “in vitro”, as used herein, denotes that the method is performed using an isolated DNA sample derived from the patient to be analyzed, that is, one or more cells, a cell extract, a tissue biopsy, and the like.
The term “sample” (or “genomic sample”), as used herein, denotes any sample comprising one or more genomic DNA molecules whose differential methylation status is to be analyzed. The DNA molecules comprised in the sample may be naturally occurring or synthetic compounds (e.g., generated by means of recombinant DNA technology or by chemical synthesis) and may be single-stranded or double-stranded. The DNA molecules may have any length. Typically, the length varies between 10 bp and 100000 bp, preferably between 100 bp and 10000 bp, and particularly preferably between 500 bp and 5000 bp.
The DNA molecules comprised in the sample may be present in purified form (e.g., provided in a suitable buffer solution such as TE or PBS known in the art) or may be included in an unpurified, partially purified or enriched sample solution. Examples of such unpurified samples include cell lysates, body fluids (e.g., blood, serum, saliva, and urine), solubilized tissues, and the like.
In some embodiments, the method according to the present invention also comprises the purification of the DNA present in such an unpurified sample. Methods and corresponding devices for purifying DNA (optionally as integral part of an automated system or working platform) are well known in the art and commercially available from many suppliers.
The determination of the methylation state of the DNA comprised in the sample may be performed using any detection method established in the art, e.g., including bisulfite sequencing, methylation-sensitive single-strand conformation analysis (MS-SSCA), methylation-sensitive single nucleotide primer extension (Ms-SNuPE), methylation-sensitive microarray applications, combined bisulfite restriction analysis (COBRA), methylation-sensitive real-time PCR applications, and the like. In preferred embodiments, the analysis of the DNA methylation patterns is performed in a whole genome format using Methylation Oligonucleotide Microarray Analysis (MOMA) arrays (cf. also
Within the present invention, each methylation profile was determined by using an expectation maximization algorithm to pool each genomic locus in a particular sample into one of three distinct methylation states—unmethylated, partially methylated and methylated.
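By way of illustration only, the pooling of loci into three states may be sketched as follows. The description does not specify the model underlying the expectation maximization algorithm; the sketch assumes, purely hypothetically, a one-dimensional Gaussian mixture with three components fitted to a per-locus methylation score in which higher values indicate more methylation. The function name `em_three_states`, the score scale, and the starting values are illustrative assumptions and are not taken from the invention.

```python
import math

STATES = ("unmethylated", "partially methylated", "methylated")

def em_three_states(scores, n_iter=50):
    """Assign each score to one of three states using a 1-D Gaussian mixture.

    Assumption (not from the description): higher score = more methylated.
    """
    lo, hi = min(scores), max(scores)
    means = [lo, (lo + hi) / 2.0, hi]            # start values spread over the range
    sds = [max((hi - lo) / 6.0, 1e-3)] * 3
    weights = [1.0 / 3.0] * 3

    def responsibilities(x):
        # Posterior probability of each component given the score x.
        dens = [w * math.exp(-0.5 * ((x - m) / s) ** 2) / s
                for w, m, s in zip(weights, means, sds)]
        total = sum(dens)
        if total == 0.0:                          # numerically far from every component
            return [1.0 / 3.0] * 3
        return [d / total for d in dens]

    for _ in range(n_iter):
        resp = [responsibilities(x) for x in scores]          # E-step
        for k in range(3):                                    # M-step
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue                                      # empty component: keep parameters
            means[k] = sum(r[k] * x for r, x in zip(resp, scores)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, scores)) / nk
            sds[k] = max(math.sqrt(var), 1e-3)                # floor avoids a collapsing component
            weights[k] = nk / len(scores)

    # Name the components by the order of their fitted means.
    rank = {k: i for i, k in enumerate(sorted(range(3), key=lambda k: means[k]))}
    labels = []
    for x in scores:
        r = responsibilities(x)
        labels.append(STATES[rank[max(range(3), key=lambda k: r[k])]])
    return labels
```

Each locus receives the state of the component with the highest posterior probability. Other mixture models or thresholds may equally be used.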
Subsequently, the method of the invention comprises the identification of one or more genomic loci exhibiting differences in its/their DNA methylation state, that is, genomic loci which are, for example, unmethylated in non-tumor samples and become (at least partially) methylated during tumor progression, or vice versa, which are (at least partially) methylated in non-tumor samples and become demethylated during tumor progression.
In some embodiments, the results of the differential methylation analyses are compared with a reference value, for example the methylation pattern obtained using a DNA sample derived from a healthy subject or with data from the literature in order to identify differential methylation.
In specific embodiments, the one or more differentially methylated genomic loci are classified according to its/their methylation state as unmethylated, partially methylated, and methylated prior to performing the statistical survival analysis.
In the next step, the methylation data obtained are subjected to a statistical survival analysis in order to identify whether a methylation state of a particular genomic locus in the breast tumor sample would classify the patient as having good or bad prognosis, that is, whether the variation in methylation behavior observed is significant. Several statistical survival models are known in the art. The method of the present invention may be practiced by employing any of these models.
Preferably, however, the statistical survival analysis performed in step (c) of the method according to the invention comprises generating Kaplan-Meier survival estimates for the respective methylation states (that is, for the samples belonging to the respective methylation states) of each of the one or more genomic loci and calculating the differences between the Kaplan-Meier survival estimates generated for each of the loci (that is, for the samples belonging to each of the loci).
The Kaplan-Meier estimator of the survival function is known in the art (Hosmer, D. W., et al. (2008) Applied Survival Analysis—Regression Modeling of Time-to-Event Data. 2nd ed. Wiley Series in Probability and Statistics. Hoboken, N.J.: John Wiley & Sons, Inc.) and calculates the probability of no systemic recurrence at a given time by using the time to systemic recurrence for all the patients included in the study. Since some patients typically leave the study after a while, the Kaplan-Meier estimator accounts for the loss of patients from the study at different points in time due to lack of follow-up. This is called the “censoring problem” in survival analysis and is already accounted for in the Kaplan-Meier estimator. Within the present invention, the probability of no systemic recurrence was calculated over a period of 10 years after initial diagnosis. However, other periods of time (e.g., 1, 3, 5, 15 or 20 years) are possible as well.
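For illustration, the Kaplan-Meier product-limit estimate described above may be sketched as follows. The function name `kaplan_meier` and the encoding of the input (follow-up time per patient, with an event flag of 1 for observed systemic recurrence and 0 for a censored patient) are assumptions made for this sketch only.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each time with at least one observed recurrence.

    times  : follow-up time per patient (e.g., years since diagnosis)
    events : 1 if systemic recurrence was observed at that time, 0 if censored
    """
    recurrences = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)          # patients leaving the risk set at t, for any reason
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = recurrences.get(t, 0)
        if d:
            # Product-limit step: fraction of at-risk patients without recurrence at t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Censored patients lower the risk set without lowering the estimate.
        at_risk -= leaving[t]
    return curve
```

Censored patients contribute to the number at risk up to the time they leave the study, which is how the estimator accounts for loss to follow-up.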
Kaplan-Meier estimators are generated for the three methylation states unmethylated, partially methylated, and methylated, respectively. If the difference between the Kaplan-Meier estimators obtained for a particular locus is significant, this locus is retained for further analysis. Otherwise, it is discarded. The overall procedure for performing the statistical survival analysis is schematically depicted in
In order to select those genomic loci whose differential methylation pattern has independent prognostic value for breast cancer diagnosis, as a next step, the statistical significance of the data obtained in the survival analysis is determined. Again, various established statistical means are possible for performing such tests. The skilled person is well aware of how to select an appropriate procedure.
In a preferred embodiment of the method, determining the statistical significance of the data obtained in the survival analysis comprises applying the log-rank or Mantel-Haenszel test, which is established in the art as well (Hosmer, D. W., et al. (2008), supra). This test outputs a chi-square value for each comparison, which is a measure of the amount of difference in the Kaplan-Meier curves. The statistical significance of these differences can be further validated through a permutation testing method, which, for example, involves permuting the available clinical data of the samples analyzed and recomputing the chi-square index for all loci.
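As a non-limiting sketch, the chi-square value of the log-rank test may be computed as shown below for the simplified case of two groups (for example, two of the three methylation states of one locus). Comparing all three states at once uses the same observed-minus-expected counts together with a covariance matrix and two degrees of freedom, which is omitted here for brevity. The function name `logrank_chi2` and the input encoding are assumptions of this sketch.

```python
def logrank_chi2(times_a, events_a, times_b, events_b):
    """Two-group log-rank chi-square statistic (1 degree of freedom).

    times_*  : follow-up time per patient in the group
    events_* : 1 if recurrence was observed, 0 if censored
    """
    times = list(times_a) + list(times_b)
    events = list(events_a) + list(events_b)
    in_a = [True] * len(times_a) + [False] * len(times_b)

    observed_a = expected_a = variance = 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        n = sum(1 for x in times if x >= t)                        # at risk, both groups
        n_a = sum(1 for x, a in zip(times, in_a) if a and x >= t)  # at risk, group A
        d = sum(1 for x, e in zip(times, events) if e and x == t)  # recurrences at t
        d_a = sum(1 for x, e, a in zip(times, events, in_a) if e and a and x == t)
        observed_a += d_a
        expected_a += d * n_a / n          # expected under equal survival curves
        if n > 1:
            variance += d * (n - d) / (n - 1) * n_a * (n - n_a) / n ** 2
    if variance == 0.0:
        return 0.0                         # no informative event times
    return (observed_a - expected_a) ** 2 / variance
```

A larger value indicates a larger difference between the two Kaplan-Meier curves.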
Thus, in a further preferred embodiment of the method, determining the statistical significance of the data obtained in the survival analysis further comprises a permutation testing method. This is repeated several times (e.g., 2, 5, 10, 50, 100, 200, 500, 1000, 2000 times, and so forth) to obtain a background distribution of chi-square values for all loci. Within the present invention, the permutation testing method is preferably repeated 1000 times.
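The permutation scheme may be sketched, for a single locus, as follows. The helper name `permutation_p_value`, the `statistic` callable (which would typically return the log-rank chi-square value), and the add-one correction of the resulting p-value are assumptions of this sketch; the description itself only specifies that the clinical data are permuted and the chi-square values recomputed.

```python
import random

def permutation_p_value(statistic, outcomes, states, n_permutations=1000, seed=0):
    """Estimate how often permuted clinical data give a statistic at least as
    large as the one obtained from the original data.

    statistic : callable(outcomes, states) -> float (hypothetical, e.g. chi-square)
    outcomes  : per-sample clinical data, e.g. (time, event) pairs
    states    : per-sample methylation state of the locus
    """
    rng = random.Random(seed)
    observed = statistic(outcomes, states)
    shuffled = list(outcomes)
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)              # breaks the link between outcome and state
        if statistic(shuffled, states) >= observed:
            exceed += 1
    # Add-one correction (an assumption here) keeps the estimate above zero.
    return (exceed + 1) / (n_permutations + 1)
```

Shuffling whole outcome records keeps each patient's time and event flag together while removing any association with the methylation state.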
Then, the chi-square value for each genomic locus obtained from the original clinical data is compared to the background distribution. Any locus that achieves a statistical significance of 0.05 or lower after multiple testing correction, for example after Benjamini-Hochberg correction (Benjamini, Y. and Hochberg, Y. (1995) J. Royal Stat. Soc. Series B 57, 289-300), is considered a good biomarker for stratification of patients into good and poor prognosis groups. The overall procedure for performing the analysis of statistical significance is schematically depicted in
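The Benjamini-Hochberg correction referred to above may be illustrated by the following sketch, which converts a list of per-locus p-values into adjusted values that can be compared directly with the 0.05 threshold. The function name `benjamini_hochberg` is an assumption of this sketch.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])   # indices by ascending p-value
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Loci whose adjusted value is 0.05 or lower would be retained under the criterion stated above.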
Finally, in some embodiments, the method of the present invention comprises determining whether the prognostic value of the one or more genetic loci selected is independent of other pathological parameters than the methylation state, that is, the results obtained are corrected for any ambiguities potentially associated with clinical parameters such as age of the patients analyzed, tumor grade, adjuvant or hormone therapy, and the like.
In order to estimate the extent to which the cancer recurrence rates were correlated with the methylation status of a given locus, established Cox regression analysis may be used, but other models are possible as well. Loci that had a statistically significant Cox coefficient (as determined by the Wald test) were chosen for further analysis. Multivariate Cox regression may be performed using the methylation status of the significant loci in combination with, for example, age (e.g. <55 versus >55), tumor grade (I or II versus III), as well as the status of several marker proteins such as p53 (positive versus negative), estrogen receptor (ER) (positive versus negative) and ERBB2 (positive versus negative).
Loci that had a statistically significant Cox coefficient in the multivariate Cox regression model were considered to be providing prognostic information independent of the other clinical factors for assisting in diagnosing breast cancer and/or monitoring breast cancer progression.
Particularly preferably, the method according to the present invention is performed using a computing device. Such devices are known in the art and may be configured in many ways. For example, such a computing device may be designed to receive a data set concerning the DNA methylation status of one or more genomic loci of the DNA comprised in a given sample, to process this data set in order to identify one or more genomic loci exhibiting differences in its/their DNA methylation state, to subject the differentially methylated one or more genomic loci identified to the statistical survival analysis using an appropriate algorithm, to correlate the data set obtained with other clinical parameters associated with the sample tested, and to generate a (ranked) listing, based on the correlated data, of one or more genomic loci displaying statistically significant independent prognostic value for assisting in diagnosing breast cancer and/or monitoring breast cancer progression.
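For illustration only, the estimation of a Cox coefficient and its Wald statistic may be sketched for the simplest case of a single covariate (for example, a binary methylation indicator of one locus). The sketch assumes Newton-Raphson maximization of the partial likelihood with the Breslow treatment of tied event times; the function name `cox_univariate` is hypothetical. The multivariate model described above adds further covariates and would in practice be fitted with an established survival analysis package.

```python
import math

def cox_univariate(times, events, x, n_iter=25):
    """Fit a one-covariate Cox model; return (beta, standard error, Wald z).

    times  : follow-up time per patient
    events : 1 if recurrence was observed, 0 if censored
    x      : covariate per patient (e.g., 1 = methylated, 0 = unmethylated)
    """
    beta = 0.0
    info = 0.0
    for _ in range(n_iter):
        score = 0.0
        info = 0.0
        for t_i, e_i, x_i in zip(times, events, x):
            if not e_i:
                continue                                   # censored: no likelihood term
            risk = [x_j for t_j, x_j in zip(times, x) if t_j >= t_i]
            w = [math.exp(beta * x_j) for x_j in risk]
            s0 = sum(w)
            s1 = sum(w_j * x_j for w_j, x_j in zip(w, risk))
            s2 = sum(w_j * x_j * x_j for w_j, x_j in zip(w, risk))
            score += x_i - s1 / s0                         # first derivative
            info += s2 / s0 - (s1 / s0) ** 2               # negative second derivative
        if info <= 0.0:
            raise ValueError("covariate does not vary within any risk set")
        step = score / info
        beta += step                                       # Newton-Raphson update
        if abs(step) < 1e-10:
            break
    se = 1.0 / math.sqrt(info)
    return beta, se, beta / se
```

The Wald statistic is the coefficient divided by its standard error; under the null hypothesis of no association it is approximately standard normal.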
In another aspect, the present invention relates to a panel of genetic (more particularly epigenetic) markers for assisting in diagnosing breast cancer and/or monitoring breast cancer progression in a patient, wherein the panel comprises any one or more, or preferably all, of the 241 genetic markers listed in Table 1. All these markers are based on differential DNA methylation patterns.
In yet another aspect, the present invention relates to a panel of genetic (more particularly epigenetic) markers for assisting in diagnosing estrogen receptor positive breast cancer and/or monitoring estrogen receptor positive breast cancer progression in a patient, wherein the panel comprises any one or more, or preferably all, of the 105 genetic markers listed in Table 2. All these markers are based on differential DNA methylation patterns.
Preferably, the above-referenced panels of genetic markers are determined by the method as defined herein.
The term “any one or more”, as used herein, relates to any one or any subgroup of any two or more (i.e. any two, any three, any four, any five, any six, any seven, any eight, any nine, any ten, and so forth) or to all of the respective genetic marker genes disclosed herein in Tables 1 and 2, respectively.
Preferably, the panel of epigenetic markers for assisting in diagnosing breast cancer comprises all of the 241 markers listed in Table 1, whereas the panel of epigenetic markers for assisting in diagnosing estrogen receptor positive breast cancer comprises all of the 105 markers listed in Table 2.
The markers listed in Tables 1 and 2 are unambiguously defined by means of their chromosomal location (i.e. number of the human chromosome as well as start and end points of the respective chromosomal fragment).
In a further aspect, the present invention relates to the use of the panels of genetic markers as defined herein for assisting in diagnosing breast cancer and/or monitoring breast cancer progression in a patient. The panels of genetic markers may also be used to classify breast cancer patients according to tumor type or tumor grade.
In a preferred embodiment, the monitoring of breast cancer progression comprises stratification of breast cancer patients into good or poor prognosis groups (for example, based on the respective p-values associated with the statistical multivariate model described herein; cf. also Tables 1 and 2). Particularly preferably, the monitoring of breast cancer progression comprises predicting relapse free survival at five (or, e.g., 10) years from diagnosis.
The invention is further described by the figures and the following examples, which are solely for the purpose of illustrating specific embodiments of this invention, and are not to be construed as limiting the scope of the invention in any way.
The Methylation Oligonucleotide Microarray Analysis (MOMA) array used in the present invention for performing whole genome detection of differentially methylated loci was designed as follows.
The genomic DNA was digested with a restriction endonuclease with a CG rich recognition sequence (MspI), followed by ligation of adaptors for use in a subsequent step of reducing genomic complexity. One-half of the adaptor-ligated sample was depleted of its methylated sequences by digestion with the methylation-specific endonuclease McrBC, and the other half was mock-treated. Carefully balanced PCR conditions were used to size-select MspI fragments and reduce the overall genome complexity. The McrBC-treated representation was compared to the mock-treated sample, which serves as the reference for comparative hybridization on an oligonucleotide tiling array with 367K features with coverage of 26,219 out of 27,801 annotated CpG islands. The procedure is schematically illustrated in
DNA methylation analysis was performed for 121 human breast tumors, 108 of which had associated clinicopathological annotations including relapse and survival data for up to 10 years.
In one embodiment, only those tumors that were Estrogen receptor positive were analyzed, a total of 70 tumors.
Each sample's methylation profile was determined by using an expectation maximization algorithm to pool each locus into one of three distinct states—unmethylated, partially methylated and methylated.
Genomic DNA extraction from the tumor samples as well as determination of the DNA methylation pattern were performed according to established standard procedures.
The statistical model chosen for evaluating the probability that there would be no systemic recurrence in a given amount of time is the Kaplan-Meier estimator of the survival function.
The Kaplan-Meier estimator calculates the probability of no systemic recurrence at a given time by using the time to systemic recurrence for all the patients included in the study. Since some patients typically leave the study after a while, the Kaplan-Meier estimator accounts for the loss of patients from the study at different points in time due to lack of follow-up. This is called the “censoring problem” in survival analysis and is already accounted for in the Kaplan-Meier estimator. The Kaplan-Meier estimator was used to analyze the probability of no systemic recurrence over a period of 10 years after initial diagnosis. The procedure for identifying genomic loci that have potential prognostic value for diagnosing and/or monitoring breast cancer is schematically given in
Using the above methodology, 159,436 genomic loci in the dataset were searched for loci with prognostic capability.
Given the three possible states of any locus (i.e. unmethylated, partially methylated, and methylated), the Kaplan-Meier estimator is used to estimate the probability of no systemic recurrence for at least 10 years using all the patients that fall into a given methylation state of the locus.
Statistically significant differences in the three Kaplan-Meier estimates were evaluated by using the log-rank or Mantel-Haenszel test. This test results in a chi-square value for each comparison, which is a measure of the amount of difference in the Kaplan-Meier curves. The statistical significance of these differences can be estimated through a permutation testing method, which involves permuting the clinical data and recomputing the chi-square index for all loci. This was repeated 1000 times to obtain a background distribution of chi-square values. Then the chi-square value for each locus obtained from the original clinical data was compared to the background distribution. Any locus achieving a statistical significance of 0.05 or lower, after Benjamini-Hochberg multiple testing correction, is considered to represent a suitable biomarker for stratification of patients into good and poor prognosis groups. This procedure is also schematically outlined in
In one experiment, all 121 breast tumors were included in the analysis. Based on the methodology described above, the number of potential prognostic genomic loci was narrowed to 2,559.
Then, it was determined whether these loci were providing prognostic information independent of other clinical variables such as ER/PR status, ERBB2 status, tumor grade, as well as adjuvant or hormone therapy.
Cox regression analysis was used to estimate the extent to which the cancer recurrence rates were correlated with the methylation status of a given locus. Loci that had a statistically significant Cox coefficient (as determined by the Wald test) were chosen for further analysis. Multivariate Cox regression was performed using the methylation status of the significant loci in combination with age (<55 versus >55), tumor grade (I or II versus III), p53 status (positive versus negative), ER status (positive versus negative), and ERBB2 status (positive versus negative). Loci that had a statistically significant Cox coefficient in the multivariate Cox regression model were considered to be providing prognostic information independent of the other clinical factors.
Finally, a total of 241 loci that had prognostic value independent of other clinical factors could be identified. These loci (unambiguously characterized by their chromosomal position) are included in Table 1.
In another experiment, only the 70 estrogen receptor positive breast cancer samples were included in the analysis.
After eliminating all loci that did not provide prognostic information independent of the other clinical factors, a total of 105 loci could be identified as independent prognostic factors for estrogen receptor positive tumors. These loci are listed in Table 2.
The present invention illustratively described herein may suitably be practiced in the absence of any element or elements, limitation or limitations, not specifically disclosed herein. Thus, for example, the terms “comprising”, “including”, “containing”, etc. shall be read expansively and without limitation. Additionally, the terms and expressions employed herein have been used as terms of description and not of limitation, and there is no intention in the use of such terms and expressions of excluding any equivalents of the features shown and described or portions thereof, but it is recognized that various modifications are possible within the scope of the invention claimed.
Thus, it should be understood that although the present invention has been specifically disclosed by embodiments and optional features, modifications and variations of the inventions embodied therein may be resorted to by those skilled in the art, and that such modifications and variations are considered to be within the scope of this invention.
The invention has been described broadly and generically herein. Each of the narrower species and sub-generic groupings falling within the generic disclosure also form part of the invention. This includes the generic description of the invention with a proviso or negative limitation removing any subject matter from the genus, regardless of whether or not the excised material is specifically recited herein.
Other embodiments are within the following claims. In addition, where features or aspects of the invention are described in terms of Markush groups, those skilled in the art will recognize that the invention is also thereby described in terms of any individual member or subgroup of members of the Markush group.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/IB10/54152 | 9/15/2010 | WO | 00 | 3/20/2012 |
Number | Date | Country | |
---|---|---|---|
61244625 | Sep 2009 | US |