This invention pertains in general to the field of statistical data processing. More particularly the invention relates to methylation classification correlated to clinical pathological information, for indicating likelihood of recurrence of cancer.
DNA methylation, a type of chemical modification of DNA that can be inherited and subsequently removed without changing the original DNA sequence, is the most well studied epigenetic mechanism of gene regulation. Regions of DNA in which a cytosine nucleotide frequently occurs next to a guanine nucleotide in the linear sequence of bases are called CpG islands.
It is known that DNA methylation of these islands, present in the promoter region, can act as a mechanism for gene silencing. Methods exist for experimentally finding the differential methylation, such as differential methylation hybridization, methylation specific sequencing, HELP assay, bisulfite sequencing, CpG island arrays etc.
CpG islands are generally heavily methylated in normal cells. However, during tumorigenesis, hypomethylation occurs at these islands, which may result in the expression of certain repeats. In addition, this hypomethylation correlates to DNA breaks and genome instability. These hypomethylation events also correlate to the severity of some cancers. Under certain circumstances, which may occur in pathologies such as cancer, imprinting, development, tissue specificity, or X chromosome inactivation, gene associated islands may be heavily methylated. Specifically, in cancer, methylation of islands proximal to tumor suppressors is a frequent event, often occurring when the second allele is lost by deletion (Loss of Heterozygosity, LOH). Some tumor suppressors commonly seen with methylated islands are p16, Rassf1a, BRCA1.
There are reported epigenetic markers for colorectal and prostate cancer. For example, Epigenomics AG (Berlin, Germany) has the Septin 9 as a marker for colorectal cancer screening in blood plasma. A method for using methylation sites to predict differential therapy responses in cancer and recommending an appropriate therapy has been disclosed in US20050021240A1. However, the results predicted by this method are limited.
Methods known within the art involve the use of immuno-histopathological variables such as tumor size, ER/PR status, lymph node negativity, etc. to define a clinical prognostic index such as the Nottingham Prognostic Index (NPI). The problem with such an index is that it has been shown to be very conservative, thus typically causing patients to receive aggressive therapy even when a low risk of disease recurrence exists.
An alternate method known within the art involves measurement of the expression levels of a large number of genes, typically around 70, and calculating a risk score based on the relative expression levels of the genes. These prognostic tests are not very specific and also remain very costly in terms of tissue handling requirements. Using RNA is difficult because RNA degrades much faster than DNA and needs more careful handling.
Hence, an improved method for obtaining statistically processed methylation data correlated to clinical pathological information would be advantageous and in particular a method allowing for increased flexibility, cost-effectiveness, and/or statistically correct prognosis data would be advantageous.
Accordingly, the present invention preferably seeks to mitigate, alleviate or eliminate one or more of the above-identified deficiencies in the art and disadvantages singly or in any combination and solves at least the above-mentioned problems by providing a method, and a sequence list according to the appended patent claims.
According to an aspect of the invention, a methylation classification list comprising loci DNA, for which loci the methylation status of the DNA is indicative of likelihood of recurrence of cancer, is provided. The methylation classification list comprises at least one sequence of the group comprising SEQ ID NO: 1 to SEQ ID NO: 252.
An advantage of the methylation classification list is that it allows for clinical prognostic tests that could be widely used in clinical practice.
In another aspect, a method for obtaining a methylation classification list comprising statistically processed methylation data correlated to clinical pathological information is provided. The method comprises at least the steps of providing tumour DNA from cancer patients with a known clinical pathological history. Then, the methylation status of the tumour DNA is analyzed, resulting in a methylation classification list. The list comprises a selection of the statistically processed methylation data, wherein the selection is suitable for predicting probability of relapse free survival of a subject. This is advantageous, since DNA methylation may be much more easily measured in the clinical setting compared to data such as gene expression, thus enabling a highly useful clinical prognostic test. A further advantage is that clinicians are able to robustly stratify patients into good or poor prognostic groups and thus make appropriate therapy choices using the discovered DNA methylation markers.
In yet another aspect, a method for predicting probability of relapse free survival of a subject diagnosed with cancer is provided. The method comprises creating a marker panel comprising at least one post from the methylation classification list, providing DNA from the subject, analysing the methylation status of the parts of the DNA from the subject, corresponding to the marker panel. The result is a local methylation classification list, comprising statistically processed methylation data. The local methylation classification list is statistically analysed, which gives a predicted probability of relapse free survival for the subject.
In another aspect, an apparatus for predicting probability of relapse free survival of a subject, who has been diagnosed with cancer, is provided. The apparatus comprises a first unit, creating a marker panel comprising at least one post from the methylation classification list. The apparatus also comprises a second unit, providing DNA from the subject and a third unit, analysing the methylation status of the parts of the DNA from the subject, corresponding to the marker panel. The output is a local methylation classification list comprising statistically processed methylation data. The apparatus further comprises a fourth unit, statistically analysing the local methylation classification list providing a predicted probability of relapse free survival for the subject. The units are operatively connected to each other.
In a further aspect, use of the methylation classification list, for predicting probability of relapse free survival of a subject diagnosed with cancer is disclosed.
Further embodiments of the invention are defined in the dependent claims and in the description of embodiments.
These and other aspects, features and advantages of which the invention is capable of will be apparent and elucidated from the following description of embodiments of the present invention, reference being made to the accompanying drawings, in which
Several embodiments of the present invention will be described in more detail below with reference to the accompanying drawings in order for those skilled in the art to be able to carry out the invention. The invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art. The embodiments do not limit the invention, but the invention is only limited by the appended patent claims. Furthermore, the terminology used in the detailed description of the particular embodiments illustrated in the accompanying drawings is not intended to be limiting of the invention.
The following description focuses on an embodiment of the present invention applicable to a method for obtaining a methylation classification list comprising statistically processed methylation data correlated to clinical pathological information.
In an embodiment, according to
In an embodiment according to
Based on the methylation classification list, a marker panel is created by selecting at least one post from the methylation classification list. The selection of loci for the classification is based on the Kaplan-Meier survival estimate that is detailed below. In order to select the particular loci for the test from the table, a variety of criteria are used, such as the p-value for the difference in likelihood of relapse between methylation states. Top performing loci are preferred;
Two loci can be combined by accounting for synergy between them, where the combination gives a better prediction of relapse than either locus alone;
Performance and ease of the methylation assay will be taken into account in choosing one locus over another; and
Other information such as tumor grade or size can be put into the classification scheme, but is not present in the table.
Next, DNA is provided, e.g. by performing extraction from a sample taken from the subject, such as blood, tissue, urine or saliva. Extraction is performed according to methods well known to a person skilled in the art, such as ethanol precipitation or by using a DNeasy Blood & Tissue Kit from Qiagen. This results in subject DNA.
Then, the methylation status of each sequence of subject DNA, corresponding to the sequences in the marker panel is analysed using a method well known to the skilled artisan, such as differential methylation hybridization, methylation specific sequencing, HELP assay, bisulphite sequencing, or using a CpG island microarray. The result is a methylation list.
In an embodiment the methylation list is compared to the marker panel and the posts in the methylation list matching posts in the marker panel are selected. The methylation status of the selected posts, i.e. DNA sequences, is checked using a local methylation classification, further described below, thus creating a local methylation classification list. The local methylation classification list is then subject to a diagnostic multivariate analysis, further described below. The result of the multivariate analysis is a predicted probability of relapse free survival for the subject.
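The matching step described above can be sketched as a simple lookup. This is a minimal illustration only: the function name `local_classification_list`, the dictionary layout and the locus identifiers are hypothetical and are not taken from the disclosure.

```python
def local_classification_list(methylation_list, marker_panel):
    """Select the measured loci that also appear in the marker panel.

    methylation_list: dict mapping locus id -> measured methylation status
    marker_panel:     iterable of locus ids chosen from the classification list
    Returns (local_list, missing), where `missing` holds panel loci that
    were not measured in the subject's sample.
    """
    local_list = {}
    missing = []
    for locus in marker_panel:
        if locus in methylation_list:
            local_list[locus] = methylation_list[locus]
        else:
            missing.append(locus)
    return local_list, missing


# Hypothetical example: three measured loci, a panel of two loci.
measured = {"locus_1": 1, "locus_2": 0, "locus_7": 1}
panel = ["locus_1", "locus_5"]
local_list, missing = local_classification_list(measured, panel)
```

Reporting the unmeasured panel loci separately is a design choice of this sketch, so that an incomplete assay is visible before the multivariate analysis is run.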
In order to find the loci with highest prognosis potential, a methylation classification list is constructed in the following manner. Extraction of DNA is performed according to methods well known to a person skilled in the art, such as ethanol precipitation or by using a DNeasy Blood & Tissue Kit from Qiagen. This results in classification DNA.
The methylation status of each sequence of classification DNA, each locus, is decided using a method well known to the skilled artisan, such as differential methylation hybridization, methylation specific sequencing, HELP assay, bisulphite sequencing, or using a CpG island microarray. The resulting methylation list, based on the classification DNA, is subject to methylation classification.
The methylation classification is performed with the Kaplan-Meier estimator of the survival function, as described below.
Each of the 159,436 loci resulting from the 89 tumours is sorted in a binary manner, i.e. associated with a good or a bad prognosis. This is done by first classifying the methylation status of the specific locus as non-methylated, partially methylated or methylated. These three possible states of the locus correspond to three possible groupings of subjects.
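As an illustration of how the three states of one locus partition the subjects, the grouping can be sketched as follows. The state labels, the function name `group_patients_by_state` and the patient identifiers are assumptions made for this example.

```python
# Hypothetical labels for the three methylation states named in the text.
STATES = ("non-methylated", "partially methylated", "methylated")


def group_patients_by_state(locus_status):
    """Map each methylation state to the patients observed in that state.

    locus_status: dict mapping patient id -> one of STATES, for a single locus.
    """
    groups = {state: [] for state in STATES}
    for patient_id, state in locus_status.items():
        if state not in groups:
            raise ValueError(f"unknown methylation state: {state!r}")
        groups[state].append(patient_id)
    return groups


# Hypothetical example for one locus measured in three patients.
example = {"P1": "methylated", "P2": "non-methylated", "P3": "methylated"}
groups = group_patients_by_state(example)
```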
The Kaplan-Meier estimator, well known to a person skilled in the art, uses the time to relapse for each patient within the above groupings and calculates the survival probability, S(t), that is, the probability that a patient within the grouping would survive without a relapse for a given length of time. Assume that there are N patients in a specific grouping and that the observed times to recurrence for the N samples are:
$$t_1 \le t_2 \le t_3 \le \dots \le t_N.$$
Corresponding to each time $t_i$ is $n_i$, the number of patients at risk of relapse just prior to $t_i$, and $d_i$, the number of patients who experienced relapse at time $t_i$. The Kaplan-Meier survival function is then defined as:

$$S(t) = \prod_{t_i \le t} \frac{n_i - d_i}{n_i}$$
This Kaplan-Meier estimator is used to derive the recurrence-free survival function for each of the three groupings defined by each methylation locus. These survival functions, when plotted against time, give us survival curves. The survival curve has time on the x-axis and probability of recurrence-free survival on the y-axis. Thus, one survival curve is drawn for each grouping generated using the methylation status of a particular locus.
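The product-limit calculation for one grouping can be sketched in a few lines. This is a minimal sketch, not the implementation used by the inventors: the function name `kaplan_meier` and its arguments are hypothetical, and the handling of censored patients (those followed up without an observed relapse) is an assumption, since the text only describes patients "at risk" just prior to each time.

```python
def kaplan_meier(times, events):
    """Return the survival curve as [(t_i, S(t_i))] at each relapse time.

    times:  follow-up time per patient (time to relapse or to last follow-up)
    events: 1 if a relapse was observed at that time, 0 if censored (assumption)
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)   # n_i: patients at risk just prior to t_i
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        relapses = 0        # d_i: relapses observed at time t
        leaving = 0         # patients leaving the risk set at time t
        while i < len(data) and data[i][0] == t:
            relapses += data[i][1]
            leaving += 1
            i += 1
        if relapses > 0:
            survival *= (n_at_risk - relapses) / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving
    return curve


# Hypothetical grouping of five patients; times are in years.
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
```

Plotting the returned pairs as a step function gives the survival curve for the grouping, with time on the x-axis and probability of recurrence-free survival on the y-axis.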
We then check for statistically significant differences between the three Kaplan-Meier survival curves for each locus using the log-rank (Mantel-Haenszel) test of the difference in Kaplan-Meier curves. The log-rank test statistic compares estimates of the survival functions of any two groups at each observed event time. It is constructed by computing the observed and expected number of events in one of the groups at each observed event time and then adding these to obtain an overall summary across all time points where there is an event. Let $j = 1, \dots, J$ index the distinct times of observed relapse of cancer in either group. For each time $j$, let $N_{1j}$ and $N_{2j}$ be the number of patients at risk of relapse in each group respectively, and let $N_j = N_{1j} + N_{2j}$. Let $O_{1j}$ and $O_{2j}$ be the number of relapses in the groups at time $j$ respectively, and $O_j = O_{1j} + O_{2j}$. Given that $O_j$ events happened across both groups at time $j$, under the null hypothesis that the grouping was purely random, $O_{1j}$ has a hypergeometric distribution with mean

$$E_{1j} = O_j \frac{N_{1j}}{N_j}$$

and variance

$$V_j = \frac{O_j \, (N_{1j}/N_j) \, (1 - N_{1j}/N_j) \, (N_j - O_j)}{N_j - 1}.$$

The log-rank statistic then compares each $O_{1j}$ to its expectation under the null hypothesis and is defined as:

$$Z = \frac{\sum_{j=1}^{J} (O_{1j} - E_{1j})}{\sqrt{\sum_{j=1}^{J} V_j}}$$

The above Z-value can then be converted into a p-value, which is the probability of observing a difference at least this large between the survival functions purely by chance, by using the chi-squared distribution with one degree of freedom:

$$p = \Pr(\chi^2(1) \ge Z^2)$$
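A two-group log-rank computation along these lines can be sketched as follows. The function name `logrank_test` and its arguments are hypothetical, and the sketch assumes that the p-value is obtained by referring the squared statistic to a chi-squared distribution with one degree of freedom, which equals `erfc(|Z| / sqrt(2))`.

```python
import math


def logrank_test(times1, events1, times2, events2):
    """Two-group log-rank test. Returns (Z, p_value).

    times*:  follow-up time per patient in each group
    events*: 1 if relapse was observed, 0 if censored (assumption)
    """
    event_times = sorted(
        {t for t, e in zip(times1, events1) if e}
        | {t for t, e in zip(times2, events2) if e}
    )
    numerator = 0.0
    variance = 0.0
    for t in event_times:
        n1 = sum(1 for x in times1 if x >= t)   # at risk in group 1
        n2 = sum(1 for x in times2 if x >= t)   # at risk in group 2
        n = n1 + n2
        o1 = sum(1 for x, e in zip(times1, events1) if x == t and e)
        o2 = sum(1 for x, e in zip(times2, events2) if x == t and e)
        o = o1 + o2
        expected1 = o * n1 / n                  # hypergeometric mean
        numerator += o1 - expected1
        if n > 1:                               # hypergeometric variance
            variance += o * (n1 / n) * (1 - n1 / n) * (n - o) / (n - 1)
    if variance == 0.0:
        raise ValueError("variance is zero; the test is undefined")
    z = numerator / math.sqrt(variance)
    # Pr(chi2(1) >= z**2) equals the two-sided normal tail probability.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical groups: early relapses in group 1, late relapses in group 2.
z, p = logrank_test([1, 2, 3, 4, 5], [1] * 5, [6, 7, 8, 9, 10], [1] * 5)
```

For a locus with three groupings, a sketch like this would be applied to pairs of groupings; how the three curves are compared jointly is not specified here.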
The p-value as calculated above gives the probability that a difference at least as large as the one observed between the two survival curves would arise purely by chance. It is well known to a person skilled in the art that a p-value of 0.05 or lower is conventionally interpreted as evidence that the observed difference between the two curves is unlikely to be due to pure chance. This suggests that any locus that achieves a p-value (statistical significance) of 0.05 or lower is potentially a good biomarker for stratification of patients into good or poor prognosis groups. We evaluate all 159,436 loci in the above fashion. The loci with a p-value of 0.05 or lower are stored in a list, shown in table 1, along with their ability to stratify subjects into good or poor prognosis groups. The resulting methylation classification list is provided as SEQ ID NO: 1 to SEQ ID NO: 252. While the p-value is used as a means of including loci in the list, once a particular locus is included, the key elements are the survival curves associated with that locus. These survival curves provide the means to ascertain a patient's risk of relapse at any given point after initial diagnosis, and thus would be used in the embodiment of a diagnostic, as described in the diagnostic multivariate analysis section below.
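The inclusion rule can be illustrated with a short filter. The function name `select_significant_loci`, the locus identifiers and the p-values below are hypothetical and serve only to show the thresholding and ranking step.

```python
def select_significant_loci(p_values, alpha=0.05):
    """Return locus ids with p-value <= alpha, smallest p-value first.

    p_values: dict mapping locus id -> log-rank p-value for that locus
    """
    kept = [(p, locus) for locus, p in p_values.items() if p <= alpha]
    return [locus for p, locus in sorted(kept)]


# Hypothetical p-values for three loci.
example_p_values = {"locus_A": 0.001, "locus_B": 0.20, "locus_C": 0.04}
selected = select_significant_loci(example_p_values)
```

Sorting by p-value means the top performing loci come first, which is convenient when a marker panel is later drawn from the list.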
The inventors have found that SEQ ID NO's: 135, 78, 230, 82, 120, 60, 75, 63 and 173 are advantageous. The loci of said sequences are surprisingly good biomarkers for stratification of patients into good or poor prognosis groups.
From the methylation classification list (12), a local methylation classification list (25) may be obtained according to the following. The methylation status is determined according to any method known in the art. Extraction of DNA is performed according to methods well known to a person skilled in the art, such as ethanol precipitation or by using a DNeasy Blood & Tissue Kit from Qiagen. From the extracted DNA, the methylation status of each sequence of classification DNA, i.e. each locus, is decided using a method well known to the skilled artisan, such as differential methylation hybridization, methylation specific sequencing, HELP assay, or bisulphite sequencing. The result is the methylation status of each of the assayed loci, given in the form of a binary variable, 0 or 1.
In an embodiment, Markers 1, 2, 5, 10 are selected from the methylation classification list. Then, DNA from the patient sample is evaluated and the methylation status for each of these loci corresponding to markers 1, 2, 5 and 10 is decided. The results are shown in table 2.
The methylation status values are then input into the risk model, detailed in section “Diagnostic Multivariate Analysis” and finally there is an output that gives the probability of relapse risk for the patient based on the measurement of methylation at these loci.
Any kind of markers may be selected from SEQ ID NO: 1 to SEQ ID NO: 252. The methylation status at each of those markers may then be measured and input into the classification model, which will give an output similar to the list shown in table 2.
In one embodiment of the invention, the diagnostic assay can include just one of the posts from the list of loci submitted, thus making it a univariate diagnostic assay. In this embodiment, upon diagnosis with breast cancer, a given patient will immediately undergo the diagnostic test as described above and the methylation level of the specific locus will be estimated. Depending on whether the methylation level is unmethylated, partially methylated or methylated, the patient would be placed in the appropriate grouping, thus suggesting that the patient's relapse-free survival function is similar to the one derived for that particular grouping and that specific locus in the list above.
For example, the survival function for locus i in the methylated state is $S_{i=\mathrm{Methylated}}(t)$. Since this function gives the probability of relapse-free survival up to time t, the risk of relapse for the patient with this methylation status may be estimated from it as the complement:

$$R(t) = 1 - S_{i=\mathrm{Methylated}}(t)$$
Therefore, if one wishes to give the patient a risk of relapse in 5 years, the above risk function is evaluated at t=5 years.
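Evaluating a stored survival curve at a chosen time can be sketched as a step-function lookup. This sketch assumes that the curve is stored as a list of (time, survival) pairs sorted by time, and that the risk of relapse by time t is taken as the complement of relapse-free survival, 1 - S(t); the function names and the example curve are hypothetical.

```python
from bisect import bisect_right


def survival_at(curve, t):
    """Evaluate a Kaplan-Meier step curve [(time, S)] at time t."""
    times = [time for time, _ in curve]
    idx = bisect_right(times, t)        # number of steps at or before t
    return 1.0 if idx == 0 else curve[idx - 1][1]


def relapse_risk(curve, t):
    """Risk of relapse by time t, assumed here to be 1 - S(t)."""
    return 1.0 - survival_at(curve, t)


# Hypothetical curve for one locus in the methylated state; times in years.
methylated_curve = [(1.0, 0.9), (3.0, 0.7), (6.0, 0.5)]
risk_5_years = relapse_risk(methylated_curve, 5.0)
```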
In another embodiment of the invention, the diagnostic assay could include several loci from the list as independent risk factors. These independent risk factors would be measured as described above and their individual methylation levels ascertained. The risk functions for each of the factors are then extracted as in the example described in the previous embodiment. These independent risks can then be combined using any number of approaches, one of which could be as follows.
Let Ri be the probability of relapse in 5 years for a given patient based on the methylation level mi of locus i, in a diagnostic test containing K loci. The total risk of relapse for the given patient may be calculated as:
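One common way to combine K per-locus risks, offered here as a sketch under an independence assumption rather than as the formula of this disclosure, is to take the probability that at least one of the independent risk factors leads to relapse, R = 1 - (1 - R1)(1 - R2)...(1 - RK). The function name `combined_relapse_risk` and the example values are hypothetical.

```python
import math


def combined_relapse_risk(risks):
    """Combine per-locus relapse risks, assuming they are independent.

    risks: iterable of probabilities R_i, each in [0, 1]
    Returns 1 - prod(1 - R_i), the probability of relapse from at least
    one of the independent risk factors.
    """
    risks = list(risks)
    for r in risks:
        if not 0.0 <= r <= 1.0:
            raise ValueError(f"risk must be a probability, got {r!r}")
    return 1.0 - math.prod(1.0 - r for r in risks)


# Hypothetical 5-year risks from two loci.
total_risk = combined_relapse_risk([0.1, 0.2])
```

Under this rule the combined risk is never lower than the largest individual risk, which may or may not be desirable; other combination schemes are equally possible.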
In another embodiment, the risk assessment from individual loci in the diagnostic assay can be further combined with other risk factors such as age, tumor size, hormone status, etc. The risks from these individual factors can be combined just as above, assuming independence, or depending on further analysis, the factors can be combined in other ways to identify synergies amongst different risk factors, thus including that in the multivariate diagnosis.
In an embodiment, according to
Although the present invention has been described above with reference to specific embodiments, it is not intended to be limited to the specific form set forth herein. Rather, the invention is limited only by the accompanying claims, and embodiments other than those specifically described above are equally possible within the scope of these appended claims.
In the claims, the term “comprises/comprising” does not exclude the presence of other elements or steps. Furthermore, although individually listed, a plurality of means, elements or method steps may be implemented by e.g. a single unit or processor. Additionally, although individual features may be included in different claims, these may possibly advantageously be combined, and the inclusion in different claims does not imply that a combination of features is not feasible and/or advantageous. In addition, singular references do not exclude a plurality. The terms “a”, “an”, “first”, “second” etc do not preclude a plurality. Reference signs in the claims are provided merely as a clarifying example and shall not be construed as limiting the scope of the claims in any way.
Filing Document | Filing Date | Country | Kind | 371c Date
---|---|---|---|---
PCT/IB2009/055909 | 12/22/2009 | WO | 00 | 6/22/2011

Number | Date | Country
---|---|---
61140272 | Dec 2008 | US