The present invention is related to cancer, more particularly to methods and markers for predicting whether paclitaxel would be effective for treating a tumour in a patient, and to methods and markers for screening drug candidates for paclitaxel-like tumour treating activity.
Cancer is the second most common cause of death in the Western world, where the lifetime risk of developing cancer is approximately 40%. The overall annual costs of cancer, measured in direct medical expenses and lost productivity, are increasing at an exponential rate. In 2008 costs were estimated to be $228 billion in the United States alone (La Thangue 2011). In general, a given cancer drug is effective in only a small fraction (10-30%) of cancer patients (Sarker 2007). Therefore, predictive biomarker-driven cancer therapy could lead to a reduction in unnecessary treatment (reducing healthcare cost) and adverse effects.
Predictive biomarkers for drug response are sets of genes/proteins whose modulated levels could be used to determine whether a patient would or would not respond to a particular drug. Paclitaxel is a drug that targets a cancer cell's essential cell-cycle processes, and has become a first line drug for treating various cancers, for example breast cancer, ovarian cancer and prostate cancer. However, similar to other cancer drugs, only a small fraction of patients respond to paclitaxel treatment; for example, only 20% of ER+ breast cancer patients and 30% of ERN triple negative breast cancer patients respond to paclitaxel. Therefore, it would be useful to have biomarkers to predict whether or not a patient would respond to treatment with paclitaxel. Efforts have been made to identify such biomarkers; however, prediction rates are in the range of 50-60% (Hatzis 2011), which is still too low to be truly useful.
Recently, an algorithm (Multiple Survival Screening (MSS)) has been developed for identifying high-quality cancer prognostic markers and this algorithm was applied for identifying robust marker sets for breast cancer prognosis (Li 2010; Wang 2010).
There is a need to find new markers and develop new tests which are able to more accurately and robustly predict which patients would respond or not respond to paclitaxel or paclitaxel-like drug treatment.
It has now been found that marker sets consisting of particular genes differentially expressed in tumours advantageously provide improved accuracy of predicting effectiveness of paclitaxel or paclitaxel-like drug treatment against a cancer. These sets are further useful for screening drug candidates for paclitaxel-like tumour treatment activity. The marker sets of the present invention may be used in a clinical setting to provide information about the likelihood that a cancer patient would or would not respond to paclitaxel or paclitaxel-like drug treatment.
In one aspect of the present invention, there is provided a method of determining likelihood that a tumour in a patient would be treatable with paclitaxel or a paclitaxel-like drug, the method comprising: obtaining a gene expression list of a sample of the tumour or an extract of the tumour having messenger RNA therein of the patient; determining a gene expression profile of the sample from the gene expression list for genes of a gene marker set; and, comparing the gene expression profile of the sample to standardized “good” and “bad” profiles of the marker set to determine whether the gene expression profile of the sample predicts that the tumour is treatable or not treatable with paclitaxel or a paclitaxel-like drug, wherein “good” indicates that the tumour is likely treatable with paclitaxel or a paclitaxel-like drug and “bad” indicates that the tumour is not likely treatable with paclitaxel or a paclitaxel-like drug.
In a second aspect of the invention, there is provided a method of screening a chemical compound as a drug candidate with paclitaxel-like tumour-treating activity, the method comprising: determining a gene expression profile for genes of a gene marker set of a tumour sample treated with the chemical compound; and, comparing the gene expression profile of the sample to standardized “good” and “bad” profiles of the marker set to determine whether the gene expression profile of the sample predicts that the chemical compound would have paclitaxel-like tumour-treating activity, wherein “good” indicates that the chemical compound is likely to have paclitaxel-like tumour-treating activity and “bad” indicates that the chemical compound is not likely to have paclitaxel-like tumour-treating activity.
In methods of the present invention, the gene marker set is one or more of Set 1, Set 2, Set 3, Set 4, Set 5 and Set 6, wherein
[Table listing the genes of Sets 1 to 6 by gene symbol, EntrezGene ID and full name]
The genes in the marker sets of the present invention are individually known, and each is known to be differentially expressed in tumour cells. How they are differentially expressed and whether their differential expression generally correlates to “good” or “bad” paclitaxel tumour-treating activity can also be determined from publicly available datasets. However, the specific combination of the genes in each marker set of the present invention unexpectedly provides for more robust marker sets having improved accuracy for prediction of whether or not paclitaxel is likely to be effective in treating the tumour. The marker sets of the present invention consisting of the specific combination of genes that gives rise to the improved predictive accuracy may be generated using the Multiple Survival Screening (MSS) method previously developed (Li 2010; Wang 2010).
Paclitaxel is a mitotic inhibitor. It stabilizes microtubules and as a result, interferes with the normal breakdown of microtubules during cell division. Paclitaxel-treated cells have defects in mitotic spindle assembly, chromosome segregation, and cell division. Unlike other tubulin-targeting drugs such as colchicine that inhibit microtubule assembly, paclitaxel stabilizes the microtubule polymer and protects it from disassembly. Chromosomes are thus unable to achieve a metaphase spindle configuration. This blocks progression of mitosis, and prolonged activation of the mitotic checkpoint triggers apoptosis or reversion to the G-phase of the cell cycle without cell division. The ability of paclitaxel to inhibit spindle function is generally attributed to its suppression of microtubule dynamics; however, that suppression of dynamics occurs at concentrations lower than those needed to block mitosis. At the higher therapeutic concentrations, paclitaxel appears to suppress microtubule detachment from centrosomes, a process normally activated during mitosis. The binding site for paclitaxel has been identified on the beta-tubulin subunit. Paclitaxel-like drugs have a mechanism of action similar to that of paclitaxel. Paclitaxel-like drugs include, for example, paclitaxel derivatives (e.g. DHA-paclitaxel, PG-paclitaxel) and other taxanes (e.g. docetaxel).
The sample comprises a sample of the tumour of the patient or an extract thereof, which contains the genes in the marker set or messenger RNA that hybridizes to the genes in the marker set. Preferably, the sample comprises a sample of the tumour of the patient. The tumour is preferably a breast tumour, ovarian tumour, lung tumour or prostate tumour, more preferably a breast tumour (e.g. estrogen receptor positive (ER+) or estrogen receptor negative (ERN triple negative)).
Preferably, three marker sets are used together to make predictions. Thus, gene expression profiles of the sample are preferably determined for the genes in each of Sets 1, 2 and 3, or each of Sets 4, 5 and 6. Sets 1, 2 and 3 are particularly useful for determining the effectiveness of paclitaxel for treating ER+ tumours. Sets 4, 5 and 6 are particularly useful for determining the effectiveness of paclitaxel for treating ERN triple negative tumours. In this case, the gene expression profiles are compared to standardized “good” and “bad” profiles of each respective gene marker set to determine whether each of the gene expression profiles predicts that the effectiveness of paclitaxel is “good” or “bad”. If all three marker sets predict that the effectiveness is “good” then the patient is predicted to be a suitable candidate for paclitaxel cancer treatment. If all three marker sets predict that the effectiveness is “bad” then the patient is predicted to be a bad candidate for paclitaxel cancer treatment. If the three marker sets do not all agree (one or two predict “good” and the remainder predict “bad”) then the patient is predicted to be an uncertain candidate for paclitaxel cancer treatment. Using all three marker sets improves accuracy of the prediction.
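The three-set decision rule described above can be sketched as follows. This is a minimal illustration only; the function name `consensus_call` and the string labels are assumptions introduced for the example and are not part of the described method.

```python
from typing import Sequence


def consensus_call(set_calls: Sequence[str]) -> str:
    """Combine per-marker-set calls ("good" or "bad") into one patient-level call.

    Returns "good" only if every marker set says "good", "bad" only if every
    marker set says "bad", and "uncertain" when the marker sets disagree.
    """
    if not set_calls:
        raise ValueError("at least one marker-set call is required")
    if any(call not in ("good", "bad") for call in set_calls):
        raise ValueError("each call must be 'good' or 'bad'")
    if all(call == "good" for call in set_calls):
        return "good"
    if all(call == "bad" for call in set_calls):
        return "bad"
    return "uncertain"


# Hypothetical calls from three marker sets (e.g. Sets 1, 2 and 3) for one sample.
print(consensus_call(["good", "bad", "good"]))
```

Requiring unanimity trades coverage for confidence: samples on which the sets disagree are left in the uncertain group rather than being forced into either class.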
In a particular embodiment, each gene in the gene expression profile has a gene expression value and a modified gene expression profile is obtained by multiplying each gene expression value by its marker-factor. Standardized “good” and “bad” profiles are determined by computing standardized centroids for both “good” and “bad” classes using the Prediction Analysis for Microarrays (PAM) method (Tibshirani 2002). Modified class centroids of the marker set are obtained by multiplying the standardized centroids for each class by the marker-factor. The modified gene expression profile of the sample is compared to each modified class centroid to determine if paclitaxel effectiveness is “good” or “bad”. The class whose centroid is closest to the modified gene expression profile, in Pearson correlation distance, is predicted to be the class for the sample.
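The comparison step can be illustrated with a short sketch. It assumes that the standardized class centroids and the per-gene marker-factors are already available as plain lists, and it takes Pearson correlation distance to be 1 minus the Pearson correlation coefficient, which is a common definition but is an assumption here. The function names and all numeric values are hypothetical.

```python
import math
from typing import Dict, Sequence


def pearson_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Return 1 - r, where r is the Pearson correlation of x and y."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("vectors must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return 1.0 - cov / math.sqrt(var_x * var_y)


def classify(profile: Sequence[float],
             centroids: Dict[str, Sequence[float]],
             marker_factors: Sequence[float]) -> str:
    """Assign the sample to the class whose modified centroid is nearest."""
    # Modified profile: each expression value times its marker-factor.
    modified_profile = [v * f for v, f in zip(profile, marker_factors)]
    distances = {}
    for label, centroid in centroids.items():
        # Modified centroid: each centroid value times the same marker-factor.
        modified_centroid = [c * f for c, f in zip(centroid, marker_factors)]
        distances[label] = pearson_distance(modified_profile, modified_centroid)
    return min(distances, key=distances.get)


# Hypothetical four-gene marker set with made-up centroids and unit factors.
centroids = {"good": [1.0, -1.0, 0.5, -0.5], "bad": [-1.0, 1.0, -0.5, 0.5]}
marker_factors = [1.0, 1.0, 1.0, 1.0]
print(classify([0.8, -0.6, 0.4, -0.2], centroids, marker_factors))
```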
Gene expression profiles of a patient's tumour may be readily obtained by any number of methods known in the art, for example microarray analysis, individual gene or RNA screening (e.g. by PCR or real time PCR), diagnostic panels, mini chips, NanoString chips, RNA-seq chips, protein chips, ELISA tests, etc. In a preferred embodiment, a sample may be obtained from a patient by any suitable means, for example, with a syringe or other fluid and/or tissue separation means. The sample may be screened against a microarray on which gene probes of the marker sets are printed. An output of the gene expression profile of the sample is preferably obtained before comparing the gene expression profile to the standardized “good” and “bad” profiles of the marker set. To obtain the output, messenger RNA in the sample may be hybridized to the genes on the microarray, the hybridized microarray may be scanned to get all the readouts of marker genes for the sample, the readouts may be normalized and the gene expression profile of the marker set for the sample is thereby obtained. Detailed information for making microarray gene chips, scanning and normalization of array data is generally known in the art and can be found in the publicly available literature (http://en.wikipedia.org/wiki/DNA_microarray). It is also possible to obtain the gene expression profile by RNA-sequencing and related sequencing technologies as these technologies become more accessible (http://en.wikipedia.org/wiki/RNA-Seq).
In another embodiment, kits or commercial packages are provided, which comprise gene probes for each of the genes in a gene marker set of the present invention along with instructions for obtaining a gene expression profile of a sample for the gene marker set. The kit or commercial package may further comprise instructions for comparing the gene expression profile of the sample to standardized “good” and “bad” profiles of the marker set to determine whether the gene expression profile of the sample predicts that paclitaxel effectiveness is “good” or “bad”. Preferably, the kit or commercial package comprises gene probes for at least three gene marker sets of the present invention. The kit or commercial package may further comprise means for obtaining a sample of a tumour having messenger RNA therein from a patient, for example suitable syringes, fluid and/or tissue separation means, etc. In addition to the gene probes, the kit or commercial package may further comprise reagents and/or equipment useful for screening the sample against the gene probes for obtaining the gene expression profile of the sample. Various standard elements of such kits or commercial packages are generally known in the art.
Further features of the invention will be described or will become apparent in the course of the following detailed description.
To develop ER+ cancer marker sets of the present invention, the Multiple Survival Screening (MSS) method (Li 2010; Wang 2010) was used. In applying this method, a training set of 260 ER+ breast cancer samples was selected from a public metadata set (GEO GSE4779, GSE20194, GSE20271, GSE22093 and GSE23988). Each patient was treated with paclitaxel and followed up pathologically to determine who was responsive to the treatment. The primary tumours were microarray profiled prior to any drug treatment. The datasets contain information about gene expression profiles for patient primary tumours and the information of response/non-response for paclitaxel treatment for each patient. The datasets identify whether each of these genes is up-regulated or down-regulated in tumours and correlate these genes with responsiveness to paclitaxel treatment (i.e. “good” vs. “bad”).
One hundred samples were randomly selected from the datasets, of which 70 did not respond to paclitaxel treatment (“bad”) and 30 did respond (“good”). Array-wide single-gene based clustering of responsive/non-responsive samples (using the fuzzy clustering method, http://stat.ethz.ch/R-manual/R-patched/library/cluster/html/fanny.html) was conducted to obtain effectiveness genes, which are genes whose differential expression values are correlated with effective paclitaxel treatment. It is not relevant whether the expression of each gene is upregulated or downregulated so long as the differential expression is correlated to effective paclitaxel treatment. Selection of samples and array-wide single-gene based clustering analyses were repeated 100 times, and the effectiveness genes (those having a P value <0.05 in more than 75 of the 100 repetitions) were merged.
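The repeated-subsampling selection described above can be sketched as follows. Note the substitution: the source uses a fuzzy clustering analysis (R `fanny`) to obtain the per-gene P value, whereas this sketch plugs in a simple permutation test on the difference of group means purely as a stand-in. The function names, parameter names and toy data are all hypothetical.

```python
import random
from typing import Callable, Dict, List, Sequence


def permutation_pvalue(bad: Sequence[float], good: Sequence[float],
                       rng: random.Random, n_perm: int = 200) -> float:
    """Stand-in test: two-sided permutation test on the difference of means."""
    def mean(values: Sequence[float]) -> float:
        return sum(values) / len(values)

    observed = abs(mean(good) - mean(bad))
    pooled = list(bad) + list(good)
    n_bad = len(bad)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        stat = abs(mean(pooled[n_bad:]) - mean(pooled[:n_bad]))
        if stat >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def stable_effectiveness_genes(expr: Dict[str, Sequence[float]],
                               labels: Sequence[str],
                               n_bad: int, n_good: int,
                               n_rep: int = 100, alpha: float = 0.05,
                               min_hits: int = 75,
                               pvalue_fn: Callable[..., float] = permutation_pvalue,
                               seed: int = 0) -> List[str]:
    """Keep genes with P < alpha in more than min_hits of n_rep random subsamples."""
    rng = random.Random(seed)
    bad_idx = [i for i, lab in enumerate(labels) if lab == "bad"]
    good_idx = [i for i, lab in enumerate(labels) if lab == "good"]
    hits = {gene: 0 for gene in expr}
    for _ in range(n_rep):
        # Draw a fresh subsample with a fixed number of "bad" and "good" samples.
        bad_pick = rng.sample(bad_idx, n_bad)
        good_pick = rng.sample(good_idx, n_good)
        for gene, values in expr.items():
            p = pvalue_fn([values[i] for i in bad_pick],
                          [values[i] for i in good_pick], rng)
            if p < alpha:
                hits[gene] += 1
    return sorted(gene for gene, count in hits.items() if count > min_hits)


# Toy data: 12 "bad" and 8 "good" samples, one separating gene and one flat gene.
labels = ["bad"] * 12 + ["good"] * 8
expr = {
    "SIGNAL": [0.1 * i for i in range(12)] + [5.0 + 0.1 * i for i in range(8)],
    "NOISE": [1.0, 2.0, 3.0] * 4 + [1.0, 2.0, 3.0, 1.0, 2.0, 3.0, 1.0, 2.0],
}
selected = stable_effectiveness_genes(expr, labels, n_bad=6, n_good=4)
print(selected)
```

With the source's settings the call would use `n_bad=70`, `n_good=30`, `n_rep=100` and `min_hits=75`; the toy call above only shrinks the subsample size.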
Using the effectiveness gene set, Gene Ontology (GO) analysis (using the GO annotation software DAVID, http://david.abcc.ncifcrf.gov/) was performed to identify only those genes that belong to GO terms that are known to be associated with cancer, such as apoptosis, response to wounding, DNA replication and transcription repair, mitosis and immune response. Table 1 lists the ER+ cancer-related GO term gene sets. Two million distinct random-gene-sets were generated by randomly picking 30 genes from each ER+ cancer-related GO term gene set.
Of 83 samples (58 with no response to paclitaxel treatment and 25 that responded to paclitaxel treatment) selected from the dataset to form the training set, 36 random datasets were generated. For a given GO term gene set, paclitaxel effectiveness screening was then conducted using the 2 million random-gene-sets against all the 36 random datasets. For each random dataset, the statistical significance of the correlation between the expression values of each random-gene-set (30 genes) and paclitaxel effectiveness status (“good” or “bad”) was examined by fuzzy clustering analysis (http://stat.ethz.ch/R-manual/R-patched/library/cluster/html/fanny.html). If the P value was less than a cut-off for an effectiveness screening using one random-gene-set against one random dataset, that random-gene-set was said to have passed. When a few thousand random-gene-sets had passed 32 or more random datasets (the detailed parameters are shown in Table 2), the random-gene-sets that had passed were retained for further analysis. The genes in the retained random-gene-sets were then ranked based on their frequency of appearance in the passed random-gene-sets. The top 30 genes were chosen as a potential-marker-set. A similar effectiveness screening of random-gene-sets against random datasets was performed for each of the other selected GO term gene sets. Only apoptosis, mitosis and immune response GO term gene sets were used to generate the ER+ marker sets.
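The screening and ranking loop described above can be sketched as follows. The pass/fail decision for one random-gene-set against one random dataset is left as a caller-supplied function, since in the source it comes from a fuzzy clustering P value and a cut-off given in Table 2; the toy rule used in the example is purely hypothetical, as are the function names, gene names and the small set sizes.

```python
import math
import random
from collections import Counter
from typing import Callable, FrozenSet, List, Sequence


def screen_random_gene_sets(pool: Sequence[str],
                            datasets: Sequence[object],
                            passes: Callable[[FrozenSet[str], object], bool],
                            set_size: int = 30,
                            n_sets: int = 1000,
                            min_pass: int = 32,
                            top_n: int = 30,
                            seed: int = 0) -> List[str]:
    """Rank genes by how often they occur in random gene sets that pass screening."""
    if n_sets > math.comb(len(pool), set_size):
        raise ValueError("pool is too small for that many distinct gene sets")
    rng = random.Random(seed)
    # Draw distinct random gene sets of the requested size from the GO-term pool.
    gene_sets = set()
    while len(gene_sets) < n_sets:
        gene_sets.add(frozenset(rng.sample(pool, set_size)))
    counts = Counter()
    for gene_set in gene_sets:
        n_passed = sum(1 for dataset in datasets if passes(gene_set, dataset))
        # Retain only gene sets that pass at least min_pass random datasets.
        if n_passed >= min_pass:
            counts.update(gene_set)
    # Most frequent genes first; ties broken alphabetically for reproducibility.
    ranked = sorted(counts, key=lambda gene: (-counts[gene], gene))
    return ranked[:top_n]


# Toy example: 10 hypothetical genes, of which G0, G1 and G2 carry the signal.
pool = [f"G{i}" for i in range(10)]
informative = {"G0", "G1", "G2"}


def toy_passes(gene_set: FrozenSet[str], dataset: object) -> bool:
    """Hypothetical rule: a set passes if it holds at least two signal genes."""
    return len(gene_set & informative) >= 2


top_genes = screen_random_gene_sets(pool, list(range(36)), toy_passes,
                                    set_size=4, n_sets=210, min_pass=32, top_n=3)
print(top_genes)
```

Ranking by frequency among passing sets rewards genes that contribute to a significant result in many different random combinations, rather than genes that look good in a single set.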
For each GO term gene set used, another 1 million distinct random-gene-sets were generated and the clustering process using the random datasets mentioned above was repeated. If the gene members for the top 30 were substantially the same as those in the potential-marker-set generated by the first screening, then the potential-marker-set was considered stable and could be used as a real ER+ cancer marker set. If the genes for the two potential marker sets were not substantially the same, then these GO term genes were considered unsuitable for finding a real marker set and the potential marker set was dropped from further analysis.
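One way to make the stability check concrete is to measure the overlap between the two top-30 lists. The source does not quantify "substantially the same", so the 0.9 threshold below is an assumption chosen only for illustration, as are the function names and gene labels.

```python
from typing import Sequence


def overlap_fraction(first: Sequence[str], second: Sequence[str]) -> float:
    """Fraction of genes shared by two marker lists, relative to the longer list."""
    a, b = set(first), set(second)
    if not a or not b:
        raise ValueError("both marker lists must be non-empty")
    return len(a & b) / max(len(a), len(b))


def is_stable(first: Sequence[str], second: Sequence[str],
              min_overlap: float = 0.9) -> bool:
    """Assumed criterion: two screenings agree if their overlap reaches min_overlap."""
    return overlap_fraction(first, second) >= min_overlap


# Hypothetical top-30 lists from two independent screenings; 28 genes are shared.
first_screen = [f"G{i}" for i in range(30)]
second_screen = [f"G{i}" for i in range(28)] + ["X1", "X2"]
print(overlap_fraction(first_screen, second_screen), is_stable(first_screen, second_screen))
```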
In this way, three ER+ cancer marker sets were generated having stable signatures, one related to apoptosis (Set 1), one related to mitosis (Set 2) and one related to immune response (Set 3). The genes, EntrezGene ID and full names of the genes in each of the three marker sets are given above. More details of each gene, including the nucleotide sequence of each gene, are known in the art and may be conveniently found in the National Center for Biotechnology Information (NCBI) Databases at http://www.ncbi.nlm.nih.gov/.
To develop ERN (estrogen receptor negative) cancer marker sets of the present invention, the Multiple Survival Screening (MSS) method (Li 2010; Wang 2010) was used. In applying this method, a training set of 202 ERN breast cancer samples was selected from the GSE25066 dataset (Hatzis 2011). The dataset contains information of the same kind as that described above for the ER+ datasets. 153 samples were randomly selected from the dataset, of which 100 did not respond to paclitaxel treatment (“bad”) and 53 did respond (“good”). Array-wide single-gene based fuzzy clustering screening of responsive/non-responsive samples (http://stat.ethz.ch/R-manual/R-patched/library/cluster/html/fanny.html) was performed to obtain effectiveness genes, which are genes whose differential expression values are correlated with effective paclitaxel treatment. It is not relevant whether the expression of each gene is upregulated or downregulated so long as the differential expression is correlated to effective paclitaxel treatment. Selection of samples and array-wide screening were repeated 3 times, and effectiveness genes (P value <0.05) from each of the 3 repetitions were merged. Using the effectiveness gene set, Gene Ontology (GO) analysis (using the GO annotation software DAVID, http://david.abcc.ncifcrf.gov/) was performed to identify only those genes that belong to GO terms that are known to be associated with cancer, such as apoptosis, cell cycle, cell adhesion, response to stimulus, DNA repair & replication and mitosis. Table 3 lists the ERN cancer-related GO term gene sets. Two million distinct random-gene-sets were generated by randomly picking 30 genes from each ERN cancer-related GO term gene set.
Of 152 samples (99 with no response to paclitaxel treatment and 53 that responded to paclitaxel treatment) selected from the dataset to form the training set, 36 random datasets were generated. For a given GO term gene set, paclitaxel effectiveness screening was then conducted using the 1 million random-gene-sets against all the 36 random datasets. For each random dataset, the statistical significance of the correlation between the expression values of each random-gene-set (30 genes) and paclitaxel effectiveness status (“good” or “bad”) was examined by fuzzy clustering analysis (http://stat.ethz.ch/R-manual/R-patched/library/cluster/html/fanny.html). If the P value was less than a cut-off for an effectiveness screening using one random-gene-set against one random dataset, that random-gene-set was said to have passed. When a few thousand random-gene-sets had passed 32 or more random datasets (the detailed parameters are shown in Table 4), the random-gene-sets that had passed were retained for further analysis. The genes in the retained random-gene-sets were then ranked based on their frequency of appearance in the passed random-gene-sets. The top 30 genes were chosen as a potential-marker-set. A similar effectiveness screening of random-gene-sets against random datasets was performed for each of the other selected GO term gene sets. Only apoptosis, cell adhesion and response to stimulus GO term gene sets were used to generate the ERN marker sets.
For each GO term gene set used, another 1 million distinct random-gene-sets were generated and the effectiveness screening process using the random datasets mentioned above was repeated. If the gene members for the top 30 were substantially the same as those in the potential-marker-set generated by the first screening, then the potential-marker-set was considered stable and could be used as a real ERN cancer marker set. If the genes for the two potential marker sets were not substantially the same, then these GO term genes were considered unsuitable for finding a real marker set and the potential marker set was dropped from further analysis.
In this way, three ERN cancer marker sets were generated having stable signatures, one related to apoptosis (Set 4), one related to cell adhesion (Set 5) and one related to response to stimulus (Set 6). The genes, EntrezGene ID and full names of the genes in each of the three marker sets are given above. More details of each gene, including the nucleotide sequence of each gene, are known in the art and may be conveniently found in the National Center for Biotechnology Information (NCBI) Databases at http://www.ncbi.nlm.nih.gov/.
The effectiveness of the marker sets generated in Examples 1 and 2 was validated against datasets containing breast cancer gene expression data from sample populations. Sets 1, 2 and 3 from Example 1 were validated against metadata from public data (GSE4779, GSE20194, GSE20271, GSE22093 and GSE23988) and against the GSE25066 dataset (Hatzis 2011). Sets 4, 5 and 6 from Example 2 were validated against the GSE25066 dataset (ERN, 87% triple negative) (Hatzis 2011), the GSE20174 dataset (triple negative) (Zeidler-Erdely 2010), and the GSE20194 dataset (triple negative) (Popovici 2010; Shi 2010).
To perform the validation for a given test dataset containing ‘n’ samples, the gene expression profile of the marker set was extracted. Each gene expression value was multiplied by its marker-factor to obtain a modified gene expression profile of the testing sample. Standardized centroids were computed for both “good” and “bad” classes from n−1 samples for the marker set using the Prediction Analysis for Microarrays (PAM) method (Tibshirani 2002). The class centroids were multiplied by the marker-factor of each gene to get modified class centroids of the marker set. For predicting the paclitaxel response of the targeted testing sample using the marker set, the modified gene expression profile of the sample was compared to each of these modified class centroids. The class whose centroid is closest to the sample, in Pearson correlation distance, is the predicted class for that sample. If the sample is predicted to be unresponsive to paclitaxel treatment (i.e. “bad”), it is denoted as 0, otherwise it is denoted as 1. If all three marker sets (Sets 1, 2 and 3, or Sets 4, 5 and 6) predict that a particular sample is unresponsive to paclitaxel (i.e. denoted as 0 for all 3 marker sets), the sample is assigned to a paclitaxel unresponsive group (i.e. “bad”). If all three marker sets predict that a particular sample is responsive to paclitaxel (i.e. denoted as 1 for all 3 marker sets), the sample is assigned to a paclitaxel responsive group (i.e. “good”). If a sample is not assigned to either of these groups, it is assigned to an indeterminate group.
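The leave-one-out structure of this validation can be sketched for a single marker set as follows. As a simplification, plain per-class mean profiles are used in place of the standardized PAM centroids, and Pearson correlation distance is taken as 1 minus the correlation coefficient; both are assumptions of this sketch, and the function names and toy expression values are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def pearson_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Return 1 - r, where r is the Pearson correlation of x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return 1.0 - cov / math.sqrt(var_x * var_y)


def class_means(profiles: Sequence[Sequence[float]],
                labels: Sequence[str]) -> Dict[str, List[float]]:
    """Per-class mean profile (a simplified stand-in for PAM centroids)."""
    grouped: Dict[str, List[Sequence[float]]] = {}
    for profile, label in zip(profiles, labels):
        grouped.setdefault(label, []).append(profile)
    return {label: [sum(col) / len(col) for col in zip(*members)]
            for label, members in grouped.items()}


def leave_one_out_calls(profiles: Sequence[Sequence[float]],
                        labels: Sequence[str],
                        marker_factors: Sequence[float]) -> List[int]:
    """Predict each sample from centroids built on the other n-1 samples.

    Returns 1 for a predicted "good" (responsive) sample and 0 for "bad".
    """
    profiles, labels = list(profiles), list(labels)
    calls = []
    for i, held_out in enumerate(profiles):
        centroids = class_means(profiles[:i] + profiles[i + 1:],
                                labels[:i] + labels[i + 1:])
        sample = [v * f for v, f in zip(held_out, marker_factors)]
        distances = {
            label: pearson_distance(
                sample, [c * f for c, f in zip(centroid, marker_factors)])
            for label, centroid in centroids.items()
        }
        predicted = min(distances, key=distances.get)
        calls.append(1 if predicted == "good" else 0)
    return calls


# Toy four-gene marker set: three responsive and three unresponsive samples.
profiles = [
    [2.0, 0.1, 1.8, 0.2], [1.9, 0.3, 2.1, 0.0], [2.2, 0.0, 1.7, 0.4],
    [0.1, 2.0, 0.3, 1.9], [0.0, 1.8, 0.2, 2.1], [0.4, 2.2, 0.1, 1.7],
]
labels = ["good", "good", "good", "bad", "bad", "bad"]
calls = leave_one_out_calls(profiles, labels, [1.0, 1.0, 1.0, 1.0])
print(calls)
```

Holding each sample out before computing the centroids keeps the tested sample from influencing its own prediction.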
This validation process was carried out in each of the test datasets. Table 5 shows the accuracy for Sets 1, 2 and 3 in predicting the paclitaxel unresponsive group in the metadata from public data dataset and the GSE25066 dataset. Table 6 shows the accuracy for Sets 4, 5 and 6 in predicting the paclitaxel unresponsive group in the GSE25066 dataset, the GSE20174 dataset and the GSE20194 dataset. The accuracy of the marker sets against the test datasets is remarkably high, and much higher than the 50-60% that can be achieved using current prior art marker sets (Hatzis 2011).
The contents of the entirety of each of the references cited herein are incorporated by this reference.
Other advantages that are inherent to the structure are obvious to one skilled in the art. The embodiments are described herein illustratively and are not meant to limit the scope of the invention as claimed. Variations of the foregoing embodiments will be evident to a person of ordinary skill and are intended by the inventor to be encompassed by the following claims.
This application claims the benefit of U.S. Provisional Patent Application Ser. No. 61/563,929 filed Nov. 28, 2011, the entire contents of which are herein incorporated by reference.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/CA2012/001087 | 11/27/2012 | WO | 00 | 5/28/2014 |
Number | Date | Country |
---|---|---|
61563929 | Nov 2011 | US |