I. Field of the Invention
Embodiments of this invention are directed generally to biology and medicine. In certain aspects the invention relates to a gene set whose levels of expression are evaluated and used to prognose and/or derive a survival indicator for a patient who has undergone therapy, who is undergoing therapy, or who is a candidate for therapy.
II. Background
There are four main approaches to improving the ability to predict responsiveness to therapies. One approach is a standard predictive or chemopredictive study focused on treatment, in which a sufficiently powered discovery population of subjects is used to define a predictive test that must then be proven to be accurate in a similarly sized validation population (Ransohoff, 2005; Ransohoff, 2004). Several studies have used this approach to define predictive genes for adjuvant tamoxifen therapy (Ma et al., 2004; Jansen et al., 2005; Loi et al., 2005). There are advantages to this approach, particularly when samples are available from mature studies for retrospective analysis. But two disadvantages are that the study design is empirical and that adjuvant (post-surgery) treatment introduces surgery as a confounding variable, because it is impossible to know which patients were cured by their surgery and would never relapse, irrespective of their sensitivity to systemic therapy. Neoadjuvant chemotherapy trials enable a direct comparison of tumor characteristics with pathologic response to the specific therapy (Ayers et al., 2004).
In medicine today, doctors search for methods of predicting how a patient (given their condition) may respond to treatment. Symptoms and tests may indicate favorable treatment with standard therapies. Likewise, a number of symptoms, health factors, and tests may indicate a less favorable treatment result with standard treatment—this may indicate that a more aggressive treatment plan may be desired. Prognostic scoring is also used for cancer outcome predictions.
Although pathologic complete response (pCR) has been adopted as the primary endpoint for neoadjuvant trials because it is associated with long-term survival, it has not been uniformly or consistently defined (Bear, 2006; Carey, 2005; Hennessy, 2005; Kaufmann, 2006; Kuroi, 2005; Kurosumi, 2004; Rajan, 2004; von Minckwitz, 2005). While it is generally agreed that a definition of pCR should include patients without residual invasive carcinoma in the breast (pT0), the presence of nodal metastasis, minimal residual cellularity, and residual in situ carcinoma are not consistently stated as either pCR or residual disease (RD) (Bear, 2006; Kaufmann, 2006; Hennessy, 2005; Rajan, 2004). Therefore, dichotomization of response as pCR or residual disease (RD) may be simplistic for the objective of assay discovery and validation, particularly because residual disease (RD) after neoadjuvant treatment includes a broad range of actual tumor shrinkage. In some patients who are categorized as RD but actually show minimal residual disease, the response outcome blurs the prognostic distinction between pCR and RD. On the other hand, it should be possible to clearly identify patients within RD who are resistant to treatment in order to develop management strategies for this adverse outcome.
Expression markers are chosen for the ability to classify and/or identify patients as to probability for response (or non response) to therapy. Response to therapy is commonly classified by the RECIST criteria established by the World Health Organization, the National Cancer Institute and the European Organization for Research and Treatment of Cancer. The RECIST criteria classify response as progressive disease (PD), stable disease (SD), partial response (PR), and complete response (CR). A good response is typically considered to include PR+CR (collectively referred to herein as Objective Response).
Certain aspects of the invention include methods of evaluating a cancer patient comprising one or more of the steps of (a) evaluating gene expression levels in a patient sample comprising cancer cells or an RNA sample isolated from one or more patient samples, wherein a plurality of genes to be evaluated are selected from 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 20, 25, 30, 40, 50, 60, 70, 80, 90, 100, 150, or all of the genes identified in Table 2, Table 3, and Table 4, including all ranges and values there between and all subsets and combinations thereof (5, 10, 15, 20, 25, 100 or more such genes can be specifically excluded, including all values and ranges there between); (b) calculating a predictor score using a gene expression profile index; and (c) assessing the likelihood of a therapeutic outcome using the predictor score. The method may further comprise classifying a patient prior to evaluation. In certain aspects classification can include identifying a cancer patient with a disease state classified as a residual disease state or other clinically defined state prior to evaluation. In certain aspects, a predictor includes but is not limited to a measure for distant relapse-free survival (DRFS).
In still a further aspect, a gene expression index comprises 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 20, 25, 30, 40, 50, 60, 70, 80, 90, 100, 150 or all of the genes identified in Table 2, Table 3, and Table 4 including all values and ranges there between as well as a number of subsets of these genes which may include some genes from one or more tables and exclude others from the same table or other tables.
In other aspects, a patient may be stratified or analyzed by using other factors such as protein expression, demographic information, family history, and other biological or medical states. The method may include determining Her2-neu and/or estrogen receptor status of the patient sample and/or evaluation of tumor size, cellularity of tumor bed, and/or nodal burden to name a few.
The methods may also provide a treatment recommendation depending on the assessment derived from analysis of the gene expression profile as well as other factors. In certain aspects the recommendation may be based on residual cancer burden (RCB) classification or the like. A treatment is typically a standard treatment or a more aggressive non-standard treatment depending on the analysis. For example a treatment may be combination of one or more cancer therapies, such as hormonal therapy and/or chemotherapy. Hormonal therapy includes, but is not limited to tamoxifen therapy, aromatase inhibitor therapy, or SERM therapy.
In other aspects, preparing a gene expression index can include one or more of the following steps: (a) obtaining data associated with a plurality of cancer patients, such as breast cancer, melanoma, ovarian cancer, testicular cancer or the like comprising measuring expression levels of a plurality of genes in samples from a plurality of patients; (b) partitioning the data into a first and second dataset; (c) evaluating the data and identifying data associated with a particular treatment outcome; (d) selecting a set of genes whose expression levels are indicative of therapeutic outcome. In one aspect, the index includes evaluation of survival of the patient population sampled for all or part of the reference population of tumor samples such as the distant relapse-free survival (DRFS) of the patient population.
Other aspects of the invention include kits to determine responsiveness of a cancer or cancer patient to a treatment or therapy comprising one or more of (a) reagents for determining expression levels of a plurality of genes selected from Table 2, Table 3, and Table 4 or combinations thereof, such as probe sets that identify and measure the levels of gene transcripts, transcription, or protein levels; and (b) software encoding methods for designing, gathering, inputting, analyzing and/or assessing various data, which includes an algorithm for calculating a predictor score based on the analysis of the gene expression levels.
In still other aspects the invention includes an apparatus, or system for providing assessment of a sample relative to a gene expression index, the system comprising (a) an application server comprising an input manager to receive expression data from a user for a plurality of genes selected from Table 2, Table 3, and Table 4 or combinations thereof obtained from a patient sample or an RNA sample from such patient sample; and (b) a network server comprising an output manager constructed and arranged to provide an assessment to the user.
In yet another aspect the invention includes a computer readable medium having software modules for performing the one or more of the methods described herein comprising the acts of: (a) comparing gene expression data obtained from a patient sample for a plurality of genes selected from Table 2, Table 3, and Table 4 or combinations thereof with a reference; and (b) providing a predictor score to a physician for use in determining an appropriate therapeutic regimen for a patient.
In still yet another aspect the invention includes a computer system, having a processor, memory, external data storage, input/output mechanisms, a display, for performing the method of the invention, comprising (a) a database; (b) logic mechanisms in the computer for generating the transcriptional profile index; and (c) a comparing mechanism in the computer for comparing the gene expression reference to expression data from a patient sample or an RNA sample from such a patient sample to calculate a predictor score.
An internet accessible portal may be used to provide biological information, the portal being constructed and arranged to execute computer-implemented methods for providing: (a) a comparison of gene expression data of a plurality of genes of claim 1 in a patient sample with a transcriptional profile index; and (b) a predictor score to a physician for use in determining an appropriate therapeutic regime for a patient.
Other embodiments of the invention are discussed throughout this application. Any embodiment discussed with respect to one aspect of the invention applies to other aspects of the invention as well and vice versa. The embodiments in the Example section are understood to be embodiments of the invention that are applicable to all aspects of the invention.
The terms “inhibiting,” “reducing,” or “prevention,” or any variation of these terms, when used in the claims and/or the specification includes any measurable decrease or complete inhibition to achieve a desired result.
The use of the word “a” or “an” when used in conjunction with the term “comprising” in the claims and/or the specification may mean “one,” but it is also consistent with the meaning of “one or more,” “at least one,” and “one or more than one.”
Throughout this application, the term “about” is used to indicate that a value includes the standard deviation of error for the device or method being employed to determine the value.
The use of the term “or” in the claims is used to mean “and/or” unless explicitly indicated to refer to alternatives only or the alternatives are mutually exclusive, although the disclosure supports a definition that refers to only alternatives and “and/or.”
As used in this specification and claim(s), the words “comprising” (and any form of comprising, such as “comprise” and “comprises”), “having” (and any form of having, such as “have” and “has”), “including” (and any form of including, such as “includes” and “include”) or “containing” (and any form of containing, such as “contains” and “contain”) are inclusive or open-ended and do not exclude additional, unrecited elements or method steps.
Other objects, features and advantages of the present invention will become apparent from the following detailed description. It should be understood, however, that the detailed description and the specific examples, while indicating specific embodiments of the invention, are given by way of illustration only, since various changes and modifications within the spirit and scope of the invention will become apparent to those skilled in the art from this detailed description.
The following drawings form part of the present specification and are included to further demonstrate certain aspects of the present invention. The invention may be better understood by reference to one or more of these drawings in combination with the detailed description of specific embodiments presented herein.
Despite the critical importance of selecting the most effective adjuvant/neoadjuvant chemotherapy for an individual, diagnostic tests to guide selection of the optimal regimen for a particular patient continue to be inadequate (Carlson, 2000; Goldhirsch, 2003). Estrogen receptor (ER) negative status, high grade and high proliferative activity are histological characteristics that tend to indicate more chemotherapy sensitive cancer (Bast, 2001; Ross, 2003; Rouzier, 2005). However, although these clinicopathologic variables may identify eligibility or predict general chemotherapy sensitivity, they have little potential to guide selection of a specific treatment regimen in standard-of-care practice.
The limited utility of individual markers to predict clinical outcome of cancer may be due to the incomplete understanding of the function of these markers. In addition, biologically important molecules act in concert and form complex, interactive pathways where an individual molecule may only contribute limited information on the functional activity of a whole pathway. The promise of microarray technology is that, by assessing the transcriptional activity of a large number of genes, the complex gene-expression profile may contain more information than any individual marker that contributes to it.
There are examples indicating that the molecular classification of cancer based on gene-expression profiles could be important in framing patient management strategies. Unsupervised clustering of breast cancer specimens consistently separated tumors into ER+ and ER− clusters (Gruvberger, 2001; Perou, 2000; Pusztai, 2003). Analysis of gene-expression profiles also distinguished sporadic breast cancers from breast cancer gene, BRCA, mutant cases (Hedenfalk, 2001). Transcriptional profiles have also revealed previously unrecognized molecular subgroups within existing histological categories in breast cancer (Perou, 2000), diffuse large-B-cell lymphoma, and soft tissue and central nervous system embryonal tumors (Nielsen, 2002; Pomeroy, 2002). In addition, gene-expression profiles have been shown to predict survival of patients with node-negative breast cancer (van de Vijver, 2002; van't Veer, 2002), lymphoma (Alizadeh, 2000; Rosenwald, 2002), renal cancer (Takahashi, 2001), and lung cancer (Beer, 2002).
Previous efforts into applying gene expression-based predictors in breast cancer have focused largely on predicting a patient's risk of cancer recurrence in the event of either receiving no systemic treatment after surgery (van de Vijver, 2002; van't Veer, 2002; Wang, 2005) or receiving tamoxifen, a hormonal therapy agent, for 5 years after surgery (Paik, 2006; Paik, 2004; Ma, 2006; Davis, 2007). These gene-based predictors do not directly address the need or the responsiveness to chemotherapy although a high risk of recurrence may indirectly suggest the general consideration of chemotherapy among the available options for patient management.
Other research efforts have also reported gene-based predictors of response to standard breast cancer treatments (Ayers, 2004; Bild, 2006; Chang, 2003; Hess, 2006; Modlich, 2006) although these are not commercially marketed yet as assays. Some of these predictors are developed using patient tissue samples treated clinically with a specific chemotherapy regimen and subsequently comparing genomic profiles of responders versus non-responders using survival-driven endpoints (Ayers, 2004; Chang, 2003; Hess, 2006; Modlich, 2006) whereas others are focused on analyses of changes in genes within breast cancer cell lines that are treated in vitro with single standard therapeutic agents (Bild, 2006).
As an in vivo model for marker development and validation, neoadjuvant (preoperative) chemotherapy provides an opportunity to gain access to samples that directly describe tumor response to therapy. Furthermore, complete eradication of all invasive cancer from the breast and regional lymph nodes, called pathologic complete response (pCR), is associated with excellent long-term cancer-free survival (Fisher, 1998; Kuerer, 1999). Therefore, the goal in developing treatment-directed response markers is to evaluate gene expression profiles in order to predict who may achieve pCR versus residual disease (RD). Pathologic CR is a meaningful clinical end-point to predict because these patients experience prolonged disease-free and overall survival compared to patients with lesser response (Cleator, 2005; Fisher, 1998; Kaufmann, 2006; Wolmark, 2001). Good survival in these patients reflects benefit from chemotherapy since most clinical and gene expression variables that are associated with pCR (high grade, ER-negative status, high OncotypeDX recurrence score) tend to predict worse prognosis in the absence of chemotherapy (Paik, 2006; Paik, 2004).
Previous work has demonstrated the development and validation of a 30-probe genomic predictor for response to a taxane-containing chemotherapy (Ayers, 2004; Hess, 2006). The treatment administered in the neoadjuvant setting was sequential paclitaxel anthracycline preoperative chemotherapy (T/FAC). A complex multidrug regimen was selected for study because combination chemotherapy represents the current clinical standard for patients who require systemic cytotoxic treatment. Also, studies that explore gene signatures for response to individual drugs may not fully capture sensitivity to combination chemotherapy as practiced in standard-of-care.
A cohort of 82 patients was used for predictor discovery of pCR to preoperative T/FAC chemotherapy using fine needle biopsies taken before treatment and by analyzing gene profiles generated from a commercially available standard gene expression profiling technology (Affymetrix, Santa Clara, Calif.). Although several analytic techniques and resulting gene sets for response prediction were studied, the nominally best predictor for pCR with the fewest genes, called DLDA-30, was selected for independent validation in 51 additional patients. The predictor showed substantially higher sensitivity (a measure of how well a predictor identifies responsiveness or non-responsiveness to a therapy, e.g., true positives/(true positives+false negatives)) (92% vs. 61%) and slightly better negative predictive value (NPV, the proportion of patients with negative test results who are correctly diagnosed) (96% vs. 86%) than a clinical predictor based on ER, grade and age (Hess, 2006). The positive predictive value (PPV, the proportion of patients with positive test results who are correctly diagnosed) of the genomic predictor, at 52% (95% CI: 30%-73%), was significantly higher than the baseline 26% pCR rate in unselected patients. A sensitivity of 100% means that the test recognizes every patient who truly belongs to the class being detected (responsive to therapy or non-responsive to therapy). Typically, sensitivity alone does not tell us how well the test predicts other classes (that is, about the negative cases). Sensitivity is not the same as the positive predictive value (ratio of true positives to combined true and false positives), which is as much a statement about the proportion of actual positives in the population being tested as it is about the test. The calculation of sensitivity typically does not take into account indeterminate test results.
If a test cannot be repeated, the options are to exclude indeterminate samples from analyses (but the number of exclusions should be stated when quoting sensitivity), or, alternatively, indeterminate samples can be treated as false negatives (which gives the worst-case value for sensitivity and may therefore underestimate it).
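The definitions of sensitivity, PPV, and NPV given above, together with the worst-case treatment of indeterminate samples, can be illustrated with a minimal Python sketch. The function name, the parameter names, and all counts below are hypothetical and chosen only for illustration; they are not data from the cited studies.

```python
def diagnostic_metrics(tp, fp, tn, fn, indeterminate_positives=0):
    """Compute sensitivity, PPV and NPV from confusion-matrix counts.

    indeterminate_positives: truly positive cases with no usable test
    result. They are counted as false negatives for sensitivity only,
    which gives the worst-case value described in the text.
    """
    sensitivity = tp / (tp + fn + indeterminate_positives)
    ppv = tp / (tp + fp)  # true positives among all positive test results
    npv = tn / (tn + fn)  # true negatives among all negative test results
    return {"sensitivity": sensitivity, "ppv": ppv, "npv": npv}


# Hypothetical counts, for illustration only.
metrics = diagnostic_metrics(tp=40, fp=10, tn=45, fn=5)
worst_case = diagnostic_metrics(tp=40, fp=10, tn=45, fn=5,
                                indeterminate_positives=5)
```

In this sketch, excluding indeterminate samples corresponds to the default call, while the second call shows how treating them as false negatives lowers the reported sensitivity without changing PPV or NPV.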
Although this predictor and others described in literature (Chang, 2003; Modlich, 2006) may help define a patient population that is more likely to achieve pCR than the general patient population, further developments can help refine prediction of treatment response considerably. Although pCR as a response endpoint is strongly correlated with high treatment-related survival, patients with residual disease (RD) after treatment encompass a wide range of outcomes ranging from very good prognosis (“near-pCR”) to drug resistance. Predictors that can better classify response outcomes to capture and differentiate the high responders and non-responders within the spectrum of residual disease could significantly benefit patient management.
A measure of residual disease or residual cancer burden (RCB), previously developed and reported, may be useful as a variable to characterize response to treatment (Symmans et al., 2007). This measure is derived from the primary tumor dimensions, cellularity of the tumor bed, and axillary nodal burden. Each component contributes meaningful pathologic information and can be obtained using routine pathologic materials and methods of interpretation that could easily be implemented in routine diagnostic practice. RCB measurements can provide a continuous parameter of residual disease and thus of response or resistance, so that all subject responses contribute to the analysis.
RCB is divided into four survival-related classes (RCB-0 to RCB-III) where patients with minimal residual disease (RCB-I) have the same 5-year relapse-free survival as those with pCR (RCB-0), irrespective of the type of neoadjuvant chemotherapy administered, adjuvant hormonal therapy or the pathologic stage of RD. Therefore, the combination of RCB-0 (pCR) and RCB-I expands the subset of patients who can be identified as having “good response” and to have benefited from the chemotherapy. Extensive residual disease (RCB-III), on the other hand, is associated with poor prognosis, irrespective of the type of neoadjuvant chemotherapy administered, adjuvant hormonal therapy, or the pathologic stage of RD. In particular, all patients with RCB-III after T/FAC chemotherapy, who did not receive adjuvant hormonal therapy, suffered distant relapse within 3 years (Symmans et al., 2007). This identifies an important subset of patients who are not responsive to chemotherapy, or with residual disease (after surgery) that is too extensive to be controlled by hormonal therapy alone.
Therefore, residual cancer burden (RCB) is an informative tool and a metric to help develop response predictors based on better characterization of likely treatment outcomes. RCB categories can be employed with existing methods to define surrogate endpoints from neoadjuvant trials. As a metric correlated with survival, RCB is strongly and independently prognostic and the classes of RCB capture distinct sets of survival-based outcomes. Development of a predictor that reports likelihood of a patient's tumor post-treatment to belong to one of the RCB classes, rather than simply pCR as an endpoint, can yield valuable diagnostic information for efficient treatment management. In certain aspects, predictors specific to RCB-0 (pCR or complete response), RCB-0/I (pCR+near-pCR called good response) and RCB-III (resistance) are developed. In certain aspects of the methods described, the inventors have also accounted for tumor sub-types based on the status of two receptors, Her2-neu and ER, allowing for the predictors to capture heterogeneity within breast cancers and achieve acceptable diagnostic performance.
Sets of genes are defined that are prognostic, diagnostic, or predictive or indicative of the outcome for a cancer patient. These genes can be incorporated into an index or predictor of such an outcome and used in the management of the treatment for a given patient. Prognosis is a medical term denoting the doctor's prediction of how a patient's disease will progress, and whether there is chance of recovery.
Outcome can be represented in various forms to indicate probability of survival or likely survival outcome. In biostatistics, survival rate is a part of survival analysis, indicating the percentage of people in a study or treatment group who are alive for a given period of time after diagnosis. Survival rates are important for prognosis; for example, whether a type of cancer has a good or bad prognosis can be determined from its survival rate or survival outcome.
Patients with a certain disease can die directly from that disease or from an unrelated cause such as a car accident. When the precise cause of death is not specified, this is called the overall survival rate or observed survival rate. Doctors often use mean overall survival rates to estimate the patient's prognosis. This is often expressed over standard time periods, like one, five, and ten years. For example, prostate cancer has a much higher one year overall survival rate than pancreatic cancer, and thus has a better prognosis.
When the interest is in how survival is affected by the disease itself, the net survival rate can be used, which filters out the effect of mortality from causes other than the disease. Typically, the two main ways to calculate net survival are relative survival and cause-specific survival (also called disease-specific survival).
Relative survival is calculated by dividing the overall survival after diagnosis of a disease by the survival as observed in a similar population that was not diagnosed with that disease. A similar population is composed of individuals with at least age and gender similar to those diagnosed with the disease. Cause-specific survival is calculated by treating deaths from other causes than the disease as withdrawals from the population that don't lower survival, comparable to patients who are not observed any longer, e.g. due to reaching the end of the study period. Relative survival has the advantage that it does not depend on accuracy of the reported cause of death; cause-specific survival has the advantage that it does not depend on the ability to find a similar population of people without the disease.
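The two net-survival calculations described above can be sketched as follows. This is a simplified illustration, assuming a Kaplan-Meier product-limit estimate is an acceptable way to treat other-cause deaths as withdrawals; the function names, the cause labels, and the follow-up data are hypothetical.

```python
def relative_survival(observed_survival, expected_survival):
    """Overall survival in the diseased cohort divided by the survival
    observed in a similar population without the disease."""
    return observed_survival / expected_survival


def kaplan_meier(times, events, horizon):
    """Product-limit survival estimate at `horizon`.

    events[i] is True when subject i had the event at times[i] and
    False when the subject was withdrawn (censored) at times[i].
    """
    survival = 1.0
    event_times = sorted({t for t, e in zip(times, events) if e and t <= horizon})
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if e and u == t)
        survival *= 1.0 - deaths / at_risk
    return survival


def cause_specific_survival(times, causes, horizon, disease="disease"):
    """Deaths from other causes (and subjects still alive, cause None)
    are treated as withdrawals that do not lower survival."""
    return kaplan_meier(times, [c == disease for c in causes], horizon)


# Hypothetical follow-up (years) and cause of death; None means alive
# at the end of the study period.
times = [1, 2, 3, 4, 5]
causes = ["disease", "other", "disease", None, None]

overall = kaplan_meier(times, [c is not None for c in causes], horizon=5)
cause_specific = cause_specific_survival(times, causes, horizon=5)
relative = relative_survival(overall, expected_survival=0.8)
```

Here the death from another cause lowers the overall estimate but not the cause-specific estimate, which is the distinction the paragraph draws.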
Survival is not the only endpoint that can be used as a metric in developing predictors such as those described herein. Endpoints or therapeutic outcomes can include survival or distant relapse-free survival (DRFS). Other endpoints are discussed in Cooper and Kaanders, Biological surrogate end-points in cancer trials: Potential uses, benefits and pitfalls, European Journal of Cancer, Volume 41, Issue 9, Pages 1261-1266, which is incorporated herein by reference. A “surrogate marker” or “surrogate endpoint” or “secondary endpoint” typically will refer to a biological or clinical parameter that is measured in place of the biologically definitive or clinically most meaningful parameter, i.e., survival. Primary endpoints may also include limitation of pharmacologic therapies, reduction of time to death, or reduction in the progression of the disease, disorder, or condition. Surrogate markers are pathophysiologic parameters determined by medical or clinical laboratory diagnosis that are associated and have been correlated with the prognosis, progression, predisposition, or risk analysis of a disease, disorder, or condition, and that are not directly related to the primary diagnosed pathophysiologic condition. Secondary endpoints are those that supplement the primary endpoint. For example, secondary endpoints include reduction in pharmacologic therapy, reduction in requirement of a medical device, or alteration of the progression of the disease, disorder, or condition. Typically, a clinical endpoint may refer to a disease, symptom, or sign that constitutes one of the target outcomes of the therapy or clinical trial. The results of a therapy or clinical trial generally indicate the number of people enrolled who reached the pre-determined clinical endpoint during the study interval, compared with the overall number of people who were enrolled.
Once a patient reaches the endpoint, he or she is generally excluded from further experimental intervention (the origin of the term endpoint). For example, a clinical trial investigating the ability of a medication to prevent heart attack might use chest pain as a clinical endpoint. Any patient enrolled in the trial who develops chest pain over the course of the trial, then, would be counted as having reached that clinical endpoint. The results would ultimately reflect the fraction of patients who reached the endpoint of having developed chest pain, compared with the overall number of people enrolled. When an experiment involves a control group, the fraction of individuals who reach the clinical endpoint after an intervention is compared with the fraction of individuals in the control group who reached the same clinical endpoint, thus reflecting the ability of the intervention to prevent the endpoint in question. Some studies will examine the incidence of a combined endpoint, which can merge a variety of outcomes into one group.
When building prediction rules for treatment response, or for disease state in general, from gene expression data, a small subset of informative genes can be selected for use as prognostic features in the predictor. Most predictors employ univariate filtering to rank the candidate genes according to the p-value of a two-sample unequal variance t-test comparing the mean expression values of each gene in the two response classes (e.g., pCR and RD). Univariate filtering methods have the disadvantage that they do not deal well with redundant features (genes that have similar expression profiles) and therefore the resulting predictors tend to be less robust (Lai, 2006).
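Univariate filtering of this kind can be sketched in a few lines of Python. As a simplification, and as an assumption of this sketch, genes are ranked by the absolute value of the unequal-variance (Welch) t statistic rather than by its p-value; the two orderings can differ slightly because the degrees of freedom vary from gene to gene. The gene names and expression values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Two-sample unequal-variance (Welch) t statistic."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


def rank_genes(expression, labels, top_n):
    """Rank genes by |t| between the pCR and RD classes.

    expression: {gene: [value per sample]}
    labels:     one of "pCR" or "RD" per sample, in the same order
    """
    scores = {}
    for gene, values in expression.items():
        pcr = [v for v, label in zip(values, labels) if label == "pCR"]
        rd = [v for v, label in zip(values, labels) if label == "RD"]
        scores[gene] = abs(welch_t(pcr, rd))
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Hypothetical expression values for three genes in six samples.
labels = ["pCR", "pCR", "pCR", "RD", "RD", "RD"]
expression = {
    "GENE_A": [5.0, 5.2, 4.8, 1.0, 1.1, 0.9],
    "GENE_B": [2.0, 3.0, 1.0, 2.1, 2.9, 1.2],
    "GENE_C": [3.0, 3.5, 2.5, 2.0, 2.4, 1.6],
}
top_genes = rank_genes(expression, labels, top_n=2)
```

Because each gene is scored on its own, two genes with nearly identical profiles would both rank highly, which illustrates the redundancy problem noted above.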
The method used to identify predictive genes involved first, applying a filter to the gene expression data of all probes on an array to select the top probe sets to be used in signature development using the above described algorithm. Gene filtering can be based on the regularized t-test for the selected response endpoint such as pCR or RCB-0 (complete response), RCB-0/I (good response), or RCB-III (poor response). Other methods for gene filtering include methods that utilize non-specific global filtering criteria. These include, but are not limited to intensity-based filtering, which aims to remove genes that are not expressed at all in the samples studied or variability-based filtering, which aims to remove genes with low variability across samples.
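The non-specific global filters mentioned above can be illustrated with a short Python sketch. The thresholds, the choice of maximum intensity and sample variance as the filtering statistics, and the decision to apply both filters together are assumptions of this example, not details taken from the described method.

```python
from statistics import variance


def nonspecific_filter(expression, min_intensity, min_variance):
    """Keep genes that pass both global filters.

    Intensity-based: drop genes whose maximum expression across samples
    is below min_intensity (treated here as "not expressed at all").
    Variability-based: drop genes whose sample variance across samples
    is below min_variance.
    """
    kept = {}
    for gene, values in expression.items():
        expressed = max(values) >= min_intensity
        variable = variance(values) >= min_variance
        if expressed and variable:
            kept[gene] = values
    return kept


# Hypothetical probe-level values for three samples.
expression = {
    "LOW": [0.1, 0.2, 0.1],    # never expressed
    "FLAT": [8.0, 8.0, 8.1],   # expressed, but nearly constant
    "GOOD": [4.0, 9.0, 6.5],   # expressed and variable
}
filtered = nonspecific_filter(expression, min_intensity=3.0, min_variance=0.5)
```

Neither filter uses the response labels, which is why such criteria are called non-specific.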
A multivariate method was used to simultaneously select the signature genes and to calculate the classification score. The predictor is determined by the level of penalization, which determines the number of genes included in the predictive signature, and by the choice of a decision threshold to dichotomize the classification score. As one example, the inventors selected the maximum level of penalization resulting in the smallest signatures that yield significant cross-validated predictor performance (the terms predictor and outcome predictor can be used interchangeably); this step determines the signature probe sets and their weights. Then, a decision threshold is selected in order to optimize the predictive values of the predictor. Evaluation of the predictors was based on the joint confidence interval of the positive predictive value (PPV) and the negative predictive value (NPV) of the predictor at the 5% significance level (lower 95% confidence limit of PPV ≥ baseline response rate and lower 95% confidence limit of NPV ≥ 1 − baseline response rate).
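The acceptance criterion on the lower confidence limits of PPV and NPV can be sketched as follows. This example assumes a separate two-sided 95% Wilson score interval for each proportion, which is a stand-in for, and not necessarily the same as, the joint confidence interval referred to above. The baseline response rate is taken from the evaluated cohort, and all counts are hypothetical.

```python
from math import sqrt


def wilson_lower(successes, n, z=1.96):
    """Lower limit of the Wilson score interval for a proportion."""
    p = successes / n
    centre = p + z * z / (2 * n)
    margin = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - margin) / (1 + z * z / n)


def passes_criterion(tp, fp, tn, fn):
    """True when the lower limit of PPV is at least the baseline response
    rate and the lower limit of NPV is at least 1 - baseline."""
    baseline = (tp + fn) / (tp + fp + tn + fn)
    ppv_lower = wilson_lower(tp, tp + fp)
    npv_lower = wilson_lower(tn, tn + fn)
    return ppv_lower >= baseline and npv_lower >= 1 - baseline


# Hypothetical confusion-matrix counts at two candidate thresholds.
accepted = passes_criterion(tp=40, fp=10, tn=45, fn=5)
rejected = passes_criterion(tp=5, fp=20, tn=60, fn=15)
```

In practice the check would be repeated for each candidate decision threshold, and a threshold would be retained only if both inequalities hold.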
In developing the RCB-based predictor, the inventors used an approach that combines feature selection and model discovery using a multivariate penalized approach, an example of which is Gradient Directed Regularization developed by Prof. J. Friedman at Stanford University, a description of which can be found on the World Wide Web at stat.stanford.edu/~jhf/ftp/pathlite.pdf. Typically, the informative genes are selected with penalization using the maximization of the area under the receiver operating characteristic (ROC) curve (AUC) as the optimization criterion. Ma and Huang have previously used a similar approach for disease classification (Ma, 2006). A receiver operating characteristic (ROC) curve is a graphical plot of the sensitivity vs. (1 − specificity) for a binary classifier system as its discrimination threshold is varied. Equivalently, the ROC curve plots the fraction of true positives (TPR = true positive rate) against the fraction of false positives (FPR = false positive rate). The best possible prediction method would yield a point in the upper left corner, or coordinate (0,1), of the ROC space, representing 100% sensitivity (all true positives are found) and 100% specificity (no false positives are found). The (0,1) point is also called a perfect classification. A completely random guess would give a point along a diagonal line (the so-called line of no-discrimination) from the bottom left to the top right corner. The diagonal line divides the ROC space into areas of good and bad classification. Points above the diagonal line indicate classification better than random guessing, while points below the line indicate classification worse than random guessing.
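By way of illustration only, the ROC curve and AUC described above can be sketched in a few lines of Python. This is not the Gradient Directed Regularization software; the function names `roc_auc` and `roc_points` and the convention that a label of 1 marks a responder are assumptions made for this sketch.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels are 1 (positive) or 0 (negative); tied scores count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def roc_points(scores, labels):
    """Return (FPR, TPR) pairs as the decision threshold is lowered."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        # Cases scoring at or above the threshold are called positive.
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points
```

A perfectly separating score passes through the (0,1) corner and has an AUC of 1, whereas a score unrelated to the outcome has an AUC near 0.5.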
As an example of predictor discovery and evaluation the protocol suggested by Wessels et al. was followed (Wessels, 2005). The methodology is briefly explained below. First, the input dataset is randomly partitioned into a training set and a test set. A 3-fold cross-validation based on Dudoit et al. recommendation of a 2:1 split between training and test sets was used (Dudoit, 2002). The training set consisting of ⅔ of the original data is used to develop a predictor. To account for bias in the several data-dependent decisions involved in building the predictor, a 5-fold internal cross-validation can be used to select the optimal set of genes for the predictor and to tune the parameters of the predictor, e.g., the degree of penalization. Since different optimal reporter gene sets might result from the different internal cross-validation folds, the number of times each gene is selected is tracked to provide a measure of its importance or its reliability. The trained predictor is then tested on the ⅕ hold-out part of the training dataset and its performance is evaluated based on the AUC.
To obtain a less biased estimate of classification performance, the trained predictor or outcome predictor can be evaluated on the test set (⅓ of the original data) that was not used in training the predictor. To assess the significance of the predictive performance of the trained predictor, the permutation predictive performance of the predictor was estimated by randomly scrambling the outcome labels in the test dataset. The entire process of randomly splitting the data to a training and a test set was repeated a number of times to obtain the distributions and summary statistics of the performance metrics.
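The label-scrambling step can be illustrated with the following minimal Python sketch, which estimates how often a predictor's AUC on the test set would be matched or exceeded when the outcome labels are randomly permuted. The function names, the default of 1000 permutations, and the add-one correction are assumptions of this sketch and are not taken from the protocol described above.

```python
import random


def auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def permutation_p_value(scores, labels, n_perm=1000, seed=0):
    """Return the observed AUC and a permutation p-value for it."""
    rng = random.Random(seed)
    observed = auc(scores, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # scramble the outcome labels
        if auc(scores, shuffled) >= observed:
            hits += 1
    # Add-one correction (an assumption) keeps the estimate above zero.
    return observed, (hits + 1) / (n_perm + 1)
```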
Typically, under cross-validation the decision threshold is varied along all possible values and for each value predictor performance (accuracy, positive predictive value (PPV), negative predictive value (NPV)) is determined. The threshold is selected that yields the best compromise between PPV and NPV, as typically increasing PPV results in decreasing NPV. Typically, the objective is to maximize both.
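As a non-limiting illustration, the threshold scan can be sketched in Python as follows. The text above does not specify how the compromise between PPV and NPV is scored; maximizing the smaller of the two values is an assumption of this sketch, as are the function names.

```python
def predictive_values(scores, labels, threshold):
    """PPV and NPV when cases with score > threshold are called positive."""
    tp = fp = tn = fn = 0
    for s, y in zip(scores, labels):
        if s > threshold:
            if y == 1:
                tp += 1
            else:
                fp += 1
        else:
            if y == 1:
                fn += 1
            else:
                tn += 1
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    npv = tn / (tn + fn) if tn + fn else float("nan")
    return ppv, npv


def best_threshold(scores, labels):
    """Scan every observed score as a threshold; keep the best min(PPV, NPV)."""
    best = None
    for t in sorted(set(scores)):
        ppv, npv = predictive_values(scores, labels, t)
        if ppv != ppv or npv != npv:  # NaN: one class was never predicted
            continue
        key = min(ppv, npv)
        if best is None or key > best[0]:
            best = (key, t, ppv, npv)
    if best is None:
        raise ValueError("no threshold yields both predicted classes")
    return best[1], best[2], best[3]
```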
In certain aspects, other measurements or determinations can be made in conjunction with the nucleic acid analysis, for example determination of protein expression and/or histology of a sample. Protein expression can be detected in tumor tissue, cell material obtained by biopsy and the like. For example, a biopsy sample can be immobilized and contacted with an antibody, an antibody fragment or an aptamer that binds selectively to the protein to be detected. The sample can be assayed to determine whether the antibody, fragment or aptamer has bound to the protein by techniques well known in the art. Protein expression can be measured by a variety of methods including but not limited to Western blot, immunoblot, enzyme-linked immunosorbent assay (ELISA), radioimmunoassay (RIA), immunoprecipitation, surface plasmon resonance, immunohistochemical (IHC) analysis, mass spectrometry, fluorescence activated cell sorting (FACS) and flow cytometry.
In a further aspect, IHC analysis is used to measure protein expression. The level of expression for a sample is determined by IHC by staining the sample for a particular expression marker and developing a score for the staining. For example, monoclonal antibodies can be used to stain for the expression of a marker of interest. Mouse antibodies are known for use in the staining of the marker PTEN. Samples can be evaluated for the frequency of cells stained for each sample and the intensity of the stain. Typically, a score based on the frequency (rated from 0-4) and intensity (rated from 0-4) of the stained sample is developed as a measure of overall expression. Exemplary but non-limiting methods for IHC and criteria for scoring expression are described in detail in Handbook of Immunohistochemistry and In Situ Hybridization in Human Carcinomas, M. Hayat Ed., 2004, Academic Press.
In one aspect of the invention, a predictor or transcriptional profile index is used to measure the expression of many genes that provide predictive information about a likely outcome for a particular patient. The invention includes the methods for standardizing the expression values of future samples to a normalization standard that will allow direct comparison of the results to past samples, such as from a clinical trial. The invention also includes the biostatistical methods to calculate and report such results. A sample as used herein can comprise any number of cells that is sufficient for a clinical diagnosis or prognosis, and typically contains at least, at most or about 100 target cells.
The microarrays provide a suitable method to measure gene expression from clinical samples. mRNA levels measured by microarrays, such as Affymetrix U133A gene chips, in fine needle aspirates (FNA), core needle biopsy, and/or frozen tumor tissue samples of breast cancer correlated closely with protein expression by enzyme immunoassay and by routine immunohistochemistry.
Estrogen receptor and Her2-neu status. ER-positive breast cancer includes a continuum of ER expression that might reflect a continuum of biologic behavior and endocrine sensitivity. Others have reported that some breast cancers are difficult to predict as ER-positive based on transcriptional profile and described non-estrogenic growth effects, such as HER-2, more frequently in this small subset of tumors with aggressive natural history (Kun et al., 2003). Indeed, ER mRNA levels are lower in breast cancers that are positive for both ER and HER2 (Konecny et al., 2003).
Diagnostic tools are needed not merely for prognosis but also to provide a biological rationale and to demonstrate clinical benefit when they are used to guide the selection and duration of therapies, particularly in light of the cost, complexity, toxicity, benefits and other factors related to such therapies. An index or predictor can be used to predict the likelihood of response rather than intrinsic prognosis.
In addition to other known methods of cancer therapy, hormone therapies may be employed in the treatment of patients identified as having hormone sensitive cancers. Hormones, or other compounds that stimulate or inhibit these pathways, can bind to hormone receptors, blocking a cancer's ability to get the hormones it needs for growth. By altering the hormone supply, hormone therapy can inhibit growth of a tumor or shrink the tumor. Typically, these cancer treatments only work for hormone-sensitive cancers. If a cancer is hormone sensitive, a patient might benefit from hormone therapy as part of cancer treatment. Sensitivity to hormones is usually determined by taking a sample of a tumor (biopsy) and conducting analysis in a laboratory.
A. Chemotherapy
Chemotherapy is the use of chemical substances to treat disease. In its modern-day use, it refers to cytotoxic drugs used to treat cancer or the combination of these drugs into a standardized treatment regimen. There are a number of strategies in the administration of chemotherapeutic drugs used today. Chemotherapy may be given with a curative intent or it may aim to prolong life or to palliate symptoms.
Combined modality chemotherapy is the use of drugs with other cancer treatments, such as radiation therapy or surgery. Combination chemotherapy is a similar practice which involves treating a patient with a number of different drugs simultaneously, e.g., T/FAC therapy. Typically, the drugs differ in their mechanism and side effects. The biggest advantage is minimizing the chances of resistance developing to any one agent.
In neoadjuvant chemotherapy (preoperative treatment), initial chemotherapy is aimed at shrinking the primary tumor, thereby rendering local therapy (surgery or radiotherapy) less destructive or more effective.
Adjuvant chemotherapy (postoperative treatment) can be used when there is little evidence of cancer present, but there is risk of recurrence. This can help reduce chances of resistance developing if the tumor does develop. It is also useful in killing any cancerous cells which have spread to other parts of the body. This is often effective as the newly growing tumors are fast-dividing, and therefore very susceptible.
Palliative chemotherapy is given without curative intent, but simply to decrease tumor load and increase life expectancy. For these regimens, a better toxicity profile is generally expected.
All chemotherapy regimens require that the patient be capable of undergoing the treatment. Performance status is often used as a measure to determine whether a patient can receive chemotherapy, or whether dose reduction is required.
B. Hormone Therapy
Several malignancies respond to hormonal therapy. Strictly speaking, this is not chemotherapy. Cancer arising from certain tissues, including the mammary and prostate glands, may be inhibited or stimulated by appropriate changes in hormone balance. Cancers that are most likely to be hormone-receptive include breast cancer, prostate cancer, ovarian cancer, and endometrial cancer. Not every cancer of these types is hormone-sensitive, however, which is why the cancer must be analyzed to determine whether hormone therapy is appropriate.
Breast cancer cells often highly express the estrogen and/or progesterone receptor. Inhibiting the production (with aromatase inhibitors) or action (with tamoxifen) of these hormones can often be used as an adjunct to therapy.
Hormone therapy may be used in combination with other types of cancer treatments, including surgery, radiation and chemotherapy. A hormone therapy can be used before a primary cancer treatment, such as before surgery to remove a tumor. This is called neoadjuvant therapy. Hormone therapy can sometimes shrink a tumor to a more manageable size so that it's easier to remove during surgery.
Hormone therapy is sometimes given in addition to the primary treatment—usually after—in an effort to prevent the cancer from recurring (adjuvant therapy). In some cases of advanced (metastatic) cancers, such as in advanced prostate cancer and advanced breast cancer, hormone therapy is sometimes used as a primary treatment.
The most common types of drugs for hormone-receptive cancers include: (1) Anti-hormones, which block the cancer cell's ability to interact with the hormones that stimulate or support cancer growth. Though these drugs do not reduce the production of hormones, anti-hormones block the ability to use these hormones. Anti-hormones include the anti-estrogens tamoxifen (Nolvadex) and toremifene (Fareston) for breast cancer, and the anti-androgens flutamide (Eulexin) and bicalutamide (Casodex) for prostate cancer. (2) Aromatase inhibitors (AIs), which target enzymes that produce estrogen in postmenopausal women, thus reducing the amount of estrogen available to fuel tumors. AIs are only used in postmenopausal women because the drugs cannot prevent the production of estrogen in women who have not yet been through menopause. Approved AIs include letrozole (Femara), anastrozole (Arimidex) and exemestane (Aromasin). (3) Luteinizing hormone-releasing hormone (LH-RH) agonists (sometimes called analogs) and LH-RH antagonists, which reduce the level of hormones by altering the mechanisms in the brain that tell the body to produce hormones. LH-RH agonists are essentially a chemical alternative to surgery for removal of the ovaries for women, or of the testicles for men. Depending on the cancer type, a patient might choose this route if they hope to have children in the future and want to avoid surgical castration. In most cases the effects of these drugs are reversible. Examples of LH-RH agonists include leuprolide (Lupron, Viadur, Eligard) for prostate cancer, goserelin (Zoladex) for breast and prostate cancers, and triptorelin (Trelstar) for ovarian and prostate cancers; an example of an LH-RH antagonist is abarelix (Plenaxis).
One class of pharmaceuticals is the Selective Estrogen Receptor Modulators or SERMs. SERMs block the action of estrogen in the breast and certain other tissues by occupying estrogen receptors inside cells. SERMs include, but are not limited to tamoxifen (the brand name is Nolvadex, generic tamoxifen citrate); Raloxifene (brand name: Evista), and toremifene (brand name: Fareston).
Further embodiments of the invention include kits for the measurement, analysis, and reporting of gene expression and transcriptional output. A kit may include, but is not limited to microarray, quantitative RT-PCR, antibodies, labeling or other reagents and materials, as well as hardware and/or software for performing at least a portion of the methods described. For example, custom microarrays or analysis methods for existing microarrays are contemplated. Also, methods of the invention include methods of accessing and using a reporting system that compares a single result to a scale of clinical trial results. In yet still further aspects of the invention, a digital standard for data normalization is contemplated so that the assay result values from future samples would be able to be directly compared with the assay value results from past samples, such as from specific clinical trials.
The following examples are given for the purpose of illustrating various embodiments of the invention and are not meant to limit the present invention in any fashion. One skilled in the art will appreciate readily that the present invention is well adapted to carry out the objects and obtain the ends and advantages mentioned, as well as those objects, ends and advantages inherent herein. The present examples, along with the methods described herein are presently representative of preferred embodiments, are exemplary, and are not intended as limitations on the scope of the invention. Changes therein and other uses which are encompassed within the spirit of the invention as defined by the scope of the claims will occur to those skilled in the art.
Needle biopsy samples (fine needle aspirates, FNAs, or core biopsies, CBX) were analyzed in order to examine genes correlated with the selected endpoint. The genes were identified using these samples, and the data were standardized so that the predictor indices could be calculated consistently in different sample types such as biopsies, resected tissue from an excised tumor, and frozen tumor tissue.
Patients and samples—Patients prospectively consented to an Institutional Review Board approved research protocol (LAB99-402, USO-02-103, 2003-0321, I-SPY-1) to obtain a tumor biopsy by fine needle aspiration (FNA) or core biopsy (CBX) prior to any systemic therapy for genomic studies to develop and test predictors of treatment outcome. Clinical nodal status was determined before treatment from physical examination, with or without axillary ultrasound, with diagnostic FNA as required. Pathologic HER2 status was defined as negative according to the ASCO/CAP guidelines. Patients with any nuclear immunostaining for ER in the tumor cells were considered eligible for adjuvant endocrine therapy. During this research, patients consented to undergo pretreatment biopsy, as fine needle aspiration (FNA) (Ayers, 2004; Hess, 2006) or core needle biopsy, of the primary breast tumor or ipsilateral axillary metastasis before starting chemotherapy as part of an ongoing pharmacogenomic marker discovery program. Gene expression data generated from the biopsies capture the molecular characteristics of the invasive cancer including the molecular class (Pusztai, 2003). At least 70% of all aspirations yielded at least 1 μg total RNA, the amount required for the gene expression profiling. The main reason for failure to obtain sufficient RNA was acellular aspirations. Three hundred and ten (310) patients with at least 1 μg RNA were included in this analysis. All patients received neoadjuvant chemotherapy consisting of a combination of either paclitaxel or docetaxel with anthracycline. At the completion of neoadjuvant chemotherapy all patients had modified radical mastectomy or lumpectomy and sentinel lymph node biopsy or axillary node dissection as determined appropriate by the surgeon. Patients who were ER-positive also received endocrine therapy as tamoxifen or aromatase inhibitor. Clinical characteristics of the patients are in Table 1A.
Table 1A footnotes:
(a) M. D. Anderson Cancer Center;
(b) I-SPY-1 clinical trial;
(c) Lyndon B. Johnson Hospital;
(d) Instituto Nacional de Enfermedades Neoplásicas (INEN);
(e) Grupo Español de Investigación en Cáncer de Mama (GEICAM);
(f) US Oncology;
(g) American Joint Committee on Cancer;
(h) Estrogen receptor;
(i) Progesterone receptor.
(1) 12 weekly doses of paclitaxel (T) followed by four cycles of fluorouracil (F), doxorubicin (A) and cyclophosphamide (C) and then surgery.
(2) Four cycles of doxorubicin (A) and cyclophosphamide (C) followed by four cycles of paclitaxel (T) (N = 60) or docetaxel (Tx) (N = 18) or taxane not specified (N = 5) and then surgery.
(3) Four cycles of docetaxel (Tx) with capecitabine (X) followed by four cycles of fluorouracil (F), epirubicin (E) and cyclophosphamide (C) and then surgery.
(4) Six cycles of fluorouracil (F), doxorubicin (A) or epirubicin (E), and cyclophosphamide (C) followed by surgery and then by 12 weekly doses of paclitaxel (T).
(5) Surgery followed by 12 weekly doses of paclitaxel (T) and then by four cycles of fluorouracil (F), doxorubicin (A) or epirubicin (E), and cyclophosphamide (C).
(6) Surgery followed by four cycles of docetaxel (Tx) with capecitabine (X) and then followed by four cycles of fluorouracil (F), epirubicin (E) and cyclophosphamide (C).
(7) Surgery followed by four cycles of docetaxel (Tx) and then by four cycles of fluorouracil (F), epirubicin (E) and cyclophosphamide (C).
RNA extraction and gene expression profiling—Biopsy samples were either collected in 1.5 ml RNAlater™ (Qiagen, Valencia, Calif.) and stored locally at −70° C. and transported to the laboratory on dry ice (MDACC, INEN, LBJ, GEICAM) or couriered overnight in a cooler pack from clinics to the laboratory (USO), or were frozen, cryosectioned and an aliquot of RNA sent to the laboratory on dry ice (I-SPY). Details of our methods for RNA purification and microarray hybridization have been reported previously (Rouzier, 2005; Stec, 2005; Symmans, 2003). Briefly, a single-round T7 amplification was used to generate biotin-labeled cRNA for hybridization to oligonucleotide microarrays (U133A GeneChip™, Affymetrix, Santa Clara, Calif.). Gene expression levels were derived from multiple oligonucleotide probes on the microarray that hybridize to different sequence sites of a gene transcript (probe sets).
Microarray quality control—Quality control (QC) checks are performed at 3 levels, (i) RNA yield, (ii) cRNA yield, and (iii) chip hybridization signal, and samples that fail at any level are not processed further. The amount and quality of RNA are assessed with a NanoDrop ND-1000 Spectrophotometer (Thermo Fisher Scientific Inc., Wilmington, Del., USA) and are generally considered adequate for further analysis if the OD 260/280 ratio is between 1.8 and 2.1 and the total RNA yield is ≥1.0 microgram. If total RNA yield is <1.0 microgram, all remaining samples (if available) from that patient are used for RNA extraction. At least 10 μg of biotin-labeled cRNA must be generated from a single-round in vitro transcription protocol to proceed with hybridization to U133A chips.
Microarray data normalization—Raw intensity files (.CEL) from each microarray were processed using MAS5.0 (R/Bioconductor, www.bioconductor.org) to normalize to a mean array intensity of 600 and to generate probe set-level expression values. Expression values were then log2-transformed and subsequently scaled by the expression levels of 1322 breast cancer reference genes to reference values that had been established as the median expression of these genes in an independent reference cohort of invasive breast cancer (N=444). The quality of hybridization and microarray profiling was assessed based on a set of 8 metrics that compare the expression level of the reference genes in each sample to the historical reference values before and after scaling. Metrics include the median deviation, the inter-quartile range (IQR) of deviations, the Kolmogorov-Smirnov statistic for equality of the distributions and the p-value of the K-S statistic. Dimensionality was reduced through a principal component analysis (PCA) model of the 8 metrics, which were further summarized in two multivariate statistics, the Hotelling T² and the sum of squares of the residuals or Q statistic (Jackson & Mudholkar, 1979). Control limits for Q and T² for sample acceptance were established from historical in-control samples. Prior to analysis for predictor development, 2,522 probe sets that either had low specificity (extensions _xfri_ in their name) or were housekeeping probes (starting with AFFX) were removed, and a non-specific filter then retained only probe sets that were adequately expressed (log2-transformed intensity of at least 5 in at least 75% of the arrays). A total of 16,289 probe sets (73% of all) were retained for further analysis.
Methods for building predictors of survival outcomes as a result of therapy—Distant relapse-free survival (DRFS) was used as the endpoint of favorable outcome of therapy to build the predictor genes. Prior to analysis, probes that either had low specificity (those that include extensions _xfri_ in their name) or were housekeeping probes (those starting with AFFX) were selected and removed from the candidate probesets. This process removed 2522 probesets. Subsequently, a non-specific filter was applied to retain probesets that have log2-transformed intensity of at least 5 in at least 75% of the arrays. A total of 16289 probesets (73% of all) were retained for further analysis.
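For illustration only, the non-specific expression filter can be sketched in Python as follows. The dictionary layout and the probeset identifiers used in the example are hypothetical, and the low-specificity name filter is omitted because its exact naming pattern is not reproduced here; only the AFFX prefix and the intensity criterion stated above are implemented.

```python
def passes_expression_filter(log2_values, min_intensity=5.0, min_fraction=0.75):
    """True if log2 intensity is >= min_intensity in >= min_fraction of arrays."""
    n_expressed = sum(1 for v in log2_values if v >= min_intensity)
    return n_expressed >= min_fraction * len(log2_values)


def filter_probesets(expression, excluded_prefixes=("AFFX",)):
    """Keep adequately expressed, non-housekeeping probesets.

    expression maps a probeset identifier to its list of log2 intensities,
    one value per array (a hypothetical layout chosen for this sketch).
    """
    kept = {}
    for probeset_id, values in expression.items():
        if probeset_id.startswith(excluded_prefixes):
            continue  # housekeeping/control probes
        if passes_expression_filter(values):
            kept[probeset_id] = values
    return kept
```

For example, a probeset with log2 intensities of 6, 6, 6 and 1 across four arrays is retained, because three of four arrays (75%) reach the intensity of 5.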
The samples in the development cohort were subdivided into ER+ and ER− subsets and into lymph node negative (N0) and lymph node positive (NP) subsets within each ER group. Means and standard deviations (SDs) of the 16289 genes were computed for each of the 4 subsets of cases. Within each ER cohort, the means and SDs for N0 and NP subsets were averaged to yield nodal-status adjusted statistics. These means and SDs were then used to scale the expression values of all probesets using the corresponding statistics for ER+ or ER− cases.
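A minimal Python sketch of the nodal-status adjustment for a single probeset within one ER cohort is given below for illustration. The text states only that the adjusted means and SDs were used to scale the expression values; the specific form of scaling shown here (subtracting the adjusted mean and dividing by the adjusted SD) is an assumption, as are the function names.

```python
from statistics import mean, stdev


def nodal_adjusted_stats(values_n0, values_np):
    """Average the N0 and NP means and SDs so each nodal group counts equally."""
    adj_mean = (mean(values_n0) + mean(values_np)) / 2.0
    adj_sd = (stdev(values_n0) + stdev(values_np)) / 2.0
    return adj_mean, adj_sd


def scale(value, adj_mean, adj_sd):
    """Assumed z-score style scaling with the nodal-status adjusted statistics."""
    return (value - adj_mean) / adj_sd
```

Averaging the two subgroup statistics, rather than pooling all cases, keeps the scaling from being dominated by whichever nodal group happens to be larger in the development cohort.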
Each probeset was evaluated in a univariate Cox regression model for the significance of its association with risk of distant relapse. For this analysis, distant relapses or breast-cancer related deaths were considered as events, whereas local relapses were censored at the time of occurrence. Time to event was determined since the time of initial diagnosis. The significance of the association of each probeset to distant relapse risk was assessed based on the likelihood ratio test, which compares the log-likelihood of the model having the given probeset as the only covariate to the null model. The likelihood ratio statistic is distributed according to a chi-squared with one degree of freedom. P-values for the significance of each probeset were calculated from this distribution.
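The likelihood ratio calculation can be illustrated with the short Python sketch below, which assumes that the maximized log partial likelihoods of the null model and of the single-probeset model have already been obtained from a Cox regression routine (not shown). For one degree of freedom, the chi-squared upper tail probability equals erfc(sqrt(x/2)), which avoids the need for a statistics library; the function name is an assumption of this sketch.

```python
import math


def likelihood_ratio_p_value(loglik_null, loglik_model):
    """Likelihood ratio statistic and its chi-squared (1 df) p-value."""
    lr = 2.0 * (loglik_model - loglik_null)
    lr = max(lr, 0.0)  # guard against tiny negative values from rounding
    # For 1 df: P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(lr / 2.0))
    return lr, p_value
```

A likelihood ratio statistic of about 3.84 corresponds to a p-value of 0.05, and a statistic of about 10.83 corresponds to a p-value of 0.001.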
To account for sampling variability in the training dataset, Cox regression models for each probeset were fit repeatedly using a bootstrap procedure in which cases were sampled with replacement to generate bootstrapped datasets of the same size as the original dataset. This process was repeated 499 times, thus generating 500 estimates for the p-values of each probeset. The association of each probeset with distant relapse risk was assessed within each bootstrapped dataset at a critical significance level of 0.001 or 0.0005 to account for multiple testing. Those probesets that were called significant in at least 20% of the bootstrap replicates were selected as candidate probesets. This process was applied separately to the ER-positive and ER-negative cases in the training dataset and resulted in 235 and 268 candidate probesets in the ER+ and ER− subsets.
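The bootstrap selection step can be sketched in Python as follows, for illustration only. The callback `p_value_fn`, which is assumed to return the p-value of a probeset computed on the cases listed in a given index set, is hypothetical and stands in for the univariate Cox analysis; counting the original dataset as the first of the 500 estimates is an interpretation of the text above.

```python
import random


def bootstrap_selection(n_cases, p_value_fn, probesets, n_boot=499,
                        alpha=0.001, min_fraction=0.20, seed=0):
    """Return probesets significant in at least min_fraction of replicates."""
    rng = random.Random(seed)
    # Replicate 0 is the original dataset; the rest resample with replacement.
    index_sets = [list(range(n_cases))]
    for _ in range(n_boot):
        index_sets.append([rng.randrange(n_cases) for _ in range(n_cases)])
    counts = {p: 0 for p in probesets}
    for idx in index_sets:
        for p in probesets:
            if p_value_fn(p, idx) < alpha:
                counts[p] += 1
    total = len(index_sets)
    return [p for p in probesets if counts[p] / total >= min_fraction]
```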
Final multivariate prediction models were built from the candidate probesets in the ER+ and ER− cohorts. Maximization of the partial likelihood associated with Cox proportional hazards models becomes problematic and non-unique if the number of covariates exceeds the number of available samples or if there is a high degree of collinearity between the predictors. To prevent this pathologic behavior, some form of regularization or shrinkage needs to be applied to the regression coefficients so that the coefficients that remain in the model can be estimated efficiently. The Cox univariate shrinkage (CUS) approach was used for this purpose (Tibshirani, 2009), which is equivalent to the lasso estimate in standard regression analysis. The level of penalization is an adjustable parameter in the algorithm, with higher penalization resulting in smaller signatures. The optimal level of penalization was determined under 5-fold cross-validation as the penalization level that resulted in the shortest list of genes that yielded the highest incremental improvement in the Cox model's deviance.
The final predictors for ER+ and ER− subsets used 33 probesets and 27 probesets respectively to make the predictions. The probesets, genes that they encode for, and their weights (Cox coefficients) are shown in Table 2. The risk score is calculated by multiplying the scaled log2-transformed expression level of each gene in a given sample by its corresponding weight and then adding up the weighted expression values for all genes in the signature. The following formula describes the score calculation for sample i: $S_i = \sum_{j=1}^{p} w_j x_{ij}$, where $x_{ij}$ is the scaled log2-transformed expression level of probeset j in sample i, $w_j$ is the weight of probeset j, and p is the number of probesets in the signature.
A cut point was selected to dichotomize the risk score and predict two risk classes. The optimal cutoff was selected in order to maximize the accuracy of the prediction of 5-yr distant relapse outcome by the risk classes. A cutoff of 0 was selected for both the ER+ and ER− scores. Positive scores signify “High risk” class, i.e. higher risk of distant relapse and a zero or negative score signifies “Low risk”.
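For illustration, the score calculation and the dichotomization at a cutoff of 0 can be sketched in Python as follows. The function names and the probeset identifiers and weights in the usage example are hypothetical and are not taken from Table 2.

```python
def risk_score(scaled_expression, weights):
    """Weighted sum of scaled log2 expression over the signature probesets."""
    return sum(w * scaled_expression[p] for p, w in weights.items())


def risk_class(score, cutoff=0.0):
    """Positive scores are 'High risk'; zero or negative scores are 'Low risk'."""
    return "High risk" if score > cutoff else "Low risk"


# Hypothetical two-probeset signature and one sample, for illustration only.
example_weights = {"probeset_1": 0.5, "probeset_2": -1.0}
example_sample = {"probeset_1": 2.0, "probeset_2": 0.5, "probeset_3": 9.0}
example_score = risk_score(example_sample, example_weights)  # 0.5*2.0 - 1.0*0.5
```

Probesets that are not part of the signature (such as `probeset_3` in the example) do not contribute to the score.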
The plot shows that predicted good and poor responders to taxane-chemotherapy (
Patients and samples—Patient samples used were those shown in Table 1A. All other laboratory analytic methods were the same as in Example 1.
Methods for building predictors of response to chemotherapy—The inventors used the response endpoint RCB0/I, representing no residual disease or minimal residual disease measured at the completion of neoadjuvant chemotherapy, to identify genes that differentiated patients who responded to chemotherapy versus all others in the discovery cohort of Table 1A. Prior to analysis, probes that either had low specificity (those that include extensions _xfri_ in their name) or were housekeeping probes (those starting with AFFX) were selected and removed from the candidate probesets. This process removed 2522 probesets. Subsequently, a non-specific filter was applied to retain probesets that have log2-transformed intensity of at least 5 in at least 75% of the arrays. A total of 16289 probesets (73% of all) were retained for further analysis.
The samples in the development cohort were subdivided into ER+ and ER− subsets and into lymph node negative (N0) and lymph node positive (NP) subsets within each ER group. Means and standard deviations (SDs) of the 16289 genes were computed for each of the 4 subsets of cases. Within each ER cohort, the means and SDs for N0 and NP subsets were averaged to yield nodal-status adjusted statistics. These means and SDs were then used to scale the expression values of all probesets using the corresponding statistics for ER+ or ER− cases.
Each probeset was evaluated for differential expression in the two responder groups (RCB-0/I vs rest) using an unequal variance t-statistic based on the trimmed means and trimmed standard deviations in the two groups using a trim fraction of 0.025 (i.e. the lowest 2.5% and highest 2.5% values were eliminated and the statistics were calculated on the remaining 95% of the observations in each group). Degrees of freedom for the unequal variance t-statistic were estimated based on Satterthwaite's approximation (Armitage, Berry & Matthews, 2002). The significance of association of each probe set with response was assessed based on the unequal variance t-statistic. P-values for the significance of each probeset were calculated from the t-distribution with the corresponding degrees of freedom.
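The trimmed unequal variance t-statistic and Satterthwaite's degrees of freedom can be illustrated with the following Python sketch. Using the number of observations remaining after trimming as the sample size in the variance terms is an assumption of this sketch, as are the function names; the final p-value lookup from the t-distribution is left to a statistics library and is not shown.

```python
import math
from statistics import mean, variance


def trim(values, fraction=0.025):
    """Drop the lowest and highest `fraction` of the sorted observations."""
    ordered = sorted(values)
    k = int(math.floor(fraction * len(ordered)))
    return ordered[k:len(ordered) - k]


def trimmed_welch_t(group_a, group_b, fraction=0.025):
    """Unequal variance t-statistic and Satterthwaite df on trimmed groups."""
    a, b = trim(group_a, fraction), trim(group_b, fraction)
    va = variance(a) / len(a)  # squared standard error of group A mean
    vb = variance(b) / len(b)  # squared standard error of group B mean
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

With a trim fraction of 0.025, groups of fewer than 40 observations lose no values, so the statistic reduces to the ordinary unequal variance t-statistic in small groups.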
To account for sampling variability in the training dataset, the differential expression analysis for each probeset described in the previous paragraph was performed repeatedly using a bootstrap procedure in which cases were sampled with replacement to generate bootstrapped datasets of the same size as the original dataset. This process was repeated 499 times, thus generating 500 estimates for the p-values of each probeset. The association of each probeset with response was assessed within each bootstrapped dataset at a critical significance level of 0.0005 to account for multiple testing. Those probesets that were called significant in at least 30% of the bootstrap replicates were selected as candidate probesets. This process was applied separately to the ER-positive and ER-negative cases in the training dataset and resulted in 209 and 244 candidate probesets in the ER+ and ER− subsets.
In developing the RCB-based chemotherapy response predictor, the inventors used an approach that combines feature selection and model discovery using a multivariate penalized approach called Gradient Directed Regularization developed by Prof. J. Friedman at Stanford University, a description of which can be found on the World Wide Web at stat.stanford.edu/~jhf/ftp/pathlite.pdf. The informative genes are selected through penalization using the maximization of the area under the ROC curve (AUC) as the optimization criterion. Ma and Huang have previously used a similar approach for disease classification (Ma, 2006).
For predictor discovery and evaluation the inventors followed a cross-validation protocol. First, the input dataset is randomly partitioned into a training set and a test set. A 5-fold cross-validation for a 4:1 split stratified by response group between training and test sets was used (Dudoit, 2002). The training set consisting of ⅘ of the original data is used to develop the predictor. The algorithm starts with the same initial list of candidate genes that were determined through the bootstrap procedure and iteratively refines the predictor by selecting genes that contribute in maximizing the AUC of the candidate predictor. The maximum level of penalization is used to derive the most parsimonious predictors. Since different optimal reporter gene sets might result from the different internal cross-validation folds, the number of times each gene is selected is tracked to provide a measure of its importance or its reliability. The trained predictor is then tested on the ⅕ hold-out part of the training dataset and its performance is evaluated based on the AUC.
The entire process of randomly splitting the data into a training set and a test set was repeated 499 times to obtain the distributions and summary statistics of the performance metrics from the cross-validated replicates.
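The repeated stratified 4:1 split and AUC evaluation can be illustrated with the sketch below. It does not implement Gradient Directed Regularization: `fit` is a hypothetical callable standing in for whatever model-fitting step is used, and the helper names and the default of 500 runs (one initial split plus 499 repeats) are assumptions made for illustration.

```python
import random

def auc(scores, labels):
    """Area under the ROC curve by pairwise comparison (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def stratified_split(labels, rng, test_fraction=0.2):
    """Return (train_idx, test_idx) with each response group split about 4:1."""
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        k = max(1, round(len(idx) * test_fraction))
        test += idx[:k]
        train += idx[k:]
    return train, test

def repeated_holdout_auc(X, y, fit, n_repeats=500, seed=0):
    """Repeat the random split; fit(X_train, y_train) must return a scorer.

    Returns the list of hold-out AUC values, one per repeat.
    """
    rng = random.Random(seed)
    aucs = []
    for _ in range(n_repeats):
        train, test = stratified_split(y, rng)
        score = fit([X[i] for i in train], [y[i] for i in train])
        aucs.append(auc([score(X[i]) for i in test], [y[i] for i in test]))
    return aucs
```

The returned list gives the distribution of hold-out AUC values, from which summary statistics such as the mean or percentiles can be taken.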
The final predictors for ER+ and ER− subsets used 39 probesets and 55 probesets respectively to make the predictions. The probesets, genes that they encode for, and their weights (coefficients) are shown in Table 3. The risk score is calculated by multiplying the scaled log2-transformed expression level of each gene in a given sample by its corresponding weight and then adding up the weighted expression values for all genes in the signature. The following formula describes the score calculation for sample i:
A cut point was selected to dichotomize the risk score and predict two risk classes. The optimal cutoff was selected in order to maximize the accuracy of the prediction. A cutoff of 0 was selected for both the ER+ and ER− scores. Positive scores signify “responders” and a zero or negative score signifies “non-responders”.
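The score calculation and dichotomization described above amount to a weighted sum followed by a threshold, i.e. the score for sample i is the sum over signature probesets j of weight j times the scaled log2 expression of probeset j in sample i. The sketch below illustrates this reading; the function names and the probeset identifiers and weights in the usage note are hypothetical and are not taken from Table 3.

```python
def risk_score(scaled_log2_expr, weights):
    """Weighted sum over the signature probesets for one sample.

    scaled_log2_expr: dict probeset -> scaled log2 expression in the sample
    weights:          dict probeset -> coefficient of the predictor
    """
    return sum(weights[p] * scaled_log2_expr[p] for p in weights)

def classify(score, cutoff=0.0):
    """Positive scores are responders; zero or negative are non-responders."""
    return "responder" if score > cutoff else "non-responder"
```

For example, with illustrative weights `{"probe_a": 0.5, "probe_b": 0.25}` and scaled expression values `{"probe_a": 1.0, "probe_b": -2.0}`, the score is 0.5 × 1.0 + 0.25 × (−2.0) = 0, which falls in the non-responder class under a cutoff of 0.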
The plot shows that predicted responders to taxane-containing chemotherapy (
Based on the performance of the relapse-based or resistance predictor of Example 2 and the response-based predictor of Example 4, combined prediction using the two predictors was studied in the validation cohort (Table 1A). The relapse-based predictor was applied first to the cohort as described in
The prediction of breast cancer sensitivity to endocrine therapy such as tamoxifen and aromatase inhibitors has been described earlier by measurement of gene expression levels (U.S. Provisional Patent Application, 61/174706). We examined the combination of the sensitivity to endocrine therapy (SET) index with prediction of chemosensitivity using the combined predictor genes described in Example 6.
In this example, the endocrine sensitivity index (as described in U.S. 61/174706) was applied first to the validation cohort of patients shown in Table 1A. The High and Intermediate classes (8.9%) of endocrine sensitivity showed good relapse-free survival (
The relapse-based predictor (Example 2) and response-based predictor (Example 4), combined as described in Example 6, were applied to the patient samples classified with a low endocrine sensitivity index. Patients identified for chemosensitivity by the predictors of Example 2 and 4 together were then combined with patients with high and intermediate endocrine sensitivity index as responders.
Patients and samples: Patient samples used were those shown in Table 1A. All other laboratory analytic methods were the same as in Example 1.
Methods for building predictors of poor response to chemotherapy: The inventors used the response endpoint RCB-III, representing extensive residual disease after the completion of neoadjuvant chemotherapy, to identify genes that differentiated patients who failed to respond to chemotherapy from all others in the discovery cohort (Table 1A). Prior to analysis, probes that either had low specificity (those that include extensions _xfri_ in their name) or were housekeeping probes (those starting with AFFX) were identified and removed from the candidate probesets. This process removed 2522 probesets. Subsequently, a non-specific filter was applied to retain probesets that had a log2-transformed intensity of at least 5 in at least 75% of the arrays. A total of 16289 probesets (73% of all) were retained for further analysis.
The samples in the development cohort were subdivided into ER+ and ER− subsets and, within each ER group, into lymph node-negative (N0) and lymph node-positive (NP) subsets. Means and standard deviations (SDs) of the 16289 probesets were computed for each of the 4 subsets of cases. Within each ER cohort, the means and SDs for N0 and NP subsets were averaged to yield nodal-status adjusted statistics. These means and SDs were then used to scale the expression values of all probesets using the corresponding statistics for ER+ or ER− cases.
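The nodal-status adjustment can be sketched for a single probeset within one ER group as shown below. The text does not state the exact scaling formula, so subtracting the adjusted mean and dividing by the adjusted SD is an assumption, as are the function names and the use of the sample (n − 1) standard deviation.

```python
import statistics

def nodal_adjusted_stats(values_n0, values_np):
    """Average the N0 and NP means, and the N0 and NP SDs, for one probeset."""
    mean = (statistics.mean(values_n0) + statistics.mean(values_np)) / 2
    sd = (statistics.stdev(values_n0) + statistics.stdev(values_np)) / 2
    return mean, sd

def scale(value, mean, sd):
    """Assumed scaling: centre on the adjusted mean, divide by the adjusted SD."""
    return (value - mean) / sd
```

Averaging the two subset statistics, rather than pooling all cases, keeps the scaling from being dominated by whichever nodal subset happens to be larger.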
Each probeset was evaluated for differential expression in the two responder groups (RCB-III vs rest) using an unequal variance t-statistic based on the trimmed means and trimmed standard deviations in the two groups using a trim fraction of 0.025 (i.e. the lowest 2.5% and highest 2.5% values were eliminated and the statistics were calculated on the remaining 95% of the observations in each group). Degrees of freedom for the unequal variance t-statistic were estimated based on Satterthwaite's approximation (Armitage, Berry & Matthews, 2002). The significance of association of each probe set with response was assessed based on the unequal variance t-statistic. P-values for the significance of each probeset were calculated from the t-distribution with the corresponding degrees of freedom.
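The trimmed unequal-variance t-statistic and its Satterthwaite degrees of freedom can be sketched as follows. This is one plausible reading of the description: the rounding rule for the number of trimmed observations (floor) and the use of the trimmed group sizes in the standard error are assumptions, and the function names are hypothetical.

```python
import math
import statistics

def trim(values, fraction=0.025):
    """Drop the lowest and highest `fraction` of the sorted values."""
    v = sorted(values)
    k = int(math.floor(len(v) * fraction))  # assumed rounding rule
    return v[k:len(v) - k]

def trimmed_welch_t(group_a, group_b, fraction=0.025):
    """Return (t, df) for an unequal-variance t-test on trimmed groups."""
    a, b = trim(group_a, fraction), trim(group_b, fraction)
    va = statistics.variance(a) / len(a)  # squared standard error, group a
    vb = statistics.variance(b) / len(b)  # squared standard error, group b
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    # Satterthwaite approximation to the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

A p-value would then be read from the t-distribution with `df` degrees of freedom, for instance with `scipy.stats.t.sf(abs(t), df) * 2` for a two-sided test if SciPy is available.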
To account for sampling variability in the training dataset, the differential expression analysis for each probeset described in the previous paragraph was performed repeatedly using a bootstrap procedure in which cases were sampled with replacement to generate bootstrapped datasets of the same size as the original dataset. This process was repeated 499 times, thus generating 500 estimates for the p-values of each probeset. The association of each probeset with distant relapse risk was assessed within each bootstrapped dataset at a critical significance level of 0.00075 to account for multiple testing. Those probesets that were called significant in at least 30% of the bootstrap replicates were selected as candidate probesets. This process was applied separately to the ER-positive and ER-negative cases in the training dataset and resulted in 256 and 202 candidate probesets in the ER+ and ER− subsets.
In developing the RCB-based chemotherapy response predictor, the inventors used an approach that combines feature selection and model discovery using a multivariate penalized approach called Gradient Directed Regularization developed by Prof. J. Friedman at Stanford University, a description of which can be found on the World Wide Web at stat.stanford.edu/~jhf/ftp/pathlite.pdf. The informative genes are selected through penalization using the maximization of the area under the ROC curve (AUC) as the optimization criterion. Ma and Huang have previously used a similar approach for disease classification (Ma, 2006).
For predictor discovery and evaluation, the inventors followed a cross-validation protocol. First, the input dataset is randomly partitioned into a training set and a test set. A 5-fold cross-validation with a 4:1 split between training and test sets, stratified by response group, was used (Dudoit, 2002). The training set, consisting of ⅘ of the original data, is used to develop the predictor. The algorithm starts with the initial list of candidate genes determined through the bootstrap procedure and iteratively refines the predictor by selecting genes that contribute to maximizing the AUC of the candidate predictor. The maximum level of penalization is used to derive the most parsimonious predictors. Since different optimal reporter gene sets might result from the different internal cross-validation folds, the number of times each gene is selected is tracked to provide a measure of its importance or reliability. The trained predictor is then tested on the ⅕ hold-out part of the training dataset, and its performance is evaluated based on the AUC.
The entire process of randomly splitting the data into a training set and a test set was repeated 499 times to obtain the distributions and summary statistics of the performance metrics from the cross-validated replicates.
The final predictors for ER+ and ER− subsets used 73 probesets and 54 probesets respectively to make the predictions. The probesets, genes that they encode for, and their weights (coefficients) are shown in Table 4. The risk score is calculated by multiplying the scaled log2-transformed expression level of each gene in a given sample by its corresponding weight and then adding up the weighted expression values for all genes in the signature. The following formula describes the score calculation for sample i:
A cut point was selected to dichotomize the risk score and predict two risk classes. The optimal cutoff was selected in order to maximize the accuracy of the prediction. A cutoff of 0 was selected for both the ER+ and ER− scores. A positive score signifies “resistant” (poor responder), and a zero or negative score signifies “non-resistant”.
Survival outcomes of patients predicted as responders and non-responders were assessed by using the predictor of RCB-III described in Example 8 used as a combined algorithm with predictors of Examples 2 and 4 and the sensitivity to endocrine therapy (SET) index of Example 7. Survival is defined by distant relapse-free survival (DRFS) over a period of about 80 months. These patients have undergone surgery where it was considered appropriate and the ER-positive patients received hormonal therapy (tamoxifen) for 5 years after the surgery. ER-negative patients did not receive any treatment post-surgery. We combined the individual predictions into a testing algorithm (
The predictive test (algorithm) was applied to the discovery cohort of 310 samples (
Of note, 3-year DRFS in patients predicted to be treatment-sensitive at the time of diagnosis was similar to the 3-year DRFS of 93% (95% CI 85 to 100) in the 21% of patients in the validation cohort who achieved pathologic complete response (pCR) after completion of neoadjuvant chemotherapy. Also, 3-year DRFS for patients predicted to be treatment-insensitive was identical to the 3-year DRFS of 75% (95% CI 68 to 83) in those who had residual disease (RD) (
Treatment Sensitivity According to ER Status: There were 30% and 26% of patients with predicted sensitivity to treatment in the ER+/HER2− and ER−/HER2− subsets, respectively, and both had significantly favorable prognosis (
Performance of the Predictive Test in Other Relevant Subsets: The association between predicted treatment sensitivity and DRFS appears to be unrelated to the type of taxane therapy administered (
Comparison of the Predictive Test with Clinical-Pathologic Parameters: Genomic predictions were independently and significantly associated with risk of distant relapse or death (sensitive versus insensitive; HR 0.19; 95% CI 0.07 to 0.55; p=0.002), after adjusting for standard clinical-pathologic parameters (Table 5). Addition of the genomic prediction to a multivariate Cox model of the clinical-pathologic factors significantly increased the model's predictive utility (likelihood ratio of complete model versus clinical model 13.8, p<0.001). In this model, higher clinical tumor stage (tumor stage T3 or T4 versus T1 or T2; HR 2.13; 95% CI 1.13 to 4.02; p=0.02) and ER-negative status (ER status positive versus negative; HR 0.34; 95% CI 0.18 to 0.65; p=0.001) were associated with a statistically significantly greater risk of distant relapse or death.
The entire predictive test algorithm described in
The performance of the different genomic signatures for predicting 3-year DRFS was compared on the basis of the diagnostic likelihood ratio (DLR), which is a clinically useful statistic for summarizing the diagnostic accuracy of tests (Deeks and Altman, 2004). The DLR+ summarizes how many times more likely a positive test (predicted distant relapse or treatment insensitive) is among patients who experience distant metastasis within 3 years, compared to those who do not. The DLR− is the corresponding metric for a negative test (predicted absence of relapse or treatment sensitive), which is more relevant in the context of this test. A clinically useful test associated with the presence of relapse should have DLR+>1, whereas a test associated with the absence of relapse should have DLR−<1. Another useful property of the DLR is that it allows calculation of the post-test odds of relapse, simply by multiplying the pre-test odds of relapse by the DLR. The odds ratio (OR), defined as DLR+/DLR−, is also related to the coefficient of a logistic regression model of the binary genomic test for predicting the binary relapse outcome. The values summarized in Table 7 were calculated from the K-M estimates of DRFS for the two predicted groups from each genomic predictor, for the overall validation cohort and for the ER-positive and ER-negative subsets.
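The relationships among DLR+, DLR−, the odds ratio, and post-test odds can be illustrated with the sketch below. It is a simplification: it works from the counts of a 2×2 table, whereas the values in Table 7 were derived from K-M estimates, and the function names are hypothetical.

```python
def diagnostic_likelihood_ratios(tp, fp, fn, tn):
    """Return (DLR+, DLR-, odds ratio) from 2x2 counts.

    tp: relapsed, test positive      fn: relapsed, test negative
    fp: no relapse, test positive    tn: no relapse, test negative
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    dlr_pos = sensitivity / (1 - specificity)
    dlr_neg = (1 - sensitivity) / specificity
    return dlr_pos, dlr_neg, dlr_pos / dlr_neg

def post_test_probability(pre_test_probability, dlr):
    """Convert to odds, multiply by the DLR, convert back to a probability."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * dlr
    return post_odds / (1 + post_odds)
```

With illustrative counts of 40, 20, 10, and 80 for tp, fp, fn, and tn, sensitivity and specificity are both 0.8, giving a DLR+ of about 4, a DLR− of about 0.25, and an odds ratio of about 16.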
The predictive test of Example 9 (last entry in Table 7) is the only test with a significant DLR− (0.33, 0.27, 0.35 in the overall validation cohort and ER+, ER− subsets), indicating a 3-fold reduction in the odds of distant relapse in the presence of a negative test result (predicted treatment sensitive). The DLR+ of the genomic predictor was >1 in all 3 cohorts, but was not significant. The ER-stratified predictor of pCR/RCB-I showed consistent but not significant metrics. The first three genomic predictors showed paradoxical statistics (DLR+<1 and DLR−>1), i.e. a positive test result (predicted relapse) was associated with lower odds of relapse and vice versa.
§ Performance of the pCR predictor on the discovery cohort is optimistically biased because the predictor was trained on a subset of these samples. Performance of the pCR/RCB-I predictor and of the overall genomic prediction test on the discovery cohort represents resubstitution performance, since the predictors were trained on the same cohort.
¶ Genomic prediction of pathologic response was evaluated in the SET-Low subset in both cohorts.
# Performance of the predictive test is optimistically biased in the discovery cohort because a component of the test was trained on DRFS events to define resistance.
RNA is extracted in a manner described in Example 1. A gene chip such as Affymetrix U133A (Affymetrix, Inc., Santa Clara, CA) is used to analyze the expression levels of genes of Tables 2, 3 and 4. The resulting expression values are then normalized as described in Examples 2, 4, and 8, and weighted according to their respective coefficients to calculate the predictor score. Using cut-off values for the predictor score, a patient's tumor can be classified as either a High Score (good outcome from therapy) or a Low Score (poor outcome from therapy). The analyses could be completed within 5 to 7 days from receipt of a tumor sample to provide a report on results to the requesting physician. Physicians may then decide to include a certain therapy if the likely outcome is good or, alternatively, to consider additional aggressive therapy regimens for the patient in the likely event of a poor outcome.
The following references, to the extent that they provide exemplary procedural or other details supplementary to those set forth herein, are specifically incorporated herein by reference.
This application claims priority to U.S. Provisional application Ser. No. 61/324,166 filed Apr. 14, 2010, which is incorporated herein by reference in its entirety.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/US11/32462 | 4/14/2011 | WO | 00 | 12/14/2012 |
Number | Date | Country | |
---|---|---|---|
61324166 | Apr 2010 | US |