The present invention relates to gene sets, the expression levels of which are useful for classifying colorectal tumors and thereby predicting disease-free survival prognosis and response of patients to specific therapies that are either novel or currently available in the clinics for treating colorectal cancer patients.
Colorectal cancer (CRC) is a cancer arising from uncontrolled cell growth in the colon, rectum or appendix. Genetic analysis shows that colon and rectal tumors are genetically essentially the same type of cancer. Symptoms of colorectal cancer typically include rectal bleeding and anemia, which are sometimes accompanied by weight loss and changes in bowel habits. The cancer typically starts in the lining of the bowel and, if left untreated, can grow into the underlying muscle layers and then through the bowel wall. Cancers that are confined within the wall of the colon are often curable with surgery, whereas cancer that has spread widely around the body is usually not curable; management then focuses on extending the person's life with chemotherapy and on improving quality of life.
Colorectal cancer is the third most commonly diagnosed cancer in the world and is more common in developed countries. Most colorectal cancer arises from lifestyle factors and increasing age, with only a minority of cases associated with underlying genetic disorders. An estimated 75-95% or more of colon cancer occurs in people with no known inherited familial predisposition. Risk factors for the non-familial forms of CRC include advancing age, male gender, high-fat diet, alcohol, obesity, smoking and a lack of physical exercise.
Colorectal cancer is often found only after symptoms appear, yet most people with early colon or rectal cancer do not have symptoms of the disease; symptoms usually appear only with more advanced disease. This is why screening is effective at decreasing the chance of dying from colorectal cancer; it is recommended starting at the age of 50 and continuing until a person is 75 years old. Localized bowel cancer is usually diagnosed through sigmoidoscopy or colonoscopy.
Colorectal cancer is diagnosed by tumor biopsy, typically taken during sigmoidoscopy or colonoscopy. The extent of the disease is then usually determined by a CT scan of the chest, abdomen and pelvis. Other imaging tests, such as PET and MRI, may be used in certain cases. Staging is performed next using the TNM system, which is based on how far the primary tumor has spread, whether and where lymph nodes are involved, and whether there are metastases and how many.
Different types of treatment are available for patients with colorectal cancer. Four types of standard treatment are used: surgery, chemotherapy, radiation therapy and targeted therapy with the EGFR inhibitor cetuximab. While all of these can produce responses in patients with advanced disease, none apart from surgery in early-stage disease is curative. Notably, some patients show pre-existing resistance to certain of these therapies, in particular to cetuximab or FOLFIRI therapy. Thus only a fraction of CRC patients respond well to therapy. As such, colorectal cancer remains a major cause of cancer mortality, and personalized treatment decisions based on patient and tumour characteristics are still needed.
To solve the above-identified problem, Applicants classified colorectal cancer into six subtypes based on an integrated analysis of gene expression profiles and cetuximab response. These subtypes are predictive of disease-free survival prognosis and of response to selected therapies.
Thus in an embodiment, the present invention provides an in-vitro method for the prognosis of disease-free survival of a subject suffering from colorectal cancer or suspected of suffering therefrom and who has undergone a prior surgical resection of colorectal cancer, the method comprising
The present invention further provides an in-vitro method for predicting the likelihood that a subject suffering from colorectal cancer or suspected of suffering therefrom and who has undergone a prior surgical resection of colorectal cancer will respond to therapies inhibiting or targeting EGFR, such as cetuximab, and/or cMET, the method comprising
The present invention also provides an in-vitro method for predicting the likelihood that a subject suffering from colorectal cancer or suspected of suffering therefrom and who has undergone a prior surgical resection of colorectal cancer will respond to cytotoxic chemotherapies such as FOLFIRI, the method comprising
Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, suitable methods and materials are described below. All publications, patent applications, patents, and other references mentioned herein are incorporated by reference in their entirety. The publications and applications discussed herein are provided solely for their disclosure prior to the filing date of the present application. Nothing herein is to be construed as an admission that the present invention is not entitled to antedate such publication by virtue of prior invention. In addition, the materials, methods, and examples are illustrative only and are not intended to be limiting.
In the case of conflict, the present specification, including definitions, will control. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as is commonly understood by one of skill in the art to which the subject matter herein belongs. As used herein, the following definitions are supplied in order to facilitate the understanding of the present invention.
As herein used, “a” or “an” means “at least one” or “one or more.”
The term “comprise” is generally used in the sense of include, that is to say permitting the presence of one or more features or components.
The term “disease-free survival (DFS)” generally means the length of time after primary treatment for a cancer ends during which the patient survives without any signs or symptoms of that cancer. In the context of the present invention, the primary treatment is preferably surgical resection of colorectal cancer. In a clinical trial, measuring disease-free survival is one way to see how well a new treatment works.
“Adjuvant setting” as used herein refers to adjuvant treatment to surgical resection of colorectal cancer, whereas “metastatic setting” refers to treatment used in colorectal cancer recurrence (when colorectal cancer comes back) after surgical resection of colorectal cancer and after a period of time during which the colorectal cancer cannot be detected.
The terms “level of expression” or “expression level” in general are used interchangeably and generally refer to the amount of a polynucleotide or an amino acid product or protein in a biological sample. “Expression” generally refers to the process by which gene-encoded information is converted into the structures present and operating in the cell. Therefore, as used herein, “expression” of a gene may refer to transcription into a polynucleotide, translation into a protein, or even posttranslational modification of the protein. Fragments of the transcribed polynucleotide, the translated protein, or the post-translationally modified protein shall also be regarded as expressed whether they originate from a transcript generated by alternative splicing or a degraded transcript, or from a posttranslational processing of the protein, e.g., by proteolysis. “Expressed genes” include those that are transcribed into a polynucleotide as mRNA and then translated into a protein, and also those that are transcribed into RNA but not translated into a protein (for example, transfer and ribosomal RNAs).
As used herein, the terms “subject” or “patient” are well-recognized in the art and are used interchangeably herein to refer to a mammal, including dog, cat, rat, mouse, monkey, cow, horse, goat, sheep, pig, camel and, most preferably, a human. In some embodiments, the subject is a subject in need of treatment or a subject with a disease or disorder, such as colorectal cancer. However, in other embodiments, the subject can be a normal “healthy” subject or a subject who has already undergone a treatment, such as for example a prior surgical resection of colorectal cancer. The term does not denote a particular age or sex. Thus, adult and newborn subjects, whether male or female, are intended to be covered.
Applicants used consensus-based unsupervised clustering by non-negative matrix factorization (NMF) of CRC gene expression profiles from 1049 patient samples, overlaid with corresponding response data for an epidermal growth factor receptor (EGFR)-targeted drug (cetuximab; clinically available), to identify six clinically relevant subtypes of CRC. These subtypes exhibit differential patterns of gene expression (CRC assigner signature) and are associated with chemotherapy response and disease-free survival. Surprisingly, in terms of predicting response to therapy, these subtypes appear to transcend the microsatellite instability status (MSI/MSS) traditionally used to subtype CRC. Interestingly, these subtypes have phenotypes similar to various normal cell types within the colon crypt and exhibit differing degrees of stemness. In addition, the CRC assigner signatures classified human CRC cell lines and xenograft tumors into four of the five CRC subtypes; these models can now better serve as surrogates for analyzing drug responsiveness and other parameters of CRC tumor subtypes. Recognizing these subtypes, their apparent cellular phenotypes and their differential responses to therapy may guide the development of pathway- and mechanism-based therapeutic strategies targeted at specific subtypes of CRC tumors.
Seeking to extend and generalize these findings for CRC, and in particular as a step towards a more specific predictive clinical classification system for CRC, Applicants used consensus-based non-negative matrix factorization (NMF) to cluster two published gene expression datasets (after merging them using the distance weighted discrimination (DWD) method) derived from resected primary CRC (core dataset, n=445). This approach revealed five distinct molecular genetic subtypes of CRC, each exhibiting a high degree of consensus. Because the pooled expression profiles were to be used to identify gene signatures (and their marker gene components) for the putative subtypes, silhouette width was used to exclude samples situated on the periphery of the five CRC subtype clusters, yielding a ‘core’ set of 387 CRC samples. Silhouette width is a measure of cluster validity that identifies the samples most representative of each subtype, i.e. those that belong more to their own subtype than to any other subtype. To identify markers associated with the five subtypes, Applicants applied two algorithms, Significance Analysis of Microarrays (SAM; false discovery rate, FDR=0) followed by Prediction Analysis of Microarrays (PAM), and identified 786 subtype-specific signature genes.
More specifically, in order to detect multiple subtypes (some of which may represent relatively small fractions of the patient population), the clustering methods require moderately large numbers of samples, more than are contained in any one of the individual CRC datasets published to date. With that in mind, Applicants began their analysis by identifying suitable and comparable microarray datasets (see Table 1) and selecting only those datasets that were described in Dalerba et al., Nature Biotechnology 29, 1120-1127 (2011), as not having redundant samples.
Once the datasets were selected, the raw gene expression readouts were either normalized using robust multi-array averaging (RMA) or obtained as processed data from the original authors, and were then pooled using distance weighted discrimination (DWD) after normalizing each dataset to N(0,1). Consensus-based non-negative matrix factorization (NMF) was applied to the pooled data to cluster the samples, first into three and then into five CRC subtypes. Although consensus-based NMF clustering can detect robust clusters (i.e. clusters that tolerate a moderate degree of outlier contamination in the training set), the identification of genes (or markers) specific to each cluster is more sensitive to samples representing rare subtypes or samples of indeterminate origin. Therefore, once the clusters (subtypes) were identified using NMF, Applicants used silhouette width to screen out samples residing on the periphery of the NMF-identified clusters. From there, Applicants applied well-established methods (Significance Analysis of Microarrays, SAM, and Prediction Analysis of Microarrays, PAM) to extract biomarkers associated with the screened subtypes.
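The per-dataset normalization step can be illustrated with a short Python sketch. It assumes that normalizing each dataset to N(0,1) means standardizing every gene to mean 0 and unit standard deviation within each dataset before the samples are pooled, and that all datasets share the same genes in the same row order. The function names are hypothetical, and the DWD batch adjustment that follows this step is not shown.

```python
import numpy as np

def standardize_genes(expr):
    """Scale each gene (row) of one dataset to mean 0 and standard deviation 1.

    expr: genes x samples array from a single study (e.g. RMA log2 values).
    Genes with zero variance are left at 0 to avoid division by zero.
    """
    expr = np.asarray(expr, dtype=float)
    mean = expr.mean(axis=1, keepdims=True)
    sd = expr.std(axis=1, ddof=1, keepdims=True)
    sd[sd == 0] = 1.0
    return (expr - mean) / sd

def pool_standardized(datasets):
    """Standardize each dataset separately, then concatenate the samples (columns).

    Assumes every dataset lists the same genes in the same order.
    """
    return np.hstack([standardize_genes(d) for d in datasets])
```

Standardizing within each study removes study-specific location and scale per gene, but not more complex batch structure, which is why a dedicated pooling method such as DWD is applied afterwards.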
Pooling Datasets Using DWD.
When pooling microarray data, one of the main challenges is to compensate for systematic biases (e.g. batch effects) without distorting or collapsing biologically informative, subtype-discriminating structure in the gene expression space. Distance weighted discrimination (DWD) has previously been used to pool microarray data and has been shown to have superior pooling characteristics compared with alternative methods such as singular value decomposition (SVD) and Fisher linear discrimination, especially for high-throughput gene expression data, in which the number of samples is small relative to the number of gene expression readouts (i.e. a high-dimensional feature space). As a variation on the support vector machine (SVM) approach, DWD is suited to high-dimensional feature spaces, with the added benefit of reducing the influence that data artifacts and outliers can have on the batch effect adjustments.
Unsupervised Clustering Using Consensus-Based NMF.
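Solving for the DWD direction itself requires a second-order cone program and is not reproduced here. The Python sketch below shows only the subsequent adjustment step as it is commonly described for DWD-based pooling: once a direction w separating two batches is available, each batch is shifted along w so that its mean projection onto w is zero, leaving all directions orthogonal to w untouched. As a simplification that should not be mistaken for DWD, the hypothetical helper mean_difference_direction uses the difference of batch means as a stand-in for the DWD direction.

```python
import numpy as np

def shift_batches_along_direction(x, batch, w):
    """Remove each batch's offset along the unit direction w.

    x: samples x genes array; batch: batch label per sample;
    w: direction separating the batches (length = number of genes).
    In DWD-based pooling, w would come from the DWD optimization.
    """
    x = np.asarray(x, dtype=float)
    batch = np.asarray(batch)
    w = np.asarray(w, dtype=float)
    w = w / np.linalg.norm(w)
    out = x.copy()
    for b in np.unique(batch):
        rows = batch == b
        offset = x[rows].mean(axis=0) @ w  # mean position of this batch along w
        out[rows] -= offset * w            # shift only along w
    return out

def mean_difference_direction(x, batch, b1, b2):
    """Hypothetical stand-in for the DWD direction: difference of batch means."""
    x = np.asarray(x, dtype=float)
    batch = np.asarray(batch)
    return x[batch == b2].mean(axis=0) - x[batch == b1].mean(axis=0)
```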
By itself, non-negative matrix factorization is a dimensionality reduction method that attempts to capture the salient functional properties of a high-dimensional gene expression profile using a relatively small number of “metagenes” (defined as non-negative linear combinations of the expression of individual genes, i.e. a weighted average of gene expression, with each metagene having its own set of weighting coefficients). As with principal component analysis (PCA), the familiar gene expression table (samples×genes) is factored into two lower-dimensional matrices, except that for NMF the matrix factors are constrained to be non-negative. This ‘non-negativity’ constraint is believed to represent the nature of gene expression more realistically, since gene expression is either zero- or positive-valued. In contrast, PCA matrix factors can be either positive- or negative-valued.
Given an arbitrary gene expression table (profile), it is generally not possible to factor the table analytically into two non-negative matrix factors. Numerical algorithms have therefore been developed that first initialize the two matrices to random values and then update them iteratively using a search algorithm. There is no guarantee that this search will converge to a globally optimal factorization, so the algorithm is re-run from multiple random initial conditions to check whether it provides a consistent factorization. At the end of the factorization algorithm, one obtains two lower-dimensional matrices which, when multiplied together, yield an approximation to the original gene expression table. The metagenes correspond to functional properties represented in the original gene expression table and can be viewed as ‘anchors’ for clustering the samples into subtypes. Specifically, each sample is assigned to a subtype by finding the metagene most closely aligned with the sample's gene expression profile. Hence each sample is assigned to one and only one cluster.
As explained above, the robustness of the clustering can be gauged by repeating the factorization process several times using different random initial conditions for the factorization algorithm. If the factorization is insensitive to the initial conditions of the search algorithm, then pairs of samples that co-cluster in one run will tend to co-cluster in every run, irrespective of the initial condition.
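The factorization can be sketched with the multiplicative update rules of Lee and Seung for the squared-error objective. This is one standard NMF algorithm, assumed here for illustration; the description does not state which update rule or software was used. The input is assumed to be a non-negative genes x samples matrix V (the transpose of the samples×genes table; z-scored data would first need to be made non-negative, for example by shifting), and the function name is hypothetical.

```python
import numpy as np

def nmf(v, k, n_iter=500, seed=0, eps=1e-10):
    """Factor a non-negative genes x samples matrix V as W @ H.

    W (genes x k): each column is a metagene (weights over genes).
    H (k x samples): each column holds one sample's metagene coefficients.
    Uses Lee-Seung multiplicative updates, which keep W and H non-negative.
    """
    v = np.asarray(v, dtype=float)
    if (v < 0).any():
        raise ValueError("NMF requires a non-negative matrix")
    rng = np.random.default_rng(seed)
    n_genes, n_samples = v.shape
    w = rng.random((n_genes, k)) + eps  # random non-negative starting point
    h = rng.random((k, n_samples)) + eps
    for _ in range(n_iter):
        h *= (w.T @ v) / (w.T @ w @ h + eps)
        w *= (v @ h.T) / (w @ h @ h.T + eps)
    return w, h
```

Because each update multiplies by a ratio of non-negative quantities, the factors can never become negative, which is how the non-negativity constraint is enforced without an explicit projection.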
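The consensus procedure can be sketched in Python as follows: NMF is rerun from several random starting points, each sample is assigned to the metagene with its largest coefficient, and the consensus matrix records the fraction of runs in which each pair of samples falls in the same cluster. The number of runs, the iteration count and the function names are illustrative assumptions, not parameters taken from this description.

```python
import numpy as np

def nmf_assign(v, k, seed, n_iter=300, eps=1e-10):
    """One NMF run from a random start; return each sample's cluster index."""
    rng = np.random.default_rng(seed)
    w = rng.random((v.shape[0], k)) + eps
    h = rng.random((k, v.shape[1])) + eps
    for _ in range(n_iter):  # Lee-Seung multiplicative updates
        h *= (w.T @ v) / (w.T @ w @ h + eps)
        w *= (v @ h.T) / (w @ h @ h.T + eps)
    return h.argmax(axis=0)  # each sample goes to its dominant metagene

def consensus_matrix(v, k, n_runs=20):
    """Fraction of runs in which each pair of samples is assigned to the same cluster.

    v: non-negative genes x samples matrix.
    """
    v = np.asarray(v, dtype=float)
    n = v.shape[1]
    c = np.zeros((n, n))
    for seed in range(n_runs):
        labels = nmf_assign(v, k, seed)
        c += labels[:, None] == labels[None, :]
    return c / n_runs
```

Entries close to 0 or 1 indicate pairs that are consistently separated or co-clustered across starting points; entries near 0.5 indicate pairs whose assignment depends on the initialization.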
In the NMF consensus analysis of the core dataset, Applicants found good consensus for both k=3 and k=5 clusters, suggesting that there was evidence for five consensus clusters, and hence five functional properties, in the core dataset.
Removing Outliers Using Silhouette Width.
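The description does not state which summary statistic was used to judge consensus for each k. One common choice is the cophenetic correlation coefficient of Brunet et al.; another, shown in this Python sketch, is the dispersion coefficient of Kim and Park, which equals 1 for a perfectly crisp consensus matrix (all entries 0 or 1) and approaches 0 as entries approach 0.5. The function name is hypothetical.

```python
import numpy as np

def dispersion(c):
    """Dispersion coefficient of a consensus matrix: mean of 4 * (C_ij - 0.5)^2.

    1.0 means every pair of samples is always or never co-clustered;
    values near 0 mean co-clustering is close to a coin flip.
    """
    c = np.asarray(c, dtype=float)
    return float(np.mean(4.0 * (c - 0.5) ** 2))
```

In practice one would compute a consensus matrix for each candidate k (for example k=2 to 6) and compare the resulting coefficients, favoring values of k whose consensus matrices remain crisp.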
For the purpose of identifying subtype-specific markers, the analysis includes only those samples that statistically belong to the core of each cluster. Excluding samples with negative silhouette width has been shown to minimize the impact of sample outliers on the identification of subtype markers. Accordingly, 58 samples from the original 445-sample dataset were identified as having negative silhouette width and were therefore excluded from the marker identification phase of the analysis.
Identification of Subtype-Specific Biomarkers Using SAM and PAM.
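Silhouette-based filtering can be sketched in Python as below. For each sample, a is its mean distance to the other members of its own cluster and b is its smallest mean distance to the members of any other cluster; the silhouette width (b - a) / max(a, b) is negative when a sample lies, on average, closer to another cluster than to its own. Euclidean distance is an assumption, since the description does not name the distance measure, and the function names are hypothetical.

```python
import numpy as np

def silhouette_widths(x, labels):
    """Silhouette width for each sample (row of x); needs at least two clusters."""
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels)
    # Pairwise Euclidean distances between samples.
    d = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=2))
    s = np.zeros(len(x))
    for i in range(len(x)):
        own = labels == labels[i]
        own[i] = False
        if not own.any():  # singleton cluster: width set to 0 by convention
            continue
        a = d[i, own].mean()
        b = min(d[i, labels == other].mean()
                for other in np.unique(labels) if other != labels[i])
        denom = max(a, b)
        s[i] = 0.0 if denom == 0 else (b - a) / denom
    return s

def core_samples(x, labels):
    """Indices of samples kept for marker discovery (non-negative silhouette width)."""
    return np.flatnonzero(silhouette_widths(x, labels) >= 0)
```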
Applicants used a two-step process to identify subtype-specific biomarkers. The first step identifies the differentially expressed genes, and the second step finds subsets of these genes that are associated with specific subtypes. For the first step, Applicants used significance analysis of microarrays (SAM) to identify genes significantly differentially expressed across the five subtypes. This well-established method looks for large differences in gene expression relative to the spread of expression across all genes. Sample permutation is used to estimate the false discovery rate (FDR) associated with the sets of genes identified as differentially expressed. By adjusting a sensitivity threshold, ΔSAM, users can control the estimated FDR associated with the gene sets. For this analysis, Applicants selected ΔSAM=12.2, which yielded 786 differentially expressed genes and an FDR of zero. The second step was to match the differentially expressed genes to specific subtypes. For this step, Applicants used prediction analysis of microarrays (PAM), which is similar in nature to the centroid method recently applied by the TCGA consortium to glioblastoma data, except that PAM eliminates the contribution of genes whose differential expression, relative to the subtype-specific centroids, falls below a specific threshold, ΔPAM. A threshold of ΔPAM=2 was chosen after evaluating various ΔPAM values and the corresponding misclassification errors. Leave-out cross-validation (LOCV) analysis was then performed to identify the set of genes with the lowest prediction error. The full set of 786 SAM-selected genes gave the lowest prediction error, about 7%, after PAM and LOCV analysis. The resulting subtype-specific markers (CRCassigner) are listed in Table 2.
Based on the genes preferentially expressed in each subtype, Applicants named the five CRC subtypes transit-amplifying (TA), stem-like, inflammatory, goblet-like and enterocyte.
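The PAM step can be illustrated with a compact nearest shrunken centroid sketch in the style of Tibshirani et al.: each class centroid is expressed as a standardized difference from the overall centroid, shrunk toward zero by soft thresholding at Δ, and genes whose differences shrink to zero for every class drop out of the classifier. This is an illustrative reimplementation under stated assumptions (s0 set to the median gene-wise standard deviation, class priors taken from class frequencies), not the pamr package, and it omits the cross-validation used to choose ΔPAM. Function names are hypothetical.

```python
import numpy as np

def shrunken_centroids(x, y, delta):
    """Train a nearest shrunken centroid classifier.

    x: genes x samples; y: class label per sample; delta: shrinkage threshold.
    Returns classes, shrunken centroids (genes x classes), per-gene scale s_i + s0,
    class priors, and a mask of genes that survive shrinkage for some class.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    n, k = x.shape[1], len(classes)
    overall = x.mean(axis=1)
    means = np.column_stack([x[:, y == c].mean(axis=1) for c in classes])
    counts = np.array([(y == c).sum() for c in classes])
    within = sum(((x[:, y == c] - means[:, [j]]) ** 2).sum(axis=1)
                 for j, c in enumerate(classes))
    s = np.sqrt(within / (n - k))        # pooled within-class standard deviation
    scale = s + np.median(s)             # s_i + s0
    m = np.sqrt(1.0 / counts - 1.0 / n)  # per-class standard-error factor
    d = (means - overall[:, None]) / (m[None, :] * scale[:, None])
    d_shrunk = np.sign(d) * np.maximum(np.abs(d) - delta, 0.0)  # soft thresholding
    centroids = overall[:, None] + m[None, :] * scale[:, None] * d_shrunk
    keep = (d_shrunk != 0).any(axis=1)
    return classes, centroids, scale, counts / n, keep

def predict(x_new, classes, centroids, scale, priors):
    """Assign each new sample (column) to the class with the smallest discriminant score."""
    x_new = np.asarray(x_new, dtype=float)
    diff = (x_new[:, :, None] - centroids[:, None, :]) / scale[:, None, None]
    score = (diff ** 2).sum(axis=0) - 2.0 * np.log(priors)[None, :]
    return classes[score.argmin(axis=1)]
```

Genes that do not survive shrinkage have a shrunken centroid equal to the overall mean for every class, so they add the same amount to every class score and do not affect the assignment.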
According to an embodiment of the present invention, a preferred gene profile specific to the “Transit-amplifying (TA)” type of CRC is shown in Table 3, and a more preferred gene profile specific to the “Transit-amplifying (TA)” type of CRC is shown in Table 4. The scores are illustrative only and represent expression profiles (tendencies) of the listed genes. A positive score means high expression, a negative score means low expression and zero means no change in expression.
In a further embodiment of the present invention, a preferred gene profile specific to the “Stem-like” type of CRC is shown in Table 5, and a more preferred gene profile specific to the “Stem-like” type of CRC is shown in Table 6. The scores are illustrative only and represent expression profiles (tendencies) of the listed genes. A positive score means high expression, a negative score means low expression and zero means no change in expression.
In a further embodiment of the present invention, a preferred gene profile specific to the “Inflammatory” type of CRC is shown in Table 7, and a more preferred gene profile specific to the “Inflammatory” type of CRC is shown in Table 8. The scores are illustrative only and represent expression profiles (tendencies) of the listed genes. A positive score means high expression, a negative score means low expression and zero means no change in expression.
In a further embodiment of the present invention, a preferred gene profile specific to the “Goblet-like” type of CRC is shown in Table 9, and a more preferred gene profile specific to the “Goblet-like” type of CRC is shown in Table 10. The scores are illustrative only and represent expression profiles (tendencies) of the listed genes. A positive score means high expression, a negative score means low expression and zero means no change in expression.
In a further embodiment of the present invention, a preferred gene profile specific to the “Enterocyte” type of CRC is shown in Table 11, and a more preferred gene profile specific to the “Enterocyte” type of CRC is shown in Table 12. The scores are illustrative only and represent expression profiles (tendencies) of the listed genes. A positive score means high expression, a negative score means low expression and zero means no change in expression.
To determine whether particular CRC subtypes among the five identified are associated with survival, Applicants evaluated one of the core CRC datasets, GSE14333, which included disease-free survival (DFS; n=197) information. In this dataset, the median follow-up among patients without events was 45.1 months. Applicants first evaluated DFS for all samples irrespective of treatment (adjuvant radiation and/or chemotherapy) or Dukes' stage (combining Dukes' stages A and B and considering stage C separately), the latter of which is known to correlate with CRC-specific survival. Applicants found no significant association of subtypes with DFS (p=0.12; log-rank test).
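The log-rank comparison of DFS across subtypes can be sketched in Python as follows. At each distinct event time, the observed number of events in each subtype is compared with the number expected if the risk were the same in all groups, and the accumulated differences are combined with their covariance into a chi-square statistic with K - 1 degrees of freedom. The sketch returns only the statistic and degrees of freedom; a p-value could then be obtained from the chi-square upper tail (for instance with scipy.stats.chi2.sf). The function name is hypothetical.

```python
import numpy as np

def logrank_statistic(time, event, group):
    """K-group log-rank chi-square statistic and its degrees of freedom.

    time: follow-up time; event: 1 if recurrence observed, 0 if censored;
    group: subtype label per patient.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    group = np.asarray(group)
    groups = np.unique(group)
    k = len(groups)
    o_minus_e = np.zeros(k)
    var = np.zeros((k, k))
    for t in np.unique(time[event == 1]):  # distinct event times
        at_risk = time >= t
        n = at_risk.sum()
        died = (time == t) & (event == 1)
        d = died.sum()
        n_g = np.array([(at_risk & (group == g)).sum() for g in groups])
        d_g = np.array([(died & (group == g)).sum() for g in groups])
        o_minus_e += d_g - d * n_g / n  # observed minus expected events
        if n > 1:
            factor = d * (n - d) / (n * n * (n - 1))
            var += factor * (n * np.diag(n_g) - np.outer(n_g, n_g))
    # The K components sum to zero, so drop one group to get an invertible covariance.
    u, v = o_minus_e[:-1], var[:-1, :-1]
    stat = float(u @ np.linalg.solve(v, u))
    return stat, k - 1
```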
In an embodiment, the present invention provides an in-vitro method for the prognosis of disease-free survival of a subject suffering from colorectal cancer or suspected of suffering therefrom and who has undergone a prior surgical resection of colorectal cancer, the method comprising
A preferred method according to the invention comprises the combination of genes comprising at least two genes selected from Table 2, or at least five genes selected from Table 2, or at least 10 genes selected from Table 2, or at least 20 genes that are selected from Table 2, more preferred at least 30 genes that are selected from Table 2, more preferred at least 40 genes that are selected from Table 2, more preferred at least 50 genes that are selected from Table 2, more preferred at least 60 genes that are selected from Table 2, more preferred at least 70 genes that are selected from Table 2, more preferred at least 80 genes that are selected from Table 2, more preferred at least 90 genes that are selected from Table 2, more preferred at least 100 genes that are selected from Table 2, more preferred at least 120 genes that are selected from Table 2, more preferred at least 140 genes that are selected from Table 2, more preferred at least 160 genes that are selected from Table 2, more preferred at least 180 genes that are selected from Table 2, more preferred at least 200 genes that are selected from Table 2, more preferred at least 220 genes that are selected from Table 2, more preferred at least 240 genes that are selected from Table 2, more preferred at least 260 genes that are selected from Table 2, more preferred at least 280 genes that are selected from Table 2, more preferred at least 300 genes that are selected from Table 2, more preferred at least 320 genes that are selected from Table 2, more preferred at least 340 genes that are selected from Table 2, more preferred at least 360 genes that are selected from Table 2, more preferred at least 380 genes that are selected from Table 2, more preferred at least 400 genes that are selected from Table 2, more preferred at least 420 genes that are selected from Table 2, more preferred at least 460 genes that are selected from Table 2, more preferred at least 480 genes that are selected from Table 2, more preferred at least 500 genes that are selected from Table 2, more preferred at least 520 genes that are selected from Table 2, more preferred at least 540 genes that are selected from Table 2, more preferred at least 560 genes that are selected from Table 2, more preferred at least 580 genes that are selected from Table 2, more preferred at least 600 genes that are selected from Table 2, more preferred at least 620 genes that are selected from Table 2, more preferred at least 640 genes that are selected from Table 2, more preferred at least 660 genes that are selected from Table 2, more preferred at least 680 genes that are selected from Table 2, more preferred at least 700 genes that are selected from Table 2, more preferred at least 720 genes that are selected from Table 2, more preferred at least 740 genes that are selected from Table 2, more preferred at least 760 genes that are selected from Table 2.
In a further preferred embodiment, a method of the invention comprises the combination of genes selected from all 786 genes of Table 2.
More preferably the combination of genes comprises at least two, or at least five, or at least 10, or at least 20, or at least 30, or at least 40 genes selected from Table 2.
Preferably the combination of genes comprises genes listed in Tables 3, 5, 7, 9 and 11. More preferably the combination of genes comprises genes listed in Tables 4, 6, 8, 10 and 12.
More preferably the combination of genes comprises LY6G6D, KRT23, CEL, ACSL6, EREG, CFTR, TCN1, PCSK1, NCRNA00261, SPINK4, REG4, MUC2, TFF3, CLCA4, ZG16, CA1, MS4A12, CA4, CXCL13, RARRES3, GZMA, IDO1, CXCL9, SFRP2, COL10A1, CYP1B1, MGP, MSRB3, ZEB1, FLNA.
Also more preferably the combination of genes comprises SFRP2, ZEB1, RARRES3, CFTR, FLNA, MUC2, TFF3.
Applicants next sought to compare their classification with the standard method of CRC classification, namely microsatellite instability (MSI) status. Applicants assessed subtype prevalence and distribution in samples from a dataset with known MSI status (GSE13294; ref. 9) and observed that 94% of the inflammatory subtype samples were MSI, whereas 86% of the TA and 77% of the stem-like subtype samples were microsatellite stable (MSS).
Numerous cell types with specialized functions make up the colon. While colonic stem cells are thought to be the cell of origin for CRC, more differentiated cells may have a similar capacity. In light of these considerations, Applicants performed a series of analyses seeking to describe the cellular phenotypes of the observed CRC subtypes. First, Applicants used a published gene signature that discriminates between the normal colon crypt top (where terminally differentiated cells reside) and the normal crypt base (where undifferentiated or stem cells reside). Using the Nearest Template Prediction (NTP) algorithm, Applicants predicted that 98% of the stem-like subtype tumors were significantly associated with the crypt base signature (statistics include only those samples predicted with FDR<0.2). In contrast, more than 75% of the enterocyte subtype tumor samples were significantly associated with the crypt top by their concordant gene signatures. Intriguingly, 60% of the TA subtype tumor samples have a crypt top signature with low expression of the Wnt signaling targets LGR5 and ASCL2, whereas the remaining TA subtype tumors are significantly associated with the crypt base and exhibit high mRNA expression of the stem/progenitor markers LGR5 and ASCL2.
To associate CRC subtypes with the colon crypt top or base, Applicants used a previously published gene signature (Kosinski, C., et al., Proceedings of the National Academy of Sciences of the United States of America 104, 15418-15423 (2007)) of the colon crypt base.
After performing NTP-based prediction of the association of each sample with the colon crypt top or base, using a published gene signature that discriminates between the normal colon crypt top and the normal crypt base, Applicants observed statistically significant associations (for samples with FDR<0.2 only) as reported in the main text. Here, Applicants report the statistics for all samples irrespective of the FDR cut-off. Applicants observed that 55% (n=77) of the stem-like subtype samples are associated with the crypt base, whereas 33% (n=105) of TA, 43% (n=63) of goblet-like and 75% (n=64) of enterocyte subtype samples are associated with the crypt top. On the other hand, more than 80% (n=78) of the inflammatory subtype samples have no significant association with either the crypt base or the crypt top.
The colon-crypt base is composed predominantly of stem and progenitor cells, which are known to exhibit high Wnt activity. Applicants therefore examined Wnt signaling activity in the stem-like subtype by mapping a publicly available gene signature for active Wnt signaling onto the core CRC dataset. As with the colon-crypt top/base gene signature comparison, the majority of the stem-like subtype samples were predicted to have high Wnt activity, whereas the enterocyte and goblet-like subtype samples were not.
In order to validate the five subtypes in additional datasets, Applicants mapped the subtype-specific SAM and PAM genes onto each preprocessed dataset (RMA-normalized in the case of Affymetrix arrays, and processed data obtained directly from the authors in the case of other microarray platforms). Applicants then performed consensus-based NMF analysis to identify the number of classes, and generated a heatmap using the NMF classes and the SAM and PAM genes.
Applicants performed DWD-based merging of gene expression profile datasets for CRC cell lines from two different sources in order to increase the total number of CRC cell lines, after first removing 14 duplicate cell lines shared between the two datasets. Overall, Applicants obtained 51 unique CRC cell lines. The merged cell line dataset was then merged with the CRC core dataset, again using the DWD-based method. Next, Applicants performed NMF-based consensus clustering of the merged CRC cell line and core dataset, seeking to identify subtypes among the cell lines.
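The NTP assignments can be sketched in Python as below. The sketch assumes the template construction described by Hoshida (2010), in which each class template is a vector over the signature genes with 1 for that class's marker genes and 0 otherwise; each sample, with genes already centered across samples, is assigned to the template at the smallest cosine distance. Significance is estimated by shuffling the sample's values across genes, and the per-sample p-values are converted to Benjamini-Hochberg FDR values, which could then be filtered at FDR<0.2 as in the analysis above. Function names and the number of permutations are illustrative.

```python
import numpy as np

def cosine_distance(a, b):
    return 1.0 - (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def bh_fdr(p):
    """Benjamini-Hochberg adjusted p-values (FDR), returned in the input order."""
    p = np.asarray(p, dtype=float)
    order = np.argsort(p)
    ranked = p[order] * len(p) / np.arange(1, len(p) + 1)
    q = np.minimum.accumulate(ranked[::-1])[::-1]  # enforce monotonicity
    out = np.empty_like(q)
    out[order] = np.minimum(q, 1.0)
    return out

def ntp(expr, templates, n_perm=1000, seed=0):
    """Nearest Template Prediction sketch.

    expr: genes x samples, each gene centered across samples;
    templates: genes x classes, 1 where a gene marks that class, else 0.
    Returns predicted class index, nearest distance, permutation p-value and FDR.
    """
    rng = np.random.default_rng(seed)
    expr = np.asarray(expr, dtype=float)
    templates = np.asarray(templates, dtype=float)
    n_classes, n_samples = templates.shape[1], expr.shape[1]
    pred = np.zeros(n_samples, dtype=int)
    dist = np.zeros(n_samples)
    pval = np.zeros(n_samples)
    for j in range(n_samples):
        x = expr[:, j]
        dists = np.array([cosine_distance(x, templates[:, c]) for c in range(n_classes)])
        pred[j], dist[j] = dists.argmin(), dists.min()
        # Null distribution: nearest-template distance after shuffling genes.
        null = np.array([min(cosine_distance(rng.permutation(x), templates[:, c])
                             for c in range(n_classes)) for _ in range(n_perm)])
        pval[j] = (1 + (null <= dist[j]).sum()) / (n_perm + 1)
    return pred, dist, pval, bh_fdr(pval)
```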
Applicants examined the relationship between disease-free survival (DFS) and other histopathological information such as Dukes' stage, age, location of tumors (left or right of colon or rectum) and adjuvant treatment in the GSE14333 dataset; see Table 13.
Applicants censored those patients who were alive without tumor recurrence or dead at last contact. Since subtype is not significantly associated with DFS for all the data, Applicants first used a Cox model to do an adjusted analysis using the variables of Duke's stage or adjuvant treatment. As subtype was not significant in the adjusted analysis, Applicants examined the relationships between subtype and DFS on subsets based on these variables as shown in the main text.
In this dataset, the median follow up among patients without events (tumor recurrence) was 45.1 months. As already mentioned, Applicants first evaluated DFS for all the samples irrespective of treatment (adjuvant chemotherapy and/or radiotherapy—standard chemotherapy of either single agent 5-fluouracil; 5-FU/capecitabine or 5-FU and oxaliplatin) or Dukes' stage (for analysis, Applicants considered Dukes' stage A and B patients with lymph node negativity together whereas Dukes' stage C patients with lymph node positivity separately), the latter known to correlate with CRC survival. Applicants did not find a significant association between subtype and DFS (p=0.12;
The monoclonal anti-EGFR antibody cetuximab is a mainstay of treatment for metastatic CRC with wild-type KRAS; however, cetuximab has failed to show benefit in the adjuvant setting, irrespective of KRAS genotype. Applicants examined whether tumors of their subtypes respond differently to cetuximab. To this end, Applicants correlated their subtypes with cetuximab response using a CRC liver metastasis microarray dataset (Khambata-Ford) with matched therapy response data from patients (n=80). In this particular dataset, Applicants predicted three of their five CRC subtypes using NMF consensus clustering and CRCassigner genes (
In the course of further characterizing the two TA subtypes, Applicants observed, using SAM analysis, that CS-TA tumors have significantly higher expression than CR-TA tumors of epiregulin (EREG) and amphiregulin (AREG), epidermal growth factor receptor (EGFR) ligands known to be positive predictors of cetuximab response (TA signature; FDR=0.1 and delta=0.8,
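SAM ranks genes by a moderated two-group difference statistic. A minimal sketch of the SAM relative difference d_i is shown below, for example for CS-TA versus CR-TA samples; it assumes genes as rows and, as a simplification, sets the fudge factor s0 to the median gene-wise spread (SAM itself chooses s0 to minimize the coefficient of variation of d_i). It also omits the permutation step that SAM uses to estimate the FDR and apply the delta threshold, so it ranks genes but does not reproduce the FDR=0.1/delta=0.8 cut-off. The function name is illustrative.

```python
import math
from statistics import mean, median


def sam_scores(group_a, group_b, s0=None):
    """SAM-style relative difference d_i = (mean_a - mean_b) / (s_i + s0).

    group_a, group_b: lists of per-gene expression lists (same gene order).
    Positive scores mean higher expression in group_a.
    """
    n1, n2 = len(group_a[0]), len(group_b[0])
    a = (1 / n1 + 1 / n2) / (n1 + n2 - 2)  # pooled-variance scaling factor
    diffs, spreads = [], []
    for xa, xb in zip(group_a, group_b):
        ma, mb = mean(xa), mean(xb)
        ss = sum((x - ma) ** 2 for x in xa) + sum((x - mb) ** 2 for x in xb)
        diffs.append(ma - mb)
        spreads.append(math.sqrt(a * ss))  # gene-specific scatter s_i
    if s0 is None:
        s0 = median(spreads)  # simplification, see lead-in
    return [d / (s + s0) for d, s in zip(diffs, spreads)]
```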
In another embodiment, the present invention provides an in-vitro method for predicting the likelihood that a subject suffering from colorectal cancer or suspected of suffering therefrom and who has undergone a prior surgical resection of colorectal cancer will respond to therapies inhibiting or targeting EGFR, such as cetuximab, and/or cMET, the method comprising
Combining this cetuximab/cMET response analysis with the gene expression subtypes yields six integrated subtypes based on gene expression and drug response.
A preferred method according to the invention comprises a combination of at least five genes selected from Table 2, or at least 10 genes selected from Table 2, or at least 20 genes selected from Table 2. More preferably, the combination comprises at least 30, at least 40, at least 50, at least 60, at least 70, at least 80, at least 90, at least 100, at least 120, at least 140, at least 160, at least 180, at least 200, at least 220, at least 240, at least 260, at least 280, at least 300, at least 320, at least 340, at least 360, at least 380, at least 400, at least 420, at least 460, at least 480, at least 500, at least 520, at least 540, at least 560, at least 580, at least 600, at least 620, at least 640, at least 660, at least 680, at least 700, at least 720, at least 740 or at least 760 genes selected from Table 2, each larger number being more preferred.
In a further preferred embodiment, a method of the invention comprises the combination of genes selected from all 786 genes of Table 2.
More preferably the combination of genes comprises at least five, or at least 10, or at least 20, or at least 30, or at least 40 genes selected from Table 2.
Preferably the combination of genes comprises AREG, EREG, BHLHE41, FLNA, PLEKHB1 and genes listed in Tables 3, 5, 7, 9 and 11. More preferably the combination of genes comprises AREG, EREG, BHLHE41, FLNA, PLEKHB1 and genes listed in Tables 4, 6, 8, 10 and 12.
Next, Applicants examined whether the subtypes exhibit differential response to first-line colorectal chemotherapy (i.e., FOLFIRI) using a published FOLFIRI response signature. FOLFIRI is a current chemotherapy regimen for the treatment of colorectal cancer. It combines the following drugs: folinic acid (leucovorin), fluorouracil (5-FU) and irinotecan.
The regimen consists of:
This cycle is typically repeated every two weeks. The dosages shown above may vary from cycle to cycle.
Intriguingly, 100% of the stem-like and 77% of the inflammatory subtype samples were predicted to respond to FOLFIRI, as compared to less than 14% of the TA subtype tumors (statistics include only samples with FDR<0.2,
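The per-subtype percentages above amount to a filtered proportion. The sketch assumes hypothetical (subtype, predicted responder, FDR) tuples, such as a FOLFIRI response-signature predictor might output; samples at or above the FDR cut-off are dropped before the fraction of predicted responders is computed for each subtype. The function name is illustrative.

```python
from collections import defaultdict


def responder_fraction(predictions, fdr_cutoff=0.2):
    """Fraction of predicted FOLFIRI responders per subtype.

    predictions: iterable of (subtype, predicted_responder, fdr) tuples.
    Only samples with fdr < fdr_cutoff are counted.
    """
    kept = defaultdict(int)
    responders = defaultdict(int)
    for subtype, is_responder, fdr in predictions:
        if fdr < fdr_cutoff:
            kept[subtype] += 1
            responders[subtype] += int(is_responder)
    return {s: responders[s] / kept[s] for s in kept}
```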
In a further embodiment, the present invention provides an in-vitro method for predicting the likelihood that a subject suffering from colorectal cancer or suspected of suffering therefrom and who has undergone a prior surgical resection of colorectal cancer will respond to cytotoxic chemotherapies such as FOLFIRI, the method comprising
Preferably the combination of genes comprises genes listed in Tables 3, 5, 7, 9 and 11. More preferably the combination of genes comprises genes listed in Tables 4, 6, 8, 10 and 12.
A preferred method according to the invention comprises a combination of at least two genes selected from Table 2, or at least five genes selected from Table 2, or at least 10 genes selected from Table 2, or at least 20 genes selected from Table 2. More preferably, the combination comprises at least 30, at least 40, at least 50, at least 60, at least 70, at least 80, at least 90, at least 100, at least 120, at least 140, at least 160, at least 180, at least 200, at least 220, at least 240, at least 260, at least 280, at least 300, at least 320, at least 340, at least 360, at least 380, at least 400, at least 420, at least 460, at least 480, at least 500, at least 520, at least 540, at least 560, at least 580, at least 600, at least 620, at least 640, at least 660, at least 680, at least 700, at least 720, at least 740 or at least 760 genes selected from Table 2, each larger number being more preferred.
In a further preferred embodiment, a method of the invention comprises the combination of genes selected from all 786 genes of Table 2.
More preferably the combination of genes comprises at least two, or at least five, or at least 10, or at least 20, or at least 30, or at least 40 genes selected from Table 2.
More preferably the combination of genes comprises LY6G6D, KRT23, CEL, ACSL6, EREG, CFTR, TCN1, PCSK1, NCRNA00261, SPINK4, REG4, MUC2, TFF3, CLCA4, ZG16, CA1, MS4A12, CA4, CXCL13, RARRES3, GZMA, IDO1, CXCL9, SFRP2, COL10A1, CYP1B1, MGP, MSRB3, ZEB1, FLNA.
Also more preferably the combination of genes comprises SFRP2, ZEB1, RARRES3, CFTR, FLNA, MUC2, TFF3.
Methods according to the invention preferably further comprise determining a strategy for treatment of the patient. Treatment may include, for example, radiation therapy, chemotherapy, targeted therapy, or some combination thereof. Treatment decisions for individual colorectal cancer patients are currently based on stage, patient age and condition, the location and grade of the cancer, the number of lymph nodes involved, and the absence or presence of distant metastases.
Classifying colorectal cancers into subtypes at the time of diagnosis using the methods disclosed in the present invention provides an additional or alternative treatment decision-making factor, thereby providing additional information for adapting the treatment of a subject suffering from colorectal cancer (see
“Stem-like” type of colorectal cancer indicates good response to FOLFIRI treatment and poor response to cetuximab treatment, which means that patients suffering from or suspected to suffer from “Stem-like” type of colorectal cancer should rather be treated with adjuvant chemotherapy, preferably FOLFIRI, in addition to standard surgical resection of the colorectal cancer. Chemotherapy, preferably FOLFIRI, would also be beneficial in the metastatic setting.
“Inflammatory” type of colorectal cancer indicates good response to chemotherapy, preferably FOLFIRI treatment, which means that patients suffering from or suspected to suffer from “Inflammatory” type of colorectal cancer should rather be treated with adjuvant chemotherapy, preferably adjuvant FOLFIRI treatment.
“Transit-amplifying cetuximab-sensitive (CS-TA)” type of colorectal cancer indicates poor response to FOLFIRI treatment and good response to cetuximab treatment, which means that patients suffering from or suspected to suffer from “Transit-amplifying cetuximab-sensitive (CS-TA)” type of colorectal cancer should rather be treated with cetuximab in the metastatic setting. Thus, in the adjuvant setting (adjuvant therapy to surgical resection of colorectal cancer), the CS-TA type indicates that patients will not require any treatment in addition to surgical resection of the colorectal cancer, but rather watchful surveillance until the disease recurs, at which point it can be treated with cetuximab.
“Transit-amplifying cetuximab-resistant (CR-TA)” type of colorectal cancer indicates poor response to FOLFIRI treatment and almost no response to cetuximab treatment, but good response to cMET inhibition, which means that patients suffering from or suspected to suffer from “Transit-amplifying cetuximab-resistant (CR-TA)” type of colorectal cancer should rather be treated with a cMET inhibitor in the metastatic setting. Thus, in the adjuvant setting (adjuvant therapy to surgical resection of colorectal cancer), the CR-TA subtype indicates that patients will not require any treatment, but rather watchful surveillance until the disease recurs, at which point it can be treated with cMET inhibitors.
“Goblet-like” type of colorectal cancer indicates intermediate response to adjuvant FOLFIRI treatment and poor response to cetuximab treatment.
“Enterocyte” type of colorectal cancer indicates poor response to adjuvant FOLFIRI treatment.
Moreover, “Stem-like” type of colorectal cancer and “Inflammatory” type of colorectal cancer that have a poor or intermediate prognosis, as determined by gene expression profiling of the present invention, may benefit from adjuvant therapy (e.g., radiation therapy or chemotherapy). Chemotherapy for these patients may include FOLFIRI treatment; fluorouracil (5-FU); 5-FU plus leucovorin (folinic acid); 5-FU, leucovorin plus oxaliplatin; 5-FU, leucovorin plus irinotecan; capecitabine; and/or drugs for targeted therapy, such as an anti-VEGF antibody, for example Bevacizumab, and an anti-epidermal growth factor receptor (EGFR) antibody, for example Cetuximab; and/or combinations of said treatments. Radiation therapy may include external and/or internal radiation therapy. Radiation therapy may be combined with chemotherapy as adjuvant therapy.
In another embodiment of the present invention, patients suffering from or suspected to suffer from “Transit-amplifying” type of colorectal cancer may benefit from the following treatments, depending on the expression of the EREG and FLNA genes:
A biological sample comprising, or suspected to comprise, a cancer cell of a colorectal cancer is provided after the removal of all or part of a colorectal cancer from the subject during surgery or colonoscopy. For example, a sample may be obtained from a tissue sample or a biopsy sample comprising colorectal cancer cells that was previously removed by surgery. Preferably, the biological sample is obtained from a tissue biopsy.
A sample of a subject suffering from colorectal cancer or suspected of suffering therefrom can be obtained in numerous ways, as is known to a person skilled in the art. For example, the sample can be freshly prepared from cells or a tissue sample at the moment of harvesting, or it can be prepared from samples that are stored at −70° C. until processed for sample preparation. Alternatively, tissues or biopsies can be stored under conditions that preserve the quality of the protein or RNA. Examples of such preservative conditions are fixation using, e.g., formalin and paraffin embedding; RNase inhibitors such as RNAsin (Pharmingen) or RNasecure (Ambion); aqueous solutions such as RNAlater (Assuragen; U.S. Ser. No. 06/204,375), Hepes-Glutamic acid buffer mediated Organic solvent Protection Effect (HOPE; DE 10021390) and RCL2 (Alphelys; WO04083369); and non-aqueous solutions such as Universal Molecular Fixative (Sakura Finetek USA Inc.; U.S. Pat. No. 7,138,226). Alternatively, a sample from a colorectal cancer patient may be fixed in formalin, for example as formalin-fixed paraffin-embedded (FFPE) tissue.
Preferably, the expression level of the genes in methods of the present invention is measured by a method selected from the group consisting of:
(a) detecting RNA levels of said genes, and/or
(b) detecting a protein encoded by said genes, and/or
(c) detecting a biological activity of a protein encoded by said genes.
RNA levels can be detected by any technique known in the art, such as microarray hybridization, quantitative real-time polymerase chain reaction (PCR), multiplex PCR, Northern blot, in situ hybridization, sequencing-based methods, quantitative reverse transcription PCR, RNase protection assay or an immunoassay method.
Protein levels of the aforementioned genes can be detected by any technique known in the art, such as Western blot, immunoprecipitation, immunohistochemistry, ELISA, radioimmunoassay, proteomics methods or quantitative immunostaining methods.
According to another embodiment, expression of a gene of interest is considered elevated compared to a healthy control if the relative mRNA level of the gene of interest is more than 2-fold the level of a control gene mRNA. According to other embodiments, the relative mRNA level of the gene of interest is more than 3-fold, 5-fold, 10-fold, 15-fold, 20-fold, 25-fold or 30-fold the healthy control gene expression level.
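One common way to obtain such a relative mRNA level from qRT-PCR cycle thresholds (Ct) is the 2^-ΔΔCt (Livak) method, sketched below; the text above does not specify this method, and the sketch assumes roughly 100% amplification efficiency for both the target and the control (reference) gene assays. The target Ct is first normalized to the reference gene within each sample, then compared with the same difference in a healthy control sample. Function names are hypothetical.

```python
def fold_change_ddct(ct_target, ct_ref, ct_target_ctrl, ct_ref_ctrl):
    """Relative expression (fold change) by the 2^-ddCt method.

    ct_target / ct_ref: Ct of the gene of interest and of the reference
    (control) gene in the patient sample; *_ctrl: the same in a healthy
    control sample. Assumes ~100% amplification efficiency.
    """
    d_ct_sample = ct_target - ct_ref
    d_ct_control = ct_target_ctrl - ct_ref_ctrl
    return 2 ** -(d_ct_sample - d_ct_control)


def is_elevated(fold, threshold=2.0):
    """True if the fold change exceeds the chosen threshold (2, 3, 5, ...)."""
    return fold > threshold
```

A target that amplifies three cycles earlier (after reference-gene normalization) than in the healthy control gives an 8-fold change, which passes the 2-fold and 5-fold thresholds but not the 10-fold one.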
For example, the microarray method comprises the use of a microarray chip having one or more nucleic acid molecules that can hybridize under stringent conditions to a nucleic acid molecule encoding a gene mentioned above, or having one or more polypeptides (such as peptides or antibodies) that can bind to one or more of the proteins encoded by the genes mentioned above.
For example, the immunoassay method comprises binding an antibody to a protein expressed from a gene mentioned above in a patient sample and determining whether the protein level in the patient sample is elevated. The immunoassay method can be an enzyme-linked immunosorbent assay (ELISA), an electrochemiluminescence assay (ECLA) or a multiplex microsphere-based assay platform, e.g., the Luminex® platform.
In a further embodiment, the present invention provides a kit for classifying a sample of a subject suffering from colorectal cancer or suspected of suffering therefrom, the kit comprising a set of primers, probes or antibodies specific for genes selected from the group of genes listed in Table 2.
The kit can further comprise separate containers, dividers or compartments for the reagents or informational material. The informational material of the kits is not limited in its form. In many cases, the informational material, e.g., instructions, is provided in printed matter, e.g., a printed text, drawing, and/or photograph, e.g., a label or printed sheet. However, the informational material can also be provided in other formats, such as Braille, computer readable material, video recording, or audio recording. Of course, the informational material can also be provided in any combination of formats.
In another embodiment, the present invention provides immunohistochemistry and quantitative real-time PCR based assays for identifying CRC subtypes. Immunohistochemistry markers were developed for at least the following four CRC subtypes (see
A) TA subtype where CFTR has 3+ staining intensity and other markers have 1+ staining intensity.
B) Goblet-like subtype where MUC2 and TFF3 (2 markers) have 3+ staining intensity and other markers have 1+ staining intensity.
C) Enterocyte subtype where MUC2 has 3+ staining intensity and other markers have 1+ staining intensity.
D) Stem-like subtype where Zeb1 has 3+ staining intensity and other markers have 1+ staining intensity.
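The four staining rules above can be read as a lookup. A minimal sketch, assuming intensities are scored as integers 1 to 3 for 1+ to 3+ and that the panel consists of CFTR, MUC2, TFF3 and ZEB1 (the text does not state the exact panel used with these rules). A sample is assigned only when its set of 3+ markers matches a rule exactly and every other marker is 1+, which keeps MUC2 alone (enterocyte) distinct from MUC2 plus TFF3 (goblet-like). Names are hypothetical.

```python
# Set of markers at 3+ (all other panel markers at 1+) that defines each subtype.
IHC_RULES = {
    "TA": {"CFTR"},
    "Goblet-like": {"MUC2", "TFF3"},
    "Enterocyte": {"MUC2"},
    "Stem-like": {"ZEB1"},
}


def classify_ihc(scores):
    """Assign a subtype from IHC intensities (1, 2 or 3 for 1+, 2+, 3+).

    scores: dict marker -> intensity for every marker in the panel.
    Returns the matching subtype, or None if no rule matches exactly.
    """
    high = {m for m, s in scores.items() if s == 3}
    others_low = all(s == 1 for m, s in scores.items() if m not in high)
    if not others_low:
        return None  # intermediate (2+) staining: no clean assignment
    for subtype, markers in IHC_RULES.items():
        if high == markers:
            return subtype
    return None
```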
Table 15 (A) and (B) show the quantitative RT-PCR (qRT-PCR) results for subtype-specific markers in CRC patient tumors. The values represent copy number per ng of cDNA for each gene. Positive values represent values above the average for that marker, whereas negative values represent values below the average. Using the average as the cut-off, Applicants could identify 11 of 19 samples, representing all six subtypes including CR-TA and CS-TA.
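The sign convention of Table 15 corresponds to centering each marker on its cohort average. A sketch assuming a hypothetical nested dictionary of copies per ng cDNA keyed by marker and sample; the mapping from combinations of above-average markers to subtypes is not reproduced here, and the function names are illustrative.

```python
from statistics import mean


def above_average_calls(copies):
    """Center each marker on its cohort mean.

    copies: dict marker -> {sample_id: copies per ng cDNA}.
    Returns dict marker -> {sample_id: centered value}; a positive value
    means above the cohort average for that marker.
    """
    centered = {}
    for marker, by_sample in copies.items():
        avg = mean(by_sample.values())
        centered[marker] = {s: v - avg for s, v in by_sample.items()}
    return centered


def high_markers(centered, sample_id):
    """Markers that are above average in one sample, sorted by name."""
    return sorted(m for m, vals in centered.items() if vals[sample_id] > 0)
```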
A summary of the subtype-specific candidate biomarkers (CRCassigner-7) tested using qRT-PCR and immunohistochemistry (IHC) is shown in Table 16:
Applicants herein document the existence of six subtypes of CRC based on the combined analysis of gene expression and response to cetuximab. Notably, these subtypes are predictive of disease-free survival prognosis and of response to selected therapies (
Those skilled in the art will appreciate that the invention described herein is susceptible to variations and modifications other than those specifically described. It is to be understood that the invention includes all such variations and modifications without departing from the spirit or essential characteristics thereof. The invention also includes all of the steps, features, compositions and compounds referred to or indicated in this specification, individually or collectively, and any and all combinations of any two or more of said steps or features. The present disclosure is therefore to be considered as in all aspects illustrated and not restrictive, the scope of the invention being indicated by the appended Claims, and all changes which come within the meaning and range of equivalency are intended to be embraced therein.
The foregoing description will be more fully understood with reference to the following Examples. Such Examples are, however, exemplary of methods of practicing the present invention and are not intended to limit the scope of the invention.
Processing of Microarrays.
The processing of microarrays from CEL files was performed as already described. Published microarray data were obtained from the Gene Expression Omnibus (GEO), and the raw CEL files from Affymetrix GeneChip® arrays for all samples were processed, summarized by robust multi-array average (RMA) and normalized using R-based Bioconductor. The patient characteristics for the published microarray data were obtained from GEO using the Bioconductor package GEOquery.
Combining Different Microarray Datasets.
Microarray datasets from different published studies were screened separately for variable genes using a standard deviation (SD) cutoff greater than 0.8. The screened datasets were column (sample) normalized to N(0,1), row (gene) normalized and then merged using Java-based distance-weighted discrimination (DWD). Finally, the rows were median centered before further downstream analysis, as already described.
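The per-dataset preprocessing steps can be sketched with NumPy. This is an illustration under stated assumptions, not the pipeline that was used: genes are rows and samples are columns, "row (gene) normalized" is taken to mean gene-wise z-scoring, datasets are joined on the genes that pass the SD filter in every dataset, and plain concatenation stands in for DWD batch correction, which is not implemented here. Function names are hypothetical.

```python
import numpy as np


def filter_variable_genes(x, sd_cutoff=0.8):
    """Keep genes (rows) whose standard deviation exceeds sd_cutoff."""
    keep = x.std(axis=1, ddof=1) > sd_cutoff
    return x[keep], keep


def zscore(x, axis):
    """Scale to mean 0, SD 1 along the given axis (0 = per sample column)."""
    mu = x.mean(axis=axis, keepdims=True)
    sd = x.std(axis=axis, ddof=1, keepdims=True)
    return (x - mu) / sd


def prepare_and_merge(datasets):
    """datasets: list of (gene_names, genes x samples matrix).

    Returns (shared_genes, merged matrix). Concatenation is a stand-in for
    DWD, so no cross-dataset batch correction is applied.
    """
    processed = []
    for genes, x in datasets:
        x_f, keep = filter_variable_genes(np.asarray(x, dtype=float))
        genes_f = [g for g, k in zip(genes, keep) if k]
        x_n = zscore(zscore(x_f, axis=0), axis=1)  # columns, then rows
        processed.append(dict(zip(genes_f, x_n)))
    shared = sorted(set.intersection(*(set(p) for p in processed)))
    merged = np.hstack([np.vstack([p[g] for g in shared]) for p in processed])
    merged -= np.median(merged, axis=1, keepdims=True)  # row median centering
    return shared, merged
```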
NMF, SAM and PAM Analysis.
The stable subtypes were identified using consensus-clustering-based non-negative matrix factorization (NMF), followed by significance analysis of microarrays (SAM), using the classes defined by NMF, and prediction analysis of microarrays (PAM), using the significant genes defined by SAM, to identify a gene signature specific to each of the subtypes.
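Consensus NMF factorizes the expression matrix repeatedly from different random starts and records how often each pair of samples lands in the same cluster. The sketch below uses Lee and Seung's multiplicative updates and assigns each sample to its highest-weight metagene, in the spirit of Brunet et al. (2004). It assumes a non-negative input (median-centered data would first have to be shifted or otherwise made non-negative) and does not cover choosing the number of clusters, e.g. via the cophenetic correlation. Function names are illustrative.

```python
import numpy as np


def nmf(v, k, n_iter=500, seed=0, eps=1e-9):
    """Lee-Seung multiplicative-update NMF minimizing ||V - WH||_F."""
    rng = np.random.default_rng(seed)
    n, m = v.shape
    w = rng.random((n, k)) + eps
    h = rng.random((k, m)) + eps
    for _ in range(n_iter):
        h *= (w.T @ v) / (w.T @ w @ h + eps)
        w *= (v @ h.T) / (w @ h @ h.T + eps)
    return w, h


def consensus_matrix(v, k, n_runs=20):
    """Fraction of runs in which each pair of samples shares a cluster."""
    m = v.shape[1]
    c = np.zeros((m, m))
    for seed in range(n_runs):
        _, h = nmf(v, k, seed=seed)
        labels = h.argmax(axis=0)  # each sample -> dominant metagene
        c += labels[:, None] == labels[None, :]
    return c / n_runs
```

On a matrix with two clean sample blocks, pairs within a block should co-cluster in essentially every run, and pairs across blocks rarely or never.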
Survival Statistics.
Kaplan-Meier survival curves were plotted and log-rank tests were performed using the GenePattern-based Survival Curve and Survival Difference programs. Multivariate Cox regression analysis was performed using the R library survival.
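The GenePattern modules named above are not reproduced here; instead, a minimal pure-Python sketch of the two underlying computations follows: the Kaplan-Meier product-limit estimate and the two-group log-rank test, with the p-value taken from a chi-square distribution with one degree of freedom. Function names are illustrative, and events are coded 1 (recurrence) or 0 (censored).

```python
import math


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time."""
    data = sorted(zip(times, events))
    at_risk, surv, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:  # ties at time t
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed  # events and censorings both leave the risk set
    return curve


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value)."""
    pooled = [(t, e, 0) for t, e in zip(times_a, events_a)] + \
             [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in pooled if e})
    o_minus_e = var = 0.0
    for t in event_times:
        n = sum(1 for s, _, _ in pooled if s >= t)
        n_a = sum(1 for s, _, g in pooled if s >= t and g == 0)
        d = sum(1 for s, e, _ in pooled if s == t and e)
        d_a = sum(1 for s, e, g in pooled if s == t and e and g == 0)
        o_minus_e += d_a - d * n_a / n  # observed minus expected in group A
        if n > 1:
            var += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / var
    p = math.erfc(math.sqrt(chi2 / 2))  # chi-square survival function, 1 df
    return chi2, p
```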
Cell Lines.
Colon cancer cell lines were grown in DMEM (Gibco, USA) plus 10% FBS (Invitrogen, USA) without antibiotics/antimycotics. All cell lines were confirmed to be negative for mycoplasma by PCR (VenorGeM kit, Sigma, USA) prior to use and were tested monthly.
Drug Response in Cell Lines.
Cells (5×10³ per well) were seeded into 96-well plates on day 0 and treated on day 1 with cetuximab (Merck Serono, Geneva, Switzerland), a cMet inhibitor (PHA-665752, Santa Cruz Biotechnology, Inc., Santa Cruz, Calif.) or vehicle control (media alone or DMSO). Proliferation was measured on day 3 (72 h) using the CellTiter-Glo® assay kit (Promega, Dubendorf, Switzerland) according to the manufacturer's instructions.
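Proliferation readouts like these are often expressed as percent viability relative to the vehicle-treated wells. The sketch assumes replicate luminescence readings (RLU) per condition and an optional cell-free background value; the helper name is hypothetical and not part of the CellTiter-Glo kit.

```python
from statistics import mean


def percent_viability(treated_rlu, vehicle_rlu, blank_rlu=0.0):
    """Viability of treated wells relative to vehicle control, in percent.

    CellTiter-Glo luminescence reflects ATP content and hence the number of
    metabolically active cells. blank_rlu (cell-free background) is
    subtracted from both arms before the ratio is taken.
    """
    treated = mean(treated_rlu) - blank_rlu
    vehicle = mean(vehicle_rlu) - blank_rlu
    return 100.0 * treated / vehicle
```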
RNA Isolation and RT-PCR.
RNA was isolated using the miRNeasy kit (Qiagen, Hombrechtikon, Switzerland) according to the manufacturer's instructions. Sample preparation for real-time RT-PCR was performed using QIAgility (automated PCR setup, Qiagen), and the PCR assay was performed using the QuantiTect SYBR Green PCR kit (Qiagen), gene-specific primers (see Table 17) and a Rotor-Gene Q (Qiagen) real-time PCR machine.
TOP Flash Assay.
The TOP/FOP-flash assay was performed as instructed by the manufacturer (Upstate, USA). Briefly, colon cancer cell lines were plated into 24-well plates in biological triplicate at 10,000 cells/well in full growth medium (RPMI+10% FBS). The next day, the medium was replaced with medium containing 3 µL of PEI (stock, 1 mg/mL), TOP- or FOP-flash DNA (0.25 µg/well) and a plasmid constitutively expressing Renilla luciferase (to normalize for transfection efficiency). Two days later, the cells were assayed. Samples were prepared in biological triplicate (s.d., n=3) and the experiment was repeated twice.
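TOP/FOP-flash readouts are typically reported as the ratio of the TCF-responsive reporter (TOP) to the mutated-site control reporter (FOP) after Renilla normalization. A sketch assuming dual-luciferase readings for triplicate wells; dividing each TOP well by the mean FOP signal to obtain a standard deviation is one simple convention, chosen here as an assumption. Function names are hypothetical.

```python
from statistics import mean, stdev


def normalised(firefly, renilla):
    """Firefly/Renilla ratio per well, correcting for transfection efficiency."""
    return [f / r for f, r in zip(firefly, renilla)]


def top_fop(top_ff, top_rl, fop_ff, fop_rl):
    """Mean and s.d. of per-well TOP signal divided by the mean FOP signal."""
    top = normalised(top_ff, top_rl)
    fop_mean = mean(normalised(fop_ff, fop_rl))
    ratios = [t / fop_mean for t in top]
    return mean(ratios), stdev(ratios)
```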
Immunofluorescence.
Colon cancer cell lines were plated onto gelatin-coated (0.1% solution in PBS) coverslips in 24-well plates and allowed to attach overnight. The following day, the cells were fixed with 4% paraformaldehyde in PBS (20 minutes, room temperature) and washed twice. Immunofluorescent analysis was performed as described (36). Antibody dilutions were as follows: MUC2 (1:100, SC7314; Santa Cruz, USA) and KRT20 (1:50, M7019; DAKO, USA).
Orthotopic Implantation of CRC Cell Lines into Mice and RNA Isolation.
NMRI nu/nu mice (6-8-week-old females) were anesthetized with ketamine and xylazine and additionally received buprenorphine (0.05-2.5 mg/kg) before surgery. The animals were placed on a heated operating table. A midline incision was made and the descending colon was identified. A polyethylene catheter was inserted rectally and the descending colon was placed extra-abdominally. To obtain a transplant tumor, human CRC cell lines (2 million cells per site) were injected into the wall of the descending colon. Care was taken not to puncture the thin wall and inject the cells into the lumen of the colon. The presence of growing tumors at the injection site was detected by colonoscopy or laparotomy 21 days after the initial surgery. The animals were sacrificed, and tumors were explanted, immediately frozen in liquid nitrogen and stored at −80° C. The animals were cared for according to the institutional guidelines of Charité – Universitätsmedizin Berlin, Berlin, Germany, and the experiments were performed after approval from the Berlin animal research authority LAGeSo (registration number G0068/10).
Snap-frozen tissue samples were embedded in Tissue-Tek® OCT™ (Sakura, Alphen aan den Rijn, The Netherlands) and cut into 20-micrometer sections. Sections corresponding to 5-10 mg of tissue were collected in a microtube. RNA was prepared from these samples using the miRNeasy kit (Qiagen, Hilden, Germany) according to the manufacturer's protocol. RNA concentration and purity were determined by spectrophotometric measurement at 260 and 280 nm, and RNA integrity was evaluated using a total RNA nano microfluidic cartridge on the Bioanalyzer 2100 (Agilent, Böblingen, Germany).
Immunohistochemistry results for subtype-specific markers in CRC patient tumors from a tissue microarray (TMA; Pantomics) are shown in Table 18. If a marker showed +++ or ++ staining while the other markers showed ++ or +, respectively, the subtype was assigned accordingly. No inflammatory-subtype-specific assay was performed owing to the lack of specific antibodies. Of the 120 samples on the TMA, only the following were useful for analysis.
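The spectrophotometric readings above translate into concentration and purity through two standard conversions: an A260 of 1.0 corresponds to about 40 µg/mL of RNA (assuming a 1 cm path length), and the A260/A280 ratio indicates protein contamination, with values near 2.0 generally taken as clean RNA. The helper names below are hypothetical.

```python
# 1 A260 unit ~ 40 ug/mL single-stranded RNA at a 1 cm path length.
RNA_UG_PER_ML_PER_A260 = 40.0


def rna_concentration(a260, dilution=1.0):
    """RNA concentration in ug/mL (equivalently ng/uL) from A260."""
    return a260 * RNA_UG_PER_ML_PER_A260 * dilution


def purity_ratio(a260, a280):
    """A260/A280 ratio; values near 2.0 are generally taken as pure RNA."""
    return a260 / a280
```

For instance, a 1:10 dilution reading A260 = 0.5 corresponds to 200 ng/µL in the undiluted sample.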
Number | Date | Country | Kind |
---|---|---|---|
PCTIB2012056728 | Nov 2012 | IB | international |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/IB2013/060416 | 11/26/2013 | WO | 00 |