The present invention relates to methods, kits and systems for the prognosis of the disease outcome of breast cancer in untreated breast cancer patients. More specifically, the present invention relates to the prognosis of breast cancer based on measurements of the expression levels of marker genes in tumor samples of breast cancer patients. Marker genes are disclosed which allow for an accurate prognosis of breast cancer in patients having node negative, fast proliferating breast cancer.
Expression of estrogen receptor alpha and proliferative activity of the breast tumors have long been recognized to be of prognostic importance. Patients with ER positive tumors tend to have a better prognosis than ER negative patients (Osborne et al, 1980) and rapidly proliferating tumors tend to have a worse outcome (Gentili et al, 1981). Knowledge about the molecular mechanisms involved in the processes of estrogen dependent tumor growth and proliferative activity has led to the successful development of therapeutic approaches, i.e. anti-endocrine and cytotoxic chemotherapy.
Gene expression profiling has greatly extended the possibility to analyze the underlying biology of the heterogeneous nature of breast cancer. Perou and co-workers (2000) described breast cancer subtypes identified after two dimensional hierarchical clustering which they referred to as luminal, basal-like, normal-like and ERBB2-like breast cancer subtypes. These subtypes differed in their clinical outcome and response to chemotherapy (Sorlie et al, 2001; Sorlie et al, 2003; Rouzier et al, 2005). However, the list of genes used to define these subtypes changed often and proliferation genes were largely neglected in the early publications. Furthermore, a simple, reproducible and comprehensible classification algorithm was not deduced. In a more statistically driven case control design, also called supervised analysis, two different groups identified genes differentially expressed in tumors of node negative and untreated patients who developed a metastasis within five years or remained disease free for at least five years (van't Veer et al, 2002; Wang et al, 2005). The respective classification algorithms outperformed all other conventional prognostic factors and were confirmed in subsequent validation studies (van de Vijver et al, 2002; Foekens et al, 2006). However, since both lists overlapped by only 3 genes, considerable uncertainty about the validity and general applicability of these findings arose in the medical community (Brenton et al, 2005). Meanwhile, it is becoming increasingly clear that most prognostic and predictive classification algorithms rely predominantly on the measurement of estrogen receptor alpha regulated genes and genes involved in the cell cycle (Paik et al, 2004; Sotiriou et al, 2006; Oh et al, 2006).
Another potential prognostic factor, which was largely unattended in gene expression studies, is the immune system. Tumor infiltration by lymphocytes has long been suggested to influence clinical outcome (Aaltomaa et al, 1992).
In particular, medullary breast cancer (MBC), which is characterized by prominent lymphocytic infiltrates, is linked with relatively good outcome despite estrogen receptor negativity and poor histological grade (Ridolfi et al, 1977). Recently, MBC has been identified to be closely related to basal like tumors (Bertucci et al, 2006) which suggests that the poor outcome of the basal subtype could be improved by the influence of the immune system.
Several groups showed that luminal/ER positive breast cancer has a significantly better outcome than basal/ER negative breast cancer (Sorlie et al, 2001; 2003; Chang et al, 2005). The importance of ER status in breast cancer was further underlined by the finding that ER positive and ER negative tumors display remarkably different gene expression phenotypes not solely explained by differences in estrogen responsiveness (Gruvberger et al, 2001). A reciprocal relationship in the expression levels of genes responsible for prediction of ER status and S-Phase of the cell cycle as a marker for proliferation has been suggested (Gruvberger-Saal et al, 2004). These two factors, ER and proliferation, are major determinants of breast cancer biology. Indeed, several recent studies have focused on the association between proliferation and ER in predicting survival in breast cancer (Perreard et al, 2006; Dai et al, 2005).
A relationship between host defense mechanisms and prognosis of breast cancer has been discussed for decades (Di Paola et al, 1974). However, conflicting results led to dispute about the actual role of tumor-associated leucocytes (O'Sullivan and Lewis, 1994). Nonetheless, lymphocytic infiltrates were related to good outcome in breast cancer, especially in rapidly proliferating tumors (Aaltomaa et al, 1992). Menard and co-workers (1997) showed in a comprehensive study of 1919 breast carcinomas an independent prognostic influence of lymphoid infiltration only in younger patients. Since younger patients commonly have more rapidly proliferating tumors as compared to older patients, we focused on the subgroup of tumors with high expression of the proliferation metagene.
Immunophenotyping of tumor-infiltrating lymphocytes (TIL) reveals a preponderance of T cells as compared to B cells (Chin et al, 1992; Gaffey et al, 1993). T cells have an important role both in innate, non-specific immunity and in adaptive, antigen-specific immunity. Given the frequency of tumor-infiltrating T cells as compared with B cells, earlier studies analyzed preferentially the significance of tumor-infiltrating T cells in breast cancer. However, these studies yielded inconsistent results regarding the prognostic significance of T cells (Shimokawara et al, 1982; Lucin et al, 1994).
More recently, several reports focused on oligoclonal expansion of B cells both in MBC (Coronella et al, 2001, Hansen et al, 2001) and in ductal breast carcinoma (DBC) (Coronella et al, 2002; Nzula et al, 2003). Hansen and co-workers (2002) described an oligoclonal B cell response targeting actin which was exposed on the cell surface as an early apoptotic event in MBC. The observed IgG antibody response showed all criteria of an antigen-driven, high-affinity response. Furthermore, ganglioside D3 was identified as another target for an oligoclonal B cell response in MBC (Kotlan et al, 2005). These authors interpreted their findings as proof of principle concerning tumor-infiltrating B lymphocytes. Despite tempting implications regarding the prognostic impact of these findings, none of these studies actually analyzed the significance of the described B cell response for survival.
US 2004/0229297-A1, filed 27 Jan. 2004, discloses a method for the prognosis of breast cancer in a patient, said method comprising detecting in human tumor tissues the infiltration of certain immune cells. High infiltration of the tumor with immune cells was associated with poor cancer prognosis. The method, however, does not use information on the nodal status and does not rely on information on the rate of proliferation of the tumor.
In regard to the continuing need for materials and methods useful in making clinical decisions on adjuvant therapy, the present invention fulfills the need for advanced methods for the prognosis of breast cancer on the basis of readily accessible clinical and experimental data.
The present invention is based on the surprising finding that the outcome of breast cancer in breast cancer patients not receiving chemotherapy can be accurately predicted from the expression levels of a small number of marker genes in node-negative patients having fast proliferating tumors. It has been found that the expression of said marker genes is most informative in this specific group of patients. As the proliferation status of a tumor can also be assessed from gene expression experiments, the present method allows all necessary data to be collected from a single gene chip experiment. Accordingly, the present invention relates to prognostic methods for the determination of the outcome of breast cancer in non-treated breast cancer patients, using information on the nodal status of the patient, on the expression of marker genes being indicative of the proliferation status of the tumor, and information on the expression level of a second marker gene, predictive for the outcome of the disease in said patient. The second marker genes are preferably specifically expressed in immune cells, such as T-cells, B-cells or natural killer cells.
The present invention relates to a method for the prognosis of breast cancer in a breast cancer patient, said method comprising
“Prognosis”, within the meaning of the invention, shall be understood to be the prediction of the outcome of a disease under conditions where no systemic chemotherapy is applied in the adjuvant setting.
The present invention further relates to methods for the prognosis of breast cancer in a breast cancer patient in which said prognosis is based on the information that said nodal status is negative, on the information that said tumor is a fast proliferating tumor, and on information on said expression level of said second marker gene.
For a prognostic method to “be based” on multiple pieces of information (as is the case in the present invention), all individual pieces of information must be taken into consideration for arriving at the prognosis. This means that all individual pieces of information can influence the outcome of the prognosis. It is well understood that a piece of information, such as e.g. the nodal status of a patient, can influence the outcome of the prognosis in that the prognostic method is only applied when said nodal status is e.g. negative. Likewise, it is understood that a method can “be based” on information relating to the proliferation rate of the tumor, e.g. if fast proliferation is a conditional criterion applied in the course of the prognostic method.
In preferred methods of the invention, said prognosis is entirely based on the information that said nodal status is negative and that said tumor is a fast proliferating tumor and on information on the expression level of said second marker gene in said tumor sample.
In preferred methods of the invention, said prognosis is an estimation of the likelihood of metastasis free survival of said patient over a predetermined period of time, e.g. over a period of 5 years.
In further preferred methods of the invention, said prognosis is an estimation of the likelihood of death of disease of said patient over a predetermined period of time, e.g. over a period of 5 years.
“Death of disease”, within the meaning of the invention, shall be understood to be the death of a breast cancer patient after recurrence of the disease.
“Recurrence”, within the meaning of the invention, shall be understood to be the recurrence of breast cancer in form of metastatic spread of tumor cells, local recurrence, contralateral recurrence or recurrence of breast cancer at any site of the body of the patient.
In specific embodiments of the invention, the breast cancer patient is not treated with cancer chemotherapy in the adjuvant setting.
In preferred methods of the invention, the expression of said first marker gene is indicative of fast proliferation of the tumor.
In preferred methods of the invention, said first marker gene is selected from Table 1.
In specific embodiments of the invention, a single, or 2, 5, 10, 20, 50 or 100 first marker genes are used.
In a preferred embodiment of the invention, said first marker gene is TOP2A. In another specific embodiment of the invention said first marker gene is a gene co-regulated with TOP2A. Co-regulation of two genes, according to the invention, is preferably exemplified by a correlation coefficient between expression levels of said two genes in multiple tissue samples of greater than 0.5, 0.7, 0.9, 0.95, 0.99, or, most preferably 1. The statistical accuracy of the determination of said correlation coefficient is preferably +/−0.1 (absolute standard deviation).
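The co-regulation criterion above can be illustrated with a minimal sketch that computes a correlation coefficient between the expression levels of two genes across several tissue samples and compares it with one of the listed cut-offs. The use of the Pearson coefficient, the function names and the example values are assumptions made for illustration only; the passage above does not state which correlation coefficient is meant.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant expression")
    return sxy / math.sqrt(sxx * syy)

def is_coregulated(x, y, cutoff=0.7):
    """True if the correlation exceeds the chosen cut-off (assumed: Pearson)."""
    return pearson(x, y) > cutoff

# Hypothetical expression levels of TOP2A and a candidate gene in five samples.
top2a = [1.0, 2.0, 3.0, 4.0, 5.0]
candidate = [2.1, 3.9, 6.2, 8.0, 9.8]
coregulated = is_coregulated(top2a, candidate, cutoff=0.9)
```

The cut-off is left as a parameter so that any of the listed values (0.5, 0.7, 0.9, 0.95, 0.99) can be applied.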
In a preferred embodiment of the invention, a proliferation metagene expression value is constructed using 2, 3, 4, 5, 10, 20, 50, or all of the genes listed in Table 1.
In a preferred embodiment of the invention, a proliferation metagene expression value is constructed using 2, 3, 4, 5 or 6 genes from the list of TOP2A, UBE2C, STK6, CCNE2, MKI67, or CCNB1.
“Proliferation metagene expression value”, within the meaning of the invention, shall be understood to be a calculated gene expression value representing the proliferative activity of a tumor. In a preferred embodiment of the invention, the proliferation metagene expression value is calculated from multiple marker genes selected from Table 1.
A metagene expression value, in this context, is to be understood as being the median of the normalized expression of multiple marker genes. Normalization of the expression of multiple marker genes is preferably achieved by dividing the expression level of the individual marker genes to be normalized by the respective individual median expression of these marker genes (per gene normalization), wherein said median expression is preferably calculated from multiple measurements of the respective gene in a sufficiently large cohort of test individuals. The test cohort preferably comprises at least 3, 10, 100, or 200 individuals.
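The definition above (per gene normalization by the cohort median, followed by the median over the normalized marker genes) can be sketched as follows. The data layout, the function names and the numeric values are hypothetical and serve only to make the arithmetic concrete.

```python
from statistics import median

def per_gene_medians(cohort):
    """cohort: dict mapping gene name -> expression levels across the test cohort."""
    medians = {gene: median(values) for gene, values in cohort.items()}
    for gene, value in medians.items():
        if value == 0:
            raise ValueError(f"median expression of {gene} is zero; cannot normalize")
    return medians

def metagene_value(sample, gene_medians):
    """sample: dict mapping gene name -> expression level in one tumor sample.

    Each gene is divided by its cohort median (per gene normalization);
    the metagene value is the median of these normalized levels.
    """
    normalized = [sample[gene] / gene_medians[gene] for gene in gene_medians]
    return median(normalized)

# Hypothetical cohort of three individuals and one tumor sample.
cohort = {
    "TOP2A": [100.0, 200.0, 400.0],
    "MKI67": [50.0, 100.0, 150.0],
    "CCNB1": [10.0, 20.0, 40.0],
}
sample = {"TOP2A": 400.0, "MKI67": 150.0, "CCNB1": 20.0}
value = metagene_value(sample, per_gene_medians(cohort))
```

In this example the normalized levels are 2.0, 1.5 and 1.0, so the metagene expression value is their median.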
Preferably, the calculation of the proliferation metagene expression value is performed by:
The present invention further relates to a prognostic method as defined above, wherein said second marker gene is an immune cell gene or an immune globulin gene. An “immune cell gene” shall be understood to be a gene which is specifically expressed in immune cells, most preferably in T-cells, B-cells or natural killer cells. A gene shall be understood to be specifically expressed in a certain cell type, within the meaning of the invention, if the expression level of said gene in said cell type is at least 2-fold, 5-fold, 10-fold, 100-fold, 1000-fold, or 10000-fold higher than in a reference cell type, or in a mixture of reference cell types. Preferred reference cell types are muscle cells, smooth muscle cells, or non-cancerous breast tissue cells.
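The fold-change criterion for specific expression can be written as a simple comparison. In the sketch below, the reference level for a mixture of reference cell types is taken as the arithmetic mean of the individual reference levels; this averaging, the function name and the example values are assumptions, since the passage above does not define how a mixture is to be summarized.

```python
def is_specifically_expressed(level_in_cell_type, reference_levels, min_fold=2.0):
    """True if the level is at least `min_fold` times the reference level.

    reference_levels: expression levels in one or more reference cell types;
    several values are averaged (an assumption made for this sketch).
    """
    if not reference_levels:
        raise ValueError("at least one reference level is required")
    reference = sum(reference_levels) / len(reference_levels)
    if reference <= 0:
        raise ValueError("reference level must be positive")
    return level_in_cell_type >= min_fold * reference

# Hypothetical levels: a gene in B-cells versus two reference cell types.
specific = is_specifically_expressed(1000.0, [40.0, 60.0], min_fold=10.0)
```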
Alternatively, an immune cell gene shall be understood as being a gene selected from Table 2. In preferred methods of the invention said second marker gene is selected from Table 2.
Because of the great variability in the primary sequence of immune genes it is conceived that the concept of using metagenes is particularly useful when determining the immune gene status in methods of the invention. Thus, in a preferred embodiment of the invention, the claimed methods use the information on the expression of a single proliferation marker gene (preferably selected from Table 1), but apply information on the expression of multiple immune genes (preferably selected from Table 2), e.g., an immune system metagene expression value.
In further preferred embodiments of the invention, the expression levels of multiple first and second marker genes are determined in steps (b) and (d), and a comparison step between the multiple first and the multiple second marker genes is performed by a “majority voting algorithm”.
In a majority voting algorithm, according to the invention, a suitable threshold level is first determined for each individual first and second marker gene used in the method. The suitable threshold level can be determined from measurements of the marker gene expression in multiple individuals from a test cohort. Preferably, the median expression of the first said marker gene in said multiple expression measurements is taken as the suitable threshold value for the first said marker gene. Preferably, the third quartile expression of the second said marker gene in said multiple expression measurements is taken as the suitable threshold value for the second said marker gene.
In a majority voting algorithm, the comparison of multiple marker genes with a threshold level is performed as follows:
“A sufficiently large number”, in this context, means preferably 30%, 50%, 80%, 90%, or 95% of the marker genes used.
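A majority vote over several second marker genes might be sketched as follows. The per gene thresholds are taken as the third quartile of the cohort measurements, as described above. The voting rule itself (a gene votes "high" when its level exceeds its threshold, and the call is positive when the fraction of such votes reaches the required fraction) is an assumption made for this sketch, as are the function names, the quartile interpolation method and the example values.

```python
from statistics import quantiles

def third_quartile(values):
    """Third quartile of the cohort measurements (inclusive interpolation assumed)."""
    return quantiles(values, n=4, method="inclusive")[2]

def majority_vote(sample, thresholds, required_fraction=0.5):
    """True if a sufficiently large fraction of genes exceeds its threshold.

    sample: dict gene -> expression level in the tumor sample
    thresholds: dict gene -> threshold level for that gene
    """
    if not thresholds:
        raise ValueError("at least one marker gene is required")
    votes = sum(1 for gene, limit in thresholds.items() if sample[gene] > limit)
    return votes / len(thresholds) >= required_fraction

# Hypothetical cohort measurements for three immune genes.
cohort = {
    "IGHG3": [1.0, 2.0, 3.0, 4.0, 5.0],
    "IGKC": [10.0, 20.0, 30.0, 40.0, 50.0],
    "IGHM": [2.0, 4.0, 6.0, 8.0, 10.0],
}
thresholds = {gene: third_quartile(values) for gene, values in cohort.items()}
sample = {"IGHG3": 5.0, "IGKC": 45.0, "IGHM": 7.0}
immune_high = majority_vote(sample, thresholds, required_fraction=0.5)
```

Here two of the three genes exceed their thresholds, so the call depends on which of the listed fractions (30%, 50%, 80%, 90% or 95%) is required.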
Because of the great variability in the primary sequence of immune genes it is conceived that the concept “majority voting” is particularly useful when determining the immune gene status in methods of the invention. Thus, in a preferred embodiment of the invention, the claimed methods use the information on the expression of a single proliferation marker gene (preferably selected from Table 1), but information on the expression of multiple immune genes (preferably selected from Table 2) is compared to a threshold level using a majority voting algorithm.
In specific embodiments of the invention, a single, or 2, 5, 10, 20, 50 or 100 second marker genes are used.
In preferred methods of the invention, said second marker gene is IGHG or a gene co-regulated with IGHG.
In preferred methods of the invention, said second marker gene is IGHG3 or a gene co-regulated with IGHG3.
In a preferred embodiment of the invention, an immune system metagene expression value is constructed using 2, 3, 4, 5, 10, 20, 50, or all of the genes listed in Table 2.
In a preferred embodiment of the invention, an immune system metagene expression value is constructed using 2, 3, or 4 genes from the list of IGHG, IGHG3, IGKC, IGLJ3, IGHN4.
Preferably, the calculation of an immune system metagene is done by
In preferred methods of the invention, the determination of expression levels is on a gene chip, e.g. on an Affymetrix™ gene chip.
In another preferred method of the invention, the determination of expression levels is done by kinetic real time PCR.
The present invention further relates to a system for performing methods of the current invention, said system comprising
The person skilled in the art readily appreciates that a favorable prognosis can be given if said expression level of said first marker gene with said predetermined first threshold value indicates a slow proliferating tumor. According to the invention, this is independent of the expression level determined for the second marker gene. Methods of the invention as described above can be modified accordingly.
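The conditional logic described above can be summarized in a short sketch: the method applies to node-negative patients, a slowly proliferating tumor yields a favorable prognosis irrespective of the second marker gene, and only in fast proliferating tumors does the second marker gene decide. The direction of the second comparison (high immune gene expression taken as favorable) is an assumption based on the results reported in this document, and the function name, return labels and threshold values are hypothetical.

```python
def prognosis(node_negative, proliferation_value, immune_value,
              proliferation_threshold, immune_threshold):
    """Return a prognosis label from nodal status and two expression values."""
    if not node_negative:
        # The sketch covers node-negative patients only.
        return "not applicable"
    if proliferation_value <= proliferation_threshold:
        # Slow proliferation: favorable, independent of the second marker gene.
        return "favorable"
    # Fast proliferation: the second (immune) marker gene decides.
    # Assumption: high immune expression indicates a favorable outcome.
    return "favorable" if immune_value > immune_threshold else "unfavorable"

# Hypothetical thresholds for illustration only.
result = prognosis(True, proliferation_value=1.4, immune_value=1.0,
                   proliferation_threshold=0.99, immune_threshold=1.5)
```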
In preferred systems of the invention, said prognosis is an estimation of the likelihood of metastasis free survival over a predetermined period of time.
In preferred methods of the invention, the expression of said first marker gene is indicative of fast proliferation of the tumor.
In preferred systems of the invention, said first marker gene is selected from Table 1.
In preferred systems of the invention, said first marker gene is TOP2A. In other preferred systems of the invention, said first marker gene is a gene co-regulated with TOP2A.
In preferred systems of the invention, said second marker gene is an immune cell gene, or is an immune globulin gene. Preferred second marker genes are expressed specifically in T-cells or in B-cells or in natural killer cells.
In preferred systems of the invention, said second marker gene is selected from Table 2. In particularly preferred systems of the invention, said second marker gene is IGHG3 or a gene co-regulated with IGHG3.
In preferred systems of the invention, the determination of expression levels is on a gene chip.
We analyzed 200 node-negative breast cancers not treated with systemic therapy using PCA, a method also described by Alter and co-workers (2000) as singular value decomposition. This method allows for extracting information from high-dimensional datasets. It is well accepted that the top few principal components identify broad characteristics of the data (Roden et al, 2006). To ensure an optimal visualization of the tumors depending on their most important principal components (PC), we used PC 1-3. Samples are separated on PC1 predominantly according to the expression of the ER metagene. This again underlines the pivotal influence of ER for the molecular profile of breast cancer. The proliferation metagene forms another axis. All ER negative breast cancer samples are characterized by high proliferation. However, samples scored as ER positive by immunohistochemistry showed differences both in the extent of expression of ER co-regulated genes and in the extent of proliferation. Interestingly, tumors with intermediate ER expression showed the biggest variation in proliferative activity. High expression of proliferation associated genes in this subtype was linked with a similarly bad prognosis as for ER negative tumors, indicating that proliferation is the strongest outcome predictor in untreated node negative breast cancer patients. When systematically utilizing different metagenes to explain the noticeable paucity of early metastases in the region with concurrent low ER and high proliferation, we detected a third axis. This axis is almost perpendicular to the proliferation axis. It is formed by the B cell metagene, containing B cell associated genes like immunoglobulins, and to a lesser extent by the T cell metagene, containing T cell related genes like the T cell receptor (TCR). These two metagenes are largely overlapping. In the region of high expression of these metagenes, metastases occur only rarely despite high proliferation and low ER expression.
Gene expression patterns of 200 node-negative breast cancer patients who were not treated in the adjuvant setting were recorded with the Affymetrix HG-U133A array. After performing an unsupervised hierarchical cluster analysis using 2579 genes selected for variable expression within our dataset, metagenes were constructed for the different clusters. These metagenes were then visualized in a principal component analysis (PCA). The prognostic impact was assessed with univariate statistics. The prognostic power of the method was confirmed with a previously published dataset (Wang et al, 2005).
Using unsupervised hierarchical cluster analysis, several different gene clusters were detected. These could roughly be categorized as basal-like, T-cell, B-cell, interferon, proliferation, estrogen regulated, chromosome 17 (ERBB2), stromal, normal-like (adipocyte), Jun-Fos, and transcription cluster. Visualizing ER and proliferation clusters as well as time to metastasis (TTM) with PCA showed discrete patterns which were highly reproducible in the validation cohort. Both B cell and T cell metagene yielded additional information and had significant prognostic value, in particular, in rapidly proliferating tumors. For the B cell metagene the prognostic value could be independently confirmed in the validation cohort.
We could confirm in two independent cohorts of untreated node-negative breast cancer patients, that especially the humoral immune system plays a pivotal role for the metastasis-free survival of rapidly proliferating tumors.
The population based study cohort consisted of 200 lymph-node negative breast cancer patients treated at the Department of Obstetrics and Gynecology of the Johannes Gutenberg University Mainz between 1988 and 1998. Patients were all treated with surgery and did not receive any systemic therapy in the adjuvant setting. The established prognostic factors (tumor size, age at diagnosis, steroid receptor status) were collected from the original pathology reports of the gynecological pathology division within our department. Grade was defined according to the system of Elston and Ellis.
Patients were treated either with modified radical mastectomy (n=75) or breast conserving surgery followed by irradiation (n=125) and had to be without any evidence of lymph node and distant metastasis at the time of surgery. The median age of the patients at surgery was 60 years (range, 34-89 years). The median time of follow up was 92 months. Within this follow-up period, 68 (34%) patients relapsed, of these 46 (23%) developed distant metastases. 28 (14%) patients died of breast cancer and 26 (13%) patients died of unrelated reasons.
Frozen sections were taken for histology and the presence of breast cancer was confirmed in all samples. Tumor cell content exceeded 40% in all cases. Approximately 50 mg of snap frozen breast tumor tissue was crushed in liquid nitrogen. RLT-Buffer was added and the homogenate was spun through a QIAshredder column (QIAGEN, Hilden, Germany). From the eluate, total RNA was isolated with the RNeasy Kit (QIAGEN) according to the manufacturer's instructions. RNA yield was determined by UV absorbance and RNA quality was assessed by analysis of ribosomal RNA band integrity on an Agilent 2100 Bioanalyzer RNA 6000 LabChip kit (Agilent Technologies, Palo Alto, Calif.). The study was approved by the ethical review board of the medical association of Rhineland-Palatinate.
Axillary nodal status is the most important prognostic factor in patients with breast cancer. Formal axillary clearance is the best staging procedure, however, it is associated with significant morbidity. About 60% of axillary dissections show no evidence of metastatic disease. As a result, axillary sampling (removal of 4 nodes) has been proposed as an alternative means of assessing nodal status. Staging errors can occur following axillary sampling and this procedure is associated with a higher local recurrence rate. Intra-operative lymph node mapping has been suggested so as to allow identification of the first draining node (the ‘sentinel’ node) and to reduce the morbidity associated with axillary surgery. In this case the node is identified by injection of 2.5% Patent Blue dye adjacent to the primary tumour and the axilla is explored approximately 10 minutes post-injection. The sentinel node is excised and submitted for both frozen section and paraffin histological assessment. It has been shown that histological examination of this node predicted nodal status in 95% of cases. The presence of tumor cells in the histological specimen can alternatively be determined by detection of tumor cell specific nucleic acids using RT-PCR or related methods. In particular, detection of cytokeratin 19 RNA has been proposed for this purpose (Backus et al. 2005).
The Affymetrix (Santa Clara, Calif., USA) HG-U133A array and GeneChip System™ were used to quantify the relative transcript abundance in the breast cancer tissues. Starting from 5 μg total RNA, labelled cRNA was prepared using the Roche Microarray cDNA Synthesis, Microarray RNA Target Synthesis (T7) and Microarray Target Purification Kit according to the manufacturer's instructions. In brief, synthesis of first strand cDNA was done with a T7-linked oligo-dT primer, followed by second strand synthesis. Double-stranded cDNA product was purified and then used as template for an in vitro transcription reaction (IVT) in the presence of biotinylated UTP. Labelled cRNA was hybridized to HG-U133A arrays at 45° C. for 16 h in a hybridization oven at a constant rotation (60 r.p.m.) and then washed and stained with a streptavidin-phycoerythrin conjugate using the GeneChip fluidic station. We scanned the arrays at 560 nm using the GeneArray Scanner G2500A from Hewlett Packard. The readings from the quantitative scanning were analysed using the Microarray Analysis Suite 5.0 (MAS 5.0) from Affymetrix. In the analysis settings the global scaling procedure was chosen, which scaled the output signal intensities of each array to a mean target intensity of 500. Samples with suboptimal average signal intensities (i.e., scaling factors>25) or GAPDH 3′/5′ ratios>5 were relabeled and rehybridized on new arrays. Routinely we obtained over 40 percent present calls per chip as calculated by MAS 5.0.
A breast cancer Affymetrix HG-U133A microarray dataset including patient outcome information was downloaded from the NCBI GEO data repository (http://www.ncbi.nlm.nih.gov/geo/). The data set (GSE2034) represents 180 lymph-node negative relapse free patients and 106 lymph-node negative patients that developed a distant metastasis. None of the patients received systemic neoadjuvant or adjuvant therapy.
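The global scaling step and the two quality criteria described above can be illustrated with a minimal sketch. It uses the plain arithmetic mean of the signal intensities to derive the scaling factor, which is a simplification; the vendor software may use a trimmed mean instead. The function names and the example intensities are hypothetical.

```python
def scale_array(intensities, target=500.0):
    """Scale all signal intensities so that their mean equals `target`.

    Returns the scaled intensities and the scaling factor.
    Simplification: the plain mean is used rather than a trimmed mean.
    """
    if not intensities:
        raise ValueError("array has no intensities")
    mean_intensity = sum(intensities) / len(intensities)
    if mean_intensity <= 0:
        raise ValueError("mean intensity must be positive")
    factor = target / mean_intensity
    return [value * factor for value in intensities], factor

def needs_rehybridization(scaling_factor, gapdh_3_to_5_ratio,
                          max_factor=25.0, max_ratio=5.0):
    """True if the sample fails either quality criterion stated in the text."""
    return scaling_factor > max_factor or gapdh_3_to_5_ratio > max_ratio

# Hypothetical raw intensities of a very small array.
scaled, factor = scale_array([100.0, 200.0, 300.0])
repeat = needs_rehybridization(factor, gapdh_3_to_5_ratio=1.2)
```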
For our unpublished dataset, selection of “informative” genes was done using the quality control criteria “absent” or “present” as provided by the Affymetrix software, the absolute median signal intensity and the coefficient of variation of a gene within our dataset. Genes passing the quality control filter of having a “present” call in at least 10 samples, median signal intensity above 75 and a coefficient of variation above 60% within our dataset were considered to be informative and used for subsequent analysis. For unsupervised analysis we performed average linkage hierarchical clustering on all informative genes and samples using Pearson correlation as implemented in GeneSpring 7.0 software (Agilent Technologies, USA). Principal component analysis was performed using GeneSpring 7.0. Clinical information was visualized as categorical or continuous variables and relative gene expression was visualized on a relative scale from red, indicating high expression, to blue, indicating low expression. Gene groups were defined after manual selection of nodes of the gene dendrogram as suggested by the occurrence of cluster regions within the heatmap. A metagene was calculated as representative of all genes contained within one gene cluster based on the normalized expression values within the respective dataset. The genes contained within the proliferation cluster are listed in Table 1 and the genes contained within the immune gene clusters are listed in Table 2.
A ROC curve was calculated for metagene 5a with 176 samples fulfilling the criteria that patients remained at least five years disease free (n=149) or developed a distant metastasis within five years (n=27) using GraphPad Prism software (USA). Furthermore, ROC analysis was performed in a sub-cohort of Mainz samples defined by metagene 5a expression>0.99 using metagene 2 and 3 values, respectively. All identified cut off values were used for the analysis of Rotterdam samples without further adjustment. Life tables were calculated according to the Kaplan-Meier method using GraphPad Prism software. Metastasis-free survival (MFS) was computed from the date of diagnosis to the date of diagnosis of distant metastasis. Survival curves were compared with the Log-rank test. Univariate Cox survival analyses were performed using the Cox proportional hazards model. All tests were performed at a significance level of alpha=0.05. All p values are two sided.
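The three filter criteria for informative genes (a "present" call in at least 10 samples, median signal intensity above 75, coefficient of variation above 60%) can be expressed as a small predicate. The encoding of detection calls as "P" and "A", the use of the sample standard deviation for the coefficient of variation, the function name and the example values are assumptions made for this sketch.

```python
from statistics import mean, median, stdev

def is_informative(signals, calls, min_present=10, min_median=75.0, min_cv=0.60):
    """Apply the three quality filters to one gene.

    signals: signal intensities of the gene across all samples
    calls: detection calls per sample, assumed to be "P" (present) or "A" (absent)
    """
    if sum(1 for call in calls if call == "P") < min_present:
        return False
    if median(signals) <= min_median:
        return False
    average = mean(signals)
    if average <= 0:
        return False
    # Coefficient of variation = standard deviation / mean.
    return stdev(signals) / average > min_cv

# Hypothetical gene measured in twelve samples.
signals = [50.0] * 6 + [400.0] * 6
informative = is_informative(signals, ["P"] * 12)
```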
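For readers unfamiliar with the Kaplan-Meier method named above, the following minimal sketch computes the product-limit estimate of metastasis-free survival from follow-up times and event indicators. It is not the implementation used in GraphPad Prism; the function name and the example follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times: follow-up time per patient (e.g. months from diagnosis)
    events: True if a distant metastasis was observed at that time,
            False if the patient was censored
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        current = data[i][0]
        observed = 0
        leaving = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == current:
            observed += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if observed:
            survival *= 1 - observed / at_risk
            curve.append((current, survival))
        at_risk -= leaving
    return curve

# Hypothetical follow-up of five patients (months, metastasis observed?).
curve = kaplan_meier([12, 24, 36, 48, 60], [True, False, True, False, False])
```

Censored patients contribute to the number at risk until their last follow-up, which is why the second step in this example divides by three rather than four.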
Primary tumor tissues from 196 patients with invasive breast carcinoma as well as from four patients with DCIS were analyzed by gene expression profiling using HG U133A oligonucleotide arrays. All patients were node negative and did not receive systemic chemo- or endocrine-therapy after surgery. Details about the population based cohort are given in Table 3.
In order to identify co-regulated genes representing distinct biological processes or cell types we performed an unsupervised two dimensional hierarchical cluster analysis using 2579 genes selected for variable expression within our dataset. As seen in the resulting heat map samples as well as genes are grouped according to overall similarity in relative gene expression (
In order to obtain a clearer view on the molecular heterogeneity of node negative breast cancer we used (unsupervised) principal component analysis (PCA). Since the position of a sample within a PCA plot is determined by its gene expression values, it is of interest to investigate how the relative expression of genes, known to be of relevance for disease outcome contributes to the separation. Proliferation index, tumor grade and estrogen receptor expression have long been recognized to be correlated with disease outcome. Correspondingly, several gene expression profiling studies identified genes involved in certain steps of the cell cycle and estrogen receptor co-regulated genes to be associated with disease outcome. Since we were interested to investigate the complex interrelationships between these biological processes and a potential prognostic role of the immune system we constructed metagenes for the T-cell (metagene 2), B-cell (metagene 3), proliferation (metagene 5a) and estrogen receptor cluster (metagene 6a) by calculating the median of the normalized expression of all genes contained in each respective cluster for each sample.
In our population based cohort, samples are separated on principal component 1 (PC1) predominantly according to expression of estrogen receptor 1 (ESR1) and ESR1 co-regulated genes. Accordingly, samples with the highest metagene 6a expression cluster on the lower left, those with the lowest values on the lower right. Variable expression is seen in the intermediate area, which broadly scatters on PC2. In particular, those samples with the lowest metagene 6a values are well separated from all other tumors and appear to constitute a distinct group which may be considered the basal subtype, since all samples are PGR and ERBB2 negative and most of them are positive for the previously suggested basal-like markers KRT5 and KRT17 (data not shown). However, because KRT5, KRT17 and other genes proposed as basal-like marker genes are also expressed in tumors located in a different cluster in the upper region of the PCA plot, these genes are not suited to unequivocally characterize this molecular subtype (data not shown). PC1 can broadly be considered to form the estrogen receptor axis. Visualization of metagene 5a expression, as an indicator of proliferation, reveals a gradient with samples in the upper left having the lowest and samples in the lower right having the highest expression. A similar gradient is formed by individual well-known cell cycle associated genes like MKI67, CCNE2 and others (data not shown). Therefore, the gradient can be considered to form the proliferation axis. As expected, a high correlation exists between proliferation and tumor grade (data not shown). In addition, expression profiling confirms that tumors of lobular and tubular histology are predominantly estrogen receptor positive and slowly proliferating, whereas ductal tumors are highly heterogeneous regarding both. Interestingly, cancers of medullary histology cluster in a region of high proliferation and very low ESR1 expression (data not shown).
When time to distant metastasis is visualized, it becomes apparent that most patients suffering an early metastasis are located in the middle and right part along the PC1 axis and in the lower part along the PC2 axis of the plot. These samples are characterized by intermediate to low metagene 6a expression and concurrent high proliferation, i.e. high metagene 5a expression. Evidently, two different tumor types are less prone to metastasize, one characterized by very high metagene 6a expression and the other by intermediate metagene 6a and simultaneous low expression of metagene 5a. In a region of samples with relatively high proliferation and low metagene 6a levels, a paucity of samples with distant metastasis is observed as well. Interestingly, this region is characterized by high expression of metagene 2 (T-cells) and metagene 3 (B-cells), indicating that a lymphoid infiltration in these tumor tissues might be associated with good outcome. Metagene 2 contains information from genes such as the T-cell receptor genes TRA@ and TRB@ as well as several other genes preferentially expressed in T-cells, whereas metagene 3 is primarily formed by immunoglobulin heavy and light chain genes of several immunoglobulin classes like IGKC, IGHG3 and IGHM. Both metagenes form another gradient within the samples in the PCA plot with an axis from the upper right to the lower left. The complete absence of lymphoid infiltrates in the group of highest metagene 6a expression results in a kind of sandwich situation in which good outcome coincides with either very high or virtually no lymphoid infiltration, whereas a particular group with intermediate lymphoid infiltration has a high risk of recurrence.
Since it appears that the immune system does not play a positive role in all breast cancer subtypes, we sought to identify the subgroup of patients in which the presence of immune cells is linked with an improved prognosis. From the findings above we reasoned that a protective effect of the immune system might be confined to fast proliferating tumors. Therefore, we performed a ROC analysis for metagene 5a values in order to find a suitable cut off for distinguishing tumors that developed a distant metastasis within five years, i.e. high risk tumors (n=27), from those that remained disease free for at least five years (n=149). The resulting area under the ROC curve was 0.744 (CI 0.631 to 0.856, p<0.0001) with 81.5% sensitivity and 56% specificity at a cut off of 0.99, which classified 98 tumors into the high risk category. When we performed a Kaplan Meier survival analysis within this high risk patient sub-cohort, stratified according to high or low expression of metagene 2 (T-cell) or metagene 3 (B-cell), a significant disease free survival benefit was seen for tumors with high metagene 2 expression (hazard ratio 2.77, CI 1.27 to 5.28, p=0.0088) as well as for tumors with high metagene 3 expression (hazard ratio 2.63, CI 1.26 to 3.69, p=0.0048). In order to test our hypotheses in an independent patient cohort, we analyzed a publicly available expression dataset of node negative untreated breast cancer patients profiled on the same platform as our samples (Wang et al. 2005). A PCA plot was generated using the expression values of all 2579 genes found to be variably expressed in our dataset. Metagenes for estrogen receptor co-regulated genes, proliferation associated genes and the T-cell and B-cell clusters were calculated using the same probesets as used for the Mainz cohort. Kaplan Meier survival analysis was performed using the same cut offs as defined in our finding cohort.
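To make the ROC quantities reported above concrete, the following minimal sketch computes the area under the ROC curve and the sensitivity and specificity at a given cut off. It is not the GraphPad Prism procedure used for the analysis; the function names and the example values are hypothetical, and a sample is assumed to count as high risk when its metagene value is strictly greater than the cut off.

```python
from typing import List, Tuple


def roc_auc(scores: List[float], labels: List[int]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen metastasis case (label 1)
    has a higher score than a randomly chosen disease-free case (label 0);
    ties count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(
    scores: List[float], labels: List[int], cutoff: float
) -> Tuple[float, float]:
    """Sensitivity and specificity when score > cutoff is called high risk."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s > cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s <= cutoff)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return tp / n_pos, tn / n_neg


# Invented metagene 5a values and outcomes (1 = metastasis within five years)
example_scores = [0.5, 0.8, 1.1, 1.3, 0.95, 1.5]
example_labels = [0, 0, 1, 1, 0, 0]
print(roc_auc(example_scores, example_labels))
print(sensitivity_specificity(example_scores, example_labels, 0.99))
```

Scanning candidate cut offs with `sensitivity_specificity` and choosing one that trades sensitivity against specificity is one common way to arrive at a threshold such as the 0.99 reported here; the criterion actually applied is not stated in this description.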
The chosen cut off criteria did not yield a significant separation of high versus low metagene 2 (T-cell cluster) expressing samples (cut off 1.35) in fast proliferating (cut off 0.99) tumors (p=0.2). However, tumors expressing metagene 3, i.e. B-cell related genes, above a cut off of 1.95 had a significantly better outcome (p=0.0048) than tumors expressing metagene 3 at low levels.
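The two-step stratification used here, first by proliferation and then by the B-cell metagene, can be sketched as a simple decision rule. The cut offs 0.99 and 1.95 are the values given in the text; the function name `stratify`, the group labels and the use of strict "greater than" comparisons are assumptions made for illustration only.

```python
def stratify(metagene_5a: float, metagene_3: float,
             proliferation_cutoff: float = 0.99,
             b_cell_cutoff: float = 1.95) -> str:
    """Assign a sample to a group from its proliferation and B-cell metagenes.

    The returned labels are hypothetical names, not terms used in the study.
    """
    # Step 1: only fast proliferating tumors are stratified further
    if metagene_5a <= proliferation_cutoff:
        return "slow proliferating"
    # Step 2: split fast proliferating tumors by B-cell metagene expression
    if metagene_3 > b_cell_cutoff:
        return "fast proliferating, B-cell metagene high"
    return "fast proliferating, B-cell metagene low"


print(stratify(1.20, 2.40))  # fast proliferating, B-cell metagene high
print(stratify(1.20, 1.10))  # fast proliferating, B-cell metagene low
print(stratify(0.70, 2.40))  # slow proliferating
```

The same rule with the T-cell metagene would use metagene 2 and the cut off 1.35 in the second step.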
We could build upon these intriguing findings and were, for the first time, able to prove a strong association of the expression of the B cell metagene with metastasis-free survival in rapidly proliferating node-negative breast cancer. Based on the findings mentioned above, an antigen-specific humoral immune response could serve as an explanation for the improved survival of patients with rapidly proliferating tumors in our cohort. To validate our findings in a separate cohort, we used a previously published cohort which was also analyzed with the Affymetrix HG U133A gene chip (Wang et al, 2005). Similar to ours, this dataset consists only of untreated node-negative breast cancer patients. These features make the two datasets comparable and allow for estimation of pure prognostic effects without a possible “dilution” by predictive effects. The influence of the B cell metagene was unequivocally confirmed in this separate cohort.
In conclusion, we could confirm in two independent cohorts of untreated node-negative breast cancer patients that especially the humoral immune system plays a pivotal role for the metastasis-free survival of patients with rapidly proliferating tumors. Further studies are needed to clarify the precise nature of the immunological defense and its failure in certain tumors, and to explain its apparent complete absence despite good outcome in others. Extending knowledge about the complex role of immune cells and their interaction in breast cancer tissues should ultimately pave the way for the long-awaited successful development of therapeutics aiming at the third prognosis axis.
Number | Date | Country | Kind |
---|---|---|---|
06020209.0 | Sep 2006 | EP | regional |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/EP2007/060143 | 9/25/2007 | WO | 00 | 3/18/2009 |