The present invention relates to methods of assessing a propensity of the clinical outcome of a female mammal suffering from breast cancer, preferably after said female mammal has been treated with chemotherapy, for example anthracycline-based chemotherapy.
Breast cancer is the most common nonskin malignancy in women and the second leading cause of female cancer mortality (FEAR et al., IEEE Potentials, vol. 22 (1), p: 12-18, 2003).
Worldwide, breast cancer is the most common cancer in women. It is estimated that in the year 2000 there were 350,000 new breast cancer cases in Europe, while the number of deaths from breast cancer was estimated at 130,000. Breast cancer is responsible for 26.5% of all new cancer cases among women in Europe, and 17.5% of cancer deaths. The highest incidence rates for the year 2000 were in Western Europe, with France in third position (42,000 new cases and 12,000 deaths). Despite these high rates of incidence and mortality, the survival of women diagnosed with breast cancer has increased in Europe and in France since the end of the 1970s. This improvement is probably related to early diagnosis and screening programs and to adjuvant systemic therapy.
Adjuvant chemotherapy (CT) for breast cancer has undergone major changes over the past two decades. Results from the published update of the overview analysis by the Early Breast Cancer Trialists' Collaborative Group indicated that administration of adjuvant CT significantly reduced the risk of recurrence by 23.5% and the risk of death by 15.3%. According to the same overview, the 10-year recurrence-free survival for node-positive patients treated with adjuvant CT was 47.6% for patients younger than 50 years and 43.6% for those 50 to 69 years of age. The 10-year overall survival (OS) was 53.8% and 48.6%, respectively. This overview analysis also demonstrated that, as compared with the standard combination of cyclophosphamide, methotrexate and 5FU (CMF), regimens that contained anthracyclines reduced the annual risk of recurrence of breast cancer by 12% and the annual risk of death by 11%. Such regimens are significantly (2p=0.0001 for recurrence, 2p<0.00001 for breast cancer mortality) more effective than CMF.
The most commonly used anthracycline-based adjuvant CT regimen in the USA consists of four cycles of doxorubicin plus cyclophosphamide (AC) administered every 21 days. Six cycles of FAC (cyclophosphamide, doxorubicin, and fluorouracil) every 3 weeks were also accepted as an appropriate adjuvant regimen. Since epirubicin is less cardiotoxic than doxorubicin at an equimolar dose (recommended cumulative doses of doxorubicin and epirubicin are 550 mg/m2 and 1,000 mg/m2, respectively), several groups introduced epirubicin. A National Cancer Institute of Canada study showed that six cycles of cyclophosphamide, epirubicin, fluorouracil (CEF) were superior to six cycles of CMF. The Groupe Français d'Etudes Adjuvantes (GFEA; The French Adjuvant Trial Group) has studied epirubicin in the treatment of breast cancer for several years. The FEC regimen (fluorouracil, epirubicin, cyclophosphamide) has been evaluated in the trial setting in lymph node-positive patients. Six cycles of adjuvant FEC 50 (epirubicin 50 mg/m2) are better than 3 cycles. Subsequently, a trial in patients less than 65 years of age with node-positive operable breast cancer compared FEC 50 with FEC 100 (epirubicin 100 mg/m2). Six cycles of FEC 100 were associated with improved relapse rates and better survival. Thus, 6 cycles of FEC every three weeks were generally accepted a few years ago in France as an appropriate and “standard” adjuvant regimen for early breast cancer.
Recently, taxanes have emerged as potent agents for the adjuvant treatment of breast cancer. Studies involving more than 20,000 patients have been reported or are ongoing. Recently published adjuvant trials with taxanes (paclitaxel, docetaxel) in node-positive breast cancer have demonstrated an additional benefit (as compared with regimens without taxanes), ranging from 2 to 7% in absolute difference in disease-free survival (DFS) or overall survival (OS) at 5 years. Two trials showed the benefit of sequentially incorporating 4 courses of paclitaxel after 4 cycles of AC: CALGB 9344 and NSABP B-28. Two trials showed the benefit of incorporating docetaxel: the BCIRG 01 study, which compared the FAC regimen (6 cycles) to the TAC regimen (docetaxel, doxorubicin, and cyclophosphamide, 6 cycles), and the PACS 01 study. The PACS 01 study (1,999 patients included) was promoted by the French Federation of Anti-Cancer Centers (FNCLCC). It compared the FEC 100 regimen (6 cycles) to a sequential regimen of 3 cycles of FEC 100 followed by 3 cycles of docetaxel administered at the dose of 100 mg/m2 every 3 weeks in node-positive patients. At a median follow-up of 60 months, adjuvant CT with 3 cycles of FEC 100 followed by 3 cycles of docetaxel improved recurrence-free survival (reduction in the hazard rate of recurrence, 17%, p=0.04) and OS (reduction in the hazard rate of death, 23%, p=0.005) (13). The 5-year DFS rates are 78.3% (3 FEC 100-3 docetaxel arm) vs 73.2% (6 FEC 100 arm), and the 5-year OS rates are 90.7% vs 86.7%, respectively. In comparison with the BCIRG study, the incidence of febrile neutropenia, infection and cardiac dysfunction is very low, especially in the sequential arm. As a consequence of these trials, the combination of anthracycline and taxane has become the new standard of adjuvant CT for node-positive breast cancer.
Several other trials promoted by the FNCLCC (PACS) investigated the optimal scheme for the epirubicin-docetaxel combination. The PACS 04 study compared the FEC 100 regimen (6 cycles) to the combination epirubicin 75 mg/m2+docetaxel 75 mg/m2 every 3 weeks in node-positive patients. Follow-up is ongoing with 3,015 patients included (end of inclusions in August 2004). The PACS 06 study compared FEC 100×3 cycles every 2 weeks followed by docetaxel 100 mg/m2×3 cycles every 2 weeks, in association with G-CSF, with either a 2-week or a 4-week interval between FEC and docetaxel. The primary endpoint was the rate of patients with any toxicity requiring dose reduction or treatment delay by more than one week over the 6 courses. As of May 2005, recruitment was stopped after 74 inclusions with the following conclusion: FEC 100×3 cycles every 2 weeks followed by docetaxel 100 mg/m2×3 cycles every 2 weeks, with a 2-week interval between FEC and docetaxel, is not feasible due to an excess of severe skin toxicity and hand-foot syndrome.
Currently, adjuvant CT in early breast cancer is indicated according to classical prognostic factors such as the axillary lymph node status, the pathological size and grade of the tumour, the hormonal receptor expression, and the age of the patient. These factors remain insufficient to reflect the whole heterogeneity of the disease, and none of them has been validated for selecting the optimal regimen of CT, resulting in the delivery of an anthracycline-taxane combination to all node-positive patients. However, recent studies have shown that in sub-groups of patients the addition of taxanes did not provide benefit as compared to FAC or FEC, and that these classical regimens without taxanes might provide long survival in certain patients. Together with the potential toxicity and cost of the anthracycline-taxane combination, as well as the ongoing introduction and development of new drugs in adjuvant regimens (CT such as capecitabine, targeted therapy such as trastuzumab, hormone therapy such as anti-aromatases, diphosphonates), these data call for the identification of parameters predictive of clinical outcome (prognostic and/or predictive of response to CT) after a given regimen of adjuvant CT.
Much research, mainly retrospective, has been performed to find biological factors predictive of adjuvant CT effectiveness, but at present there is still no single accepted factor. The current prognostic factors only poorly capture the heterogeneous clinical behavior of the disease. In consequence, many N− patients are subjected to unnecessary anthracycline-based adjuvant CT, and all N+ patients receive regimens based on anthracyclines and taxanes (Piccart et al. The Breast 14:439-445, 2005). However, taxanes are not yet universally accepted as standard treatment (Colozza et al. Oncologist 11:111-125, 2006). Recent randomized studies (Buzdar et al. Clin Cancer Res 8:1073-1079, 2002; Henderson et al. J Clin Oncol 21:976-983, 2003; Mamounas et al. J Clin Oncol 23:3686-3696, 2005; Martin et al. N Engl J Med 352:2302-2313, 2005; Roche et al. J Clin Oncol 24:5664-5671, 2006) have shown that the addition of taxanes provides a significant but small benefit (3 to 7%) in 5-year survival. This suggests that a majority of patients do not benefit from the anthracycline-taxane combination. The availability of new drugs in the adjuvant setting and the heterogeneity of breast cancer make it necessary to tailor treatment rather than systematically combining all drugs. Meeting this challenge requires a better assessment of the metastatic risk after CT. No biological factor predictive of anthracycline-based adjuvant CT efficacy (Hayes, The Breast 14:493-499, 2005) has yet been validated and introduced into routine use.
A predictive factor would be of tremendous interest for selecting patients who do or do not benefit from a specific regimen of adjuvant CT. Breast cancer is a complex genetic disease characterized by the accumulation of multiple molecular alterations. Pathological and clinical factors are insufficient to capture the complex cascade of events that drives the heterogeneous clinical behaviour of tumours.
High-throughput molecular technologies provide novel tools to tackle this complexity. In particular, DNA microarrays allow the simultaneous and quantitative analysis of the mRNA expression levels of thousands of genes in a single assay. The first research results are promising: comprehensive gene expression profiles of breast tumours are revealing new tumour sub-groups within groups that appear identical a priori but have different outcomes.
Several retrospective studies confirm the prognostic potential of DNA microarrays in breast cancer (Bertucci et al. Omics 10:429-443, 2006). Most studies focused on survival without any adjuvant systemic therapy (van de Vijver et al. N Engl J Med 347:1999-2009, 2002; van 't Veer et al. Nature 415:530-536, 2002; Wang et al. Lancet 365:671-679, 2005; Foekens et al. J Clin Oncol 24:1665-1671, 2006), after adjuvant HT (Ma et al. Cancer Cell 5:607-616, 2004; Paik et al. N Engl J Med 351:2817-2826, 2004; Oh et al. J Clin Oncol 24:1656-1664, 2006), and after neo-adjuvant CT (Sorlie et al. Proc Natl Acad Sci USA 98:10869-10874, 2001; Sorlie et al. Proc Natl Acad Sci USA 100:8418-8423, 2003). A few studies directly analyzed the response to primary CT (Ayers et al. J Clin Oncol 22:2284-2293, 2004; Bertucci et al. Cancer Res 64:8558-8565, 2004; Chang et al. Lancet 362:362-369, 2003; Hannemann et al. J Clin Oncol. 23:3331-3342, 2005). Only limited data, from small (Bertucci et al. Lancet 360:173-174; discussion 174, 2002; Bertucci et al. Hum Mol Genet 9:2981-2991, 2000) or heterogeneous series (Pawitan et al. Breast Cancer Res 7:R953-964, 2005), are available regarding outcome after adjuvant CT. In all these studies, the prognostic and/or predictive multigenic signatures performed better than individual molecular and pathoclinical parameters.
There is a need to adapt adjuvant CT in patients who are candidates for CT. The ongoing introduction of new drugs in the adjuvant setting, which are in general associated with a low and heterogeneous benefit and with a cost in both morbidity and money, makes it necessary to refine the assessment of the metastatic risk after a given CT regimen and the decision regarding which CT regimen to use.
After exhaustive testing we have identified gene marker sets that predict clinical outcome after CT, and methods of use thereof. This represents a step towards molecular tailoring by guiding patients towards the most beneficial CT regimen. This would allow moving away from the “one shoe fits all” strategy used in oncology for many years and from the ongoing therapeutic escalation.
The invention relates to a method for assessing the clinical outcome of a female mammal suffering from breast cancer, comprising the steps of:
a) generating a metagene adjusted value underER by comparing the expression level, in a biological sample from said female mammal and in a control, of at least 10 nucleic acid sequences selected in the group comprising or consisting of: SEQ ID No:374 (nm—000212), SEQ ID No:1027 (nm—007365), SEQ ID No:598 (nm—000636), SEQ ID No:717 (nm—024598), SEQ ID No:573 (nm—001527), SEQ ID No:83 (nm—015065), SEQ ID No:12 (nm—002964), SEQ ID No:405 (nm—000852), SEQ ID No:856 (nm—005564), SEQ ID No:384 (nm—002466), SEQ ID No:167 (nm—002627), SEQ ID No:51 (nm—198433), SEQ ID No:999 (nm—145290), SEQ ID No:979 (nm—004414), SEQ ID No:2 (nm—005245), SEQ ID No:98 (nm—016267), SEQ ID No:751 (nm—002423), SEQ ID No:696 (nm—001428), SEQ ID No:1050 (BC034638), SEQ ID No:488 (nm—002979), SEQ ID No:262 (nm—005194), SEQ ID No:1020 (nm—000359), SEQ ID No:1106 (BC015969), SEQ ID No:952 (nm—003878), SEQ ID No:675 (nm—001512), SEQ ID No:289 (nm—020179), SEQ ID No:553 (nm—004701), SEQ ID No:579 (nm—001814), SEQ ID No:760 (nm—005746), SEQ ID No:805 (nm—014624), SEQ ID No:361 (nm—002906), SEQ ID No:448 (nm—198569), SEQ ID No:170 (nm—002428), SEQ ID No:878 (nm—002774), SEQ ID No:1117, SEQ ID No:612 (nm—032515), SEQ ID No:540 (nm—003159), SEQ ID No:823 (nm—000100), SEQ ID No:131 (nm—145280), SEQ ID No:705 (nm—005596), SEQ ID No:31 (nm—005558), and SEQ ID No:199 (nm—024323), fragments, derivatives or complementary sequences thereof.
Preferably, the expression level of at least 20 nucleic acid sequences selected in said group is compared, and more preferably of at least 25 nucleic acid sequences selected in said group.
In one embodiment, said metagene adjusted value underER is generated by comparing the expression level, in a biological sample from said female mammal and in a control, of the 20 nucleic acid sequences selected in the group consisting of: SEQ ID No:374 (nm—000212); SEQ ID No:1027 (nm—007365); SEQ ID No:598 (nm—000636); SEQ ID No:573 (nm—001527); SEQ ID No:83 (nm—015065); SEQ ID No:12 (nm—002964); SEQ ID No:405 (nm—000852); SEQ ID No:856 (nm—005564); SEQ ID No:167 (nm—002627); SEQ ID No:51 (nm—198433); SEQ ID No:98 (nm—016267); SEQ ID No:751 (nm—002423); SEQ ID No:696 (nm—001428); SEQ ID No:262 (nm—005194); SEQ ID No:1020 (nm—000359); SEQ ID No:579 (nm—001814); SEQ ID No:760 (nm—005746); SEQ ID No:805 (nm—014624); SEQ ID No:878 (nm—002774); and SEQ ID No:612 (nm—032515), fragments, derivatives or complementary sequences thereof.
In another embodiment, said metagene adjusted value underER is generated by comparing the expression level, in a biological sample from said female mammal and in a control, of the 27 nucleic acid sequences selected in the group consisting of: SEQ ID No:374 (nm—000212); SEQ ID No:1027 (nm—007365); SEQ ID No:598 (nm—000636); SEQ ID No:573 (nm—001527); SEQ ID No:83 (nm—015065); SEQ ID No:12 (nm—002964); SEQ ID No:405 (nm—000852); SEQ ID No:856 (nm—005564); SEQ ID No:167 (nm—002627); SEQ ID No:51 (nm—198433); SEQ ID No:98 (nm—016267); SEQ ID No:751 (nm—002423); SEQ ID No:696 (nm—001428); SEQ ID No:262 (nm—005194); SEQ ID No:1020 (nm—000359); SEQ ID No:579 (nm—001814); SEQ ID No:760 (nm—005746); SEQ ID No:805 (nm—014624); SEQ ID No:878 (nm—002774); SEQ ID No:612 (nm—032515); SEQ ID No:384 (nm—002466); SEQ ID No:2 (nm—005245); SEQ ID No:1050 (BC034638); SEQ ID No:952 (nm—003878); SEQ ID No:361 (nm—002906); SEQ ID No:31 (nm—005558); and SEQ ID No:199 (nm—024323), fragments, derivatives or complementary sequences thereof.
b) generating a metagene adjusted value underPR by comparing the expression level, in a biological sample from said female mammal and in a control, of at least 6 nucleic acid sequences selected in the group comprising or consisting of: SEQ ID No:598 (nm—000636), SEQ ID No:1122, SEQ ID No:364 (nm—002253), SEQ ID No:387 (nm—006563), SEQ ID No:34 (nm—001229), SEQ ID No:657 (nm—000633), SEQ ID No:384 (nm—002466), SEQ ID No:451 (nm—001110), SEQ ID No:999 (nm—145290), SEQ ID No:1056 (AK126297), SEQ ID No:15 (nm—003243), SEQ ID No:1090 (AK125808), SEQ ID No:1120, SEQ ID No:12 (nm—002964), SEQ ID No:743 (nm—006875), SEQ ID No:414 (nm—000546), SEQ ID No:374 (nm—000212), SEQ ID No:711 (nm—002291), SEQ ID No:663 (nm—006928), SEQ ID No:1102 (AK124587), SEQ ID No:237 (nm—002644), SEQ ID No:60 (nm—022640), SEQ ID No:361 (nm—002906), SEQ ID No:119 (nm—004730) (or SEQ ID No:1109 (NM—002019)), SEQ ID No:167 (nm—002627), SEQ ID No:339 (nm—144970), SEQ ID No:333 (nm—145037), SEQ ID No:83 (nm—015065), SEQ ID No:330 (nm—018291), SEQ ID No:1024 (nm—030666), SEQ ID No:229 (nm—004586), SEQ ID No:925 (nm—005257), SEQ ID No:788 (nm—001005369), SEQ ID No:1104 (AK128524), SEQ ID No:1103 (BX108410), SEQ ID No:66 (nm—000416), SEQ ID No:1030 (nm—024007), SEQ ID No:1119, SEQ ID No:1068 (AK024670), SEQ ID No:241 (nm—000801), SEQ ID No:398 (nm—003084), SEQ ID No:74 (nm—000878), SEQ ID No:1087 (AK074131), SEQ ID No:955 (nm—001986), SEQ ID No:71 (nm—004633), SEQ ID No:1105 (BC072392), SEQ ID No:856 (nm—005564), SEQ ID No:231 (nm—006678), SEQ ID No:593 (nm—001511), SEQ ID No:384 (nm—002466), SEQ ID No:519 (nm—020125), SEQ ID No:579 (nm—001814), SEQ ID No:1039 (nm—006209), SEQ ID No:31 (nm—005558), SEQ ID No:327 (nm—173825), SEQ ID No:573 (nm—001527), SEQ ID No:98 (nm—016267), SEQ ID No:1059 (AK091113), SEQ ID No:886 (nm—000075), SEQ ID No:1032 (nm—005688), SEQ ID No:1091 (XM—378178), SEQ ID No:233 (nm—178155), SEQ ID No:938 (nm—003012), SEQ ID No:264 (nm—152862), SEQ ID No:546 (nm—005874), SEQ ID 
No:1099 (BC066343), SEQ ID No:1037 (nm—023068), SEQ ID No:550 (nm—004848), SEQ ID No:1027 (nm—007365), SEQ ID No:1005 (nm—014938), SEQ ID No:820 (nm—000593), and SEQ ID No:370 (nm—000106), fragments, derivatives or complementary sequences thereof.
Preferably, the expression level of at least 10 nucleic acid sequences selected in said group is compared, for example of at least 20 nucleic acid sequences or at least 30 nucleic acid sequences, and more preferably of at least 36 nucleic acid sequences selected in said group.
In one embodiment, said metagene adjusted value underPR is generated by comparing the expression level, in a biological sample from said female mammal and in a control, of the 6 nucleic acid sequences selected in the group consisting of: SEQ ID No:364 (nm—002253); SEQ ID No:34 (nm—001229); SEQ ID No:657 (nm—000633); SEQ ID No:339 (nm—144970); SEQ ID No:229 (nm—004586); SEQ ID No:1119, fragments, derivatives or complementary sequences thereof.
In another embodiment, said metagene adjusted value underPR is generated by comparing the expression level, in a biological sample from said female mammal and in a control, of the 36 nucleic acid sequences selected in the group consisting of: SEQ ID No:364 (nm—002253); SEQ ID No:34 (nm—001229); SEQ ID No:657 (nm—000633); SEQ ID No:339 (nm—144970); SEQ ID No:229 (nm—004586); SEQ ID No:1119; SEQ ID No:387 (nm—006563); SEQ ID No:1056 (AK126297); SEQ ID No:15 (nm—003243); SEQ ID No:1120; SEQ ID No:414 (nm—000546); SEQ ID No:374 (nm—000212); SEQ ID No:711 (nm—002291); SEQ ID No:663 (nm—006928); SEQ ID No:237 (nm—002644); SEQ ID No:60 (nm—022640); SEQ ID No:119 (nm—004730); SEQ ID No:330 (nm—018291); SEQ ID No:1024 (nm—030666); SEQ ID No:925 (nm—005257); SEQ ID No:1104 (AK128524); SEQ ID No:1103 (BX108410); SEQ ID No:66 (nm—000416); SEQ ID No:1068 (AK024670); SEQ ID No:374 (nm—000212); SEQ ID No:74 (nm—000878); SEQ ID No:231 (nm—006678); SEQ ID No:593 (nm—001511); SEQ ID No:384 (nm—002466); SEQ ID No:1039 (nm—006209); SEQ ID No:327 (nm—173825); SEQ ID No:886 (nm—000075); SEQ ID No:1032 (nm—005688); SEQ ID No:264 (nm—152862); SEQ ID No:1037 (nm—023068); and SEQ ID No:1005 (nm—014938), fragments, derivatives or complementary sequences thereof.
c) generating a metagene adjusted value underEGFR by comparing the level, in a biological sample from said female mammal and in a control, of at least 10 nucleic acid sequences selected in the group comprising or consisting of: SEQ ID No:1071 (NM—001033047), SEQ ID No:254 (nm—005581), SEQ ID No:6 (nm—003225), SEQ ID No:883 (nm—000125), SEQ ID No:543 (nm—005080), SEQ ID No:681 (nm—020974), SEQ ID No:63 (nm—001002295), SEQ ID No:212 (nm—024852), SEQ ID No:635 (nm—001002029), SEQ ID No:535 (nm—003226), SEQ ID No:1125, SEQ ID No:109 (nm—000662), SEQ ID No:342 (nm—001846), SEQ ID No:927 (nm—004703), SEQ ID No:1124, SEQ ID No:124 (nm—014899), SEQ ID No:280 (nm—020764) (or SEQ ID No:1110 (nm—024522)), SEQ ID No:297 (nm—016463), SEQ ID No:791 (nm—016835), SEQ ID No:210 (nm—178840), SEQ ID No:827 (nm—152499), SEQ ID No:1064 (nm—000767), SEQ ID No:147 (nm—014675), SEQ ID No:323 (nm—001014443), SEQ ID No:106 (nm—004619), SEQ ID No:181 (nm—000848), SEQ ID No:376 (nm—057158), SEQ ID No:116 (nm—014034), SEQ ID No:252 (nm—000758), SEQ ID No:797 (nm—022131), SEQ ID No:911 (nm—000168), SEQ ID No:720 (nm—004726), SEQ ID No:889 (nm—000561), SEQ ID No:250 (nm—000930), SEQ ID No:179 (nm—004747), SEQ ID No:786 (nm—033388), SEQ ID No:177 (nm—015996), SEQ ID No:1047 (BC012900), SEQ ID No:301 (nm—004326), SEQ ID No:207 (nm—003940), SEQ ID No:936 (nm—003462), SEQ ID No:916 (nm—001453) (or SEQ ID No:1116 (nm—004040)), SEQ ID No:1052 (BX096026), SEQ ID No:159 (nm—000224), SEQ ID No:1096 (AK127274), SEQ ID No:28 (nm—021800), SEQ ID No:1054 (AK123264), SEQ ID No:25 (nm—012391) (or SEQ ID No:1108 (nm—053279)), SEQ ID No:825 (nm—024704), SEQ ID No:145 (nm—017786), SEQ ID No:491 (nm—004374), SEQ ID No:485 (nm—003834), SEQ ID No:1072 (AY007114), SEQ ID No:274 (nm—032108), SEQ ID No:258 (nm—080545), SEQ ID No:292 (nm—014371), SEQ ID No:803 (nm—183047), SEQ ID No:349 (nm—031946), SEQ ID No:1123, SEQ ID No:763 (nm—014585), SEQ ID No:438 (nm—001759), SEQ ID No:94 (nm—014315), SEQ ID No:845 (nm—001089), 
SEQ ID No:1084 (BX648964), SEQ ID No:734 (nm—025137), SEQ ID No:943 (nm—002141), SEQ ID No:1085 (nm—000720), and SEQ ID No:276 (nm—012202), fragments, derivatives or complementary sequences thereof.
Preferably, the expression level of at least 20 nucleic acid sequences selected in said group is compared, for example of at least 24 nucleic acid sequences or at least 30 nucleic acid sequences, and more preferably of at least 37 nucleic acid sequences selected in said group.
In one embodiment, said metagene adjusted value underEGFR is generated by comparing the expression level, in a biological sample from said female mammal and in a control, of the 24 nucleic acid sequences selected in the group consisting of: SEQ ID No:1071 (nm—001033047); SEQ ID No:254 (nm—005581); SEQ ID No:6 (nm—003225); SEQ ID No:883 (nm—000125); SEQ ID No:543 (nm—005080); SEQ ID No:681 (nm—020974); SEQ ID No:63 (nm—001002295); SEQ ID No:212 (nm—024852); SEQ ID No:635 (nm—001002029); SEQ ID No:535 (nm—003226); SEQ ID No:1125; SEQ ID No:1124; SEQ ID No:297 (nm—016463); SEQ ID No:791 (nm—016835); SEQ ID No:827 (nm—152499); SEQ ID No:207 (nm—003940); SEQ ID No:916 (nm—001453) (or SEQ ID No:1116 (nm—004040)); SEQ ID No:1052 (BX096026); SEQ ID No:159 (nm—000224); SEQ ID No:25 (nm—012391) (or SEQ ID No:1108 (nm—053279)); SEQ ID No:845 (nm—001089); and SEQ ID No:1085 (nm—000720), fragments, derivatives or complementary sequences thereof.
In another embodiment, said metagene adjusted value underEGFR is generated by comparing the expression level, in a biological sample from said female mammal and in a control, of the 37 nucleic acid sequences selected in the group consisting of: SEQ ID No:1071 (nm—001033047); SEQ ID No:254 (nm—005581); SEQ ID No:6 (nm—003225); SEQ ID No:883 (nm—000125); SEQ ID No:543 (nm—005080); SEQ ID No:681 (nm—020974); SEQ ID No:63 (nm—001002295); SEQ ID No:212 (nm—024852); SEQ ID No:635 (nm—001002029); SEQ ID No:535 (nm—003226); SEQ ID No:1125; SEQ ID No:1124; SEQ ID No:297 (nm—016463); SEQ ID No:791 (nm—016835); SEQ ID No:827 (nm—152499); SEQ ID No:207 (nm—003940); SEQ ID No:916 (nm—001453) (or SEQ ID No:1116 (nm—004040)); SEQ ID No:1052 (BX096026); SEQ ID No:159 (nm—000224); SEQ ID No:25 (nm—012391) (or SEQ ID No:1108 (NM—053279)); SEQ ID No:845 (nm—001089); SEQ ID No:1085 (NM—000720); SEQ ID No:109 (nm—000662); SEQ ID No:342 (nm—001846); SEQ ID No:927 (nm—004703); SEQ ID No:280 (nm—020764) (or SEQ ID No:1110 (NM—024522)); SEQ ID No:210 (nm—178840); SEQ ID No:181 (nm—000848); SEQ ID No:116 (nm—014034); SEQ ID No:250 (nm—000930); SEQ ID No:177 (nm—015996); SEQ ID No:825 (nm—024704); SEQ ID No:145 (nm—017786); and SEQ ID No:276 (nm—012202), fragments, derivatives or complementary sequences thereof.
d) generating a score (SC) from said metagene adjusted values using a mathematical method establishing a relation between the combined metagene values and the clinical outcome of said female mammal.
In one embodiment, the mathematical method used in step d) comprises a Cox regression analysis (Wright et al., Proc. Natl. Acad. Sci. USA, vol. 100 (17), p. 9991-9996, 2003) or a CART analysis (Breiman et al Classification and Regression Trees, Chapman & Hall 1984).
In a particular embodiment, the mathematical method is a Cox regression analysis and the score (SC) is generated according to the following formula: SC=a×underER+b×underPR+c×underEGFR, wherein “a” is comprised in the interval [−6.26; +0.49], “b” is comprised in the interval [−2.65; +0.29] and “c” is comprised in the interval [−6.69; +1.65].
For example, the formula is: SC=−2.90279×underER−1.47423×underPR−4.17198×underEGFR.
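As an illustration only, the linear score of step d) can be sketched in Python as follows. The coefficient intervals and the example coefficients are those stated above; the function names, the dictionary layout and the input metagene values are hypothetical and are not part of the described method.

```python
# Illustrative sketch only: function names and data layout are assumptions.

# Coefficient intervals stated in the text for a, b and c.
INTERVALS = {
    "underER": (-6.26, 0.49),
    "underPR": (-2.65, 0.29),
    "underEGFR": (-6.69, 1.65),
}

# Example coefficients stated in the text.
EXAMPLE_COEFFS = {
    "underER": -2.90279,
    "underPR": -1.47423,
    "underEGFR": -4.17198,
}


def check_coefficients(coeffs):
    """Raise ValueError if a coefficient lies outside its stated interval."""
    for name, (low, high) in INTERVALS.items():
        value = coeffs[name]
        if not low <= value <= high:
            raise ValueError(f"{name} coefficient {value} outside [{low}; {high}]")


def score(metagenes, coeffs=EXAMPLE_COEFFS):
    """Compute SC = a*underER + b*underPR + c*underEGFR."""
    check_coefficients(coeffs)
    return sum(coeffs[name] * metagenes[name] for name in INTERVALS)


if __name__ == "__main__":
    # Hypothetical metagene adjusted values for one patient.
    patient = {"underER": 0.2, "underPR": -0.1, "underEGFR": 0.4}
    print(score(patient))
```

The interval check is a design convenience for the sketch: it rejects coefficient sets that fall outside the ranges given for this embodiment before any score is computed.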
The invention further relates to a method for assessing the clinical outcome of a female mammal suffering from breast cancer, comprising the steps of:
a) generating a metagene adjusted value underEGFR by comparing the expression level, in a biological sample from said female mammal and in a control, of at least one nucleic acid sequence selected in the group consisting of: SEQ ID No:1071 (NM—001033047), SEQ ID No:254 (nm—005581), SEQ ID No:6 (nm—003225), SEQ ID No:883 (nm—000125), SEQ ID No:543 (nm—005080), SEQ ID No:681 (nm—020974), SEQ ID No:63 (nm—001002295), SEQ ID No:212 (nm—024852), SEQ ID No:635 (nm—001002029), SEQ ID No:535 (nm—003226), SEQ ID No:1125, SEQ ID No:109 (nm—000662), SEQ ID No:342 (nm—001846), SEQ ID No:927 (nm—004703), SEQ ID No:1124, SEQ ID No:124 (nm—014899), SEQ ID No:280 (nm—020764) (or SEQ ID No:1110 (nm—024522)), SEQ ID No:297 (nm—016463), SEQ ID No:791 (nm—016835), SEQ ID No:210 (nm—178840), SEQ ID No:827 (nm—152499), SEQ ID No:1064 (NM—000767), SEQ ID No:147 (nm—014675), SEQ ID No:323 (nm—001014443), SEQ ID No:106 (nm—004619), SEQ ID No:181 (nm—000848), SEQ ID No:376 (nm—057158), SEQ ID No:116 (nm—014034), SEQ ID No:252 (nm—000758), SEQ ID No:797 (nm—022131), SEQ ID No:911 (nm—000168), SEQ ID No:720 (nm—004726), SEQ ID No:889 (nm—000561), SEQ ID No:250 (nm—000930), SEQ ID No:179 (nm—004747), SEQ ID No:786 (nm—033388), SEQ ID No:177 (nm—015996), SEQ ID No:1047 (BC012900), SEQ ID No:301 (nm—004326), SEQ ID No:207 (nm—003940), SEQ ID No:936 (nm—003462), SEQ ID No:916 (nm—001453) (or SEQ ID No:1116 (NM—004040)), SEQ ID No:1052 (BX096026), SEQ ID No:159 (nm—000224), SEQ ID No:1096 (AK127274), SEQ ID No:28 (nm—021800), SEQ ID No:1054 (AK123264), SEQ ID No:25 (nm—012391) (or SEQ ID No:1108 (nm—053279)), SEQ ID No:825 (nm—024704), SEQ ID No:145 (nm—017786), SEQ ID No:491 (nm—004374), SEQ ID No:485 (nm—003834), SEQ ID No:1072 (AY007114), SEQ ID No:274 (nm—032108), SEQ ID No:258 (nm—080545), SEQ ID No:292 (nm—014371), SEQ ID No:803 (nm—183047), SEQ ID No:349 (nm—031946), SEQ ID No:1123, SEQ ID No:763 (nm—014585), SEQ ID No:438 (nm—001759), SEQ ID No:94 (nm—014315), SEQ ID No:845 (nm—001089), 
SEQ ID No:1084 (BX648964), SEQ ID No:734 (nm—025137), SEQ ID No:943 (nm—002141), SEQ ID No:1085 (nm—000720), and SEQ ID No:276 (nm—012202), fragments, derivatives or complementary sequences thereof.
Preferably, said nucleic acid sequence is SEQ ID No:681 (nm—020974), fragments, derivatives or complementary sequences thereof.
Preferably, the expression level of at least 10 nucleic acid sequences selected in said group is compared, for example of at least 20 nucleic acid sequences or at least 24 nucleic acid sequences, and more preferably of at least 37 nucleic acid sequences selected in said group.
In one embodiment, said metagene adjusted value underEGFR is generated by comparing the expression level, in a biological sample from said female mammal and in a control, of the 24 nucleic acid sequences selected in the group consisting of: SEQ ID No:1071 (nm—001033047); SEQ ID No:254 (nm—005581); SEQ ID No:6 (nm—003225); SEQ ID No:883 (nm—000125); SEQ ID No:543 (nm—005080); SEQ ID No:681 (nm—020974); SEQ ID No:63 (nm—001002295); SEQ ID No:212 (nm—024852); SEQ ID No:635 (nm—001002029); SEQ ID No:535 (nm—003226); SEQ ID No:1125; SEQ ID No:1124; SEQ ID No:297 (nm—016463); SEQ ID No:791 (nm—016835); SEQ ID No:827 (nm—152499); SEQ ID No:207 (nm—003940); SEQ ID No:916 (nm—001453) (or SEQ ID No:1116 (nm—004040)); SEQ ID No:1052 (BX096026); SEQ ID No:159 (nm—000224); SEQ ID No:25 (nm—012391) (or SEQ ID No:1108 (NM—053279)); SEQ ID No:845 (nm—001089); and SEQ ID No:1085 (NM—000720), fragments, derivatives or complementary sequences thereof.
In another embodiment, said metagene adjusted value underEGFR is generated by comparing the expression level, in a biological sample from said female mammal and in a control, of the 37 nucleic acid sequences selected in the group consisting of: SEQ ID No:1071 (nm—001033047); SEQ ID No:254 (nm—005581); SEQ ID No:6 (nm—003225); SEQ ID No:883 (nm—000125); SEQ ID No:543 (nm—005080); SEQ ID No:681 (nm—020974); SEQ ID No:63 (nm—001002295); SEQ ID No:212 (nm—024852); SEQ ID No:635 (nm—001002029); SEQ ID No:535 (nm—003226); SEQ ID No:1125; SEQ ID No:1124; SEQ ID No:297 (nm—016463); SEQ ID No:791 (nm—016835); SEQ ID No:827 (nm—152499); SEQ ID No:207 (nm—003940); SEQ ID No:916 (nm—001453) (or SEQ ID No:1116 (nm—004040)); SEQ ID No:1052 (BX096026); SEQ ID No:159 (nm—000224); SEQ ID No:25 (nm—012391) (or SEQ ID No:1108 (NM—053279)); SEQ ID No:845 (nm—001089); SEQ ID No:1085 (NM—000720); SEQ ID No:109 (nm—000662); SEQ ID No:342 (nm—001846); SEQ ID No:927 (nm—004703); SEQ ID No:280 (nm—020764) (or SEQ ID No:1110 (NM—024522)); SEQ ID No:210 (nm—178840); SEQ ID No:181 (nm—000848); SEQ ID No:116 (nm—014034); SEQ ID No:250 (nm—000930); SEQ ID No:177 (nm—015996); SEQ ID No:825 (nm—024704); SEQ ID No:145 (nm—017786); and SEQ ID No:276 (nm—012202), fragments, derivatives or complementary sequences thereof.
b) generating a metagene adjusted value overEGFR by comparing the expression level, in a biological sample from said female mammal and in a control, of at least one nucleic acid sequence selected in the group consisting of: SEQ ID No:405 (nm—000852), SEQ ID No:374 (nm—000212), SEQ ID No:1122, SEQ ID No:598 (nm—000636), SEQ ID No:262 (nm—005194), SEQ ID No:1099 (BC066343), SEQ ID No:696 (nm—001428), SEQ ID No:1059 (AK091113), SEQ ID No:751 (nm—002423), SEQ ID No:1121, SEQ ID No:286 (nm—002417), SEQ ID No:244 (nm—199002), SEQ ID No:18 (nm—001880), SEQ ID No:121 (nm—014553), SEQ ID No:1107 (BC073775), SEQ ID No:103 (nm—003619), SEQ ID No:1118, SEQ ID No:42 (nm—000757), and SEQ ID No:1067 (AK123784), fragments, derivatives or complementary sequences thereof.
Preferably, said nucleic acid sequence is SEQ ID No: 1107 (BC073775) or SEQ ID No: 1099 (BC066343), fragments, derivatives or complementary sequences thereof.
More preferably, the expression level of at least 5 nucleic acid sequences selected in said group is compared, for example of at least 10 nucleic acid sequences, and still more preferably of at least 12 nucleic acid sequences selected in said group.
In one embodiment, said metagene adjusted value overEGFR is generated by comparing the expression level, in a biological sample from said female mammal and in a control, of the 5 nucleic acid sequences selected in the group consisting of: SEQ ID No:1122; SEQ ID No:598 (nm—000636); SEQ ID No:696 (nm—001428); SEQ ID No:1059 (AK091113); and SEQ ID No:121 (nm—014553), fragments, derivatives or complementary sequences thereof.
In another embodiment, said metagene adjusted value overEGFR is generated by comparing the expression level, in a biological sample from said female mammal and in a control, of the 12 nucleic acid sequences selected in the group consisting of: SEQ ID No:1122; SEQ ID No:598 (nm—000636); SEQ ID No:696 (nm—001428); SEQ ID No:1059 (AK091113); SEQ ID No:121 (nm—014553); SEQ ID No:262 (nm—005194); SEQ ID No:1099 (BC066343); SEQ ID No:751 (nm—002423); SEQ ID No:1121; SEQ ID No:286 (nm—002417); SEQ ID No:103 (nm—003619); and SEQ ID No:1118, fragments, derivatives or complementary sequences thereof.
c) generating a score (SC) from said metagene adjusted values using a mathematical method establishing a relation between the combined metagene values and the clinical outcome of said female mammal.
In one embodiment, the mathematical method used in step c) comprises a Cox regression analysis or a CART analysis.
In another embodiment, the mathematical method is a Cox regression and the score (SC) is generated according to the following formula: SC=a×overEGFR+b×underEGFR, wherein “a” is comprised in the interval [−1.85; +0.81] and “b” is comprised in the interval [−3.86; +0.70].
For example, the formula is: SC=−1.33×overEGFR−2.28×underEGFR.
The invention further relates to a method of assessing the clinical outcome of a female mammal suffering from breast cancer, comprising the steps of:
a) generating a metagene adjusted value underER by comparing the expression level, in a biological sample from said female mammal and in a control, of at least two genes, e.g. by using nucleic acid sequences selected in the group of Affymetrix® Probe Sets, of table IX or XII, preferably table XII (described below),
b) generating said metagene adjusted value underPR by comparing the expression level, in a biological sample from said female mammal and in a control, of at least two genes, e.g. by using nucleic acid sequences selected in the group of Affymetrix® Probe Sets, of table X or XIII, preferably table XIII (described below),
c) generating said metagene adjusted value underEGFR by comparing the expression level, in a biological sample from said female mammal and in a control, of at least two genes, e.g. by using the nucleic acid sequences selected in the group of Affymetrix® Probe Sets, of table XI or XIV, preferably table XIV (described below),
d) generating a score (SC) from said metagene adjusted values using a mathematical method establishing a relation between the combined metagene values and the clinical outcome of said female mammal.
In one embodiment, the mathematical method used in step d) comprises a Cox regression or CART analysis.
In another embodiment, the mathematical method used in step d) is a Cox regression and the score (SC) is generated according to the following formula: SC=a×underER+b×underPR+c×underEGFR, wherein “a” is comprised in the interval [−6.26; +0.49], “b” is comprised in the interval [−2.65; +0.29] and “c” is comprised in the interval [−6.69; +1.65].
For example, the formula is: SC=−2.90279×underER−1.47423×underPR−4.17198×underEGFR.
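By way of illustration only, the three-metagene score may be sketched in Python as follows. The function name `score_sc`, the dictionary layout and the metagene values of the sample are assumptions made for this sketch; the example coefficients and the intervals are those stated above.

```python
# Illustrative sketch only: names, data layout and sample values are assumptions.

# Coefficient intervals stated above for the score (SC).
INTERVALS = {
    "underER": (-6.26, 0.49),
    "underPR": (-2.65, 0.29),
    "underEGFR": (-6.69, 1.65),
}

# Example coefficients stated above.
EXAMPLE_COEFFS = {
    "underER": -2.90279,
    "underPR": -1.47423,
    "underEGFR": -4.17198,
}


def score_sc(metagenes, coeffs=EXAMPLE_COEFFS):
    """Return SC = a*underER + b*underPR + c*underEGFR."""
    # Reject coefficients that fall outside the stated intervals.
    for name, value in coeffs.items():
        low, high = INTERVALS[name]
        if not low <= value <= high:
            raise ValueError(f"coefficient for {name} outside [{low}; {high}]")
    return sum(coeffs[name] * metagenes[name] for name in INTERVALS)


# Hypothetical metagene adjusted values for one tumour sample.
sample = {"underER": 0.10, "underPR": -0.20, "underEGFR": -0.05}
print(round(score_sc(sample), 4))
```

Any other set of coefficients lying within the stated intervals can be passed through the `coeffs` argument.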
Preferably, the comparing of expression level at each step a), b) and c) is performed with at least 5, preferably 10, preferably all of said genes or nucleic acid sequences of each respective group.
In various embodiments, said methods may comprise the first step of quantifying in a biological sample from said female mammal the expression level of said nucleic acids sequences.
In other various embodiments, these methods can comprise the step e) of comparing said score (SC) from the biological sample with a baseline or a score (SC) from a control sample.
In other various embodiments, said biological sample is a breast tumor sample. By “sample” is meant a cell or a tissue.
In other various embodiments, said methods further comprise a step of taking at least one biological sample from said female mammal.
In another embodiment, said methods comprise a step of administering a pharmaceutical treatment, preferably a chemotherapy treatment, to a female mammal, for optimizing the clinical outcome of said female mammal in response to said treatment. The pharmaceutical treatment may comprise the use of one or more taxane compounds, e.g., docetaxel or paclitaxel. This treatment may be administered if the female mammal has not responded to a previous anti-cancer treatment, e.g., a treatment comprising the use of one or more anthracycline compounds, e.g., epirubicin, doxorubicin, pirarubicin, idarubicin, zorubicin or aclarubicin, preferably epirubicin.
In a further aspect, the methods according to the invention may be used for identifying a female mammal that has not responded to a previous anti-cancer treatment, e.g., a treatment comprising the use of one or more anthracycline compounds, e.g., epirubicin, doxorubicin, pirarubicin, idarubicin, zorubicin or aclarubicin, preferably epirubicin.
In other various embodiments, a comparison of or analysis of data may involve a statistical computer mediated analysis. Also, said methods may optionally further involve generating a printed report.
The invention further relates to a computer program comprising instructions for performing said methods.
Finally, the invention relates to a recording medium for recording said computer program.
Unless otherwise noted, technical terms are used according to conventional usage.
In order to facilitate review of the various embodiments of the invention, the following explanation of specific terms is provided:
Mammals correspond to animals such as humans, mice, rats, guinea pigs, monkeys, cats, dogs, pigs, horses, or cows, preferably to humans, and most preferably to women;
Biological sample: any biological material, such as a cell, a tissue sample, or a biopsy from breast cancer.
A “Metagene” as used herein corresponds to a group of genes for which expression variation (but not necessarily expression level) across tumors is correlated. A metagene can be simply calculated by one of skill in the art according to the method as described in the examples.
A “Control” as used herein corresponds to one or more biological samples from a cell, a tissue sample or a biopsy from breast. Said control may be obtained from the same female mammal as the one to be tested or from another female mammal, preferably from the same species, or from a population of female mammals, preferably from the same species, that may be the same as or different from the test female mammal or subject. Said control may correspond to a biological sample from a cell, a cell line, a tissue sample or a biopsy from breast cancer. Preferably, the expression of EGFR, ER, PR and/or KI-67 has been established for this biological sample, by IHC (immunohistochemistry), FISH (fluorescence in situ hybridization) or quantitative PCR, for example.
In silico research: Literally referring to “in computer” systems, in silico research involves methods to test biological models, drugs, and other interventions using computer models rather than laboratory (in vitro) and animal (in vivo) experiments. In silico methods can involve analyzing an existing database, for instance a database that includes one or more records that include quantitative analysis of nucleic acid sequence expression. Analysis of such databases may include mining, parsing, selecting, identifying, sorting, or filtering of the data in the database. Data in the database can also be subjected to a clustering algorithm, discrimination algorithm, difference test, correlation, regression algorithm or other statistical modeling algorithm.
Using in silico research, drug treatment can be selected, tested and validated, and experimental strategies can be assessed. In silico systems complement laboratory-based research, yet increase productivity and efficiency by minimizing the need for in vitro and in vivo laboratory experiments.
In certain embodiments provided herein, in silico systems are used. In particular, this disclosure provides in silico methods for assessing a condition related to the clinical outcome of a female mammal suffering from breast cancer. Such methods involve assessing data in a database. The data in the database usually includes a quantity of nucleic acids from a biological sample from one or more individuals.
Quantitative data as discussed herein include molar quantitative data or relative data (variation of expression compared to control) for individual nucleic acid sequences, or subsets of nucleic acid sequences. Quantitative aspects of nucleic acids samples may be provided and/or improved by including one or more quantitative internal standards during the analysis, for instance one control nucleic acid sequence. Internal standards described herein enable true quantification of each nucleic acid sequence expression.
Truly quantitative data can be integrated from multiple sources (whether it is work from different labs, samples from different subjects, or merely samples processed on different days) into a single seamless database, regardless of the number of nucleic acid sequences measured in each discrete, individual analysis.
In any of the provided methods, a comparison of or an analysis involves a statistical or computer-mediated analysis.
The mathematical model (or method) for establishing a relation between the combined metagene adjusted values and the clinical outcome is built on a population of female mammals showing the same ethnicity and the same breast cancer characteristics as the female mammal to be tested.
The metagene coefficients (a, b, c) in the formulas used to calculate the scores (SC) may vary according to the tumor sample database used, which consists of female mammals showing the same ethnicity and the same characteristics. A skilled person may calculate these coefficients by using a so-called Cox regression as described in Wright et al. (Proc. Natl. Acad. Sci. USA, vol. 100 (17), p. 9991-9996, 2003).
Optionally, in some of the provided embodiments, the methods further involve comparing the score (SC) from the female mammal to the score (SC) from another female mammal, preferably from the same species, or a compiled score (SC) from a population of female mammals, preferably from the same species, that may be the same as or different from the test female mammal or subject.
In specific examples of such methods, the control is a baseline corresponding to a score (SC) established from a population of female mammals.
The baseline is simply determined by one of skill in the art in view of the protocol described in the examples. An optimal baseline is obtained by using score distribution separating tumors into two groups of most significant different outcome.
As an example (described below), the inventors have established that a woman having a score (SC) of more than 0.136 has at least twice the propensity for a poor clinical outcome as a woman with a score (SC) of less than 0.0393.
Any of the provided methods can further involve generating a printed report, for instance a report of some or all the data, of some or all the conclusions drawn from the data, or of a score or comparison between the results of a subject or individual and other individuals or a control or baseline.
There are many ways to collect quantitative or relative data on nucleic acid sequences, and the analytical methodology does not affect the utility of nucleic acid sequence expression in assessing the clinical outcome of a female mammal suffering from breast cancer. Methods for determining quantities of nucleic acid expression in a biological sample are well known to one of skill in the art. As examples of such methods, one can cite northern blot, cDNA arrays, oligo arrays or quantitative Reverse Transcription-PCR.
Preferably said methodology is cDNA arrays or oligo arrays, which allow the quantitative study of the mRNA expression levels of numerous candidate genes.
DNA arrays consist of large numbers of DNA molecules spotted in a systematic order on a solid support or substrate such as a nylon membrane, glass slide, glass beads or a silicon chip. Depending on the size of each DNA spot on the array, DNA arrays can be categorized as microarrays (each DNA spot has a diameter less than 250 microns) and macroarrays (spot diameter is greater than 300 microns). When the solid substrate used is small in size, arrays are also referred to as DNA chips. Depending on the spotting technique used, the number of spots on a glass microarray can range from hundreds to thousands.
Typically, a method of monitoring gene expression by DNA array involves the following steps:
a) obtaining a polynucleotide sample from a subject;
b) reacting the sample polynucleotide obtained in step (a) with a probe immobilized on a solid support, wherein said probe consists of polynucleotides having the nucleic acid sequences as previously described, fragments, derivatives or complementary sequences thereof; and
c) detecting the reaction product of step (b).
In the present invention, the term “polynucleotide” refers to a polymer of RNA or DNA that is single- or double-stranded, optionally containing synthetic, non-natural or altered nucleotide bases. A polynucleotide in the form of a polymer of DNA may be comprised of one or more segments of cDNA, genomic DNA or synthetic DNA.
In the present invention, the term “fragment” refers to a sequence of nucleic acids that allows a specific hybridization under stringent conditions, as an example more than 10 nucleotides, preferably more than 15 nucleotides, and most preferably more than 25 nucleotides, as an example more than 50 nucleotides or more than 100 nucleotides.
In the present invention, the term “derivative” refers to a sequence having more than 80% identity with an identified nucleic acid sequence, preferably more than 90% identity, as an example more than 95% identity, and most particularly more than 99% identity.
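As a purely illustrative sketch, the identity thresholds given above can be checked in Python for two sequences that are assumed to be already aligned, of equal length and free of gaps. A real comparison would first require a sequence alignment, which is not shown here; the function names and the example sequences are hypothetical.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of identical positions between two pre-aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(seq_a.upper(), seq_b.upper()) if x == y)
    return 100.0 * matches / len(seq_a)


def is_derivative(candidate, reference, min_identity=80.0):
    """True if the candidate has more than `min_identity` percent identity with the reference."""
    return percent_identity(candidate, reference) > min_identity


# Hypothetical 10-nucleotide sequences differing at one position.
print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))
print(is_derivative("ACGTACGTAC", "ACGTACGTAA"))
```

The `min_identity` argument can be raised to 90, 95 or 99 to reflect the preferred thresholds.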
In the present invention, the term “immobilized on a support” means bound directly or indirectly thereto including attachment by covalent binding, hydrogen bonding, ionic interaction, hydrophobic interaction or otherwise.
The polynucleotide sample isolated from the subject and obtained at step (a) is RNA, preferably mRNA. Said polynucleotide sample isolated from the patient can also correspond to cDNA obtained by reverse transcription of the mRNA, or a product of ligation after specific hybridization of specific probes to mRNA or cDNA.
Preferably, the polynucleotide sample obtained at step (a) is labeled before its reaction at step (b) with the probe immobilized on a solid support. Such labeling is well known to one of skill in the art and includes, but is not limited to, radioactive, colorimetric, enzymatic, molecular amplification, bioluminescent, electrochemical or fluorescent labeling.
Advantageously, the reaction product detected at step (c) is quantified by further comparison of said reaction product to a control sample.
Detection preferably involves calculating/quantifying a relative expression (transcription) level for each nucleic acid sequence.
Then, determining the relative expression level for each of the nucleic acid sequences previously described makes it possible to assess the clinical outcome of the subject (i.e. the female mammal) suffering from breast cancer by the method of the invention.
The method of assessing the clinical outcome of a female mammal suffering from breast cancer can further involve a step of taking a biological sample, preferably breast cancer tissue or cells, from a female mammal. Such methods of sampling are well known to one of skill in the art; surgery can be cited as an example.
The provided method may also correspond to an in vitro method, which does not include such a step of sampling.
Also provided are methods to determine if a pharmaceutical treatment, especially chemotherapy treatment, influences the clinical outcome of a female mammal suffering from breast cancer, which methods involve quantifying said nucleic acids sequences expression in a biological sample from a female mammal and determining the score (SC) for said female mammal.
Further embodiments are methods to assess or identify a therapeutic or pharmaceutical agent for its potential effectiveness, efficacy or side effects relating to the clinical outcome, which methods involve quantifying said nucleic acids sequences in a biological sample from a female mammal suffering from breast cancer and determining the score (SC) for said female mammal.
Also provided herein are methods of assessing a change in the propensity of clinical outcome from a female mammal suffering from breast cancer, wherein the methods involve taking at least two biological samples from the female mammal, one of which is taken before and one after an event. In various specific embodiments, the event involves passage of time (e.g., minutes, hours, days, weeks, months, or years), treatment with a therapeutic agent (or putative or potential therapeutic agent), or treatment with a pharmaceutical agent (or putative or potential pharmaceutical agent).
One specific provided embodiment is a method of determining whether or to what extent a condition influences the clinical outcome of a female mammal suffering from breast cancer. This method involves subjecting a subject to the condition, taking a biological sample from the subject, analyzing the biological sample to produce a score (SC) for said subject, and comparing said score (SC) for the subject with a control. From this comparison, conclusions are drawn about whether or to what extent the condition influences the clinical outcome of female mammal suffering from breast cancer based on differences or similarities between the test score (SC) and the control. As contemplated for this embodiment, a condition to which the subject is subjected can include but is not limited to application of a pharmaceutical or therapeutic agent or candidate agent.
Subject: a female mammal.
In specific examples of such methods, the nucleic acids sequences expression profile is a pre-condition score (SC) from the subject or a compiled score (SC) assembled from a plurality of individual score (SC). In other examples, the control score (SC) is a control or a baseline established from previously described control score (SC).
Pharmaceutical treatment: any agent treatment, regimen, or dosage, such as the administration of a protein, a peptide (e.g., hormone), other organic molecule or inorganic molecule or compound, or combination thereof, that has or should have beneficial effects on clinical outcome when properly administered to a subject; preferably said agents are used in chemotherapy.
Unless otherwise explained, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, suitable methods and materials are described below. All publications, patent applications, patents, and other references mentioned herein are incorporated by reference in their entirety. In case of conflict, the present specification, including explanations of terms, will control. In addition, the materials, methods, and examples are illustrative only and not intended to be limiting.
In various embodiments, the provided methods further comprise the step of selecting the pharmaceutical treatment that improves the clinical outcome of a female mammal suffering from breast cancer.
The present invention will be understood more clearly on reading the description of the experimental studies performed in the context of the research carried out by the applicant, which should not be interpreted as being limiting in nature.
While it is now possible to assess patients' responses to drugs with respect to their genomic profile, the standard adjuvant chemotherapy (anthracyclines and taxanes) for non metastatic breast cancer may not be systematically appropriate: according to their genomic profile, women may rather benefit from a treatment based on anthracyclines alone without taxane.
The primary objective was to identify a gene set that discriminates two groups of patients with different clinical outcomes based on gene expression. This goal was reached by: defining the gene expression profiles, using 9,000-gene microarrays, of 323 tumours obtained from patients treated with adjuvant anthracycline-based CT without taxanes (identification set), grouping individual genes in metagenes and identifying metagenes closely correlated with the biological status of ER, PR, HER2/Neu, MIB/KI67 and EGFR of the sample as determined by means of independent methods such as immunohistochemistry or FISH. We then combined these metagenes using a Cox proportional hazard ratio analysis to separate patients according to clinical outcome. This latter step provided a model consisting of a score expressed as a linear combination such as Score=Σβi·xi, where βi is a fixed parameter and xi is the value of metagene i.
The secondary objective was to prospectively validate the Cox model and its metagene component for predicting clinical outcome in an independent cohort of patients (validation set). This goal was reached by defining the gene expression profiles of 164 tumours, using the same technology, obtained from patients treated with adjuvant anthracycline-based CT without taxanes in the context of a multicentric clinical trial.
We profiled a multicentric and retrospective series of 504 early breast cancers (Institut Paoli Calmettes, Centre Léon Bérard, Institut Bergonié and tumours from clinical trials PACS01 and PEGASE01) treated with adjuvant anthracycline-based and non taxane-based CT. Clinical and pathological criteria for each patient are summarized in the following table and correspond to the identification and the validation sets.
Global population demography:
Identification Set demography (IPC, Lyon, Total):
Validation Set demography (PACS01, Bordeaux, total):
Radio-labeled [α-33P]-dCTP cDNA probes are obtained by reverse transcription from 3 μg of total RNA. Probes are then hybridised on IPSOGEN's 10K DiscoveryChip™, consisting of nylon membranes containing 9600 spotted cDNA (Discovery™ platform).
Following hybridization, membranes are washed and exposed to phosphor-imaging plates, then scanned with a Fuji-BAS 5000 machine. Signal intensities are quantified using the Fuji ArrayGauge v1.2 program, and the resulting raw data are analysed.
Raw data are exported from the Ipsogen database. Spots for which the amount of spotted DNA is too low are excluded from further analysis. Data are then normalized against a reference sample using a non-linear rank-based method (Sabatti et al., 2002). Normalized data are then filtered to eliminate low-intensity genes, for which the expression level is comparable to the non-specific signal and the measurement is highly uncertain.
Data quality controls are performed based on hierarchical clustering, which groups samples and genes according to their profile similarity. The biological pertinence of the sample and gene clusters ensures good-quality data and allows for further analysis.
Since we analysed several sample series, we performed a supplementary data normalization to ensure inter-series comparability. Comparability was checked by hierarchical clustering.
We performed supervised analysis using the MaxT method available on Bioconductor (Ge, Dudoit & Speed, 2003) for several phenotypic markers: ER, PR, HER2/Neu, MIB/KI67, EGFR. The five markers were all measured by standard immunohistochemistry (IHC).
Supervised analyses were performed on a 159-sample identification set for the ER, PR, HER2/Neu and EGFR markers, and on a 114-sample identification set for MIB/KI67. Each identified signature was then validated on one to four independent datasets.
Validation consisted of status prediction for independent samples using the LPS method (Linear Predictor Score) (Wright et al., PNAS, 2003, vol. 100, no. 17, 9991-9996). Prediction of all independent samples allowed the sensitivity and specificity of each identified signature to be evaluated.
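The evaluation of sensitivity and specificity from predicted and measured marker status can be sketched as follows. This is a minimal Python illustration under stated assumptions: the LPS classifier itself is not reproduced, the function name is hypothetical, and the status lists are invented for the example (True standing for a marker-positive tumour).

```python
def sensitivity_specificity(true_status, predicted_status):
    """Return (sensitivity, specificity) for paired boolean status lists."""
    tp = fn = tn = fp = 0
    for truth, predicted in zip(true_status, predicted_status):
        if truth and predicted:
            tp += 1          # positive tumour predicted positive
        elif truth and not predicted:
            fn += 1          # positive tumour predicted negative
        elif not truth and predicted:
            fp += 1          # negative tumour predicted positive
        else:
            tn += 1          # negative tumour predicted negative
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical status of eight independent samples (e.g. by IHC) and their predictions.
measured = [True, True, True, False, False, False, False, True]
predicted = [True, True, False, False, False, True, False, True]
print(sensitivity_specificity(measured, predicted))
```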
We considered as a metagene a group of genes whose expression variation (but not necessarily expression level) across tumors is correlated. The assumption is that the error made on the measurement of the expression level of a single gene is greatly reduced when several genes are considered. So even if an individual gene is poorly measured, its contribution to the metagene value is diluted by the number of genes considered, and the final value of the metagene is only slightly affected.
Metagenes were calculated from both supervised and unsupervised data.
Metagenes from phenotypic signatures: Phenotypic signatures correspond to genes correlated with a given phenotypic marker assessed by current standards such as immunohistochemistry (IHC) or FISH. A gene is considered correlated on the basis of a modified t test (MaxT method), which tests the significance of differential expression with a 5% risk. Each phenotypic signature is composed of two gene subsets whose expression levels are anti-correlated. One group of genes is overexpressed in a group of tumours (for example ER+ tumours) while the other group is underexpressed in the same group of tumours. Although expression variation is correlated across samples, expression levels may vary between genes, which would make a plain average of expression levels non-robust. It is assumed that even if expression levels vary, differential expression relative to a reference sample falls within the same dynamic range for all genes, which allows an average to be calculated. For each tumour, each gene measurement is divided by the expression level of the gene in a reference sample and log-transformed (log ratio), and the corresponding metagene is the average of those log ratios.
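The metagene calculation described above (average of the log ratios against a reference sample) can be sketched in Python as follows. The base-2 logarithm, the function name, the gene identifiers and the expression values are assumptions made for this illustration; the description does not state the logarithm base.

```python
import math


def metagene_value(tumour_expr, reference_expr, genes):
    """Mean of log2(tumour / reference) over the genes of one metagene."""
    log_ratios = [math.log2(tumour_expr[g] / reference_expr[g]) for g in genes]
    return sum(log_ratios) / len(log_ratios)


# Hypothetical normalized expression levels for three genes of one metagene.
tumour = {"geneA": 8.0, "geneB": 2.0, "geneC": 16.0}
reference = {"geneA": 4.0, "geneB": 4.0, "geneC": 4.0}
print(metagene_value(tumour, reference, ["geneA", "geneB", "geneC"]))
```

Because each gene is expressed relative to the same reference sample, genes with very different absolute expression levels contribute on a comparable scale to the average.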
Each signature allowed the calculation of two anti-correlated metagenes. For instance, ER signature gives 2 metagenes, underER (genes under expressed in ER+ tumours) and overER (genes over expressed in ER+ tumours).
Metagenes from unsupervised analyses: we also defined metagenes as groups of genes with correlated expression variation across samples based on hierarchical clustering on a 468-sample set. A group of genes was retained if it contained at least 5 genes and had a node correlation coefficient higher than 0.5. Groups of genes that corresponded to metagenes previously identified by supervised analysis were not further considered. Metagenes were obtained as the mean of the log ratios of the genes contained in a given group.
Since we failed to identify any robust gene signature for metastasis based on classical supervised analysis, it appears that a single set of correlated genes is not able to predict metastasis.
The biostatistic approach was then based on survival analysis, and the objective was, instead of separating metastasis from non metastasis patients, to identify two groups of patients with significantly different outcome. The event considered is the metastasis without considering any previous event such as local relapse.
Model calculation: We used Cox regression to identify a combination of metagenes able to add prognostic information to already existing prognostic factors, such as SBR grade, tumour size, or lymph node involvement. Cox proportional hazard ratio analysis consists in the calculation of a likelihood function, which gives for a patient the probability of observing the event (death, metastasis) at a given time, knowing that the patient survived until this time. The likelihood function is independent of time, and takes into account a “baseline” risk which is common to every patient, and the risk which is associated with different explanatory variables (whose values differ between patients). The baseline risk function is unknown and is eliminated as long as ratios between patients are considered. Then, the log-likelihood is defined as a linear function of explanatory variables, each one being appropriately weighted by a given coefficient. The coefficients are estimated by the algorithm to maximize the log-likelihood function.
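For illustration, the quantity that a Cox regression maximizes can be written as a partial log-likelihood in which the baseline risk cancels out. The Python sketch below uses the standard textbook form and assumes that no two events occur at the same time; it is not the software used by the inventors, and the function name and data are hypothetical. Only the evaluation of the function is shown, not its maximization.

```python
import math


def cox_partial_log_likelihood(beta, times, events, covariates):
    """Cox partial log-likelihood, assuming no tied event times.

    beta: list of coefficients, one per explanatory variable.
    times: follow-up time of each patient.
    events: 1 if the event (e.g. metastasis) was observed, 0 if censored.
    covariates: one list of explanatory variable values per patient.
    """
    # Linear predictor (weighted sum of explanatory variables) per patient.
    linear = [sum(b * x for b, x in zip(beta, row)) for row in covariates]
    total = 0.0
    for i, (t_i, event) in enumerate(zip(times, events)):
        if not event:
            continue  # censored patients contribute only through the risk sets
        # Risk set: patients still under observation at time t_i.
        risk = sum(math.exp(linear[j]) for j, t_j in enumerate(times) if t_j >= t_i)
        total += linear[i] - math.log(risk)
    return total


# Hypothetical data: three patients, one metagene value each.
times = [1.0, 2.0, 3.0]
events = [1, 1, 1]
metagene = [[0.5], [1.0], [-0.3]]
print(cox_partial_log_likelihood([0.0], times, events, metagene))
print(cox_partial_log_likelihood([-1.0], times, events, metagene))
```

With all coefficients at zero, every patient in a risk set is equally likely to experience the event, which gives a convenient sanity check for the implementation.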
For this, we used a forward stepwise approach to select the most significant metagenes, the threshold p-value being fixed at 10%. To obtain a model dependent on metagene information and not influenced by already known clinical parameters, the analysis was stratified on the clinical parameters SBR grade, tumour size and lymph node involvement, summarized in a single parameter, the NPI (Nottingham Prognostic Index). Moreover, since the identification set was composed of patients originating from different anti-cancer centers, we also stratified the analysis on the center of origin.
Once a combination of metagenes was obtained we calculated for each patient a score based on the linear combination of the metagenes values weighted by the coefficient calculated by the algorithm for each metagene. The exponential value of the coefficient corresponds to the hazard ratio associated to the metagene. For each parameter estimation, the algorithm gives the 95% confidence interval. Hence any combination of values comprised in the confidence intervals can be used to separate patients into significantly different prognostic groups.
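As a short illustration of the relation between a coefficient and its hazard ratio, the following Python sketch exponentiates the example coefficients given in this description for the three-metagene score. The dictionary layout is an assumption made for the sketch.

```python
import math

# Example coefficients of the three-metagene score given in this description.
coefficients = {"underER": -2.90279, "underPR": -1.47423, "underEGFR": -4.17198}

# The hazard ratio associated with a metagene is the exponential of its coefficient.
hazard_ratios = {name: math.exp(beta) for name, beta in coefficients.items()}

for name, hr in hazard_ratios.items():
    print(f"{name}: hazard ratio per unit increase = {hr:.4f}")
```

A negative coefficient gives a hazard ratio below 1, meaning that a higher metagene value is associated with a lower modelled risk of the event.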
Prognostic groups determination: The distribution of the scores in the identification set was used to determine the most significant cut-off to separate patients into two groups of different outcome. We tested three thresholds, 1st, 2nd, and 3rd quartile, and performed in each case the logrank test to compare the two groups of patients. We used a step by step approach to define the optimized threshold, testing all score values as a potential threshold.
The cut-off was the one for which the p value associated to the log rank test was the most significant.
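The threshold search can be sketched as follows. This Python illustration uses a textbook two-group log-rank test (chi-square statistic with one degree of freedom) and tries every observed score as a candidate cut-off; the function names and the patient data are hypothetical, and the actual statistical software used is not specified in this description.

```python
import math


def logrank_p_value(times, events, in_group):
    """Two-group log-rank test; returns the p value (chi-square, 1 degree of freedom)."""
    observed = expected = variance = 0.0
    for t in sorted({tt for tt, e in zip(times, events) if e}):
        at_risk = [g for tt, g in zip(times, in_group) if tt >= t]
        n = len(at_risk)                 # patients at risk at time t
        n1 = sum(at_risk)                # of which in the tested group
        d = sum(1 for tt, e in zip(times, events) if e and tt == t)
        d1 = sum(1 for tt, e, g in zip(times, events, in_group) if e and tt == t and g)
        observed += d1
        expected += d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 1.0
    chi2 = (observed - expected) ** 2 / variance
    return math.erfc(math.sqrt(chi2 / 2))


def best_cutoff(scores, times, events):
    """Try each observed score as a cut-off; keep the one with the smallest p value."""
    best = None
    for cutoff in sorted(set(scores))[:-1]:  # the largest score would leave one group empty
        high = [s > cutoff for s in scores]
        p = logrank_p_value(times, events, high)
        if best is None or p < best[1]:
            best = (cutoff, p)
    return best


# Hypothetical scores, follow-up times (months) and metastasis events for eight patients.
scores = [-0.4, -0.3, -0.2, -0.1, 0.0, 0.1, 0.2, 0.3]
times = [60, 58, 62, 55, 50, 20, 15, 10]
events = [0, 0, 0, 0, 1, 1, 1, 1]
print(best_cutoff(scores, times, events))
```

Patients with a score above the candidate cut-off form one group and the remaining patients the other, consistent with the assignment rule used for sample prediction.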
Validation on an independent validation set: for each patient of the validation set, we calculated the score and separated the patients into two prognostic groups using the coefficients and the threshold determined on the identification set. The score was calculated without considering the outcome (DFS-Disease Free Survival) of individual patients.
The validation was assessed by the p value of the log rank test, which has to be <5% for the model to be considered validated.
We verified that the identified model effectively added relevant information as compared to standard parameters by performing multivariate Cox analyses which integrate clinical parameters and the model.
Sample prediction: For any new sample to be predicted, raw data are normalized according to the reference sample previously defined and metagenes are calculated. The formula calculated on the identification set is then applied to the new sample, allowing the attribution of a specific score to each sample. The score is compared to the threshold optimized from the identification procedure, and the patient is declared to belong to the good prognosis group if her score is lower than or equal to the threshold and to the poor prognosis group if her score is higher than the threshold.
We started from 9 metagenes calculated from supervised analyses, and 17 metagenes from unsupervised analysis.
A first analysis based on the correlation between metagenes and robustness reduced the potential candidates to 19 metagenes, 7 from supervised analysis and 12 from unsupervised analysis.
Each metagene was first tested in a univariate Cox analysis, and none of them could be found significant alone as shown in the following table.
Multivariate Cox analyses allowed identification of significant metagenes and combinations thereof associated with prognosis. The constituents of the selected metagenes and these combinations are described hereafter.
The Cox analysis using forward stepwise procedure identified the three following significant metagenes (underER, underPR and underEGFR) associated with good or poor prognosis.
Multivariate Cox analysis allowed estimation of parameters corresponding to each of the selected metagenes:
On the basis of these parameters, the score for prognosis has been established as follows:
Score=−2.90279*underER−1.47423*underPR−4.17198*underEGFR
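As a minimal sketch, the score above can be read as a plain weighted sum of the three metagene values. The function name, the dictionary layout and the sample values below are assumptions made for illustration only; the coefficients are the ones given in the formula.

```python
# Coefficients of the three-metagene model, taken from the formula above.
COEFFICIENTS = {"underER": -2.90279, "underPR": -1.47423, "underEGFR": -4.17198}


def prognosis_score(metagenes, coefficients=COEFFICIENTS):
    """Weighted sum of metagene values (hypothetical helper, not from the source)."""
    return sum(coef * metagenes[name] for name, coef in coefficients.items())


# Made-up metagene values for one sample, for illustration only.
sample = {"underER": -0.01, "underPR": 0.02, "underEGFR": -0.03}
score = prognosis_score(sample)
```

The same helper would apply to the other combinations described in this document by passing a different coefficient dictionary.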
Threshold optimization: we tested all the possible thresholds. As an example, the 1st, 2nd and 3rd quartiles of the score distribution of the training set gave p values of 0.502, 0.0057 and <0.0001, respectively, for the log rank test.
The 3rd quartile (cut-off=0.087646) was then defined as the optimal cut-off to separate patients into two groups with the highest significance.
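The quartile scan described above can be sketched as follows. This is an illustrative implementation under stated assumptions, not the inventors' code: the function names are hypothetical, the log rank test is the standard two-group version with one degree of freedom, and the quartiles are computed with the default method of Python's `statistics.quantiles`. A real analysis would more likely rely on a dedicated survival-analysis package.

```python
import math
from statistics import quantiles


def logrank_p(times_a, events_a, times_b, events_b):
    """Two-sided p value of the two-group log rank test (1 degree of freedom)."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    observed = expected = variance = 0.0
    for t in sorted({t for t, e, _ in data if e}):  # distinct event times
        n = sum(1 for tt, _, _ in data if tt >= t)               # at risk, both groups
        n_a = sum(1 for tt, _, g in data if tt >= t and g == 0)  # at risk, group a
        d = sum(1 for tt, e, _ in data if tt == t and e)         # events at time t
        d_a = sum(1 for tt, e, g in data if tt == t and e and g == 0)
        observed += d_a
        expected += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        return 1.0
    chi2 = (observed - expected) ** 2 / variance
    return math.erfc(math.sqrt(chi2 / 2))  # chi-square survival function, 1 df


def scan_quartiles(scores, times, events):
    """Return (cut-off, p value) for the 1st, 2nd and 3rd quartiles of the scores."""
    results = []
    for cut in quantiles(scores, n=4):
        low = [i for i, s in enumerate(scores) if s <= cut]   # candidate good group
        high = [i for i, s in enumerate(scores) if s > cut]   # candidate poor group
        p = logrank_p([times[i] for i in low], [events[i] for i in low],
                      [times[i] for i in high], [events[i] for i in high])
        results.append((cut, p))
    return results
```

The cut-off retained would then be the one with the smallest p value, for example `min(scan_quartiles(scores, times, events), key=lambda r: r[1])`.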
The error on the score was integrated by calculating a confidence interval around the threshold, within which sample classification was considered non robust. Considering the score distribution Gaussian, we estimated the confidence interval around the threshold using standard deviation calculation method (estimated standard deviation of the population/√n).
The inventors have established that a woman having a score (SC) of more than 0.136 has at least twice the propensity for a poor clinical outcome of a woman with a score (SC) of less than 0.0393.
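The interval calculation and the resulting three-way classification can be sketched as below. Several points are assumptions: the multiplier `k` applied to the standard error (the text only gives "estimated standard deviation of the population/√n"), the function names, and the label used for scores falling inside the interval. The bounds 0.0393 and 0.136 are the ones quoted above.

```python
import math
from statistics import stdev


def threshold_interval(training_scores, threshold, k=1.0):
    """Interval threshold +/- k * sd / sqrt(n). ASSUMPTION: k=1 is not stated in the text."""
    half_width = k * stdev(training_scores) / math.sqrt(len(training_scores))
    return threshold - half_width, threshold + half_width


def classify(score, lower, upper):
    """Assign a prognostic group, leaving scores inside the interval unclassified."""
    if score < lower:
        return "good prognosis"
    if score > upper:
        return "poor prognosis"
    return "not robust"  # hypothetical label for the non-robust zone


# Bounds quoted in the text for the three-metagene model.
LOWER, UPPER = 0.0393, 0.136
groups = [classify(s, LOWER, UPPER) for s in (0.02, 0.087646, 0.2)]
```

Treating the interval as a separate category is one reading of "non robust"; it is consistent with the small share of patients reported as not interpretable, but the source does not spell out that link.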
Model validation: the score was calculated for each of the 164 patients from the validation set and we separated the patients into two groups according to the cut-off determined on the identification set. On the 164 patients, the model was well validated (p=4.7 10−02, log rank test) and separated the patients into a good-prognosis group with 80% 5-year MFS (84% of patients) and a poor-prognosis group with 63% 5-year MFS (13% of patients), 3% of patients being not interpretable. On a subset of the validation set, consisting of the clinical trial PACS01 (N=128), we obtained similar validation (p=3.9 10−03, logrank test) with 88% 5-year MFS in the good-prognosis group (80% of patients) and 65% 5-year MFS in the poor-prognosis group (16% of patients, 4% of patients not interpretable).
Model performances: we performed multivariate analysis to determine the importance of the model as compared to standard clinical parameters. Even when considering grade, lymph node status, ER status, age, etc., the model was still significant in the multivariate analysis, suggesting that it provides independent, complementary and significant prognostic information.
Multivariate analysis on the global population (N=347)
Multivariate analysis on the identification set (N=222)
Multivariate analysis on the PACS01 clinical trial (N=108)
Metagenes Reduction:
In this model with underER, underPR and underEGFR, we defined the number of genes according to their significance in the metagene identification with the MaxT method. Even if the genes are well correlated with each other, some of them may be removed from further analysis, in order to reduce the number of genes to analyze and simplify the analysis process.
We calculated the correlation between each gene composing the metagene and the metagene itself, sorted the genes by increasing correlation to the metagene, and progressively eliminated the genes least correlated to the metagene, from one gene removed up to all genes but one removed.
For each of these new sets of genes, we calculated a new metagene and its correlation with the original metagene. We selected given correlation cut-offs varying from 0.91 to 0.99 and integrated the corresponding new metagene in the model. This allowed us to generate a new score and prognostic group for each patient and to compare the attribution of a given prognostic group between the original model and the model with the optimized metagene. The criterion was equivalence between the two patient classifications (with the original model and the optimized one) within the two prognostic groups.
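The reduction procedure can be illustrated with the sketch below. It rests on assumptions that the text does not confirm: the metagene is taken here as the per-sample mean of its genes, the correlation is Pearson's, and the loop keeps the smallest gene set whose metagene stays at or above the chosen correlation cut-off. All function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists (non-constant profiles assumed)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def metagene(profiles):
    """ASSUMPTION: a metagene is the per-sample mean of its gene profiles."""
    return [sum(vals) / len(vals) for vals in zip(*profiles)]


def reduce_metagene(genes, min_corr):
    """Drop the least correlated genes while the reduced metagene stays >= min_corr."""
    original = metagene(list(genes.values()))
    # Rank genes from most to least correlated with the original metagene.
    ranked = sorted(genes, key=lambda g: pearson(genes[g], original), reverse=True)
    kept = ranked
    for size in range(len(ranked) - 1, 0, -1):  # remove 1 gene, then 2, ... all but one
        candidate = ranked[:size]
        reduced = metagene([genes[g] for g in candidate])
        if pearson(reduced, original) < min_corr:
            break
        kept = candidate
    return kept
```

Each reduced gene set returned this way would then be plugged back into the score formula to reclassify the patients and compare the result with the original classification.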
As an example, we can reduce the number of genes in the metagene underER from 42 to 27 (Table I), while keeping 97% equivalence (meaning that only 3% of patients are predicted in the opposite prognostic group when optimizing the metagene) for patient classification in the two prognostic groups on the validation set. With 20 genes (Table I), the concordancy is still 95%.
In the same way, the metagene underPR may be reduced from 73 to 35 (Table II) and 6 genes (Table II) with 96% and 94% equivalence respectively for patient classification in the validation set.
The metagene underEGFR may be reduced from 71 to 34 (Table III) and 22 genes (Table III) with 95% and 91% concordancy respectively for patient classification in the validation set.
Considering optimization of the 3 metagenes, we reached on the validation set a concordancy of 91% and 90% with 102 and 50 genes respectively instead of the 186 genes used in the original model.
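The equivalence (concordancy) figures quoted above correspond to the fraction of patients assigned to the same prognostic group by the original and the reduced model. A minimal sketch, with a hypothetical function name and made-up labels:

```python
def concordance(groups_original, groups_reduced):
    """Fraction of patients placed in the same prognostic group by both models."""
    if len(groups_original) != len(groups_reduced):
        raise ValueError("both classifications must cover the same patients")
    same = sum(1 for a, b in zip(groups_original, groups_reduced) if a == b)
    return same / len(groups_original)


# Made-up example: one of four patients changes group after metagene reduction.
original = ["good", "good", "poor", "poor"]
reduced = ["good", "poor", "poor", "poor"]
agreement = concordance(original, reduced)
```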
Since ER and EGFR markers are correlated, with the majority of EGFR+ being ER−, we found another combination that could replace the metagenes underER and underPR by a single metagene overEGFR.
Multivariate Cox analysis allowed estimation of parameters corresponding to each of the selected metagenes:
On the basis of these parameters, the score for prognosis has been established as follows:
Score=−1.33*overEGFR−2.28*underEGFR
Threshold optimization: the 3rd quartile was selected (cut-off=0.14), associated with a [0.103-0.177] confidence interval, separating patients into two groups with 79% 5-year MFS in the good prognosis group and 60% 5-year MFS in the poor prognosis group (p=0.041, logrank test).
Model validation: we calculated the score for the 164 patients of the validation set with the formula identified on the training set, and separated the patients according to the defined threshold. The model was well validated (p=1.1 10−03, log rank test), with 82% MFS at 5 years in the good prognosis group (76% of patients), and 54% MFS in the poor prognosis group (20% of patients, 5% of patients not interpretable). On a subset of the validation set, constituted of the clinical trial PACS01 (N=128), we obtained similar validation (p=2.9 10−03, logrank test) with 87% of 5-year MFS in the good-prognosis group (75% of patients) and 60% of 5-year MFS in the poor-prognosis group (19% of patients, 6% of patients not interpretable).
Model performances: we performed multivariate analysis to determine the importance of the model as previously.
Multivariate analysis on the global population (N=347)
Multivariate analysis on the training set (N=222)
Multivariate analysis on the PACS01 clinical trial (N=108)
Metagenes Reduction:
We optimized the number of genes to analyse in underEGFR and overEGFR signature as described previously for the other metagenes.
The metagene overEGFR could be reduced from 19 to 12 (Table IV) or 5 genes (Table IV) with a concordancy of 96% and 94% respectively on the validation set.
Taken with the optimized underEGFR metagene, we obtained a concordancy of 95% and 91% considering 37 (Table III) and 24 genes (Table III) respectively instead of 92.
Some metagenes could be reduced to a single gene while still having a significant prognostic value.
An example of such a gene-based model contains SCUBE2 (SEQ ID NO: 681) and IGKC (SEQ ID NO: 1107 or 1099). SCUBE2 is an element of underEGFR metagene, while IGKC is part of overEGFR metagene.
Threshold optimization: the 3rd quartile (cut-off=0.095, confidence interval [0.0513-0.1387]) was the most significant (p=9.1 10−04, logrank test) and separated the identification set into a good-prognosis group (77% MFS at 5 years) and a poor-prognosis group (51% MFS at 5 years).
Model validation: we used the coefficients and the threshold previously calculated to separate the 164 patients from the validation set into two groups whose outcomes differed significantly (p=4 10−04, logrank test). The good prognosis group had a 5-year MFS of 83% (69% of the patients) while the poor prognosis group had a 5-year MFS of 55% (24% of the patients, 7% of patients not interpretable). On a subset of the validation set, consisting of the clinical trial PACS01 (N=128), we obtained similar validation (p=1.3 10−03, logrank test) with 90% 5-year MFS in the good-prognosis group (69% of patients) and 61% 5-year MFS in the poor-prognosis group (23% of patients, 7% of patients not interpretable).
Model performances: we performed multivariate analysis to determine the importance of this simplified model as described previously.
Multivariate analysis on the global population (N=330)
Multivariate analysis on the training set (N=222)
Multivariate analysis on the PACS01 clinical trial (N=108)
Different nucleic acids array platforms may be used to work the present invention including, but not limited to, cDNA platforms (Image or “Ipso” clones described below), Affymetrix® platforms (GeneChip® probe sets) and others.
The following tables are examples of metagenes of the invention that may be used on a cDNA platform according to the above described methods. For example, the following underER, underPR and underEGFR metagenes may be used in the above described method using a Cox regression analysis and the score SC=−2.90279×underER−1.47423×underPR−4.17198×underEGFR, with the intervals mentioned previously in the description for “a”, “b” and “c” (and similarly for the above described combination involving underEGFR and overEGFR, as well as the IGKC+SCUBE2 combination). The Seq3′ and Seq5′ columns in the tables below provide the sequences identifying the respective Image or Ipso clones.
We profiled 113 samples from the validation set on the Affymetrix® platform to evaluate agreement between the 2 platforms.
A mapping was performed to find the Affymetrix® probesets corresponding to the sequences comprised in the 3 metagenes, using standard sequence alignment (BLAST) algorithms.
For a given gene, several Image clones may exist, each of them covering a particular region of the gene, more commonly in the 3′ region. Affymetrix® probesets are also designed to target a specific region of a gene, of around 1000 nucleotides. Clone inserts and Affymetrix® targets do not necessarily overlap, even if the same gene is considered.
Given this information, there were two possibilities to find a correspondence between the Discovery™ and Affymetrix® platforms:
i) sequence alignment of clone inserts and probesets against a Reference Sequence (RefSeq), which represents a specific gene, and selection of pairs (Clone, Probeset) with homologies to the same RefSeq, even if these sequences do not overlap;
ii) consideration of only those pairs which overlap, assuming that the signal may differ according to the region targeted. This second approach was chosen to select Affymetrix® probe sets corresponding to the Discovery™ clones.
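The second approach amounts to keeping only (Clone, Probeset) pairs whose alignments fall on the same reference sequence and share at least one position. The sketch below illustrates that selection; the data layout, the identifiers and the half-open coordinate convention are assumptions for illustration and do not come from the alignment tools themselves.

```python
def overlaps(span_a, span_b):
    """True when two half-open intervals (start, end) share at least one position."""
    return span_a[0] < span_b[1] and span_b[0] < span_a[1]


def overlapping_pairs(clone_hits, probeset_hits):
    """Keep (clone, probeset) pairs aligned to the same reference with overlapping spans."""
    pairs = []
    for clone, (clone_ref, clone_span) in clone_hits.items():
        for probeset, (probe_ref, probe_span) in probeset_hits.items():
            if clone_ref == probe_ref and overlaps(clone_span, probe_span):
                pairs.append((clone, probeset))
    return pairs


# Hypothetical alignment results: identifier -> (reference id, (start, end)).
clone_hits = {"clone_A": ("REF_1", (1200, 1900)), "clone_B": ("REF_2", (100, 600))}
probeset_hits = {
    "probeset_1": ("REF_1", (1500, 2400)),  # same reference, overlaps clone_A
    "probeset_2": ("REF_1", (2000, 2900)),  # same reference, no overlap
    "probeset_3": ("REF_2", (700, 1500)),   # same reference as clone_B, no overlap
}
selected = overlapping_pairs(clone_hits, probeset_hits)
```

Under the first approach, the `overlaps` condition would simply be dropped, so that `probeset_2` and `probeset_3` would also be paired with the clones on their respective references.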
Raw data from Affymetrix® platform were first normalized using the RMA (Robust Multichip Average) method available in Bioconductor (Irizarry et al. 2 . . . ) (Affymetrix® package), then corrected to take into account the inter-platform effect and calculate the score for each sample. The data processing applied was the same as previously described on the Discovery™ platform for normalization and Metagenes calculation.
As an example, when comparing sample classification into the good or poor prognosis group on the Discovery™ and Affymetrix® platforms, we obtained 95% agreement when using an appropriate confidence interval around the threshold.
The following tables (IX to XIV) are examples of metagenes of the invention that may be used with an Affymetrix® platform according to the above described methods. For each metagene (IX to XIV), at least two, preferably five, most preferably ten or all of the markers listed, e.g., genes, or marker-derived polynucleotides, e.g., Affymetrix® Probe Sets, may be used to perform these methods. The sequences of the listed Affymetrix® Probe Sets are provided in the enclosed sequence listing and are also publicly available on the internet, e.g., at www.affymetrix.com. For example, these underER, underPR and underEGFR metagenes may be used in the above described method using a Cox regression analysis and the score SC=a×underER+b×underPR+c×underEGFR, wherein “a” is comprised in the interval [−6.26; +0.49], “b” is comprised in the interval [−2.65; +0.29] and “c” is comprised in the interval [−6.69; +1.65]. For example the formula is: SC=−2.90279×underER−1.47423×underPR−4.17198×underEGFR. Preferably, metagenes of tables IX to XI are used together on the one hand, and metagenes of tables XII to XIV are used together on the other hand.
The error on the score was integrated by calculating a confidence interval around the threshold, within which sample classification was considered non robust. Considering the score distribution Gaussian, we estimated the confidence interval around the threshold using standard deviation calculation method (estimated standard deviation of the population/√n).
The inventors have established that a woman having a score (SC) of more than 0.16 has at least twice the propensity for a poor clinical outcome of a woman with a score (SC) of less than 0.015.
The above described protocol for finding a correspondence between a cDNA platform (e.g., Discovery™) and another platform (e.g., Affymetrix®) may be similarly applied by a person skilled in the art for the other metagenes according to the present invention.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/IB08/02334 | 4/16/2008 | WO | 00 | 5/21/2010 |
Number | Date | Country | |
---|---|---|---|
60923690 | Apr 2007 | US |