The present invention relates to a method of prognosis and personalized therapy.
Lung cancer is the most common cancer diagnosis in the world with 1.5 million new cases in 2007 (Salomon et al., Crit Rev Oncol Hematol., 19:183-232, 1995, SEER-database 05.2010). The high incidence and mortality rates make it the leading cause of cancer-related death with more than 975,000 deaths per year and a 5-year survival rate of 15% (Salomon et al., supra). Lung cancer can be classified as small cell lung cancer, or non-small cell lung cancer (NSCLC). NSCLC accounts for about 80% of the cases. NSCLC can be further subdivided into several histological types, the most common ones are adenocarcinoma (40%) and squamous cell carcinoma (25%).
The current treatment of NSCLC is mainly based on tumor morphology and the tumor-node-metastasis (TNM)-based staging system that classifies tumors in graduated categories (Stage IA, IB, IIA, IIB, IIIA, IIIB and IV) corresponding to the extent of tumor progression. Many staging systems exist (e.g., clinical vs. pathological staging, as well as various editions of staging guidelines such as those issued by the International Association for the Study of Lung Cancer (IASLC) (Goldstraw et al., J. Thorac. Oncol. 2, 706-714, 2007)). The frequencies of the stages, for example according to clinical staging and the IASLC 6th edition TNM staging recommendations, are 23% stage I, 19% stage II, 37% stage III and 21% stage IV (Goldstraw et al., supra). The relative proportions may change substantially depending on the guidelines and whether clinical or pathological staging is used.
Surgery is the standard treatment for early stage NSCLC (stage I and II), followed by adjuvant therapy such as radiation therapy, chemotherapy (for stage II and later), and bevacizumab and epidermal growth factor receptor (EGFR) tyrosine kinase inhibitors (TKIs) for the advanced NSCLC-stages (Kutikova et al., Lung Cancer; 50(2):143-154, 2005). Clinical guidelines in the United States and Europe for treatment of NSCLC support these treatment options (Mendelssohn et al., 2000; NCCN-Guidelines NSCLC V1, 2010).
Under the current TNM-based staging system for early lung cancer, Stage I NSCLC patients have a 35% chance of relapse within 5 years after surgery (SEER-Database, 2008), yet current treatment guidelines do not recommend adjuvant chemotherapy for these patients. Conversely, 30% of patients with TNM-based Stage II disease will not experience a relapse even without adjuvant chemotherapy (SEER-Database), meaning that these patients are overtreated under the current treatment guidelines (i.e., ESMO (D'Addario et al., Annals of Oncology, 20 (suppl 4):iv68-iv70, 2009), NCCN V1-2010). This is paralleled by reports stating that 60% of patients with early NSCLC will have no relapse after surgery (Arriagada et al., NEJM, 350:351-360, 2004). Current clinical data indicate that the median benefit of adjuvant chemotherapy in early NSCLC is 4% (NSCLC Meta-Analysis Collaborative Group), an improvement from 60% to 64% at 5 years.
In view of these shortcomings, there is a medical rationale for a prognostic and/or predictive genomic signature for patients with NSCLC.
The present invention relates, in part, to methods for determining a prognosis of early stage lung cancer in an individual using one or more biomarkers described herein. These findings may be used to help to determine appropriate treatments for patients with early stage lung cancer such as identifying those patients who would benefit from receiving adjuvant therapy.
In one aspect, the invention includes a method for prognosing or classifying a subject with non-small cell lung cancer (NSCLC) including obtaining a test sample from a subject suffering from NSCLC following surgical resection; determining the expression level of at least one or more biomarker identified in Table 1, Table 2 and/or Table 3, or any combination of biomarkers identified in Table 1, Table 2 and/or Table 3 in the test sample; and analyzing the expression level to generate a risk score, wherein the risk score can be used to provide a prognosis or classify the subject.
In another aspect, the invention includes a method for prognosing or classifying a subject with non-small cell lung cancer (NSCLC) comprising: obtaining a test sample from a subject suffering from NSCLC following surgical resection; determining the expression level of at least one biomarker from Table 1, Table 2 and Table 3 in the test sample; and analyzing the expression level to generate a risk score, wherein the risk score can be used to provide a prognosis or classify the subject. In one embodiment, the at least one biomarker identified in Table 1, Table 2 and Table 3 includes CBX7, STX1A, and TPX2. In another embodiment, the at least one biomarker identified in Table 1, Table 2 and Table 3 includes CBX7, TMPRSS2, STX1A, KLK6, TPX2 and UCK2. In yet another embodiment, the at least one biomarker identified in Table 1, Table 2 and Table 3 includes CBX7, TMPRSS2, GPR116, STX1A, KLK6, SLC16A3, TPX2, UCK2, and PHKA1. In still yet another embodiment, the at least one biomarker identified in Table 1, Table 2 and Table 3 comprises CBX7, TMPRSS2, GPR116, KCNJ15, STX1A, KLK6, SLC16A3, PYGL, TPX2, UCK2, PHKA1, or EIF4A3. In yet another embodiment, the at least one biomarker identified in Table 1, Table 2 and Table 3 includes CBX7, TMPRSS2, GPR116, KCNJ15, PTPN13, STX1A, KLK6, SLC16A3, PYGL, LDHA, TPX2, UCK2, PHKA1, EIF4A3 or TK1. In yet another embodiment, the at least one biomarker identified in Table 1, Table 2 and Table 3 comprises CBX7, TMPRSS2, GPR116, KCNJ15, PTPN13, CTSH, STX1A, KLK6, SLC16A3, PYGL, LDHA, ITGA5, TPX2, UCK2, PHKA1, EIF4A3, TK1, or CCNA2.
In one embodiment, the risk score of the invention can be used for prognosis by mapping subjects to a time-specific probability of death due to lung cancer, distant metastasis or local relapse.
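As a purely illustrative sketch of such a mapping, the following Python fragment assumes a Cox-type proportional hazards model, in which the probability of an event by time t is 1 - S0(t)^exp(risk score). The baseline survival values and the function name are hypothetical placeholders and are not taken from the data described herein.

```python
import math

# Hypothetical baseline survival S0(t) at selected years after resection.
# These numbers are placeholders for illustration only.
BASELINE_SURVIVAL = {1: 0.95, 3: 0.85, 5: 0.75}


def event_probability(risk_score: float, years: int) -> float:
    """Probability of the event (e.g., death due to lung cancer) by `years`.

    Assumes proportional hazards: S(t | score) = S0(t) ** exp(score).
    """
    s0 = BASELINE_SURVIVAL[years]
    return 1.0 - s0 ** math.exp(risk_score)


if __name__ == "__main__":
    # A higher risk score maps to a higher event probability at each time point.
    for score in (-0.5, 0.0, 0.5):
        print(score, round(event_probability(score, 5), 3))
```

Under these assumptions a risk score of zero reproduces the baseline probability, and the same score can be read out at any time point for which a baseline value is available.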
In another embodiment, the risk score can classify the subject into a high risk group that would benefit from receiving adjuvant chemotherapy or in a low risk group that would not benefit from receiving adjuvant chemotherapy.
In another aspect, the invention includes a method of predicting prognosis in a subject with non-small cell lung cancer (NSCLC) following surgical resection, comprising determining an expression profile of mRNA from tumor samples, either from fresh frozen (FF) or formalin fixed paraffin embedded (FFPE) material. The profile comprises one or more biomarkers listed in Table 1, Table 2 and/or Table 3, wherein an increase in expression of one or more biomarkers listed in Table 2 and/or Table 3 and a decrease in expression of one or more of the biomarkers listed in Table 1 compared to a control is used to predict whether the subject is in a high risk group having poor survival or a low risk group having good survival. In the method of the invention, a subject in the high risk group is selected for adjuvant chemotherapy and a subject in the low risk group is not selected for adjuvant chemotherapy, and the subject is then treated accordingly.
In yet another embodiment, the invention includes a method of selecting a therapy for a subject with NSCLC, including obtaining a test sample from a subject suffering from NSCLC who has undergone a resection; determining the expression level of at least two or more biomarkers identified in Table 2 in the test sample to generate an expression value for each gene; and analyzing the expression value to generate a risk score, wherein the risk score can be used to classify whether the subject is selected to receive an angiogenesis inhibitor such as Avastin.
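The direction-of-change comparison described above can be sketched as follows. This is a minimal illustration only: the log2 expression values, the zero cutoff, and the function names are hypothetical, and the three genes are used simply because they are named herein as examples from Table 1 (CBX7), Table 2 (STX1A) and Table 3 (TPX2).

```python
from statistics import mean


def module_direction_score(sample, control, up_genes, down_genes):
    """Mean signed difference from control on a log scale.

    Differences are signed so that a positive value reflects the high-risk
    pattern: up_genes increased and down_genes decreased relative to control.
    """
    up = [sample[g] - control[g] for g in up_genes if g in sample]
    down = [control[g] - sample[g] for g in down_genes if g in sample]
    diffs = up + down
    if not diffs:
        raise ValueError("no signature genes measured in sample")
    return mean(diffs)


def classify(score, cutoff=0.0):
    """Assign a risk group; the cutoff of 0.0 is an arbitrary placeholder."""
    return "high risk" if score > cutoff else "low risk"


if __name__ == "__main__":
    # Hypothetical log2 expression values, for illustration only.
    control = {"CBX7": 8.0, "STX1A": 6.0, "TPX2": 7.0}
    sample = {"CBX7": 7.0, "STX1A": 7.2, "TPX2": 8.1}
    score = module_direction_score(sample, control, ["STX1A", "TPX2"], ["CBX7"])
    print(classify(score))
```

In practice the cutoff would have to be derived from a reference cohort; it is fixed at zero here only to keep the sketch self-contained.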
In the methods of the invention, the invention includes determining expression of at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14 or at least 15 biomarkers identified in Table 1, Table 2 and/or Table 3, or any combination thereof. For example, the expression of at least any 5 biomarkers from each of Table 1, Table 2 and Table 3 are selected (the signature in this embodiment would include at least 15 biomarkers).
In one embodiment, the NSCLC is stage I NSCLC, stage II NSCLC, or a combination thereof. The NSCLC can be identified in the group consisting of squamous cell carcinoma and/or adenocarcinoma.
In one embodiment, the subject is human. In another embodiment, the test sample can be fresh, frozen, or FFPE cells. In another embodiment, the expression level is determined using quantitative PCR or an array.
In yet another embodiment, analyzing expression to generate a risk score is performed using statistical analysis such as Cox regression or parametric survival predictors.
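By way of a hedged illustration, a Cox regression yields one coefficient per biomarker, and the risk score can then be computed as the weighted sum of the expression values. The coefficients below are hypothetical and are not fitted to any data described herein; their signs merely follow the expression pattern described herein (a Table 1 gene under-expressed, Table 2 and Table 3 genes over-expressed, in high-risk subjects).

```python
# Hypothetical Cox regression coefficients (log hazard ratios) for three
# example biomarkers. Placeholders for illustration only.
COEFFICIENTS = {"CBX7": -0.40, "STX1A": 0.30, "TPX2": 0.35}


def risk_score(expression):
    """Linear predictor: sum of coefficient * standardized expression value."""
    missing = [gene for gene in COEFFICIENTS if gene not in expression]
    if missing:
        raise KeyError(f"missing expression values for: {missing}")
    return sum(beta * expression[gene] for gene, beta in COEFFICIENTS.items())


if __name__ == "__main__":
    # Standardized (z-score) expression values for one hypothetical subject.
    subject = {"CBX7": -1.0, "STX1A": 1.0, "TPX2": 1.0}
    print(round(risk_score(subject), 2))
```

A parametric survival predictor would use the same kind of weighted sum but with a different link between the score and survival time.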
In another aspect, the invention includes a method of selectively treating a subject having NSCLC including obtaining a test sample from a subject suffering from NSCLC following surgical resection; determining the expression level of at least one or more biomarkers identified in Table 1, Table 2 and/or Table 3, or any combination of biomarkers identified in Table 1, Table 2 and/or Table 3 in the test sample to generate a risk score; classifying the subject based on the risk score into a high risk group or a low risk group; and administering adjuvant therapy to the subject classified as belonging to the high risk group or administering no adjuvant therapy to the subject classified as belonging to the low risk group.
In yet another aspect, the invention includes a kit including a plurality of agents for measuring the expression of one or more biomarkers identified in Table 1, Table 2 and/or Table 3 and instructions for use. In yet another aspect, the invention includes a kit for predicting whether a subject with lung cancer would benefit from adjuvant therapy, the kit includes a plurality of agents for measuring the expression of one or more biomarkers identified in Table 1, Table 2 and/or Table 3; means for analyzing the expression and generating a risk score to predict whether a patient would benefit from adjuvant therapy. The agents for measuring expression can include an array of polynucleotides complementary to the mRNAs of the identified biomarkers. The agents that measure expression can include a plurality of PCR probes and/or primers for qRT-PCR. The kit can include agents for measuring at least one biomarker identified in Table 1, Table 2 and Table 3 such as CBX7, STX1A, or TPX2.
In another aspect, the invention includes an array comprising one or more polynucleotide probes complementary and hybridizable to an expression product of at least two biomarkers shown in Table 1, Table 2 and/or Table 3.
In yet another aspect, the invention includes a composition comprising a plurality of isolated nucleic acid sequences, wherein each isolated nucleic acid sequence hybridizes to an RNA product of a biomarker shown in Table 1, e.g., the biomarkers CBX7, STX1A and TPX2, wherein the composition is used to measure the level of RNA expression of the three genes.
In yet another aspect, the invention includes a computer product for predicting a prognosis, or classifying a subject with NSCLC including means for receiving data corresponding to the expression level of one or more biomarkers in a sample from a subject having NSCLC, wherein the one or more biomarkers are identified in Table 1, Table 2 and/or Table 3, means for generating an expression value for each gene; and means for generating a risk score based on inputting the expression value into a database comprising a reference expression profile associated with a prognosis, wherein the risk score predicts a prognosis of survival or classifies the subject into a high risk group or a low risk group.
In yet another aspect, the invention includes a computer product for use with the method of any one of methods described above.
A “biomarker” is a molecule useful as an indicator of a biologic state in a subject. With reference to the present subject matter, the biomarkers disclosed herein can be molecules that exhibit a change in expression and whose presence can be used for prognosis or to predict whether a subject would benefit from receiving a particular treatment. The biomarkers of interest can be determined by detecting for a change in expression of the biomarker. A change in expression describes the conversion of the DNA gene sequence information into transcribed RNA (the initial unspliced RNA transcript or the mature mRNA) or the encoded protein product. The biomarkers disclosed herein include any, or any combination of the biomarkers listed in Tables 1, 2 and 3 and can be transcribed RNA or encoded protein product.
The present invention is based, in part, on methods which can be used for the prognosis or classification of individuals having early stage lung cancer. The invention further includes identifying those patients who are at high risk for disease recurrence and for whom adjuvant therapy might be recommended, as well as patients with a low recurrence risk, who might not benefit from adjuvant therapy. In one example, the prognosis and prediction methods described herein are based upon the differential expression of a plurality of biomarkers in a lung cancer test sample. The biomarkers of the invention can include 38 genes (CBX7, TMPRSS2, GPR116, KCNJ15, PTPN13, CTSH, PPFIBP2, CD302, SFTPB, HSD17B6, DLC1, ADRB2, PARM1, KLRB1, MS4A1, STX1A, KLK6, SLC16A3, PYGL, LDHA, ITGA5, VEGFC, EEF1A2, TPX2, UCK2, PHKA1, EIF4A3, TK1, CCNA2, GGH, CCNB1, MELK, HMMR, EIF2S1, TEAD4, HMGA1, RIMS2, H2AFZ), or a combination thereof, which can be broken up into three modules (Tables 1, 2, and 3) based on criteria including biological function. Table 1 (also referred to herein as Module 1) includes genes involved in tumor suppression, Table 2 (also referred to herein as Module 2) includes genes involved in angiogenesis, and Table 3 (also referred to herein as Module 3) includes genes involved in proliferation.
It was discovered that some biomarkers are over-expressed in early stage lung cancer, such as those markers involved in angiogenesis or proliferation (Table 2 and Table 3, respectively), whereas other biomarkers involved in tumor suppression are under-expressed (Table 1) as compared to a control (e.g., the average expression of these genes in patients with early stage lung cancer (stage I and II)).
The biomarker(s) of the invention includes one or more biomarkers listed in Table 1, Table 2, and/or Table 3, or their gene products. The present invention is based on the finding that the biomarkers listed in Table 1, Table 2, and/or Table 3 are differentially expressed. By analyzing the expression profile levels of one or more biomarkers identified in Table 1, Table 2, and/or Table 3 it is possible to determine the prognosis of an individual with early stage lung cancer.
In one example, the method of the invention includes measuring one or more biomarkers from Table 1. For example, the method of the invention measures at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least eleven, at least twelve, at least thirteen, at least fourteen or at least fifteen, biomarkers from Table 1. In one example, the level of expression of one gene CBX7 from Table 1 is measured. In another example, the level of expression of two biomarkers CBX7 and TMPRSS2 from Table 1 are measured. In yet another example, the level of expression of three biomarkers CBX7, TMPRSS2 and GPR116 from Table 1 are measured. In yet another example, the level of expression of four biomarkers CBX7, TMPRSS2, GPR116 and KCNJ15 from Table 1 are measured. In yet another example, the level of expression of five biomarkers CBX7, TMPRSS2, GPR116, KCNJ15 and PTPN13 from Table 1 are measured.
In another example, the method of the invention includes measuring one or more biomarkers from Table 2. For example, the method of the invention measures the expression of at least one, at least two, at least three, at least four, at least five, at least six, at least seven or at least eight biomarkers from Table 2. In one example, the level of expression of one gene STX1A from Table 2 is measured. In one example, the level of expression of two biomarkers STX1A and KLK6 from Table 2 are measured. In another example, the level of expression of three biomarkers STX1A, KLK6 and SLC16A3 from Table 2 are measured. In another example, the level of expression of four biomarkers STX1A, KLK6, SLC16A3 and PYGL from Table 2 are measured. In yet another example, the level of expression of five biomarkers STX1A, KLK6, SLC16A3, PYGL and LDHA from Table 2 are measured.
In another example, the method of the invention includes measuring one or more biomarkers from Table 3. For example, the method of the invention measures at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least eleven, at least twelve, at least thirteen, at least fourteen or at least fifteen biomarkers from Table 3. In one example, the level of expression of one gene TPX2 from Table 3 is measured. In another example, the level of expression of two biomarkers TPX2 and UCK2 from Table 3 are measured. In another example, the level of expression of three biomarkers TPX2, UCK2 and PHKA1 from Table 3 are measured. In another example, the level of expression of four biomarkers TPX2, UCK2, PHKA1 and EIF4A3 from Table 3 are measured. In yet another example, the level of expression of five biomarkers TPX2, UCK2, PHKA1, EIF4A3 and TK1 from Table 3 are measured.
The biomarkers of the invention can also include any combination of biomarkers identified in Table 1, Table 2 and Table 3 whose level of expression or gene product serves as a predictive marker or biomarker for prognosis of an individual with early stage lung cancer. In one example, the level of expression of one gene selected from each Table, Table 1, Table 2 and Table 3 is measured, e.g., CBX7, STX1A and TPX2. In another example, the level of expression of two biomarkers selected from each of the Tables, Table 1, Table 2 and Table 3 is measured, e.g., CBX7 and TMPRSS2 from Table 1, STX1A and KLK6 from Table 2 and TPX2 and UCK2 from Table 3. See Table 4 below for examples of various combinations of biomarkers from Tables 1, 2 and 3. The combinations shown in Table 4 are not meant to be construed as limiting and any combination of biomarkers shown in Tables 1-3 can be made.
In another example, at least any 3, 4, 5, 6, 7, 8, 9, 10 genes from each Table 1, Table 2, and Table 3 are selected. For example, in one embodiment, at least 15 biomarkers are selected where any 5 biomarkers from Table 1 are selected (e.g., CBX7, TMPRSS2, GPR116, KCNJ15 and PTPN13 or CBX7, TMPRSS2, CTSH, PPFIBP2 and CD302; or SFTPB; HSD17B6; DLC1; ADRB2 and PARM1), any 5 biomarkers from Table 2 are selected (e.g., STX1A, KLK6, SLC16A3, PYGL and LDHA or SLC16A3, PYGL, ITGA5, VEGFC, and EEF1A2 or STX1A, KLK6, SLC16A3, PYGL and LDHA) and any 5 biomarkers from Table 3 are selected (TPX2, UCK2, PHKA1, EIF4A3, and TK1 or CCNA2, GGH, CCNB1, MELK and HMMR or EIF2S1, TEAD4, HMGA1, RIMS2, and H2AFZ).
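To illustrate how such combinations can be enumerated, the following sketch groups the 38 genes into three modules and counts or samples signatures that take the same number of genes from each module. The assignment of genes to tables is inferred from the examples given herein and is an assumption; in particular, KLRB1 and MS4A1 are placed in Table 1 only because of their position in the gene list. The function names are hypothetical.

```python
import math
import random

# Module membership as inferred from the examples in this description
# (an assumption; the tables themselves are not reproduced here).
MODULES = {
    "Table 1": ["CBX7", "TMPRSS2", "GPR116", "KCNJ15", "PTPN13", "CTSH",
                "PPFIBP2", "CD302", "SFTPB", "HSD17B6", "DLC1", "ADRB2",
                "PARM1", "KLRB1", "MS4A1"],
    "Table 2": ["STX1A", "KLK6", "SLC16A3", "PYGL", "LDHA", "ITGA5",
                "VEGFC", "EEF1A2"],
    "Table 3": ["TPX2", "UCK2", "PHKA1", "EIF4A3", "TK1", "CCNA2", "GGH",
                "CCNB1", "MELK", "HMMR", "EIF2S1", "TEAD4", "HMGA1",
                "RIMS2", "H2AFZ"],
}


def count_signatures(per_module: int) -> int:
    """Number of distinct signatures taking `per_module` genes from each module."""
    return math.prod(math.comb(len(genes), per_module)
                     for genes in MODULES.values())


def sample_signature(per_module: int, seed=None):
    """Draw one random signature with `per_module` genes from each module."""
    rng = random.Random(seed)
    return {name: sorted(rng.sample(genes, per_module))
            for name, genes in MODULES.items()}


if __name__ == "__main__":
    print(count_signatures(5))
    print(sample_signature(5, seed=0))
```

With five genes per module, the number of possible 15-gene signatures under this grouping is already in the hundreds of millions, which is why the listed combinations are examples rather than an exhaustive set.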
In another embodiment, the biomarkers of the invention include any one, or any combination, of the following genes: CBX7, TMPRSS2, GPR116, KCNJ15, PTPN13, CTSH, PPFIBP2, CD302, SFTPB, HSD17B6, DLC1, ADRB2, PARM1, KLRB1, MS4A1, STX1A, KLK6, SLC16A3, PYGL, LDHA, ITGA5, VEGFC, EEF1A2, TPX2, UCK2, PHKA1, EIF4A3, TK1, CCNA2, GGH, CCNB1, MELK, HMMR, EIF2S1, TEAD4, HMGA1, RIMS2, H2AFZ. In another embodiment, the biomarkers of the invention include at least 15, 20, 25, 30, 35, 36, 37 or 38 of the following genes: CBX7, TMPRSS2, GPR116, KCNJ15, PTPN13, CTSH, PPFIBP2, CD302, SFTPB, HSD17B6, DLC1, ADRB2, PARM1, KLRB1, MS4A1, STX1A, KLK6, SLC16A3, PYGL, LDHA, ITGA5, VEGFC, EEF1A2, TPX2, UCK2, PHKA1, EIF4A3, TK1, CCNA2, GGH, CCNB1, MELK, HMMR, EIF2S1, TEAD4, HMGA1, RIMS2, H2AFZ. In a particular embodiment, the following 37 biomarkers are selected: CBX7, TMPRSS2, GPR116, KCNJ15, PTPN13, CTSH, PPFIBP2, CD302, SFTPB, HSD17B6, DLC1, ADRB2, PARM1, KLRB1, MS4A1, STX1A, KLK6, SLC16A3, PYGL, LDHA, ITGA5, VEGFC, TPX2, UCK2, PHKA1, EIF4A3, TK1, CCNA2, GGH, CCNB1, MELK, HMMR, EIF2S1, TEAD4, HMGA1, RIMS2, H2AFZ.
In one example, the expression profile can be a set of values representing mRNA levels of one or more biomarkers listed in Table 1, Table 2, and/or Table 3. In another example, the expression profile can include a set of values representing one or more protein or polypeptides encoded by the biomarkers listed in Table 1, Table 2, and/or Table 3.
Any appropriate test sample of cells taken from an individual having early stage lung cancer who has undergone a surgical resection can be used to determine the expression of a plurality of biomarkers of the invention. The type and classification of the early stage lung cancer can vary. The lung cancer can be in Stage I and/or Stage II. The test sample can be a non-small cell lung cancer (NSCLC) which includes squamous cell carcinoma, adenocarcinoma, large cell carcinoma, as well as all histotypes irrespective of the subgroup.
Generally, the test sample of cells or tissue sample will be obtained from the subject with cancer by biopsy or surgical resection. The surgical resection can be curative or non-curative/RO. A sample of cells, tissue, or fluid may be removed by needle aspiration biopsy. For this, a fine needle attached to a syringe is inserted through the skin and into the organ or tissue of interest. The needle is typically guided to the region of interest using ultrasound or computed tomography (CT) imaging. Once the needle is inserted into the tissue, a vacuum is created with the syringe such that cells or fluid may be sucked through the needle and collected in the syringe. A sample of cells or tissue may also be removed by incisional or core biopsy. For this, a cone, a cylinder, or a tiny bit of tissue is removed from the region of interest. CT imaging, ultrasound, or an endoscope is generally used to guide this type of biopsy. More particularly, the entire cancerous lesion may be removed by excisional biopsy or surgical resection. In the present invention, the test sample is typically a sample of cells removed as part of surgical resection.
The test sample of, for example tissue, may also be stored in, e.g., RNAlater (Ambion; Austin Tex.) or flash frozen and stored at −80° C. for later use. The biopsied tissue sample may also be fixed with a fixative, such as formaldehyde, paraformaldehyde, or acetic acid/ethanol. The fixed tissue sample may be embedded in wax (paraffin) or a plastic resin. The embedded tissue sample (or frozen tissue sample) may be cut into thin sections. RNA or protein may also be extracted from a fixed or wax-embedded tissue sample.
The subject with cancer will generally be a mammalian subject such as a primate. In an exemplary embodiment, the subject is a human.
Once a sample of cells or sample of tissue is removed from the subject with cancer, it may be processed for the isolation of RNA or protein using techniques well known in the art and as described below.
In one example, RNA may be extracted from tissue or cell samples by a variety of methods, for example, guanidinium thiocyanate lysis followed by CsCl centrifugation (Chirgwin, et al., Biochemistry 18:5294-5299, 1979). RNA from single cells may be obtained as described in methods for preparing cDNA libraries from single cells (see, e.g., Dulac, Curr. Top. Dev. Biol. 36:245, 1998; Jena, et al., J. Immunol. Methods 190:199, 1996). The RNA sample can be further enriched for a particular species. In one embodiment, for example, poly(A)+ RNA may be isolated from an RNA sample. In particular, poly-T oligonucleotides may be immobilized on a solid support to serve as affinity ligands for mRNA. Kits for this purpose are commercially available, for example, the MessageMaker kit (Life Technologies, Grand Island, N.Y.). In one embodiment, the RNA population may be enriched for sequences of interest, as detailed in Tables 1-3. Enrichment may be accomplished, for example, by primer-specific cDNA synthesis, or multiple rounds of linear amplification based on cDNA synthesis and template-directed in vitro transcription (see, e.g., Wang, et al., Proc. Natl. Acad. Sci. USA 86:9717, 1989).
In one example, the method includes determining expression of one or more biomarkers listed in Table 1, Table 2, and/or Table 3, or their gene products from a tumor sample of a test cancer patient. The gene sequences of each of the biomarkers listed in Table 1, Table 2, and/or Table 3 can be detected using methods known in the art, e.g., agents that can be used to specifically detect the gene or gene products thereof.
Exemplary detection agents are nucleic acid probes, which hybridize to nucleic acids corresponding to the genes disclosed herein, and antibodies which bind to the encoded products of these genes. The biomarkers listed in Table 1, Table 2, and/or Table 3 are intended to also include naturally occurring sequences including allelic variants and other family members. The biomarkers of the invention also include sequences that are complementary to those listed sequences resulting from the degeneracy of the code and also sequences that are sufficiently homologous and sequences which hybridize under stringent conditions to the biomarkers listed in Table 1, Table 2, and/or Table 3.
In one embodiment, the method includes: providing a nucleic acid probe comprising a nucleotide sequence, for example, at least 10, 15, 25 or 40 nucleotides, and up to all or nearly all of the coding sequence which is complementary to a portion of the coding sequence of a nucleic acid sequence listed in Table 1, Table 2, and/or Table 3; obtaining a tissue sample from a mammal having a cancerous cell; contacting the nucleic acid probe under stringent conditions with RNA obtained from a biopsy taken from a patient with NSCLC (e.g., in a Northern blot or in situ hybridization assay); and determining the amount of hybridization of the probe with RNA.
Conditions for hybridization are known to those skilled in the art and can be found in Current Protocols in Molecular Biology, John Wiley and Sons, N.Y. (1989), 6.3.1-6.3.6. A preferred, non-limiting example of highly stringent hybridization conditions is hybridization in 6× sodium chloride/sodium citrate (SSC) at about 45 degrees centigrade followed by one or more washes in 0.2×SSC, 0.1 percent SDS at 50-65 degrees centigrade. By “sufficiently homologous” is meant an amino acid or nucleotide sequence of a biomarker which contains a sufficient or minimum number of identical or equivalent (e.g., an amino acid residue which has a similar side chain) amino acid residues or nucleotides to a second amino acid or nucleotide sequence such that the first and second amino acid or nucleotide sequences share common structural domains or motifs and/or a common functional activity. For example, amino acid or nucleotide sequences which share common structural domains and have at least about 50 percent homology, at least about 60 percent homology, at least about 70 percent, at least about 80 percent, or at least about 90-95 percent homology across the amino acid sequences of the domains are defined herein as sufficiently homologous. Furthermore, amino acid or nucleotide sequences which have at least about 50 percent homology, at least about 60-70 percent homology, at least about 70-80 percent, at least about 80-90 percent, or at least about 90-95 percent homology and share a common functional activity are defined herein as sufficiently homologous.
The comparison of sequences and determination of percent homology between two sequences can be accomplished using a mathematical algorithm. A preferred, non-limiting example of a mathematical algorithm utilized for the comparison of sequences is the algorithm of Karlin and Altschul (1990) Proc. Natl. Acad. Sci. USA 87:2264-68, modified as in Karlin and Altschul (1993) Proc. Natl. Acad. Sci. USA 90:5873-77. Such an algorithm is incorporated into the NBLAST and XBLAST programs (version 2.0) of Altschul, et al. (1990) J. Mol. Biol. 215:403-10. BLAST nucleotide searches can be performed with the NBLAST program, score=100, wordlength=12 to obtain nucleotide sequences homologous to TRL nucleic acid molecules of the invention. BLAST protein searches can be performed with the XBLAST program, score=50, wordlength=3 to obtain amino acid sequences homologous to the protein sequences encoded by the biomarkers listed in Table 1, Table 2, and/or Table 3. To obtain gapped alignments for comparison purposes, Gapped BLAST can be utilized as described in Altschul et al., (1997) Nucleic Acids Research 25(17):3389-3402. When utilizing BLAST and Gapped BLAST programs, the default parameters of the respective programs (e.g., XBLAST and NBLAST) can be used. See http://www.ncbi.nlm.nih.gov. Another preferred, non-limiting example of a mathematical algorithm utilized for the comparison of sequences is the ALIGN algorithm of Myers and Miller, CABIOS (1989). When utilizing the ALIGN program for comparing amino acid sequences, a PAM120 weight residue table, a gap length penalty of 12, and a gap penalty of 4 can be used.
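For illustration only, once two sequences have been aligned by a program such as those named above, a percent value can be computed by counting matching columns. The sketch below is not BLAST or ALIGN; it assumes the alignment has already been produced, treats "homology" as simple identity, and uses hypothetical function names and a hypothetical default threshold taken from the lowest figure mentioned above (50 percent).

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns with identical residues.

    Both inputs must be pre-aligned strings of equal length, with '-' for gaps.
    Columns where both sequences have a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("alignment contains no residues")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 50.0) -> bool:
    """True if percent identity is at least `threshold` (identity criterion only)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    print(percent_identity("ACGT-A", "ACGTTA"))
    print(meets_threshold("ACGT-A", "ACGTTA", threshold=90.0))
```

This checks only the numeric criterion; whether two sequences also share structural domains or a functional activity, as the definition above requires, has to be established separately.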
Nucleic acids may be labeled during or after enrichment and/or amplification of RNAs. For example, reverse transcription may be carried out in the presence of a dNTP conjugated to a detectable label, for example, a fluorescently labeled dNTP. In another embodiment, the cDNA or RNA probe may be synthesized in the absence of detectable label and may be labeled subsequently, for example, by incorporating biotinylated dNTPs or rNTP, or some similar means (e.g., photo-cross-linking a psoralen derivative of biotin to RNAs), followed by addition of labeled streptavidin (e.g., phycoerythrin-conjugated streptavidin) or the equivalent.
Fluorescent moieties or labels of interest include coumarin and its derivatives (e.g., 7-amino-4-methylcoumarin, aminocoumarin); bodipy dyes such as Bodipy FL and cascade blue; fluorescein and its derivatives (e.g., fluorescein isothiocyanate, Oregon green); rhodamine dyes (e.g., Texas red, tetramethylrhodamine); eosins and erythrosins; cyanine dyes (e.g., Cy2, Cy3, Cy3.5, Cy5, Cy5.5, Cy7); FluorX, macrocyclic chelates of lanthanide ions (e.g., Quantum Dye™); fluorescent energy transfer dyes such as thiazole orange-ethidium heterodimer, TOTAB, dansyl, etc. Individual fluorescent compounds which have functionalities for linking to an element desirably detected in an apparatus or assay of the invention, or which may be modified to incorporate such functionalities, may also be utilized (see, e.g., Kricka, 1992, Nonisotopic DNA Probe Techniques, Academic Press, San Diego, Calif.). Chemiluminescent labels include luciferin and 2,3-dihydrophthalazinediones, for example, luminol.
Detecting the presence of a protein product encoded by one or more of the biomarkers listed in Table 1, Table 2 and/or Table 3 can be done by using any appropriate method known in the art. For example, an agent that specifically binds a particular protein of interest, such as an antibody, can be used. Methods for producing polyclonal and/or monoclonal antibodies that specifically bind to polypeptides useful in the present invention are known to those of skill in the art and may be found in, for example, Dymecki, et al., (J. Biol. Chem. 267:4815, 1992); Boersma and Van Leeuwen, (J. Neurosci. Methods 51:317, 1994); Green, et al., (Cell 28:477, 1982); and Arnheiter, et al., (Nature 294:278, 1981). In one embodiment, an immunoassay can be used to quantitate the levels of proteins in cell samples. The invention is not limited to a particular assay procedure, and therefore, is intended to include both homogeneous and heterogeneous procedures. Exemplary immunoassays that may be conducted according to the invention include fluorescence polarization immunoassay (FPIA), fluorescence immunoassay (FIA), enzyme immunoassay (EIA), nephelometric inhibition immunoassay (NIA), enzyme-linked immunosorbent assay (ELISA), and radioimmunoassay (RIA). An indicator moiety, or label group, may be attached to the subject antibodies and is selected so as to meet the needs of various uses of the method that are often dictated by the availability of assay equipment and compatible immunoassay procedures. General techniques to be used in performing the various immunoassays noted above are known to those of ordinary skill in the art. Alternatively, other methods can be used, such as Western blot analysis, which includes electrophoretically separating proteins on a polyacrylamide gel and, after staining the separated proteins, quantifying the relative amount of each protein by assessing its optical density.
Alternatively, other methods such as dot-blot assays, FACS or immunohistochemistry can be used.
The tissue samples are fixed by treatment with a reagent such as formalin, glutaraldehyde, methanol, or the like. The samples are then incubated with an antibody (e.g., a monoclonal antibody) with binding specificity for the marker polypeptides. This antibody may be conjugated to a label for subsequent detection of binding. Samples are incubated for a time sufficient for formation of the immunocomplexes. Binding of the antibody is then detected by virtue of a label conjugated to this antibody. Where the antibody is unlabeled, a second labeled antibody may be employed, for example, that is specific for the isotype of the anti-marker polypeptide antibody. Examples of labels that may be employed include radionuclides, fluorescers, chemiluminescers, enzymes, and the like.
Where enzymes are employed, the substrate for the enzyme may be added to the samples to provide a colored or fluorescent product. Examples of suitable enzymes for use in conjugates include horseradish peroxidase, alkaline phosphatase, malate dehydrogenase, and the like. Where not commercially available, such antibody-enzyme conjugates are readily produced by techniques known to those skilled in the art.
In yet another embodiment, the invention contemplates using a panel of antibodies that are generated against the marker polypeptides of this invention.
mRNA Detection
An important aspect of the present invention is to measure the expression level of one or more biomarkers identified in Table 1, Table 2 and/or Table 3 in a lung cancer tumor biopsy taken from a subject suffering from early stage lung cancer following surgical resection. The expression levels can be analyzed and used to generate a risk score.
In one example, reverse transcriptase PCR (RT-PCR) can be used for gene expression profiling to compare mRNA levels in different sample populations. The method includes isolating mRNA using any technique known in the art, e.g., by using a purification kit, buffer set and protease from commercial manufacturers, such as Qiagen, according to the manufacturer's instructions, or a complete DNA and RNA Purification Kit (EPICENTRE®, Madison, Wis.). The reverse transcription step is typically primed using specific primers, random hexamers, or oligo-dT primers, depending on the circumstances and the goal of expression profiling, and the cDNA derived can then be used as a template in the subsequent PCR reaction. TaqMan® RT-PCR can then be performed using, e.g., commercially available equipment.
A more recent variation of the RT-PCR technique is real time quantitative PCR, which measures PCR product accumulation through a dual-labeled fluorogenic probe (i.e., TaqMan® probe). Real time PCR is compatible both with quantitative competitive PCR, where an internal competitor for each target sequence is used for normalization, and with quantitative comparative PCR using a normalization gene contained within the sample, or a housekeeping gene for RT-PCR. For further details see, e.g., Heid et al., Genome Research 6:986-994 (1996).
In another example, microarrays are used which include one or more probes corresponding to one or more of the biomarkers identified in Table 1, Table 2 and/or Table 3. The method described above results in the production of hybridization patterns of labeled target nucleic acids on the array surface. The resultant hybridization patterns of labeled nucleic acids may be visualized or detected in a variety of ways, with the particular manner of detection selected based on the particular label of the target nucleic acid. Representative detection means include scintillation counting, autoradiography, fluorescence measurement, colorimetric measurement, light emission measurement, light scattering, and the like.
In one example, the method of detection utilizes an array scanner that is commercially available (Affymetrix, Santa Clara, Calif.), for example, the 417 Arrayer, the 418 Array Scanner, or the Agilent GeneArray Scanner. This scanner is controlled from a system computer with an interface and easy-to-use software tools. The output may be directly imported into or directly read by a variety of software applications.
As used herein, the control for comparison can be determined by one skilled in the art. In one aspect, the control is determined by choosing a value that serves as a cut-off value. For example, the value can be a value that differentiates between, e.g., those test samples that have good survival and those that have bad survival; or between those test samples where the individual would benefit from adjuvant therapy and those that would not; or between those test samples where the individual would benefit from the administration of a particular drug, such as an inhibitor of angiogenesis or an inhibitor of proliferation, and those that would not. A benefit from adjuvant therapy means an improvement in any measure of patient status, including those measures ordinarily used in the art such as overall survival, long-term survival, recurrence-free survival, and distant recurrence-free survival.
Other methods for determining levels of gene expression include the MassARRAY-based gene expression profiling method, developed by Sequenom, Inc. (San Diego, Calif.), and serial analysis of gene expression (SAGE) (Velculescu et al, Science 270:484-487 (1995); and Velculescu et al, Cell 88:243-51 (1997)).
Yet other methods for determining levels of gene expression in FFPE materials are qNPA™ technology (HTG Molecular Diagnostics, Inc., Arizona) and Nanostring™ Technologies (Seattle), where neither RNA extraction nor amplification is required. Using qNPA technology, the FFPE sample is first exposed to the HTG lysis buffer, and nuclease protection probes complementary to the mRNA of the biomarkers described herein are then added to the solution. The probes hybridize to all RNA biomarkers of interest, soluble and cross-linked. After hybridization, S1 nuclease is added, destroying all nonspecific, single stranded nucleic acids and producing a stoichiometric amount of biomarker-mRNA probe duplexes. Base hydrolysis then releases the probe from the duplexes. Probes can then be transferred to a programmed ArrayPlate, detection linker added, and both probes and detection linkers captured onto the array. The ArrayPlate is then washed, and an HRP-labeled detection probe is added and incubated. The ArrayPlate is then washed again and a chemiluminescent substrate added. Finally, the ArrayPlate is imaged and expression of each of the biomarkers in all wells measured. Using Nanostring technologies, two ~50-base probes per biomarker mRNA are employed which hybridize to the mRNA in solution. The reporter probe carries the signal, while the capture probe allows the complex to be immobilized for data collection. Following hybridization, excess probes are removed and the probe/target complexes are aligned and immobilized in a Counter Cartridge.
In the method of the invention the level of expression of one or more biomarkers as described above is measured and analyzed and used to generate a risk score as described below. The expression threshold can be used prognostically, e.g., to select for those individuals who have good survival and those that have bad survival.
It is necessary to correct for (normalize away) both differences in the amount of RNA assayed and variability in the quality of the RNA used. Therefore, the assay typically measures and incorporates the expression of certain normalizing genes, including well known housekeeping genes, such as GAPDH and Cyp1. Alternatively, normalization can be based on the mean or median signal (Ct) of all of the assayed biomarkers or a large subset thereof (global normalization approach). On a gene-by-gene basis, the measured normalized amount of a patient tumor mRNA is compared to the amount found in a lung cancer tissue reference set. The number (N) of lung cancer tissues in this reference set should be sufficiently high to ensure that different reference sets (as a whole) behave essentially the same way. If this condition is met, the identity of the individual lung cancer tissues present in a particular set will have no significant impact on the relative amounts of the biomarkers assayed.
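The two normalization approaches mentioned above can be illustrated with a minimal sketch. The function names, the gene labels GENE_A and GENE_B, and the Ct values are hypothetical and chosen only for illustration; the sketch assumes that normalization is done by subtracting a reference Ct, which is one common convention and not necessarily the one used in practice.

```python
from statistics import mean, median

def housekeeping_normalize(ct_values, housekeeping=("GAPDH",)):
    """Subtract the mean Ct of the housekeeping genes from every other gene (delta-Ct)."""
    reference = mean(ct_values[g] for g in housekeeping)
    return {g: ct - reference for g, ct in ct_values.items() if g not in housekeeping}

def global_normalize(ct_values):
    """Subtract the median Ct of all assayed genes (global normalization approach)."""
    reference = median(ct_values.values())
    return {g: ct - reference for g, ct in ct_values.items()}

# Hypothetical Ct values for one tumor sample.
sample = {"GAPDH": 18.0, "GENE_A": 24.5, "GENE_B": 21.0}
by_housekeeping = housekeeping_normalize(sample)
by_global = global_normalize(sample)
```

Either function returns values on a common scale that can then be compared, gene by gene, with the corresponding values from a reference set.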
In the methods of the invention, the expression of each biomarker is measured and typically will be converted into an expression value. These expression values then will be used to generate a risk score by weighted averaging. The risk score is associated with the risk of death, metastasis or relapse through a calibration database, using either a parametric formula or non-parametric, data-driven models. This database is constructed from a reference set of samples with known expression values, risk scores and clinical follow up. The risk score calibration may be available separately for each module (Table 1, 2 or 3) or specific to a particular disease subtype as defined by histology, tumor staging or other characteristics such as patient age. For treatment response prediction, separate calibration formulae or databases are constructed for patients treated by specific therapies (or no treatment). A compound risk score (combining the modules in Table 1, 2, or 3 with certain weighting) may also be used, with its own calibration formula or database. Clinical decision making may be carried out according to the calibrated risk score or the predicted survival or time to events (relapse or metastasis) as described above.
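As an illustration of the weighted averaging described above, the following minimal sketch computes a per-module score and a compound score. The function name, gene labels, weights and expression values are hypothetical; the sketch assumes positive weights and skips genes for which no expression value is available.

```python
def weighted_average(values, weights):
    """Weighted average over the keys of `weights` that have a value in `values`."""
    keys = [k for k in weights if k in values]
    if not keys:
        raise ValueError("none of the weighted items has a measured value")
    total_weight = sum(weights[k] for k in keys)
    return sum(weights[k] * values[k] for k in keys) / total_weight

# Hypothetical per-gene weights for one module; GENE_C is not measured in this sample.
gene_weights = {"GENE_A": 2.0, "GENE_B": 1.0, "GENE_C": 1.0}
expression = {"GENE_A": 1.0, "GENE_B": 4.0}
module_score = weighted_average(expression, gene_weights)

# A compound score combines module scores in the same way (hypothetical weights).
module_scores = {"module1": 2.0, "module2": 0.0, "module3": 1.0}
module_weights = {"module1": 1.0, "module2": 1.0, "module3": 2.0}
compound_score = weighted_average(module_scores, module_weights)
```

The resulting score is only a relative quantity; it acquires a clinical meaning once it is mapped to observed outcomes through a calibration formula or database.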
The risk score once calculated may also be used to decide upon an appropriate course of treatment for the subject. A subject having a high risk score (i.e., short survival time or poor prognosis) may benefit from receiving adjuvant therapy. Adjuvant therapy may include appropriate chemotherapy agents, e.g., Paraplatin (carboplatin), Platinol (cisplatin), Taxotere (docetaxel), Adriamycin (doxorubicin), VePesid (etoposide), Gemzar (gemcitabine), Ifex (ifosfamide), Camptosar (irinotecan), Taxol (paclitaxel), Alimta (pemetrexed) and Hycamtin (topotecan) and/or radiation therapy. A subject having a negative risk score (i.e., long survival time or good prognosis) may not benefit from additional treatment.
In another example, the risk score generated using a gene set from a particular Table can be used to determine a course of specific therapy. For example, if an individual has a high risk score based on analysis of the biomarkers of Table 2, that individual may benefit from receiving adjuvant therapy which includes an angiogenesis inhibitor such as Avastin, sorafenib, sunitinib or pazopanib. Alternatively, if an individual has a high risk score based on analysis of the biomarkers of Table 3, that individual may benefit from receiving adjuvant therapy which includes an anti-proliferative agent such as a topoisomerase inhibitor (I & II), taxane, anthracycline, antitubulin, antimetabolite or alkylating agent.
To facilitate the sample analysis operation, the data obtained by the reader from the device may be analyzed using a digital computer. Typically, the computer will be appropriately programmed for receipt and storage of the data from the device, as well as for analysis and reporting of the data gathered, for example, subtraction of the background, verifying that controls have performed properly, normalizing the signals, interpreting fluorescence data to determine the amount of hybridized target, normalization of background, and the like.
The invention further provides kits for determining the expression level of the biomarkers described herein. The kits may be useful for determining prognosis of lung cancer subjects. A kit can comprise a microarray comprising probes for any, or any combination, of the biomarkers identified in Tables 1-3, and/or any other solid support to which probes can be attached, where the solid support can be used to measure gene expression of a test sample. In one embodiment, the kit comprises a computer readable medium which includes expression profile analysis software capable of being loaded into the memory of a computer system and which can convert the measured expression values into a risk score. A kit may further comprise nucleic acid controls, buffers, and instructions for use.
One skilled in the art will recognize many methods and materials similar or equivalent to those described herein, which could be used in the practice of the present invention. Indeed, the present invention is in no way limited to the methods and materials described. For purposes of the present invention, the following terms are defined below.
A combination of a Novartis dataset and various public data sets from cohorts of lung cancer patients was studied. Description of the patient selection criteria and clinical characteristics can be found in the respective original articles for the public data sets (see below). For the Novartis dataset, 412 patient samples were collected from NSCLC patients who had undergone surgical resection. Standard staging procedures were performed including CT scans, FDG-PET of suspicious lymph nodes (>1 cm in CT) and MRI. Histological analysis was performed to determine whether the NSCLC was squamous cell carcinoma, adenocarcinoma or another type such as BAC. TNM-based staging was also performed to define whether the NSCLC was Stage I or Stage II. The fresh frozen tissue was banked for genomic analysis. The primary endpoint was overall survival. Overall survival refers to the time (in years) from first surgery and can be defined by a period such as at least 3 years, for example a 5 year period, which is relapse or recurrence free.
Public Datasets Used in this Example:
Bhattacharjee A et al. Proc Natl Acad Sci USA 98:13790-5, 2001; Takeuchi et al. (2006) J Clin Oncol 24:1679-88; Raponi et al. (2006) Cancer Res 66:7466-72; Lu et al. (2006) PLoS Med 3:e467; Shedden et al. (2008) Nat Med 14:822-7; Hou et al. (2010) PLoS One 5:e10312; Wilkerson et al. (2010) Clin Cancer Res 16:4864-75; Zhu et al. (2010) J Clin Oncol 28:4417-24
The procedure for preparing the datasets for the analysis was similar to that described in Wirapati et al. 2008 Breast Cancer Research 10:R65, and is briefly outlined as follows:
The signature genes were identified by large-scale integrated analysis of a comprehensive gene expression and clinical database consisting of lung cancer datasets newly generated by Novartis (two cohorts totalling 412 patients, unpublished) and publicly available gene expression datasets. Signature genes were selected and grouped into the three modules (Table 1, 2, and 3) based on criteria including similarity of expression patterns with those of other types of cancer and biological function. Publicly available datasets were chosen such that the prognostic performance could be independently verified using the methods outlined below.
When applying a biomarker signature (a set of biomarkers as specified by Tables 1, 2, and/or 3), the genes that were missing were ignored. A raw score was assigned to each signature by averaging the expression values for genes that were present in a particular platform. The standardized score was produced by subtracting the mean of the raw score, and dividing by the standard deviation. The mean and standard deviation were determined separately for each dataset. Three different scores were produced for each patient. We will refer to them as modules 1, 2, or 3, corresponding to the gene sets in Table 1, 2, or 3, respectively.
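The scoring procedure just described can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, gene labels (G1 to G3) and expression values are hypothetical, and the sample standard deviation is used because the text does not specify which estimator was applied.

```python
from statistics import mean, stdev

def raw_score(sample, signature):
    """Average the expression values of the signature genes present in the sample."""
    present = [sample[g] for g in signature if g in sample]
    if not present:
        raise ValueError("no signature gene is present on this platform")
    return mean(present)

def standardized_scores(dataset, signature):
    """Standardize raw scores using the mean and standard deviation of one dataset."""
    raws = [raw_score(sample, signature) for sample in dataset]
    mu, sd = mean(raws), stdev(raws)  # assumption: sample standard deviation
    return [(r - mu) / sd for r in raws]

# Hypothetical dataset in which gene G3 is missing from the platform and is ignored.
signature = ["G1", "G2", "G3"]
dataset = [
    {"G1": 0.0, "G2": 2.0},
    {"G1": 1.0, "G2": 3.0},
    {"G1": 2.0, "G2": 4.0},
]
scores = standardized_scores(dataset, signature)
```

Calling `standardized_scores` once per dataset and per module keeps the standardization separate for each dataset, so that scores from different platforms are on a comparable scale.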
To demonstrate that the scores from the three modules were not providing similar information, we did scatter plots of pairwise distributions of the scores in
To show the clinical utility of each of the module scores, we performed survival analysis using Kaplan-Meier curves (Kaplan and Meier J. Am. Statist. Assoc. 53:457-481, 1958), using the quantitative scores to divide the patients into quartiles (groups containing 25% of the patients).
An example of a prognostic system combining the three scores is also shown in
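For illustration, a quartile split followed by a Kaplan-Meier estimate can be sketched in a few lines. The function names and the example data are hypothetical, and the sketch is a plain product-limit estimator without confidence intervals or significance testing; it is not the software used to produce the results reported here.

```python
def quartile_groups(scores):
    """Assign each patient a quartile index (0 = lowest scores, 3 = highest) by rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    groups = [0] * len(scores)
    for rank, idx in enumerate(order):
        groups[idx] = rank * 4 // len(scores)
    return groups

def kaplan_meier(times, events):
    """Product-limit estimate: list of (time, survival) at each distinct event time.

    events[i] is 1 if the event (e.g., death) was observed, 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk, survival, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t, deaths, removed = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve

# Hypothetical follow-up times (years) and event indicators for one quartile group.
curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
groups = quartile_groups([0.1, 0.9, 0.5, 0.3, 0.7, 0.2, 0.8, 0.4])
```

In practice, one curve would be computed for the patients in each quartile group and the curves compared, for example between the extreme quartiles.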
In summary, each individual signature (module 1, 2 or 3), as well as their combination (module 1, 2 and 3), showed the ability to distinguish the survival (or equivalently, disease-related mortality) of subgroups of patients. In all cases substantial and statistically significant differences are observed at least between the extreme quartiles.
To show that the proposed signatures add new prognostic information to well-established factors such as histology, tumor staging and age at diagnosis, we performed similar analyses as above, but stratifying the data into groups. In this illustrative example, we only showed the stratification by each factor separately, dividing them into two major groups in order to have sufficient sample size in each group. The factors considered are:
Each of the signatures (module 1, 2, and 3) and the combination are shown separately in
In most instances, the prognostic power of the signatures (individually and in combination) was still observed. This indicates that the proposed signatures provided additional prognostic information beyond the traditional factors. That is, the signatures are not merely surrogate markers that are highly correlated with existing factors. In particular, patients with the same tumor stage can be distinguished further into a range of risks. This was not merely a refinement of the staging system, since the risk ordering may be reversed. For example, the group of patients with the highest risk in stage I actually has worse survival than the average survival of stage II patients.
The analyses under various contexts of existing clinico-pathologic factors also highlight that these factors can be incorporated in the application of the signatures. For example, for squamous cell carcinoma, the individual signatures did not show substantial risk discrimination, but the combined score shows the top quartile having a substantially worse outcome than the rest.
The risk prediction system will utilize a database of clinical and gene expression data, similar to the one used in this example, to allow projection of risk under alternative treatments (including no treatment). This system is similar to AdjuvantOnline (Ravdin et al 2001, J. Clin. Oncol. 19:980), except that it also includes the scores from the claimed invention.
Typical application of the biomarkers disclosed in Tables 1, 2 and/or 3 for a patient with lung cancer.
1. A patient diagnosed with lung cancer with small and operable primary tumors undergoes surgery to remove the lesion. A part of the tumor tissue is examined by standard pathology procedures such as determination of tumor size and tumor histological type (such as adenocarcinoma, squamous cell carcinoma, or other types). Tumor staging is determined using standard guidelines, based on tumor size and the presence of lymph node metastasis or metastasis at distant sites. The information obtained from the standard clinico-pathologic measurements may modify and enhance the prognostic and predictive application of the invention, but it is not a requirement and not an integral part.
2. A part of the tumor tissue is used as the source material for the claimed invention, either as frozen, paraffin embedded or fresh tissue. Whole transcriptome RNA extraction is performed on the tissue.
3. Measurement of the relative quantity of the RNA for specific set of genes (claimed in the invention) is performed by any of these procedures:
An example of how the risk prediction is modulated by tumor stage is shown in
The risk projection can be calibrated against a database of past observations of patients with records of disease outcome, clinico-pathologic variables and measurements of the claimed invention. The database may be periodically updated with information from new patients. This system is similar to AdjuvantOnline (Ravdin et al. (2001) J Clin Oncol. 2001 Feb. 15; 19(4):980-91), a widely used tool for projecting the survival of cancer patients as the function of various clinico-pathologic variables. We extend the system to include multi-modal scores derived from genomics and transcriptomics technology.
8. The modulating factors put into the risk calculator may include commonly used adjuvant therapy (such as platinum-based chemotherapy) or anti-angiogenesis drugs. In this scenario of response-prediction applications, two or more risk profiles will be presented, corresponding to the projected probability of outcome under alternative treatments, or no treatment.
Application of the signature to formalin-fixed paraffin embedded (FFPE) tumor materials.
The claimed signatures (the set of genes and the formula for deriving the risk scores) can be directly applied to expression data from technology platforms such as Affymetrix, qNPA or nanoString, after tissue preparation and raw data preprocessing procedure suitable for each respective platform.
To assess whether the signature risk scores can potentially provide the same prognostic value in qNPA and nanoString data with FFPE as in Affymetrix with fresh frozen tissue, comparisons were performed on materials from the same lung cancer patients (unpublished data).
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/EP2012/059784 | 5/24/2012 | WO | 00 | 3/17/2014 |
Number | Date | Country | |
---|---|---|---|
61490021 | May 2011 | US |