The application relates to compositions and methods for classifying, diagnosing and prognosing lung cancer, particularly pulmonary adenocarcinoma (ADC).
Lung adenocarcinoma (ADC) accounts for approximately 35% of all lung cancers and has an overall 5-year survival of 17% (1). The recent World Health Organization (WHO) classification recognized a particular subtype, bronchioloalveolar carcinoma (BAC), for its non-invasive features and excellent prognosis (2). BAC has a distinct histological pattern with tumor cells growing along the pre-existing alveolar framework, without evidence of stromal, pleural or vascular invasion. Yet some invasive ADC, classified as mixed type, may have components or large areas of BAC-like pattern. Multi-stage development of adenocarcinoma putatively involves progression from atypical adenomatous hyperplasia (AAH) through BAC to invasive ADC with BAC features (AWBF) (3-5). Mice that express oncogenic KRAS develop histological changes that range from mild hyperplasia/dysplasia analogous to atypical adenomatous hyperplasia, to alveolar adenomas, and ultimately to overt ADC (6, 7).
The identification of genes/proteins that may distinguish BAC from AWBF, and are predictors of ADC with poor prognosis, would be useful for the establishment of novel molecular pathological classification of lung adenocarcinoma.
Disclosed herein are genomic and expression profiles of different subtypes of lung adenocarcinoma (ADC). A number of altered genomic regions have been identified that distinguish subtypes of lung adenocarcinoma (ADC), specifically bronchioloalveolar carcinoma (BAC) and invasive ADC with BAC features (AWBF), as well as genes or biomarkers whose expression is altered in individuals with pulmonary ADC according to different survival outcomes. The amplification and/or deletion of these genomic regions, and/or the biomarker expression profiles, can be used to classify patients with ADC into a bronchioloalveolar carcinoma (BAC) group with excellent survival outcome, or an invasive ADC with BAC features (AWBF) group with higher risk of developing metastatic recurrence and poorer survival outcome.
Accordingly, one aspect of the application provides a method of classifying or prognosing a subject with lung ADC, comprising the steps:
(a) determining a genomic profile in a test sample from the subject,
(b) comparing the genomic profile with a control;
wherein a difference or a similarity in the genomic profile between the control and the test sample is used to classify the subject with lung ADC into a BAC or an invasive ADC group, and/or prognose the subject as having poor survival or a good survival.
In an embodiment, the control comprises a reference genomic profile of a disease free and/or non-tumor sample, and a difference in the genomic profile between the control and the test sample is indicative of invasive ADC. In an embodiment, the control comprises a threshold level, for example a gene copy number fold change threshold, above which the subject is classified as belonging to an invasive ADC group, is diagnosed as having invasive ADC such as AWBF, and/or is prognosed as having poor survival.
In an embodiment, the control comprises a reference genomic profile associated with invasive ADC and/or poor survival, and a similarity in the genomic profile between the control and the test sample is indicative that the subject with lung adenocarcinoma is classified as having invasive ADC, and/or is prognosed as having a poor survival.
In another embodiment, the control is a reference genomic profile corresponding to a subject with BAC and/or good survival, and a similarity in the genomic profile between the control and the test sample is indicative that the subject is classified as having BAC and/or prognosed as having good survival.
The above described genome alterations are reflected in a number of genes or biomarkers which are altered in their copy number and/or differentially expressed in individuals with pulmonary ADC. Detecting the gene copy number e.g. the amplification and/or deletion of these biomarkers and/or their differential expression can be used to classify patients with ADC into a BAC group, or an invasive ADC group, to diagnose the subject as having BAC or invasive ADC, such as AWBF, and/or to prognose the subject as having a good prognosis or a poor prognosis.
The amplification and/or deletion and/or differential expression of these biomarkers, for example the biomarkers in Tables 3 and 4, as well as in Table 13 can also be used to prognose patients with ADC into a poor survival group or a good survival group.
Accordingly, in an aspect, the application provides methods of classifying a subject with ADC into a BAC group with an excellent survival outcome, or an invasive ADC with higher risk of developing metastatic recurrence and a poor survival outcome, using biomarker gene copy number and/or biomarker expression product levels of one or more of the biomarkers described herein. The expression products can include RNA products and polypeptide products of the biomarkers.
An embodiment provides a method of classifying a subject with lung adenocarcinoma, comprising the steps:
(a) determining the gene copy number and/or the expression level of one or more biomarkers in a test sample from the subject, wherein each biomarker corresponds to a gene in Table 1, 2, 3, 4, and/or 13,
(b) comparing the gene copy number and/or the expression level of the one or more biomarkers with a control,
wherein a difference in the gene copy number and/or the expression level of the one or more biomarkers between the control and the test sample is used to classify the subject with lung adenocarcinoma into a BAC or an invasive ADC group.
Another aspect relates to diagnosing a subtype of lung adenocarcinoma in a subject comprising the steps:
(a) determining the expression level of one or more biomarkers in a test sample from the subject, wherein each biomarker corresponds to a gene in Table 1, 2, 3 and/or 4;
(b) comparing the expression of the one or more biomarkers with a control,
wherein a difference or a similarity in the expression of the one or more biomarkers between the test sample and the control is used to diagnose the subject as having BAC or invasive ADC.
Another embodiment provides a method for diagnosing a subtype of lung adenocarcinoma in a subject comprising the steps:
Another embodiment provides a method of prognosing a subject with lung adenocarcinoma, comprising the steps:
(a) determining a gene copy number and/or an expression level of one or more biomarkers in a test sample from the subject, wherein each biomarker corresponds to a gene in Table 1 and/or 3,
(b) comparing the gene copy number and/or the expression level of the one or more biomarkers with a control,
wherein a difference in gene copy number and/or expression level of the one or more biomarkers between the control and the test sample is used to prognose the subject with lung adenocarcinoma into a poor survival group or a good survival group.
In another embodiment, the difference in gene copy number comprises a gene amplification of the one or more biomarkers. In another embodiment, the difference in gene copy number comprises a gene deletion of the one or more biomarkers. In certain embodiments, the difference in the gene copy number and/or expression level comprises amplification and/or increased expression of one or more of the genes in Table 1, 3 and/or 13 compared to a control. In other embodiments the gene copy number comprises deletions in and/or decreased expression of one or more genes in Table 2 and/or 4 compared to a control. In other embodiments, the gene copy number and/or expression level comprises amplification or increased expression of one or more genes in Table 1, 3 and/or 13 and deletions or decreased expression in one or more genes in Table 2 and/or 4 compared to a control. In certain embodiments, the control is a gene copy number of a gene in Table 1 or 2 from a disease free, and/or non-tumor sample.
In another embodiment, the gene amplification and/or an increased expression level of the one or more biomarkers of Table 1, 3 and/or 13 and/or a gene deletion and/or a decreased expression level of the one or more biomarkers of Table 2 and/or 4 between the control and the test sample is used to classify the subject with lung adenocarcinoma into a BAC or an invasive ADC group and/or prognose the subject with lung ADC into a poor survival group or a good survival group.
In another embodiment, a gene amplification and/or an increased expression level of the one or more biomarkers of Table 1, 3 and/or 13 and/or a gene deletion and/or a decreased expression level of the one or more biomarkers of Table 2 and/or 4 between the control and the test sample is indicative that the subject with lung adenocarcinoma has invasive ADC and/or poor survival.
The one or more biomarkers whose level of expression is determined are, in one embodiment, selected from Table 3, 4 and/or 13.
The prognoses, diagnoses and classifying methods of the application can be used to select treatment. For example, the methods can be used to select or identify what type of treatment is indicated.
Another aspect of the application provides compositions for use with the methods described herein. In an embodiment, the compositions comprise one or more primers for detecting a biomarker described herein.
The application also provides for kits used to classify, diagnose and/or prognose a subject with ADC into a BAC with good survival outcome or an invasive ADC with poorer survival outcome that includes detection agents that can detect the gene copy number or expression level of one or more of the biomarkers disclosed herein.
Other features and advantages of the present application will become apparent from the following detailed description. It should be understood, however, that the detailed description and the specific examples, while indicating preferred embodiments of the application, are given by way of illustration only, since various changes and modifications within the spirit and scope of the application will become apparent to those skilled in the art from this detailed description.
The application will now be described in relation to the drawings in which:
The application relates to genomic alterations, gene copy number variations and differential biomarker expression levels and profiles in subjects with lung adenocarcinoma or NSCLC which are associated with a classification, diagnosis and/or prognosis, and provides methods, compositions, detection agents and kits for classifying, diagnosing or prognosing a subject with lung adenocarcinoma or NSCLC.
The terms “lung adenocarcinoma”, “lung ADC” and “pulmonary ADC” as used herein refer to a type of lung cancer that comprises various subtypes, including bronchioloalveolar carcinoma (BAC), which is non-invasive and/or includes focal invasion and has good prognosis (2), and invasive ADC including mixed type, which can have areas with BAC-like pattern and is referred to as invasive ADC with BAC features (AWBF).
The term “invasive ADC” as used herein refers to lung ADC that is invasive, with or without areas of BAC-like pattern, and includes AWBF. Subjects with invasive ADC can have poor prognosis or good prognosis. Expression levels of biomarkers corresponding to genes, for example one or more genes listed in Table 3 and/or 4, are useful for differentiating more indolent forms of invasive ADC, which have good prognosis, from aggressive forms.
The term “bronchioloalveolar carcinoma” or “BAC” as used herein refers to a subtype of lung ADC which is non-invasive and/or includes focal invasion (i.e. BAC with focal invasion) and has good prognosis.
The term “non-small cell lung cancer” as used herein refers to primary lung cancer that is distinguished from small cell lung cancer and that is composed of multiple different types, including adenocarcinoma, squamous cell carcinoma, large cell carcinoma and other less frequent types.
The term “biomarker” or “marker” as used herein refers to a gene that is altered in its gene copy number and/or is differentially expressed, in individuals with ADC according to ADC classification, diagnosis and/or prognosis. The biomarkers are diagnostic, useful for classifying subjects and predictive of different survival outcomes. For example the term “biomarkers” includes one or more of the genes listed in Table 1, 2, 3, 4 and/or 13 such as EPO, SLC25A17, POP7, PDCD6, SERPINE1, GNB2, and ST13.
As used herein, the term “control” refers to a specific value or dataset e.g., control expression level, control gene copy number, reference expression profile or reference genomic profile, derived from a known subject class e.g., from a sample of a disease free subject; a subject with BAC and/or a subject with invasive ADC, for example AWBF, and/or normal tissue such as tumor adjacent non-neoplastic tissue, that can be compared to and used to classify, diagnose or prognose the value or dataset derived from a test sample, e.g., expression level, gene copy number, expression profile or genomic profile, obtained from the test sample. For example, the control can be normal tissue. Normal tissue with respect to genomic profile refers to a single genomic copy on each of the two alleles. For example, the control can be derived from samples from a group of subjects known to have lung ADC and/or good survival outcome or known to have lung ADC and/or have poor survival outcome. In another example, the dataset can be derived from a sample from a group of subjects known to have BAC, or a group of subjects known to have invasive ADC and/or AWBF. The control is optionally a value such as a threshold level. For example, it is shown herein that for a desired or particular sensitivity and/or specificity the control can be a threshold level as indicated for example in Table 13. Accordingly for example, where the control is a threshold level for a particular biomarker (e.g. gene copy number fold change threshold), samples that have a gene copy number above the threshold value are classified as belonging to an invasive ADC group such as AWBF, diagnosed as having AWBF and/or prognosed to have poor survival and/or tumor progression. A person skilled in the art will recognize that different threshold levels can be used depending upon the desired specificity and sensitivity. 
Optionally one or more controls can be used, for example an internal control can be used with or without comparison to a control sample, such as a tissue sample. With respect to genomic alterations, e.g. gains and losses, the control can for example also refer to an internal control, e.g. the copy number of a non-altered region of the chromosome or a different chromosome, e.g. a chromosome with minimal variance in lung cancer subjects, for example a chromosome not herein or previously identified as associated with prognosis. Methods wherein an internal control is useful include for example quantitative polymerase chain reaction (PCR) or fluorescence in situ hybridization (FISH). Optionally, the copy number can be compared to the centromere, for example when using FISH. Typically a normal or control genomic profile refers to a single genomic copy on each of the two alleles. For example in array-CGH, the control is a normal reference genomic DNA that is assumed to have 2 copies of each gene. In other examples, the control is optionally a positive control or a negative control, for example for quantitative PCR and/or FISH methods, for example included in quantitative PCR and/or FISH based kits. Based on the teachings herein and knowledge in the field, a person skilled in the art would readily be able to identify suitable controls for the methods described herein. Similarly, an internal control can be used to normalize for expression levels, for example a housekeeping gene can be used in a quantitative RT-PCR protocol.
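By way of illustration only, the comparison of a target region to an internal control described above can be sketched as follows. The sketch assumes, purely for the example, that the measured signal is proportional to copy number and that the internal control region (for example a centromere signal in FISH) is present at 2 copies; the function name and the example values are hypothetical and are not taken from the application.

```python
def estimate_copy_number(target_signal, control_signal, control_copies=2.0):
    """Estimate the copy number of a target region from an internal control.

    Assumption (illustrative only): signal is proportional to copy number,
    and the internal control region is present at `control_copies` copies.
    """
    if control_signal <= 0:
        raise ValueError("control signal must be positive")
    return control_copies * target_signal / control_signal


# Hypothetical signals measured in the same sample.
gain_estimate = estimate_copy_number(target_signal=300.0, control_signal=200.0)
loss_estimate = estimate_copy_number(target_signal=100.0, control_signal=200.0)
print(gain_estimate, loss_estimate)
```

Under these assumptions an estimate above 2 suggests a gain and an estimate below 2 suggests a loss; a real assay would require calibration appropriate to the method used.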
The term “gene copy number fold change threshold” refers to a value that identifies, for a particular sensitivity and specificity, a copy number that distinguishes between two classes, diagnoses and/or prognoses and which can be used to classify, diagnose or prognose test samples, e.g. the gene copy number in a test sample is compared to the gene copy number fold change threshold (e.g. as a control) above which a subject is classified as belonging for example to a class with poor prognosis, diagnosed as having for example AWBF, and/or prognosed as having poor survival. For example, Table 13 indicates that the biomarker PPA1 has a gene copy number fold change threshold of 1.2 for a specificity of 91.7% and a sensitivity of 53.3%. Test samples having a copy number of PPA1 above 1.2 are for example classified as having a poor prognosis, diagnosed as having AWBF and/or prognosed as having poor survival. For example, the gene copy number fold change threshold can be determined as described in Example 3.
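A minimal sketch of how such a threshold could be applied, and how the sensitivity and specificity associated with it could be computed from samples of known class, is given below. It is not the procedure of Example 3. The function names and the sample values are hypothetical, and assigning samples at or below the threshold to the BAC group is an assumption made for the example.

```python
def classify_by_threshold(fold_change, threshold):
    """Classify a sample as 'invasive ADC' if its fold change exceeds the threshold.

    Assumption: samples at or below the threshold are assigned to 'BAC'.
    """
    return "invasive ADC" if fold_change > threshold else "BAC"


def sensitivity_specificity(fold_changes, labels, threshold):
    """Return (sensitivity, specificity), treating 'invasive ADC' as positive.

    Assumes both classes are present among `labels`.
    """
    tp = fn = tn = fp = 0
    for value, label in zip(fold_changes, labels):
        predicted = classify_by_threshold(value, threshold)
        if label == "invasive ADC":
            if predicted == "invasive ADC":
                tp += 1
            else:
                fn += 1
        else:
            if predicted == "BAC":
                tn += 1
            else:
                fp += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical fold changes for six samples of known class.
fold_changes = [1.5, 1.1, 1.3, 0.9, 1.0, 1.25]
labels = ["invasive ADC", "invasive ADC", "invasive ADC", "BAC", "BAC", "BAC"]
sens, spec = sensitivity_specificity(fold_changes, labels, threshold=1.2)
print(sens, spec)
```

Raising the threshold in such a scheme generally trades sensitivity for specificity, which is why different threshold levels can be chosen depending upon the desired performance.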
The term “disease free subject” refers to a subject that is free of lung adenocarcinoma.
The term “reference profile” as used herein refers to a reference expression profile, a reference genomic profile, and/or a reference gene copy number profile according to the context.
A “reference expression profile” as used herein refers to the expression signature of a subset of biomarkers which correspond to genes associated with a clinical classification, diagnosis and/or outcome in a lung adenocarcinoma patient and/or ADC disease free subject. The reference expression profile can comprise a plurality of values, each value representing the expression level of a biomarker in a control, wherein each biomarker corresponds to a gene in Table 1, 2, 3, 4 and/or 13. For example, with respect to classification, the reference expression profile can refer to the expression signature of a subset of biomarkers listed in Table 1 and/or 2 which are differentially expressed in BAC and invasive ADC groups. With respect to prognosis, for example, the reference expression profile can refer to the expression signature of a subset of biomarkers listed in Table 3 and/or 4, which are differentially expressed in patients in a poor survival group or a good survival group. The reference expression profile is optionally derived de novo from a control and/or can be a standard value previously derived from one or more known control samples. For example, the reference expression profile can be a predetermined value for each biomarker or set of biomarkers derived from ADC patients whose biomarker expression values and/or survival outcomes are known. Using values from known samples allows one to develop an algorithm for classifying new patient samples into good and poor prognostic groups as described in the Example. The reference expression profile is identified using one or more samples comprising tumor wherein the expression is similar between related samples defining a class e.g., BAC or invasive ADC and/or an outcome group such as poor survival or good survival and is different to unrelated samples defining a different class and/or outcome group such that the reference expression profile is associated with a particular clinical class or outcome. 
The reference expression profile is accordingly a reference profile or reference signature of the expression of a subset of genes, for example the genes in Table 1, 2, 3, 4 and/or 13, to which the subject expression levels of the corresponding genes in a test sample can be compared in methods for determining or predicting clinical class or outcome. A person skilled in the art will recognize that a variety of methods can be used to determine a reference expression profile or an expression signature. For example, a reference expression profile or an expression signature can be determined by amplification of polynucleotides.
The term “expression level” as used herein refers to the absolute or relative amount of the transcription and/or translation product of a biomarker described herein and includes RNA and polypeptide products. A person skilled in the art will be familiar with a number of methods that can be used to determine RNA transcription levels, such as qRT-PCR and/or polypeptide levels such as immunohistochemistry.
A “reference gene copy number profile” as used herein refers to the gene copy number of a subset of genes listed in Tables 1, 2, 3, 4 and/or 13 associated with ADC classification, diagnosis and/or clinical outcome in a lung adenocarcinoma patient and/or ADC disease free subject. The reference gene copy number profile comprises a plurality of values, each value representing the copy number of a gene in Tables 1, 2, 3, 4 and/or 13. The reference gene copy number profile is identified using for example normal human tissue and/or cells and/or tissue and/or cells from lung ADC subtypes. Normal tissue and/or cells includes for example, tumor adjacent non-neoplastic tissue and/or cells and/or tissue and/or cells from a lung cancer disease free subject. The reference gene copy number profile is accordingly a reference signature of the copy number of a subset of genes in Tables 1, 2, 3, 4 and/or 13, to which the subject gene copy number of the corresponding genes in a test sample are compared.
The term “genomic profile” as used herein refers to the genomic structural signature of an individual's genome. A number of variations and alterations, referred to as copy number variations, have been characterized, including amplifications and deletions (e.g. gains and losses), a subset of which are associated with disease subtype and/or prognosis. The alterations can comprise small and large amplifications and/or deletions which can occur throughout the genome.
The term “loss” or “gain” with respect to a genomic profile refers to a change in copy number. For example, a “loss” can be on the plus strand or the minus strand and can involve loss of one or both alleles. Similarly, a “gain” can for example be a gain on the plus strand or the minus strand and can involve gain on one or both alleles. The gain can additionally be the gain of 1 or more copies.
The phrase “determining a genomic profile” as used herein refers to detecting the presence, frequency, variability and/or length of one or more genomic alterations including amplifications and deletions which may or may not comprise alterations in the nucleic acid sequence of genes e.g., can comprise alterations in the intergenic regions of the genome. Genomic alterations comprising amplifications and deletions in genes comprise those listed in Tables 1, 2, 3, 4 and/or 13. A person skilled in the art will appreciate that a number of methods can be used to determine a genomic profile, including for example fluorescence and other non-fluorescent types of in situ hybridization (FISH, CISH or others), amplification methods such as quantitative PCR (qPCR), multiplex PCR including for example multiplex ligation dependent probe amplification (MLPA) as well as array CGH.
Amplification of polynucleotides utilizes methods such as the polymerase chain reaction (PCR), including for example quantitative PCR, multiplex PCR and multiplex ligation dependent probe amplification (MLPA), ligation amplification (or ligase chain reaction, LCR) and amplification methods based on the use of Q-beta replicase. These methods are well known and widely practiced in the art. Reagents and hardware for conducting PCR are commercially available. Primers useful to amplify specific sequences from selected genomic regions are preferably complementary to, and hybridize specifically to sequences flanking the target genomic regions.
The term “reference genomic profile” as used herein refers to a genomic signature comprising genomic alterations associated with classification and/or clinical outcome in lung ADC patients and/or an ADC disease free subject. The reference genomic profile comprises a plurality of values, each value representing a change in a genomic region. The reference genomic profile is for example derived from normal human tissues and/or cells. The reference genomic profile is accordingly, for example, a normal genomic copy to which a subject genomic profile is compared for classifying the tumor, diagnosing a clinical subtype or determining or predicting clinical outcome.
The terms “complementary” or “complementarity”, as used herein, refer to the natural binding of polynucleotides under permissive salt and temperature conditions by base-pairing. For example, the sequence “A-G-T” binds to the complementary sequence “T-C-A”. Complementarity between two single-stranded molecules may be “partial”, in which only some nucleotides or portions of the nucleotide sequences of the nucleic acids bind, or it may be complete when total complementarity exists between the single stranded molecules. The degree of complementarity between nucleic acid strands has significant effects on the efficiency and strength of hybridization between nucleic acid strands.
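The “A-G-T” and “T-C-A” example above can be sketched in code as follows. The sketch compares two DNA sequences position by position, following the example in the text, and does not model strand orientation, salt or temperature conditions; the function names are hypothetical.

```python
# Watson-Crick base pairs for DNA.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(seq):
    """Return the position-wise complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in seq)


def fraction_complementary(seq_a, seq_b):
    """Fraction of aligned positions at which the two sequences base-pair.

    A value of 1.0 corresponds to complete complementarity; a value between
    0 and 1 corresponds to partial complementarity.
    """
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    pairs = sum(COMPLEMENT[a] == b for a, b in zip(seq_a, seq_b))
    return pairs / len(seq_a)


print(complement("AGT"))
print(fraction_complementary("AGT", "TCA"))
print(fraction_complementary("AGT", "TCG"))
```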
The term “similar” or “similarity” as used herein with respect to a reference profile refers to similarity in both the identity and quantum of change in expression level of a biomarker, genomic alteration, or gene copy number variation compared to a control, where the control is for example derived from a normal cell and/or tissue or has a known diagnosis or outcome class such as poor survival or good survival.
The term “similarity in expression” as used herein means that there is no or little difference, for example no statistical difference, in the level of expression of the biomarkers between the test sample and the control and/or between classes, diagnostic groups, and good and poor prognosis groups defined by biomarker expression levels.
The term “most similar” in the context of a reference profile refers to a reference profile that is associated with a class, diagnosis or clinical outcome that shows the greatest number of identities and/or degree of changes with the subject profile.
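One possible reading of “greatest number of identities” can be sketched as counting, for each reference profile, the biomarkers whose call (gain, loss or none) is identical to the subject profile. This is an illustrative assumption rather than the method of the application; the gene names, calls and function names below are hypothetical placeholders.

```python
def count_identities(subject, reference):
    """Count biomarkers whose call in the reference matches the subject."""
    return sum(1 for gene, call in reference.items() if subject.get(gene) == call)


def most_similar_profile(subject, references):
    """Return the name of the reference profile with the most identities.

    Ties are resolved in favour of the profile listed first.
    """
    return max(references, key=lambda name: count_identities(subject, references[name]))


# Hypothetical reference profiles and subject profile.
references = {
    "BAC": {"gene_A": "none", "gene_B": "none", "gene_C": "none"},
    "invasive ADC": {"gene_A": "gain", "gene_B": "gain", "gene_C": "loss"},
}
subject = {"gene_A": "gain", "gene_B": "none", "gene_C": "loss"}
print(most_similar_profile(subject, references))
```

A scheme that also weighs the degree of change, rather than only counting identical calls, would need a distance or correlation measure in place of the simple count used here.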
The term “differentially expressed” or “differential expression” as used herein refers to biomarkers described herein that are expressed at one level in an ADC class, diagnostic or prognostic group and expressed at another level in a control. The differential expression can be assayed by measuring the level of expression of the transcription and/or translation products of the biomarkers, such as the difference in level of messenger RNA transcript expressed or polypeptide expressed in a test sample and a control. The difference can be statistically significant.
The term “difference in the level of expression” refers to an increase or decrease in the measurable expression level of a given biomarker expression product, as measured by the amount of messenger RNA transcript and/or the amount of polypeptide in a sample, as compared with the measurable expression level of a given biomarker in a control. In an embodiment, the differential expression can be compared using the ratio of the level of expression of a given biomarker or biomarkers as compared with the expression level of the given biomarker or biomarkers of a control, wherein the ratio is not equal to 1.0. For example, an RNA or polypeptide is differentially expressed if the ratio of the level of expression in a first sample as compared with a second sample is greater than or less than 1.0. For example, the ratio can be greater than 1, 1.2, 1.5, 1.7, 2, 3, 4, 5, 10, 15, 20 or more, or less than 1, 0.8, 0.6, 0.4, 0.2, 0.1, 0.05, 0.001 or less. In another example, the differential expression is measured using a p-value. For instance, when using a p-value, a biomarker is identified as having a “difference in the level of expression” as between a first sample and a second sample when the p-value is less than 0.1, preferably less than 0.05, more preferably less than 0.01, even more preferably less than 0.005, and most preferably less than 0.001.
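The ratio-based comparison described above can be sketched as follows. The default cutoff of 1.5 is one of the example ratios listed in the text; applying it symmetrically (a ratio of at least 1.5 or at most 1/1.5) is an assumption made for the example, and the function names and expression values are hypothetical.

```python
def expression_ratio(test_level, control_level):
    """Ratio of the expression level in the test sample to that in the control."""
    if control_level <= 0:
        raise ValueError("control level must be positive")
    return test_level / control_level


def is_differentially_expressed(test_level, control_level, fold=1.5):
    """Flag a biomarker whose ratio is at least `fold` or at most 1/`fold`.

    Assumption: the same fold cutoff is applied to increases and decreases.
    """
    ratio = expression_ratio(test_level, control_level)
    return ratio >= fold or ratio <= 1.0 / fold


# Hypothetical expression levels in arbitrary units.
print(is_differentially_expressed(30.0, 10.0))
print(is_differentially_expressed(10.0, 10.0))
print(is_differentially_expressed(5.0, 10.0))
```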
The term “prognosis” as used herein refers to a clinical outcome group such as a poor survival group or a good survival group which is reflected by a reference profile such as a reference expression profile, or a reference gene copy number profile, or reflected by an expression level of one or more biomarkers disclosed herein. It can also be reflected by genomic alterations. The prognosis provides an indication of disease progression and includes an indication of likelihood of recurrence, metastasis, death due to disease, tumor subtype or tumor type. The clinical outcome class includes a good survival group and a poor survival group.
The term “classifying” as used herein means identifying and/or diagnosing the clinical subtype of lung ADC. For example, lung ADC includes subtypes bronchioloalveolar carcinoma (BAC) which is non invasive and/or has focal invasions and has good prognosis (2) and invasive ADC including mixed type, which can have areas with BAC like pattern and is referred to as invasive ADC with BAC features (AWBF), which can have poor prognosis. “Classifying” can therefore refer to a method or process of determining whether an individual with ADC has BAC or invasive ADC and/or AWBF.
The term “diagnosing” as used herein means identifying an illness or subtype such as BAC or invasive ADC.
The term “prognosing” as used herein means predicting the course of disease or identifying the clinical outcome group a subject belongs to according to the subject's similarity to a control and/or a reference profile and/or biomarker expression level associated with the prognosis. For example, prognosing comprises a method or process of determining whether an individual with ADC has a good or poor survival outcome, or grouping an individual with ADC into a good survival group or a poor survival group. The term “good survival” as used herein refers to an increased chance of survival as compared to patients in the “poor survival” group. For example, the biomarkers of the application can prognose patients into a “good survival group” for example which includes subjects with BAC and/or less aggressive invasive ADC. These patients are at less risk of death 5 years after surgery. The good survival group comprises subjects having a 5 year survival rate of about 80% or more.
The term “poor survival” as used herein refers to an increased risk of death as compared to patients in the “good survival” group. For example, biomarkers or genes of the application can prognose patients into a “poor survival group” which include for example patients with more aggressive forms of invasive ADC and/or subjects with mixed type adenocarcinoma with BAC features (AWBF). These patients are at greater risk of death within 5 years from surgery. For example the poor survival group comprises subjects having a 5 year survival rate of less than about 80%.
As used herein, “treatment” is an indicated approach for obtaining beneficial or desired results, including clinical results, for example an indicated approach for lung ADC subtypes. Beneficial or desired clinical results can include, but are not limited to, alleviation or amelioration of one or more symptoms or conditions, diminishment of extent of disease, stabilized (i.e. not worsening) state of disease, preventing spread of disease, delay or slowing of disease progression, amelioration or palliation of the disease state, prolonging survival as compared to expected survival if not receiving treatment and remission (whether partial or total), whether detectable or undetectable. For example surgery or chemotherapy are indicated treatments for subjects with invasive ADC while BAC patients may be treated with limited resection or non-invasive or minimally invasive procedures.
“Palliating” a disease or disorder means that the extent and/or undesirable clinical manifestations of a disorder or a disease state are lessened and/or time course of the progression is slowed or lengthened, as compared to not treating the disorder.
The phrase “selecting a treatment” as used herein refers to selecting any indicated treatment that is useful for obtaining beneficial results such as prolonging survival and/or palliation.
The term “subject” as used herein refers to any member of the animal kingdom, preferably a human being. A “subject with ADC” as used herein includes a subject that has ADC or that is suspected of having ADC.
The term “test sample” as used herein refers to any fluid, cell or tissue sample from a subject which can be assayed for biomarker expression products and/or a reference expression profile, e.g. genes differentially expressed in subjects with ADC according to survival outcome and/or for which a genomic profile can be determined and includes without limitation tumor tissue and/or cells, derived from, for example, lung biopsy, for example obtained by bronchoscopy, needle aspiration, thoracentesis and/or thoracotomy, and/or derived from cells found in sputum.
The phrase “determining the expression level of biomarkers” as used herein refers to determining a level, including a relative level, or quantifying RNA transcripts and/or polypeptides expressed by the biomarkers. The term “RNA” includes mRNA transcripts, and/or specific spliced variants of mRNA. The term “RNA product of the biomarker” as used herein refers to RNA transcripts transcribed from the biomarkers and/or specific spliced variants. In the case of “polypeptide”, it refers to polypeptides translated from the RNA transcripts transcribed from the biomarkers. The term “polypeptide product of the biomarker” refers to polypeptide translated from RNA products of the biomarkers.
The term “nucleic acid” includes DNA and RNA and can be either double stranded or single stranded.
The term “hybridize” refers to the sequence specific non-covalent binding interaction with a complementary nucleic acid. In a preferred embodiment, the hybridization is under high stringency conditions. Appropriate stringency conditions which promote hybridization are known to those skilled in the art, or can be found in Current Protocols in Molecular Biology, John Wiley & Sons, N.Y. (1989), 6.3.1-6.3.6. For example, 6.0× sodium chloride/sodium citrate (SSC) at about 45° C., followed by a wash of 2.0×SSC at 50° C. may be employed.
The term “detection agent” as used herein refers to any molecule or compound that is useful for assessing the expression level, gene copy or genome profile of a biomarker in Tables 1, 2, 3, 4 and/or 13 or gene alteration described herein.
The term “primer” as used herein refers to a nucleic acid sequence, whether occurring naturally as in a purified restriction digest or produced synthetically, which is capable of acting as a point of synthesis when placed under conditions in which synthesis of a primer extension product, which is complementary to a nucleic acid strand, is induced (e.g. in the presence of nucleotides and an inducing agent such as DNA polymerase and at a suitable temperature and pH). The primer must be sufficiently long to prime the synthesis of the desired extension product in the presence of the inducing agent. The exact length of the primer will depend upon factors, including temperature, sequences of the primer and the methods used. A primer typically contains 15-25 or more nucleotides, although it can contain fewer. The factors involved in determining the appropriate length of primer are readily known to one of ordinary skill in the art. The term “primer” as used herein also refers to a set of primers which can produce a double stranded nucleic acid product complementary to a portion of the RNA products of the biomarker or sequences complementary thereof.
The term “probe” as used herein refers to a nucleic acid sequence that will hybridize to a nucleic acid target sequence. In one example, the probe hybridizes to an RNA product of the biomarker or a nucleic acid sequence complementary thereof. The length of probe depends on the hybridization conditions and the sequences of the probe and nucleic acid target sequence. In one embodiment, the probe is at least 8, 10, 15, 20, 25, 50, 75, 100, 150, 200, 250, 400, 500 or more nucleotides in length.
The term “antibody” as used herein is intended to include monoclonal antibodies, polyclonal antibodies, and chimeric antibodies. The antibody may be from recombinant sources and/or produced in transgenic animals. The term “antibody fragment” as used herein is intended to include Fab, Fab′, F(ab′)2, scFv, dsFv, ds-scFv, dimers, minibodies, diabodies, and multimers thereof and bispecific antibody fragments. Antibodies can be fragmented using conventional techniques. For example, F(ab′)2 fragments can be generated by treating the antibody with pepsin. The resulting F(ab′)2 fragment can be treated to reduce disulfide bridges to produce Fab′ fragments. Papain digestion can lead to the formation of Fab fragments. Fab, Fab′ and F(ab′)2, scFv, dsFv, ds-scFv, dimers, minibodies, diabodies, bispecific antibody fragments and other fragments can also be synthesized by recombinant techniques.
The definitions and embodiments described in particular sections are intended to be applicable to other embodiments herein described for which they are suitable as would be understood by a person skilled in the art.
Disclosed herein are biomarkers, which are differentially expressed according to classification and/or prognosis in subjects with lung adenocarcinoma, including biomarkers whose gene copy number and/or expression level is increased, and biomarkers whose gene copy number and/or expression level is decreased. The biomarkers whose gene copy number and/or expression level is increased include, in one embodiment, one or more of the genes listed in Tables 1 and/or 3 and biomarkers whose gene copy number and/or expression level is decreased include in one embodiment, one or more of the genes listed in Table 2 and/or 4. Comparing biomarker gene copy number and/or expression level of one or more of these biomarkers to a control wherein the control optionally comprises a reference profile is useful for classifying a subject as belonging to a BAC group or an invasive ADC group, diagnosing a subject as having BAC or invasive ADC and/or is prognostic for poor survival or good survival. Combinations of these biomarkers are useful for prognosing, diagnosing and classifying subjects.
In a first aspect, the application provides a method of classifying or prognosing a subject with lung adenocarcinoma, comprising the steps:
(a) determining the expression level of one or more biomarkers in a test sample from the subject, wherein each biomarker corresponds to a gene listed in Table 1, 2, 3 and/or 4,
(b) comparing the expression level of the one or more biomarkers with a control,
wherein a difference or a similarity in the expression level of the one or more biomarkers between the control and the test sample is used to classify the subject with lung adenocarcinoma into a BAC or invasive ADC group and/or prognose the subject with lung adenocarcinoma into a poor survival group or a good survival group.
In another embodiment, the application provides methods for diagnosis. In an embodiment a method for diagnosing a subtype of lung adenocarcinoma in a subject is provided, the steps comprising:
(a) determining the expression level of one or more biomarkers in a test sample from the subject, wherein each biomarker corresponds to a gene in Table 1, 2, 3 and/or 4;
(b) comparing the expression of the one or more biomarkers with a control,
wherein a difference or a similarity in the expression of the one or more biomarkers between the test sample and the control is used to diagnose the subject as having BAC or invasive ADC.
The expression level of a biomarker can be determined, for example, by contacting the sample comprising nucleic acids (e.g. nucleic acid test sample) or polypeptides (e.g. polypeptide test sample) with a detection agent, such as a probe, primer set or antibody, to form, for example, a complex between the detection agent and the transcription product to thereby determine the level of expression of the biomarker (e.g. for comparison to control).
Another embodiment provides a method comprising:
(a) obtaining a nucleic acid test sample from a subject;
(b) contacting the sample with at least one nucleic acid probe to detect, or primer to amplify and identify the level of expression of one or more biomarkers selected from Tables 1, 2, 3, 4 and/or 13 in the subject's test sample,
wherein the level of expression of the one or more genes selected from Tables 1, 2, 3, 4 and/or 13 indicates the subtype of lung ADC, and/or classifies the subject as belonging to a BAC group or an invasive ADC group and/or indicates the subject has a poor prognosis or good prognosis.
Another embodiment provides a method comprising:
(a) obtaining a polypeptide test sample from a subject;
(b) contacting the sample with at least one antibody to detect the level of expression of one or more biomarkers selected from Tables 1, 2, 3, 4 and/or 13 in the subject's test sample,
wherein the level of expression of the one or more genes selected from Tables 1, 2, 3, 4 and/or 13 indicates the subtype of lung ADC, and/or classifies the subject as belonging to a BAC group or an invasive ADC group and/or indicates the subject has a poor prognosis or good prognosis.
In an embodiment, the one or more biomarkers correspond to one or more genes in Table 1 and/or 3, and wherein an increase in expression in one or more of the biomarkers is indicative that the subject has invasive ADC.
In another embodiment, the one or more biomarkers correspond to one or more genes in Table 2 and/or 4 and wherein a decrease in expression of one or more biomarkers compared to the control is indicative that the subject has invasive ADC.
In a further embodiment, the one or more biomarkers comprises one or more of the genes listed in Table 3, wherein an increase in the expression level of the one or more biomarkers compared to the control indicates the subject has invasive ADC.
In yet a further embodiment, the biomarkers comprise at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27 or 28 of AP1S1, AP4M1, BRD9, CCDC21, CCL8, COPS6, CSDE1, EP300, GNB2, HIPK1, HRSP12, LAPTM4B, MCM7, MGC4677, OLFM2, POP7, PPA1, PDCD6, RABL4, RPL30, SERPINE1, SH3BGRL3, SLC25A17, ST13, TAF6, TLE3, TOB2, and ZNF561, wherein an increase in the expression level of the one or more biomarkers compared to the control indicates the subject has invasive ADC.
In another embodiment, the one or more biomarkers comprise SERPINE1, GNB2 and/or ST13 wherein an increase in the expression level of the one or more biomarkers compared to the control indicates the subject has invasive ADC.
In another embodiment, the one or more biomarkers comprises one or more of the genes listed in Table 4, wherein a decrease in the expression level of the one or more biomarkers compared to the control indicates the subject has invasive ADC.
In an embodiment, the subject with lung ADC is classified into a BAC or invasive ADC group or diagnosed as having BAC or invasive ADC. In another embodiment, BAC is non-invasive BAC. In another embodiment, the BAC is BAC with focal invasions. In another embodiment, the invasive ADC is AWBF.
In another embodiment, the subject with lung ADC is prognosed into a poor survival group or a good survival group.
The application discloses that the biomarkers are independently prognostic of outcome. These biomarkers are useful alone or in combination with other biomarkers disclosed herein. For example PDCD6 expression level has been found to be prognostic of poor survival.
Accordingly in one embodiment, the biomarker comprises PDCD6. In one embodiment, the subject PDCD6 expression level is increased significantly in a subject with poor survival compared to a control e.g., normal lung. In one embodiment, the significant difference is at least P<0.5%. In certain embodiments, the control comprises an average or mean expression level for more than one control, e.g., more than one normal lung or matched non-tumor sample. In one embodiment the increase is at least 25%, at least 50%, at least 75%, at least 100%, at least 2 fold, at least 3 fold, and/or 4 fold. In one embodiment the increase is at least 3 fold.
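By way of illustration only, the comparison against an averaged control can be sketched in Python as follows. The function names, the numeric values and the assumption that expression levels are linear-scale quantities are hypothetical and are not part of the disclosure; the sketch merely shows one way an increase of at least a chosen fold (for example 3 fold) over a mean control level could be evaluated.

```python
from statistics import mean


def fold_change(test_level, control_levels):
    """Ratio of the test expression level to the mean of the control levels."""
    control_mean = mean(control_levels)
    if control_mean <= 0:
        raise ValueError("mean control level must be positive")
    return test_level / control_mean


def is_increased(test_level, control_levels, min_fold=3.0):
    """True when the test level is at least min_fold times the control mean.

    An increase of at least 25% corresponds to min_fold=1.25, and an
    increase of at least 100% corresponds to min_fold=2.0.
    """
    return fold_change(test_level, control_levels) >= min_fold


# Hypothetical expression levels in arbitrary linear units.
controls = [1.0, 1.5, 0.5]              # mean control level is 1.0
print(fold_change(3.5, controls))       # 3.5
print(is_increased(3.5, controls))      # True
print(is_increased(1.0, controls))      # False
```

A statistical test of significance would be a separate step and is not shown here.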
It was also determined that SERPINE1, GNB2 and/or ST13 expression is increased in subjects with poor outcome.
Accordingly, one embodiment of the application is a method of prognosing a subject with lung adenocarcinoma, comprising the steps:
(a) determining the expression of a biomarker in a test sample from the subject, wherein the biomarker comprises one or more of PDCD6, SERPINE1, GNB2, and ST13,
(b) comparing the expression of the one or more biomarkers with a control,
wherein a difference or similarity in the expression of the one or more biomarkers between the control and the test sample is used to prognose the subject into a poor survival group or a good survival group.
In one embodiment, the biomarkers comprise at least 2 of PDCD6, SERPINE1, GNB2, and ST13.
In certain embodiments, the control is normal lung and/or non-tumor matched control and an increase in the expression of the one or more biomarkers between the test sample and the control is indicative that the subject with lung ADC is in a poor survival group. In other embodiments where the control is normal lung and/or non-tumor matched control, a similarity in the expression of the one or more biomarkers between the test sample and the control e.g. no or no statistical change, is indicative that the subject with lung ADC has a good survival.
In another aspect the application provides a method of classifying or prognosing a subject with lung adenocarcinoma, comprising the steps:
Another embodiment provides a method for diagnosing a subtype of lung adenocarcinoma in a subject comprising the steps:
In another aspect, the methods are used for identifying patients with poor prognosis. The application demonstrates that various expression profiles are associated with poor survival. For example, increased expression of genes listed in Table 3 and decreased expression of genes listed in Table 4 are associated with poor survival.
Accordingly in one embodiment the application provides, a method of predicting poor prognosis in a subject with lung adenocarcinoma, comprising the steps:
In one embodiment, the reference expression profile associated with poor survival comprises the expression level of at least one gene from Table 3. In one embodiment, the biomarker reference expression profile associated with poor survival comprises the expression level of 2 or more of AP1S1, AP4M1, BRD9, CCDC21, CCL8, COPS6, CSDE1, EP300, GNB2, HIPK1, HRSP12, LAPTM4B, MCM7, MGC4677, OLFM2, POP7, PPA1, PDCD6, RABL4, RPL30, SERPINE1, SH3BGRL3, SLC25A17, ST13, TAF6, TLE3, TOB2, or ZNF561.
In another embodiment, the reference expression profile associated with poor survival comprises the expression level of at least one gene from Table 4. In one embodiment, the biomarker reference expression profile associated with poor survival comprises the expression level of 2 or more of C5orf21, C5orf29, CACNA1D, CCDC13, CNTN6, CRTAP, DMXL1, EOMES, ERBB2IP, FEZF2, HRH1, LRAP, MEGF10, NPCDR1, PAM, PPWD1, RAB5A, SEMA6A, SFRS12, SNRK, TRIM36, TTC21A, ULK4, VIPR1, ZNF502.
The application further provides a method of predicting poor prognosis in a subject with lung adenocarcinoma, comprising the steps:
(a) determining the expression of one or more biomarkers in a test sample from the subject, wherein each biomarker corresponds to a gene listed in Table 1 and/or 3,
(b) comparing the expression of the one or more biomarkers to a control,
wherein an increase in expression of the one or more biomarkers between the test sample and the one or more controls is indicative of poor survival.
In one embodiment, the one or more biomarkers are selected from Table 3. In one embodiment, the biomarkers comprise at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, or 28 of the genes AP1S1, AP4M1, BRD9, CCDC21, CCL8, COPS6, CSDE1, EP300, GNB2, HIPK1, HRSP12, LAPTM4B, MCM7, MGC4677, OLFM2, POP7, PPA1, PDCD6, RABL4, RPL30, SERPINE1, SH3BGRL3, SLC25A17, ST13, TAF6, TLE3, TOB2, or ZNF561.
In another embodiment, the application provides a method of predicting poor prognosis in a subject with lung adenocarcinoma, comprising the steps:
(a) determining the expression level of one or more biomarkers in a test sample from the subject, wherein each biomarker corresponds to a gene listed in Table 2 or 4,
(b) comparing the expression level of the one or more biomarkers with a control,
wherein a decrease in expression of the one or more biomarkers between the test sample and the control is indicative of poor survival.
In another embodiment, the one or more biomarkers are selected from Table 4. In one embodiment, the biomarkers comprise at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25 of the genes C5orf21, C5orf29, CACNA1D, CCDC13, CNTN6, CRTAP, DMXL1, EOMES, ERBB2IP, FEZF2, HRH1, LRAP, MEGF10, NPCDR1, PAM, PPWD1, RAB5A, SEMA6A, SFRS12, SNRK, TRIM36, TTC21A, ULK4, VIPR1, or ZNF502.
In certain embodiments, the biomarkers comprise at least 2 of the genes listed in Table 1, 2, 3, 4 and/or 13. In another embodiment, the biomarkers comprise at least 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25 or more of the genes listed in Tables 1, 2, 3, 4 and/or 13. In a further embodiment, the biomarkers comprise at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26 or 27 of the genes listed in Table 3. In a further embodiment, the biomarkers comprise at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25 of the genes listed in Table 4. In another embodiment, the biomarkers comprise at least 2, 3-10, 11-20, 21-30, 31-40, 41-50, 51-60, 61-70, 71-80, 81-90, 91-100, 101-110, or 111-113 of the genes listed in Table 1 or 2. In another embodiment, the biomarkers comprise more than 113 of the genes listed in Table 1.
In another aspect, the application relates to genomic alterations in subjects with pulmonary adenocarcinoma according to different disease subtypes and survival outcomes. These genomic alterations can be used to classify individuals into a BAC or invasive ADC group and/or prognose individuals with ADC into a poor survival group or a good survival group.
Accordingly, one aspect of the application is a method of classifying a subject with lung adenocarcinoma or diagnosing the subject with a subtype of lung ADC, comprising the steps:
(a) determining a genomic profile in a test sample from the subject,
(b) comparing the genomic profile to a control,
wherein a difference or a similarity in the genomic profile between the control and the test sample is used to classify the subject with lung ADC into a BAC group or an invasive ADC group or diagnose the subject as having BAC or invasive ADC.
In an embodiment, the genomic alteration and/or difference in the genomic profile is an amplification (e.g. increased gene copy compared to normal gene copy) in the test sample and is used to classify the subject with lung adenocarcinoma into non-invasive BAC with minimal risk to develop metastasis or die of the disease, or invasive ADC with risk to develop and die of recurrence and metastasis. In another embodiment, the genomic alteration and/or difference in the genomic profile is a deletion.
In another embodiment, the control comprises normal human tissue or cells, for example lung tissue or cells.
Another aspect provides a method of prognosing a subject with lung ADC, comprising the steps:
(a) determining a genomic profile in a test sample from the subject,
(b) comparing the genomic profile with a control,
wherein a difference or a similarity in the genomic profile between the control and the test sample is used to prognose the subject into a poor survival group or a good survival group.
In another embodiment, the application provides methods for diagnosis. In an embodiment a method for diagnosing a subtype of lung adenocarcinoma in a subject is provided, the steps comprising:
(a) determining the gene copy number of one or more biomarkers in a test sample from the subject, wherein each biomarker corresponds to a gene in Table 1, 2, 3 and/or 4;
(b) comparing the gene copy number of the one or more biomarkers with a control,
wherein a difference or a similarity in the gene copy number of the one or more biomarkers between the test sample and the control is used to diagnose the subject as having BAC or invasive ADC.
Another embodiment provides a method for diagnosing a subtype of lung adenocarcinoma in a subject comprising the steps:
Another embodiment provides a method comprising:
(a) obtaining a nucleic acid test sample from a subject;
(b) contacting the sample with at least one nucleic acid probe to detect, or primer to amplify and identify the level of expression or gene copy number of one or more biomarkers selected from Tables 1, 2, 3, 4 and/or 13 in the subject's test sample,
wherein the level of expression or gene copy number of the one or more genes selected from Tables 1, 2, 3, 4 and/or 13 indicates the subtype of lung ADC, and/or classifies the subject as belonging to a BAC group or an invasive ADC group and/or indicates the subject has a poor prognosis or good prognosis.
The gene copy number of a biomarker can be determined, for example, by contacting the sample comprising nucleic acids (e.g. nucleic acid test sample) with a detection agent, such as a probe or primer set, to form, for example, a complex between the detection agent and the genomic region to thereby determine the gene copy number (or relative gene copy number) of the biomarker (e.g. for comparison to control). The control in one embodiment is a gene copy number fold change threshold.
In an embodiment, the control comprises a threshold level, for example a gene copy number fold change threshold, above which the subject is classified as belonging to an invasive ADC group, is diagnosed as having invasive ADC such as AWBF, and/or is prognosed as having poor survival. In an embodiment, the gene copy number fold change threshold is at least 1.9, at least 1.8, at least 1.7, at least 1.6, at least 1.5, at least 1.4, at least 1.3, at least 1.2, or at least 1.1. In an embodiment, the biomarker is selected from Table 12 and has a gene copy number fold change threshold of at 1.5.
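By way of illustration only, a threshold comparison of this kind can be sketched in Python. The function names, the example fold change values and the rule that a single gene above the threshold suffices are assumptions made for the sketch; the disclosure does not prescribe them.

```python
def above_threshold(copy_number_fold_change, threshold=1.5):
    """True when the fold change is strictly above the threshold."""
    return copy_number_fold_change > threshold


def classify(fold_changes, threshold=1.5):
    """Label a subject from per-gene fold changes (gene symbol -> fold change).

    Assumption: one gene above the threshold is enough to place the subject
    in the invasive ADC group; otherwise the BAC group is returned.
    """
    flagged = sorted(
        gene for gene, fc in fold_changes.items() if above_threshold(fc, threshold)
    )
    label = "invasive ADC" if flagged else "BAC"
    return label, flagged


# Hypothetical gene copy number fold changes relative to a control.
sample = {"SERPINE1": 1.8, "POP7": 1.2}
print(classify(sample))                 # ('invasive ADC', ['SERPINE1'])
print(classify(sample, threshold=1.9))  # ('BAC', [])
```

The default of 1.5 is one of the thresholds recited above; any of the other recited values can be passed as `threshold`.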
Another embodiment provides a method comprising:
(a) determining the expression level, genomic alteration or gene copy number of one or more biomarkers in a test sample from the subject, wherein each biomarker corresponds to a gene in Table 1, 2, 3 and/or 4;
(b) normalizing the value of the expression level, genomic alteration or gene copy number to an internal control housekeeping gene;
(c) comparing the normalized value for the expression level, genomic alteration or gene copy number of the one or more biomarkers with the average normalized expression value, genomic alteration or gene copy number of the one or more genes in a control,
(d) predicting the subtype of lung ADC, and/or the prognosis,
wherein a difference or a similarity in the normalized expression level, genomic alteration or gene copy number of the one or more biomarkers between the test sample and the control is used to diagnose the subject as having BAC or invasive ADC and/or to prognose the subject as having a poor prognosis or a good prognosis.
In an embodiment, the housekeeping gene is selected from MAP2 (microtubule-associated protein 2), B2M (beta-2-microglobulin), ACTB (Actin, beta), TBP (TATA box binding protein) and BAT1 (HLA-B associated transcript 1). The housekeeping gene can be used to normalize gene copy number and/or expression levels.
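By way of illustration only, the normalization and comparison steps can be sketched in Python. The sketch assumes linear-scale measurements, so that normalization is a simple ratio of the biomarker value to the housekeeping gene value; log-scale measurements such as qPCR cycle thresholds would require a different formula. All names and values are hypothetical.

```python
from statistics import mean


def normalize(target_value, housekeeping_value):
    """Express a biomarker value relative to a housekeeping gene value."""
    if housekeeping_value <= 0:
        raise ValueError("housekeeping value must be positive")
    return target_value / housekeeping_value


def relative_to_control(test_pair, control_pairs):
    """Normalized test value divided by the average normalized control value.

    Each pair is (biomarker_value, housekeeping_value) from one sample.
    """
    test_norm = normalize(*test_pair)
    control_norm = mean([normalize(t, h) for t, h in control_pairs])
    return test_norm / control_norm


# Hypothetical measurements: (biomarker, housekeeping gene) per sample.
test_sample = (8.0, 2.0)                     # normalized value 4.0
control_samples = [(2.0, 2.0), (6.0, 2.0)]   # normalized values 1.0 and 3.0
print(relative_to_control(test_sample, control_samples))  # 2.0
```

The resulting ratio can then be compared with a chosen threshold in the prediction step.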
In an embodiment, the genomic alteration and/or difference in the genomic profile is an amplification (e.g. increased gene copy compared to normal gene copy) in the test sample and is used to prognose the subject with lung adenocarcinoma into a poor survival group or a good survival group. In another embodiment, the genomic alteration and/or difference in the genomic profile is a deletion.
In another embodiment, the control comprises normal human tissue or cells, for example lung tissue or cells.
The genome amplifications can comprise genes or portions thereof. For example amplified genes associated with increased tumor invasion and progression and/or higher gene content detected in subjects with AWBF compared with BAC include genes listed in Table 1, 3 and 13. Accordingly, in one embodiment, the genome amplification comprises one or more genes listed in Table 1, 3 or 13. In another embodiment, the genomic alteration comprises one or more of EPO, SERPINE1, SLC25A17, and POP7.
In a further embodiment, the genomic alterations comprise genome deletions. In one embodiment, the genome deletions comprise deletions in 3p, 5q, 4q and/or 6q. In a further embodiment, the genome deletions in 3p and 5q comprise one or more of the genes listed in Table 2.
As mentioned, the genomic alterations can comprise amplifications and deletions comprising genes or gene segments which result in gene copy number variations. For example, wherein a gene is amplified, the gain is referred to as a gene copy gain; wherein a gene is deleted, the deletion is referred to as a gene deletion. A subject without disease is typically diploid for genes in somatic cells. Accordingly the application provides a method of detecting gene copy number variations associated with lung ADC subtype and prognosis.
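By way of illustration only, calling a gene copy gain or a gene deletion against a diploid reference can be sketched in Python. The sketch assumes integer copy number estimates and hypothetical values; measured copy numbers are often non-integer and would need a tolerance or threshold.

```python
DIPLOID_COPIES = 2  # reference copy number for a gene in somatic cells


def call_copy_number(copies, reference=DIPLOID_COPIES):
    """Return 'gain', 'deletion' or 'no change' relative to the reference."""
    if copies > reference:
        return "gain"
    if copies < reference:
        return "deletion"
    return "no change"


def call_profile(copies_by_gene):
    """Apply call_copy_number to every gene in a gene -> copies mapping."""
    return {gene: call_copy_number(n) for gene, n in copies_by_gene.items()}


# Hypothetical copy number estimates for three genes in one test sample.
profile = {"PDCD6": 4, "VIPR1": 1, "GNB2": 2}
print(call_profile(profile))
# {'PDCD6': 'gain', 'VIPR1': 'deletion', 'GNB2': 'no change'}
```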
In one embodiment, the application provides a method of classifying a subject with lung adenocarcinoma and/or diagnosing the subject with a subtype of lung ADC, comprising the steps:
In another embodiment, the application provides a method of prognosing a subject with lung adenocarcinoma, comprising the steps:
In an embodiment, the one or more genes are selected from Tables 3, 4 and/or 13. In another embodiment, the genes are selected from EPO, SERPINE1, SLC25A17, and POP7.
In certain embodiments, the gene copy number of the control is a diploid gene copy number of the gene.
A further aspect provides a method of predicting prognosis in a subject with lung adenocarcinoma, comprising the steps:
In one embodiment, the genes are selected from Table 3 and/or 4. In another embodiment, the genes are selected from Table 13. In yet a further embodiment, the genes are selected from EPO, SERPINE1, SLC25A17, and POP7. In certain embodiments, the prognosis associated with the one or more reference gene copy number profiles comprise a poor survival group and a good survival group.
In yet a further embodiment, the application provides a method of classifying a subject with lung adenocarcinoma, comprising the steps:
In another embodiment, the application provides a method of diagnosing a subtype of lung ADC in a subject with lung adenocarcinoma. The methods described herein are also useful for screening subjects for early diagnosis as described in the examples. In an embodiment, one or more biomarkers selected from the genes listed in Tables 1, 2, 3, 4 and 13 can be used with the methods described herein to screen a subject suspected of having lung cancer or lung ADC. In another embodiment, an expression profile or gene copy number profile of a subject suspected of having lung cancer or lung ADC is compared to a reference profile to determine if the subject has BAC or invasive ADC.
In certain embodiments, the one or more gene copy gains comprises TERT and/or PDCD6. In another embodiment, the one or more gene copy gains comprises 2 or more genes listed in Table 1, 3 and/or 13. In another embodiment, the one or more gene copy gains comprises at least 3, 4, 5, 6, 7, 8, 9 or 10 genes listed in Table 1, 3 and/or 13. In yet a further embodiment, the gene copy gains comprise gains in at least 10-20, or 20-30 genes listed in Table 1, 3 and/or 13. In another embodiment, the gene copy gains consist of gains in the genes listed in Table 3 or 13.
Another aspect provides detecting gene deletions. In one embodiment, the gene deletions comprise at least 2, 3, 4, 5, 6, 7, 8, 9 or 10 of the genes listed in Table 2 and/or 4. In another embodiment, the gene deletions comprise at least 10-20, or 20-30 of the genes listed in Table 2 and/or 4. In yet a further embodiment, the gene deletions comprise deletions in at least 10-20, or 20-30 genes listed in Table 2. In another embodiment, the gene deletions consist of deletions in the genes listed in Table 4.
In certain embodiments, the biomarkers or genes comprise at least 1 of the genes listed in Table 1, 2, 3, 4 and/or 13. In another embodiment, the biomarkers and/or genes comprise at least 2 of the genes listed in Table 1, 2, 3, 4 and/or 13. In another embodiment, the biomarkers comprise at least 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25 or more of the genes listed in Tables 1, 2, 3, 4 or 13. In a further embodiment, the biomarkers comprise at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26 or 27 of the genes listed in Table 3 and/or 13. In a further embodiment, the biomarkers comprise at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25 of the genes listed in Table 4. In another embodiment, the biomarkers comprise at least 2, 3-10, 11-20, 21-30, 31-40, 41-50, 51-60, 61-70, 71-80, 81-90, 91-100, 101-110, or 111-113 of the genes listed in Table 1 or 2. In another embodiment, the biomarkers comprise more than 113 of the genes listed in Table 1.
BAC and AWBF are subtypes of pulmonary adenocarcinoma which vary in disease outcome. BAC has good survival, approaching 100% by certain assessments, whereas AWBF has a relatively poor 5 year survival rate. It was demonstrated that BAC and AWBF have different genomic profiles. For example, it was shown that the 28 genes listed in Table 3 and the 25 genes listed in Table 4 are prognostic in ADC stage I patients. Further, 33 genes listed in Table 13 have a higher gene content in AWBF compared with BAC and are therefore useful for diagnosing the subtype of lung ADC and prognosing survival, tumor invasion and progression. A ranking of these genes is provided in Table 13. The genes in Table 13 are ranked according to their diagnostic utility. For example, EPO, SERPINE1, SLC25A17 and POP7 were found to have maximal ROC AUC and minimal QPCR copy number fold change threshold. Further, it was demonstrated that PDCD6 is prognostic in ADC stages I-II and in the entire group of stage I-II NSCLC. Accordingly, the application provides in certain embodiments methods that are useful for classifying, diagnosing, screening and/or identifying subjects with BAC. The application also demonstrates that the methods disclosed herein are useful for prognosing survival in early stage adenocarcinoma. Accordingly in one embodiment, the methods disclosed herein diagnose, prognose or classify a subject having or suspected of having stage I-II lung adenocarcinoma.
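By way of illustration only, ranking candidate genes by ROC AUC can be sketched in Python. The AUC is computed here as the fraction of (AWBF, BAC) sample pairs in which the AWBF value is higher, with ties counted as one half, which is a standard equivalent of the area under the ROC curve. Ranking by AUC alone, the placeholder gene names and all values are assumptions made for the sketch and are not data from the disclosure.

```python
def roc_auc(positives, negatives):
    """AUC as the probability that a positive value exceeds a negative one."""
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def rank_genes(values_by_gene):
    """Sort genes by descending AUC.

    values_by_gene maps gene -> (awbf_values, bac_values), for example
    copy number fold changes measured in each subtype.
    """
    scores = {g: roc_auc(awbf, bac) for g, (awbf, bac) in values_by_gene.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical fold changes for two placeholder genes.
data = {
    "GENE_B": ([1.2, 1.0, 1.4], [1.1, 1.3, 0.9]),
    "GENE_A": ([1.8, 2.0, 1.6], [1.0, 1.1, 0.9]),
}
for gene, auc in rank_genes(data):
    print(gene, round(auc, 3))
```

In this toy input GENE_A separates the two groups completely and is therefore ranked first.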
In one embodiment, increased PDCD6 expression in stage I-II ADC is prognostic for poor survival.
In addition to the methods disclosed being prognostic in lung adenocarcinoma, they are also useful for predicting prognosis in subjects with non-small cell lung cancer (NSCLC). For example, PDCD6 is useful for prognosing subjects with NSCLC. Accordingly in one embodiment, the application provides a method of predicting poor prognosis in a subject with NSCLC comprising the steps:
(a) determining the expression level of PDCD6 in a test sample from the subject,
(b) comparing the expression level of PDCD6 with one or more controls, wherein an increase in expression of PDCD6 between the test sample and the one or more controls is indicative of poor survival.
In another embodiment, the PDCD6 gene copy number is assessed according to a method described herein wherein increased PDCD6 gene copy number is indicative of poor survival.
The methods described herein for classifying and diagnosing lung ADC subjects and prognosing survival can be combined with other methods for classifying, diagnosing and/or prognosing subjects with lung ADC such as other methods described herein or known in the art. A person skilled in the art would understand for example, that classification methods described herein can be combined with other methods of classifying and/or diagnosing lung ADC subtypes to obtain a confirmed and/or more accurate diagnosis. Similarly, other methods of prognosing survival can be combined with the methods described herein for more accurate prediction of survival.
In another aspect, the application provides a method of selecting a treatment for a subject with lung ADC.
Accordingly, the application provides a method of selecting a treatment for a subject with adenocarcinoma, comprising the steps:
In an embodiment, the application provides a method of selecting a treatment for a subject with lung ADC, the method comprising:
(a) determining the expression level or gene copy number of one or more biomarkers in a test sample from the subject, wherein each biomarker corresponds to a gene in Table 3, 4 and/or 13;
(b) comparing the expression level or gene copy number of the one or more biomarkers with a control,
(c) selecting chemotherapy or surgery when the subject has an increase in the expression level of one or more biomarkers from Table 3, a decreased expression of one or more biomarkers from Table 4 and/or an increase in the copy number of one or more biomarkers from Table 13.
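Steps (a) to (c) can be sketched, for illustration only, as a simple decision rule. The cut-off values, the function name and the placeholder gene `GENE_X` are assumptions made for this example. SERPINE1 is used because it is named herein among the genes of Tables 3 and 13, but the sets shown are not the full tables.

```python
from typing import Dict, Iterable


def chemo_or_surgery_indicated(
    expr_ratio: Dict[str, float],   # test/control expression ratio per gene
    copy_ratio: Dict[str, float],   # test/control gene copy number ratio per gene
    table3: Iterable[str],          # genes whose increased expression is adverse
    table4: Iterable[str],          # genes whose decreased expression is adverse
    table13: Iterable[str],         # genes whose increased copy number is adverse
    up: float = 1.5,                # assumed cut-off for "increased"
    down: float = 0.67,             # assumed cut-off for "decreased"
    gain: float = 1.5,              # assumed cut-off for copy number gain
) -> bool:
    """Return True when any of the three and/or criteria of step (c) is met.
    Genes that were not measured default to a ratio of 1.0 (no change)."""
    overexpressed = any(expr_ratio.get(g, 1.0) >= up for g in table3)
    underexpressed = any(expr_ratio.get(g, 1.0) <= down for g in table4)
    amplified = any(copy_ratio.get(g, 1.0) >= gain for g in table13)
    return overexpressed or underexpressed or amplified


# Illustrative subsets only; GENE_X is a hypothetical stand-in for a Table 4 gene.
TABLE3 = {"SERPINE1", "GNB2", "ST13"}
TABLE4 = {"GENE_X"}
TABLE13 = {"EPO", "SERPINE1", "SLC25A17", "POP7"}

case_a = chemo_or_surgery_indicated({"SERPINE1": 2.4}, {}, TABLE3, TABLE4, TABLE13)
case_b = chemo_or_surgery_indicated({"GENE_X": 0.5}, {}, TABLE3, TABLE4, TABLE13)
case_c = chemo_or_surgery_indicated({"GNB2": 1.1}, {"POP7": 1.0}, TABLE3, TABLE4, TABLE13)
print(case_a, case_b, case_c)
```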
Another embodiment provides a method of selecting a treatment for a subject with lung ADC comprising the steps:
In another embodiment, the application provides a method of selecting a treatment for a subject with lung adenocarcinoma comprising the steps:
In an embodiment, the application provides a method of treating a subject with lung ADC, the method comprising:
(a) determining the expression level or gene copy number of one or more biomarkers in a test sample from the subject, wherein each biomarker corresponds to a gene in Table 3, 4 and/or 13;
(b) comparing the expression level and/or gene copy number of the one or more biomarkers with a control,
(c) treating the subject with chemotherapy or surgery when the subject has an increase in the expression level of one or more biomarkers from Table 3, a decreased expression of one or more biomarkers from Table 4, and/or an increase in the gene copy number of one or more biomarkers listed in Table 13.
Another embodiment provides a method of treating a subject with lung ADC comprising the steps:
Another embodiment provides use of chemotherapy or surgery to treat a subject with invasive ADC, comprising:
(a) determining the expression level or gene copy number of one or more biomarkers in a test sample from the subject, wherein each biomarker corresponds to a gene in Table 3, 4 and/or 13; and
(b) comparing the expression level or gene copy number of the one or more biomarkers with a control,
wherein chemotherapy or surgery is indicated when the subject has an increase in the expression level of one or more biomarkers from Table 3, a decrease in the expression level of one or more biomarkers from Table 4 and/or an increase in the gene copy number of one or more biomarkers listed in Table 13.
In another embodiment, the application provides use of chemotherapy or surgery to treat a subject with invasive ADC comprises:
For example treatments associated with good survival include local radiation and limited/localized surgery, localized treatment (radiofrequency ablation), whereas treatments associated with poor survival include surgery and/or chemotherapy and/or targeted therapy (biopathway targeting, drugs). In an embodiment the treatment selected for a subject identified as having BAC or in the good survival group comprises local radiation and limited/localized surgery, localized treatment (radiofrequency ablation). In another embodiment, the treatment selected for a subject identified as having aggressive ADC or in the poor survival group comprises surgery and/or chemotherapy and/or targeted therapy (biopathway targeting, drugs).
The test sample and/or control can be any fluid, cell or tissue sample from a subject which can be assayed for biomarker expression products, particularly genes differentially expressed in subjects with ADC according to survival outcome and/or for which the genomic profile can be determined, including detecting genomic alterations, and gene copy number variations. In one embodiment, the test sample is a cell, cells or tissue from a tumor biopsy from the subject. In an embodiment, the test sample comprises a tissue sample comprising at least one tumor cell. For example, methods for detecting gene expression in a single cell are known in the art.
The sample and/or control is in one embodiment tumor tissue and/or cells, derived from, for example, lung biopsy, for example obtained by bronchoscopy, needle aspiration, thoracentesis and/or thoracotomy, and/or derived from cells found in sputum. In an embodiment, the sample and control are similar or the same sample type, e.g. both are lung biopsies.
The biomarker expression levels described herein can be determined by, for example, immunohistochemical staining and/or in situ hybridization. Accordingly in one embodiment, the test sample comprises a tissue sample suitable for immunohistochemistry or in situ hybridization.
The test sample and the control (e.g. reference profiles) can be similar sample types, for example they can both comprise tumor cells from a subject with ADC. In another embodiment, the control can be an actual sample from a subject known to have ADC and good survival outcome or known to have ADC and have poor survival outcome. More specifically, in one embodiment, the control can be an actual sample from a subject known to have BAC, or known to have AWBF. The control is, in certain embodiments, a normal or non-tumor cell sample. As mentioned previously, the control can be a threshold value, such as a gene copy number fold change threshold, and/or a previously determined expression or gene copy number.
A person skilled in the art will appreciate that the comparison between the genomic profile, gene copy number profile, and/or expression of the biomarkers in the test sample and the reference genomic profile, reference gene copy number profile, reference expression profile and/or expression level of the biomarkers in the control will depend on the control used. For example, if the control is from a subject known to have ADC and poor survival, and there is a difference in the genomic profile and/or expression of the biomarkers between the control and test sample, then the subject can be prognosed or classified in a good survival group. If the control is from a subject known to have ADC and good survival, and there is a difference in the genomic profile and/or expression of the biomarkers between the control and test sample, then the subject can be prognosed or classified in a poor survival group. For example, if the control is from a subject known to have ADC and good survival, and there is a similarity in the genomic profile and/or expression of the biomarkers between the control and test sample, then the subject can be prognosed or classified in a good survival group. For example, if the control is from a subject known to have ADC and poor survival, and there is a similarity in the genomic profile and/or expression of the biomarkers between the control and test sample, then the subject can be prognosed or classified in a poor survival group.
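The dependence of the classification on the control used can be sketched, for illustration only, as follows. The use of a Pearson correlation with a cut-off of 0.8 to decide whether two profiles are "similar" is an assumption made for this example, as are the function names and the profile values. No particular similarity measure is prescribed above.

```python
from math import sqrt
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def classify(test: Sequence[float], control: Sequence[float],
             control_group: str, cutoff: float = 0.8) -> str:
    """Assign 'good' or 'poor' survival group.
    A profile similar to the control inherits the control's group;
    a profile that differs from it is assigned the opposite group."""
    if control_group not in ("good", "poor"):
        raise ValueError("control_group must be 'good' or 'poor'")
    similar = pearson(test, control) >= cutoff  # assumed similarity criterion
    if similar:
        return control_group
    return "poor" if control_group == "good" else "good"


# Hypothetical three-biomarker profiles (illustrative values only).
test_profile = [2.0, 1.8, 0.5]
poor_reference = [2.2, 1.9, 0.4]
good_reference = [0.9, 1.0, 1.1]

vs_poor = classify(test_profile, poor_reference, "poor")
vs_good = classify(test_profile, good_reference, "good")
print(vs_poor, vs_good)
```

In this sketch both comparisons place the hypothetical test profile in the same group, which is the consistency the four cases described above are meant to ensure.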
A person skilled in the art will appreciate that a number of methods can be used to detect or quantify the level of RNA products of the biomarkers within a sample, including microarrays, RT-PCR (including quantitative RT-PCR), nuclease protection assays and Northern blot analyses and in situ hybridization.
In addition, a person skilled in the art will appreciate that a number of methods can be used to determine the amount of a protein product of the biomarker of the application, including immunoassays such as Western blots, ELISA, and immunoprecipitation followed by SDS-PAGE and immunocytochemistry.
In addition, a person skilled in the art will appreciate that a number of methods can be used to detect or quantify genomic alterations and gene copy number variations such as amplifications and deletions including array comparative genome hybridization, quantitative PCR (qPCR) and FISH.
Accordingly in one embodiment, determining the biomarker expression level comprises use of RT-qPCR, expression array (for example the U133 plus 2 array) or immunohistochemistry. In another embodiment, obtaining an expression profile comprises use of quantitative PCR (qPCR), RT-qPCR or an array. In certain embodiments, the array is a U133 plus 2 array.
In other embodiments, determining the biomarker expression level comprises use of an antibody.
In certain embodiments, the step of determining genome alteration or gene copy number comprises PCR and/or quantitative PCR (qPCR).
FISH analysis can also be utilized to detect genomic alterations. Accordingly in one embodiment, the step of determining the genome alteration or gene copy number comprises FISH analysis.
A person skilled in the art will appreciate that a number of detection agents can be used to determine the expression of the biomarkers. For example, to detect RNA products of the biomarkers, probes, primers, complementary nucleotide sequences or nucleotide sequences that hybridize to the RNA products can be used.
Similarly, a person skilled in the art will appreciate that a number of detection agents can be used to determine genomic alterations and gene copy number variations of the biomarkers. For example, to detect gene copy number, probes such as probes suitable for FISH, primers, complementary nucleotide sequences or nucleotide sequences that hybridize to the gene can be used.
To detect protein products of the biomarkers, ligands or antibodies that specifically bind to the protein products can be used.
Antibodies having specificity for a specific protein, such as the protein product of a biomarker, may be prepared by conventional methods. A mammal, (e.g. a mouse, hamster, or rabbit) can be immunized with an immunogenic form of the peptide, which elicits an antibody response in the mammal. Techniques for conferring immunogenicity on a peptide include conjugation to carriers or other techniques well known in the art. For example, the peptide can be administered in the presence of adjuvant. The progress of immunization can be monitored by detection of antibody titers in plasma or serum. Standard ELISA or other immunoassay procedures can be used with the immunogen as antigen to assess the levels of antibodies. Following immunization, antisera can be obtained and, if desired, polyclonal antibodies isolated from the sera.
To produce monoclonal antibodies, antibody producing cells (lymphocytes) can be harvested from an immunized animal and fused with myeloma cells by standard somatic cell fusion procedures thus immortalizing these cells and yielding hybridoma cells. Such techniques are well known in the art, (e.g. the hybridoma technique originally developed by Kohler and Milstein (Nature 256:495-497 (1975)) as well as other techniques such as the human B-cell hybridoma technique (Kozbor et al., Immunol. Today 4:72 (1983)), the EBV-hybridoma technique to produce human monoclonal antibodies (Cole et al., Methods Enzymol, 121:140-67 (1986)), and screening of combinatorial antibody libraries (Huse et al., Science 246:1275 (1989)). Hybridoma cells can be screened immunochemically for production of antibodies specifically reactive with the peptide and the monoclonal antibodies can be isolated.
A person skilled in the art will appreciate that the detection agents can be labeled.
The label is preferably capable of producing, either directly or indirectly, a detectable signal. For example, the label may be radio-opaque or a radioisotope, such as 3H, 14C, 32P, 35S, 123I, 125I, 131I; a fluorescent (fluorophore) or chemiluminescent (chromophore) compound, such as fluorescein isothiocyanate, rhodamine or luciferin; an enzyme, such as alkaline phosphatase, beta-galactosidase or horseradish peroxidase; an imaging agent; or a metal ion.
The methods described herein may employ conventional techniques of molecular biology, microbiology and recombinant DNA, which are within the skill of the art. Such techniques are explained fully in the literature. See, e.g., Sambrook, Fritsch & Maniatis, 1989, Molecular Cloning: A Laboratory Manual, Second Edition; Oligonucleotide Synthesis (M. J. Gait, ed., 1984); Nucleic Acid Hybridization (B. D. Hames & S. J. Higgins, eds., 1984); A Practical Guide to Molecular Cloning (B. Perbal, 1984); and a series, Methods in Enzymology (Academic Press, Inc.); Short Protocols In Molecular Biology, (Ausubel et al., ed., 1995).
Another aspect of the application provides a composition for detecting a biomarker expression level or a genomic alteration.
Accordingly, one aspect provides a composition comprising a plurality of two or more isolated nucleic acid sequences, wherein each isolated nucleic acid sequence hybridizes to:
In one embodiment, the composition comprises a detection agent optionally an antibody, a probe or a primer, said detection agent binding a biomarker from Tables 1, 2, 3, and/or 4 and/or a suitable carrier.
In one embodiment the composition comprises primers that specifically amplify a gene or gene expression product listed in Tables 1, 2, 3, 4 and/or 13. In another embodiment, the composition comprises one or more probes that specifically bind to a gene, its expression product or the complement of either of a gene listed in Tables 1, 2, 3, 4 and/or 13. In one embodiment the composition comprises one or more primers listed in Table 10 and/or 11. In one embodiment, the composition comprises one or more primers listed in Table 10 for amplifying one or more of AP1S1, AP4M1, BRD9, CCDC21, CCL8, COPS6, CSDE1, EP300, GNB2, HIPK1, HRSP12, LAPTM4B, MCM7, MGC4677, OLFM2, POP7, PPA1, PDCD6, RABL4, RPL30, SERPINE1, SH3BGRL3, SLC25A17, ST13, TAF6, TLE3, TOB2, and/or ZNF561. In another embodiment, the composition comprises one or more primers listed in Table 11 for amplifying one or more of C5orf21, C5orf29, CACNA1D, CCDC13, CNTN6, CRTAP, DMXL1, EOMES, ERBB2IP, FEZF2, HRH1, LRAP, MEGF10, NPCDR1, PAM, PPWD1, RAB5A, SEMA6A, SFRS12, SNRK, TRIM36, TTC21A, ULK4, VIPR1, and/or ZNF502. A person skilled in the art would readily be able to design additional primers that are suitable for quantitatively detecting gene alterations, gene copy number and biomarker expression level of one or more of the genes listed in Tables 1, 2, 3, 4 and/or 13.
In one embodiment, the composition comprises isolated nucleic acids which are useful for amplifying and/or hybridize to the RNA products of PDCD6, SERPINE1, GNB2 and/or ST13.
Another aspect provides a composition comprising a plurality of two or more detection agents such as antibodies, wherein each antibody specifically binds to a biomarker polypeptide product of 2 or more genes listed in Tables 1 and/or 2, wherein the composition is used to detect the level of biomarker polypeptide product of 2 or more genes.
Another aspect of the application provides an array for use in the methods described herein. In one embodiment, the application provides an array comprising a substrate having a plurality of addresses, wherein each address has disposed thereon a capture probe that can specifically bind a nucleic acid corresponding to each gene or a subset of genes listed in Tables 1, 2, 3, 4 and/or 13.
In another embodiment, the application provides an array comprising for each gene in a plurality of genes, the plurality of genes being at least 3 of the genes listed in Tables 1, 2, 3, 4 and/or 13 one or more polynucleotide probes complementary and hybridizable to an expression product in the gene. In one embodiment, the plurality of genes comprises the genes listed in Table 3 and/or 4. In another embodiment, the plurality of genes consists of the genes listed in Table 3, 4 and/or 13.
The application also provides for kits used to diagnose, prognose or classify a subject with ADC into a good survival group or a poor survival group that includes detection agents that can detect the expression products of the biomarkers.
In one embodiment, the application provides a kit to diagnose, prognose or classify a subject with ADC, comprising one or more detection agents that can detect the expression products of biomarkers, wherein the biomarkers comprise 1 or more of the genes in Tables 1, 2, 3, 4 and/or 13 and instructions for use.
Accordingly, the application includes a kit to diagnose, prognose or classify a subject with pulmonary adenocarcinoma, comprising detection agents that can detect the expression products of biomarkers, wherein the biomarkers comprise at least one biomarker listed in Tables 1, 2, 3, 4 and/or 13. In one embodiment, the biomarkers comprise at least one of PDCD6, SERPINE1, GNB2, and ST13.
The application also provides kits used to diagnose, prognose or classify a subject with ADC into a good survival group or a poor survival group that include agents that can be used to determine a genomic profile of a subject.
Accordingly, in one embodiment, the application provides a kit to diagnose, prognose or classify a subject with ADC, comprising one or more detection agents that can detect genomic alterations comprising genes, wherein the genes comprise 1 or more of the genes in Tables 1, 2, 3, 4 and/or 13 and instructions for use.
In another embodiment, the application provides a kit that can be used to diagnose, prognose or classify a subject with ADC into a good survival group or a poor survival group that includes agents that can be used to detect gene copy number variations.
In one embodiment, the application provides a kit to diagnose, prognose or classify a subject with early stage non-small cell lung cancer, comprising detection agents that can detect the expression products of biomarkers and instructions for use, wherein the biomarkers comprise one or more of PDCD6, SERPINE1, GNB2, and ST13.
In another embodiment, the application provides a kit to select a treatment for a subject with ADC, comprising one or more detection agents that can detect the expression products of biomarkers, wherein the biomarkers comprise one or more of the genes in Tables 1, 2, 3, 4 and/or 13 and instructions for use.
In an embodiment the kit comprises primers that specifically amplify a gene or gene expression product listed in Tables 1 and/or 2 and instructions for use. In another embodiment, the kit comprises one or more probes that specifically bind to a gene, its expression product or the complement of either of a gene listed in Tables 1 and/or 2 and instructions for use. In one embodiment the kit comprises one or more primers listed in Table 10 and/or 11 and instructions for use. In one embodiment, the kit comprises one or more primers listed in Table 10 for amplifying one or more of AP1S1, AP4M1, BRD9, CCDC21, CCL8, COPS6, CSDE1, EP300, GNB2, HIPK1, HRSP12, LAPTM4B, MCM7, MGC4677, OLFM2, POP7, PPA1, PDCD6, RABL4, RPL30, SERPINE1, SH3BGRL3, SLC25A17, ST13, TAF6, TLE3, TOB2, and/or ZNF561 and instructions for use. In another embodiment, the kit comprises one or more primers listed in Table 11 for amplifying one or more of C5orf21, C5orf29, CACNA1D, CCDC13, CNTN6, CRTAP, DMXL1, EOMES, ERBB2IP, FEZF2, HRH1, LRAP, MEGF10, NPCDR1, PAM, PPWD1, RAB5A, SEMA6A, SFRS12, SNRK, TRIM36, TTC21A, ULK4, VIPR1, and/or ZNF502 and instructions for use.
The kit can also include a control or reference standard and/or instructions for use thereof. In addition, the kit can include ancillary agents such as vessels for storing or transporting the detection agents and/or buffers or stabilizers.
The above disclosure generally describes the present application. A more complete understanding can be obtained by reference to the following specific examples. These examples are described solely for the purpose of illustration and are not intended to limit the scope of the application. Changes in form and substitution of equivalents are contemplated as circumstances might suggest or render expedient. Although specific terms have been employed herein, such terms are intended in a descriptive sense and not for purposes of limitation.
The following non-limiting examples are illustrative of the present application:
Bronchioloalveolar carcinoma (BAC), a subtype of lung adenocarcinoma (ADC) without stromal, vascular or pleural invasion is considered an in situ tumor with 100% survival rate. However, the histological criteria for invasion remain controversial. BAC-like areas may accompany otherwise invasive adenocarcinoma, referred to as mixed type adenocarcinoma with BAC features (AWBF). AWBF are considered to evolve from BAC, representing a paradigm for malignant progression in ADC. However, the supporting molecular evidence remains forthcoming. The genomic changes of BAC and AWBF were studied by array comparative genomic hybridization (CGH). Using submegabase resolution tiling set array CGH, the genomic profiles of 14 BAC or BAC with focal area suspicious for invasion were compared to those of 15 AWBF. Threshold-filtering and Frequency-scoring analysis found that genomic profiles of non-invasive and focally-invasive BAC are indistinguishable, and show fewer aberrations than tumor cells in BAC-like area of AWBF. These aberrations occurred mainly at the sub-telomeric chromosomal regions. Increased genomic alterations were noted between BAC-like and invasive areas of AWBF. 113 genes that best differentiated BAC from AWBF were identified and were considered candidate marker genes for tumor invasion and progression. Correlative gene expression analyses demonstrated a high percentage of them as poor prognosis markers in early stage ADC. Quantitative polymerase chain reaction also validated the amplification and overexpression of PDCD6 and TERT on 5p, and the prognostic significance of PDCD6 in early stage ADC patients. Provided are novel candidate genes that may be responsible for and be markers for malignant progression in AWBF.
Most chromosomal changes in both BAC and AWBF were subtle indicating low levels of genomic alteration, as well as partial attenuation by contaminating non-neoplastic host cells. The profiles of BAC and BAC with focal areas suspicious for invasion were indistinguishable and showed low copy gains. AWBF had similar chromosomal changes but with greater variability and frequency and longer segmental alterations. Deletions were also more common in AWBF. In two patients with synchronous BAC and invasive AWBF, the BAC-like area of the latter showed greater aberrations than the BAC. In two other AWBF, greater alterations were also noted in invasive compared to BAC-like areas. Normal lung samples showed no alteration of these regions.
Using threshold-filtering, 119 clones that distinguished BAC from AWBF were identified. Hierarchical clustering of all cases using these clones separated BAC from AWBF samples. In addition, a Fisher's Exact Test comparing the frequency of genomic changes between the BAC and AWBF groups yielded a list of 517 clones that best differentiated the two lesions. Integrating these two analyses was accomplished by applying a 10 clone “window” to identify shared regions. This resulted in a list of 256 candidate clones of high interest, from which a shorter list of 58 clones with gains in AWBF compared to BAC was selected. These clones included 113 unique amplified genes (Table 1) that could represent invasion and tumor progression markers for AWBF.
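For illustration only, the per-clone Fisher's Exact Test and the integration of the two clone lists can be sketched as follows. This is a minimal sketch under stated assumptions: the counts and clone positions are hypothetical, the function names are invented for the example, and the interpretation of the 10 clone "window" (clones from the two lists lying within 10 array positions of one another define a shared region) is an assumption, since the exact procedure is not detailed here.

```python
from math import comb
from typing import List


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]],
    e.g. rows = AWBF/BAC, columns = clone altered/not altered.
    Sums the probabilities of all tables with the same margins that are
    no more likely than the observed table."""
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def prob(x: int) -> float:
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


def shared_by_window(list_a: List[int], list_b: List[int], window: int = 10) -> List[int]:
    """Return clone positions from either list that lie within `window`
    array positions of a clone in the other list (assumed interpretation)."""
    keep = set()
    for i in list_a:
        for j in list_b:
            if abs(i - j) <= window:
                keep.update((i, j))
    return sorted(keep)


# Hypothetical clone: gained in 10 of 15 AWBF and in 1 of 14 BAC samples.
p_value = fisher_exact_two_sided(10, 5, 1, 13)

# Hypothetical clone positions from the two analyses.
threshold_clones = [5, 100, 400]
fisher_clones = [12, 250, 405]
shared = shared_by_window(threshold_clones, fisher_clones)
print(p_value < 0.05, shared)
```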
qPCR validated the gene content changes in 33 of the 113 candidate marker genes. Among the evaluated genes were TERT and PDCD6, which were selected for further validation by qPCR and/or FISH, based on their location on chromosome 5p that showed prominent genomic changes.
Using RT-qPCR, it was demonstrated that in 10 separate pairs of invasive ADC and their corresponding non-neoplastic lung tissues, PDCD6 was overexpressed in tumor compared to normal lung tissue (p<0.01), with a mean 3-fold increase in expression.
A correlative gene expression study using external and internally available lung adenocarcinoma gene expression microarray datasets was then performed, starting with the 113 amplified genes. Analysis of the Toronto, Harvard and Michigan datasets discovered that 35%, 33% and 29% of the genes were overexpressed; a fraction are expected to be based on gene amplification. These datasets included only 87, 59 and 42 of the 113 genes, respectively, and overexpression was noted in 42%, 36% and 34% of them (Table 1 and Table 5). These results indicate a slight enrichment of the candidate amplified gene list for overexpression.
Univariate analysis of the Duke microarray dataset showed that 10,023 of 54,675 (18%) probe sets were prognostic for overall survival (p<0.05), with 4879 (9%) of overexpressed genes associated with poor prognosis. Among the 113 candidate amplified genes, 112 were represented by 227 probe sets on the U133 plus 2 array. The expression of 46/227 (20%) probe sets was significantly associated with prognosis. This is not significantly different from the overall proportion of all microarray probe sets that were prognostic (p=0.507). However, 34 of the 227 probe sets (15%), representing 27/113 (24%) putatively amplified and overexpressed genes (Table 3) were associated with poor prognosis. This is significantly higher than the 9% of all probe sets (p=0.002) with such association. The most prognostic overexpressed genes included SERPINE1 (HR=6.02, 95% CI 1.98-16.23, p=0.001), GNB2 (HR=5.8, 95% CI 1.83-14.52, p=0.002) and ST13 (HR=5.37, 95% CI 1.67-13.05, p=0.003).
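The proportions reported above can be reproduced from the stated counts, and the enrichment of poor-prognosis probe sets among the candidates can be examined with a simple test. The sketch below is illustrative only: the statistical test used to obtain the reported p-values is not stated here, so the one-sided exact binomial tail shown is an assumption and is not expected to reproduce those p-values exactly. The variable and function names are likewise invented for the example.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed exactly term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Counts as reported in the text.
all_probe_sets = 54675
all_prognostic = 10023
all_poor = 4879
candidate_probe_sets = 227
candidate_prognostic = 46
candidate_poor = 34

background_prognostic_rate = all_prognostic / all_probe_sets      # about 18%
background_poor_rate = all_poor / all_probe_sets                  # about 9%
candidate_prognostic_rate = candidate_prognostic / candidate_probe_sets  # about 20%
candidate_poor_rate = candidate_poor / candidate_probe_sets       # about 15%

# Assumed test: chance of seeing at least 34 poor-prognosis probe sets among
# 227 if each were poor-prognosis at the background rate.
p_enrichment = binomial_upper_tail(candidate_poor, candidate_probe_sets,
                                   background_poor_rate)

print(f"background poor-prognosis rate: {background_poor_rate:.1%}")
print(f"candidate poor-prognosis rate:  {candidate_poor_rate:.1%}")
print(f"one-sided binomial tail:        {p_enrichment:.4f}")
```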
Using frequency scoring the most common deletions were identified, as described below. The majority of deleted clones in AWBF were on 3p and 5q and they showed more continuity in their chromosomal location than on the other chromosomes. The deleted clones on chromosome 3p and 5q included 149 genes (Table 2), among which are FHIT and DLEC1. The 149 genes mapped to 441 probe sets on the U133 plus 2 array. The downregulation of the 28 probe sets (25 genes) (Table 4) was significantly associated with poor prognosis (HR<1). Similar to the candidate gained genes, correlative gene expression analysis using external and internally available lung adenocarcinoma datasets found that 22%, 20% and 16% of the genes in the Toronto, Harvard and Michigan datasets were downregulated. Among the 149 candidate genes with loss, only 113, 84 and 48 respectively were represented in these three datasets. Downregulation was found in 45%, 26% and 20% of them (Table 6). These results also showed an enrichment of the candidate deleted gene list for downregulation.
It was demonstrated that the genomic profile of BAC is distinguishable from that of invasive AWBF, with the latter displaying greater genomic aberrations. It was also demonstrated that there is progression at the genomic level from BAC-like to invasive areas of AWBF. The 113 differentially gained genes in AWBF compared to BAC represent candidate marker genes for tumor invasion and malignant progression. Correlative gene expression studies on microarray datasets suggest that a high percentage of these genes are prognostic markers for early stage ADC patients. Using qPCR, the common amplification of 25 genes including TERT and PDCD6 was validated, and PDCD6 overexpression was found to be an independent prognostic marker for poor overall survival in early stage ADC. Further validation may lead to use of these genes as markers for differentiating aggressive AWBF from non-invasive and prognostically excellent BAC.
The differential genomic changes noted between BAC and invasive AWBF provide important evidence for a better understanding of the pathogenesis of ADC. Two independent algorithms were used to enhance the certainty of the profile that distinguishes BAC from invasive AWBF. The inability to clearly differentiate BAC from BAC with focal area of invasion at the genomic level suggests that both may have a similar behavior with low metastatic potential, and early invasion is likely determined at gene expression levels by epigenetic mechanisms. The finding also suggests that BAC or BAC with focal invasion, which are negative for the overexpression of identified marker genes, could potentially be grouped into a single diagnostic entity with excellent prognosis (11, 15).
The 113 candidate marker genes that were identified may represent part of the “signature of chromosomal instability” for invasion and malignant progression in AWBF (22). The correlative gene expression validation rate (~35%) in the Harvard and Michigan datasets was limited by the low number of probe sets in the microarray platform that matched the genomic gene list (less than half). Nevertheless, it confirms the importance of some of the candidate markers in lung carcinoma (Table 1) and the overexpression of others such as SAR1A (23), SYCP1 (24) and MCM7 (22) that have been linked to other malignancies as well as lung cancer. The poor prognostic significance of TERT gene amplification in NSCLC has been previously reported (25). The findings disclosed herein extend the importance of TERT amplification to AWBF and increased TERT gene copy due to chromosome 5 polysomy.
PDCD6, programmed cell death 6, or apoptosis-linked gene 2 (ALG-2) is located on chromosome 5pter-5p15.2 and is in close proximity to TERT. It encodes a 191 amino acid protein that was originally considered pro-apoptotic (26). PDCD6 belongs to the penta-EF hand Ca2+-binding protein family (27) and is ubiquitously expressed in the body. PDCD6 is required for T-cell receptor (TCR), glucocorticoid (26) and FAS (28) induced cell death. It interacts with the SH3-binding domain containing pro-apoptotic protein AIP1 (ALG-2-interacting protein-1) (29), peflin (30) and annexin XI (31) in a Ca2+-dependent way as well as with DAPK1 (death-associated protein kinase 1) (32). During FAS-induced apoptosis, PDCD6, which is a 22-kDa protein, is cleaved at its N-terminus to yield a 19-kDa protein and translocates from the cytoplasmic membrane to the cytosol (28). More recent work questioned the requirement for PDCD6 in apoptosis, as its loss may be compensated by other functionally redundant proteins (33). Immunohistochemical staining has revealed high expression of PDCD6 in primary tumors compared to normal tissues of the breast, liver and lung (34, 35). Both nuclear and cytoplasmic over-expression have been reported for lung cancer, especially metastatic ADC, indicating that it plays a role in survival pathways (35). It has been demonstrated that PDCD6 is significantly overexpressed in lung ADC (35). Moreover, it has also been demonstrated herein that PDCD6 is a poor prognostic factor in both early stage NSCLC as well as ADC, and thus may serve as one of the markers to differentiate more indolent from aggressive AWBF.
Potti et al (19) reported a genomic strategy to refine prognosis for early stage NSCLC and identify patients at high risk of relapse after initial surgery. They constructed a lung metagene model based on gene expression data and showed that its prognostic accuracy surpasses that of a model based on traditional clinical data. Their model was applied to all histologic types of early stage disease but did not consider BAC as a special entity. Although none of the 122 genes in the published metagenes matched the 113 genes disclosed herein, analysis of the genes disclosed herein in their dataset showed that the overexpression of 27 genes (24%) was associated with poor prognosis in early stage ADC patients. Significantly higher gene copy number in AWBF compared with BAC was confirmed by qPCR on genomic DNA.
The 27 candidate markers that were identified include SERPINE1, GNB2, and ST13. SERPINE1, serpin peptidase inhibitor, clade E, member 1, also known as plasminogen activator inhibitor-1 (PAI1), is the primary physiological inhibitor of both tissue-type plasminogen activator (tPA) and urokinase-like PA (uPA), and thus promotes the formation and stabilization of thrombi. Aside from regulating the fibrinolytic system, SERPINE1 has de-adhesive properties and is capable of inducing cell detachment that is dependent on the presence of complexes of uPA:uPA-receptor matrix-engaged integrins (36). Interestingly, high SERPINE1 expression has previously been linked with poor prognosis in a number of malignancies (37), including lung ADC (17). High expression of SERPINE1 may activate cellular scattering, promote migration and possibly enhance metastatic spread, all of which could account for the poor prognosis observed. The study relates the high expression to amplification present at the genomic level. SERPINE1 is located on the same locus, 7q21.3-q22, as GNB2, which is a novel prognostic marker for lung ADC. GNB2, guanine nucleotide-binding protein, beta-2, is the second of five possible genes encoding the beta-subunit of G proteins. As yet, no other study has associated GNB2 with lung cancer, but it is well established that G protein-coupled receptors can promote cancer progression and metastasis in a variety of tumors including NSCLC (38). ST13, suppression of tumorigenicity 13, whose aliases are P48, HOP and Hip (Hsc70-interacting protein), acts as a co-chaperone of heat-shock protein (Hsp) 70 to stabilize its activity (39). Hsp70 is known to promote survival in cancer cells (40), making it reasonable to hypothesize that ST13 amplification would lead to tumor progression. To date, ST13 has not been associated with NSCLC or its prognosis; hence it is another novel prognostic marker for lung ADC.
Two 1 mm diameter cores were sampled for each tumor, and proper sampling was confirmed by post-coring HE section. DNA was isolated from tissue cores using standard phenol-chloroform method after Proteinase K (Roche, Laval, QC) digestion. The DNA was hybridized to the “27 K” high-density human bacterial artificial chromosome (hBAC) SMRT (Sub Megabase Resolution Tiling set) array CGH (BCCRC, Vancouver, BC), which contains two replicates of each hBAC clone. These arrays allow detection of 0.4 Mb single-copy gains and deletions even with 50% contamination of tumor by normal cells and up to 0.1 Mb in pure tumor samples (46). The hybridization, scanning and data processing were performed as previously described (42). Data was normalized with a three-step normalization framework (47) and log2 ratio replicate data points that exceeded a standard deviation of 0.075 were excluded.
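By way of illustration only, the replicate exclusion step described above may be sketched as follows. The sketch assumes that each clone has two or more replicate log2 ratios, that the sample standard deviation is the statistic compared with the 0.075 cutoff, and that retained replicates are averaged; the function name and the example clone identifiers are hypothetical.

```python
from statistics import mean, stdev

SD_CUTOFF = 0.075  # cutoff stated in the text


def filter_replicates(clone_replicates, sd_cutoff=SD_CUTOFF):
    """Return {clone_id: mean log2 ratio} for clones whose replicate spots agree.

    Assumption: the sample standard deviation of the replicate log2 ratios
    is the quantity compared with the cutoff.
    """
    kept = {}
    for clone_id, ratios in clone_replicates.items():
        if len(ratios) < 2:
            continue  # assumption: agreement cannot be assessed, so skip
        if stdev(ratios) > sd_cutoff:
            continue  # replicates disagree, exclude the data point
        kept[clone_id] = mean(ratios)
    return kept


# Hypothetical replicate log2 ratios for two clones
example = {
    "clone_A": [0.30, 0.34],  # small spread, retained
    "clone_B": [0.10, 0.45],  # large spread, excluded
}
filtered = filter_replicates(example)
```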
Threshold Filtering:
The range of signal ratios recorded for normal samples defined the threshold by which a genuine genomic change was recognized and was calculated separately for each clone. A short list of 119 clones that best differentiate BAC from AWBF was created by filtering only clones that had array CGH ratio (aCGHR) above or below the threshold and with p value≦0.05 in Student's t-Test (two-tailed, unequal variance) followed by exclusion of any clone that had data for ≦4 tumors. Unsupervised hierarchical clustering using average linkage clustering of these selected clones by the Genesis software package (44) followed.
Frequency Scoring:
As previously described (42), array CGH data was smoothed by aCGH-Smooth (48), and the settings per chromosome of λ and breakpoints were 6.75 and 100, respectively. This data was displayed with the Frequency Plot program, provided within the SeeGH software package. The frequency of gain, loss and retention for each clone in the BAC group was compared to that in the AWBF group using Fisher's Exact Test. A total of 517 clones with p value ≦0.05 were selected for further analysis.
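By way of illustration only, a per-clone frequency comparison of this kind may be sketched as a two-sided Fisher's exact test on a 2x2 table. The sketch assumes that the three states are collapsed to gain versus no gain and that the two-sided p value sums all tables no more probable than the observed one; the counts shown are hypothetical and are not taken from the study.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p value for the table [[a, b], [c, d]].

    Rows are groups (e.g. BAC, AWBF); columns are gain / no gain.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x gains in the first row
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table at least as extreme (no more probable) as the observed one
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))


# Hypothetical clone: gained in 1 of 14 BAC and 8 of 12 AWBF
p_value = fisher_exact_two_sided(1, 13, 8, 4)  # about 0.0029
selected = p_value <= 0.05
```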
Overlapping Threshold Filtering with Frequency Scoring:
To increase the certainty of clone selection and avoid exclusion of critical clones, the two short lists generated by Threshold filtering and Frequency scoring were overlapped. They were aligned according to their chromosomal location; any clone within a 10-clone “window” created by at least one call from each of the short lists was selected, and a list of 256 clones distributed in 34 continuous regions on 16 different chromosomes was obtained. Further manual selection based on log2 ratio data of array CGH retrieved a shorter list of 58 clones that best differentiate BAC from AWBF. The analysis concentrated on clone gains rather than losses since they outnumbered the latter. Annotations for genes of interest within these 58 clones were obtained from the UCSC Genome Browser Gateway website (http://genome.ucsc.edu/cgi-bin/hgGateway), assembly April 2003 (49), via the SeeGH software (43). Relevant gene names and Entrez GeneIDs were updated based on information downloaded from Entrez Gene on Nov. 28, 2006 (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?CMD=search&DB=gene).
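By way of illustration only, one possible reading of the window overlap may be sketched as follows. The sketch assumes that clones on a single chromosome are indexed by their tiling order, that a window is any run of 10 consecutive clone positions, and that the called clones from both lists are selected whenever a window contains at least one call from each list. The function name and the indices are hypothetical, and other readings of the window rule are possible.

```python
def overlap_calls(threshold_idx, frequency_idx, window=10):
    """Select called clones that share a window with a call from the other list.

    threshold_idx / frequency_idx: clone positions (tiling order on one
    chromosome) called by Threshold filtering and Frequency scoring.
    """
    t_calls, f_calls = set(threshold_idx), set(frequency_idx)
    called = sorted(t_calls | f_calls)
    if not called:
        return []
    selected = set()
    # Slide a window of `window` consecutive positions across all calls
    for start in range(called[0] - window + 1, called[-1] + 1):
        positions = range(start, start + window)
        in_t = [i for i in positions if i in t_calls]
        in_f = [i for i in positions if i in f_calls]
        if in_t and in_f:  # at least one call from each short list
            selected.update(in_t)
            selected.update(in_f)
    return sorted(selected)


# Hypothetical positions: 5/12 and 40/43 fall within shared windows, 90 does not
picked = overlap_calls([5, 40], [12, 43, 90])
```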
Deletions Analysis:
Based on Frequency scoring, 8875 clones that were deleted in AWBF were identified. Of these, only 489 clones were deleted in at least a quarter of the AWBF cases. Annotations for the genes of interest within these 489 clones were obtained as described above for the gained clones. The sequence for each gene identified and referred to by Entrez Gene ID is incorporated herein by reference.
Validation by Realtime qPCR
Primer sets for genomic DNA were designed for exons of target genes and two housekeeping genes, MAP2 (microtubule-associated protein 2) and B2M (beta-2-microglobulin), chosen for their rare involvement in genomic alterations in lung cancer (Progenetix CGH database, http://www.progenetix.de/~pgscripts/progenetix/Aboutprogenetix.html). Primer Express software v. 2.0 (Applied Biosystems, Foster City, Calif.) was used for the design of all primer sets. To exclude amplification of contaminating pseudogene sequences, primers (sequences provided in Table 10) were first aligned using the BLASTN program, followed by dissociation curve and primer efficiency tests. The qPCR assays were conducted in duplicate in a 384-well plate using the SYBR Green assay in the ABI PRISM 7900-HT (Applied Biosystems, Foster City, Calif.) with 5 ng of genomic DNA (gDNA) in a 10 μl qPCR reaction. The reactions were activated at 95° C. for 10 minutes followed by 40 cycles of denaturing at 95° C. for 15 s and annealing and extension at 60° C. for 1 min. The normalized, relative original copy number of each gene prior to the PCR procedure was calculated by the formula 2^(−ΔΔCt) (50) with the geometric mean of the two housekeeping genes serving as an endogenous reference, and the average of 8 normal lung samples as a calibrator.
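By way of illustration only, the relative copy number calculation may be sketched as follows. The sketch assumes 100% amplification efficiency, under which the geometric mean of the housekeeping gene quantities corresponds to the arithmetic mean of their Ct values, and assumes that the calibrator is the average ΔCt of the normal samples. The function names and Ct values are hypothetical.

```python
from statistics import mean


def delta_ct(ct_target, ct_refs):
    """Target Ct minus the mean reference Ct.

    Assumption: at 100% efficiency the geometric mean of the reference
    quantities (2**-Ct) equals 2**-(arithmetic mean of the reference Cts).
    """
    return ct_target - mean(ct_refs)


def relative_copy_number(sample, normals):
    """Return 2**(-ddCt) for a sample against a panel of normal samples."""
    # Calibrator: average dCt over the normal samples (assumption)
    calibrator = mean(delta_ct(n["target"], n["refs"]) for n in normals)
    ddct = delta_ct(sample["target"], sample["refs"]) - calibrator
    return 2.0 ** (-ddct)


# Hypothetical Ct values; "refs" holds the two housekeeping genes
normals = [{"target": 25.0, "refs": [25.5, 24.5]} for _ in range(8)]
tumor = {"target": 24.0, "refs": [25.5, 24.5]}
fold = relative_copy_number(tumor, normals)  # one cycle earlier, about 2-fold
```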
Total RNA was isolated from fresh frozen tissues using the guanidinium thiocyanate-phenol-chloroform method, DNAse I treated with DNA-free DNAse (Ambion, Austin, USA) and column purified using the RNeasy Mini Kit (Qiagen, Hilden, Germany). Five nanograms of total RNA were reverse transcribed using Superscript II Reverse Transcription reagents and Oligo dT (Invitrogen, Carlsbad, Calif.) to produce cDNA. The housekeeping genes ACTB (Actin, beta), B2M and TBP (TATA box binding protein) were used for the 10 paired samples and ACTB, B2M, TBP and BAT1 (HLA-B associated transcript 1) for the cohort of 94 tumors. PCR primer sets were designed as described above for qPCR on genomic DNA (sequences provided in Table 10). Each realtime quantitative PCR amplification was performed in a final volume of 10 μL in a 384-well plate, where a 5 ng equivalent of cDNA was used for the 10 paired samples and a 2 ng equivalent of cDNA was used for the 94 tumor cohort. All samples were run in duplicate. The reactions for the 10 paired samples were activated at 95° C. for 10 minutes followed by 40 cycles of denaturing at 95° C. for 15 s and annealing and extension at 60° C. for 1 min. The reactions for the 94 tumor cohort were activated at 95° C. for 3 minutes followed by 40 cycles of denaturing at 95° C. for 15 s, annealing at 65° C. for 15 s and extension at 72° C. for 20 s. Transcript number/ng cDNA was obtained using standard curves generated with a pool of 10 non-tumor lung genomic DNAs (51). Technical replicates were collapsed by averaging. Normalization and standardization of data were accomplished using the geometric mean of the expression levels of common housekeeping genes. The normalization method has been recently published (41).
FISH was performed using the TERT/5q dual-color FISH probe cocktail (Qbiogene, Montreal, QC) that contains the TERT locus (5p15) specific probe labelled directly with dGreen and the 5q31 (D5S89) specific probe directly labelled with Rhodamine, according to a published protocol (25). Fifty intact, non-overlapping tumor interphase nuclei were scored for TERT and 5q31 copy number. Results are presented as the mean gene copy number per nucleus.
The ‘Harvard’ raw data was pre-processed with the RMA algorithm (52) in the R statistical environment (v2.1.1) using the affy package (v1.6.7) (53) of the BioConductor open-source library (54). Replicate arrays were collapsed by taking the arithmetic mean of their log2 expression values. Pre-processed log2 converted values for ADC were compared to normal lung values using SAM (v2.21) (55), and the number of positively- and negatively-regulated ProbeSets were determined with two-class unpaired analysis (median false detection rate (FDR)=3.98%).
The ‘Michigan’ raw data was pre-processed using RMAExpress (v0.3) (52, 56) followed by SAM (v2.21) analysis of tumor vs. normal (median FDR=4.98%).
The ‘Duke’ raw data was pre-processed using RMAExpress (v0.3) (52, 56) and log2 transformed.
The ‘Toronto’ raw data from each chip was pre-processed separately using RMAExpress (v0.3) and log2 converted (52, 56). The data for all samples was adjusted and merged using the Distance Weighted Discrimination algorithm with the ‘Standard output’ setting (57, 58). Duplicate data for the 4 tumor arrays, which were profiled on both U133A and U133A2, were collapsed by taking the arithmetic mean of their adjusted expression values. The merged data was then used for SAM (v2.21) analysis of tumor vs. normal (median FDR=4.06%).
Genes from array CGH clones were matched to Affymetrix ProbeSets of all four studies based on LocusLink ID from array CGH data and Entrez GeneIDs from Affymetrix annotation Tables (Nov. 15, 2006. Release #21; https://www.affymetrix.com/analysis/releasedocs/netaffx_release 21.affx).
Pearson correlation coefficients for FISH were calculated using the ratio between mean TERT score and the mean control (5q) score.
The NSCLC cohort used for the mRNA expression study initially comprised 94 samples. Nine cases, whose survival curve was equivalent to that of the remaining 85 cases, had to be excluded from the study because they lacked the TBP or BAT1 reads required for normalization.
To study the association of PDCD6 mRNA expression with survival, overall survival (from date of surgery to date of death) of 85 NSCLC patients was used. PDCD6 adjusted expression was dichotomized at the 25th percentile following identification of a distinctive survival pattern of this first quartile. Survival curves were plotted as the Kaplan-Meier graphs and compared using the log rank test. Univariate and multivariate analyses were done using the Cox proportional hazards regression model.
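By way of illustration only, the dichotomization and the Kaplan-Meier estimate may be sketched as follows. The sketch assumes that the 25th percentile is computed with the default method of the Python `statistics.quantiles` function and that values at or below that cutoff form the low-expression group. The log rank test and the Cox regression are not shown, and all values are hypothetical.

```python
from statistics import quantiles


def dichotomize_at_first_quartile(expression):
    """Flag samples at or below the 25th percentile (True = first quartile)."""
    q1 = quantiles(expression, n=4)[0]  # assumption: default 'exclusive' method
    return [value <= q1 for value in expression]


def kaplan_meier(times, events):
    """Return [(time, survival)] at each time with at least one death.

    times: follow-up from surgery; events: 1 = death, 0 = censored.
    """
    curve, survival, at_risk = [], 1.0, len(times)
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ev in zip(times, events) if ti == t and ev)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # deaths and censored cases leave the risk set
    return curve


# Hypothetical expression values and follow-up data
low_flags = dichotomize_at_first_quartile([1, 2, 3, 4, 5, 6, 7, 8])
km_curve = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
```

In practice, one curve would be estimated for each expression group and the two curves compared.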
Survival analysis of 34 stage I ADC patients from the ‘Duke’ dataset was done using the Cox proportional hazards regression model. The genes whose expression was found to be significantly associated with prognosis were compared to the 113 candidate genes. The χ2 test was used to compare the percentage of prognostic ProbeSets and those predictive of poor prognosis between the entire microarray ProbeSets and those corresponding to the 113 candidate genes. The expression of selected genes (SERPINE1, GNB2 and ST13) was dichotomized at the median in order to create Kaplan-Meier survival curves that were compared by the log rank test.
The study protocol was approved by the University Health Network Research Ethics Board and included 26 resected lung cancers (1996-2005) classified histologically as non-mucinous BAC or invasive-AWBF. For each case, the histology slides were reviewed independently by the study pathologists (SAR and MST) and tumors were classified according to the 2004 WHO criteria (2). Twelve cases were classified as AWBF when they had prominent non-mucinous BAC-like pattern (>50% of the tumor), but also had frank invasive adenocarcinoma of other histological types, such as acinar, papillary or solid (
Tissue Sampling, DNA Isolation and Array CGH
DNA was isolated from formalin-fixed paraffin embedded (FFPE) tissue. Guided by Hematoxylin-eosin (HE) stained sections, representative paraffin blocks with tumor areas containing >50% tumor cell nuclei were marked and cored using the needle for tissue array (Beecher Instrument, Sun Prairie, Wis.). The process of tissue sampling, DNA isolation and array CGH is detailed below.
Array CGH data analysis was based on two independent algorithms, Threshold-filtering and Frequency-scoring (42) using multiple software tools including SeeGH (43), Genesis (44), aCGH-Smooth (45) and Frequency-Plot (42). The algorithms and the overlap between them are described below. The analysis concentrated on clone gains rather than losses since clone gains involved more chromosomes, their prevalence was higher (
Validation by Realtime Quantitative PCR (qPCR)
Gene copy numbers were evaluated on the DNA used in the array CGH studies by realtime qPCR using primer sets for target and housekeeping genes. The evaluation of 33 genes including TERT and PDCD6 was performed on all the array CGH samples aside from two BACs (Table 12). The mRNA expression study was carried out on two groups of samples: 10 pairs of matched ADC and their adjacent normal lung tissue and 85 NSCLC samples. Primer set designs are included in Tables 10 and 11.
The 21 cases studied by FISH included 7 BAC with or without suspicion for invasion and 14 AWBF; three of the latter were scored in both their BAC and invasive areas. An additional case of AWBF was scored only in the invasive area. Among these cases is one with synchronous BAC and invasive AWBF sampled from the BAC area. FISH failed in 6 samples. The FISH protocol is detailed below.
RNA was extracted by the phenol-chloroform method from 39 adenocarcinomas (Table 9) and 10 normal lung tissue samples. RNA quality was assessed by gel electrophoresis and Agilent Bioanalyzer. cRNA synthesis, hybridization and scanning were performed following the manufacturer's protocol. The adenocarcinoma RNA was profiled on the Affymetrix U133A chip and the normal lung RNA on the Affymetrix U133A2 chip. To ensure the compatibility of these 2 platforms, 4 of the 39 adenocarcinomas were re-profiled on the U133A2 chip.
The 113 amplified genes and the 149 deleted genes from array CGH analysis were validated on the Toronto microarray dataset and on two publicly available lung cancer microarray expression datasets (17, 18), referred to as ‘Harvard’ and ‘Michigan’, respectively. For a detailed description of the analytic process and a summary of the validation, see below and Tables 1 and 6.
In addition, univariate analysis was performed on microarray expression data of stage I ADC patient samples from a third dataset referred to as ‘Duke’ (19) in order to identify prognostic markers and compare them to the 113 candidate markers, as detailed below and in Table 7.
The Mann-Whitney test was used to compare the genomic copy number of 33 genes including TERT and PDCD6. Pearson correlation coefficients assessed the correlation between array CGH, qPCR and FISH results. The Wilcoxon signed rank test was used to compare PDCD6 expression in the paired ADC-normal samples. Survival analysis of PDCD6 mRNA of 85 NSCLC patients and 34 stage I ADC patients from the ‘Duke’ dataset is described above.
A patient who is a heavy smoker joins a screening program for early diagnosis of lung cancer in high-risk patients (heavy smokers). A coin lesion of, for example, 3.0 cm in the right upper lobe of the lung is detected on chest CT scan. Right upper lobectomy is performed and a tumor with predominant bronchioloalveolar growth pattern is found. The tumor is associated with a large fibrotic area, where invasion is suspected. The differential diagnosis between BAC and AWBF is critical for the decision to administer adjuvant chemotherapy. At this point an additional section from the formalin-fixed paraffin embedded tumor block is cut and DNA is extracted. Quantitative PCR of the genomic DNA is run for 5 genes: PDCD6, TERT, SERPINE1, GNB2 and ST13.
The results are compared to a normal lung tissue control and show high content of PDCD6, SERPINE1 and GNB2 in the tumor. TERT is equivocally gained and ST13 shows normal content. Using an additional section of the tumor, FISH with a TERT probe is performed and demonstrates clear amplification of TERT. Based on the ancillary studies, the tumor is diagnosed as AWBF with less favorable prognosis. Consequently, the patient receives adjuvant chemotherapy.
Array CGH analysis of BAC and ADC identified one hundred and thirteen (113) genes as demonstrating differential frequencies of alteration in BAC and ADC. Thirty three (33) of these genes were further validated by Quantitative PCR analysis of gene copy number, and examined for potential diagnostic/prognostic utility (Table 13).
The Receiver Operating Characteristic (ROC) area under the curve (AUC) analysis was performed to determine the ability of each gene to separate the BAC and ADC samples into their appropriate diagnostic groups. Briefly, ROC analysis is based on comparison of true positive and false positive rates at various cut-offs. An ROC AUC value of 0.5 would indicate that the marker is no better than random chance at separating two groups, while a score of 1 would indicate that the marker is perfect at separating the two groups. Generally a marker with an AUC of 0.8 to 0.9 is considered good, while an AUC of 0.7 to 0.8 would represent a “fair” marker. Calculations were performed using the calculator at: http://www.rad.jhmi.edu/jeng/javarad/roc/JROCFITi.html (Eng J. ROC analysis: web-based calculator for ROC curves. Baltimore: Johns Hopkins University [updated 2006 May 17] Available from: http://www.jrocfit.org).
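By way of illustration only, the meaning of the AUC may be sketched with an empirical estimate, namely the fraction of (ADC, BAC) sample pairs in which the ADC sample has the higher copy number, with ties counted as one half. This is a different estimator from the web-based calculator cited above and is shown only to clarify the 0.5 to 1 scale; the fold-change values are hypothetical.

```python
def empirical_auc(positives, negatives):
    """Empirical ROC AUC: probability that a positive outranks a negative."""
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical qPCR fold changes for one gene
adc_values = [1.8, 1.4, 1.1, 2.2]
bac_values = [1.0, 1.2, 0.9]
auc = empirical_auc(adc_values, bac_values)  # 11 of 12 pairs ranked correctly
```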
Although ROC analysis gives an indication of a marker's diagnostic value, it does not identify optimal cut-offs for maximal sensitivity and specificity. In order to generate relative risk and sensitivity/specificity scores for each gene, the QPCR copy number fold change threshold that gave maximal sensitivity while preserving a specificity of at least 90% was first identified. This was calculated on a per gene basis, and a smaller threshold indicates both a lower copy number level and a lower frequency of gains in BAC samples (for example, a QPCR fold change threshold of 1.2 for PPA1 indicates that samples having greater than 1.2 copies of the gene (e.g., a gain) are classified and diagnosed as having AWBF and/or prognosed as having poor survival with a 91.7% specificity and 53.3% sensitivity). Relative Risk is defined as the proportion of ADC samples with a gain divided by the proportion of BAC samples with a gain, as defined by the QPCR threshold identified above. This score thus represents the relative likelihood that an ADC will carry the alteration compared to a BAC. Similarly, Sensitivity and Specificity are indicated for each gene.
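By way of illustration only, the threshold search and the relative risk calculation may be sketched as follows. The sketch assumes that candidate thresholds are the observed fold-change values, that a sample carries a gain when its fold change is strictly greater than the threshold, and that the smallest threshold is kept when several give the same sensitivity. The function names and fold-change values are hypothetical.

```python
def best_threshold(adc, bac, min_specificity=0.90):
    """Return (threshold, sensitivity, specificity) maximizing sensitivity.

    Only thresholds that keep specificity at or above min_specificity
    are considered; None is returned if no threshold qualifies.
    """
    best = None
    for thr in sorted(set(adc) | set(bac)):  # ascending, so ties keep the smallest
        sensitivity = sum(x > thr for x in adc) / len(adc)
        specificity = sum(x <= thr for x in bac) / len(bac)
        if specificity >= min_specificity and (best is None or sensitivity > best[1]):
            best = (thr, sensitivity, specificity)
    return best


def relative_risk(adc, bac, thr):
    """Proportion of ADC with a gain divided by proportion of BAC with a gain."""
    adc_rate = sum(x > thr for x in adc) / len(adc)
    bac_rate = sum(x > thr for x in bac) / len(bac)
    return float("inf") if bac_rate == 0 else adc_rate / bac_rate


# Hypothetical qPCR fold changes for one gene
adc_fc = [1.1, 1.3, 1.5, 1.8, 0.9]
bac_fc = [0.8, 0.9, 1.0, 1.0, 1.1, 0.95, 1.05, 0.85, 0.9, 1.25]
threshold, sens, spec = best_threshold(adc_fc, bac_fc)
risk = relative_risk(adc_fc, bac_fc, threshold)
```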
Genes were prioritized based on a combination of maximal ROC value and minimal QPCR threshold. These genes represent the strongest diagnostic markers of ADC with minimal alterations in BAC patients (EPO, SERPINE1, SLC25A17, POP7).
While the present application has been described with reference to what are presently considered to be the preferred examples, it is to be understood that the application is not limited to the disclosed examples. To the contrary, the application is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.
All publications, patents and patent applications are herein incorporated by reference in their entirety to the same extent as if each individual publication, patent or patent application was specifically and individually indicated to be incorporated by reference in its entirety. Specifically, the sequence associated with each accession number provided herein is incorporated by reference in its entirety.
Identified in silico as poor prognosis markers in early stage (stage 1) ADC in the ‘Duke’ microarray expression dataset (19) or from NSCLC samples from University Health Network (59).
Number | Date | Country | |
---|---|---|---|
61059085 | Jun 2008 | US |
Number | Date | Country | |
---|---|---|---|
Parent | 12996064 | Feb 2011 | US |
Child | 13671912 | US |