Lung cancer prognostics

Abstract
A method of providing a prognosis of lung cancer is conducted by analyzing the expression of a group of genes. Gene expression profiles in a variety of media, such as microarrays, are included, as are kits that contain them.
Description


BACKGROUND

This application claims the benefit of U.S. Patent Application No. 60/632,053, filed Nov. 30, 2005, which is incorporated herein by reference.


This invention relates to prognostics for lung cancer based on the gene expression profiles of biological samples.


Lung cancer is the leading cause of cancer deaths in developed countries, killing about 1 million people worldwide each year. An estimated 171,900 new cases were expected in the US in 2003, accounting for about 13% of all cancer diagnoses. Non-small cell lung cancer (NSCLC) represents the majority (~75%) of bronchogenic carcinomas, while the remainder are small cell lung carcinomas (SCLC). NSCLC comprises three main subtypes: adenocarcinoma (40%), squamous cell carcinoma (40%), and large cell carcinoma (20%). Over the last 25 years, adenocarcinoma has replaced squamous cell carcinoma as the most frequent histological subtype, peaking in the early 1990s. This shift may be associated with the use of “low tar” cigarettes, which results in deeper inhalation of cigarette smoke. Wingo et al. (1999). The overall 10-year survival rate of patients with NSCLC is a dismal 8-10%.


Approximately 25-30% of patients with NSCLC have stage I disease, and 35-50% of these patients will relapse within 5 years after surgical treatment. Depending upon stage, adenocarcinoma has a higher relapse rate than squamous cell carcinoma (SCC): approximately 65% of SCC patients and 55% of adenocarcinoma patients survive at 5 years. Mountain et al. (1987). Currently, it is not possible to identify the patients at high risk of relapse. The ability to identify high-risk patients within the stage I group would allow additional therapeutic intervention to be considered, with the potential for improved survival. Indeed, recent clinical trials have shown that adjuvant therapy following resection of lung tumors can improve survival. Kato et al. (2004). Specifically, Kato et al. demonstrated that adjuvant chemotherapy with uracil-tegafur improves survival among patients with completely resected pathological stage I adenocarcinoma, particularly those with T2 disease.


Microarray gene expression profiling has recently been used to define prognostic signatures in patients with lung adenocarcinoma (Beer et al. (2002)); however, no large studies have investigated prognostic gene expression profiles in the squamous cell carcinoma population. Here, we have profiled 134 SCC samples and 10 matched normal lung samples on the Affymetrix U133A chip. Hierarchical clustering and Cox modeling have identified genes that correlate with patient prognosis. These signatures can be used to identify patients who may benefit from adjuvant therapy following initial surgery.


SUMMARY OF THE INVENTION

The present invention provides a method of assessing lung cancer status by obtaining a biological sample from a lung cancer patient; and measuring Biomarkers associated with Marker genes corresponding to those selected from Table 1, Table 4, Table 5 or Table 7 where the expression levels of the Marker genes above or below pre-determined cut-off levels are indicative of lung cancer status.


The present invention provides a method of staging lung cancer patients by obtaining a biological sample from a lung cancer patient; and measuring Biomarkers associated with Marker genes corresponding to those selected from Table 1, Table 4, Table 5 or Table 7 where the expression levels of the Marker genes above or below pre-determined cut-off levels are indicative of the lung cancer stage.


The present invention provides a method of determining lung cancer patient treatment protocol by obtaining a biological sample from a lung cancer patient; and measuring Biomarkers associated with Marker genes corresponding to those selected from Table 1, Table 4, Table 5 or Table 7 where the expression levels of the Marker genes above or below predetermined cut-off levels are sufficiently indicative of risk of recurrence to enable a physician to determine the degree and type of therapy recommended to prevent recurrence.


The present invention provides a method of treating a lung cancer patient by obtaining a biological sample from a lung cancer patient; measuring Biomarkers associated with Marker genes corresponding to those selected from Table 1, Table 4, Table 5 or Table 7, where expression levels of the Marker genes above or below pre-determined cut-off levels indicate a high risk of recurrence; and treating the patient with adjuvant therapy if the patient is at high risk.


The present invention provides a method of determining whether a lung cancer patient is at high or low risk of mortality by obtaining a biological sample from a lung cancer patient and measuring Biomarkers associated with Marker genes corresponding to those selected from Table 4, where the expression levels of the Marker genes above or below pre-determined cut-off levels are sufficiently indicative of risk of mortality to enable a physician to determine the degree and type of therapy recommended.


The present invention provides a method of generating a lung cancer prognostic patient report by determining the results of any one of the methods described herein and preparing a report displaying the results, and also provides the patient reports generated thereby.


The present invention provides a composition comprising at least one probe set selected from the group consisting of: Marker genes corresponding to those selected from Table 1, Table 4, Table 5 or Table 7.


The present invention provides a kit for conducting an assay to determine lung cancer prognosis in a biological sample comprising: materials for detecting isolated nucleic acid sequences, their complements, or portions thereof of a combination of genes selected from the group consisting of Marker genes corresponding to those selected from Table 1, Table 4, Table 5 or Table 7.


The present invention provides articles for assessing lung cancer status comprising: materials for detecting isolated nucleic acid sequences, their complements, or portions thereof of a combination of genes selected from the group consisting of Marker genes corresponding to those selected from Table 1, Table 4, Table 5 or Table 7.


The present invention provides a microarray or gene chip for performing the method described herein.


The present invention provides a diagnostic/prognostic portfolio comprising isolated nucleic acid sequences, their complements, or portions thereof of a combination of genes selected from the group consisting of Marker genes corresponding to those selected from Table 1, Table 4, Table 5 or Table 7.




BRIEF DESCRIPTION OF THE DRAWINGS


FIG. 1 depicts hierarchical clustering of 129 lung SCC patients.



FIG. 2 depicts plots of area under the curve (AUC) vs. number of genes.



FIG. 3 depicts leave-one-out cross-validation (LOOCV) error rates at various cutoffs in the 65-sample training set.



FIG. 4 depicts Kaplan-Meier plots of the 50-gene signature in the testing set.



FIG. 5 depicts how unsupervised clustering identifies the epidermal differentiation pathway as down-regulated in high-risk patients. A. Clustering of patients based on the top 121 genes showed two clusters of patients. The majority of genes in cluster I were down-regulated (green). B. List of 20 genes associated with the epidermal differentiation pathway. C. Kaplan-Meier curves of the patient groups defined by clustering on the 20 epidermal-related genes.



FIG. 6 depicts verification of gene expression data using real-time RT-PCR. Four genes (NTRK2, FGFR2, VEGF, KRT13) were selected for RT-PCR. Expression levels correlated well with the Affymetrix chip data (R=0.71-0.96).




DETAILED DESCRIPTION OF THE INVENTION

Non-small cell lung cancer (NSCLC) represents the majority (~75%) of lung carcinomas and comprises three main subtypes: squamous cell carcinoma (40%), adenocarcinoma (40%), and large cell carcinoma (20%). Approximately 25-30% of patients with NSCLC have stage I disease, and 35-50% of these patients will relapse within 5 years after surgical treatment. Current histopathological and genetic biomarkers are insufficient for identifying patients at high risk of relapse. As described in the present invention, 129 primary squamous cell lung carcinomas and 10 matched normal lung tissues were profiled using the Affymetrix U133A gene chip. Unsupervised hierarchical clustering identified two clusters of patients with lung carcinoma that did not correlate with disease stage but had significantly different median overall survival (p=0.036). Cox proportional hazards models were then used to identify an optimal set of 50 genes (Table 1) in a 65-patient training set; this set significantly predicted survival in a 64-patient test set. The signature achieved 52% specificity and 82% sensitivity, with an overall predictive value of 71%. Kaplan-Meier analysis showed significant stratification of high- and low-risk patients (p=0.0075). Prognostic signatures of this kind allow identification of patients with high-risk squamous cell lung carcinoma who could benefit from adjuvant therapy following initial surgery.

TABLE 1
SEQ ID NO:  Rank
228         1
284         2
76          3
124         4
281         5
86          6
303         7
311         8
443         9
287         10
131         11
378         12
362         13
18          14
79          15
230         16
416         17
409         18
78          19
420         20
58          21
53          22
254         23
91          24
270         25
446         26
4           27
310         28
42          29
10          30
80          31
12          32
440         33
75          34
60          35
63          36
283         37
29          38
221         39
279         40
280         41
267         42
189         43
103         44
194         45
268         46
252         47
461         48
372         49
414         50
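The sensitivity, specificity and overall predictive value reported above can be illustrated with a short Python sketch. It computes these quantities from a confusion matrix of high-/low-risk calls against observed outcome, and it reads "overall predictive value" as overall accuracy, which is an assumption because the text does not define the term. The function name and the counts are hypothetical and are not the study's data.

```python
def signature_performance(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Summarize high-/low-risk calls against observed outcomes.

    tp: poor-outcome patients called high risk
    fn: poor-outcome patients called low risk
    tn: good-outcome patients called low risk
    fp: good-outcome patients called high risk
    """
    sensitivity = tp / (tp + fn)  # fraction of poor-outcome patients flagged
    specificity = tn / (tn + fp)  # fraction of good-outcome patients cleared
    # Assumption: "overall predictive value" is read as overall accuracy.
    overall = (tp + tn) / (tp + fn + tn + fp)
    return {"sensitivity": sensitivity, "specificity": specificity, "overall": overall}


# Hypothetical counts for illustration only (not the study's data).
print(signature_performance(tp=8, fn=2, tn=6, fp=4))
# {'sensitivity': 0.8, 'specificity': 0.6, 'overall': 0.7}
```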


A Biomarker is any indicator of the level of expression of an indicated Marker gene. Such indicators can be direct or indirect and can measure over- or under-expression of the gene given the physiologic parameters and in comparison to an internal control, normal tissue or another carcinoma. Biomarkers include, without limitation, nucleic acids (both over- and under-expression, and both direct and indirect). Using nucleic acids as Biomarkers can include any method known in the art including, without limitation, measuring DNA amplification, RNA, micro RNA, loss of heterozygosity (LOH), single nucleotide polymorphisms (SNPs, Brookes (1999)), microsatellite DNA, and DNA hypo- or hyper-methylation. Using proteins as Biomarkers can include any method known in the art including, without limitation, measuring protein amount, activity, or modifications (such as glycosylation, phosphorylation, ADP-ribosylation and ubiquitination), and immunohistochemistry (IHC). Other Biomarkers include imaging, cell count and apoptosis markers.


The indicated genes provided herein are those associated with a particular tumor or tissue type. A Marker gene may be associated with numerous cancer types; provided that its expression is sufficiently associated with one tumor or tissue type to be identified as specific for a lung cancer cell using the algorithm described herein, the gene can be used in the claimed invention to determine cancer status and prognosis. Numerous genes associated with one or more cancers are known in the art. The present invention provides preferred Marker genes and even more preferred Marker gene combinations, which are described in detail herein.


A Marker gene corresponds to the sequence designated by a SEQ ID NO when it contains that sequence. A gene segment or fragment corresponds to the sequence of such gene when it contains a portion of the referenced sequence or its complement sufficient to distinguish it as being the sequence of the gene. A gene expression product corresponds to such sequence when its RNA, mRNA, or cDNA hybridizes to the composition having such sequence (e.g. a probe) or, in the case of a peptide or protein, it is encoded by such mRNA. A segment or fragment of a gene expression product corresponds to the sequence of such gene or gene expression product when it contains a portion of the referenced gene expression product or its complement sufficient to distinguish it as being the sequence of the gene or gene expression product.


The inventive methods, compositions, articles, and kits described and claimed in this specification include one or more Marker genes. “Marker” or “Marker gene” is used throughout this specification to refer to genes and gene expression products that correspond with any gene the over- or under-expression of which is associated with a tumor or tissue type. The preferred Marker genes are described in more detail in Table 8.


The present invention provides a method of assessing lung cancer status by obtaining a biological sample from a lung cancer patient; and measuring Biomarkers associated with Marker genes corresponding to those selected from Table 1, Table 4, Table 5 or Table 7 where the expression levels of the Marker genes above or below pre-determined cut-off levels are indicative of lung cancer status.


The present invention provides a method of staging lung cancer patients by obtaining a biological sample from a lung cancer patient; and measuring Biomarkers associated with Marker genes corresponding to those selected from Table 1, Table 4, Table 5 or Table 7 where the expression levels of the Marker genes above or below pre-determined cut-off levels are indicative of the lung cancer stage. The stage can correspond to any classification system, including, but not limited to, the TNM system, or to groups of patients with similar gene expression profiles.


The present invention provides a method of determining lung cancer patient treatment protocol by obtaining a biological sample from a lung cancer patient; and measuring Biomarkers associated with Marker genes corresponding to those selected from Table 1, Table 4, Table 5 or Table 7 where the expression levels of the Marker genes above or below pre-determined cut-off levels are sufficiently indicative of risk of recurrence to enable a physician to determine the degree and type of therapy recommended to prevent recurrence.


The present invention provides a method of treating a lung cancer patient by obtaining a biological sample from a lung cancer patient; measuring Biomarkers associated with Marker genes corresponding to those selected from Table 1, Table 4, Table 5 or Table 7, where expression levels of the Marker genes above or below pre-determined cut-off levels indicate a high risk of recurrence; and treating the patient with adjuvant therapy if the patient is at high risk.


The present invention provides a method of determining whether a lung cancer patient is at high or low risk of mortality by obtaining a biological sample from a lung cancer patient and measuring Biomarkers associated with Marker genes corresponding to those selected from Table 4, where the expression levels of the Marker genes above or below pre-determined cut-off levels are sufficiently indicative of risk of mortality to enable a physician to determine the degree and type of therapy recommended.
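The text states that the 50-gene signature was selected with Cox proportional hazards models but does not give fitted coefficients or cut-off values. The Python sketch below shows one common way such a signature could be applied, assuming a Cox-style linear risk score (the sum of each gene's coefficient times its expression value) compared against a pre-determined cut-off. The function names, gene names, coefficients, expression values and cut-off are all hypothetical.

```python
from typing import Mapping


def risk_score(expression: Mapping[str, float], coefficients: Mapping[str, float]) -> float:
    """Cox-style linear predictor: sum of coefficient * expression over signature genes."""
    missing = set(coefficients) - set(expression)
    if missing:
        raise ValueError(f"no expression value for: {sorted(missing)}")
    return sum(beta * expression[gene] for gene, beta in coefficients.items())


def risk_group(expression: Mapping[str, float], coefficients: Mapping[str, float], cutoff: float) -> str:
    """Call 'high' risk when the score exceeds the pre-determined cut-off, else 'low'."""
    return "high" if risk_score(expression, coefficients) > cutoff else "low"


# Hypothetical genes, coefficients, log2 expression values and cut-off.
coefficients = {"GENE_A": 0.8, "GENE_B": -0.5, "GENE_C": 0.3}
patient = {"GENE_A": 2.0, "GENE_B": 1.0, "GENE_C": 0.5}
print(risk_score(patient, coefficients), risk_group(patient, coefficients, cutoff=1.0))
```

A positive coefficient means that higher expression raises the score, which in a Cox model corresponds to a higher hazard. How the cut-off itself is chosen is a separate decision that this sketch does not address.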


In the above methods, the sample can be prepared by any method known in the art including, but not limited to, bulk tissue preparation and laser capture microdissection. The bulk tissue preparation can be obtained for instance from a biopsy or a surgical specimen.


In the above methods, the gene expression measuring can also include measuring the expression level of at least one gene constitutively expressed in the sample.
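One way to use a constitutively expressed gene is as an internal reference: each Marker gene's intensity is divided by the reference gene's intensity and expressed on a log2 scale. The Python sketch below assumes positive intensities and uses "GAPDH" purely as a placeholder name for a constitutively expressed gene; the text does not specify which gene is used, and the function name and values are hypothetical.

```python
import math


def normalize_to_reference(intensities: dict, reference_gene: str) -> dict:
    """Return log2(intensity / reference intensity) for every non-reference gene.

    Assumes all intensities are positive; a zero gene intensity would raise in log2.
    """
    reference = intensities[reference_gene]
    if reference <= 0:
        raise ValueError("reference gene intensity must be positive")
    return {
        gene: math.log2(value / reference)
        for gene, value in intensities.items()
        if gene != reference_gene
    }


# "GAPDH" stands in for any constitutively expressed gene; values are hypothetical.
print(normalize_to_reference({"GAPDH": 1000.0, "GENE_A": 2000.0, "GENE_B": 250.0}, "GAPDH"))
# {'GENE_A': 1.0, 'GENE_B': -2.0}
```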


In the above methods, the specificity is preferably at least about 40% and the sensitivity at least about 80%.


In the above methods, the pre-determined cut-off levels are at least about 1.5-fold over- or under-expression in the sample relative to benign cells or normal tissue.
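A minimal Python sketch of a symmetric 1.5-fold cut-off: a gene is called over-expressed when the sample-to-reference ratio is at least 1.5 and under-expressed when it is at most 1/1.5. The function name and the use of a single reference intensity (for example, a mean over benign cells or normal tissue) are assumptions for illustration.

```python
def call_fold_change(sample: float, reference: float, fold_cutoff: float = 1.5) -> str:
    """Classify one gene as 'over', 'under' or 'unchanged' at a symmetric fold cut-off.

    sample, reference: positive expression intensities (e.g. tumor vs. normal tissue).
    """
    if sample <= 0 or reference <= 0:
        raise ValueError("intensities must be positive")
    ratio = sample / reference
    if ratio >= fold_cutoff:
        return "over"
    if ratio <= 1.0 / fold_cutoff:
        return "under"
    return "unchanged"


print(call_fold_change(300.0, 100.0), call_fold_change(60.0, 100.0), call_fold_change(120.0, 100.0))
# over under unchanged
```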


In the above methods, the pre-determined cut-off levels correspond to over-expression, in a sample having metastatic cells relative to benign cells or normal tissue, that is at least statistically significant; preferably, the p-value is less than 0.05.
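The text requires a statistically significant p-value (preferably below 0.05) but does not name the statistical test. As an assumption, the Python sketch below uses Welch's two-sample t-test from SciPy on hypothetical log2 intensities for a single gene, and calls the gene over-expressed only when the test is significant and the tumor mean exceeds the normal mean. The function name and data are hypothetical.

```python
from statistics import mean

from scipy import stats


def overexpressed(tumor, normal, alpha: float = 0.05):
    """Welch's two-sample t-test; returns (is_significant_overexpression, p_value)."""
    result = stats.ttest_ind(tumor, normal, equal_var=False)
    p_value = float(result.pvalue)
    # Two-sided test, so the direction is checked separately via the group means.
    significant = bool(p_value < alpha and mean(tumor) > mean(normal))
    return significant, p_value


# Hypothetical log2 intensities for one gene.
tumor = [8.1, 8.4, 7.9, 8.6, 8.2]
normal = [6.0, 6.3, 5.8, 6.1, 6.2]
print(overexpressed(tumor, normal))
```

When thousands of probe sets are screened at once, a multiple-testing correction is usually considered on top of a per-gene threshold like this one.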


In the above methods, gene expression can be measured by any method known in the art including, without limitation, on a microarray or gene chip; by nucleic acid amplification conducted by polymerase chain reaction (PCR), such as reverse transcription polymerase chain reaction (RT-PCR); by measuring or detecting a protein encoded by the gene, such as with an antibody specific to the protein; or by measuring a characteristic of the gene, such as DNA amplification, methylation, mutation and allelic variation. The microarray can be, for instance, a cDNA array or an oligonucleotide array. All of these methods can further include one or more internal control reagents.


The present invention provides a method of generating a lung cancer prognostic patient report by determining the results of any one of the methods described herein and preparing a report displaying the results, and also provides the patient reports generated thereby. The report can further contain an assessment of patient outcome and/or probability of risk relative to the patient population.


The present invention provides a composition comprising at least one probe set selected from the group consisting of: Marker genes corresponding to those selected from Table 1, Table 4, Table 5 or Table 7.


The present invention provides a kit for conducting an assay to determine lung cancer prognosis in a biological sample comprising: materials for detecting isolated nucleic acid sequences, their complements, or portions thereof of a combination of genes selected from the group consisting of Marker genes corresponding to those selected from Table 1, Table 4, Table 5 or Table 7. The kit can further comprise reagents for conducting a microarray analysis, and/or a medium through which said nucleic acid sequences, their complements, or portions thereof are assayed.


The present invention provides articles for assessing lung cancer status comprising: materials for detecting isolated nucleic acid sequences, their complements, or portions thereof of a combination of genes selected from the group consisting of Marker genes corresponding to those selected from Table 1, Table 4, Table 5 or Table 7. The articles can further contain reagents for conducting a microarray analysis and/or a medium through which said nucleic acid sequences, their complements, or portions thereof are assayed.


The present invention provides a microarray or gene chip for performing the method of claim 1, 2, 5, 6 or 7. The microarray can contain isolated nucleic acid sequences, their complements, or portions thereof of a combination of genes selected from the group consisting of Marker genes corresponding to those selected from Table 1, Table 4, Table 5 or Table 7. Preferably, the microarray is capable of measuring or characterizing at least 1.5-fold over- or under-expression. Preferably, the microarray detects over- or under-expression at a statistically significant p-value. Preferably, the p-value is less than 0.05. The microarray can contain a cDNA array or an oligonucleotide array and/or one or more internal control reagents.


The present invention provides a diagnostic/prognostic portfolio comprising isolated nucleic acid sequences, their complements, or portions thereof of a combination of genes selected from the group consisting of Marker genes corresponding to those selected from Table 1, Table 4, Table 5 or Table 7. Preferably, the portfolio is capable of measuring or characterizing at least 1.5-fold over- or under-expression. Preferably, the portfolio detects over- or under-expression at a statistically significant p-value. Preferably, the p-value is less than 0.05.


The mere presence or absence of particular nucleic acid sequences in a tissue sample has only rarely been found to have diagnostic or prognostic value. Information about the expression of various proteins, peptides or mRNA, on the other hand, is increasingly viewed as important. The mere presence of nucleic acid sequences having the potential to express proteins, peptides, or mRNA (such sequences referred to as “genes”) within the genome by itself is not determinative of whether a protein, peptide, or mRNA is expressed in a given cell. Whether or not a given gene capable of expressing proteins, peptides, or mRNA does so and to what extent such expression occurs, if at all, is determined by a variety of complex factors. Irrespective of difficulties in understanding and assessing these factors, assaying gene expression can provide useful information about the occurrence of important events such as tumorigenesis, metastasis, apoptosis, and other clinically relevant phenomena. Relative indications of the degree to which genes are active or inactive can be found in gene expression profiles. The gene expression profiles of this invention are used to provide diagnosis, status, prognosis and treatment protocol for lung cancer patients.


Sample preparation requires the collection of patient samples. Patient samples used in the inventive method are those suspected of containing diseased cells, such as cells taken from a nodule in a fine needle aspirate (FNA) of tissue. Bulk tissue preparation obtained from a biopsy or a surgical specimen and Laser Capture Microdissection (LCM) are also suitable for use. LCM technology is one way to select the cells to be studied, minimizing variability caused by cell type heterogeneity. Consequently, moderate or small changes in Marker gene expression between normal or benign and cancerous cells can be readily detected. Samples can also comprise circulating epithelial cells extracted from peripheral blood. These can be obtained according to a number of methods, but the most preferred method is the magnetic separation technique described in U.S. Pat. No. 6,136,182. Once the sample containing the cells of interest has been obtained, a gene expression profile is obtained using a Biomarker for the genes in the appropriate portfolios.


Preferred methods for establishing gene expression profiles include determining the amount of RNA that is produced by a gene that can code for a protein or peptide. This is accomplished by reverse transcriptase PCR (RT-PCR), competitive RT-PCR, real time RT-PCR, differential display RT-PCR, Northern Blot analysis and other related tests. While it is possible to conduct these techniques using individual PCR reactions, it is best to amplify complementary DNA (cDNA) or complementary RNA (cRNA) produced from mRNA and analyze it via microarray. A number of different array configurations and methods for their production are known to those of skill in the art and are described in U.S. Patents such as: U.S. Pat. Nos. 5,445,934; 5,532,128; 5,556,752; 5,242,974; 5,384,261; 5,405,783; 5,412,087; 5,424,186; 5,429,807; 5,436,327; 5,472,672; 5,527,681; 5,529,756; 5,545,531; 5,554,501; 5,561,071; 5,571,639; 5,593,839; 5,599,695; 5,624,711; 5,658,734; and 5,700,637.


Microarray technology allows for the measurement of the steady-state mRNA level of thousands of genes simultaneously, thereby presenting a powerful tool for identifying effects such as the onset, arrest, or modulation of uncontrolled cell proliferation. Two microarray technologies are currently in wide use: cDNA arrays and oligonucleotide arrays. Although the construction of these chips differs, essentially all downstream data analysis and output are the same. The products of these analyses are typically measurements of the intensity of the signal received from a labeled probe used to detect a cDNA sequence from the sample that hybridizes to a nucleic acid sequence at a known location on the microarray. Typically, the intensity of the signal is proportional to the quantity of cDNA, and thus mRNA, expressed in the sample cells. A large number of such techniques are available and useful. Preferred methods for determining gene expression can be found in U.S. Pat. Nos. 6,271,002; 6,218,122; 6,218,114; and 6,004,755.


Analysis of the expression levels is conducted by comparing such signal intensities. This is best done by generating a ratio matrix of the expression intensities of genes in a test sample versus those in a control sample. For instance, the gene expression intensities from a diseased tissue can be compared with the expression intensities generated from benign or normal tissue of the same type. A ratio of these expression intensities indicates the fold-change in gene expression between the test and control samples.
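The ratio matrix described above can be sketched in Python with NumPy. The sketch assumes genes in rows and samples in columns, and uses the mean of the control samples as each gene's baseline; the function name and data are hypothetical.

```python
import numpy as np


def ratio_matrix(test: np.ndarray, control: np.ndarray) -> np.ndarray:
    """Fold-change of each gene in each test sample relative to the mean control intensity.

    test:    genes x test-samples intensity matrix (e.g. tumor)
    control: genes x control-samples intensity matrix (e.g. normal tissue)
    """
    baseline = control.mean(axis=1, keepdims=True)  # one baseline per gene
    if np.any(baseline <= 0):
        raise ValueError("control intensities must be positive")
    return test / baseline  # broadcasts the per-gene baseline across samples


test = np.array([[200.0, 400.0], [50.0, 100.0]])  # hypothetical intensities
control = np.array([[100.0, 100.0], [100.0, 100.0]])
ratios = ratio_matrix(test, control)
print(ratios)           # fold-change
print(np.log2(ratios))  # log2 scale: +1 = 2-fold up, -1 = 2-fold down
```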


Gene expression profiles can also be displayed in a number of ways. The most common method is to arrange raw fluorescence intensities or a ratio matrix into a graphical dendrogram where columns indicate test samples and rows indicate genes. The data are arranged so that genes with similar expression profiles are proximal to each other. The expression ratio for each gene is visualized as a color. For example, a ratio less than one (indicating down-regulation) may appear in the blue portion of the spectrum, while a ratio greater than one (indicating up-regulation) may appear as a color in the red portion of the spectrum. Commercial software programs are available to display such data, including “GENESPRING” from Silicon Genetics, Inc. and “DISCOVERY” and “INFER” software from Partek, Inc.
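A dendrogram of this kind is typically produced by hierarchical clustering. The Python sketch below uses SciPy's average-linkage clustering with correlation distance on a small hypothetical samples-by-genes matrix, cuts the tree into two groups, and returns the leaf order that would be used to arrange the rows of a heat map. Average linkage and correlation distance are assumptions; the text does not specify the linkage or distance used, and the function name and data are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, leaves_list, linkage


def cluster_samples(expr: np.ndarray, n_clusters: int = 2):
    """Average-linkage hierarchical clustering of samples (rows) by correlation distance.

    Returns (cluster labels, leaf order for drawing a heat map or dendrogram).
    """
    tree = linkage(expr, method="average", metric="correlation")
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    order = leaves_list(tree)
    return labels, order


# Hypothetical data: 6 samples x 4 genes with two opposite expression patterns.
expr = np.array([
    [5.0, 5.2, 1.0, 1.1],
    [5.1, 4.9, 1.2, 0.9],
    [4.8, 5.0, 0.8, 1.0],
    [1.0, 0.9, 5.1, 5.0],
    [1.2, 1.1, 4.9, 5.2],
    [0.9, 1.0, 5.0, 4.8],
])
labels, order = cluster_samples(expr)
print(labels, order)
```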


In the case of measuring protein levels to determine gene expression, any method known in the art is suitable provided it results in adequate specificity and sensitivity. For example, protein levels can be measured by binding to an antibody or antibody fragment specific for the protein and measuring the amount of antibody-bound protein. Antibodies can be labeled by radioactive, fluorescent or other detectable reagents to facilitate detection. Methods of detection include, without limitation, enzyme-linked immunosorbent assay (ELISA) and immunoblot techniques.


Modulated Markers used in the methods of the invention are described in the Examples. The genes that are differentially expressed are either up regulated or down regulated in patients with various lung cancer prognostics. Up regulation and down regulation are relative terms meaning that a detectable difference (beyond the contribution of noise in the system used to measure it) is found in the amount of expression of the genes relative to some baseline. In this case, the baseline is determined based on the algorithm. The genes of interest in the diseased cells are then either up- or down-regulated relative to the baseline level using the same measurement method.


Diseased, in this context, refers to an alteration of the state of a body that interrupts or disturbs, or has the potential to disturb, proper performance of bodily functions as occurs with the uncontrolled proliferation of cells. Someone is diagnosed with a disease when some aspect of that person's genotype or phenotype is consistent with the presence of the disease. However, the act of conducting a diagnosis or prognosis may include the determination of disease/status issues such as determining the likelihood of relapse, type of therapy and therapy monitoring. In therapy monitoring, clinical judgments are made regarding the effect of a given course of therapy by comparing the expression of genes over time to determine whether the gene expression profiles have changed or are changing to patterns more consistent with normal tissue.


Genes can be grouped so that information obtained about the set of genes in the group provides a sound basis for making a clinically relevant judgment such as a diagnosis, prognosis, or treatment choice. These sets of genes make up the portfolios of the invention. As with most diagnostic markers, it is often desirable to use the fewest markers sufficient to make a correct medical judgment. This prevents a delay in treatment pending further analysis, as well as unproductive use of time and resources.


One method of establishing gene expression portfolios is through the use of optimization algorithms such as the mean variance algorithm widely used in establishing stock portfolios. This method is described in detail in US patent publication number 20030194734. Essentially, the method calls for the establishment of a set of inputs (stocks in financial applications, expression as measured by intensity here) that will optimize the return (e.g., signal that is generated) one receives for using it while minimizing the variability of the return. Many commercial software programs are available to conduct such operations. The “Wagner Associates Mean-Variance Optimization Application,” referred to as “Wagner Software” throughout this specification, is preferred. This software uses functions from the “Wagner Associates Mean-Variance Optimization Library” to determine an efficient frontier and optimal portfolios in the Markowitz sense. Use of this type of software requires that microarray data be transformed so that they can be treated as inputs in the way stock return and risk measurements are used when the software is used for its intended financial analysis purposes.
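The Markowitz idea behind the mean-variance method can be sketched without the Wagner Software. The Python sketch below assumes that each gene's mean signal plays the role of return and the gene-by-gene covariance plays the role of risk, and it solves for the weights that minimize portfolio variance at a chosen target signal. It allows negative weights and imposes no other constraints, so it is a simplified stand-in rather than the commercial software's algorithm; the function name and all data are hypothetical.

```python
import numpy as np


def min_variance_weights(mu: np.ndarray, sigma: np.ndarray, target: float) -> np.ndarray:
    """Markowitz minimum-variance weights for a target mean signal.

    Minimizes w' Sigma w subject to sum(w) == 1 and mu' w == target by solving
    the Lagrangian (KKT) linear system. Weights may be negative: no long-only
    or other constraints are imposed here. Requires mu not to be constant.
    """
    n = mu.size
    ones = np.ones(n)
    kkt = np.zeros((n + 2, n + 2))
    kkt[:n, :n] = 2.0 * sigma   # gradient of w' Sigma w
    kkt[:n, n] = ones           # multiplier for sum(w) == 1
    kkt[:n, n + 1] = mu         # multiplier for mu' w == target
    kkt[n, :n] = ones
    kkt[n + 1, :n] = mu
    rhs = np.zeros(n + 2)
    rhs[n] = 1.0
    rhs[n + 1] = target
    return np.linalg.solve(kkt, rhs)[:n]


# Hypothetical per-gene mean signal and covariance, e.g. from a samples x genes
# matrix X via mu = X.mean(axis=0) and sigma = np.cov(X, rowvar=False).
mu = np.array([0.5, 1.0, 1.5])
sigma = np.diag([0.04, 0.09, 0.16])
w = min_variance_weights(mu, sigma, target=1.0)
print(w, w.sum(), mu @ w, w @ sigma @ w)
```

Solving for a range of target values traces out the efficient frontier, from which a portfolio can then be chosen.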


The process of selecting a portfolio can also include the application of heuristic rules. Preferably, such rules are formulated based on biology and an understanding of the technology used to produce clinical results. More preferably, they are applied to output from the optimization method. For example, the mean variance method of portfolio selection can be applied to microarray data for a number of genes differentially expressed in subjects with cancer. Output from the method would be an optimized set of genes that could include some genes that are expressed in peripheral blood as well as in diseased tissue. If samples used in the testing method are obtained from peripheral blood and certain genes differentially expressed in instances of cancer could also be differentially expressed in peripheral blood, then a heuristic rule can be applied in which a portfolio is selected from the efficient frontier excluding those that are differentially expressed in peripheral blood. Of course, the rule can be applied prior to the formation of the efficient frontier by, for example, applying the rule during data pre-selection.
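Both ways of applying the peripheral-blood rule can be sketched in Python: filtering genes before optimization (pre-selection), or walking an already computed frontier and keeping the first portfolio that gives no weight to blood-expressed genes. The function names, gene names, frontier and preference ordering are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Set

Portfolio = Dict[str, float]  # gene -> weight


def preselect(genes: Iterable[str], blood_genes: Set[str]) -> List[str]:
    """Variant 1: drop blood-expressed genes before any optimization."""
    return [g for g in genes if g not in blood_genes]


def first_admissible(frontier: List[Portfolio], blood_genes: Set[str], tol: float = 1e-9) -> Optional[Portfolio]:
    """Variant 2: walk frontier portfolios in order of preference and return the
    first one that puts no (non-negligible) weight on a blood-expressed gene."""
    for portfolio in frontier:
        if all(abs(w) <= tol for g, w in portfolio.items() if g in blood_genes):
            return portfolio
    return None


# Hypothetical gene names and frontier, ordered from most to least preferred.
blood = {"GENE_B"}
frontier = [{"GENE_A": 0.6, "GENE_B": 0.4}, {"GENE_A": 0.5, "GENE_C": 0.5}]
print(preselect(["GENE_A", "GENE_B", "GENE_C"], blood), first_admissible(frontier, blood))
```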


Other heuristic rules can be applied that are not necessarily related to the biology in question. For example, one can apply a rule that only a prescribed percentage of the portfolio can be represented by a particular gene or group of genes. Commercially available software such as the Wagner Software readily accommodates these types of heuristics. This can be useful, for example, when factors other than accuracy and precision (e.g., anticipated licensing fees) have an impact on the desirability of including one or more genes.
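A rule limiting any single gene to a prescribed share of the portfolio could be implemented as in the following Python sketch, which assumes non-negative weights summing to 1. Weights over the cap are clipped and the freed share is redistributed proportionally among the remaining genes until no gene exceeds the cap. The function name is hypothetical; the sketch handles individual genes only, not groups of genes, and is not taken from the Wagner Software.

```python
def cap_weights(weights: dict, max_share: float) -> dict:
    """Limit each gene to at most max_share of a portfolio.

    Assumes non-negative weights that sum to 1. Weights above the cap are clipped,
    and the freed share is redistributed to the uncapped genes in proportion to
    their current weights; this repeats until no uncapped gene exceeds the cap.
    """
    if max_share * len(weights) < 1.0 - 1e-12:
        raise ValueError("cap too small: weights could not sum to 1")
    w = dict(weights)
    capped = set()
    while True:
        over = [g for g in w if g not in capped and w[g] > max_share + 1e-12]
        if not over:
            return w
        for g in over:
            w[g] = max_share
            capped.add(g)
        free = [g for g in w if g not in capped]
        if not free:
            return w
        remaining = 1.0 - max_share * len(capped)
        free_total = sum(w[g] for g in free)
        for g in free:
            # Proportional share; fall back to an equal split if all free weights are 0.
            w[g] = remaining * (w[g] / free_total if free_total > 0 else 1.0 / len(free))


print(cap_weights({"A": 0.6, "B": 0.3, "C": 0.1}, max_share=0.4))
```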


The gene expression profiles of this invention can also be used in conjunction with other non-genetic diagnostic methods useful in cancer diagnosis, prognosis, or treatment monitoring. For example, in some circumstances it is beneficial to combine the diagnostic power of the gene expression based methods described above with data from conventional markers such as serum protein markers (e.g., Cancer Antigen 27.29 (“CA 27.29”)). A range of such markers exists including such analytes as CA 27.29. In one such method, blood is periodically taken from a treated patient and then subjected to an enzyme immunoassay for one of the serum markers described above. When the concentration of the marker suggests the return of tumors or failure of therapy, a sample source amenable to gene expression analysis is taken. Where a suspicious mass exists, a fine needle aspirate (FNA) is taken and gene expression profiles of cells taken from the mass are then analyzed as described above. Alternatively, tissue samples may be taken from areas adjacent to the tissue from which a tumor was previously removed. This approach can be particularly useful when other testing produces ambiguous results.


Kits made according to the invention include formatted assays for determining the gene expression profiles. These can include all or some of the materials needed to conduct the assays such as reagents and instructions and a medium through which Biomarkers are assayed.


Articles of this invention include representations of the gene expression profiles useful for treating, diagnosing, prognosticating, and otherwise assessing diseases. These profile representations are reduced to a medium that can be automatically read by a machine, such as computer readable media (magnetic, optical, and the like). The articles can also include instructions for assessing the gene expression profiles in such media. For example, the articles may comprise a CD ROM having computer instructions for comparing gene expression profiles of the portfolios of genes described above. The articles may also have gene expression profiles digitally recorded therein so that they may be compared with gene expression data from patient samples. Alternatively, the profiles can be recorded in a different representational format, such as a graphical recordation. Clustering algorithms such as those incorporated in “DISCOVERY” and “INFER” software from Partek, Inc. mentioned above can assist in the visualization of such data.


Different types of articles of manufacture according to the invention are media or formatted assays used to reveal gene expression profiles. These can comprise, for example, microarrays in which sequence complements or probes are affixed to a matrix to which the sequences indicative of the genes of interest combine creating a readable determinant of their presence. Alternatively, articles according to the invention can be fashioned into reagent kits for conducting hybridization, amplification, and signal generation indicative of the level of expression of the genes of interest for detecting cancer.


The invention is further illustrated by the following non-limiting examples. All references cited herein are hereby incorporated herein by reference.


EXAMPLES

Genes analyzed according to this invention are typically related to full-length nucleic acid sequences that code for the production of a protein or peptide. One skilled in the art will recognize that identification of full-length sequences is not necessary from an analytical point of view. That is, portions of the sequences or ESTs can be selected according to well-known principles for which probes can be designed to assess gene expression for the corresponding gene.


Example 1

Methods


Patient Population


134 fresh frozen, surgically resected lung SCC and 10 matched normal lung samples from 133 individual patients (LS-71 and LS-136 were duplicate samples from different areas of the same tumor) from all stages of squamous cell lung carcinoma were evaluated in this study. These samples were collected from patients from the University of Michigan Hospital between October 1991 and July 2002 with patient consent and Institutional Review Board (IRB) approval. Portions of the resected lung carcinomas were sectioned and evaluated by the study pathologist by routine hematoxylin and eosin (H&E) staining. Samples chosen for analysis contained greater than 70% tumor cells. Approximately one third of patients (with equal proportions for each stage) received radiotherapy or chemotherapy following surgery. Seventy-seven patients were lymph node negative. Follow-up data were available for all patients. The mean patient age was 68±10 (range 42-91) with approximately 45% of patients 70 years or older. One patient (LS-3) likely died of surgery-related causes and was therefore not utilized in identifying prognostic signatures. Also, three specimens had mixed histology and were also not included in prognostic profiling (LS-76, LS-84, LS-112).


Microarray Analysis


For isolation of RNA, 20 to 40 cryostat sections of 30 μm were cut from each sample, in total corresponding to approximately 100 mg of tissue. Before, in between, and after cutting the sections for RNA isolation, 5 μm sections were cut for hematoxylin and eosin staining to confirm the presence of tumor cells. Total RNA was isolated with RNAzol B (Campro Scientific, Veenendaal, Netherlands) and dissolved in DEPC (0.1%)-treated H2O. About 2 ng of total RNA was resuspended in 10 μl of water, and 2 rounds of T7 RNA polymerase-based amplification were performed to yield about 50 μg of amplified RNA. RNA quality was checked using the Agilent Bioanalyzer. The mean ribosomal ratio (28S/18S) for all samples was 1.5 (range: 1.0-2.1). Four micrograms of total RNA was amplified and labeled, and the aRNA was fragmented and hybridized to the Affymetrix U133A chip according to the manufacturer's instructions. Microarray data were extracted using the Affymetrix MAS 5 software. Global gene expression was scaled to an average intensity of 600 units, and the data were then normalized using a spline quantile normalization method.
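The scaling and normalization steps can be illustrated with a short sketch. It assumes each chip is a list of probe-set signals in a common probe order. It scales by a plain mean, which may differ from the averaging used inside the MAS 5 software, and it uses ordinary quantile normalization rather than the spline quantile variant named above, so it approximates the described processing rather than reimplementing it.

```python
from statistics import mean
from typing import List


def global_scale(signals: List[float], target: float = 600.0) -> List[float]:
    """Scale one chip so that its mean signal equals `target` (plain mean assumed)."""
    factor = target / mean(signals)
    return [s * factor for s in signals]


def quantile_normalize(chips: List[List[float]]) -> List[List[float]]:
    """Ordinary quantile normalization: give every chip the same distribution,
    namely the mean of the sorted signals across chips. Ties are not averaged."""
    n = len(chips[0])
    if any(len(c) != n for c in chips):
        raise ValueError("all chips must have the same number of probe sets")
    sorted_chips = [sorted(c) for c in chips]
    reference = [mean(s[i] for s in sorted_chips) for i in range(n)]
    normalized = []
    for chip in chips:
        order = sorted(range(n), key=chip.__getitem__)  # probe indices by rank
        values = [0.0] * n
        for rank, probe in enumerate(order):
            values[probe] = reference[rank]
        normalized.append(values)
    return normalized
```

After normalization every chip carries exactly the same set of values, only in a chip-specific order, which is what makes the arrays directly comparable.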


Statistical Analysis


Three complementary statistical methods were used to identify the optimal prognostic gene signature: Cox proportional-hazards regression modeling, bootstrapping, and leave-20-percent-out cross-validation (L20OCV).


Univariate Cox proportional-hazards regression modeling was performed to identify genes that were significantly associated with overall survival. The Cox score (risk index) of a patient was defined as the sum, over the selected genes, of each gene's log2-transformed chip signal multiplied by that gene's z score from the Cox regression. Cox scores were calculated in the same way for patients in the testing set, using the genes selected in the training set. A series of cutoffs (percentiles of the risk index among training-set patients) was applied to predict the clinical outcome of patients in the testing set by comparing each patient's Cox score with the cutoff. If a patient's Cox score was higher than the cutoff, the patient was classified as “high risk”; otherwise, the patient was placed in the “low risk” group. Kaplan-Meier analysis was performed to explore the survival characteristics of high-risk and low-risk patients. A 3-year survival cutoff was employed because most patients in this population who relapse do so within 3 years. Kiernan et al. (1993). In addition, many of these patients die of non-cancer-related illnesses after 3 years. Kiernan et al. (1993). The same rationale was applied when performing Cox modeling.
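The risk index and cutoff rule can be written out explicitly. For patient i and selected genes j, the score is R_i = sum over j of z_j * log2(x_ij), where x_ij is the chip signal and z_j is the gene's Cox z score. The sketch below assumes the z scores have already been obtained from the training set; the linear-interpolation percentile convention and the gene names in the test are assumptions for illustration.

```python
import math
from typing import Dict, Sequence


def cox_score(signals: Dict[str, float], z_scores: Dict[str, float]) -> float:
    """Risk index: sum over selected genes of log2(chip signal) times Cox z score."""
    return sum(math.log2(signals[gene]) * z for gene, z in z_scores.items())


def percentile_cutoff(training_scores: Sequence[float], pct: float) -> float:
    """Risk-index value at percentile `pct` (0-100) of the training set,
    interpolating linearly between order statistics (an assumed convention)."""
    xs = sorted(training_scores)
    pos = (len(xs) - 1) * pct / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def classify(score: float, cutoff: float) -> str:
    """Scores above the cutoff are "high risk"; all others are "low risk"."""
    return "high risk" if score > cutoff else "low risk"
```

A testing-set patient is scored with the training-set genes and z scores, and the cutoff is taken from the training-set distribution only, so no testing-set information leaks into the threshold.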


The bootstrap method was also employed to provide a more stringent means of defining prognostic genes. Using the same training and testing sets described above, 65 samples were selected with replacement from the training set, and Cox regression was performed on these samples. Each gene's P value and z score were recorded. This step was repeated 400 times, giving 400 P values and z scores for each gene. For each gene, the top and bottom 5% of P values were removed, and the mean of the remaining P values and the rank of each gene (based on that mean P value) were determined. Similarly, the top and bottom 5% of z scores for each gene in the training set were removed and the sum of the remaining z scores was calculated. Signatures comprising various numbers of top-ranked genes (by mean P value) were then defined; for each signature, the log2-transformed chip signal of each gene was multiplied by the sum of its z scores, and these products were summed to give the Cox score, namely the risk index. The Cox scores of patients in the testing set were calculated in the same manner. Receiver operating characteristic (ROC) curves were drawn for patients in the training and testing sets, and the area under the curve (AUC) for each gene classifier was recorded. The AUC values were then plotted against the number of genes in the classifier to determine the optimal gene number that provides stable AUC values in the training set.
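The resampling and trimming steps can be sketched as follows. The per-gene univariate Cox regression is not implemented here; it is represented by a hypothetical `fit` callback that returns a (P value, z score) pair for each gene. The sketch shows only the bootstrap bookkeeping: sampling with replacement, trimming 5% from each end, averaging P values, and summing z scores.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical callback: given a bootstrap sample of patients, return
# {gene: (p_value, z_score)} from univariate Cox regression (not implemented here).
CoxFit = Callable[[list], Dict[str, Tuple[float, float]]]


def trim(values: Sequence[float], frac: float = 0.05) -> List[float]:
    """Drop the lowest and highest `frac` of the values."""
    xs = sorted(values)
    k = int(len(xs) * frac)
    return xs[k:len(xs) - k]


def bootstrap_rank(patients: list, fit: CoxFit, n_boot: int = 400,
                   sample_size: int = 65, seed: int = 0):
    """Rank genes by trimmed-mean P value over bootstrap resamples and return,
    for each gene, (trimmed mean P, trimmed sum of z scores)."""
    rng = random.Random(seed)
    p_vals: Dict[str, List[float]] = {}
    z_vals: Dict[str, List[float]] = {}
    for _ in range(n_boot):
        sample = rng.choices(patients, k=sample_size)  # sampling with replacement
        for gene, (p, z) in fit(sample).items():
            p_vals.setdefault(gene, []).append(p)
            z_vals.setdefault(gene, []).append(z)
    summary = {}
    for gene in p_vals:
        kept_p = trim(p_vals[gene])
        summary[gene] = (sum(kept_p) / len(kept_p), sum(trim(z_vals[gene])))
    ranked = sorted(summary, key=lambda g: summary[g][0])
    return ranked, summary
```

The trimmed z-score sum then takes the place of the single z score in the risk index, i.e. each gene's log2 signal is multiplied by its trimmed sum and the products are added.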


An L20OCV was also performed to confirm the optimal number of genes in the classifier. First, samples were partitioned into 5 groups containing the same or nearly the same number of samples. Five pairs of training and testing sets were generated, each training set consisting of 80% of the samples and the corresponding testing set consisting of the remaining 20%, so that each sample appeared exactly once in a testing set. Cox regression modeling was performed to select the top prognostic genes (from 2 to 200) in each training set, and the selected genes were tested in the corresponding testing set. ROC analysis was performed to calculate the AUC. The mean AUC of the 5 testing sets was calculated for each gene number from 2 to 200. This procedure was repeated 100 times, and the mean of the 100 AUCs for each gene number from 2 to 200 was calculated. The mean AUC was plotted against gene number (2 to 200), and the optimal number of genes in the signature was selected.
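The partitioning and scoring steps of the L20OCV can be sketched as below. The AUC is computed with the Mann-Whitney formulation, which equals the area under the empirical ROC curve; the gene selection and Cox scoring inside each fold are omitted, and the function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def five_fold_splits(n: int, seed: int) -> List[Tuple[List[int], List[int]]]:
    """Shuffle sample indices and cut them into 5 near-equal groups; each group
    serves once as the 20% testing set, the other four form the 80% training set."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::5] for k in range(5)]  # group sizes differ by at most one
    splits = []
    for k in range(5):
        train = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        splits.append((train, sorted(folds[k])))
    return splits


def auc(scores: Sequence[float], events: Sequence[int]) -> float:
    """ROC area via the Mann-Whitney statistic: the probability that a patient
    with the event (1) scores higher than one without (0); ties count one half."""
    pos = [s for s, e in zip(scores, events) if e]
    neg = [s for s, e in zip(scores, events) if not e]
    if not pos or not neg:
        raise ValueError("need at least one patient in each outcome group")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

For the repeated procedure, `five_fold_splits` would be called with 100 different seeds, and the AUC for each gene number averaged first over the 5 testing sets and then over the 100 repetitions.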


Hierarchical clustering was performed with GeneSpring 7.0 (Silicon Genetics) to identify major clusters of patients and investigate their association with patient co-variates. Prior to clustering, genes with a coefficient of variation (CV) smaller than 0.3 (an arbitrarily chosen threshold) were removed to reduce the impact of genes that displayed minimal change in expression across the dataset, leaving 11,101 genes for clustering analysis. The signal intensity of each gene was divided by the median expression level of that gene across all patients. Samples were clustered using Pearson correlation as the similarity measure, and genes were clustered in the same way.
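The pre-processing for clustering can be sketched in a few functions. The CV is assumed here to be the population standard deviation divided by the mean, and 1 minus the Pearson correlation is used as the distance; both are common choices but are assumptions, since the exact definitions used by GeneSpring are not stated. The linkage step itself is not shown.

```python
import math
from statistics import mean, median, pstdev
from typing import Dict, List, Sequence

Expr = Dict[str, List[float]]  # gene -> signal in each sample


def coefficient_of_variation(values: Sequence[float]) -> float:
    """Population standard deviation divided by the mean (an assumed CV definition)."""
    return pstdev(values) / mean(values)


def filter_low_cv(expr: Expr, min_cv: float = 0.3) -> Expr:
    """Drop genes whose CV across samples is below `min_cv`."""
    return {g: v for g, v in expr.items() if coefficient_of_variation(v) >= min_cv}


def median_normalize(expr: Expr) -> Expr:
    """Divide each gene's signals by that gene's median across samples."""
    out = {}
    for gene, values in expr.items():
        m = median(values)
        out[gene] = [x / m for x in values]
    return out


def pearson_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - Pearson correlation, a common distance for correlation-based clustering."""
    ma, mb = mean(a), mean(b)
    da = [x - ma for x in a]
    db = [y - mb for y in b]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return 1.0 - num / den
```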


Results


Microarray Profiling


141 of the 144 microarrays gave excellent data (% present>40, scaling factor<10) while the remaining 3 samples (LS76, LS78, LS82) gave acceptable results (% present>30, scaling factor<15). Table 2 shows the clinical-pathological staging of the 134 SCC samples analyzed by microarray. All samples were included in initial clustering analysis. Genes were filtered from the dataset if they were not called present in at least 10% of all samples (including normal). This left 14,597 genes for analysis.

TABLE 2
Patient samples by stage

Clinical stage   Number (%)   Pathological stage   Number
IA               28 (20)      T1 N0 M0             27
IB               50 (35)      T2 N0 M0             48
IIA              7 (5)        T1 N1 M0             6
IIB              31 (22)      T1 N1 M0             30
IIIA             19 (14)      T2 N2 M0             10
                              T3 N0 M0             1
                              T3 N1 M0             3
                              T3 N2 M0             4
IIIB             5 (4)        T4 N0 M0             1
                              T4 N1 M0             3
                              T4 N2 M0             1
Note: One duplicate stage IIB sample; 77 lymph node negative samples.


Unsupervised Hierarchical Clustering


For unsupervised clustering, the dataset was further filtered by removing genes with low variation in expression across the entire dataset (CV<30%). The 134 SCC and 10 normal lung samples were initially clustered based on unsupervised k-means clustering of the remaining 11,101 genes. The normal lung samples had a profile distinct from the carcinomas and clustered together. The 2 duplicate SCC samples (LS-71 and LS-136) clustered together, demonstrating the reproducibility of the microarray analysis. Of the 133 unique patient carcinomas, four were removed from further analysis because the patient died due to surgery (LS-3) or the sample had mixed histology (LS-76, LS-84, LS-112). When the remaining 129 samples were clustered using the 11,101 genes, two major clusters were formed, one with 55 patients and the other with 74 patients (FIG. 1A). No significant association was identified between the two clusters and tumor stage, differentiation, or patient gender. Each stage was present in approximately equal proportions in both clusters (cluster 1 consisted of 31 stage I, 15 stage II and 9 stage III patients; cluster 2 consisted of 42 stage I, 18 stage II and 14 stage III patients). However, the patients in clusters 1 and 2 showed significantly separated survival curves (p=0.036), indicating that expression profiles associated with overall survival exist irrespective of stage (FIG. 1B).
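The survival curves compared here are Kaplan-Meier (product-limit) estimates. The sketch below computes such a curve from follow-up times and death indicators; the statistical test used to compare the two clusters' curves is not specified in the text and is not implemented here.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Product-limit survival estimate.

    times: follow-up time per patient; events: 1 if death observed, 0 if censored.
    Returns (time, survival probability) at each distinct time with a death.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:  # group tied times
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # deaths and censorings both leave the risk set
    return curve
```

Censored patients contribute to the risk set up to their last follow-up but never cause a drop in the curve, which is why the estimate differs from a simple fraction of survivors.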


Identification of Prognostic Gene Signatures


To identify genes that could further stratify early-stage patients into good and poor prognostic groups, several complementary statistical analyses were performed. These included: 1) Cox modeling on a training set with validation of prognostic signatures on a test set of samples; 2) bootstrapping; and 3) L20OCV.


First, the 129 SCC samples were split into a training set (65 patients) and a test set (64 patients), with each stage equally represented in both groups. Both groups showed similar overall median survival times. The 65-patient training set was analyzed using a bootstrapping method (see Methods section) to determine the optimal number of genes to be used in the prognostic signature. When the AUC from receiver operating characteristic analysis was plotted against increasing numbers of genes, signature performance began to plateau at around 50 genes (FIG. 2A). An L20OCV procedure was used to confirm the optimal number of prognostic genes in the 65-patient training set; it likewise showed that signature performance was stable once the number of genes reached 50. Therefore, the top-ranked 50 genes were used as the signature. The 50-gene classifier demonstrated an overall predictive value of 70% in the 64-patient test set (FIG. 2B).


A leave-one-out cross-validation (LOOCV) procedure was then used in the 65-patient training set to determine the optimal cutoff of the risk index. Error rates were calculated for various cutoffs, and a cutoff at the 58th percentile gave the lowest error rate (FIG. 3). Therefore, the 58th percentile was used as the cutoff for predicting survival. The performance of the prognostic signature was then examined in the testing set using this cutoff. The signature achieved 52.4% specificity and 81.8% sensitivity in the testing set (FIG. 3). A Kaplan-Meier plot also showed good separation between the predicted high-risk and low-risk groups of patients (p=0.0075). Multivariate analysis including sex, differentiation, stage, tumor size, age, and lymph node status was performed; none of the parameters other than the 50-gene signature had a significant p-value (Table 3). Kaplan-Meier analysis was also performed using the 50-gene signature and a risk cutoff at the 58th percentile. The high-risk group was well separated from the low-risk group when all patients were tested (p=0.0075, FIG. 4A) and when only those with stage I disease were tested (p=0.029; FIG. 4B).
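The cutoff search and the sensitivity and specificity calculation can be sketched as follows. This is a simplification of the LOOCV: it scans candidate percentile cutoffs on one fixed set of risk scores and outcomes rather than refitting with each patient held out. It assumes, hypothetically, that outcomes are coded 1 for death within 3 years and 0 otherwise, and that percentiles are interpolated linearly.

```python
import math
from typing import Iterable, Sequence, Tuple


def percentile(values: Sequence[float], pct: float) -> float:
    """Linear-interpolation percentile (an assumed convention)."""
    xs = sorted(values)
    pos = (len(xs) - 1) * pct / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def error_rate(scores: Sequence[float], events: Sequence[int], cutoff: float) -> float:
    """Fraction misclassified when score > cutoff is taken to predict the event."""
    wrong = sum((s > cutoff) != bool(e) for s, e in zip(scores, events))
    return wrong / len(scores)


def best_percentile(scores: Sequence[float], events: Sequence[int],
                    candidates: Iterable[int] = range(10, 91)) -> int:
    """Candidate percentile whose cutoff gives the lowest error (first one on ties)."""
    return min(candidates, key=lambda p: error_rate(scores, events, percentile(scores, p)))


def sensitivity_specificity(scores: Sequence[float], events: Sequence[int],
                            cutoff: float) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(scores, events))
    tp = sum(1 for s, e in pairs if s > cutoff and e)
    fn = sum(1 for s, e in pairs if s <= cutoff and e)
    tn = sum(1 for s, e in pairs if s <= cutoff and not e)
    fp = sum(1 for s, e in pairs if s > cutoff and not e)
    return tp / (tp + fn), tn / (tn + fp)
```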

TABLE 3
Multivariate analysis

Co-variate           P-value
50-gene signature    0.01
Sex                  0.24
Differentiation      0.66
Stage                0.41
T                    0.91
Age                  0.35
N                    0.99


Example 2

Identification of a Robust Prognostic Signature


Although we used a bootstrap method to avoid random sampling issues in the training-testing method, a more robust prognostic signature might be identified if all 129 samples were used in the training set. Therefore, a gene signature was also selected by bootstrapping the entire 129-patient dataset. Genes were ranked by their mean P value and the top 100 genes were identified (Table 4). Twenty-three of these genes were in common with the top 50 genes identified by the training-testing method.


We had data on time to relapse (TTR) for 16 patients. The mean TTR was 21.7 months, with 88% of these patients relapsing within 3 years. Since the majority of patients who die after 3 years die of non-cancer-related causes, we chose a cutoff of 36 months for classifying patients who will have a lung cancer-related death. Our classifiers were tested with and without the 36-month cutoff. The signatures performed better in the testing set when the 3-year cutoff was employed; therefore, a gene signature selected with the time limit is better than one selected without it.
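One common way to impose such a 3-year limit is administrative censoring, in which follow-up is truncated at 36 months. This is an assumption; the text does not state exactly how the cutoff was applied. A minimal sketch, with a hypothetical function name:

```python
from typing import Tuple


def censor_at(time_months: float, died: bool, limit_months: float = 36.0) -> Tuple[float, bool]:
    """Administrative censoring: a patient followed beyond `limit_months` is treated
    as alive (censored) at the limit, so only deaths within the window count as events."""
    if time_months > limit_months:
        return limit_months, False
    return time_months, died
```

Under this rule a death at 50 months is recorded as a censored observation at 36 months, while a death at 20 months remains an event.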

TABLE 4
SEQ ID NOs of the top 100 genes, in rank order

Ranks 1-10: 452, 191, 303, 378, 270, 79, 409, 76, 450, 413
Ranks 11-20: 365, 135, 18, 460, 393, 375, 396, 86, 190, 204
Ranks 21-30: 65, 433, 439, 471, 124, 107, 77, 13, 461, 91
Ranks 31-40: 225, 290, 252, 194, 21, 206, 161, 36, 207, 37
Ranks 41-50: 315, 87, 288, 369, 235, 337, 383, 228, 248, 423
Ranks 51-60: 200, 234, 58, 386, 120, 305, 302, 16, 432, 381
Ranks 61-70: 269, 75, 209, 293, 20, 83, 408, 388, 443, 372
Ranks 71-80: 286, 289, 57, 215, 144, 89, 158, 149, 98, 29
Ranks 81-90: 35, 311, 310, 279, 384, 298, 48, 222, 425, 56
Ranks 91-100: 398, 453, 470, 261, 462, 162, 131, 284, 326, 114


Example 3

Identification of a High-Risk Sub-Group of SCC Patients


The unsupervised hierarchical clustering described above identified two main groups of patients that differed significantly in overall survival. A bootstrap analysis performed on the two patient groups found 121 genes (not all unique) whose expression levels differed significantly between the high- and low-risk groups (p<0.001, mean difference >3-fold; Table 5). Interestingly, the majority of these genes (118) were down-regulated in the high-risk group (FIG. 5A, cluster 1). Pathway analysis demonstrated that genes involved in epidermal development, including keratins and small proline-rich proteins, were significantly enriched in this gene set (Table 6). When only the genes involved in epidermal differentiation (FIG. 5B) were used to cluster the patient samples, the two prognostically distinct groups were maintained (FIG. 5C). These data indicate that there are two major subtypes of SCC, one of which has a gene expression profile consistent with poor differentiation and as such tends to be more aggressive. The lack of expression of epidermal differentiation genes may be associated with a subgroup of tumors that are de-differentiated and therefore more aggressive.

TABLE 5
121 genes significantly different between low- and high-risk clusters
SEQ ID NO (Dunn-Sidak p-value)

47 (4.069E−08), 52 (0.001779787), 61 (4.78438E−06), 64 (3.94295E−08), 70 (6.14897E−11), 71 (5.40462E−10),
72 (4.99526E−07), 91 (1.17801E−09), 92 (0), 93 (1.51307E−07), 94 (0.00024053), 97 (3.25762E−06),
101 (0.000715044), 102 (4.042E−05), 105 (1.28648E−05), 111 (4.10746E−07), 112 (0.000129644), 115 (7.6587E−08),
118 (4.67009E−05), 121 (7.48718E−09), 123 (1.61815E−11), 125 (4.82759E−08), 126 (1.80901E−05), 128 (1.45634E−11),
132 (0.000571137), 134 (3.42792E−07), 138 (2.83176E−10), 140 (4.93018E−08), 141 (9.06164E−11), 142 (1.73482E−08),
145 (0), 146 (8.6277E−05), 148 (1.68459E−07), 156 (8.93603E−05), 159 (0), 160 (7.24383E−06),
166 (4.46788E−05), 167 (1.61815E−12), 168 (3.2363E−12), 170 (5.27808E−08), 171 (0), 172 (0),
173 (0), 174 (0), 175 (3.70691E−07), 177 (0.000964585), 179 (0.00023307), 181 (2.10853E−07),
184 (0.000261), 185 (1.22494E−09), 186 (0), 188 (8.3147E−08), 192 (0), 193 (1.33552E−06),
194 (0), 195 (8.04368E−07), 196 (0), 198 (1.78886E−07), 213 (0), 214 (0),
216 (1.77997E−11), 219 (1.44447E−07), 223 (6.79057E−08), 229 (2.21201E−09), 231 (0.000127662), 232 (0.000670091),
233 (0.000334014), 236 (0.000371339), 237 (5.35608E−10), 238 (0), 243 (0), 245 (1.5392E−07),
246 (3.77172E−06), 251 (9.51746E−06), 253 (1.61815E−12), 257 (7.19348E−07), 259 (3.2363E−12), 260 (0),
262 (0), 263 (1.61815E−12), 278 (3.2363E−12), 285 (3.95638E−09), 313 (3.06803E−07), 318 (0),
320 (1.10983E−05), 321 (2.86717E−06), 322 (0), 323 (1.46054E−05), 324 (2.65922E−05), 331 (0),
332 (1.77997E−10), 333 (0), 341 (3.60669E−08), 348 (0.001219264), 349 (4.42435E−08), 353 (0),
357 (9.21286E−05), 358 (2.91267E−09), 360 (1.67317E−09), 366 (0), 367 (1.06791E−07), 371 (0),
373 (0.000736609), 397 (1.53724E−10), 402 (0.001640004), 405 (1.89887E−05), 407 (0), 418 (7.28168E−11),
419 (1.13076E−08), 424 (2.83902E−05), 426 (0.001696015), 429 (2.33385E−05), 435 (2.53251E−06), 445 (8.59804E−08),
457 (0), 458 (0), 459 (0), 463 (9.60372E−09), 468 (4.52017E−06)









TABLE 6
List of significantly enriched pathways

GO.ID   Gene.Count   GO.Class                             Gene.#.On.U133a   Category   p.value
8544    17           epidermal differentiation            56                P          7.31E−12
6325    3            chromatin architecture               12                P          2.75E−04
7586    3            digestion                            15                P          7.08E−04
7156    4            homophilic cell adhesion             39                P          0.004886
7148    3            cell shape and cell size control     28                P          0.007914
7565    3            pregnancy                            28                P          0.007914
165     2            MAPKKK cascade                       15                P          0.008242
6805    2            xenobiotic metabolism                15                P          0.008242
7169    3            receptor tyrosine kinase signaling   41                P          0.029293
6832    2            small molecule transport             29                P          0.049333

Note: Category P denotes the GO Biological Process ontology.


Example 4

Gene Expression Signatures for Prognosis of Lung Cancer.


Methods


Real-Time Quantitative RT-PCR


Total RNA samples were normalized by OD260. Quality testing included analysis by capillary electrophoresis using a Bioanalyzer (Agilent). For aRNA amplification, the Ribobeast™ 1-Round Aminoallyl-aRNA amplification kit (Epicentre) was used. First-strand cDNA synthesis, second-strand cDNA synthesis, in vitro transcription of aRNA, DNase treatment, purification and other steps were all performed according to the manufacturer's protocol. For each sample, aRNA was reverse transcribed into first-strand cDNA, which was used for real-time quantitative RT-PCR. The first-strand cDNA synthesis reaction contained 100 ng of aRNA, 1 μl of 50 ng/μl T7-Oligo(dT) primer, 0.25 μl of 10 mM dNTPs, 1 μl of 5× Superscript™ III Reverse Transcriptase Buffer, 0.25 μl of 200 U/μl Superscript™ III Reverse Transcriptase (Invitrogen Corp), 0.25 μl of 100 mM DTT and 0.25 μl of 0.3 U/μl RNase Inhibitor (Epicentre) in a total reaction volume of 5 μl.


Real-time quantitative RT-PCR analyses were performed on the ABI Prism 7900HT sequence detection system (Applied Biosystems). Each reaction contained 10 μl of 2× TaqMan® Universal PCR Master Mix (Applied Biosystems), 5 μl of cDNA template, and 1 μl of 20× Assays-on-Demand Gene Expression Assay Mix (Applied Biosystems) in a total reaction volume of 20 μl. The PCR consisted of a UNG activation step at 50° C. for 2 min and an initial enzyme activation step at 95° C. for 10 min, followed by 40 cycles of 95° C. for 15 sec and 60° C. for 1 min.


Immunohistochemistry


Immunohistochemistry (IHC) was performed on tissue microarrays containing 60 lung squamous cell carcinomas. Areas of the tumor that best represented the overall morphology were selected for generating a tissue microarray (TMA) block as previously described by Kononen et al. (1998). All controls stained negative for background.


Pathway Analysis


Pathway analysis was performed by first mapping the genes on the Affy U133A chip to the Biological Process categories of Gene Ontology (GO). The categories that had at least 10 genes on the U133A chip were used for subsequent pathway analyses. Genes that were selected from data analysis were mapped to the GO Biological Process categories. Then the hypergeometric distribution probability of the genes was calculated for each category. A category that had a p-value less than 0.05 and had at least two genes was considered over-represented in the selected gene list.
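The over-representation test can be sketched with exact hypergeometric probabilities. The sketch assumes the reported p-value is the upper tail P(X ≥ k), where N is the number of chip genes mapped to GO Biological Process categories, K the number of those genes in a given category, n the number of selected genes, and k the number of selected genes falling in the category; the text does not state the tail or any multiple-testing correction, and the function names are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric: n genes drawn from N chip genes,
    K of which belong to the category."""
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / comb(N, n)


def over_represented(k: int, N: int, K: int, n: int, alpha: float = 0.05) -> bool:
    """Apply the stated rules: the category has at least 10 genes on the chip,
    at least 2 selected genes fall in it, and its p-value is below alpha."""
    return K >= 10 and k >= 2 and hypergeom_upper_tail(k, N, K, n) < alpha
```

Python integers are arbitrary-precision, so the binomial coefficients stay exact even for a chip-sized N, and only the final division is rounded to a float.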


Identification of Core Set of Prognostic Genes


Briefly, 400 random training sets of 65 patients were selected from the 129 lung SCC patients. For each training set, Cox regression was performed to identify significant genes at the 5% significance level (i.e., P<0.05). The 331 genes that were significant in more than 40% of the training sets were used as the core gene set. These 331 genes are shown in Table 7.
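The frequency criterion for the core set can be sketched as a simple count. The sketch assumes that the set of significant genes (P<0.05) from each of the 400 training sets is already available; the Cox regression that produces those sets is not shown, and the function name is hypothetical.

```python
from collections import Counter
from typing import List, Set


def core_genes(significant_sets: List[Set[str]], min_fraction: float = 0.4) -> List[str]:
    """Genes significant in more than `min_fraction` of the training sets."""
    n = len(significant_sets)
    counts = Counter(gene for genes in significant_sets for gene in set(genes))
    return sorted(gene for gene, c in counts.items() if c / n > min_fraction)
```

The comparison is strict, matching "more than 40%": a gene significant in exactly 40% of the training sets is not included.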


Microarray Results Verification


To confirm the microarray results, we initially performed TaqMan® quantitative RT-PCR on 4 genes (FGFR2, KRT13, NTRK2, and VEGF). The correlation between the platforms ranged from 0.71 to 0.96, indicating that the expression data were reproducible.


Immunohistochemistry was then performed on tissue microarrays to confirm expression of several of these proteins within the tumor cells. Various levels of expression of several keratins, in addition to the tyrosine kinase proteins FGFR2 and NTRK2, were demonstrated in SCC cells.


Identification of a Core Set of Prognostic Genes


In the previous analysis, a set of 50 genes was identified from a single training set of 65 patients. One problem with this approach is that the genes identified as predictors of prognosis can be unstable, since the molecular signature strongly depends on the selection of patients in the training set. Validation by repeated random sampling can avoid this instability. We therefore generated 400 random training sets of 65 patients from the 129 lung SCC patients and performed Cox regression to identify significant genes at the 5% significance level (i.e., P<0.05). The 331 genes that were significant in more than 40% of the training sets were identified as a core set of prognostic genes in squamous cell lung cancer. These genes are listed by SEQ ID NO in Table 7.

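TABLE 7
331 core genes (SEQ ID NOs)

1, 2, 3, 5, 6, 7, 8, 9, 11, 13, 14, 15, 16, 17, 18, 20, 21, 22, 23, 24,
25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44,
45, 46, 48, 49, 50, 51, 54, 55, 56, 57, 58, 59, 62, 65, 66, 67, 68, 69, 73, 74,
75, 76, 77, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 95, 96, 98,
99, 100, 104, 106, 107, 108, 109, 110, 113, 114, 116, 117, 119, 120, 122, 124, 127, 129, 130, 133,
134, 135, 136, 137, 139, 141, 143, 147, 149, 150, 151, 152, 153, 154, 155, 157, 159, 161, 163, 164,
165, 166, 169, 176, 178, 180, 182, 183, 187, 190, 191, 194, 197, 199, 200, 201, 202, 203, 204, 205,
206, 207, 208, 209, 210, 211, 212, 215, 217, 218, 220, 222, 224, 225, 226, 227, 228, 234, 235, 239,
240, 241, 242, 244, 247, 248, 249, 250, 252, 254, 255, 256, 258, 261, 263, 264, 265, 266, 269, 270,
271, 272, 274, 275, 276, 282, 283, 284, 286, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298,
299, 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 314, 315, 316, 317, 319, 325,
327, 328, 329, 330, 334, 335, 336, 337, 338, 339, 340, 342, 343, 344, 345, 346, 347, 350, 351, 352,
354, 355, 356, 359, 361, 363, 364, 365, 368, 369, 370, 372, 374, 375, 376, 377, 378, 379, 380, 381,
382, 383, 384, 385, 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, 398, 399, 400, 401, 403,
404, 406, 409, 410, 411, 412, 413, 415, 417, 420, 421, 422, 423, 425, 427, 428, 430, 431, 432, 433,
434, 436, 437, 438, 439, 441, 442, 443, 444, 447, 448, 449, 450, 451, 452, 453, 454, 455, 456, 460,
461, 462, 464, 465, 466, 467, 469, 470, 471, 472, 473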


Although the foregoing invention has been described in some detail by way of illustration and example for purposes of clarity of understanding, the descriptions and examples should not be construed as limiting the scope of the invention.

TABLE 8SEQ ID NOs: and gene descriptions11255_g_atguanylate cyclase activator 1A (retina)GUCA1AL368612200619_atsplicing factor 3b, subunit 2SF3B2NM_0068423200650_s_atlactate dehydrogenase ALDHANM_0055664200727_s_atARP2 actin-related protein 2 homologACTR2AA6995835200728_atARP2 actin-related protein 2 homologACTR2BE5662906200737_atphosphoglycerate kinase 1PGK1NM_0002917200795_atSPARC-like 1 (mast9, hevin)SPARCL1NM_0046848200810_s_atcold inducible RNA binding proteinCIRBPNM_0012809200811_atcold inducible RNA binding proteinCIRBPNM_00128010200824_atglutathione S-transferase piGSTP1NM_00085211200836_s_atmicrotubule-associated protein 4MAP4NM_00237512200840_atlysyl-tRNA synthetaseKARSNM_00554813200863_s_atRAB11A, member RAS oncogene familyRAB11AAI21510214200893_atsplicing factor, arginine/serine-rich 10SFRS10NM_00459315200951_s_atcyclin D2CCND2AW02649116200970_s_atstress-associated endoplasmic reticulum protein 1SERP1AL13680717200993_atimportin 7IPO7AA93927018201003_x_atubiquitin-conjugating enzyme E2 variant 1UBE2V1NM_00334919201033_x_atribosomal protein, large, P0RPLP0NM_00100220201047_x_atRAB6A, member RAS oncogene familyRAB6ABC00361721201067_atproteasome (prosome, macropain) 26S subunit,PSMC2BF215487ATPase, 222201125_s_atintegrin, beta 5ITGB5NM_00221323201151_s_atmuscleblind-likeMBNL1BF51220024201152_s_atmuscleblind-likeMBNL1N3191325201154_x_atribosomal protein L4RPL4NM_00096826201170_s_atbasic helix-loop-helix domain containing, class B, 2BHLHB2NM_00367027201175_atthioredoxin-related transmembrane protein 2TMX2NM_01595928201236_s_atBTG family, member 2BTG2NM_00676329201251_atpyruvate kinase, musclePKM2NM_00265430201286_atsyndecan 1SDC1Z4819931201287_s_atsyndecan 1SDC1NM_00299732201351_s_atYME1-like 1YME1L1AF07065633201353_s_atbromodomain adjacent to zinc finger domain, 2ABAZ2AAI65312634201361_athypothetical protein MGC5508MGC5508NM_02409235201447_atTIA1 cytotoxic granule-associated RNA bindingTIA1H9654936201448_atTIA1 cytotoxic granule-associated RNA 
bindingTIA1AL046419transcript variant 137201449_atTIA1 cytotoxic granule-associated RNA bindingTIA1AL567227transcript variant 138201545_s_atpoly(A) binding protein, nuclear 1PABPN1NM_00464339201623_s_ataspartyl-tRNA synthetaseDARSBC00062940201667_atgap junction protein, alpha 1GJA1NM_00016541201683_x_atchromosome 14 open reading frame 92C14orf92BE78363242201718_s_aterythrocyte membrane protein band 4.1-like 2EPB41L2BF51168543201725_atchromosome 10 open reading frame 7C10orf7NM_00602344201779_s_atring finger protein 13RNF13AF07055845201780_s_atring finger protein 13RNF13NM_00728246201801_s_atsolute carrier family 29 (nucleoside transporters),SLC29A1AF079117mem 147201820_atkeratin 5KRT5NM_00042448201892_s_atIMP (inosine monophosphate) dehydrogenase 2IMPDH2NM_00088449202006_atprotein tyrosine phosphatase, non-receptor type 12PTPN12NM_00283550202170_s_ataminoadipate-semialdehyde dehydrogenase-AASDHPPTAF151057phosphopantetheinyl transferase51202181_atKIAA0247KIAA0247NM_01473452202219_atsolute carrier family 6, member 8SLC6A8NM_00562953202223_atintegral membrane protein 1ITM1NM_00221954202253_s_atdynamin 2DNM2NM_00494555202288_atFK506 binding protein 12-rapamycin assoc. 
pro 1 | FRAP1 | U88966
56 | 202349_at | torsin family 1, member A (torsin A) | TOR1A | NM_000113
57 | 202364_at | MAX interactor 1 | MXI1 | NM_005962
58 | 202397_at | nuclear transport factor 2 | NUTF2 | NM_005796
59 | 202418_at | Yip1 interacting factor homolog | YIF1 | NM_020470
60 | 202471_s_at | isocitrate dehydrogenase 3 (NAD+) gamma | IDH3G | NM_004135
61 | 202489_s_at | FXYD domain-containing ion transport regulator 3 | FXYD3 | BC005238
62 | 202496_at | autoantigen | RCD-8 | NM_014329
63 | 202503_s_at | KIAA0101 gene product | KIAA0101 | NM_014736
64 | 202504_at | ataxia-telangiectasia group D-associated protein | TRIM29 | NM_012101
65 | 202530_at | mitogen-activated protein kinase 14 | MAPK14 | NM_001315
66 | 202602_s_at | HIV TAT specific factor 1 | HTATSF1 | NM_014500
67 | 202746_at | integral membrane protein 2A | ITM2A | AL021786
68 | 202747_s_at | integral membrane protein 2A | ITM2A | NM_004867
69 | 202753_at | proteasome regulatory particle subunit p44S10 | P44S10 | NM_014814
70 | 202755_s_at | glypican 1 | GPC1 | AI354864
71 | 202756_s_at | glypican 1 | GPC1 | NM_002081
72 | 202831_at | glutathione peroxidase 2 | GPX2 | NM_002083
73 | 202887_s_at | DNA-damage-inducible transcript 4 | DDIT4 | NM_019058
74 | 202935_s_at | SRY-box 9 | SOX9 | AI382146
75 | 202990_at | phosphorylase, glycogen; liver | PYGL | NM_002863
76 | 203040_s_at | hydroxymethylbilane synthase | HMBS | NM_000190
77 | 203082_at | BMS1-like, ribosome assembly protein (yeast) | BMS1L | NM_014753
78 | 203190_at | NADH dehydrogenase (ubiquinone) Fe-S protein 8 | NDUFS8 | NM_002496
79 | 203196_at | ATP-binding cassette, sub-fam C (CFTR/MRP), mem 4 | ABCC4 | AI948503
80 | 203211_s_at | myotubularin related protein 2 | MTMR2 | AK027038
81 | 203368_at | cysteine-rich with EGF-like domains 1 | CRELD1 | NM_015513
82 | 203372_s_at | suppressor of cytokine signaling 2 | SOCS2 | AB004903
83 | 203378_at | pre-mRNA cleavage complex II protein Pcf11 | PCF11 | AB020631
84 | 203491_s_at | translokin | PIG8 | AI123527
85 | 203494_s_at | translokin | PIG8 | NM_014679
86 | 203545_at | asparagine-linked glycosylation 8 homolog | ALG8 | NM_024079
87 | 203555_at | protein tyrosine phosphatase, non-receptor type 18 | PTPN18 | NM_014369
88 | 203573_s_at | Rab geranylgeranyltransferase, alpha subunit | RABGGTA | NM_004581
89 | 203589_s_at | transcription factor Dp-2 | TFDP2 | NM_006286
90 | 203611_at | telomeric repeat binding factor 2 | TERF2 | NM_005652
91 | 203638_s_at | fibroblast growth factor receptor 2 | FGFR2 | NM_022969
92 | 203639_s_at | fibroblast growth factor receptor 2 | FGFR2 | M80634
93 | 203691_at | protease inhibitor 3, skin-derived | PI3 | NM_002638
94 | 203726_s_at | laminin, alpha 3 | LAMA3 | NM_000227
95 | 203759_at | ST3 beta-galactoside alpha-2,3-sialyltransferase 4 | ST3GAL4 | NM_006278
96 | 203787_at | single-stranded DNA binding protein 2 | SSBP2 | NM_012446
97 | 203798_s_at | visinin-like 1 | VSNL1 | NM_003385
98 | 203809_s_at | v-akt murine thymoma viral oncogene homolog 2 | AKT2 | AA769075
99 | 203853_s_at | GRB2-associated binding protein 2 | GAB2 | NM_012296
100 | 203885_at | RAB21, member RAS oncogene family | RAB21 | NM_014999
101 | 203924_at | glutathione S-transferase A2 | GSTA1 | NM_000846
102 | 203953_s_at | Claudin 3 | CLDN3 | BE791251
103 | 203964_at | N-myc (and STAT) interactor | NMI | NM_004688
104 | 203974_at | haloacid dehalogenase-like hydrolase domain containing 1A | HDHD1A | NM_012080
105 | 204014_at | dual specificity phosphatase 4 | DUSP4 | NM_001394
106 | 204036_at | endothelial differentiation, lysophosphatidic acid G-protein-coupled receptor, 2 | EDG2 | AW269335
107 | 204037_at | | EDG2 | BF055366
108 | 204038_s_at | | EDG2 | NM_001401
109 | 204047_s_at | phosphatase and actin regulator 2 | PHACTR2 | AW295193
110 | 204049_s_at | | PHACTR2 | NM_014721
111 | 204136_at | collagen, type VII, alpha 1 | COL7A1 | NM_000094
112 | 204151_x_at | aldo-keto reductase family 1, member C1 | AKR1C1 | NM_001353
113 | 204154_at | cysteine dioxygenase, type I | CDO1 | NM_001801
114 | 204206_at | MAX binding protein | MNT | NM_020310
115 | 204268_at | S100 calcium-binding protein A2 | S100A2 | NM_005978
116 | 204326_x_at | metallothionein 1X | MT1X | NM_002450
117 | 204367_at | Sp2 transcription factor | SP2 | D28588
118 | 204379_s_at | fibroblast growth factor receptor 3 | FGFR3 | NM_000142
119 | 204385_at | kynureninase (L-kynurenine hydrolase) | KYNU | NM_003937
120 | 204388_s_at | monoamine oxidase A | MAOA | NM_000240
121 | 204455_at | bullous pemphigoid antigen 1 | BPAG1 | NM_001723
122 | 204460_s_at | RAD1 homolog | RAD1 | AF074717
123 | 204469_at | protein tyrosine phosphatase, receptor-type, Z polypep 1 | PTPRZ1 | NM_002851
124 | 204493_at | BH3 interacting domain death agonist | BID | NM_001196
125 | 204532_x_at | UDP glycosyltransferase 1 family, polypep A9 | UGT1A9 | NM_021027
126 | 204542_at | sialyltransferase | SIAT7B | NM_006456
127 | 204547_at | RAB40B, member RAS oncogene family | RAB40B | NM_006822
128 | 204614_at | serine (or cysteine) proteinase inhibitor, clade B, mem 2 | SERPINB2 | NM_002575
129 | 204621_s_at | nuclear receptor subfamily 4, group A, member 2 | NR4A2 | AI935096
130 | 204622_x_at | | NR4A2 | NM_006186
131 | 204633_s_at | nuclear mitogen- and stress-activated protein kinase-1 | RPS6KA5 | AF074393
132 | 204636_at | collagen, type XVII, alpha 1 | COL17A1 | NM_000494
133 | 204672_s_at | ankyrin repeat domain 6 | ANKRD6 | NM_014942
134 | 204734_at | keratin 15 | KRT15 | NM_002275
135 | 204753_s_at | hepatic leukemia factor | HLF | AI810712
136 | 204754_at | hepatic leukemia factor | HLF | W60800
137 | 204755_x_at | hepatic leukemia factor | HLF | M95585
138 | 204855_at | serine (or cysteine) proteinase inhibitor, clade B, mem 5 | SERPINB5 | NM_002639
139 | 204887_s_at | polo-like kinase 4 | PLK4 | NM_014264
140 | 204952_at | GPI-anchored metastasis-associated protein homolog | C4.4A | NM_014400
141 | 204971_at | cystatin A (stefin A) | CSTA | NM_005213
142 | 205014_at | heparin-binding growth factor binding protein | FGFBP1 | NM_005130
143 | 205022_s_at | checkpoint suppressor 1 | CHES1 | NM_005197
144 | 205054_at | nebulin | NEB | NM_004543
145 | 205064_at | small proline-rich protein 1B | SPRR1B | NM_003125
146 | 205081_at | cysteine-rich protein 1 | CRIP1 | NM_001311
147 | 205141_at | angiogenin, ribonuclease, RNase A family, 5 | ANG | NM_001145
148 | 205157_s_at | keratin 17 | KRT17 | NM_000422
149 | 205176_s_at | integrin beta 3 binding protein (beta3-endonexin) | ITGB3BP | NM_014288
150 | 205206_at | Kallmann syndrome 1 sequence | KAL1 | NM_000216
151 | 205219_s_at | galactokinase 2 | GALK2 | NM_002044
152 | 205267_at | POU domain, class 2, associating factor 1 | POU2AF1 | NM_006235
153 | 205367_at | adaptor protein with pleckstrin homology and src homology 2 domains | APS | NM_020979
154 | 205372_at | pleiomorphic adenoma gene 1 | PLAG1 | NM_002655
155 | 205450_at | phosphorylase kinase, alpha 1 (muscle) | PHKA1 | NM_002637
156 | 205490_x_at | gap junction protein, beta 3 | GJB3 | BF060667
157 | 205569_at | lysosomal-associated membrane protein 3 | LAMP3 | NM_014398
158 | 205595_at | desmoglein 3 | DSG3 | NM_001944
159 | 205618_at | proline rich Gla (G-carboxyglutamic acid) 1 | PRRG1 | NM_000950
160 | 205623_at | aldehyde dehydrogenase 3 | ALDH3A1 | NM_000691
161 | 205624_at | carboxypeptidase A3 (mast cell) | CPA3 | NM_001870
162 | 205789_at | CD1D antigen, d polypeptide | CD1D | NM_001766
163 | 205839_s_at | benzodiazepine receptor (peripheral) assoc pro 1 | BZRAP1 | NM_004758
164 | 205961_s_at | PC4 and SFRS1 interacting protein 1 | PSIP1 | NM_004682
165 | 205968_at | K+ voltage-gated channel, delayed-rectifier, subfamily S, member 3 | KCNS3 | NM_002252
166 | 205969_at | arylacetamide deacetylase (esterase) | AADAC | NM_001086
167 | 206032_at | desmocollin 3, transcript variant Dsc3a | DSC3 | AI797281
168 | 206033_s_at | desmocollin 3, transcript variant Dsc3a | DSC3 | AI797281
169 | 206068_s_at | acyl-Coenzyme A dehydrogenase, long chain | ACADL | AI367275
170 | 206094_x_at | UDP glycosyltransferase 1 family, polypeptide A6 | UGT1A6 | NM_001072
171 | 206122_at | SRY-box 20 | SOX15 | NM_006942
172 | 206164_at | chloride channel, calcium activated, family mem 2 | CLCA2 | NM_006536
173 | 206165_s_at | chloride channel, calcium activated, family mem 2 | CLCA2 | NM_006536
174 | 206166_s_at | calcium-activated chloride channel-2 | CLCA2 | NM_006536
175 | 206300_s_at | parathyroid hormone-like hormone | PTHLH | NM_002820
176 | 206331_at | calcitonin receptor-like | CALCRL | NM_005795
177 | 206400_at | lectin, galactoside-binding, soluble, 7 | LGALS7 | NM_002307
178 | 206461_x_at | metallothionein 1H | MT1H | NM_005951
179 | 206561_s_at | aldo-keto reductase family 1, member B10 | AKR1B10 | NM_020299
180 | 206566_at | solute carrier family 7 (cationic amino acid transporter, y+ system), member 1 | SLC7A1 | NM_003045
181 | 206581_at | basonuclin | BNC1 | NM_001717
182 | 206641_at | tumor necrosis factor receptor superfamily, mem 17 | TNFRSF17 | NM_001192
183 | 206653_at | Polymerase (RNA) III (DNA directed) polypep G | POLR3G | BF062139
184 | 206658_at | hypothetical protein MGC10902 | UPK3B | NM_030570
185 | 206756_at | carbohydrate (N-acetylglucosamine 6-O) sulfotransferase 7 | CHST7 | NM_019886
186 | 206912_at | forkhead box E1 | FOXE1 | NM_004473
187 | 207029_at | KIT ligand | KITLG | NM_000899
188 | 207126_x_at | UDP glycosyltransferase 1 family, polypep A1 | UGT1A1 /// | NM_000463
189 | 207499_x_at | hypothetical protein FLJ10043 | SMAP-1 | NM_017979
190 | 207513_s_at | zinc finger protein 189 | ZNF189 | NM_003452
191 | 207620_s_at | calcium/calmodulin-dependent serine protein kinase | CASK | NM_003688
192 | 207935_s_at | keratin 13 | KRT13 | NM_002274
193 | 208153_s_at | FAT tumor suppressor homolog 2 | FAT2 | NM_001447
194 | 208228_s_at | fibroblast growth factor receptor 2 | FGFR2 | M87771
195 | 208502_s_at | paired-like homeodomain transcription factor 1 | PITX1 | NM_002653
196 | 208539_x_at | small proline-rich protein 2B | SPRR2A | NM_006945
197 | 208581_x_at | metallothionein 1X | MT1X | NM_005952
198 | 208596_s_at | UDP glycosyltransferase 1 family, polypep A3 | UGT1A3 | NM_019093
199 | 208657_s_at | septin 9 | SEPT9 | AF142408
200 | 208692_at | ribosomal protein S3 | RPS3 | U14990
201 | 208737_at | ATPase, H+ transporting, lysosomal 13 kDa, V1 subunit G isoform 1 | ATP6V1G1 | BC003564
202 | 208758_at | 5-aminoimidazole-4-carboxamide ribonucleotide formyltransferase/IMP cyclohydrolase | ATIC | D89976
203 | 208798_x_at | golgin-67 | GOLGIN-67 | AF204231
204 | 208856_x_at | ribosomal protein, large, P0 | RPLP0 | BC003655
205 | 208870_x_at | ATP synthase, H+ transporting, mitochondrial F1 complex, gamma polypeptide 1 | ATP5C1 | BC000931
206 | 208933_s_at | lectin, galactoside-binding, soluble, 8 | LGALS8 | AI659005
207 | 208935_s_at | lectin, galactoside-binding, soluble, 8 | LGALS8 | L78132
208 | 208950_s_at | aldehyde dehydrogenase 7 family, mem A1 | ALDH7A1 | BC002515
209 | 209009_at | esterase D/formylglutathione hydrolase | ESD | BC001169
210 | 209041_s_at | ubiquitin-conjugating enzyme E2G 2 | UBE2G2 | BG395660
211 | 209117_at | WW domain binding protein 2 | WBP2 | U79458
212 | 209122_at | adipose differentiation-related protein | ADFP | BC005127
213 | 209125_at | keratin 6A | KRT6A | J00269
214 | 209126_x_at | keratin 6 isoform K6f | KRT6B | L42612
215 | 209204_at | LIM domain only 4 | LMO4 | AI824831
216 | 209212_s_at | transcription factor BTEB2 | KLF5 | AB030824
217 | 209215_at | tetracycline transporter-like protein | TETRAN | L11669
218 | 209220_at | glypican 3 | GPC3 | L47125
219 | 209260_at | stratifin | SFN | BC000329
220 | 209296_at | protein phosphatase 1B (formerly 2C), magnesium-dependent, beta isoform | PPM1B | AF136972
221 | 209309_at | zinc-alpha2-glycoprotein | AZGP1 | D90427
222 | 209339_at | seven in absentia homolog 2 | SIAH2 | U76248
223 | 209351_at | keratin 14 | KRT14 | BC002690
224 | 209380_s_at | CFTR/MRP, member 5 | ABCC5 | AF146074
225 | 209411_s_at | Golgi associated, gamma adaptin ear containing, ARF binding protein 3 | GGA3 | AW008018
226 | 209446_s_at | Similar to hypothetical protein FLJ10803 | | BC001743
227 | 209457_at | dual specificity phosphatase 5 | DUSP5 | U16996
228 | 209509_s_at | dolichyl-phosphate | DPAGT1 | BC000325
229 | 209587_at | hindlimb expressed homeobox protein backfoot | Bft | U70370
230 | 209647_s_at | IMAGE: 2972022 | SOCS5 | AW664421
231 | 209699_x_at | dihydrodiol dehydrogenase | AKR1C2 | U05598
232 | 209719_x_at | squamous cell carcinoma antigen 1 | SCCA1 | U19556
233 | 209720_s_at | serine (or cysteine) proteinase inhibitor, clade B (ovalbumin), member 3 | SERPINB3 | U19556
234 | 209727_at | GM2 ganglioside activator | GM2A | M76477
235 | 209748_at | spastic paraplegia 4 | SPG4 | AB029006
236 | 209792_s_at | kallikrein 10 | KLK10 | BC002710
237 | 209800_at | keratin 16 | KRT16 | AF061812
238 | 209863_s_at | CUSP | TP73L | AF091627
239 | 209878_s_at | v-rel reticuloendotheliosis viral oncogene hom A, | RELA | M62399
240 | 209897_s_at | slit homolog 2 (Drosophila) | SLIT2 | AF055585
241 | 209959_at | nuclear receptor subfamily 4, group A, member 3 | NR4A3 | U12767
242 | 209963_s_at | erythropoietin receptor | EPOR | M34986
243 | 210020_x_at | NB-1 | CALML3 | M58026
244 | 210052_s_at | TPX2, microtubule-associated protein homolog | TPX2 | AF098158
245 | 210064_s_at | uroplakin 1B | UPK1B | NM_006952
246 | 210065_s_at | uroplakin Ib | UPK1B | NM_006952
247 | 210084_x_at | mast cell alpha II tryptase | | AF206665
248 | 210133_at | chemokine (C-C motif) ligand 11 | CCL11 | D49372
249 | 210135_s_at | short stature homeobox 2 | SHOX2 | AF022654
250 | 210264_at | G protein-coupled receptor 35 | GPR35 | AF089087
251 | 210355_at | parathyroid-like protein | PTHLH | J03580
252 | 210406_s_at | RAB6A, member RAS oncogene family | RAB6A | AL136727
253 | 210505_at | alcohol dehydrogenase | ADH7 | U07821
254 | 210512_s_at | vascular endothelial growth factor | VEGF | AF022375
255 | 210829_s_at | single-stranded DNA binding protein 2 | SSBP2 | AF077048
256 | 210876_at | annexin A2 | ANXA2 | M62896
257 | 211002_s_at | tripartite motif protein TRIM29 beta | TRIM29 | AF230389
258 | 211105_s_at | nuclear factor of activated T-cells, cytoplasmic, calcineurin-dependent 1 | NFATC1 | U80918
259 | 211194_s_at | p73H | TP73L | AB010153
260 | 211195_s_at | p51 delta | TP73L | AB010153
261 | 211272_s_at | diacylglycerol kinase, alpha 80 kDa | DGKA | AF064771
262 | 211361_s_at | hurpin | hurpin | AJ001696
263 | 211401_s_at | fibroblast growth factor receptor 2 | FGFR2 | AB030078
264 | 211452_x_at | clone FLB4816 PRO1252 | | AF130054
265 | 211456_x_at | metallothionein 1H-like | | AF333388
266 | 211474_s_at | serine (or cysteine) proteinase inhibitor, clade B (ovalbumin), member 6 | SERPINB6 | BC004948
267 | 211527_x_at | vascular permeability factor | VEGF | M27281
268 | 211547_s_at | Miller-Dieker lissencephaly protein | LIS1 | L13387
269 | 211548_s_at | hydroxyprostaglandin dehydrogenase 15-(NAD) | HPGD | J05594
270 | 211596_s_at | leucine-rich repeats and immunoglobulin-like domains 1 | LRIG1 | AB050468
271 | 211634_x_at | immunoglobulin heavy constant mu | IGHM | M24669
272 | 211635_x_at | IgM rheumatoid factor RF-TT1, VH chain | | M24670
273 | 211653_x_at | pseudo-chlordecone | AKR1C2 | M33376
274 | 211689_s_at | transmembrane protease, serine 2 | TMPRSS2 | AF270487
275 | 211721_s_at | zinc finger protein 551 | ZNF551 | BC005868
276 | 211734_s_at | IgE Fc, high affinity I, receptor for α polypep | FCER1A | BC005912
277 | 211756_at | parathyroid hormone-like hormone | PTHLH | BC005961
278 | 211834_s_at | p73L/p63/p51/p40/KET | TP73L | AB042841
279 | 212061_at | KIAA0332 | SR140 | AB002330
280 | 212092_at | KIAA1051 | PEG10 | BE858180
281 | 212094_at | KIAA1051 | PEG10 | BE858180
282 | 212162_at | FLJ12811 | | AK022873
283 | 212189_s_at | component of oligomeric Golgi complex 4 | COG4 | AK022874
284 | 212228_s_at | hypothetical protein DKFZp434K046 | DKFZP434K046 | AC004382
285 | 212236_x_at | cytokeratin 17 | KRT17 | Z19574
286 | 212252_at | Ca2+ calmodulin-dependent protein kinase kinase 2β | CAMKK2 | AA181179
287 | 212255_s_at | FLJ10822 fis | FLJ10822 | AK001684
288 | 212286_at | ankyrin repeat domain 12 | ANKRD12 | AW572909
289 | 212311_at | KIAA0746 protein | KIAA0746 | AA522514
290 | 212314_at | KIAA0746 protein | KIAA0746 | AB018289
291 | 212424_at | programmed cell death 11 | PDCD11 | AW026194
292 | 212441_at | KIAA0232 | KIAA0232 | D86985
293 | 212458_at | sprouty-related, EVH1 domain containing 2 | SPRED2 | H97931
294 | 212466_at | sprouty-related, EVH1 domain containing 2 | SPRED2 | AW138902
295 | 212570_at | KIAA0830 protein | KIAA0830 | AL573201
296 | 212573_at | KIAA0830 protein | KIAA0830 | AF131747
297 | 212595_s_at | DAZ associated protein 2 | DAZAP2 | AL534321
298 | 212599_at | autism susceptibility candidate 2 | AUTS2 | AK025298
299 | 212600_s_at | ubiquinol-cytochrome c reductase core protein II | UQCRC2 | AV727381
300 | 212662_at | poliovirus receptor | PVR | BE615277
301 | 212680_x_at | protein phosphatase 1, regulatory (inhibitor) subunit 14B | PPP1R14B | BE305165
302 | 212836_at | polymerase (DNA-directed), delta 3, accessory subunit | POLD3 | D26018
303 | 212841_s_at | PTPRF interacting protein, binding protein 2 | PPFIBP2 | AI692180
304 | 212864_at | CDP-diacylglycerol synthase (phosphatidate cytidylyltransferase) 2 | CDS2 | Y16521
305 | 212914_at | chromobox homolog 7 | CBX7 | AV648364
306 | 212980_at | AHA1, activator of heat shock 90 kDa protein ATPase homolog 2 | AHSA2 | AL050376
307 | 213023_at | utrophin | UTRN | NM_007124
308 | 213034_at | KIAA0999 protein | KIAA0999 | AB023216
309 | 213093_at | protein kinase C, alpha | PRKCA | AI471375
310 | 213199_at | DKFZP586P0123 protein | DKFZP586P0123 | AL080220
311 | 213325_at | poliovirus receptor-related 3 | PVRL3 | AA129716
312 | 213366_x_at | ATP synthase, H+ transporting, mitochondrial F1 complex, gamma polypeptide 1 | ATP5C1 | AV711183
313 | 213425_at | wingless-type MMTV integration site family, member 5A | WNT5A | AI968085
314 | 213440_at | RAB1A, member RAS oncogene family | RAB1A | AL530264
315 | 213471_at | nephronophthisis 4 | NPHP4 | AB014573
316 | 213490_s_at | mitogen-activated protein kinase kinase 2 | MAP2K2 | AI762811
317 | 213518_at | protein kinase C, iota | PRKCI | AI689429
318 | 213680_at | keratin 6A | KRT6B | AI831452
319 | 213700_s_at | Pyruvate kinase, muscle | PKM2 | AA554945
320 | 213721_at | SRY-box 2 | SOX2 | L07335
321 | 213722_at | SRY-box 2 | SOX2 | AW007161
322 | 213796_at | Small proline-rich protein SPRK | SPRR1A | AI923984
323 | 213808_at | 23688 clone | ADAM23 | BE674466
324 | 213843_x_at | accessory proteins BAP31/BAP29 | SLC6A8 | AW276522
325 | 213880_at | leucine-rich repeat-containing G protein-coupled receptor 5 | LGR5 | AL524520
326 | 213913_s_at | KIAA0984 protein | KIAA0984 | AW134976
327 | 214073_at | cortactin | CTTN | BG475299
328 | 214100_x_at | IMAGE: 1964520 | | AI284845
329 | 214260_at | COP9 constitutive photomorphogenic homolog subunit 8 | COPS8 | AI079287
330 | 214441_at | syntaxin 6 | STX6 | NM_005819
331 | 214549_x_at | small proline-rich protein 1A | SPRR1A | NM_005987
332 | 214580_x_at | keratin 6B | KRT6B | AL569511
333 | 214680_at | neurotrophic tyrosine kinase, receptor, type 2 | NTRK2 | BF674712
334 | 214688_at | transducin-like enhancer of split 4 | TLE4 | BF217301
335 | 214735_at | phosphoinositide-binding protein PIP3-E | PIP3-E | AW166711
336 | 214812_s_at | KIAA0184 | KIAA0184 | D80006
337 | 214829_at | aminoadipate-semialdehyde synthase | AASS | AK023446
338 | 214965_at | hypothetical protein MGC26885 | MGC26885 | AF070574
339 | 215011_at | RNA, U17D small nucleolar | RNU17D | AJ006835
340 | 215030_at | G-rich RNA sequence binding factor 1 | GRSF1 | AK023187
341 | 215125_s_at | UDP glycosyltransferase 1 family, polypep A9 | UGT1A9 | AV691323
342 | 215189_at | keratin, hair, basic, 6 (monilethrix) | KRTHB6 | X99142
343 | 215354_s_at | proline-, glutamic acid-, leucine-rich protein 1 | PELP1 | BC002875
344 | 215372_x_at | Hypothetical protein LOC151878 | LOC151878 | AU146794
345 | 215382_x_at | mast cell alpha II tryptase | | AF206666
346 | 215561_s_at | interleukin 1 receptor, type I | IL1R1 | AK026803
347 | 215786_at | Hepatitis B virus x associated protein | HBXAP | AK022170
348 | 215812_s_at | creatine transporter | SLC6A10 | U41163
349 | 216052_x_at | Artemin | ARTN | AF115765
350 | 216147_at | Septin 11 | SEPT11 | AL353942
351 | 216221_s_at | pumilio homolog 2 | PUM2 | D87078
352 | 216248_s_at | nuclear receptor subfamily 4, group A, member 2 | NR4A2 | S77154
353 | 216258_s_at | UV-B repressed sequence, HUR 7 | | BE148534
354 | 216263_s_at | chromosome 14 open reading frame 120 | C14orf120 | AK022215
355 | 216288_at | cysteinyl leukotriene receptor 1 | CYSLTR1 | AU159276
356 | 216412_x_at | IgG to Puumala virus G2, light chain V region | | AF043584
357 | 216594_x_at | aldo-keto reductase family 1, member C1 | AKR1C1 | S68290
358 | 216603_at | solute carrier family 7, member 8 | | AL365343
359 | 216722_at | VENT-like homeobox 2 pseudogene 1 | VENTX2P1 | AF164963
360 | 216918_s_at | bullous pemphigoid antigen 1 isoforms 1 and 3 | DST | AL096710
361 | 217003_s_at | tMDC II, isoform [d] | | AJ132823
362 | 217097_s_at | hypothetical protein DKFZp564F013 | PHTF2 | AC004990
363 | 217165_x_at | metallothionein 1F (functional) | MT1F | M10943
364 | 217198_x_at | immunoglobulin heavy constant gamma 1 | IGHG1 | U80164
365 | 217227_x_at | immunoglobulin lambda locus | IGLVJC | X93006
366 | 217272_s_at | serine (or cysteine) proteinase inhibitor, clade B, member 13 | hurpin | AJ001698
367 | 217312_s_at | collagen type VII intergenic region | COL7A1 | L23982
368 | 217388_s_at | kynureninase (L-kynurenine hydrolase) | KYNU | D55639
369 | 217418_x_at | membrane-spanning 4-domains, subfam A, mem 1 | MS4A1 | X12530
370 | 217480_x_at | similar to Ig kappa chain | LOC339562 | M20812
371 | 217528_at | chloride channel, calcium activated, family mem 2 | CLCA2 | BF003134
372 | 217622_at | chromosome 22 open reading frame 3 | C22orf3 | AA018187
373 | 217626_at | IMAGE: 3089210 | AKR1C2 /// AKR1C1 | BF508244
374 | 217746_s_at | programmed cell death 6 interacting protein | PDCD6IP | NM_013374
375 | 217783_s_at | yippee-like | YPEL5 | NM_016061
376 | 217786_at | SKB1 homolog | SKB1 | NM_006109
377 | 217811_at | selenoprotein T | SELT | NM_016275
378 | 217841_s_at | protein phosphatase methylesterase-1 | PME-1 | NM_016147
379 | 217860_at | NADH dehydrogenase (ubiquinone) 1 alpha subcomplex, 10, | NDUFA10 | NM_004544
380 | 217922_at | Mannosidase, alpha, class 1A, member 2 | MAN1A2 | AL157902
381 | 217994_x_at | hypothetical protein FLJ20542 | FLJ20542 | NM_017871
382 | 218070_s_at | GDP-mannose pyrophosphorylase A | GMPPA | NM_013335
383 | 218092_s_at | HIV-1 Rev binding protein | HRB | NM_004504
384 | 218192_at | inositol hexaphosphate kinase 2 | IHPK2 | NM_016291
385 | 218236_s_at | protein kinase D3 | PRKD3 | NM_005813
386 | 218238_at | GTP binding protein 4 | GTPBP4 | NM_012341
387 | 218239_s_at | GTP binding protein 4 | GTPBP4 | NM_012341
388 | 218288_s_at | hypothetical protein MDS025 | MDS025 | NM_021825
389 | 218305_at | importin 4 | IPO4 | NM_024658
390 | 218331_s_at | chromosome 10 open reading frame 18 | C10orf18 | NM_017782
391 | 218355_at | kinesin family member 4A | KIF4A | NM_012310
392 | 218384_at | calcium regulated heat stable protein 1 | CARHSP1 | NM_014316
393 | 218460_at | hypothetical protein FLJ20397 | FLJ20397 | NM_017802
394 | 218483_s_at | hypothetical protein FLJ21827 | FLJ21827 | NM_020153
395 | 218507_at | hypoxia-inducible protein 2 | HIG2 | NM_013332
396 | 218546_at | hypothetical protein FLJ14146 | FLJ14146 | NM_024709
397 | 218657_at | Link guanine nucleotide exchange factor II | RAPGEFL1 | NM_016339
398 | 218696_at | eukaryotic translation initiation factor 2-α kinase 3 | EIF2AK3 | NM_004836
399 | 218699_at | RAB7, member RAS oncogene family-like 1 | RAB7L1 | BG338251
400 | 218750_at | hypothetical protein MGC5306 | MGC5306 | NM_024116
401 | 218769_s_at | ankyrin repeat, family A (RFXANK-like), 2 | ANKRA2 | NM_023039
402 | 218796_at | hypothetical protein FLJ20116 | C20orf42 | NM_017671
403 | 218834_s_at | heat shock 70 kDa protein 5 (glucose-regulated protein, 78 kDa) binding protein 1 | HSPA5BP1 | NM_017870
404 | 218957_s_at | hypothetical protein FLJ11848 | FLJ11848 | NM_025155
405 | 218960_at | transmembrane protease, serine 4 | TMPRSS4 | NM_016425
406 | 218962_s_at | hypothetical protein FLJ13576 | FLJ13576 | NM_022484
407 | 218990_s_at | small proline-rich protein 3 | SPRR3 | NM_005416
408 | 219129_s_at | hypothetical protein FLJ11526 | SAP30L | NM_024632
409 | 219132_at | pellino homolog 2 | PELI2 | NM_021255
410 | 219154_at | Ras homolog gene family, member F | RHOF | NM_024714
411 | 219155_at | phosphatidylinositol transfer protein, cytoplasmic 1 | PITPNC1 | NM_012417
412 | 219201_s_at | twisted gastrulation homolog 1 | TWSG1 | NM_020648
413 | 219217_at | hypothetical protein FLJ23441 | FLJ23441 | NM_024678
414 | 219241_x_at | hypothetical protein FLJ20515 | SSH3 | NM_017857
415 | 219245_s_at | hypothetical protein FLJ13491 | FLJ13491 | AI309636
416 | 219250_s_at | fibronectin leucine rich transmem protein 3 | FLRT3 | NM_013281
417 | 219347_at | nudix (nucleoside diphosphate linked moiety X)-type motif 15 | NUDT15 | NM_018283
418 | 219389_at | hypothetical protein FLJ10052 | FLJ10052 | NM_017982
419 | 219554_at | Rh type C glycoprotein | RHCG | NM_016321
420 | 219582_at | opioid growth factor receptor-like 1 | OGFRL1 | NM_024576
421 | 219704_at | germ cell specific Y-box binding protein | YBX2 | NM_015982
422 | 219732_at | plasticity related gene 3 | PRG-3 | NM_017753
423 | 219741_x_at | zinc finger protein 552 | ZNF552 | NM_024762
424 | 219756_s_at | hypothetical protein FLJ22792 | POF1B | NM_024921
425 | 219854_at | zinc finger protein 14 (KOX 6) | ZNF14 | NM_021030
426 | 219936_s_at | G protein-coupled receptor 87 | GPR87 | NM_023915
427 | 219959_at | molybdenum cofactor sulfurase | MOCOS | NM_017947
428 | 219962_at | angiotensin I converting enzyme (peptidyl-dipeptidase A) 2 | ACE2 | NM_021804
429 | 219995_s_at | hypothetical protein FLJ13841 | FLJ13841 | NM_024702
430 | 219997_s_at | COP9 constitutive photomorphogenic hom sub 7B | COPS7B | NM_022730
431 | 220046_s_at | cyclin L1 | CCNL1 | NM_020307
432 | 220177_s_at | transmembrane protease, serine 3 | TMPRSS3 | NM_024022
433 | 220285_at | chromosome 9 open reading frame 77 | C9orf77 | NM_016014
434 | 220466_at | hypothetical protein FLJ13215 | FLJ13215 | NM_025004
435 | 220664_at | small proline-rich protein 2C | SPRR2C | NM_006518
436 | 220668_s_at | DNA (cytosine-5-)-methyltransferase 3 beta | DNMT3B | NM_006892
437 | 221004_s_at | integral membrane protein 2C | ITM2C | NM_030926
438 | 221045_s_at | period homolog 3 | PER3 | NM_016831
439 | 221047_s_at | MAP/microtubule affinity-regulating kinase 1 | MARK1 | NM_018650
440 | 221050_s_at | GTP binding protein 2 | GTPBP2 | NM_019096
441 | 221064_s_at | chromosome 16 open reading frame 28 | C16orf28 | NM_023076
442 | 221096_s_at | hypothetical protein PRO1580 | PRO1580 | NM_018502
443 | 221234_s_at | BTB and CNC homology 1, basic leucine zipper transcription factor 2 | BACH2 | NM_021813
444 | 221286_s_at | proapoptotic caspase adaptor protein | PACAP | NM_016459
445 | 221305_s_at | UDP glycosyltransferase 1 family, polypep A8 | UGT1A8 | NM_019076
446 | 221326_s_at | delta-tubulin | TUBD1 | NM_016261
447 | 221480_at | heterogeneous nuclear ribonucleoprotein D | HNRPD | BG180941
448 | 221513_s_at | UTP14, U3 small nucleolar ribonucleoprotein, homolog C/homolog A | UTP14C/UTP14A | BC001149
449 | 221514_at | U3 small nucleolar ribonucleoprotein, hom A | UTP14A | BC001149
450 | 221580_s_at | hypothetical protein MGC5306 | MGC5306 | BC001972
451 | 221597_s_at | HSPC171 protein | HSPC171 | BC003080
452 | 221622_s_at | uncharacterized hypothalamus protein HT007 | HT007 | AF246240
453 | 221649_s_at | peter pan homolog | PPAN | BC000535
454 | 221679_s_at | abhydrolase domain containing 6 | ABHD6 | AF225418
455 | 221770_at | ribulose-5-phosphate-3-epimerase | RPE | BE964473
456 | 221790_s_at | LDL receptor adaptor protein | ARH | AL545035
457 | 221795_at | Similar to hypothetical protein FLJ20093 | | AI346341
458 | 221796_at | Similar to hypothetical protein FLJ20093 | | AA707199
459 | 221854_at | ESTs | PKP1 | AI378979
460 | 221884_at | ecotropic viral integration site 1 | EVI1 | BE466525
461 | 243_g_at | microtubule-associated protein 4 | MAP4 | M64571
462 | 31846_at | ras homolog gene family, member D | RHOD | AW003733
463 | 33323_r_at | stratifin | SFN | X57348
464 | 33850_at | microtubule-associated protein 4 | MAP4 | W28892
465 | 34858_at | potassium channel tetramerisation domain containing 2 | KCTD2 | D79998
466 | 37512_at | 3-hydroxysteroid epimerase | RODH | U89281
467 | 41037_at | TEA domain family member 4 | TEAD4 | U63824
468 | 41469_at | elafin | PI3 | L10343
469 | 44111_at | vacuolar protein sorting 33B | VPS33B | AI672363
470 | 49049_at | deltex 3 homolog | DTX3 | N92708
471 | 49077_at | protein phosphatase methylesterase-1 | PME-1 | AL040538
472 | 59625_at | nucleolar protein 3 | NOL3 | AI912351
473 | 65438_at | KIAA1609 protein | KIAA1609 | AA195124


REFERENCES



  • Beer et al. (2002) “Gene-expression profiles predict survival of patients with lung adenocarcinoma” Nat Med 8:816-824

  • Brookes (1999) “The essence of SNPs” Gene 234:177-186

  • Kato et al. (2004) “A Randomized Trial of Adjuvant Chemotherapy with Uracil-Tegafur for Adenocarcinoma of the Lung” N Engl J Med 350:1713-1721

  • Kiernan et al. (1993) “Stage I non-small cell cancer of the lung: results of surgical resection at Fairfax Hospital” Va Med Q 120:146-149

  • Kononen et al. (1998) “Tissue microarrays for high-throughput molecular profiling of tumor specimens” Nat Med 4:844-847

  • Mountain et al. (1987) “Lung cancer classification: the relationship of disease extent and cell type to survival in a clinical trials population” J Surg Oncol 35:147-156

  • Wingo et al. (1999) “Annual Report to the Nation on the Status of Cancer, 1973-1996, With a Special Section on Lung Cancer and Tobacco Smoking” J Natl Cancer Inst 91:675-690


Claims
  • 1. A method of assessing lung cancer status comprising the steps of a. obtaining a biological sample from a lung cancer patient; and b. measuring Biomarkers associated with Marker genes corresponding to those selected from Table 1, Table 4, Table 5 or Table 7 wherein the expression levels of the Marker genes above or below pre-determined cut-off levels are indicative of lung cancer status.
  • 2. A method of staging lung cancer patients comprising the steps of a. obtaining a biological sample from a lung cancer patient; and b. measuring Biomarkers associated with Marker genes corresponding to those selected from Table 1, Table 4, Table 5 or Table 7 wherein the expression levels of the Marker genes above or below pre-determined cut-off levels are indicative of the lung cancer stage.
  • 3. The method of claim 2 wherein the stage corresponds to classification by the TNM system.
  • 4. The method of claim 2 wherein the stage corresponds to patients with similar gene expression profiles.
  • 5. A method of determining lung cancer patient treatment protocol comprising the steps of a. obtaining a biological sample from a lung cancer patient; and b. measuring Biomarkers associated with Marker genes corresponding to those selected from Table 1, Table 4, Table 5 or Table 7 wherein the expression levels of the Marker genes above or below pre-determined cut-off levels are sufficiently indicative of risk of recurrence to enable a physician to determine the degree and type of therapy recommended to prevent recurrence.
  • 6. A method of treating a lung cancer patient comprising the steps of: a. obtaining a biological sample from a lung cancer patient; b. measuring Biomarkers associated with Marker genes corresponding to those selected from Table 1, Table 4, Table 5 or Table 7 wherein the expression levels of the Marker genes above or below pre-determined cut-off levels indicate a high risk of recurrence; and c. treating the patient with adjuvant therapy if the patient is a high-risk patient.
  • 7. A method of determining whether a lung cancer patient is at high or low risk of mortality comprising the steps of a. obtaining a biological sample from a lung cancer patient; and b. measuring Biomarkers associated with Marker genes corresponding to those selected from Table 4 wherein the expression levels of the Marker genes above or below pre-determined cut-off levels are sufficiently indicative of risk of mortality to enable a physician to determine the degree and type of therapy recommended.
  • 8. The method of claim 1, 2, 5, 6 or 7 wherein the sample is prepared by a method selected from the group consisting of bulk tissue preparation and laser capture microdissection.
  • 9. The method of claim 8 wherein the bulk tissue preparation is obtained from a biopsy or a surgical specimen.
  • 10. The method of claim 1, 2, 5, 6 or 7 further comprising measuring the expression level of at least one gene constitutively expressed in the sample.
  • 11. The method of claim 1, 2, 5, 6 or 7 wherein the sample is obtained from a primary tumor.
  • 12. The method of claim 1, 2, 5, 6 or 7 wherein the specificity is at least about 40%.
  • 13. The method of claim 1, 2, 5, 6 or 7 wherein the sensitivity is at least about 80%.
  • 14. The method of claim 1, 2, 5, 6 or 7 wherein the pre-determined cut-off levels are at least 1.5-fold over- or under-expression in the sample relative to benign cells or normal tissue.
  • 15. The method of claim 1, 2, 5, 6 or 7 wherein the pre-determined cut-off levels have a statistically significant p-value for over-expression in the sample having metastatic cells relative to benign cells or normal tissue.
  • 16. The method of claim 15 wherein the p-value is less than 0.05.
  • 17. The method of claim 1, 2, 5, 6 or 7 wherein gene expression is measured on a microarray or gene chip.
  • 18. The method of claim 17 wherein the microarray is a cDNA array or an oligonucleotide array.
  • 19. The method of claim 17 wherein the microarray or gene chip further comprises one or more internal control reagents.
  • 18. The method of claim 1, 2, 5, 6 or 7 wherein gene expression is determined by nucleic acid amplification conducted by polymerase chain reaction (PCR) of RNA extracted from the sample.
  • 20. The method of claim 18 wherein said PCR is reverse transcription polymerase chain reaction (RT-PCR).
  • 21. The method of claim 20, wherein the RT-PCR further comprises one or more internal control reagents.
  • 22. The method of claim 1, 2, 5, 6 or 7 wherein gene expression is detected by measuring or detecting a protein encoded by the gene.
  • 23. The method of claim 22 wherein the protein is detected by an antibody specific to the protein.
  • 24. The method of claim 1, 2, 5, 6 or 7 wherein gene expression is detected by measuring a characteristic of the gene.
  • 25. The method of claim 24 wherein the characteristic measured is selected from the group consisting of DNA amplification, methylation, mutation and allelic variation.
  • 26. A method of generating a lung cancer prognostic patient report comprising the steps of: determining the results of any one of claims 1, 2, 5, 6 or 7; and preparing a report displaying the results.
  • 27. The method of claim 26 wherein the report contains an assessment of patient outcome and/or probability of risk relative to the patient population.
  • 28. A patient report generated by the method according to claim 26.
  • 29. A composition comprising at least one probe set selected from the group consisting of: Marker genes corresponding to those selected from Table 1, Table 4, Table 5 or Table 7.
  • 30. A kit for conducting an assay to determine lung cancer prognosis in a biological sample comprising: materials for detecting isolated nucleic acid sequences, their complements, or portions thereof of a combination of genes selected from the group consisting of Marker genes corresponding to those selected from Table 1, Table 4, Table 5 or Table 7.
  • 31. The kit of claim 30 further comprising reagents for conducting a microarray analysis.
  • 32. The kit of claim 30 further comprising a medium through which said nucleic acid sequences, their complements, or portions thereof are assayed.
  • 33. Articles for assessing lung cancer status comprising: materials for detecting isolated nucleic acid sequences, their complements, or portions thereof of a combination of genes selected from the group consisting of Marker genes corresponding to those selected from Table 1, Table 4, Table 5 or Table 7.
  • 34. The articles of claim 33 further comprising reagents for conducting a microarray analysis.
  • 35. The articles of claim 34 further comprising a medium through which said nucleic acid sequences, their complements, or portions thereof are assayed.
  • 36. A microarray or gene chip for performing the method of claim 1, 2, 5, 6 or 7.
  • 37. The microarray of claim 36 comprising isolated nucleic acid sequences, their complements, or portions thereof of a combination of genes selected from the group consisting of Marker genes corresponding to those selected from Table 1, Table 4, Table 5 or Table 7.
  • 38. The microarray of claim 37 wherein the measurement or characterization is at least 1.5-fold over- or under-expression.
  • 39. The microarray of claim 37 wherein the measurement provides a statistically significant p-value over- or under-expression.
  • 40. The microarray of claim 39 wherein the p-value is less than 0.05.
  • 41. The microarray of claim 37 comprising a cDNA array or an oligonucleotide array.
  • 42. The microarray of claim 37 further comprising one or more internal control reagents.
  • 43. A diagnostic/prognostic portfolio comprising isolated nucleic acid sequences, their complements, or portions thereof of a combination of genes selected from the group consisting of Marker genes corresponding to those selected from Table 1, Table 4, Table 5 or Table 7.
  • 44. The portfolio of claim 43 wherein the measurement or characterization is at least 1.5-fold over- or under-expression.
  • 45. The portfolio of claim 44 wherein the measurement provides a statistically significant p-value over- or under-expression.
  • 46. The portfolio of claim 45 wherein the p-value is less than 0.05.
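By way of illustration only, the cut-off comparison recited in claims 1, 10, 12-14 and 20 could be computed along the lines of the following minimal Python sketch. It is not the patented method and rests on assumptions the claims do not state: expression is taken to be measured by RT-PCR as threshold-cycle (Ct) values, normalization to a constitutively expressed gene uses the comparative 2^-ΔΔCt method, and a sample is called high risk when at least a chosen number of Marker genes cross the 1.5-fold cut-off. The function names, the `min_genes` rule and all example values are hypothetical.

```python
# Hypothetical sketch, not the patented method: applies a 1.5-fold cut-off to
# RT-PCR data normalized to a constitutively expressed reference gene, then
# scores high/low-risk calls against observed outcomes.
from typing import Dict, List, Tuple


def fold_change(ct_target: float, ct_ref: float,
                ct_target_normal: float, ct_ref_normal: float) -> float:
    """Expression of a Marker gene relative to normal tissue (2^-ddCt method).

    Each Ct is first normalized to the reference (constitutively expressed)
    gene measured in the same sample; a lower Ct means more transcript.
    """
    delta_sample = ct_target - ct_ref
    delta_normal = ct_target_normal - ct_ref_normal
    return 2.0 ** -(delta_sample - delta_normal)


def crosses_cutoff(fc: float, cutoff: float = 1.5) -> bool:
    """True if the gene is at least `cutoff`-fold over- or under-expressed."""
    return fc >= cutoff or fc <= 1.0 / cutoff


def call_high_risk(fold_changes: Dict[str, float], min_genes: int,
                   cutoff: float = 1.5) -> bool:
    """Hypothetical rule: high risk if >= min_genes markers cross the cut-off."""
    n_crossing = sum(crosses_cutoff(fc, cutoff) for fc in fold_changes.values())
    return n_crossing >= min_genes


def sensitivity_specificity(calls: List[bool],
                            relapsed: List[bool]) -> Tuple[float, float]:
    """Sensitivity = TP/(TP+FN); specificity = TN/(TN+FP).

    Assumes both relapsed and non-relapsed patients are present.
    """
    pairs = list(zip(calls, relapsed))
    tp = sum(1 for c, r in pairs if c and r)
    fn = sum(1 for c, r in pairs if not c and r)
    tn = sum(1 for c, r in pairs if not c and not r)
    fp = sum(1 for c, r in pairs if c and not r)
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Illustrative numbers only; the gene labels are placeholders.
    markers = {
        "marker_a": fold_change(24.0, 20.0, 25.0, 20.0),  # 2-fold up
        "marker_b": fold_change(26.0, 20.0, 25.0, 20.0),  # 2-fold down
        "marker_c": 1.1,                                  # inside cut-off
    }
    print(call_high_risk(markers, min_genes=2))  # True
    print(sensitivity_specificity([True, True, False, False, True],
                                  [True, False, False, True, True]))
```

How many Marker genes must cross the cut-off, and whether normalization follows the 2^-ΔΔCt scheme or another one, are design choices that the claims leave open.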
STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT

No government funds were used to make this invention.

Provisional Applications (2)
Number Date Country
60632053 Nov 2004 US
60655573 Feb 2005 US