Methods, systems, and compositions for classification, prognosis, and diagnosis of cancers

Abstract
The present invention provides methods, systems and compositions for predicting disease susceptibility in a patient. In some embodiments, methods for the classification, prognosis, and diagnosis of cancers are provided. In other embodiments, the present invention provides statistical methods for building a gene-expression-based classifier that may be employed for predicting disease susceptibility in a patient, for classifying carcinomas, and for the prognosis of clinical outcomes.
Description
FIELD OF THE INVENTION

The present invention relates generally to systems, compositions, and methods for predicting disease susceptibility in a patient.


BACKGROUND

Mutations in p53 are thought to occur in more than 50% of human cancers and are most frequently observed in the DNA binding and transactivation domains, underscoring the importance of its transcriptional activity in suppressing tumor development. In sporadic breast cancers, unlike most cancer types, p53 mutations are only observed in approximately 20% of cases. However, the fact that breast cancer is frequently observed in individuals with germline mutations of p53 (i.e., Li-Fraumeni syndrome) suggests a particularly important role for p53 inactivation in breast carcinogenesis, and perhaps a similarly important role for other factors capable of compromising p53 function.


For example, the reduced transcriptional activation of p53 following hypermethylation and subsequent inhibition of the HOXA5 transcription factor has recently been implicated as a possible epigenetic mechanism in reducing p53 expression in breast cancers. In both breast tumors and other cancer types, amplification and overexpression of the MDM2 gene, whose product promotes p53 degradation, have been implicated in oncogenesis. Moreover, both deletion and epigenetic silencing of the p14ARF gene, a negative regulator of MDM2, have been observed in various cancer types. Thus, p53 deficiency in breast carcinogenesis can potentially arise from a number of mechanisms other than p53 gene mutation.


There is evidence that the p53 status has prognostic significance in a number of cancer types and in particular breast cancer. In breast cancer, p53 mutations confer worse overall and disease-free survival, and a higher incidence of tumor recurrence, independent of other risk factors. Recent evidence suggests that p53 inactivation renders breast tumors resistant to certain DNA-damaging chemotherapies and endocrine therapies presumably through loss of p53-dependent apoptosis.


However, in all of these studies, the prognostic capability and degree of therapeutic resistance of the p53 mutants was found to depend largely on mutant-specific attributes, such as the type of mutations or the precise domain in which the mutation occurs. Importantly, this latter observation is consistent with findings from previous studies showing that not all p53 mutations have equal effects: some simply confer loss of function, while others have a dominant negative effect (such as trans-dominant suppression of wildtype p53 or oncogenic gain of function), while still others show only a partial loss of function where, for example, only a small subset of p53 downstream transcriptional target genes are dysregulated. For these reasons, no single molecular assessment of p53 status appears to provide an absolute indication of the complete p53 function.


There is a need for methods that better assess the effects of different p53 mutations on cell function in general and gene expression in particular, in an effort to enable better cancer prognosis and diagnosis.


SUMMARY

Accordingly, the present invention provides methods, systems, and compositions that provide a more useful measure of in vivo p53 functionality. These methods, systems, and compositions may be employed for the classification, prognosis, and diagnosis of cancers.


In one aspect of the present invention there is provided a method for predicting disease outcome in a patient, the method comprising the steps of: obtaining gene expression profiles from a plurality of genes from tumor samples, wherein said tumor samples may be mutant or wildtype for the p53 gene; comparing said gene expression profiles to determine which genes are differentially expressed in the mutant or wildtype tumors; deriving from said differentially expressed genes a set of genes to predict p53 mutational status; and using the set of genes to predict disease outcome in the patient.


In another aspect of the present invention there is provided a method for predicting disease outcome in a late-stage breast cancer patient, the method comprising the steps of: obtaining gene expression profiles from a plurality of genes from tumor samples, wherein said tumor samples may be mutant or wildtype for the p53 gene; comparing said gene expression profiles to determine which genes are differentially expressed in the mutant or wildtype tumors; deriving from said differentially expressed genes a set of genes to predict p53 mutational status; and using the set of genes to predict disease outcome in the late-stage breast cancer patient wherein the set of genes are selected from the group consisting of GenBank accession numbers: BG271923 (SEQ ID NO: 22), NM002466 (SEQ ID NO: 31), D38553 (SEQ ID NO: 11), NM000909 (SEQ ID NO: 9), NM024843 (SEQ ID NO: 1), R73030 (SEQ ID NO: 29), NM003226 (SEQ ID NO: 28), AW299538 (SEQ ID NO: 5) and AI990465 (SEQ ID NO: 25).


In yet another aspect of the present invention there is provided a method for predicting clinical outcome in an early-stage, locally-treated breast cancer patient, the method comprising the steps of: obtaining gene expression profiles from a plurality of genes from tumor samples, wherein said tumor samples may be mutant or wildtype for the p53 gene; comparing said gene expression profiles to determine which genes are differentially expressed in the mutant or wildtype tumors; deriving from said differentially expressed genes a set of genes to predict p53 mutational status; and using the set of genes to predict disease outcome in the early-stage, locally-treated breast cancer patient wherein the set of genes are selected from the group consisting of GenBank accession numbers: AI961235 (SEQ ID NO: 23), BG271923 (SEQ ID NO: 22), NM002466 (SEQ ID NO: 31), BC001651 (SEQ ID NO: 14), D38553 (SEQ ID NO: 11), AK000345 (SEQ ID NO: 26), BC004504 (SEQ ID NO: 8), NM000909 (SEQ ID NO: 9), NM024843 (SEQ ID NO: 1), R73030 (SEQ ID NO: 29), AI435828 (SEQ ID NO: 20), AI810764 (SEQ ID NO: 24), AI922323 (SEQ ID NO: 10), NM003225 (SEQ ID NO: 32), NM003226 (SEQ ID NO: 28), AW299538 (SEQ ID NO: 5), NM003462 (SEQ ID NO: 16), AI990465 (SEQ ID NO: 25), NM004392 (SEQ ID NO: 15), NM001267 (SEQ ID NO: 7) and AI826437 (SEQ ID NO: 3).


In a further aspect of the present invention there is provided a method for predicting clinical outcome in a liver cancer patient, the method comprising the steps of: obtaining gene expression profiles from a plurality of genes from tumor samples, wherein said tumor samples may be mutant or wildtype for the p53 gene; comparing said gene expression profiles to determine which genes are differentially expressed in the mutant or wildtype tumors; deriving from said differentially expressed genes a set of genes to predict p53 mutational status; and using the set of genes to predict disease outcome in the liver cancer patient wherein the set of genes are selected from the group consisting of GenBank accession numbers: NM002466 (SEQ ID NO: 31), BC001651 (SEQ ID NO: 14), D38553 (SEQ ID NO: 11), NM024843 (SEQ ID NO: 1), AI435828 (SEQ ID NO: 20), AI810764 (SEQ ID NO: 24), NM003226 (SEQ ID NO: 28) and AW299538 (SEQ ID NO: 5).


In a still further aspect of the present invention there is provided a method of identifying a group of genes for predicting disease outcome in a patient, the method comprising the steps of: obtaining gene expression profiles from a plurality of genes from tumor samples, wherein said tumor samples may be mutant or wildtype for the p53 gene; comparing said gene expression profiles to determine which genes are differentially expressed in the mutant or wildtype tumors; ranking the differentially expressed genes according to their ability to predict p53 mutational status; training the ranked genes to distinguish between mutant and wildtype p53 gene expression profiles; obtaining a p53 classifier including a set of genes capable of predicting p53 mutational status; validating the p53 classifier in independent datasets; and assessing the ability of the p53 classifier to predict disease outcome in the patient.
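By way of non-limiting illustration only, the comparing and ranking steps recited above might be sketched as follows. The sketch ranks genes by the absolute value of Welch's t statistic between mutant and wildtype tumors, which is a simplified stand-in chosen for brevity and is not asserted to be the ranking procedure of any particular embodiment. The function names, gene names, and expression values are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(mutant_values, wildtype_values):
    """Welch's t statistic for one gene's expression in two tumor groups."""
    n_m, n_w = len(mutant_values), len(wildtype_values)
    standard_error = math.sqrt(
        variance(mutant_values) / n_m + variance(wildtype_values) / n_w
    )
    return (mean(mutant_values) - mean(wildtype_values)) / standard_error


def rank_genes(expression, is_mutant):
    """Return gene names sorted by decreasing |t| between mutant and wildtype.

    expression: dict mapping gene name -> list of values, one per tumor
    is_mutant:  list of booleans, one per tumor (True = p53 mutant)
    """
    scores = {}
    for gene, values in expression.items():
        mutant = [v for v, m in zip(values, is_mutant) if m]
        wildtype = [v for v, m in zip(values, is_mutant) if not m]
        scores[gene] = abs(welch_t(mutant, wildtype))
    return sorted(scores, key=scores.get, reverse=True)


# Made-up toy data: three hypothetical genes measured in six tumors.
toy_expression = {
    "gene_a": [5.1, 5.3, 4.9, 2.0, 2.2, 1.9],  # strongly separated
    "gene_b": [3.0, 3.4, 2.9, 3.1, 3.3, 3.0],  # essentially flat
    "gene_c": [4.0, 4.6, 3.8, 3.1, 3.5, 2.9],  # moderately separated
}
toy_is_mutant = [True, True, True, False, False, False]
ranking = rank_genes(toy_expression, toy_is_mutant)
```

In such a sketch, the top-ranked genes would then be passed to the training and validation steps.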


In another aspect of the present invention there is provided a computer system for predicting disease outcome in a patient, the computer system comprising: a computer having a processor and a memory, the memory having executable code stored thereon for execution by the processor for performing the steps of: obtaining gene expression profiles from a plurality of genes from tumor samples, wherein said tumor samples may be mutant or wildtype for the p53 gene; comparing said gene expression profiles to determine which genes are differentially expressed in the mutant or wildtype tumors; deriving from said differentially expressed genes a set of genes to predict p53 mutational status; and using the set of genes to predict disease outcome in the patient.


In yet another aspect of the present invention there is provided a diagnostic tool for predicting disease susceptibility in a patient comprising a plurality of genes capable of predicting p53 mutational status immobilized on a solid support.


In a still further aspect of the present invention there is provided a nucleic acid array for predicting disease susceptibility in a patient comprising a solid support and displayed thereon nucleic acid probes corresponding to genes capable of predicting p53 mutational status in the patient.


These aspects and embodiments are described in greater detail below.


Definitions


As used in this application, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise. For example, the term “an agent” includes a plurality of agents, including mixtures thereof.


An individual is not limited to a human being but may also be other organisms including but not limited to a mammal, invertebrate, plant, fungus, virus, bacteria, or one or more cells derived from any of the above.


As used herein the term “comprising” means “including”. Variations of the word “comprising”, such as “comprise” and “comprises”, have correspondingly varied meanings.


Throughout this disclosure, various aspects of this invention can be presented in a range format. It should be understood that the description in range format is merely for convenience and brevity and should not be construed as an inflexible limitation on the scope of the invention. Accordingly, the description of a range should be considered to have specifically disclosed all the possible subranges as well as individual numerical values within that range. For example, description of a range such as from 1 to 6 should be considered to have specifically disclosed subranges such as from 1 to 3, from 1 to 4, from 1 to 5, from 2 to 4, from 2 to 6, from 3 to 6 etc., as well as individual numbers within that range, for example, 1, 2, 3, 4, 5, and 6. This applies regardless of the breadth of the range.


As used herein, the term “histologic grade” or “tumor grade” refers to characteristics of tumors classified according to the Elston-Ellis system of grading tumors.


As used herein, “p53 status” refers to the mutational status of the p53 gene. A p53 mutant tumor contains a mutation in the p53 gene that alters the function of the protein. A p53 wildtype tumor contains no detectable mutation in the p53 gene.


As used herein “Disease-specific survival” or DSS is a survival assessment where the end point being examined is death because of a disease, for example, breast cancer.


As used herein, “Disease-free survival” or DFS is a survival assessment where the end points are either tumor recurrence (i.e., the cancer comes back as the consequence of distant metastasis to other sites in the body) or death because of breast cancer without evidence of distant metastasis.
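As a non-limiting illustration of how the two end points defined above differ, the following sketch derives an event indicator for each from a hypothetical follow-up record. The record layout and field names are assumptions made for the example only.

```python
from dataclasses import dataclass


@dataclass
class FollowUp:
    """Hypothetical follow-up record for one patient."""
    distant_recurrence: bool        # cancer returned via distant metastasis
    died_of_disease: bool           # death attributed to the cancer
    died_other_cause: bool = False  # death unrelated to the cancer


def dss_event(record):
    """Disease-specific survival: the end point is death because of the disease."""
    return record.died_of_disease


def dfs_event(record):
    """Disease-free survival: the end point is recurrence or death from the disease."""
    return record.distant_recurrence or record.died_of_disease


# A patient who relapsed but is still alive counts as a DFS event only.
relapsed_alive = FollowUp(distant_recurrence=True, died_of_disease=False)
```

Under these assumptions, a death from an unrelated cause counts as an event for neither end point.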


As used herein, an “array” is an intentionally created collection of molecules which can be prepared either synthetically or biosynthetically. The molecules in the array can be identical or different from each other. The array can assume a variety of formats, e.g., libraries of soluble molecules; libraries of compounds tethered to resin beads, silica chips, or other solid supports.


As used herein, a “nucleic acid library or array” is an intentionally created collection of nucleic acids which can be prepared either synthetically or biosynthetically in a variety of different formats (e.g., libraries of soluble molecules; and libraries of oligonucleotides tethered to resin beads, silica chips, or other solid supports). Additionally, the term “array” is meant to include those libraries of nucleic acids which can be prepared by spotting nucleic acids of essentially any length (e.g., from 1 to about 1000 nucleotide monomers in length) onto a substrate. The term “nucleic acid” as used herein refers to a polymeric form of nucleotides of any length, either ribonucleotides, deoxyribonucleotides or peptide nucleic acids (PNAs), that comprise purine and pyrimidine bases, or other natural, chemically or biochemically modified, non-natural, or derivatized nucleotide bases. The backbone of the polynucleotide can comprise sugars and phosphate groups, as may typically be found in RNA or DNA, or modified or substituted sugar or phosphate groups. A polynucleotide may comprise modified nucleotides, such as methylated nucleotides and nucleotide analogs. The sequence of nucleotides may be interrupted by non-nucleotide components. Thus the terms nucleoside, nucleotide, deoxynucleoside and deoxynucleotide generally include analogs such as those described herein. These analogs are those molecules having some structural features in common with a naturally occurring nucleoside or nucleotide such that when incorporated into a nucleic acid or oligonucleotide sequence, they allow hybridization with a naturally occurring nucleic acid sequence in solution. Typically, these analogs are derived from naturally occurring nucleosides and nucleotides by replacing and/or modifying the base, the ribose or the phosphodiester moiety. 
The changes can be tailor-made to stabilize or destabilize hybrid formation or enhance the specificity of hybridization with a complementary nucleic acid sequence as desired.


As used herein, the term “complementary” refers to the hybridization or base pairing between nucleotides or nucleic acids, such as, for instance, between the two strands of a double stranded DNA molecule or between an oligonucleotide primer and a primer binding site on a single stranded nucleic acid to be sequenced or amplified. Complementary nucleotides are, generally, A and T (or A and U), or C and G. Two single stranded RNA or DNA molecules are said to be complementary when the nucleotides of one strand, optimally aligned and compared and with appropriate nucleotide insertions or deletions, pair with at least about 80% of the nucleotides of the other strand, usually at least about 90% to 95%, and more preferably from about 98 to 100% of the nucleotides of the other strand. Alternatively, complementarity exists when an RNA or DNA strand will hybridize under selective hybridization conditions to its complement. Typically, selective hybridization will occur when there is at least about 65% complementarity over a stretch of at least 14 to 25 nucleotides, preferably at least about 75%, and more preferably at least about 90% complementarity.
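For illustration only, the percentage-based definition above might be checked as in the following sketch. It assumes two strands of equal length, both written 5' to 3', aligned antiparallel without insertions or deletions; the optimal alignment with gaps contemplated by the definition is not implemented. The function names and the default threshold argument are hypothetical.

```python
# Watson-Crick pairs named in the definition: A-T (or A-U) and C-G.
PAIRS = {("A", "T"), ("T", "A"), ("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}


def percent_complementary(strand_a, strand_b):
    """Percentage of positions in strand_a that pair with strand_b.

    Both strands are given 5'->3', so strand_b is reversed before comparison.
    """
    if len(strand_a) != len(strand_b):
        raise ValueError("this sketch only handles strands of equal length")
    paired = sum(
        (x, y) in PAIRS
        for x, y in zip(strand_a.upper(), reversed(strand_b.upper()))
    )
    return 100.0 * paired / len(strand_a)


def is_complementary(strand_a, strand_b, threshold=80.0):
    """Apply the 'at least about 80%' criterion as a hard cutoff (an assumption)."""
    return percent_complementary(strand_a, strand_b) >= threshold


perfect = percent_complementary("AAGGCTTCAG", "CTGAAGCCTT")       # exact reverse complement
one_mismatch = percent_complementary("AAGGCTTCAG", "CTGAAGCCTA")  # last base altered
```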


As used herein, a “fragment,” “segment,” or “DNA segment” refers to a portion of a larger DNA polynucleotide or DNA. A polynucleotide, for example, can be broken up, or fragmented into, a plurality of segments. Various methods of fragmenting nucleic acids are well known in the art. These methods may be, for example, either chemical or physical in nature. Chemical fragmentation may include partial degradation with a DNase; partial depurination with acid; the use of restriction enzymes; intron-encoded endonucleases; DNA-based cleavage methods, such as triplex and hybrid formation methods, that rely on the specific hybridization of a nucleic acid segment to localize a cleavage agent to a specific location in the nucleic acid molecule; or other enzymes or compounds which cleave DNA at known or unknown locations. Physical fragmentation methods may involve subjecting the DNA to a high shear rate. High shear rates may be produced, for example, by moving DNA through a chamber or channel with pits or spikes, or forcing the DNA sample through a restricted size flow passage, e.g., an aperture having a cross sectional dimension in the micron or submicron scale. Other physical methods include sonication and nebulization. Combinations of physical and chemical fragmentation methods may likewise be employed such as fragmentation by heat and ion-mediated hydrolysis. See for example, Sambrook et al., “Molecular Cloning: A Laboratory Manual,” 3rd Ed. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (2001) (“Sambrook et al.”) which is incorporated herein by reference for all purposes. These methods can be optimized to digest a nucleic acid into fragments of a selected size range. Useful size ranges may be from 100, 200, 400, 700 or 1000 to 500, 800, 1500, 2000, 4000 or 10,000 base pairs. However, larger size ranges such as 4000, 10,000 or 20,000 to 10,000, 20,000 or 500,000 base pairs may also be useful.


As used herein, the term “hybridization” refers to the process in which two single-stranded polynucleotides bind non-covalently to form a stable double-stranded polynucleotide. The term “hybridization” may also refer to triple-stranded hybridization. The resulting (usually) double-stranded polynucleotide is a “hybrid.” The proportion of the population of polynucleotides that forms stable hybrids is referred to herein as the “degree of hybridization”. Hybridization conditions will typically include salt concentrations of less than about 1M, more usually less than about 500 mM and less than about 200 mM. Hybridization temperatures can be as low as 5° C., but are typically greater than 22° C., more typically greater than about 30° C., and preferably in excess of about 37° C. Hybridizations are usually performed under stringent conditions, i.e. conditions under which a probe will hybridize to its target subsequence. Stringent conditions are sequence-dependent and are different in different circumstances. Longer fragments may require higher hybridization temperatures for specific hybridization. As other factors may affect the stringency of hybridization, including base composition and length of the complementary strands, presence of organic solvents and extent of base mismatching, the combination of parameters is more important than the absolute measure of any one alone. Generally, stringent conditions are selected to be about 5° C. lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH. The Tm is the temperature (under defined ionic strength, pH and nucleic acid composition) at which 50% of the probes complementary to the target sequence hybridize to the target sequence at equilibrium.
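By way of example only, the rule of selecting stringent conditions about 5° C. below the Tm might be sketched as follows. The Tm here is estimated with the Wallace rule (2° C. per A or T, 4° C. per G or C), a rough approximation commonly applied to short oligonucleotides; it is an assumption introduced for this example, it ignores the ionic strength and pH on which the Tm defined above depends, and it is not asserted to be the estimate used in any embodiment. The function names and the probe sequence are hypothetical.

```python
def wallace_tm_celsius(oligo):
    """Approximate Tm of a short DNA oligonucleotide using the Wallace rule."""
    seq = oligo.upper()
    at_count = seq.count("A") + seq.count("T")
    gc_count = seq.count("G") + seq.count("C")
    return 2.0 * at_count + 4.0 * gc_count


def stringent_temperature_celsius(oligo, offset=5.0):
    """Temperature about `offset` degrees C below the estimated Tm."""
    return wallace_tm_celsius(oligo) - offset


# Hypothetical 16-mer probe with 8 A/T and 8 G/C bases.
probe = "AAGGCTTCAGTCAGGT"
tm = wallace_tm_celsius(probe)
hyb_temp = stringent_temperature_celsius(probe)
```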


Typically, stringent conditions include salt concentration of at least 0.01 M to no more than 1 M Na ion concentration (or other salts) at a pH 7.0 to 8.3 and a temperature of at least 25° C. For example, conditions of 5× SSPE (750 mM NaCl, 50 mM NaPhosphate, 5 mM EDTA, pH 7.4) and a temperature of 25-30° C. are suitable for allele-specific probe hybridizations. For stringent conditions, see for example, Sambrook, Fritsch and Maniatis, “Molecular Cloning: A Laboratory Manual” 2nd Ed. Cold Spring Harbor Press (1989) and Anderson “Nucleic Acid Hybridization” 1st Ed., BIOS Scientific Publishers Limited (1999), which are hereby incorporated by reference in their entireties for all purposes.


As used herein, “hybridization probes” are nucleic acids (such as oligonucleotides) capable of binding in a base-specific manner to a complementary strand of nucleic acid. Such probes include peptide nucleic acids, as described in Nielsen et al., Science 254:1497-1500 (1991), Nielsen Curr. Opin. Biotechnol., 10:71-75 (1999) and other nucleic acid analogs and nucleic acid mimetics.


As used herein, “mRNA” or “mRNA transcripts” include, but are not limited to pre-mRNA transcript(s), transcript processing intermediates, mature mRNA(s) ready for translation and transcripts of the gene or genes, or nucleic acids derived from the mRNA transcript(s). Transcript processing may include splicing, editing and degradation. As used herein, a nucleic acid derived from an mRNA transcript refers to a nucleic acid for whose synthesis the mRNA transcript or a subsequence thereof has ultimately served as a template. Thus, a cDNA reverse transcribed from an mRNA, a cRNA transcribed from that cDNA, a DNA amplified from the cDNA, an RNA transcribed from the amplified DNA, etc., are all derived from the mRNA transcript and detection of such derived products is indicative of the presence and/or abundance of the original transcript in a sample. Thus, mRNA derived samples include, but are not limited to, mRNA transcripts of the gene or genes, cDNA reverse transcribed from the mRNA, cRNA transcribed from the cDNA, DNA amplified from the genes, RNA transcribed from amplified DNA, and the like.


As used herein, a “probe” is a molecule that can be recognized by a particular target. In some embodiments, a probe can be surface immobilized. Examples of probes that can be investigated by this invention include, but are not restricted to, agonists and antagonists for cell membrane receptors, toxins and venoms, viral epitopes, hormones (e.g. opioid peptides, steroids, etc.), hormone receptors, peptides, enzymes, enzyme substrates, cofactors, drugs, lectins, sugars, oligonucleotides, nucleic acids, oligosaccharides, proteins, and monoclonal antibodies.


As used herein, a “target” is a molecule that has an affinity for a given probe. Targets may be naturally-occurring or man-made molecules. Also, they can be employed in their unaltered state or as aggregates with other species. Targets may be attached, covalently or noncovalently, to a binding member, either directly or via a specific binding substance. Examples of targets which can be employed by this invention include, but are not restricted to, antibodies, cell membrane receptors, monoclonal antibodies and antisera reactive with specific antigenic determinants (such as on viruses, cells or other materials), drugs, oligonucleotides, nucleic acids, peptides, cofactors, lectins, sugars, polysaccharides, cells, cellular membranes, and organelles. Targets are sometimes referred to in the art as anti-probes. As the term targets is used herein, no difference in meaning is intended. A “Probe Target Pair” is formed when two macromolecules have combined through molecular recognition to form a complex.




BRIEF DESCRIPTION OF THE DRAWINGS

The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.


The accompanying drawings, which are incorporated in and form a part of this specification, illustrate embodiments of the invention and, together with the description, serve to explain the disclosed principles of the invention:



FIG. 1 shows hierarchical clustering of 257 tumors using the top 250 genes statistically correlated with p53 status for use in one disclosed embodiment of the invention.



FIG. 2 shows optimization and results of a gene classifier for p53 status in accordance with a disclosed embodiment of the invention.



FIG. 3 shows that genes of the classifier can predict p53 status in independent cDNA microarray datasets in accordance with a disclosed embodiment of the invention.



FIG. 4 shows that the p53 classifier has greater prognostic significance than p53 mutation status alone in accordance with a disclosed embodiment of the invention.



FIG. 5 shows that the p53 classifier has strong prognostic significance in an independent dataset of late-stage tumors in accordance with a disclosed embodiment of the invention.



FIG. 6 shows that the p53 classifier has greater prognostic significance than p53 mutation status in endocrine-treated patients in accordance with a disclosed embodiment of the invention.



FIG. 7 shows that the p53 classifier is prognostic of distant recurrence in an independent set of early-stage locally-treated breast tumors in accordance with a disclosed embodiment of the invention.



FIG. 8 shows that transcript levels of p53, its transcriptional targets, and its upstream effectors distinguish known and predicted classes in accordance with a disclosed embodiment of the invention.




DETAILED DESCRIPTION

The present invention has many preferred embodiments and relies on many patents, applications and other references for details known to those of skill in the art. Therefore, when a patent, application, or other reference is cited or repeated below, it should be understood that it is incorporated by reference in its entirety for all purposes as well as for the proposition that is recited.


Embodiments of the disclosed methods, systems, and compositions for classification, prognosis, and diagnosis of cancers will now be described. These methods, systems, and compositions provide a more useful measure of in vivo p53 functionality and thereby provide a better prognostic indicator of patient outcome as compared to p53 mutation status alone. Other advantages inherent in the disclosed embodiments of the methods, systems, and compositions will be apparent from the following description.


p53 mutations in cancer development and progression can result in trans-dominant suppression of the wild-type p53 allele conferring loss of p53 activity or an oncogenic gain of function independent of wildtype p53. Additionally, the altered activity of some effectors of p53 function, including those that directly influence p53 expression, may contribute to p53 deficiency recapitulating the p53-mutant phenotype. In breast cancer, these effects manifest in more aggressive tumors, therapeutic resistance, and poor clinical outcome.


In accordance with providing a more useful measure of in vivo p53 functionality, disclosed herein is a “p53 classifier”, an expression signature deduced from differences in the molecular configurations of p53 wildtype and mutant tumors. The classifier may comprise a defined number of genes, for example, at least 3 genes. In other embodiments, the classifier may comprise from about 3 genes to about 500 genes. Table 1 provides a listing of the 500 genes. In some embodiments, an optimized p53 classifier comprises 32 genes (Table 2). The optimized 32-gene classifier could distinguish p53 mutant and wildtype tumors with significant accuracy and could predict recurrence and survival in populations representing all therapeutic groups. Moreover, the p53 classifier was a more significant predictor of survival than p53 mutation status alone and remained significant by multivariate analysis independent of other clinical predictors where p53 mutation status did not. Furthermore, downregulation of p53 expression in the absence of mutations was sufficient to induce a mutant (mt) phenotype tumor behaviour in both transcriptional activity and clinical outcome.


In independent datasets of both breast and liver cancers, and regardless of other clinical features, subsets of the optimized p53 classifier could predict p53 status with significant accuracy. As a predictor of disease-specific survival (DSS), the classifier significantly outperformed p53 mutational status alone in both a large patient cohort with heterogeneous treatment, as well as in a set of patients who received postoperative adjuvant endocrine therapy alone.


Moreover, in an independent cDNA microarray study comprised mostly of stage 3 patients who received chemotherapy in the neoadjuvant setting, a 9-gene subset of the p53 classifier was a highly significant predictor of both disease-specific and disease-free survival. The genes of the p53 classifier could accurately discern not only which patients would relapse and die following chemotherapy, but also which late stage patients would survive their cancer.


A 21-gene subset of the classifier could also significantly distinguish molecular subgroups of early-stage radiation-treated patients who would go on to develop a distant metastasis within 5 years from those who would not.


Therefore, by defining, among other aspects, the p53 classifier described herein, the methods, systems and compositions of the present invention demonstrate a much greater impact of p53 on human tumor behaviour than previously appreciated and thereby provide a better approach for clinically assessing p53 function.


One aspect of the present invention provides a method for predicting disease outcome in a patient, the method comprising the steps of: obtaining gene expression profiles from a plurality of genes from tumor samples, wherein said tumor samples may be mutant or wildtype for the p53 gene; comparing said gene expression profiles to determine which genes are differentially expressed in the mutant or wildtype tumors; deriving from said differentially expressed genes a set of genes to predict p53 mutational status; and using the set of genes to predict disease outcome in the patient. The disease outcome may be selected from the group consisting of disease-specific survival, disease-free survival, tumor recurrence and therapeutic response. The disease may be any cancer but is preferably breast cancer or liver cancer.


The predicted p53 mutational status may be obtained by ranking the differentially expressed genes according to their association with p53 mutational status, ER (estrogen receptor) status and histologic grade of the tumor. A multivariate ranking procedure such as a Linear Model Fit may be employed to rank the genes. The ranked genes may be subjected to supervised learning to enable them to distinguish between mutant and wildtype gene expression profiles. An example of a supervised learning method that may be employed is Diagonal Linear Discriminant Analysis (DLDA).
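As a minimal, non-limiting sketch of the supervised learning step, Diagonal Linear Discriminant Analysis may be written as follows. The sketch assumes equal class priors and per-gene variances pooled across classes, and assigns a tumor to the class whose mean profile is nearest in variance-scaled squared distance. The function names and toy expression values are hypothetical, and no particular software implementation is implied.

```python
def fit_dlda(samples, labels):
    """Estimate class mean profiles and pooled per-gene variances.

    samples: list of per-tumor lists of expression values (same gene order)
    labels:  list of class labels, one per tumor
    """
    classes = sorted(set(labels))
    n_genes = len(samples[0])
    means = {}
    pooled_ss = [0.0] * n_genes  # within-class sums of squares per gene
    for c in classes:
        rows = [s for s, lab in zip(samples, labels) if lab == c]
        mu = [sum(r[j] for r in rows) / len(rows) for j in range(n_genes)]
        means[c] = mu
        for r in rows:
            for j in range(n_genes):
                pooled_ss[j] += (r[j] - mu[j]) ** 2
    dof = len(samples) - len(classes)
    # Assumes every gene varies within at least one class (no zero variance).
    variances = [ss / dof for ss in pooled_ss]
    return means, variances


def predict_dlda(model, sample):
    """Return the class with the smallest variance-scaled squared distance."""
    means, variances = model

    def distance(c):
        return sum((x - m) ** 2 / v for x, m, v in zip(sample, means[c], variances))

    return min(means, key=distance)


# Made-up training data: two hypothetical genes measured in six tumors.
train = [[5.1, 3.0], [5.3, 3.4], [4.9, 2.9], [2.0, 3.1], [2.2, 3.3], [1.9, 3.0]]
status = ["mutant", "mutant", "mutant", "wildtype", "wildtype", "wildtype"]
model = fit_dlda(train, status)
call_1 = predict_dlda(model, [5.0, 3.1])
call_2 = predict_dlda(model, [2.1, 3.2])
```

Because the covariance matrix is restricted to its diagonal, each gene contributes independently to the distance, which keeps the number of estimated parameters small relative to the number of tumors.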


In some embodiments, the set of genes with the ability to predict p53 mutational status may comprise at least 3 genes, preferably about 3-500 genes and most preferably about 32 genes. The 32 genes making up the optimized p53 classifier may be selected from the group comprising the list of genes in Table 1. In some embodiments, the 32 genes may include GenBank accession numbers: AI961235 (SEQ ID NO: 23), BG271923 (SEQ ID NO: 22), NM002466 (SEQ ID NO: 31), BC001651 (SEQ ID NO: 14), D38553 (SEQ ID NO: 11), AK000345 (SEQ ID NO: 26), AA742697 (SEQ ID NO: 21), AL080170 (SEQ ID NO: 30), BF245284 (SEQ ID NO: 27), BC004504 (SEQ ID NO: 8), H15261 (SEQ ID NO: 2), NM000909 (SEQ ID NO: 9), NM024843 (SEQ ID NO: 1), R73030 (SEQ ID NO: 29), NM030896 (SEQ ID NO: 17), AI435828 (SEQ ID NO: 20), AL512727 (SEQ ID NO: 6), AW242997 (SEQ ID NO: 18), AI810764 (SEQ ID NO: 24), AI922323 (SEQ ID NO: 10), AL360204 (SEQ ID NO: 13), NM003225 (SEQ ID NO: 32), NM003226 (SEQ ID NO: 28), AW299538 (SEQ ID NO: 5), NM003462 (SEQ ID NO: 16), AI990465 (SEQ ID NO: 25), NM004392 (SEQ ID NO: 15), NM001267 (SEQ ID NO: 7), AF269087 (SEQ ID NO: 4), AI826437 (SEQ ID NO: 3), AL355392 (SEQ ID NO: 12), and AU156421 (SEQ ID NO: 19).


The present invention also provides a method for predicting disease outcome in a late-stage breast cancer patient, the method comprising the steps of obtaining gene expression profiles from a plurality of genes from tumor samples, wherein said tumor samples may be mutant or wildtype for the p53 gene; comparing said gene expression profiles to determine which genes are differentially expressed in the mutant or wildtype tumors; deriving from said differentially expressed genes a set of genes to predict p53 mutational status; and using the set of genes to predict disease outcome in the late-stage breast cancer patient wherein the set of genes are selected from the group consisting of GenBank accession numbers: BG271923, NM002466, D38553, NM000909, NM024843, R73030, NM003226, AW299538 and AI990465. All GenBank accession numbers are associated with a sequence and a SEQ ID NO. as shown in FIGS. 9-508.


The present invention also provides a method for predicting clinical outcome in an early-stage, locally-treated breast cancer patient, the method comprising the steps of: obtaining gene expression profiles from a plurality of genes from tumor samples, wherein said tumor samples may be mutant or wildtype for the p53 gene; comparing said gene expression profiles to determine which genes are differentially expressed in the mutant or wildtype tumors; deriving from said differentially expressed genes a set of genes to predict p53 mutational status; and using the set of genes to predict disease outcome in the early-stage, locally-treated breast cancer patient wherein the set of genes are selected from the group consisting of GenBank accession numbers: AI961235, BG271923, NM002466, BC001651, D38553, AK000345, BC004504, NM000909, NM024843, R73030, AI435828, AI810764, AI922323, NM003225, NM003226, AW299538, NM003462, AI990465, NM004392, NM001267 and AI826437.


The present invention also provides a method for predicting clinical outcome in a liver cancer patient, the method comprising the steps of: obtaining gene expression profiles from a plurality of genes from tumor samples, wherein said tumor samples may be mutant or wildtype for the p53 gene; comparing said gene expression profiles to determine which genes are differentially expressed in the mutant or wildtype tumors; deriving from said differentially expressed genes a set of genes to predict p53 mutational status; and using the set of genes to predict disease outcome in the liver cancer patient wherein the set of genes are selected from the group consisting of GenBank accession numbers: NM002466, BC001651, D38553, NM024843, AI435828, AI810764, NM003226 and AW299538.


The present invention also provides a method of identifying a group of genes for predicting disease outcome in a patient, the method comprising the steps of: obtaining gene expression profiles from a plurality of genes from tumor samples, wherein said tumor samples may be mutant or wildtype for the p53 gene; comparing said gene expression profiles to determine which genes are differentially expressed in the mutant or wildtype tumors; ranking the differentially expressed genes according to their ability to predict p53 mutational status; training the ranked genes to distinguish between mutant and wildtype p53 gene expression profiles; obtaining a p53 classifier including a set of genes capable of predicting p53 mutational status; validating the p53 classifier in independent datasets; and assessing the ability of the p53 classifier to predict disease outcome in the patient.


In the above-disclosed method of identifying a group of genes for predicting disease outcome in a patient, the differentially expressed genes may be ranked by a multivariate ranking procedure according to their association with p53 status, ER (estrogen receptor) status and histologic grade of the tumor. The multivariate ranking procedure may be a Linear Model-Fit method or any other method known to one of skill in the art. The step of training may comprise employing a supervised learning method, such as Diagonal Linear Discriminant Analysis (DLDA) or any other supervised learning method known to one of skill in the art.


The p53 classifier disclosed above may comprise at least 3 genes, preferably between about 3-500 genes and more preferably about 32 genes. This 32-gene p53 classifier is an “optimized classifier” which may include genes selected from the group consisting of GenBank accession numbers: AI961235, BG271923, NM002466, BC001651, D38553, AK000345, AA742697, AL080170, BF245284, BC004504, H15261, NM000909, NM024843, R73030, NM030896, AI435828, AL512727, AW242997, AI810764, AI922323, AL360204, NM003225, NM003226, AW299538, NM003462, AI990465, NM004392, NM001267, AF269087, AI826437, AL355392 and AU156421.


The disease outcome may be selected from the group consisting of disease-specific survival, disease-free survival, tumor recurrence and therapeutic response. In one disclosed embodiment, a 9-gene partial classifier may predict clinical outcome in a late-stage breast cancer patient. The 9-gene partial classifier may include genes selected from the group consisting of GenBank accession numbers: BG271923, NM002466, D38553, NM000909, NM024843, R73030, NM003226, AW299538 and AI990465.


In another disclosed embodiment, a 21-gene partial classifier may predict clinical outcome in an early-stage, locally-treated breast cancer patient. The 21-gene partial classifier may include genes selected from the group consisting of GenBank accession numbers: AI961235, BG271923, NM002466, BC001651, D38553, AK000345, BC004504, NM000909, NM024843, R73030, AI435828, AI810764, AI922323, NM003225, NM003226, AW299538, NM003462, AI990465, NM004392, NM001267 and AI826437.


In yet another disclosed embodiment, an 8-gene partial classifier may predict clinical outcome in a liver cancer patient. The 8-gene partial classifier may include genes selected from the group consisting of GenBank accession numbers: NM002466, BC001651, D38553, NM024843, AI435828, AI810764, NM003226 and AW299538.


The present invention also provides a computer system for predicting disease outcome in a patient, the computer system comprising: a computer having a processor and a memory, the memory having executable code stored thereon for execution by the processor for performing the steps of: obtaining gene expression profiles from a plurality of genes from tumor samples, wherein said tumor samples may be mutant or wildtype for the p53 gene; comparing said gene expression profiles to determine which genes are differentially expressed in the mutant or wildtype tumors; deriving from said differentially expressed genes a set of genes to predict p53 mutational status; and using the set of genes to predict disease outcome in the patient.


The present invention also provides a diagnostic tool for predicting disease susceptibility in a patient comprising a plurality of genes capable of predicting p53 mutational status immobilized on a solid support. The solid support may be a microarray, for example. In one embodiment, the plurality of genes immobilized on the solid support may include genes selected from the group consisting of GenBank accession numbers: AI961235, BG271923, NM002466, BC001651, D38553, AK000345, AA742697, AL080170, BF245284, BC004504, H15261, NM000909, NM024843, R73030, NM030896, AI435828, AL512727, AW242997, AI810764, AI922323, AL360204, NM003225, NM003226, AW299538, NM003462, AI990465, NM004392, NM001267, AF269087, AI826437, AL355392 and AU156421. In another embodiment, the plurality of genes immobilized on the solid support may include genes selected from the group consisting of GenBank accession numbers: BG271923, NM002466, D38553, NM000909, NM024843, R73030, NM003226, AW299538 and AI990465. In yet another embodiment, the plurality of genes immobilized on the solid support may include genes selected from the group consisting of GenBank accession numbers: AI961235, BG271923, NM002466, BC001651, D38553, AK000345, BC004504, NM000909, NM024843, R73030, AI435828, AI810764, AI922323, NM003225, NM003226, AW299538, NM003462, AI990465, NM004392, NM001267 and AI826437. In a still further embodiment, the plurality of genes immobilized on the solid support may include genes selected from the group consisting of GenBank accession numbers: NM002466, BC001651, D38553, NM024843, AI435828, AI810764, NM003226 and AW299538.


The present invention also provides a nucleic acid array for predicting disease susceptibility in a patient comprising a solid support and displayed thereon nucleic acid probes corresponding to genes capable of predicting p53 mutational status in the patient. The nucleic acid array may comprise at least 8, 32, 100, 250 or 500 nucleic acid probes.


Thus, the disclosed methods, systems and compositions are capable of discerning p53-deficient from p53-enabled breast tumors and may be effective in gauging p53 activity in other cancer types. As many as 14% of breast tumors that are otherwise p53 wildtype at the DNA sequence level may be deficient for p53 by other means. Moreover, the classifier is a significant predictor of disease-specific survival and recurrence in various breast cancer populations and therefore will have clinical utility in predicting these endpoints, particularly in the context of therapeutic agents that function predominantly through p53-dependent cell death pathways.


EXAMPLES
Example 1
The Molecular Configurations of p53 Mutant and p53 Wildtype Tumors are Distinct

To gain insight into the molecular variation between p53 mutant (mt) and p53 wildtype (wt) breast tumors, high-density oligonucleotide microarrays were utilized to analyze a population-based series of 257 biopsies, all of which were previously sequenced for mutations in the p53 coding regions (Bergh, J., Norberg, T., Sjogren, S., Lindgren, A. & Holmberg, L. Complete sequencing of the p53 gene provides prognostic information in breast cancer patients, particularly in relation to adjuvant systemic therapy and radiotherapy. Nat Med 1, 1029-34 (1995), incorporated herein by reference).


The original patient material consisted of freshly frozen breast tumors from a population-based cohort of 315 women representing 65% of all breast cancers resected in Uppsala County during the time period Jan. 1, 1987 to Dec. 31, 1989 (Bergh et al., previously incorporated by reference). After surgery, the viable part of the fresh tumor was cut in two; one part was immediately frozen in isopentane and stored at −70° C. until analysis, and the other was fixed in 10% formalin and prepared for histopathologic examination. Frozen tumor tissue was available from 299 of the original 315 patients. Out of these, 270 had RNA of sufficient quantity and quality for microarray experiments, and after Affymetrix quality control, expression profiles of 260 tumors were further analysed. The present study was approved by the ethical committee at the Karolinska Institute.


Mutational analysis of the p53 gene (TP53) was carried out in the original 315 tumors as described previously in Bergh et al. (previously incorporated by reference). Among the 260 tumors included in the present study, 59 had p53 mutations found by cDNA sequence analysis of exons 2 to 11 (Bergh et al., previously incorporated by reference). In three samples p53 status could not be evaluated. Clinico-pathological characteristics were derived from the patient records and from routine clinical measurements at the time of diagnosis. Estrogen receptor status was determined by ligand binding assay as part of the routine clinical procedure. An experienced pathologist determined the Elston-Ellis grades of the tumors, classifying the tumors into low, medium and high-grade tumors (Elston, C. W. & Ellis, I. O. Pathological prognostic factors in breast cancer. I. The value of histological grade in breast cancer: experience from a large study with long-term follow-up. Histopathology 19, 403-10 (1991), incorporated herein by reference). Axillary lymph node metastases were found in 84 of these 260 patients while 166 were node-negative. Ten patients had unknown node status, as no axillary examination was performed due to advanced age or concomitant serious disease. Systemic adjuvant therapy was offered to all node-positive patients. In general, premenopausal women were offered chemotherapy and postmenopausal women received endocrine treatment. Out of the 260 patients included in the present study, 149 did not receive adjuvant therapy. Overall survival of the patients was based on information from the Swedish population registry, and date and cause of death were obtained from a review of the patient records in late 1999.


RNA from 59 tumors known to contain p53 mutations resulting in amino acid-level alterations, and from 198 tumors known to have wildtype p53 were analyzed on Affymetrix U133A and U133B arrays.


Extraction of total RNA was carried out using the Qiagen RNeasy Mini Kit (Qiagen, Germany). Frozen tumors were cut into small pieces and homogenized for around 30-40 seconds in test tubes (maximum 40 mg/tube) containing RLT buffer (RNeasy lysis buffer) with mercaptoethanol. The mixtures were then treated with Proteinase K for 10 minutes at 55° C., which in previous RNA extractions demonstrated improved RNA yield (Egyhazi, S. et al. Proteinase K added to the extraction procedure markedly increases RNA yield from primary breast tumors for use in microarray studies. Clin Chem 50, 975-6 (2004), incorporated herein by reference). In the following centrifugation steps on RNeasy columns, DNase treatment was also included to increase the RNA quality. The integrity of the RNA extracts was tested on an Agilent 2100 Bioanalyzer (Agilent Technologies, Rockville, Md., U.S.A), measuring the 28S:18S ribosomal RNA ratio. RNA extracts of high quality were stored at −70° C. until microarray analysis.


Preparation of in vitro transcription (IVT) products (i.e., target) and oligonucleotide array hybridization and scanning were performed according to the Affymetrix protocol (Affymetrix Inc., Santa Clara, Calif., U.S.A). First-strand cDNA was synthesized from a starting amount of 2-5 μg total RNA using a T7-linked oligo-dT primer, followed by second-strand synthesis. Double-stranded cDNA was purified using phenol/chloroform extraction and phase lock gel. Biotinylated cRNA targets were prepared from the cDNA templates in IVT reactions. The labeled cRNA targets were purified using Qiagen RNeasy Mini Kit and subsequently chemically fragmented. Ten μg of the fragmented, biotinylated cRNA was hybridized to the Affymetrix oligonucleotide human array set, HG-U133A&B, which contains 45,000 probe sets representing more than 39,000 transcripts derived from approximately 33,000 well-substantiated human genes. Hybridization was carried out in a hybridization oven at 45° C. and rotation was set at 60 rpm for 16 h. The arrays were washed and stained in the Fluidics Station 400 (Affymetrix Inc., Santa Clara, Calif., U.S.A) in accordance with the Affymetrix protocol. Staining was carried out using streptavidin-phycoerythrin (SAPE, final concentration of 10 μg/ml) and signal amplification with a biotinylated anti-streptavidin antibody and a second SAPE staining. The arrays were washed and scanned according to the manufacturer's instructions.


The raw expression data was processed using Microarray Suite 5.0 software (Affymetrix Inc., Santa Clara, Calif., U.S.A) and normalized using the global mean method. For each microarray, probeset signal values were scaled by adjusting the mean log intensity to a target signal value of 500. Samples with suboptimal average signal intensities were re-labeled and re-hybridized on new arrays. If microarray artifacts were visible, the samples were re-hybridized on new chips using the same fragmented probe, or alternatively, if the defective areas were small, the affected probes were censored from further analysis. The normalized expression data from both U133A and B chips were combined and natural log transformed.
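The scaling and log-transformation step described above can be illustrated with a minimal sketch. It is not the Microarray Suite 5.0 implementation: it assumes that "global mean" scaling means multiplying every probe-set signal on an array by one factor so that the arithmetic mean equals the target value of 500, and that all signals are positive. The function name `scale_and_log` is hypothetical.

```python
import math

def scale_and_log(signals, target=500.0):
    """Scale one array's probe-set signals to a common mean, then take natural logs.

    Assumption: a single multiplicative factor per array, chosen so that the
    arithmetic mean of the scaled signals equals `target`.
    """
    mean_signal = sum(signals) / len(signals)
    factor = target / mean_signal
    return [math.log(s * factor) for s in signals]

# Two hypothetical arrays with different overall brightness end up comparable.
array_a = scale_and_log([100.0, 300.0])
array_b = scale_and_log([250.0, 750.0])
```

After scaling, both hypothetical arrays have a mean signal of 500 before the log transform, so a given probe set can be compared across arrays.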


The extent to which gene expression patterns could distinguish p53 mt and wt tumors was first investigated. By Wilcoxon rank-sum test, 3,330 Affymetrix probe-sets representing ~2,770 distinct genes (according to UniGene build #167) were identified whose expression patterns distinguished p53 mt and wt tumors with a false discovery rate (FDR)-adjusted p value of p<0.001. A number of these genes were found to be known transcriptional targets of p53, including PERP, RRM2, SEMA3B, TAP1, GTSE1, CHEK1, and CHEK2. Shown in FIG. 1 is the result of hierarchical cluster analysis using the top 250 genes, all of which are associated with p53 status with FDR p<5.9×10^−8. As expected from the gene selection criteria, the majority of p53 mt and wt tumors clustered into separate tumor groups. Of two predominant cluster nodes, 90% of the p53 mutants were found in one cluster (i.e., the “mutant-like” cluster), while 77% of p53 wt tumors segregated with the other (the “wildtype-like” cluster).
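The univariate screen described above can be sketched as follows. This is an illustration only: the rank-sum p-value uses the normal approximation without a tie correction, and the FDR adjustment is assumed to be the Benjamini-Hochberg procedure, which the text does not name. The function names are hypothetical.

```python
import math

def rank_sum_p(x, y):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation, no tie correction)."""
    pooled = sorted((v, g) for g, vals in enumerate((x, y)) for v in vals)
    n = len(pooled)
    rank_sum_x = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # tied values share the average of ranks i+1..j
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(x), len(y)
    mean = n1 * (n1 + n2 + 1) / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (rank_sum_x - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2.0))

def benjamini_hochberg(pvals):
    """FDR-adjusted p-values (assumed adjustment method)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

def select_probesets(expr_mt, expr_wt, threshold=0.001):
    """expr_mt / expr_wt map probe-set ID -> expression values in mt / wt tumors."""
    ids = sorted(expr_mt)
    adj = benjamini_hochberg([rank_sum_p(expr_mt[p], expr_wt[p]) for p in ids])
    return [p for p, q in zip(ids, adj) if q < threshold]
```

In practice a statistics library with an exact or tie-corrected test would normally be preferred; the hand-written version is used here only to keep the sketch self-contained.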


The hierarchical structure of the gene expression profiles was next investigated. As in the tumors, two predominant clusters were observed: one consisting of ~200 genes more highly expressed in the mutant-like tumor cluster, and the other representing ~50 genes more highly expressed in the wildtype-like cluster. Within the former, the genes most highly correlated with p53 mutant status were associated with cell cycle progression, including CDC2, CDC20, CCNB1, CCNB2, CKS2, CDCA1, CDCA3, CDCA8, CENPA, TOP2A, PTTG1 and MCM6. This finding is consistent with the observation that wt p53 has a negative regulatory effect on cell cycle genes. Of the genes more highly expressed in the wildtype-like cluster, the presence of several estrogen-regulated and ER status-associated genes including STC2, NCOR1, and ADRA2A was observed.


Further examination of the tumors revealed that in addition to p53 status, the predominant tumor clusters were also correlated with other clinical features, namely estrogen receptor (ER) status and tumor grade. The estrogen receptor status of a cell has been found to be correlated with cancer in several instances. Normal breast cells usually have receptors for estrogen. However, cancer cells arising in the breast do not always have receptors for estrogen. Breast cancers that have estrogen receptors are said to be “estrogen receptor-positive,” while those breast cancers that do not possess estrogen receptors are “estrogen receptor-negative.” In estrogen receptor-positive cancers, cancer cell growth is under the control of estrogen. In contrast, the growth of estrogen receptor-negative cancer cells is not governed by estrogen.



FIG. 1 shows hierarchical clustering of 257 tumors using the top 250 genes statistically correlated with p53 status. Tumors are represented in columns, genes are represented in rows. The degree of color saturation reflects the magnitude of the log expression signal; red hues denote higher expression levels while green hues indicate lower expression levels. The top row of black vertical bars indicates which breast tumors possess p53 mutations. The second row of bars indicates tumors that are ER positive. The third row of bars reflects histologic grade (Elston-Ellis grading system); green bars=grade I, blue bars=grade II, and red bars=grade III.


Segregating with the mutant-like cluster were observed 86% of estrogen receptor-negative (ER−) tumors (pcs=1.7×10^−10), 96% of grade III tumors (pcs=2.5×10^−19) and only 3% of grade I tumors (pfe=6.9×10^−15). This result is due, in part, to the fact that the p53 mutants in this study are positively correlated with ER negativity (pcs=1.7×10^−6) and grade III status (pcs=1.2×10^−11), and is consistent with previous reports demonstrating that p53 mutant breast cancers are significantly correlated with negative ER status and higher tumor grade. See, for example, Cattoretti, G., Rilke, F., Andreola, S., D'Amato, L. & Delia, D. P53 expression in breast cancer. Int J Cancer 41, 178-83 (1988); Isola, J., Visakorpi, T., Holli, K. & Kallioniemi, O. P. Association of overexpression of tumor suppressor protein p53 with rapid cell proliferation and poor prognosis in node-negative breast cancer patients. J Natl Cancer Inst 84, 1109-14 (1992); Andersen, T. I. et al. Prognostic significance of TP53 alterations in breast carcinoma. Br J Cancer 68, 540-8 (1993) and Bhargava, V. et al. The association of p53 immunopositivity with tumor proliferation and other prognostic indicators in breast cancer. Mod Pathol 7, 361-8 (1994), all of which are incorporated herein by reference.


However, it was also observed that among the p53 wt tumors within the mutant-like cluster, there, too, was a significant over-representation of ER− (pcs=2.0×10^−6) and grade III tumors (pfe=7.1×10^−11). Thus, by univariate statistical analysis, a large number of genes highly associated with p53 status have been identified that are capable of segregating tumors in a manner correlated with p53 status, but also histologic grade and ER status.


Example 2
A Gene Expression Classifier for Predicting p53 Deficiency

The finding that a fraction of p53 wt tumors were found to cluster together with the majority of p53 mutants suggests the possibility that these tumors may in fact be p53 deficient through mechanisms other than p53 mutation. Conversely, the discovery of p53 mutants with molecular configurations reminiscent of most wt tumors suggests that these tumors might in fact express functionally intact p53. However, the tumor group assignments in this case were based on genes selected by a univariate ranking procedure that did not account for the association of p53 status with ER and grade status. This raised the possibility that, to some extent, the selected genes included those that are mostly grade and/or ER associated, which may have biased the clustering of the tumors towards these properties rather than p53 status, per se.


Therefore, a robust gene expression-based classifier for predicting p53 status was developed by designing a predictive model including a multivariate linear regression method known as linear model-fit (LMF) for ranking p53 status-correlated genes independent of histologic grade and ER status.



FIG. 2 shows optimization and results of a gene-based classifier for p53 status. Diagonal Linear Discriminant Analysis (DLDA) was employed for the supervised learning of p53 status using gene expression profiles ranked by the Linear Model-Fit method. (A): Analysis of overlap between grade/estrogen receptor (ER)-correlated genes and p53-correlated genes ranked by Wilcoxon rank-sum test or Linear Model fit. The heat maps indicate the number of genes correlated with tumor grade (upper heat map) or ER status (lower heat map) in 100-gene bins (rows) and also correlated with p53 status (columns; ranked in 50-gene bins); p53 correlated genes were ranked by LMF=Linear Model-Fit or WR=Wilcoxon rank-sum; grade correlated genes were ranked by KW=Kruskal-Wallis, and ER correlated genes by WR. (B): The accuracy of the classifier is plotted as a function of the number of genes used to build the classifier; the optimal classifier consisted of 32 genes and misclassified a total of 40 tumors. (C): The results of the classifier applied to the Uppsala dataset (257 tumors) using leave-one-out cross validation. Unigene symbols (build #167), Genbank accession numbers, and Affymetrix probe IDs (A.=U133A; B.=U133B) are shown.


For gene selection, a linear model was fitted to the gene expression data with expression level as the response, and p53 status, ER status and grade status as the predictor variables. As an initial filter for removing genes not well correlated with the predictor variables, all genes whose model-fit p-value was greater than 0.001 were excluded. Using ER and grade as additional predictors allowed for filtering out genes whose expression patterns could be mostly explained by either ER or grade status. When applied, the LMF ranking procedure markedly reduced the rank of many known cell cycle-regulated genes compared to the univariate Wilcoxon rank-sum (WR) method, indicating that these genes are best explained by high grade rather than p53 status (FIG. 2A, upper panel). Conversely, it was observed that ER-associated genes moved up in the top ranked p53-associated genes by LMF, presumably because their lower ranking by WR resulted from a large number of more highly ranked grade-associated genes (FIG. 2A, lower panel).
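A minimal sketch of the multivariate ranking idea follows. For each gene it fits expression as a linear function of p53 status, ER status and grade by ordinary least squares and then orders genes by the absolute p53 coefficient. Several details are assumptions made for illustration: p53 and ER status are coded 0/1, grade is treated as a numeric value, the model-fit p-value filter is omitted, and the function names are hypothetical.

```python
def fit_linear_model(y, predictors):
    """Least-squares fit of y ~ intercept + predictors.

    `predictors` is a list of (p53, ER, grade) tuples, one per tumor.
    Returns [intercept, b_p53, b_ER, b_grade].
    """
    rows = [[1.0] + [float(v) for v in p] for p in predictors]
    k = len(rows[0])
    # Augmented normal equations: (X^T X | X^T y)
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]

def rank_by_p53_coefficient(expression, predictors):
    """Order genes by |p53 coefficient|, largest first.

    `expression` maps gene ID -> list of expression values, one per tumor.
    """
    coefs = {g: fit_linear_model(vals, predictors)[1] for g, vals in expression.items()}
    return sorted(coefs, key=lambda g: abs(coefs[g]), reverse=True)

# Hypothetical toy data: six tumors described by (p53, ER, grade).
tumors = [(0, 0, 1), (1, 0, 3), (0, 1, 1), (1, 1, 2), (0, 1, 2), (1, 0, 2)]
toy = {
    "grade_driven": [5 + 0.1 * p + 2 * g for p, e, g in tumors],
    "p53_driven": [1 + 3 * p for p, e, g in tumors],
}
ranking = rank_by_p53_coefficient(toy, tumors)
```

In this toy case the grade-driven gene varies more overall, yet it ranks below the p53-driven gene because its variation is absorbed by the grade term, which is the effect the multivariate ranking is intended to achieve.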


For class prediction purposes, the genes were ranked in decreasing order of the absolute value of the p53 status coefficient. For building the classifier, a variant of the maximum likelihood method, DLDA (diagonal linear discriminant analysis), was employed. This method had previously been applied to class determination problems using microarray data, as described, for example, in Dudoit, S., Fridlyand, J. & Speed, T. P. Comparison of discrimination methods for the classification of tumors using gene expression data. Journal of the American Statistical Association 97, 77-87 (2002), incorporated herein by reference. The set of predictor genes with greatest classification accuracy was chosen by leave-one-out cross validation.
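The classification step can be illustrated with a small DLDA sketch evaluated by leave-one-out cross validation. It assumes the usual DLDA formulation (per-class gene means, one pooled variance per gene, equal class priors) and shows only how misclassifications are counted for a fixed gene set; repeating the count for gene sets of different sizes would correspond to the selection of the optimal classifier. All function names and the toy data are hypothetical.

```python
def dlda_train(samples, labels):
    """Estimate per-class gene means and pooled per-gene variances."""
    classes = sorted(set(labels))
    n_genes = len(samples[0])
    means = {}
    for c in classes:
        members = [s for s, l in zip(samples, labels) if l == c]
        means[c] = [sum(m[j] for m in members) / len(members) for j in range(n_genes)]
    dof = len(samples) - len(classes)
    variances = [
        sum((s[j] - means[l][j]) ** 2 for s, l in zip(samples, labels)) / dof
        for j in range(n_genes)
    ]
    return means, variances

def dlda_predict(model, x):
    """Assign x to the class with the smallest variance-scaled squared distance."""
    means, variances = model

    def score(c):
        return sum((xj - mj) ** 2 / v for xj, mj, v in zip(x, means[c], variances))

    return min(means, key=score)

def loocv_errors(samples, labels):
    """Count tumors misclassified when each is held out in turn."""
    errors = 0
    for i in range(len(samples)):
        model = dlda_train(samples[:i] + samples[i + 1:], labels[:i] + labels[i + 1:])
        if dlda_predict(model, samples[i]) != labels[i]:
            errors += 1
    return errors

# Hypothetical toy data: two genes, three wildtype and three mutant tumors.
toy_x = [[1.0, 5.0], [1.2, 5.1], [0.9, 4.8], [3.0, 5.0], [3.1, 5.2], [2.8, 4.9]]
toy_y = ["wt", "wt", "wt", "mt", "mt", "mt"]
n_errors = loocv_errors(toy_x, toy_y)
```

Because each gene contributes independently to the score, the classifier has no covariance terms to estimate, which is one reason this kind of model is often used when genes greatly outnumber samples.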


The accuracy of the classifier as a function of the number of genes it comprised is plotted in FIG. 2B. Of particular note was the observation that the accuracy of the tumor classification was highly stable, varying by only 2.7% (i.e., 7 tumors) regardless of whether the classifier comprised 7 genes or 500 genes. Genes in the 500-gene classifier are shown in Table 1 below. The optimal classifier, however, was achieved at 32 genes (Table 2), whereby 40 tumors (15.6%) were misclassified. Twenty-eight of the wt tumors (14%) were classified as mutant-like, while 12 mutants (20%) were misclassified as wildtype-like (FIG. 2C).
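The percentages quoted above follow directly from the tumor counts given in the text (198 wildtype and 59 mutant tumors, 257 in total). The short calculation below simply reproduces that arithmetic; the variable and function names are illustrative.

```python
# Counts reported in the text.
n_wt, n_mt = 198, 59                  # wildtype and mutant tumors profiled
wt_called_mt, mt_called_wt = 28, 12   # misclassified by the 32-gene classifier

def percent(part, whole):
    return 100.0 * part / whole

total = n_wt + n_mt                   # 257 tumors
errors = wt_called_mt + mt_called_wt  # 40 tumors

print(f"overall error: {errors}/{total} = {percent(errors, total):.1f}%")        # 15.6%
print(f"wt called mutant-like: {percent(wt_called_mt, n_wt):.1f}%")              # 14.1%
print(f"mt called wildtype-like: {percent(mt_called_wt, n_mt):.1f}%")            # 20.3%
print(f"accuracy range across classifier sizes: {percent(7, total):.1f}%")       # 2.7%
```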

TABLE 1
Rank Order | Affymetrix Probeset ID | Genbank (decimals removed) | UniGene Cluster ID (build #173) | UniGene Name | UniGene Symbol
1 | A.217889_s_at | NM_024843 | Hs.31297 | cytochrome b reductase 1 | CYBRD1
2 | B.243929_at | H15261 | Hs.21948 | Transcribed sequences |
3 | B.229975_at | AI826437 | Hs.283417 | Transcribed sequences |
4 | B.223864_at | AF269087 | Hs.326736 | ankyrin repeat domain 30A | ANKRD30A
5 | B.227081_at | AW299538 | Hs.75528 | nucleolar GTPase | HUMAUANTIG
6 | A.215014_at | AL512727 | Hs.232127 | MRNA; cDNA DKFZp547P042 (from clone DKFZp547P042) |
7 | A.206869_at | NM_001267 | Hs.97220 | Chondroadherin | CHAD
8 | A.221585_at | BC004504 | Hs.331904 | calcium channel, voltage-dependent, gamma subunit 4 | CACNG4
9 | A.205440_s_at | NM_000909 | Hs.519057 | neuropeptide Y receptor Y1 | NPY1R
10 | B.228969_at | AI922323 | Hs.226391 | anterior gradient 2 homolog (Xenopus laevis) | AGR2
11 | A.212949_at | D38553 | Hs.308045 | barren homolog (Drosophila) | BRRN1
12 | B.226067_at | AL355392 | Data not found | |
13 | B.232855_at | AL360204 | Hs.283853 | MRNA full length insert cDNA clone EUROIMAGE 980547 |
14 | A.221520_s_at | BC001651 | Hs.48855 | Cell division cycle associated 8 | CDCA8
15 | A.205472_s_at | NM_004392 | Hs.63931 | Dachshund homolog 1 (Drosophila) | DACH1
16 | A.205186_at | NM_003462 | Hs.406050 | Dynein, axonemal, light intermediate polypeptide 1 | DNALI1
17 | A.221275_s_at | NM_030896 | Data not found | |
18 | B.229030_at | AW242997 | Data not found | |
19 | B.233413_at | AU156421 | Hs.518736 | CDNA FLJ13457 fis, clone PLACE1003343 |
20 | A.203438_at | AI435828 | Hs.155223 | stanniocalcin 2 | STC2
21 | B.230378_at | AA742697 | Hs.62492 | secretoglobin, family 3A, member 1 | SCGB3A1
22 | B.238581_at | BG271923 | Hs.237809 | guanylate binding protein 5 | GBP5
23 | B.235343_at | AI961235 | Hs.96885 | hypothetical protein FLJ12505 | FLJ12505
24 | B.229150_at | AI810764 | Hs.102406 | Transcribed sequences |
25 | A.205734_s_at | AI990465 | Hs.38070 | lymphoid nuclear protein related to AF4 | LAF4
26 | A.214079_at | AK000345 | Hs.272499 | Dehydrogenase/reductase (SDR family) member 2 | DHRS2
27 | B.238746_at | BF245284 | Hs.354427 | Transcribed sequence with weak similarity to protein ref: NP_286085.1 (E. coli) beta-D-galactosidase [Escherichia coli O157:H7 EDL933] |
28 | A.204623_at | NM_003226 | Data not found | |
29 | B.230863_at | R73030 | Hs.252938 | low density lipoprotein-related protein 2 | LRP2
30 | A.215047_at | AL080170 | Data not found | |
31 | A.201710_at | NM_002466 | Hs.179718 | v-myb myeloblastosis viral oncogene homolog (avian)-like 2 | MYBL2
32 | A.205009_at | NM_003225 | Data not found | |
33 | A.207750_at | NM_018510 | Data not found | |
34 | B.237339_at | AI668620 | Hs.144151 | Transcribed sequences |
35 | A.220540_at | NM_022358 | Hs.528664 | potassium channel, subfamily K, member 15 | KCNK15
36 | B.223062_s_at | BC004863 | Hs.286049 | phosphoserine aminotransferase 1 | PSAT1
37 | A.204508_s_at | BC001012 | Hs.512620 | carbonic anhydrase XII | CA12
38 | A.214451_at | NM_003221 | Hs.33102 | transcription factor AP-2 beta (activating enhancer binding protein 2 beta) | TFAP2B
39 | A.202870_s_at | NM_001255 | Hs.82906 | CDC20 cell division cycle 20 homolog (S. cerevisiae) | CDC20
40 | B.236641_at | AW183154 | Hs.3104 | kinesin family member 14 | KIF14
41 | A.219197_s_at | AI424243 | Hs.435861 | signal peptide, CUB domain, EGF-like 2 | SCUBE2
42 | A.207183_at | NM_006143 | Hs.92458 | G protein-coupled receptor 19 | GPR19
43 | A.220414_at | NM_017422 | Hs.180142 | calmodulin-like 5 | CALML5
44 | A.205354_at | NM_000156 | Hs.81131 | guanidinoacetate N-methyltransferase | GAMT
45 | A.201755_at | NM_006739 | Hs.77171 | MCM5 minichromosome maintenance deficient 5, cell division cycle 46 (S. cerevisiae) | MCM5
46 | A.209459_s_at | AF237813 | Hs.1588 | 4-aminobutyrate aminotransferase | ABAT
47 | B.225516_at | AA876372 | Hs.432978 | solute carrier family 7 (cationic amino acid transporter, y+ system), member 2 | SLC7A2
48 | A.204558_at | NM_003579 | Hs.66718 | RAD54-like (S. cerevisiae) | RAD54L
49 | B.224428_s_at | AY029179 | Hs.435733 | cell division cycle associated 7 | CDCA7
50 | B.228854_at | AI492388 | Hs.356349 | zinc finger protein 145 (Kruppel-like, expressed in promyelocytic leukemia) | ZNF145
51 | A.208502_s_at | NM_002653 | Hs.84136 | paired-like homeodomain transcription factor 1 | PITX1
52 | B.226936_at | BG492359 | Hs.35962 | CDNA clone IMAGE: 4448513, partial cds |
53 | B.230021_at | AI638593 | Hs.441708 | hypothetical protein MGC45866 | MGC45866
54 | A.206799_at | NM_006551 | Hs.204096 | secretoglobin, family 1D, member 2 | SCGB1D2
55 | A.202410_x_at | NM_000612 | Hs.349109 | insulin-like growth factor 2 (somatomedin A) | IGF2
56 | A.206509_at | NM_002652 | Hs.99949 | prolactin-induced protein | PIP
57 | A.204885_s_at | NM_005823 | Hs.408488 | Mesothelin | MSLN
58 | A.201496_x_at | AI889739 | Hs.78344 | myosin, heavy polypeptide 11, smooth muscle | MYH11
59 | A.206401_s_at | J03778 | Hs.101174 | microtubule-associated protein tau | MAPT
60 | A.204734_at | NM_002275 | Hs.80342 | keratin 15 | KRT15
61 | A.204014_at | NM_001394 | Hs.417962 | dual specificity phosphatase 4 | DUSP4
62 | A.204775_at | NM_005441 | Hs.75238 | chromatin assembly factor 1, subunit B (p60) | CHAF1B
63 | A.215356_at | AK023134 | Hs.130675 | hypothetical gene FLJ13072 | FLJ13072
64 | B.243049_at | AI791225 | Hs.444098 | MRNA; cDNA DKFZp434I1226 (from clone DKFZp434I1226) |
65 | B.223721_s_at | AF176013 | Hs.260720 | DnaJ (Hsp40) homolog, subfamily C, member 12 | DNAJC12
66 | A.219918_s_at | NM_018123 | Data not found | |
67 | B.243735_at | N58363 | Hs.8739 | signal transducer and activator of transcription 3 interacting protein 1 | STATIP1
68 | A.214188_at | AW665096 | Hs.15299 | HMBA-inducible | HIS1
69 | B.226980_at | AK001166 | Hs.421337 | DEP domain containing 1B | DEPDC1B
70 | A.203071_at | NM_004636 | Hs.82222 | sema domain, immunoglobulin domain (Ig), short basic domain, secreted, (semaphorin) 3B | SEMA3B
71 | A.206204_at | NM_004490 | Hs.411881 | growth factor receptor-bound protein 14 | GRB14
72 | A.205979_at | NM_002407 | Hs.97644 | secretoglobin, family 2A, member 1 | SCGB2A1
73 | A.208335_s_at | NM_002036 | Hs.517102 | Duffy blood group | FY
74 | B.227550_at | AW242720 | Hs.388347 | MRNA; cDNA DKFZp686J0156 (from clone DKFZp686J0156) |
75 | A.220187_at | NM_024636 | Hs.44208 | likely ortholog of mouse tumor necrosis-alpha-induced adipose-related protein | FLJ23153
76 | B.226473_at | BE514414 | Hs.103305 | hypothetical protein MGC10561 | MGC10561
77 | A.204822_at | NM_003318 | Hs.169840 | TTK protein kinase | TTK
78 | A.204724_s_at | NM_001853 | Hs.126248 | collagen, type IX, alpha 3 | COL9A3
79 | A.205240_at | NM_013296 | Hs.278338 | G-protein signalling modulator 2 (AGS3-like, C. elegans) | GPSM2
80 | A.205898_at | U20350 | Hs.78913 | chemokine (C-X3-C motif) receptor 1 | CX3CR1
81 | B.223381_at | AF326731 | Hs.234545 | cell division cycle associated 1 | CDCA1
82 | A.209243_s_at | AF208967 | Hs.201776 | paternally expressed 3 | PEG3
83 | A.204146_at | BE966146 | Data not found | |
84 | B.228273_at | BG165011 | Hs.528654 | hypothetical protein FLJ11029 | FLJ11029
85 | A.204162_at | NM_006101 | Hs.414407 | kinetochore associated 2 | KNTC2
86 | A.204914_s_at | AI360875 | Hs.432638 | SRY (sex determining region Y)-box 11 | SOX11
87 | A.209309_at | D90427 | Hs.512643 | alpha-2-glycoprotein 1, zinc | AZGP1
88 | A.205048_s_at | NM_003832 | Data not found | |
89 | B.227419_x_at | AW964972 | Hs.361171 | placenta-specific 9 | PLAC9
90 | B.232944_at | AK024132 | Hs.525858 | MRNA; cDNA DKFZp686I18125 (from clone DKFZp686I18125) |
91 | B.224753_at | BE614410 | Hs.434886 | cell division cycle associated 5 | CDCA5
92 | A.210051_at | U78168 | Hs.8578 | Rap guanine nucleotide exchange factor (GEF) 3 | RAPGEF3
93 | A.215616_s_at | AB020683 | Hs.301011 | jumonji domain containing 2B | JMJD2B
94 | A.210272_at | M29873 | Hs.415794 | cytochrome P450, family 2, subfamily B, polypeptide 7 pseudogene | CYP2B7
95 | B.222608_s_at | AK023208 | Hs.62180 | anillin, actin binding protein (scraps homolog, Drosophila) | ANLN
96 | B.240724_at | AI668629 | Hs.25345 | Transcribed sequences |
97 | B.228554_at | AL137566 | Hs.32405 | MRNA; cDNA DKFZp686A0815 (from clone DKFZp686A0815) |
98 | A.205280_at | NM_000824 | Hs.32973 | glycine receptor, beta | GLRB
99 | B.238659_at | AA760689 | Hs.210532 | KIAA0141 gene product | KIAA0141
100 | B.238116_at | AW959427 | Hs.98849 | dynein, cytoplasmic, light polypeptide 2B | DNCL2B
101 | A.212448_at | AB007899 | Hs.249798 | neural precursor cell expressed, developmentally down-regulated 4-like | NEDD4L
102 | B.235572_at | AI469788 | Hs.381225 | kinetochore protein Spc24 | Spc24
103 | A.209603_at | AI796169 | Hs.169946 | GATA binding protein 3 | GATA3
104 | A.205358_at | NM_000826 | Hs.335051 | glutamate receptor, ionotropic, AMPA 2 | GRIA2
105 | A.202095_s_at | NM_001168 | Hs.1578 | baculoviral IAP repeat-containing 5 (survivin) | BIRC5
106 | A.211470_s_at | AF186255 | Hs.38084 | sulfotransferase family, cytosolic, 1C, member 1 | SULT1C1
107 | A.205350_at | NM_004378 | Hs.346950 | cellular retinoic acid binding protein 1 | CRABP1
108 | A.205890_s_at | NM_006398 | Hs.44532 | ubiquitin D | UBD
109 | A.209680_s_at | BC000712 | Hs.20830 | kinesin family member C1 | KIFC1
110 | B.240192_at | AI631850 | Hs.158992 | FLJ45983 protein | FLJ45983
111 | A.205225_at | NM_000125 | Hs.1657 | estrogen receptor 1 | ESR1
112 | B.235545_at | AI810054 | Hs.445098 | DEP domain containing 1 | DEPDC1
113 | B.224210_s_at | BC001147 | Hs.436924 | peroxisomal membrane protein 4, 24 kDa | PXMP4
114 | B.229381_at | AI732488 | Hs.29190 | hypothetical protein MGC24047 | MGC24047
115 | A.210523_at | D89675 | Hs.87223 | bone morphogenetic protein receptor, type IB | BMPR1B
116 | A.204641_at | NM_002497 | Hs.153704 | NIMA (never in mitosis gene a)-related kinase 2 | NEK2
117 | B.227764_at | AA227842 | Hs.21929 | hypothetical protein MGC52057 | MGC52057
118 | B.238900_at | BE669692 | Data not found | |
119 | A.202580_x_at | NM_021953 | Hs.511941 | forkhead box M1 | FOXM1
120 | A.205366_s_at | NM_018952 | Hs.147465 | homeo box B6 | HOXB6
121 | B.227966_s_at | AA524895 | Hs.449141 | Hypothetical protein LOC285103, mRNA (cDNA clone IMAGE: 5273139), partial cds |
122 | B.228069_at | AL138828 | Data not found | |
123 | A.210163_at | AF030514 | Hs.103982 | chemokine (C-X-C motif) ligand 11 | CXCL11
124 | A.204855_at | NM_002639 | Hs.55279 | serine (or cysteine) proteinase inhibitor, clade B (ovalbumin), member 5 | SERPINB5
125 | B.229390_at | AV734646 | Hs.381220 | Full length insert cDNA clone ZA84A12 |
126 | A.203213_at | AL524035 | Hs.334562 | cell division cycle 2, G1 to S and G2 to M | CDC2
127 | A.219555_s_at | NM_018455 | Hs.283532 | uncharacterized bone marrow protein BM039 | BM039
128 | B.227282_at | AB037734 | Hs.4993 | protocadherin 19 | PCDH19
129 | A.220085_at | NM_018063 | Hs.203963 | helicase, lymphoid-specific | HELLS
130 | A.203256_at | NM_001793 | Hs.191842 | cadherin 3, type 1, P-cadherin (placental) | CDH3
131 | B.234992_x_at | BG170335 | Hs.293257 | epithelial cell transforming sequence 2 oncogene | ECT2
132 | A.204825_at | NM_014791 | Hs.184339 | maternal embryonic leucine zipper kinase | MELK
133 | A.204126_s_at | NM_003504 | Hs.114311 | CDC45 cell division cycle 45-like (S. cerevisiae) | CDC45L
134 | A.218663_at | NM_022346 | Hs.528669 | chromosome condensation protein G | HCAP-G
135 | B.239962_at | AA972452 | Hs.292072 | Transcribed sequences |
136 | A.205046_at | NM_001813 | Hs.75573 | centromere protein E, 312 kDa | CENPE
137 | B.235717_at | AA180985 | Hs.285574 | zinc finger protein 229 | ZNF229
138 | B.233154_at | AK022197 | Hs.130581 | CDNA FLJ12135 fis, clone MAMMA1000307 |
139 | A.206754_s_at | NM_000767 | Hs.1360 | cytochrome P450, family 2, subfamily B, polypeptide 6 | CYP2B6
140 | A.204533_at | NM_001565 | Hs.413924 | chemokine (C-X-C motif) ligand 10 | CXCL10
141 | A.212925_at | AA143765 | Hs.439180 | chromosome 19 open reading frame 21 | C19orf21
142 | B.223229_at | AB032931 | Hs.5199 | HSPC150 protein similar to ubiquitin-conjugating enzyme | HSPC150
143 | A.206599_at | NM_004695 | Hs.90911 | solute carrier family 16 (monocarboxylic acid transporters), member 5 | SLC16A5
144 | A.208103_s_at | NM_030920 | Hs.385913 | acidic (leucine-rich) nuclear phosphoprotein 32 family, member E | ANP32E
145 | A.217953_at | AW189430 | Hs.348921 | PHD finger protein 3 | PHF3
146 | A.219686_at | NM_018401 | Hs.58241 | serine/threonine kinase 32B | STK32B
147 | A.217276_x_at | AL590118 | Hs.301947 | kraken-like | dJ222E13.1
148 | B.234863_x_at | AK026197 | Hs.272027 | F-box protein 5 | FBXO5
149 | B.240465_at | BF508074 | Data not found | |
150 | A.218308_at | NM_006342 | Hs.104019 | transforming, acidic coiled-coil containing protein 3 | TACC3
151 | A.206157_at | NM_002852 | Hs.2050 | pentaxin-related gene, rapidly induced by IL-1 beta | PTX3
152 | A.209368_at | AF233336 | Hs.212088 | epoxide hydrolase 2, cytoplasmic | EPHX2
153 | B.230856_at | AI073396 | Hs.9398 | WD40 repeat protein Interacting with phosphoInositides of 49 kDa | WIPI49
154 | A.201890_at | NM_001034 | Hs.226390 | ribonucleotide reductase M2 polypeptide | RRM2
155 | A.205364_at | NM_003500 | Hs.9795 | acyl-Coenzyme A oxidase 2, branched chain | ACOX2
156 | B.225911_at | AL138410 | Hs.282832 | hypothetical protein LOC255743 | LOC255743
157 | B.244696_at | AI033582 | Hs.372254 | Transcribed sequences |
158 | A.218730_s_at | NM_014057 | Hs.109439 | osteoglycin (osteoinductive factor, mimecan) | OGN
159 | A.219498_s_at | NM_018014 | Hs.314623 | B-cell CLL/lymphoma 11A (zinc finger protein) | BCL11A
160A.203702_sAL043927Hs.169910tubulin tyrosine ligase-likeTTLL4atfamily, member 
4161A.206045_sNM_003787Hs.23567nucleolar protein 4NOL4at162A.219919_sNM_018276Hs.29173slingshot homolog 3SSH3at(Drosophila)163A.215779_sBE271470Data notatfound164B.230966_atAI859620Hs.437023interleukin 4 induced 1IL4I1165A.206378_atNM_002411Hs.46452secretoglobin, family 2A,SCGB2A2member 2166A.221562_sAF083108Hs.511950sirtuin (silent mating typeSIRT3atinformationregulation 2 homolog) 3 (S. cerevisiae)167A.221258_sNM_031217Hs.301052kinesin family member 18ADKFZP434G2226at168A.221577_xAF003934Hs.296638growth differentiation factor 15GDF15at169B.235709_atH37811Hs.20575growth arrest-specific 2 like 3GAS2L3170B.235171_atAI354636Data notfound171A.207437_atNM_006491Hs.292511neuro-oncological ventralNOVA1antigen 1172A.203638_sNM_022969Hs.404081fibroblast growth factor receptor 2FGFR2at(bacteria-expressed kinase,keratinocytegrowth factor receptor,craniofacialdysostosis 1, Crouzon syndrome,Pfeiffer syndrome, Jackson-Weiss syndrome)173A.218542_atNM_018131Hs.14559chromosome 10 open readingC10orf3frame 3174A.217613_atAW173720Hs.176227hypothetical protein FLJ11155FLJ11155175B.241310_atAI685841Hs.161354Transcribed sequences176A.205234_atNM_004696Hs.351306solute carrier family 16SLC16A4(monocarboxylicacid transporters), member 4177A.203726_sNM_000227Hs.83450laminin, alpha 3LAMA3at178A.221436_sNM_031299Hs.30114cell division cycle associated 3CDCA3at179A.205242_atNM_006419Hs.100431chemokine (C—X—C motif) ligandCXCL1313(B-cell chemoattractant)180A.218726_atNM_018410Hs.104859hypothetical proteinDKFZp762E1312DKFZp762E1312181A.218856_atNM_016629Data notfound182B.226661_atT90295Data notfound183A.218741_atNM_024053Hs.208912chromosome 22 open readingC22orf18frame 18184A.206201_sNM_005924Hs.77858mesenchyme homeo box 2MEOX2at(growtharrest-specific homeo box)185B.236184_atAI798959Hs.131686Transcribed sequences186A.220651_sNM_018518Hs.198363MCM10 minichromosomeMCM10atmaintenancedeficient 10 (S. 
cerevisiae)187A.216331_atAK022548Hs.74369integrin, alpha 7ITGA7188B.232105_atAU148391Hs.181245MRNA; cDNADKFZp686B15184 (from cloneDKFZp686B15184)189B.226907_atN32557Hs.192822protein phosphatase 1,PPP1R14Cregulatory(inhibitor) subunit 14C190B.234976_xBG324504Hs.321127solute carrier family 4, sodiumSLC4A5atbicarbonatecotransporter, member 5191A.211323_sL38019Hs.149900inositol 1,4,5-triphosphateITPR1atreceptor, type 1192A.206391_atNM_002888Hs.82547retinoic acid receptor responderRARRES1(tazarotene induced) 1193A.222348_atAW971134Hs.212787KIAA0303 proteinKIAA0303194B.235845_atAI380207Hs.368802Sp5 transcription factorSP5195B.239233_atAA744613Hs.292925KIAA1212KIAA1212196A.208383_sNM_002591Hs.1872phosphoenolpyruvatePCK1atcarboxykinase 1 (soluble)197A.214440_atNM_000662Hs.155956N-acetyltransferase 1 (arylamineNAT1N-acetyltransferase)198B.230456_atBE501559Hs.380824NS5ATP13TP2 proteinNS5ATP13TP2199A.219650_atNM_017669Data notfound200A.210052_sAF098158Hs.9329TPX2, microtubule-associatedTPX2atproteinhomolog (Xenopus laevis)201A.204468_sNM_005424Hs.78824tyrosine kinase withTIEatimmunoglobulinand epidermal growth factorhomology domains202A.209531_atBC001453Hs.26403glutathione transferase zeta 1GSTZ1(maleylacetoacetate isomerase)203A.217014_sAC004522Data notatfound204B.227155_atR10289Hs.3844LIM domain only 4LMO4205A.213520_atNM_004260Hs.31442RecQ protein-like 4RECQL4206B.241505_atBF513468Data notfound207A.213451_xBE044614Hs.411644tenascin XBTNXBat208A.214389_atAI733515Hs.148907hypothetical protein MGC52019MGC52019209B.235229_atAI694413Data notfound210A.203571_sNM_006829Hs.511763chromosome 10 open readingC10orf116atframe 116211B.237168_atAA708016Data notfound212A.203915_atNM_002416Hs.77367chemokine (C—X—C motif) ligand 9CXCL9213B.224509_sBC006399Hs.155839reticulon 4 interacting protein 1RTN4IP1at214A.206093_xNM_007116Data notatfound215A.205613_atNM_016524Hs.258326B/K proteinLOC51760216B.236885_atAI651930Data notfound217B.236341_atAI733018Hs.247824cytotoxic 
T-lymphocyte-CTLA4associated protein 4218A.221854_atAI378979Hs.313068plakophilin 1 (ectodermalPKP1dysplasia/skin fragility syndrome)219A.201291_sNM_001067Hs.156346topoisomerase (DNA) II alphaTOP2Aat170 kDa220B.232734_atAK023230Hs.139709hypothetical protein FLJ12572FLJ12572221A.214053_atAW772192Hs.7888CDNA FLJ44318 fis, cloneTRACH3000780222B.231195_atAI492376Data notfound223A.212956_atAB020689Hs.411317KIAA0882 proteinKIAA0882224A.214404_xAI307915Hs.79414SAM pointed domain containingSPDEFatetstranscription factor225B.237086_atAI693336Hs.163484forkhead box A1FOXA1226A.205948_atNM_007050Hs.225952protein tyrosine phosphatase,PTPRTreceptor type, T227A.214745_atAW665865Hs.193143KIAA1069 proteinKIAA1069228A.208029_sNM_018407Hs.296398lysosomal associated proteinLAPTM4Battransmembrane 4 beta229A.205569_atNM_014398Hs.10887lysosomal-associated membraneLAMP3protein 3230B.235046_atAA456099Hs.176376Transcribed sequences231A.203130_sNM_004522Data notatfound232B.238584_atW52934Hs.113009hypothetical protein FLJ22527FLJ22527233A.220986_sNM_030953Hs.169333tigger transposable elementTIGD6atderived 6234A.205023_atD14134Hs.446554RAD51 homolog (RecARAD51homolog, E. coli)(S. 
cerevisiae)235B.237048_atAW451103Hs.71371Clone IMAGE: 4797878,mRNA, partial cds236B.225400_atBF111780Hs.440663chromosome 1 open readingC1orf19frame 19237A.206134_atNM_014479Hs.145296ADAM-like, decysin 1ADAMDEC1238A.214469_atNM_021052Hs.121017histone 1, H2aeHIST1H2AE239A.202188_atNM_014669Hs.295014nucleoporin 93 kDaNUP93240A.204678_sU90065Hs.376874potassium channel, subfamily K,KCNK1atmember 1241B.231517_atAW243917Hs.196566ZYG-11A early embryogenesisproteinmRNA, complete cds242A.210387_atBC001131Data notfound243B.223623_atAF325503Hs.43125esophageal cancer related gene 4ECRG4protein244B.228729_atN90191Hs.23960cyclin B1CCNB1245A.204904_atNM_002060Hs.296310gap junction protein, alpha 4,GJA437 kDa(connexin 37)246B.237301_atBF433570Hs.144479Transcribed sequences247B.239623_atN93197Hs.49573CDNA FLJ44606 fis, cloneBRACE2005991248B.242601_atAA600175Hs.443169hypothetical protein LOC253012LOC253012249B.223861_atAL136755Hs.298312HORMA domain containingNOHMAprotein250A.213122_atAI096375Hs.173094TSPY-like 5TSPYL5251A.204482_atNM_003277Hs.505337claudin 5 (transmembraneCLDN5proteindeleted in velocardiofacialsyndrome)252B.240512_xH10766Hs.23406potassium channelKCTD4attetramerisationdomain containing 4253A.209642_atAF043294Hs.287472BUB1 budding uninhibited byBUB1benzimidazoles 1 homolog(yeast)254B.239669_atAW006409Hs.532143Transcribed sequences255B.243028_xBE045392Data notatfound256A.210721_sAB040812Hs.32539p21(CDKN1A)-activated kinase 7PAK7at257A.215942_sBF973178Hs.122552G-2 and S-phase expressed 1GTSE1at258B.222895_sAA918317Hs.57987B-cell CLL/lymphoma 11BBCL11Bat(zinc finger protein)259A.203708_atNM_002600Hs.188phosphodiesterase 4B, cAMP-PDE4Bspecific(phosphodiesterase E4 duncehomolog,Drosophila)260B.235178_xAL120674Data notatfound261B.236471_atAI949827Hs.404741nuclear factor (erythroid-derivedNFE2L32)-like 3262A.220024_sNM_020956Hs.205457periaxinPRXat263A.213711_atNM_002281Hs.170925keratin, hair, basic, 1KRTHB1264A.204766_sNM_002452Hs.413078nudix (nucleoside 
diphosphateNUDT1atlinkedmoiety X)-type motif 1265B.227182_atAW966474Hs.88417sushi domain containing 3SUSD3266A.220061_atNM_017888Hs.122939hypothetical protein FLJ20581FLJ20581267A.220117_atNM_024697Hs.99256hypothetical protein FLJ22419FLJ22419268B.237395_atAV700083Hs.176588cytochrome P450, family 4,CYP4Z1subfamily Z,polypeptide 1269B.226034_atBE222344Hs.346735Clone IMAGE: 3881549, mRNA270A.207038_atNM_004694Hs.42645solute carrier family 16SLC16A6(monocarboxylicacid transporters), member 6271B.238541_atBE544855Hs.236572CDNA clone IMAGE: 5265729,partial cds272A.207702_sNM_012301Hs.22599atrophin-1 interacting protein 1AIP1at273B.236496_atAW006352Hs.159643chromosome 14 open readingC14orf66frame 66274A.215300_sAK022172Hs.396595flavin containingFMO5atmonooxygenase 5275A.219580_sNM_024780Hs.145807transmembrane channel-like 5TMC5at276B.230469_atAW665138Hs.58559pleckstrin homology domainPLEKHK1containing,family K member 1277B.243636_sAI042373Hs.132917Transcribed sequencesat278A.203764_atNM_014750Hs.77695discs, large homolog 7DLG7(Drosophila)279A.209936_atAF107493Hs.439480RNA binding motif protein 5RBM5280A.207961_xNM_022870Data notatfound281B.233059_atAK026384Hs.199776potassium inwardly-rectifyingKCNJ3channel,subfamily J, member 3282A.221583_sAI129381Hs.354740potassium large conductanceKCNMA1atcalcium-activatedchannel, subfamily M, alphamember 1283B.228762_atAW151924Hs.159142lunatic fringe homologLFNG(Drosophila)284A.219415_atNM_020659Hs.268728tweety homolog 1 (Drosophila)TTYH1285A.203397_sBF063271Hs.278611UDP-N-acetyl-alpha-D-GALNT3atgalactosamine:polypeptideN-acetylgalactosaminyltransferase3(GalNAc-T3)286A.206091_atNM_002381Hs.278461matrilin 3MATN3287A.217562_atBF589529Hs.497208DBCCR1-likeDBCCR1L288B.229764_atAW629527Hs.338851FLJ41238 proteinFLJ41238289B.232544_atAU144916Hs.222056CDNA FLJ11572 fis, cloneHEMBA1003373290A.203819_sAU160004Hs.79440IGE-II mRNA-binding protein 3IMP-3at291A.206102_atNM_021067Data notfound292A.210738_sAF011390Hs.5462solute carrier family 4, 
sodiumSLC4A4atbicarbonatecotransporter, member 4293B.236285_atAI631846Hs.137007hypothetical protein BC009980LOC113730294A.209800_atAF061812Hs.432448keratin 16 (focal non-KRT16epidermolyticpalmoplantar keratoderma)295A.218211_sNM_024101Hs.297405MelanophilinMLPHat296B.223361_atAF116682Hs.238205chromosome 6 open readingC6orf115frame 115297B.242776_atAA584428Hs.12742zinc finger, CCHC domainZCCHC6containing 6298A.221909_atBF984207Data notfound299A.209408_atU63743Hs.69360kinesin family member 2CKIF2C300A.215812_sU41163Data notatfound301B.232238_atAK001380Hs.121028asp (abnormal spindle)-like,ASPMmicrocephalyassociated (Drosophila)302B.223126_sAF312864Hs.12532chromosome 1 open readingC1orf21atframe 21303A.212141_atX74794Hs.460184MCM4 minichromosomeMCM4maintenance deficient 4(S. cerevisiae)304A.222325_atAW974812Hs.433049Transcribed sequences305B.224314_sAF277174Hs.130946egl nine homolog 1 (C. elegans)EGLN1at306A.207470_atNM_017535Hs.194369arginine-glutamic acid dipeptideRERE(RE) repeats307B.228504_atAI828648Hs.406684sodium channel, voltage-gated,SCN7Atype VII, alpha308B.228245_sAW594320Hs.405557ovostatin 2OVOS2at309A.213712_atBF508639Hs.58488catenin (cadherin-associatedCTNNAL1protein),alpha-like 1310A.213998_sAW188131Hs.250696DEAD (Asp-Glu-Ala-Asp) boxDDX17atpolypeptide 17311B.230323_sAW242836Hs.355663hypothetical protein BC016153LOC120224at312A.212713_atR72286Hs.296049microfibrillar-associated protein 4MFAP4313B.230316_atR49343Hs.430576SEC14-like 2 (S. cerevisiae)SEC14L2314A.32128_atY13710Hs.16530chemokine (C—C motif) ligandCCL1818(pulmonary and activation-regulated)315B.236718_atAI278445Hs.43334Transcribed sequence with weaksimilarityto protein sp: P39189 (H. 
sapiens)ALU2_HUMAN Alu subfamilySB sequenceContamination warning entry316B.227030_atBG231773Hs.371680CDNA FLJ46579 fis, cloneTHYMU3042758317B.235658_atAW058580Hs.151444Transcribed sequences318B.230622_atBE552393Hs.100469myeloid/lymphoid or mixed-MLLT4lineageleukemia (trithorax homolog,Drosophila);translocated to, 4319A.205213_atNM_014716Hs.337242centaurin, beta 1CENTB1320A.221754_sAI341234Hs.6191coronin, actin binding protein,CORO1Bat1B321A.214612_xU10691Data notatfound322A.203463_sH05668Hs.7407epsin 2EPN2at323B.237350_atAW027968Hs.454465Similar to CDNA sequenceBC021608(LOC143941), mRNA324A.220789_sNM_004749Hs.231411transforming growth factor betaTBRG4atregulator 4325A.208496_xNM_003534Hs.247813histone 1, H3gHIST1H3Gat326A.202992_atNM_000587Hs.78065complement component 7C7327A.210432_sAF225986Hs.300717sodium channel, voltage-gated,SCN3Aattype III, alpha328B.239525_atAI733041Hs.374649hypothetical proteinDKFZp547A023DKFZp547A023329B.244344_atAW135316Hs.105448protein kinase, lysine deficient 4PRKWNK4330B.236773_atAI635931Hs.147613Transcribed sequences331A.207118_sNM_004659Hs.211819matrix metalloproteinase 23BMMP23Bat332B.228558_atAL518291Data notfound333B.230269_atAI963605Hs.406256Transcribed sequences334B.228262_atAW237462Hs.127951hypothetical protein FLJ14503FLJ14503335B.238878_atAA496211Hs.157208aristaless related homeoboxARX336B.228559_atBF111626Hs.55028CDNA clone IMAGE: 6043059,partial cds337A.204542_atNM_006456Hs.288215sialyltransferase 7SIAT7B((alpha-N-acetylneuraminyl-2,3-beta-galactosyl-1,3)-N-acetyl galactosaminidealpha-2,6-sialyltransferase) B338B.224839_sBF310919Hs.355862glutamic pyruvate transaminaseGPT2at(alanine aminotransferase) 2339A.209755_atAF288395Hs.158244nicotinamide nucleotideNMNAT2adenylyltransferase 2340B.229019_atAI694320Hs.6295zinc finger protein 533ZNF533341A.218039_atNM_016359Hs.279905nucleolar and spindle associatedNUSAP1protein 1342A.205947_sNM_003382Hs.170560vasoactive intestinal peptideVIPR2atreceptor 
2343B.244107_atAW189097Hs.444393Transcribed sequences344B.228241_atAI827789Hs.100686breast cancer membrane proteinBCMP1111345A.204750_sBF196457Hs.95612desmocollin 2DSC2at346A.204130_atNM_000196Hs.1376hydroxysteroid (11-beta)HSD11B2dehydrogenase 2347A.220119_atNM_022140Hs.104746erythrocyte membrane proteinEPB41L4Aband 4.1 like 4A348B.230238_atAI744123Hs.13308hypothetical protein LOC134548LOC134548349A.204719_atNM_007168Hs.58351ATP-binding cassette, sub-ABCA8family A(ABC1), member 8350A.219961_sNM_018474Hs.436632chromosome 20 open readingC20orf19atframe 19351A.219132_atNM_021255Hs.44038pellino homolog 2 (Drosophila)PELI2352A.220584_atNM_025094Data notfound353B.227350_atAI807356Hs.127797CDNA FLJ11381 fis, cloneHEMBA1000501354B.230800_atAV699353Hs.443428adenylate cyclase 4ADCY4355A.204709_sNM_004856Hs.270845kinesin family member 23KIF23at356B.243526_atAI968904Hs.174373hypothetical protein LOC349136LOC349136357A.219491_atNM_024036Hs.148438leucine rich repeat andLRFN4fibronectintype III domain containing 4358A.204686_atNM_005544Hs.390242insulin receptor substrate 1IRS1359B.228066_atAI870951Hs.445574Transcribed sequence with weaksimilarityto protein pir: I37984 (H. 
sapiens)I37984keratin 9, type I, cytoskeletal -human360A.206795_atNM_004101Hs.42502coagulation factor II (thrombin)F2RL2receptor-like 2361A.209464_atAB011446Hs.442658aurora kinase BAURKB362B.229082_atAI141520Data notfound363B.240304_sBG484769Hs.115838CDNA FLJ44282 fis, cloneatTRACH2003516364B.227702_atAA557324Hs.439760cytochrome P450, family 4,CYP4X1subfamily X,polypeptide 1365B.235077_atBF956762Hs.418271maternally expressed 3MEG3366A.202705_atNM_004701Hs.194698cyclin B2CCNB2367A.209616_sS73751Hs.278997carboxylesterase 1CES1at(monocyte/macrophageserine esterase 1)368A.211441_xAF280113Hs.306220cytochrome P450, family 3,CYP3A43atsubfamily A,polypeptide 43369B.241861_atR89089Data notfound370B.228425_atBF056746Hs.516311MRNA; cDNADKFZp686E10196(from clone DKFZp686E10196);complete cds371A.213938_atZ38645Hs.476384CAZ-associated structuralCASTprotein372A.202409_atX07868Data notfound373A.219115_sNM_014432Hs.288240Interleukin 20 receptor, alphaIL20RAat374A.39248_atN74607Hs.234642Aquaporin 3AQP3375B.227232_atT58044Data notfound376B.230319_atAI222435Hs.90250CDNA FLJ36413 fis, cloneTHYMU2010816.377A.203287_atNM_005558Hs.18141Ladinin 1LAD1378A.218009_sNM_003981Hs.344037Protein regulator of cytokinesis 1PRC1at379A.222351_atAW009884Hs.431156Protein phosphatase 2 (formerlyPPP2R1B2A),Regulatory subunit A (PR 65),beta isoform380A.204794_atNM_004418Hs.1183Dual specificity phosphatase 2DUSP2381A.211456_xAF333388Data notatfound382A.206296_xNM_007181Hs.95424Mitogen-activated protein kinaseMAP4K1atkinaseKinase kinase 1383A.205357_sNM_000685Hs.197063Angiotensin II receptor, type 1AGTR1at384B.244385_atAA766126Data notfound385A.202235_atNM_003051Hs.75231Solute carrier family 16SLC16A1(monocarboxylicAcid transporters), member 1386B.240422_atAI935710Hs.530456Transcribed sequences387B.230644_atAI375083Hs.31522Leucine rich repeat andLRFN5fibronectin type IIIDomain containing 5388A.220238_sNM_018846Hs.376793Kelch-like 7 (Drosophila)KLHL7at389B.235004_atAI677701Hs.201619RNA binding motif protein 
24RBM24390A.201397_atNM_006623Hs.3343PhosphoglyceratePHGDHdehydrogenase391A.208010_sNM_012411Hs.87860Protein tyrosine phosphatase,PTPN22atNon-receptor type 22(lymphoid)392A.210138_atAF074979Hs.141492Regulator of G-proteinRGS20signalling 20393A.203828_sNM_004221Hs.943Natural killer cell transcript 4NK4at394A.205862_atNM_014668Hs.438037GREB1 proteinGREB1395A.219984_sNM_020386Hs.36761HRAS-like suppressorHRASLSat396A.203358_sNM_004456Hs.444082Enhancer of zeste homolog 2EZH2at(Drosophila)397B.232570_sAL356755Data notatfound398A.212613_atAI991252Hs.376046Butyrophilin, subfamily 3,BTN3A2member A2399B.238077_atT75480Hs.13982Potassium channelKCTD6tetramerisationDomain containing 6400A.217023_xAF099143Data notatfound401B.242093_atAW263497Hs.97774Synaptotagmin-like 5SYTL5402B.232979_atAK000839Hs.306410CDNA FLJ20832 fis, cloneADKA03033403B.232286_atAA572675Hs.188173CDNA FLJ12187 fis, cloneMAMMA1000831404A.203223_atNM_004703Hs.390163Rabaptin, RAB GTPase bindingRABEP1effector protein 1405B.225834_atAL135396Hs.339665Similar to RIKEN cDNAMGC578272700049P18 gene406A.205591_atNM_006334Hs.74376Olfactomedin 1OLFM1407B.228058_atAI559190Hs.105887Similar to common salivaryLOC124220protein 1408A.207828_sNM_005196Data notatfound409A.222379_atAI002715Hs.348522Potassium voltage-gatedKCNE4channel,Isk-related family, member 4410A.210084_xAF206665Hs.405479Tryptase, alphaTPS1at411B.233249_atAU155297Hs.287562CDNA FLJ13313 fis, cloneOVARC1001489412B.232948_atAU147218Hs.297369CDNA FLJ12111 fis, cloneMAMMA1000025413B.229033_sAA143060Hs.454758Melanoma associated antigenMUM1at(mutated) 1414B.229623_atBF508344Hs.112742CDNA clone IMAGE: 6301163,containingFrame-shift errors415A.222339_xAI054381Hs.293379Transcribed sequencesat416A.205347_sNM_021992Hs.56145Thymosin, beta, identified inTMSNBatneuroblastomaCells417B.229245_atAA535361Hs.343666Phosphoinositol 3-phosphate-PEPP3bindingProtein-3418B.225491_atAL157452Hs.349088Solute carrier family 1 (glialSLC1A2high affinityGlutamate transporter), member 
2419B.239594_atBF110735Data notfound420A.213906_atAW592266Hs.300592v-myb myeloblastosis viralMYBL1oncogenehomolog (avian)-like 1421B.223757_atAF305836Hs.406958Deiodinase, iodothyronine, typeDIO3OSIII oppositeStrand422B.242296_xBF594828Hs.91145Transcribed sequencesat423B.236312_atAA938184Hs.44380Transcribed sequence with weaksimilarityto protein ref: NP_071385.1(H. sapiens)hypothetical protein FLJ20958[Homo sapiens]424B.227529_sBF511276Hs.197081A kinase (PRKA) anchor proteinAKAP12at(gravin) 12425A.221928_atAI057637Hs.234898acetyl-Coenzyme A carboxylaseACACBbeta426B.244013_atAI084430Hs.113919Hypothetical proteinLOC374969LOC374969427A.219769_atNM_020238Hs.142179inner centromere proteinINCENPantigens 135/155 kDa428B.239758_atAI142126Hs.26125Transcribed sequences429B.239913_atAI421796Hs.132591solute carrier family 10SLC10A4(sodium/bile acidcotransporter family), member 4430A.211226_atAF080586Hs.158351galanin receptor 2GALR2431A.206023_atNM_006681Hs.418367Neuromedin UNMU432A.210538_sU37546Data notatfound433B.232277_atAA643687Hs.149425solute carrier family 28 (sodium-SLC28A3couplednucleoside transporter), member 3434A.207339_sNM_002341Hs.376208Lymphotoxin beta (TNFLTBatsuperfamily, member 3)435A.37145_atM85276Data notfound436B.243837_xAA639707Hs.443239Transcribed sequencesat437A.221198_atNM_021920Data notfound438B.233442_atAU147500Hs.287499CDNA FLJ12196 fis, cloneMAMMA1000867439B.232545_atAF176701Hs.442734F-box and leucine-rich repeatFBXL9protein 9440B.238323_atBG387172Hs.528776TEA domain family member 2TEAD2441B.231993_atAK026784Hs.301296CDNA: FLJ23131 fis, cloneLNG08502442B.224212_sAF169689Hs.247734Protocadherin alpha 2PCDHA2at443B.231560_atD59759Data notfound444A.201195_sAB018009Hs.184601solute carrier family 7 (cationicSLC7A5atamino acidtransporter, y+ system), member 5445B.239185_atAI284184Hs.388917ATP-binding cassette, sub-ABCA9family A (ABC1),member 9446B.232776_atAU145289Hs.193223CDNA FLJ11646 fis, cloneHEMBA1004394447A.212865_sBF449063Hs.512555collagen, type XIV, 
alpha 1COL14A1at(undulin)448B.228750_atAI693516Hs.28625Transcribed sequences449B.241577_atAI732794Data notfound450A.209125_atJ00269Data notfound451B.238898_atBG028463Hs.163734Transcribed sequences452A.203548_sBF672975Hs.180878lipoprotein lipaseLPLat453B.230363_sBE858808Hs.52463inositol polyphosphate-5-INPP5Fatphosphatase F454A.221111_atNM_018402Hs.272350interleukin 26IL26455B.226597_atAI348159Hs.76277polyposis locus protein 1-like 1DP1L1456A.218169_atNM_018052Hs.445061Hypothetical protein FLJ10305FLJ10305457A.206107_atNM_003834Hs.65756regulator of G-protein signallingRGS1111458B.230158_atAA758751Hs.484250Hypothetical protein FLJ32949FLJ32949459B.244706_atAA521309Hs.380763similar to hypothetical proteinLOC115294FLJ10883460B.228648_atAA622495Hs.10844leucine-rich alpha-2-LRG1glycoprotein 1461B.237047_atAI678049Hs.508819CDNA FLJ40458 fis, cloneTESTI2041778462A.205671_sNM_002120Hs.1802major histocompatibilityHLA-DOBatcomplex, class II,DO beta463A.217167_xAJ252550Data notatfound464A.205399_atNM_004734Hs.21355Doublecortin and CaM kinase-DCAMKL1like 1465B.236646_atBE301029Hs.226422Hypothetical protein FLJ31166FLJ31166466A.203354_sAW117368Hs.408177ADP-ribosylation factor guanineEFA6Ratnucleotidefactor 6467B.237252_atAW119113Hs.2030ThrombomodulinTHBD468A.206341_atNM_000417Hs.130058interleukin 2 receptor, alphaIL2RA469A.210525_xBC001787Hs.123232Chromosome 14 open readingC14orf143atframe 143470A.214897_atAB007975Hs.492779MRNA, chromosome 1 specifictranscriptKIAA0506.471A.203362_sNM_002358Hs.79078MAD2 mitotic arrest deficient-MAD2L1atlike 1 (yeast)472B.230874_atAI241896Hs.48653CDNA FLJ39593 fis, cloneSKNSH2001222473B.224396_sAF316824Hs.435655asporin (LRR class 1)ASPNat474A.208305_atNM_000926Hs.2905Progesterone receptorPGR475B.223867_atAF334676Hs.414648tektin 3TEKT3476A.211363_sAF109294Hs.459541MethylthioadenosineMTAPatphosphorylase477B.232267_atAL162032Hs.23644G protein-coupled receptor 133GPR133478B.244121_atBE835502Data notfound479B.242808_atAI733287Hs.203755Transcribed 
sequence withmoderate similarityto protein sp: P12947(H. sapiens)RL31_HUMAN 60S ribosomalprotein L31480A.215465_atAL080207Hs.134585ATP-binding cassette, sub-ABCA12family A (ABC1),member 12481A.210244_atU19970Hs.51120Cathelicidin antimicrobialCAMPpeptide482A.204603_atNM_003686Hs.47504Exonuclease 1EXO1483B.232986_atAC074331Data notfound484B.225241_atBG253437Hs.356289steroid sensitive gene 1URB485B.230760_atBF592062Hs.169859zinc finger protein, Y-linkedZFY486A.209480_atM16276Hs.409934major histocompatibilityHLA-DQB1complex,class II, DQ beta 1487A.206664_atNM_001041Hs.429596Sucrase-isomaltase (alpha-SIglucosidase)488A.206291_atNM_006183Hs.80962NeurotensinNTS489A.222085_atAW452357Hs.27373Hypothetical gene supported byLOC400451AK075564;BC060873490A.214899_atAC007842Data notfound491B.240174_atBF512871Hs.193522Transcribed sequence withmoderateSimilarity to protein sp: P39188(H. sapiens)ALU1_HUMAN Alu subfamilyJ sequenceContamination warning entry492A.219148_atNM_018492Hs.104741T-LAK cell-originated proteinTOPKkinase493B.226303_atAA706788Hs.46531Phosphoglucomutase 5PGM5494B.222848_atBC005400Hs.164018Leucine zipper protein FKSG14FKSG14495A.202270_atNM_002053Hs.62661Guanylate binding protein 1,GBP1interferon-inducible,67 kDa496A.205266_atNM_002309Hs.2250leukemia inhibitory factorLIF(cholinergicdifferentiation factor)497B.239008_atAW606588Hs.430335Transcribed sequence with weaksimilarity to protein sp: P39195(H. sapiens) ALU8_HUMANAlu subfamily SX sequencecontamination warning entry498B.228194_sAI675836Hs.348923sortilin-related VPS10 domainSORCS1atcontainingreceptor 1499A.215514_atAL080072Hs.21195MRNA; cDNADKFZp564M0616 (from cloneDKFZp564M0616)500A.219010_atNM_018265Hs.73239Hypothetical protein FLJ10901FLJ10901


The 500-gene classifier: The genes are ranked according to their correlation with p53 status. The genes are identified by their GenBank Accession Nos., Affymetrix Probeset IDs, Unigene IDs, Unigene Names and Unigene Symbols.


For sequences and SEQ ID NOs for the genes described in Table 1, see FIGS. 9-508 in which each of the sequences for the above genes is shown and is associated with a GenBank Accession No., Unigene ID, and/or a Unigene Name, and a SEQ ID NO.


Example 3
The p53 Classifier has Significant Accuracy in Two Independent Datasets

The performance of the p53 classifier was then evaluated in independent datasets. FIG. 3 shows that genes of the classifier can predict p53 status in independent cDNA microarray datasets. (A) A 9-gene subset of the 32-gene classifier can predict p53 status in an independent breast cancer dataset. Nine genes of the classifier were selected based on their presence in 50% or more of the tumors. The tumors used in the analysis were required to have expression data present for >50% of the genes. (B) An 8-gene subset of the p53 classifier can predict p53 status in an independent liver cancer dataset. Eight overlapping genes were selected based on their presence in 90% or more of the tumors. The tumors used in the analysis were required to have expression data present for >50% of the genes. (A&B) Black vertical bars indicate p53 mutant status. Gene symbols (Unigene build #167) and corresponding IMAGE clone IDs (from the original studies) are listed. The hierarchical clustergrams are shown. Genes (rows) and tumors (columns) were clustered. In the tumor dendrograms, the green branch denotes the wildtype-like configurations, and the red branch denotes the mutant-like profiles.


Two publicly available microarray datasets in which p53 status was known were therefore accessed: a breast cancer study by Sorlie et al. (Sorlie, T. et al. Repeated observation of breast tumor subtypes in independent gene expression data sets. Proc Natl Acad Sci USA 100, 8418-23 (2003), incorporated herein by reference) and a liver cancer study by Chen et al. (Chen, X. et al. Gene expression patterns in human liver cancers. Mol Biol Cell 13, 1929-39 (2002), incorporated herein by reference). Both studies were conducted on cDNA microarray platforms.


In the Sorlie dataset, 69 breast tumors were sequenced for p53 mutations. This subset of tumors was queried for the availability of expression data corresponding to the genes of the classifier. Twenty-eight genes in the classifier mapped to UniGene IDs (build #167). Though over half of these genes mapped to the Sorlie et al. microarray, few were expressed in the majority of the tumors, and a number of tumors possessed measurements for fewer than half of the genes. Only 9 genes in the classifier were found to correspond to cDNA probes (representing 9 different genes) having expression measurements present in >50% of the tumors, where the tumors possessed measurements for >50% of the genes (resulting in a subset of 44 well-sampled tumors). When this 9-gene subset of the classifier was used to hierarchically cluster the tumors (FIG. 3A), 77% of the p53 mt tumors clustered into one branch, and 77% of the wildtypes clustered into the other (pcs=3.0×10⁻⁴), recapitulating the robust predictive capability of the classifier.
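The missing-data filtering described above can be sketched as follows. This is a minimal illustration, not the procedure actually used: the function name, the data layout (a dictionary mapping each gene to one value per tumor, with `None` marking a missing measurement), and the order of the two filtering steps (genes first, then tumors) are all assumptions, and the thresholds are exposed as parameters because the text states them slightly differently for the two datasets.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: gene symbol -> one expression value per tumor (None = missing).
Matrix = Dict[str, List[Optional[float]]]


def filter_genes_then_tumors(
    expr: Matrix,
    gene_min_present: float = 0.5,
    tumor_min_present: float = 0.5,
) -> Tuple[List[str], List[int]]:
    """Keep genes measured in at least `gene_min_present` of the tumors,
    then keep tumors measured for more than `tumor_min_present` of those genes.

    Returns the retained gene names and the retained tumor column indices.
    """
    if not expr:
        return [], []
    n_tumors = len(next(iter(expr.values())))

    # Step 1 (assumed order): filter genes by the fraction of tumors with data.
    kept_genes = [
        gene
        for gene, values in expr.items()
        if sum(v is not None for v in values) / n_tumors >= gene_min_present
    ]
    if not kept_genes:
        return [], []

    # Step 2: filter tumors by the fraction of retained genes with data.
    kept_tumors = [
        j
        for j in range(n_tumors)
        if sum(expr[g][j] is not None for g in kept_genes) / len(kept_genes)
        > tumor_min_present
    ]
    return kept_genes, kept_tumors


# Toy data, invented for illustration only.
toy = {
    "GENE_A": [1.0, 0.5, None, 2.0],
    "GENE_B": [None, None, None, 1.2],
    "GENE_C": [0.3, None, 0.9, 1.1],
}
print(filter_genes_then_tumors(toy))  # -> (['GENE_A', 'GENE_C'], [0, 3])
```

The retained genes and tumors would then be passed to whatever hierarchical clustering routine is used; that step is not shown here.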


A cDNA microarray-based liver cancer dataset (Chen, X. et al. Gene expression patterns in human liver cancers. Mol Biol Cell 13, 1929-39 (2002), incorporated herein by reference), in which p53 status was ascertained from p53 protein levels by immunohistochemistry (IHC), was next analyzed. Here, 8 classifier genes could be mapped to all 59 tumors assayed for p53 status (with each gene having data present in 90% or more of all tumors, and with each tumor containing data for >50% of the genes). With statistical significance similar to that seen in the breast cancer dataset (i.e., pfe=3.5×10⁻⁴), this 8-gene subset of the classifier was able to cluster the HCC samples into two predominant clusters correlated with p53 status: 87% of the mutants in one cluster, and 61% of the wildtypes in the other (FIG. 3B). Together, these observations suggest that the genes comprising the p53 classifier are robust in their ability to classify not only breast tumors but also liver cancers based on p53 status, and therefore may have generalizable utility in predicting p53 status in other cancer types.
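The association between cluster membership and p53 status can be tested on a 2×2 table. The sketch below assumes that "pfe" denotes a two-sided Fisher's exact test p-value, which this passage does not spell out. The function name is hypothetical, and the counts in the example are invented for illustration; they are not the counts from the Chen et al. dataset.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table

                     cluster 1   cluster 2
        p53 mutant       a           b
        p53 wildtype     c           d

    The p-value is the total hypergeometric probability of all tables with
    the same margins that are no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x, with margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)  # tolerance for float ties
    )
    return min(1.0, p_value)


# Invented counts: 13 of 15 mutants in cluster 1, 27 of 44 wildtypes in cluster 2.
print(fisher_exact_two_sided(13, 2, 17, 27))
```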

TABLE 2
Genbank Accession No. | Affymetrix Probeset ID | UniGene ID (build #171) | UniGene Name (build #167) | UniGene Symbol
AI961235 | B.235343_at | Hs.96885 | Hypothetical protein FLJ12505 | FLJ12505
BG271923 | B.238581_at | Hs.237809 | Guanylate binding protein 5 | GBP5
NM_002466 | A.201710_at | Hs.179718 | v-myb myeloblastosis viral oncogene homolog (avian)-like 2 | MYBL2
BC001651 | A.221520_s_at | Hs.48855 | Cell division cycle associated 8 | CDCA8
D38553 | A.212949_at | Hs.308045 | Barren homolog (Drosophila) | BRRN1
AK000345 | A.214079_at | Hs.272499 | Dehydrogenase/reductase (SDR family) member 2 | DHRS2
AA742697 | B.230378_at | Hs.62492 | Secretoglobin, family 3A, member 1 | SCGB3A1
AL080170 | A.215047_at | | |
BF245284 | B.238746_at | Hs.354427 | Transcribed sequences |
BC004504 | A.221585_at | Hs.331904 | Calcium channel, voltage-dependent, gamma subunit 4 | CACNG4
H15261 | B.243929_at | Hs.21948 | Transcribed sequences |
NM_000909 | A.205440_s_at | Hs.519057 | Neuropeptide Y receptor Y1 | NPY1R
NM_024843 | A.217889_s_at | Hs.31297 | Cytochrome b reductase 1 | CYBRD1
R73030 | B.230863_at | Hs.252938 | Low density lipoprotein-related protein 2 | LRP2
NM_030896 | A.221275_s_at | | |
AI435828 | A.203438_at | Hs.155223 | Stanniocalcin 2 | STC2
AL512727 | A.215014_at | Hs.232127 | MRNA; cDNA DKFZp547P042 (from clone DKFZp547P042) |
AW242997 | B.229030_at | | |
AI810764 | B.229150_at | Hs.102406 | Transcribed sequences |
AI922323 | B.228969_at | Hs.226391 | Anterior gradient 2 homolog (Xenopus laevis) | AGR2
AL360204 | B.232855_at | Hs.283853 | MRNA full length insert cDNA clone EUROIMAGE 980547 |
NM_003225 | A.205009_at | Hs.350470 | Trefoil factor 1 (breast cancer, estrogen-inducible sequence expressed in) | TFF1
NM_003226 | A.204623_at | Hs.82961 | Trefoil factor 3 (intestinal) | TFF3
AW299538 | B.227081_at | Hs.75528 | Nucleolar GTPase | HUMAUANTIG
NM_003462 | A.205186_at | Hs.406050 | Dynein, axonemal, light intermediate polypeptide 1 | DNALI1
AI990465 | A.205734_s_at | Hs.38070 | Lymphoid nuclear protein related to AF4 | LAF4
NM_004392 | A.205472_s_at | Hs.63931 | Dachshund homolog (Drosophila) | DACH1
NM_001267 | A.206869_at | Hs.97220 | Chondroadherin | CHAD
AF269087 | B.223864_at | Hs.326736 | Breast cancer antigen NY-BR-1 | NY-BR-1
AI826437 | B.229975_at | Hs.283417 | Transcribed sequences |
AL355392 | B.226067_at | | |
AU156421 | B.233413_at | Hs.518736 | CDNA FLJ13457 fis, clone PLACE1003343. |


Optimized 32-gene p53 Classifier: The genes are identified by their GenBank Accession Nos., Affymetrix Probeset IDs, Unigene IDs, Unigene Names and Unigene Symbols.


Example 4
The p53 Classifier is a Greater Prognostic Indicator of Patient Outcome than p53 Mutation Status Alone

It is widely accepted that in breast cancer and other tumor types p53 status is prognostic of clinical outcomes such as tumor recurrence, patient survival, and therapeutic response. The hypothesis that a classifier based on p53 activity would out-perform p53 mutation status alone as a prognostic indicator of clinical outcomes was tested. FIG. 4 shows that the p53 classifier has greater prognostic significance than p53 mutation status alone. Kaplan-Meier survival curves are shown for patients classified according to (A) p53 mutation status, (B&C) the p53 classifier, or (D) both. The clinical endpoint was death from breast cancer (i.e., disease-specific survival). In A, B, and D all 257 patients were assessed; in C, only the 198 patients with p53 wildtype tumors were assessed. The Wald test (pw) was used to assess significance of the hazard ratios (HR).


The classifier and sequence-level p53 mutation status were compared with respect to their abilities to predict disease-specific survival (DSS) in all 257 patients of the Uppsala cohort regardless of treatment type or clinical stage.


The significance of the hazard ratio generated using the p53 classifier to segregate patients was an order of magnitude greater than that obtained using p53 mutation status alone (pw=0.00057 versus pw=0.012, respectively) (FIG. 4 A&B); notably, this improved p-value was statistically significant at pmc=0.0046. Furthermore, the p53 classifier could also significantly segregate patients into low and high risk groups in the subset of 198 women confirmed by sequencing to have wildtype p53 (pw=0.016) (FIG. 4C) indicating that those with p53 wt tumors classified as mutant-like have poorer DSS than those with wt tumors of the wt-like class. In FIG. 4D, survival curves among all four tumor subgroups were compared. Notably, it was observed that patients with p53 mt or wt tumors classified as mt-like (green and blue curves, respectively) have similar overall survival curves, while the twelve with p53 mt tumors classified as wt-like (red curve) show a survival curve that falls between that of the group with mutant-like p53 mt tumors (green curve) and that of the group with wt-like p53 wt tumors (black curve) and is not significantly different from either curve (pw=0.47 for mt/mt-like comparison and pw=0.37 for wt/wt-like comparison).


Next, the prognostic significance of the classifier on the Sorlie et al cDNA microarray dataset was examined (Sorlie, T. et al. Repeated observation of breast tumor subtypes in independent gene expression data sets. Proc Natl Acad Sci U S A 100, 8418-23 (2003), incorporated herein by reference). FIG. 5 shows that the p53 classifier has strong prognostic significance in an independent dataset of late-stage tumors. Tumors were hierarchically classified according to the 9-gene partial classifier described in FIG. 3 and analyzed for correlations with survival outcomes: (A) hierarchical clustergram of 76 tumors from the Sorlie et al dataset; the black branch of the tumor dendrogram denotes the wildtype-like configuration, and the red branch the mutant-like profile. Shown are Kaplan-Meier estimates for (B) disease-specific survival and (C) disease-free survival, where patient groups were determined according to the green and red branches of the tumor dendrogram in (A).


Here, the 9-gene partial classifier, which could distinguish mt and wt tumors, both with 77% accuracy, was used to hierarchically cluster 76 well-sampled tumor specimens with associated patient survival information (FIG. 5A). Importantly, the majority of these tumors (>80%) are derived from two independent prospective studies on chemotherapeutic response of stage III patients with locally advanced breast cancer (T3/T4 and/or N2). The tumors clustered into two predominant branches with 31 tumors in the wt-like cluster and 44 tumors in the mutant-like cluster. Grouping the patients according to these tumor profiles, the Kaplan-Meier survival curves for disease-specific and disease-free survival (FIGS. 5B & C) were both highly significant in this cohort (pw=0.00008 (DSS) and pw=0.00005 (DFS)). Remarkably, the 31 patients in the p53 wt-like cluster showed a 90% probability of surviving their breast cancer for a period of 7 years compared to a 35% probability of 7-year survival for the 44 patients in the p53 mt-like group (FIG. 5B). Thus, in this predominantly stage III patient population, the partial classifier can accurately predict not only which patients will relapse and die, but also which late stage patients will survive their cancer.
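The survival probabilities quoted above are read off Kaplan-Meier curves. As an illustration only, the product-limit estimate can be sketched in a few lines of Python. The function names and the toy data below are hypothetical and are not taken from any cohort described herein.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  : follow-up time for each patient (e.g. years)
    events : 1 if the endpoint (e.g. death from breast cancer) occurred,
             0 if the patient was censored at that time
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)      # each patient leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve

def survival_at(curve, horizon):
    """Estimated probability of surviving beyond `horizon` (step function)."""
    s = 1.0
    for t, value in curve:
        if t > horizon:
            break
        s = value
    return s

# Hypothetical toy data for one patient group.
times = [1, 2, 2, 3, 5, 7, 8, 8]
events = [1, 1, 0, 1, 0, 1, 0, 0]
curve = kaplan_meier(times, events)
print(round(survival_at(curve, 7), 3))  # 0.4
```

Applying `kaplan_meier` separately to each patient group (for example, the wt-like and mt-like clusters) gives one curve per group, and `survival_at(curve, 7)` corresponds to a 7-year survival estimate of the kind reported for FIG. 5B.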


For hierarchical cluster analysis, log expression values were mean centered and normalized, and genes and tumors were clustered using the Pearson correlation metric and average linkage (Cluster and TreeView software courtesy Dr. Michael Eisen; software available on Lawrence Berkeley National Laboratory, UC Berkeley's website). For survival analysis, patients were stratified according to the p53 classifier output or, as in one case, according to p53 mutation status. The Kaplan-Meier estimate was used to compute survival curves for the different patient groups and the Wald test was used to assess the statistical significance of the resultant hazard ratio. The FIG. 4 survival analysis assesses the probability of achieving, by chance alone, the more significant Wald p-value of 0.00057 generated using the group assignments as determined by the p53 classifier (panel B) compared to p=0.012 using p53 status alone (panel A). In 100,000 iterative runs, 40 tumors were randomly selected (i.e., the number of tumors that differed in group assignment between panels A and B), their p53 status inverted, and the Wald p-values computed for each run. A p-value ≤0.00057 was obtained only 564 times. The Monte Carlo p-value for this observation is estimated to be 0.0046.
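The label-inversion procedure described above can be sketched as follows. This is a minimal illustration and not the code used for FIG. 4. It assumes that the Wald test is taken from a one-covariate Cox proportional hazards model with a 0/1 group indicator (the text states only that the Wald test was applied to the hazard ratio). The function names, the damped Newton fit, and the toy cohort are all hypothetical.

```python
import math
import random

def cox_wald_p(times, events, group, max_iter=25):
    """Wald p-value for a one-covariate Cox model (Breslow ties), group in {0, 1}."""
    data = sorted(zip(times, events, group))
    beta, info = 0.0, 0.0
    for _ in range(max_iter):
        score, info = 0.0, 0.0
        for t_i, e_i, x_i in data:
            if not e_i:
                continue
            # Risk set: everyone still under observation at time t_i.
            s0 = s1 = 0.0
            for t_j, _, x_j in data:
                if t_j >= t_i:
                    w = math.exp(beta * x_j)
                    s0 += w
                    s1 += w * x_j
            mean = s1 / s0
            score += x_i - mean
            info += mean * (1.0 - mean)      # variance of a 0/1 covariate
        if info < 1e-12:
            return 1.0                       # no information, e.g. one group is empty
        step = score / info
        beta += max(-1.0, min(1.0, step))    # damped Newton step
        if abs(step) < 1e-9:
            break
    z = beta * math.sqrt(info)               # beta divided by its standard error
    return math.erfc(abs(z) / math.sqrt(2.0))

def monte_carlo_p(times, events, group, n_flip, p_observed, runs=1000, seed=0):
    """Fraction of runs in which inverting n_flip random labels gives p <= p_observed."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(runs):
        flipped = list(group)
        for i in rng.sample(range(len(group)), n_flip):
            flipped[i] = 1 - flipped[i]
        if cox_wald_p(times, events, flipped) <= p_observed:
            hits += 1
    return hits / runs

# Hypothetical toy cohort (not data from this document).
times = list(range(1, 13))
events = [1] * 12
status = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0]      # stand-in for sequence-level status
classifier = [1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0]  # stand-in for classifier output
p_status = cox_wald_p(times, events, status)
p_classifier = cox_wald_p(times, events, classifier)
n_diff = sum(a != b for a, b in zip(status, classifier))
print(p_status, p_classifier,
      monte_carlo_p(times, events, status, n_diff, p_classifier, runs=200))
```

The returned fraction plays the role of pmc: it estimates how often a random reassignment of the same number of tumors would produce a Wald p-value at least as small as the one obtained with the classifier.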


For association tests (i.e., to ascertain the significance of the number of observed events in two or more groups), the Chi-square test was employed. When the number of events was sufficiently small (<5) in any category, Fisher's Exact test was applied instead of Chi-square test.
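The choice between the two tests, and Fisher's Exact test itself for a 2x2 table, can be sketched as below. This is an illustrative sketch only. The one-sided (lower-tail) form is an assumption, since the text does not state which tail was used; computed this way, the counts of Table 3 give values that agree with the listed p-values to three decimals. The function names are hypothetical.

```python
from math import comb

def fisher_exact_lower(a, b, c, d):
    """One-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Returns P(X <= a) under the hypergeometric null, where X is the count in
    the top-left cell and all row and column totals are held fixed.
    """
    row1, col1, total = a + b, a + c, a + b + c + d
    k_min = max(0, col1 - (c + d))
    numerator = sum(comb(row1, k) * comb(total - row1, col1 - k)
                    for k in range(k_min, a + 1))
    return numerator / comb(total, col1)

def choose_test(table, threshold=5):
    """Apply the rule stated above: Fisher's test if any count is below threshold."""
    small = any(cell < threshold for row in table for cell in row)
    return "fisher" if small else "chi-square"

# Counts from Table 3, rows = (wt-like, mt-like), columns = (with, without).
print(choose_test([[1, 11], [20, 27]]))             # fisher
print(round(fisher_exact_lower(1, 11, 20, 27), 3))  # 0.025  (I. severe mutations)
print(round(fisher_exact_lower(0, 11, 9, 18), 3))   # 0.029  (II. common missense)
print(round(fisher_exact_lower(1, 10, 12, 15), 3))  # 0.039  (III. dominant negative)
```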


For the statistical analysis of expression levels for p53 downstream target genes and upstream effectors, two-tailed two-group t-tests were employed to determine differentially expressed genes between the p53 wt and mt tumors (FIG. 8). One-tailed two-group t-tests were performed for comparisons between the p53 wt tumors in the mt-like class and the p53 wt tumors in the wt-like class (and vice versa) to test whether the genes were significantly differentially expressed in the same direction (or opposite direction) as that observed between the p53 wildtypes and mutants.
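The relationship between the two-tailed and one-tailed tests can be illustrated with a short sketch. It assumes the equal-variance (pooled) form of the two-group t test, which the text does not specify, and it obtains the tail probability by numerically integrating the Student's t density rather than calling a statistics library. All names and the example values are hypothetical.

```python
import math

def pooled_t(x, y):
    """Two-group t statistic (equal-variance form) and its degrees of freedom."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    ssx = sum((v - mx) ** 2 for v in x)
    ssy = sum((v - my) ** 2 for v in y)
    df = nx + ny - 2
    pooled_var = (ssx + ssy) / df
    t = (mx - my) / math.sqrt(pooled_var * (1.0 / nx + 1.0 / ny))
    return t, df

def t_upper_tail(t, df, steps=2000):
    """P(T >= t) for Student's t, by Simpson integration of the density on [0, |t|]."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    a = abs(t)
    h = a / steps                      # steps is even, as Simpson's rule requires
    area = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    area *= h / 3.0
    upper = max(0.0, 0.5 - area)
    return upper if t >= 0 else 1.0 - upper

def two_tailed_p(x, y):
    """p-value for the hypothesis that the two group means differ in either direction."""
    t, df = pooled_t(x, y)
    return 2.0 * t_upper_tail(abs(t), df)

def one_tailed_p(x, y):
    """p-value for the directional hypothesis mean(x) > mean(y)."""
    t, df = pooled_t(x, y)
    return t_upper_tail(t, df)

# Hypothetical log-expression values for one gene in two tumor classes.
wt_like = [7.9, 8.4, 8.1, 8.6, 8.2]
mt_like = [7.1, 7.6, 7.3, 7.8, 7.0]
print(two_tailed_p(wt_like, mt_like), one_tailed_p(wt_like, mt_like))
```

When the observed difference lies in the hypothesised direction, the one-tailed p-value is half the two-tailed value; when it lies in the opposite direction, the one-tailed p-value exceeds 0.5.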


It would be evident to one of skill in the art that the method embodiments of the present invention are not limited to the statistical methods disclosed herein. Embodiments of the present invention encompass equivalent analytical methods. The p-value abbreviations used herein include:

    • pwr=Wilcoxon rank-sum test
    • pt=T test
    • pcs=Chi-square test
    • pfe=Fisher's Exact test
    • pw=Wald test
    • pmc=Monte Carlo estimate


Promoter analysis for p53 binding sites was performed on each of the classifier genes with a known transcription start site (TSS). BEARR (Vega, V. B., Bangarusamy, D. K., Miller, L. D., Liu, E. T. & Lin, C. Y. BEARR: Batch Extraction and Analysis of cis-Regulatory Regions. Nucleic Acids Res 32, W257-60 (2004), incorporated herein by reference) was used to extract promoter sequences (3000 bp upstream to 500 bp downstream of the TSS) and predict putative binding sites using the P53 position weight matrix obtained from TRANSFAC (Kel, A. E. et al. MATCH: A tool for searching transcription factor binding sites in DNA sequences. Nucleic Acids Res 31, 3576-9 (2003), incorporated herein by reference) version 6.0 (Matrix accession: M00272) as well as simple pattern search based on the canonical p53 binding site consensus 5′-RRRCWWGYYYN(0-13)RRRCWWGYYY-3′ (el-Deiry, W. S., Kern, S. E., Pietenpol, J. A., Kinzler, K. W. & Vogelstein, B. Definition of a consensus binding site for p53. Nat Genet 1, 45-9 (1992), incorporated herein by reference).
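The simple pattern search based on the consensus can be illustrated with a regular expression. This sketch covers only the consensus search, not the position weight matrix scan or the BEARR tool, and the function name and example sequence are hypothetical. It uses the standard IUPAC codes R = A or G, W = A or T, Y = C or T.

```python
import re

IUPAC = {"R": "[AG]", "W": "[AT]", "Y": "[CT]", "C": "C", "G": "G"}
HALF_SITE = "".join(IUPAC[ch] for ch in "RRRCWWGYYY")

# Two half-sites separated by a spacer of 0-13 arbitrary bases. The lookahead
# allows overlapping candidate sites to be reported, one per start position.
P53_SITE = re.compile("(?=(" + HALF_SITE + "[ACGT]{0,13}" + HALF_SITE + "))")

def find_p53_sites(promoter):
    """Return (0-based start, matched sequence) for each candidate site."""
    seq = promoter.upper()
    return [(m.start(), m.group(1)) for m in P53_SITE.finditer(seq)]

# Hypothetical promoter fragment containing one site with a 2-base spacer.
fragment = "TTT" + "GGGCATGTCC" + "AA" + "AGACTTGCCC" + "TTT"
print(find_p53_sites(fragment))  # [(3, 'GGGCATGTCCAAAGACTTGCCC')]
```

Because the half-site consensus RRRCWWGYYY is its own reverse complement, a site on the opposite strand also matches this pattern on the forward strand, so a single pass over one strand is sufficient for the consensus search.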


Example 5
The p53-Deficiency Classifier, but not p53 Status Alone, is Significantly Correlated with Outcome in Endocrine-Treated Patients

To further test the robustness of the classifier in predicting patient outcome, its performance in other relevant therapeutic treatment groups was analyzed. Recently, it has been observed that p53 mt breast tumors show greater resistance to endocrine therapy than p53 wt tumors, and this has been explained, in part, by the uncoupling of p53-dependent apoptosis in the resistant tumors (Berns, E. M. et al. Complete sequencing of TP53 predicts poor response to systemic therapy of advanced breast cancer. Cancer Res 60, 2155-62 (2000), incorporated herein by reference). To test the ability of the classifier to predict outcome in a hormone therapy-specific patient cohort, a subpopulation of the Uppsala cohort consisting of 68 ER+ patients who received only adjuvant tamoxifen treatment following surgery, was examined. FIG. 6 shows that the p53 classifier has greater prognostic significance than p53 mutation status in endocrine-treated patients. Sixty-eight ER+, endocrine-treated patients were classified according to (A) p53 mutation status or (B) the p53 classifier and analyzed for correlations with disease-specific survival (DSS). Kaplan-Meier survival estimates are shown. As shown in the survival analysis in FIGS. 6A&B, it was observed that the classifier was a significant predictor of disease-specific survival (pw=0.047), while p53 mutation status alone was not (pw=0.395).


Next, the prognostic performance of the classifier on a set of 97 breast tumors published by van't Veer et al (van't Veer, L. J. et al. Gene expression profiling predicts clinical outcome of breast cancer. Nature 415, 530-6 (2002), incorporated herein by reference) was examined. FIG. 7 shows that the p53 classifier is prognostic of distant recurrence in an independent set of early-stage locally-treated breast tumors. 97 tumors from a Dutch cohort (van't Veer, L. J. et al. Gene expression profiling predicts clinical outcome of breast cancer. Nature 415, 530-6 (2002), incorporated herein by reference) of early-stage patients treated with postoperative adjuvant radiotherapy and followed for a period of at least 5 years were hierarchically clustered using a set of probes corresponding to 21 genes of the optimized classifier. The predominant cluster nodes are demarcated by color and “C” designations (i.e., C1-C5). Black arrows correspond to tumors from patients who developed a distant metastasis (DM) within 5 years. Gene symbols and corresponding Genbank accession numbers are shown. Hierarchical clustering was performed as described previously.


Here, all of the samples were controlled for clinical uniformity, i.e., <5 cm in size (T1/T2), with no advanced disease (pN0), from patients less than 55 years of age at diagnosis, treated by surgery and subsequent radiotherapy only (with the exception of 5 patients who received adjuvant systemic therapy). From the 32-gene classifier, 24 probes corresponding to 21 genes could be mapped to all 97 tumors with survival information. Upon clustering the tumors, approximately 4 clusters with similar average distance correlations were observed that significantly distinguished patients who would develop a distant metastasis within 5 years (pfe=2.2×10⁻⁴) (FIG. 7). Notably, of the 26 tumors in cluster 1, which bear the molecular configuration of p53 mt-like tumors, 73% had a distant metastasis within 5 years, compared to 26% of 39 tumors in cluster 3, which most closely resemble the p53 wt-like molecular configuration. These findings suggest that the p53 classifier is prognostic of tumor recurrence in early stage, locally-treated breast cancer.
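The clustering referred to above uses the Pearson correlation metric with average linkage, as stated in the description of the hierarchical cluster analysis. The following sketch illustrates that combination on toy data. It is not the Cluster/TreeView implementation; the function names and profiles are hypothetical, and a profile with no variation across genes would need to be filtered out beforehand because its correlation is undefined.

```python
import math

def mean_center(row):
    m = sum(row) / len(row)
    return [v - m for v in row]

def pearson_distance(a, b):
    """1 - Pearson correlation between two equal-length expression profiles."""
    a, b = mean_center(a), mean_center(b)
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return 1.0 - num / den

def average_linkage(profiles):
    """Agglomerative clustering. Returns the merge history as a list of
    (members_of_first_cluster, members_of_second_cluster, distance)."""
    n = len(profiles)
    dist = [[pearson_distance(profiles[i], profiles[j]) for j in range(n)]
            for i in range(n)]
    clusters = [[i] for i in range(n)]
    merges = []
    while len(clusters) > 1:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                # Average of all pairwise distances between the two clusters.
                d = sum(dist[i][j] for i in clusters[p] for j in clusters[q])
                d /= len(clusters[p]) * len(clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        d, p, q = best
        merges.append((clusters[p], clusters[q], d))
        merged = clusters[p] + clusters[q]
        clusters = [c for k, c in enumerate(clusters) if k not in (p, q)] + [merged]
    return merges

# Hypothetical log-expression profiles: four tumors measured on five genes.
tumors = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 3.0, 4.0, 5.0, 7.0],
    [5.0, 4.0, 3.0, 2.0, 1.0],
    [6.0, 5.0, 4.0, 2.0, 1.0],
]
history = average_linkage(tumors)
print(history[-1][:2])  # the two predominant branches of the dendrogram
```

The last entry of the merge history corresponds to the top split of the tumor dendrogram, which is the level at which the two predominant branches of FIG. 5A or the cluster nodes of FIG. 7 would be read off.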


Example 6
Analysis of Classifier Gene Functions

To gain some mechanistic insights, the functional annotations of the classifier genes were analysed for clues to explain the correlation between their expression levels and p53 status and patient outcome. Surprisingly, it was found that none of the classifier genes are known transcriptional targets of p53, nor have they been previously implicated in the p53 pathway. Promoter analysis of the 21 genes with defined promoter regions revealed no evidence of the canonical p53 binding site, or recently described novel p53 binding sites, within any of the promoters.


Twelve of the genes are of unknown function. However, of the characterized genes, a number are associated with cell growth and proliferation (MYBL2, TFF1, BRRN1, CHAD, SCGB3A1, DACH1, CDCA8), transcription (LAF4, NY-BR-1, DACH1, MYBL2), ion transport (CACNG4, CYBRD1, LRP2), and breast cancer biology (SCGB3A1, TFF1, STC2, NY-BR-1, AGR2). Speculatively, some of these genes may contribute mechanistically to the poor prognosis of the p53 mutant-like tumors. For example, MYBL2, which was observed to be upregulated in the p53 mutant-like tumors, is a growth-promoting transcription factor closely related to the c-MYB oncogene. It maps to a chromosomal region frequently amplified in breast cancer (20q13) and has previously been reported to be overexpressed in breast cancer cell lines and sporadic ovarian carcinomas (Forozan, F. et al. Comparative genomic hybridization analysis of 38 breast cancer cell lines: a basis for interpreting complementary DNA microarray data. Cancer Res 60, 4519-25 (2000) and Tanner, M. M. et al. Frequent amplification of chromosomal region 20q12-q13 in ovarian cancer. Clin Cancer Res 6, 1833-9 (2000), both of which are incorporated herein by reference). SCGB3A1 (HIN1), which was observed to be downregulated in the p53 mutant-like tumors, is a putative tumor suppressor gene that can inhibit breast cancer cell growth when overexpressed and has been found to be transcriptionally silenced by hypermethylation of its promoter in early stages of breast tumorigenesis (Krop, I. E. et al. HIN-1, a putative cytokine highly expressed in normal but not cancerous mammary epithelial cells. Proc Natl Acad Sci USA 98, 9796-801 (2001), incorporated herein by reference).


Example 7
Nature of Misclassified Tumors

It was observed that a number of cancers with wild type p53 sequence status were classified as p53 mutant by expression profiling using the 32-gene classifier. If the “misclassified” p53 wt tumors were in fact p53 deficient, they would possess certain molecular characteristics reflective of perturbations of the p53 pathway, and these characteristics would be found in the majority of p53 mutant tumors. First, the possibility that p53 deficiency could result from reduced transcript levels either by transcriptional repression of the p53 gene (TP53) or by the shortening of its mRNA half-life, was considered. The t test was used to compare the relative expression levels of TP53 (using the TP53 probe-sets present on the microarray) among the different tumor classes (FIG. 8). Indeed, consistent with this hypothesis, it was observed that the overall expression level of TP53 was significantly reduced in the 28 wt tumors classified as mt-like compared to the remaining 170 wt tumors classified as wt-like (pt=1.4×10⁻⁴). No statistically significant difference in expression levels was observed between the p53 mt tumors correctly classified as mt-like and all wt tumors, consistent with the fact that TP53 mRNA levels are not commonly reduced in p53 mutant breast cancers.



FIG. 8 shows that transcript levels of p53, its transcriptional targets, and its upstream effectors distinguish known and predicted classes. Expression levels of p53 pathway-relevant genes were examined. The statistical significance of transcript levels between the different tumor classes was determined by t test and is shown in a summary table to the right of the figure. The 4 tumor classes are as follows: 1) 47 p53 mt tumors classified as mutant, 2) 28 p53 wt tumors classified as mutant, 3) 170 p53 wt tumors classified as wildtype, and 4) 12 p53 mt tumors classified as wildtype. Statistical measurements in the summary shown in grey did not reach significance at p<0.05.


Table 3 shows a comparative analysis of p53 mutations. (I) Severe mutations were defined as insertions, deletions, or stop codons. Of the remaining missense point mutations (mpms; 11 in the wt-like group, 27 in the mt-like group), the frequency of occurrence was determined for (II) the most common missense point mutations in p53 as defined by the IARC TP53 Mutation Database (available online on the website of the International Agency for Research on Cancer, IARC), and (III) mutants previously shown, in vitro, to possess dominant negative activity. P-values were calculated using Fisher's Exact test.


This strategy was applied to known transcriptional targets of p53, which were hypothesized to show altered transcription in p53-deficient tumors to some extent. Indeed, a number of p53 target genes demonstrated altered patterns of expression (FIG. 8). The TP53-inducible genes TP53INP1, SEMA3B, PMAIP1 (NOXA), FDXR, CCNG1, and LRDD, all of which contain functional p53-binding sites in their promoters, showed significantly lower expression in the 28 wt tumors classified as mt-like compared to the other wildtypes (all at pt<0.05). Moreover, all but one of these genes were also significantly reduced in the p53 mt tumors classified as mt-like (compared to all wt tumors); and in all but two cases, these genes showed significantly higher expression in the 12 mt tumors classified as wt-like when compared to the other mutants.


CHEK1 and CHEK2, both positive upstream effectors of p53 that phosphorylate p53 and thereby promote its stabilization, are known to be transcriptionally repressed by p53. A significant increase in the mRNA levels of these genes in both the p53 wt and mt tumors of the mutant-like class was observed. It was also observed that the 12 mt tumors misclassified as wildtype-like displayed significantly lower expression of these genes compared to the other 47 p53 mutants. Notably, no differential expression of the p53-regulated genes CDKN1A (p21), GADD45, PPM1D (WIP1), TP53I3 (PIG3), TNFRSF6, BBC3 (PUMA), APAF1 or BCL2 was observed in these breast tumor specimens.


Taken together, these data suggest that the classifier can distinguish tumors based on some aspects of p53 transcriptional activity that are inhibited in both the p53 mutant and wildtype tumors of the mutant-like class, yet operative in the p53 wildtype tumors (and to some extent the 12 p53 mutant tumors) of the wildtype-like class.


Perhaps paradoxically, it was observed that the p53-inducible genes PERP, BAX and SFN (14-3-3 sigma) were all expressed at significantly higher levels in the 28 misclassified wt tumors, rather than at lower levels like their inducible gene counterparts described above. However, the significant overexpression of these genes in the p53 mt tumors classified as mutant-like was also observed, suggesting that in breast cancer, these genes may be induced by alternate regulatory mechanisms in the context of mutant or deficient p53.


Intriguingly, another positive upstream effector of p53, ATR, which is thought to enhance p53 activity in a manner similar to that of CHEK1 and CHEK2, was also found expressed at significantly higher levels in the p53 mutants and p53 wt tumors of the mutant-like class, even though this gene is not known to be modulated in a p53-dependent manner. Of note, no significant differences in the expression levels of the upstream effectors, ATM or PRKDC (DNA-PK) were observed.


The expression levels of other upstream modulators of p53 activity were then examined in order to ascertain possible alternate mechanisms by which p53 expression and activity might be reduced in the mutant-like p53 wt tumors. First, it was observed that several known positive regulators of p53 transactivation were significantly reduced in both the wildtypes and mutants of the mutant-like class including HOXA5, USF1, EGR1 and TP53BP1. HOXA5, USF1, and EGR1 are all transcription factors known to bind the p53 promoter and enhance its expression. Interestingly, deficiencies in all three have previously been implicated in breast carcinogenesis. Recently the coordinate loss of both p53 and HOXA5 mRNA and protein expression was observed in a panel of human breast cancer cell lines, and the HOXA5 promoter was found to be methylated in 16 of 20 p53-negative human breast tumors. USF1, which is structurally related to the c-Myc oncoprotein, has been found to have reduced transcriptional activity in breast cancer cell lines, and has recently been shown to activate the expression of estrogen receptor alpha. EGR1, a DNA damage-responsive gene with antiproliferative and apoptotic functions, can inhibit tumorigenicity when exogenously expressed in human breast cancer cells, and has been observed to have reduced expression in human and mouse breast cancer cell lines and tumors. TP53BP1 is not thought to be a transcription factor, but rather a BRCT domain-containing substrate of ATM that is phosphorylated in response to DNA damage. This gene product is known to bind the central DNA-binding domain of p53 and thus enhance the transcriptional activation of p53 target genes. A significantly reduced expression of all four genes in the 28 p53 wt tumors classified as mutant-like was found, and in the cases of USF1 and TP53BP1, significantly higher expression in the p53 mutants classified as wildtype-like. 
Interestingly, it was also observed that their expression levels are also significantly lower in the 47 p53 mt tumors classified as mutant-like, suggesting a possible positive feedback loop whereby wildtype p53 can enhance expression of these genes and impaired p53 cannot. Together, these observations suggest the possibility that either acting separately or in combination, these genes may be important for intact p53 activity in the breast, and when transcriptionally silenced, contribute to p53 deficiency.


Finally, the expression of several known negative regulators of p53 activity was examined. Notably, MDM2, which negatively regulates p53 through ubiquitin-mediated degradation of the p53 protein, and whose overexpression at the protein level has been implicated in a variety of cancers, was not found to be differentially expressed at the transcript level in the experiments described herein. However, both PLK1 and GTSE1 were. The M-phase regulator PLK1 has recently been shown to bind to the DNA-binding domain of p53 and thus inhibit its transcriptional activity in vitro. GTSE1 (B99) binds the C-terminal regulatory domain of p53 causing the inhibition of p53 transactivation function as well as a reduction of intracellular levels of p53 protein. Intriguingly, the transcript levels of both genes were among the most highly significantly overexpressed in both p53 wt and mt tumors of the mt-like class, suggesting a possible role for these gene products in suppression of p53 function in breast carcinogenesis.


The spectrum of p53 mutations was next analyzed for correlations that might explain the misclassification of the 12 p53-mutant tumors as wildtype-like. First, it was observed that only one mutation was common to the wildtype-like and the mutant-like tumors: a Tyr>Cys at amino acid 220 in the DNA-binding domain. Of the 47 p53 mt tumors correctly classified as mutants, it was observed that 42% (20/47) possessed “severe” mutations defined as insertions (n=2), deletions (n=11) and stop codons (n=7) (Table 3-I) resulting in frameshifts and subsequent truncation, whereas in the 12 mutants classified as wildtype-like, only 1 (8%) contained a severe mutation: a 3-bp insertion in the DNA-binding domain resulting in the in-frame addition of a glycine residue (pfe=0.025). Using the IARC TP53 Mutation Database (available online on the website of the International Agency for Research on Cancer, IARC), which, as of June 2003, has indexed 18,585 somatic and 225 germline mutations of p53, the frequencies of occurrence of the most common p53 mutations in human cancer (representing ~20% of all p53 mutations; Table 3-II) in the 12 wt-like mutants and the 47 mt-like mutants were compared. None of the common mutations were found to overlap with the subset of 11 missense point mutations (mpms) in the wt-like group, compared to 9 of 27 in the mt-like group (pfe=0.029). The mpms in each tumor group were then cross-compared with the IARC TP53 Mutation Database's comprehensive listing of 418 mutants previously analyzed for dominant negative function in at least one of 44 previously published studies. As Table 3-III shows, it was found that only one of the 11 mpms among the 12 wt-like mutants had been demonstrated previously to have dominant negative activity, compared to 12 of 27 within the mt-like group (pfe=0.039). Together, these data suggest that at the sequence level, the 12 p53 mutants classified as wildtype-like may in fact consist mostly of “benign” p53 mutant forms compared to those 47 classified as mutant-like, in agreement with their molecular consistencies with the majority of p53 wt tumors in our expression analyses.

TABLE 3

mutation type | 12 wt-like tumors | 47 mt-like tumors | p-value
I. severe mutations: | 1 | 20 | 0.025
deletions | 0 | 11 |
stop codons | 0 | 7 |
insertions | 1 | 2 |
 | (11 tumors with mpms) | (27 tumors with mpms) |
II. Common missense pt. mutations: | 0 | 9 | 0.029
175 (Arg->His) | 0 | 2 |
248 (Arg->Gln) | 0 | 3 |
248 (Arg->Trp) | 0 | 2 |
273 (Arg->His) | 0 | 0 |
273 (Arg->Cys) | 0 | 2 |
282 (Arg->Trp) | 0 | 0 |
III. pt. mutations with known dominant negative function: | 1 | 12 | 0.039


Comparative analysis of p53 mutations. (I) Severe mutations were defined as insertions, deletions, or stop codons. Of the remaining missense point mutations (mpms; 11 in the wt-like group, 27 in the mt-like group) we determined the frequency of occurrence of (II) the most common missense point mutations in p53 as defined by the IARC TP53 Mutation Database (http://www.iarc.fr/p53/index.html), and (III) mutants previously shown, in vitro, to possess dominant negative activity. P-values were calculated using Fisher's Exact test.


The practice of the present invention may employ conventional biology methods, software and systems known to the skilled artisan. The foregoing examples have described methods for predicting disease outcome in a patient. In another aspect, there is also provided a computer system for predicting disease outcome in a patient. The computer system may comprise a computer having a processor and a memory, the memory having executable code stored thereon for execution by the processor for performing the steps of obtaining gene expression profiles from a plurality of genes from tumor samples, wherein said tumor samples may be mutant or wildtype for the p53 gene; comparing said gene expression profiles to determine which genes are differentially expressed in the mutant or wildtype tumors; deriving from said differentially expressed genes a set of genes to predict p53 mutational status; and using the set of genes to predict disease outcome in the patient.
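By way of illustration only, the steps performed by such executable code might be sketched as follows. The sketch uses Diagonal Linear Discriminant Analysis, which the claims name as one supervised learning method, with equal class priors assumed. The gene-ranking function is a simplified, hypothetical stand-in for the ranking procedure and is not the Linear Model-Fit method. All names and the toy expression values are assumptions introduced for the example.

```python
import math

def fit_dlda(profiles, labels):
    """Fit DLDA: per-class gene means plus a pooled per-gene variance.

    profiles : list of expression vectors (one per tumor, same gene order)
    labels   : class label per tumor, e.g. "mt" or "wt"
    """
    classes = sorted(set(labels))
    n_genes = len(profiles[0])
    means = {}
    pooled = [0.0] * n_genes
    for c in classes:
        rows = [p for p, l in zip(profiles, labels) if l == c]
        means[c] = [sum(r[g] for r in rows) / len(rows) for g in range(n_genes)]
        for r in rows:
            for g in range(n_genes):
                pooled[g] += (r[g] - means[c][g]) ** 2
    dof = len(profiles) - len(classes)
    variances = [max(v / dof, 1e-12) for v in pooled]
    return means, variances

def predict_dlda(model, profile):
    """Assign the class whose mean is nearest in variance-scaled squared distance."""
    means, variances = model

    def score(c):
        return sum((x - m) ** 2 / v
                   for x, m, v in zip(profile, means[c], variances))

    return min(means, key=score)

def rank_genes(profiles, labels, top):
    """Hypothetical stand-in for the ranking step (two classes only): order genes
    by the absolute difference of class means over the pooled standard deviation."""
    means, variances = fit_dlda(profiles, labels)
    a, b = sorted(means)
    scores = [abs(means[a][g] - means[b][g]) / math.sqrt(variances[g])
              for g in range(len(variances))]
    return sorted(range(len(scores)), key=lambda g: -scores[g])[:top]

# Hypothetical toy data: six tumors, three genes, known p53 status.
profiles = [[5.0, 1.0, 3.0], [5.5, 1.2, 2.9], [4.8, 0.9, 3.2],
            [2.0, 1.1, 3.1], [2.2, 0.8, 3.0], [1.9, 1.0, 2.8]]
labels = ["mt", "mt", "mt", "wt", "wt", "wt"]

selected = rank_genes(profiles, labels, top=1)             # derive the gene set
reduced = [[p[g] for g in selected] for p in profiles]
model = fit_dlda(reduced, labels)                          # train the classifier
print(selected, predict_dlda(model, [5.1]), predict_dlda(model, [2.1]))  # [0] mt wt
```

The predicted class for a new tumor (mt-like or wt-like) would then serve as the grouping variable for outcome prediction, for example by stratifying patients before a survival analysis.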


A suitable computer system may be a general purpose computer such as a PC or a Macintosh, for example. Computer software products of the invention typically include a computer readable medium having computer-executable instructions for performing the logic steps of the method of the invention. Suitable computer readable media include floppy disk, CD-ROM/DVD/DVD-ROM, hard-disk drive, flash memory, ROM/RAM, magnetic tapes etc. The computer executable instructions may be written in a suitable computer language or a combination of several languages. Basic computational biology methods are described in, e.g. Setubal and Meidanis et al., Introduction to Computational Biology Methods (PWS Publishing Company, Boston, 1997); Salzberg, Searles, Kasif, (Ed.), Computational Methods in Molecular Biology, (Elsevier, Amsterdam, 1998); Rashidi and Buehler, Bioinformatics Basics: Application in Biological Science and Medicine (CRC Press, London, 2000) and Ouellette and Baxevanis, Bioinformatics: A Practical Guide for Analysis of Gene and Proteins (Wiley & Sons, Inc., 2nd Ed., 2001).


Additionally, the present invention may have preferred embodiments that include methods for providing genetic information over networks such as the Internet.


Additionally, some embodiments of the present invention may provide a plurality of pharmaceutical targets for designing chemotherapeutic drugs for a variety of cancers. For example, the 32 genes most correlated with p53 mutational status could serve as potential molecular targets for chemotherapy. Chemotherapy drugs (cytotoxics) and antihormonal treatments are commonly used to treat cancers. In several patients however, treatment regimens involving cytotoxics and antihormonals have been known to cause mild to severe side effects. In breast cancer for example, these side effects include vomiting, nausea, alopecia and fatigue. The future of effective treatment for cancer thus resides with drugs that are more specific for their targets. According to some studies, about 68% of breast cancer drugs in the clinical developmental pipeline are of the targeted class. Therefore, molecular signatures such as those embodied in certain aspects of the present invention will provide important leads or will prove to be targets in their own right for targeted chemotherapeutic drugs.


In conclusion, the disclosed embodiments of the present invention define a gene expression signature that can predict p53 status and survival in human breast tumours (the p53 signature or classifier). In independent datasets of both breast and liver cancers, and regardless of other clinical features, subsets of the p53 signature can predict p53 status with significant accuracy. As a predictor of disease-specific survival (DSS), the signature significantly outperformed p53 mutation status alone in a large patient cohort with heterogeneous treatment. The p53 signature could significantly distinguish patients having more or less benefit from systemic adjuvant therapies and loco-regional radiotherapy. Though the p53 pathway may be compromised at some level in most human cancers, analysis of transcripts involved in the p53 pathway suggests that the p53 expression signature defines an operational configuration of this pathway in breast tumors (more so than p53 mutation status alone) that impacts patient survival and therapeutic response. In cancer, it is clear that not all p53 mutations have equal effects: some simply confer loss of function, while others have a dominant negative effect (such as trans-dominant suppression of wildtype p53 or oncogenic gain of function), while still others show only a partial loss of function where, for example, only a small subset of p53 downstream transcriptional target genes are dysregulated. For these reasons, no single molecular assessment of p53 status appears to provide an absolute indication of the complete p53 function. The embodiments disclosed herein suggest that by looking at the downstream indicators of p53 function, the functional status of p53 may be ascertained more precisely than using sequencing or biochemical means.


It is to be understood that the above description is intended to be illustrative and not restrictive. Many variations of the invention will be apparent to those of skill in the art upon reviewing the above description. The scope of the invention should be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled. All cited references, including patent and non-patent literature, are incorporated herewith by reference in their entireties for all purposes.

Claims
  • 1. A method for predicting disease outcome in a patient, the method comprising the steps of: obtaining gene expression profiles from a plurality of genes from tumor samples, wherein said tumor samples may be mutant or wildtype for the p53 gene; comparing said gene expression profiles to determine which genes are differentially expressed in the mutant or wildtype tumors; deriving from said differentially expressed genes a set of genes to predict p53 mutational status; and using the set of genes to predict disease outcome in the patient.
  • 2. The method of claim 1 wherein disease outcome is selected from the group consisting of disease-specific survival, disease-free survival, tumor recurrence and therapeutic response.
  • 3. The method of claim 1 wherein the disease is breast cancer.
  • 4. The method of claim 1 wherein the disease is liver cancer.
  • 5. The method of claim 1 wherein predicted p53 mutational status is obtained by ranking the differentially expressed genes according to their association with p53 mutational status, ER status and histologic grade of the tumor.
  • 6. The method of claim 5 wherein the genes are ranked according to a multivariate ranking procedure.
  • 7. The method of claim 6 wherein the multivariate ranking procedure is Linear Model-Fit.
  • 8. The method of claim 5 wherein predicted p53 mutational status is obtained by employing a supervised learning method.
  • 9. The method of claim 8 wherein the supervised learning method is Diagonal Linear Discriminant Analysis.
  • 10. The method of claim 1 wherein the set of genes to predict p53 mutational status comprises at least 3 genes.
  • 11. The method of claim 10 wherein the set of genes to predict p53 mutational status comprises 3-500 genes.
  • 12. The method of claim 11 wherein the set of genes to predict p53 mutational status comprises 32 genes.
  • 13. The method of claim 12 wherein the 32 genes are selected from the group comprising the list of genes in Table 1.
  • 14. The method of claim 13 wherein the 32 genes include genes with sequences selected from the group consisting of SEQ ID NO: 23, SEQ ID NO: 22, SEQ ID NO: 31, SEQ ID NO: 14, SEQ ID NO: 11, SEQ ID NO: 26, SEQ ID NO: 21, SEQ ID NO: 30, SEQ ID NO: 27, SEQ ID NO: 8, SEQ ID NO: 2, SEQ ID NO: 9, SEQ ID NO: 1, SEQ ID NO: 29, SEQ ID NO: 17, SEQ ID NO: 20, SEQ ID NO: 6, SEQ ID NO: 18, SEQ ID NO: 24, SEQ ID NO: 10, SEQ ID NO: 13, SEQ ID NO: 32, SEQ ID NO: 28, SEQ ID NO: 5, SEQ ID NO: 16, SEQ ID NO: 25, SEQ ID NO: 15, SEQ ID NO: 7, SEQ ID NO: 4, SEQ ID NO: 3, SEQ ID NO: 12, and SEQ ID NO: 19.
  • 15. A method for predicting disease outcome in a late-stage breast cancer patient, the method comprising the steps of: obtaining gene expression profiles from a plurality of genes from tumor samples, wherein said tumor samples may be mutant or wildtype for the p53 gene; comparing said gene expression profiles to determine which genes are differentially expressed in the mutant or wildtype tumors; deriving from said differentially expressed genes a set of genes to predict p53 mutational status; and using the set of genes to predict disease outcome in the late-stage breast cancer patient wherein the sequences encoding the set of genes are selected from the group consisting of SEQ ID NO: 22, SEQ ID NO: 31, SEQ ID NO: 11, SEQ ID NO: 9, SEQ ID NO: 1, SEQ ID NO: 29, SEQ ID NO: 28, SEQ ID NO: 5 and SEQ ID NO: 25.
  • 16. A method for predicting clinical outcome in an early-stage, locally-treated breast cancer patient, the method comprising the steps of: obtaining gene expression profiles from a plurality of genes from tumor samples, wherein said tumor samples may be mutant or wildtype for the p53 gene; comparing said gene expression profiles to determine which genes are differentially expressed in the mutant or wildtype tumors; deriving from said differentially expressed genes a set of genes to predict p53 mutational status; and using the set of genes to predict disease outcome in the early-stage, locally-treated breast cancer patient wherein the set of genes are selected from the genes with sequences selected from the group consisting of SEQ ID NO: 23, SEQ ID NO: 22, SEQ ID NO: 31, SEQ ID NO: 14, SEQ ID NO: 11, SEQ ID NO: 26, SEQ ID NO: 8, SEQ ID NO: 9, SEQ ID NO: 1, SEQ ID NO: 29, SEQ ID NO: 20, SEQ ID NO: 24, SEQ ID NO: 10, SEQ ID NO: 32, SEQ ID NO: 28, SEQ ID NO: 5, SEQ ID NO: 16, SEQ ID NO: 25, SEQ ID NO: 15, SEQ ID NO: 7, and SEQ ID NO: 3.
  • 17. A method for predicting clinical outcome in a liver cancer patient, the method comprising the steps of: obtaining gene expression profiles from a plurality of genes from tumor samples, wherein said tumor samples may be mutant or wildtype for the p53 gene; comparing said gene expression profiles to determine which genes are differentially expressed in the mutant or wildtype tumors; deriving from said differentially expressed genes a set of genes to predict p53 mutational status; and using the set of genes to predict disease outcome in the liver cancer patient wherein the set of genes have sequences selected from the group consisting of SEQ ID NO: 31, SEQ ID NO: 14, SEQ ID NO: 11, SEQ ID NO: 1, SEQ ID NO: 20, SEQ ID NO: 24, SEQ ID NO: 28, and SEQ ID NO: 5.
  • 18. A method of identifying a group of genes for predicting disease outcome in a patient, the method comprising the steps of: obtaining gene expression profiles from a plurality of genes from tumor samples, wherein said tumor samples may be mutant or wildtype for the p53 gene; comparing said gene expression profiles to determine which genes are differentially expressed in the mutant or wildtype tumors; ranking the differentially expressed genes according to their ability to predict p53 mutational status; training the ranked genes to distinguish between mutant and wildtype p53 gene expression profiles; obtaining a p53 classifier including a set of genes capable of predicting p53 mutational status; validating the p53 classifier in independent datasets; and assessing the ability of the p53 classifier to predict disease outcome in the patient.
  • 19. The method of claim 18 wherein the differentially expressed genes are ranked by a multivariate ranking procedure according to their association with p53 status, ER status and histologic grade of the tumor.
  • 20. The method of claim 19 wherein the multivariate ranking procedure is a Linear Model-Fit.
  • 21. The method of claim 18 wherein the step of training comprises employing a supervised learning method.
  • 22. The method of claim 21 wherein the supervised learning method is a Diagonal Linear Discriminant Analysis.
  • 23. The method of claim 18 wherein the p53 classifier comprises at least 3 genes.
  • 24. The method of claim 23 wherein the p53 classifier comprises 3-500 genes.
  • 25. The method of claim 24 wherein an optimized p53 classifier comprises 32 genes.
  • 26. The method of claim 25 wherein the optimized p53 classifier includes genes with sequences selected from the group consisting of SEQ ID NO: 23, SEQ ID NO: 22, SEQ ID NO: 31, SEQ ID NO: 14, SEQ ID NO: 11, SEQ ID NO: 26, SEQ ID NO: 21, SEQ ID NO: 30, SEQ ID NO: 27, SEQ ID NO: 8, SEQ ID NO: 2, SEQ ID NO: 9, SEQ ID NO: 1, SEQ ID NO: 29, SEQ ID NO: 17, SEQ ID NO: 20, SEQ ID NO: 6, SEQ ID NO: 18, SEQ ID NO: 24, SEQ ID NO: 10, SEQ ID NO: 13, SEQ ID NO: 32, SEQ ID NO: 28, SEQ ID NO: 5, SEQ ID NO: 16, SEQ ID NO: 25, SEQ ID NO: 15, SEQ ID NO: 7, SEQ ID NO: 4, SEQ ID NO: 3, SEQ ID NO: 12, and SEQ ID NO: 19.
  • 27. The method of claim 18 wherein disease outcome is selected from the group consisting of disease-specific survival, disease-free survival, tumor recurrence and therapeutic response.
  • 28. The method of claim 27 wherein a 9-gene partial classifier can predict clinical outcome in a late-stage breast cancer patient.
  • 29. The method of claim 28 wherein the 9-gene partial classifier includes genes with sequences selected from the group consisting of SEQ ID NO: 22, SEQ ID NO: 31, SEQ ID NO: 11, SEQ ID NO: 9, SEQ ID NO: 1, SEQ ID NO: 29, SEQ ID NO: 28, SEQ ID NO: 5, and SEQ ID NO: 25.
  • 30. The method of claim 27 wherein a 21-gene partial classifier can predict clinical outcome in an early-stage, locally-treated breast cancer patient.
  • 31. The method of claim 30 wherein the 21-gene partial classifier includes genes with sequences selected from the group consisting of SEQ ID NO: 23, SEQ ID NO: 22, SEQ ID NO: 31, SEQ ID NO: 14, SEQ ID NO: 11, SEQ ID NO: 26, SEQ ID NO: 8, SEQ ID NO: 9, SEQ ID NO: 1, SEQ ID NO: 29, SEQ ID NO: 20, SEQ ID NO: 24, SEQ ID NO: 10, SEQ ID NO: 32, SEQ ID NO: 28, SEQ ID NO: 5, SEQ ID NO: 16, SEQ ID NO: 25, SEQ ID NO: 15, SEQ ID NO: 7, and SEQ ID NO: 3.
  • 32. The method of claim 27 wherein an 8-gene partial classifier can predict clinical outcome in a liver cancer patient.
  • 33. The method of claim 32 wherein the 8-gene partial classifier includes genes with sequences selected from the group consisting of SEQ ID NO: 31, SEQ ID NO: 14, SEQ ID NO: 11, SEQ ID NO: 1, SEQ ID NO: 20, SEQ ID NO: 24, SEQ ID NO: 28, and SEQ ID NO: 5.
  • 34. A computer system for predicting disease outcome in a patient, the computer system comprising: a computer having a processor and a memory, the memory having executable code stored thereon for execution by the processor for performing the steps of: obtaining gene expression profiles from a plurality of genes from tumor samples, wherein said tumor samples may be mutant or wildtype for the p53 gene; comparing said gene expression profiles to determine which genes are differentially expressed in the mutant or wildtype tumors; deriving from said differentially expressed genes a set of genes to predict p53 mutational status; and using the set of genes to predict disease outcome in the patient.
  • 35. A diagnostic tool for predicting disease susceptibility in a patient comprising a plurality of genes capable of predicting p53 mutational status immobilized on a solid support.
  • 36. The diagnostic tool of claim 35 wherein the solid support is a microarray.
  • 37. The diagnostic tool of claim 35 wherein the plurality of genes include genes selected from the group consisting of GenBank accession numbers: SEQ ID NO: 23, SEQ ID NO: 22, SEQ ID NO: 31, SEQ ID NO: 14, SEQ ID NO: 11, SEQ ID NO: 26, SEQ ID NO: 21, SEQ ID NO: 30, SEQ ID NO: 27, SEQ ID NO: 8, SEQ ID NO: 2, SEQ ID NO: 9, SEQ ID NO: 1, SEQ ID NO: 29, SEQ ID NO: 17, SEQ ID NO: 20, SEQ ID NO: 6, SEQ ID NO: 18, SEQ ID NO: 24, SEQ ID NO: 10, SEQ ID NO: 13, SEQ ID NO: 32, SEQ ID NO: 28, SEQ ID NO: 5, SEQ ID NO: 16, SEQ ID NO: 25, SEQ ID NO: 15, SEQ ID NO: 7, SEQ ID NO: 4, SEQ ID NO: 3, SEQ ID NO: 12, and SEQ ID NO: 19.
  • 38. The diagnostic tool of claim 37 wherein the plurality of genes include genes with sequences selected from the group consisting of SEQ ID NO: 22, SEQ ID NO: 31, SEQ ID NO: 11, SEQ ID NO: 9, SEQ ID NO: 1, SEQ ID NO: 29, SEQ ID NO: 28, SEQ ID NO: 5, and SEQ ID NO: 25.
  • 39. The diagnostic tool of claim 37 wherein the plurality of genes include genes with sequences selected from the group consisting of SEQ ID NO: 23, SEQ ID NO: 22, SEQ ID NO: 31, SEQ ID NO: 14, SEQ ID NO: 11, SEQ ID NO: 26, SEQ ID NO: 8, SEQ ID NO: 9, SEQ ID NO: 1, SEQ ID NO: 29, SEQ ID NO: 20, SEQ ID NO: 24, SEQ ID NO: 10, SEQ ID NO: 32, SEQ ID NO: 28, SEQ ID NO: 5, SEQ ID NO: 16, SEQ ID NO: 25, SEQ ID NO: 15, SEQ ID NO: 7, and SEQ ID NO: 3.
  • 40. The diagnostic tool of claim 37 wherein the plurality of genes include genes with sequences selected from the group consisting of SEQ ID NO: 31, SEQ ID NO: 14, SEQ ID NO: 11, SEQ ID NO: 1, SEQ ID NO: 20, SEQ ID NO: 24, SEQ ID NO: 28, and SEQ ID NO: 5.
  • 41. A nucleic acid array for predicting disease susceptibility in a patient comprising a solid support and displayed thereon nucleic acid probes corresponding to genes capable of predicting p53 mutational status in the patient.
  • 42. The nucleic acid array of claim 41 comprising at least 8 nucleic acid probes.
  • 43. The nucleic acid array of claim 42 comprising at least 32 nucleic acid probes.
  • 44. The nucleic acid array of claim 43 comprising at least 500 nucleic acid probes.