Methods, systems, and compositions for classification, prognosis, and diagnosis of cancers

Information

  • Patent Grant
  • 8065093
  • Patent Number
    8,065,093
  • Date Filed
    Wednesday, October 6, 2004
  • Date Issued
    Tuesday, November 22, 2011
Abstract
The present invention provides methods, systems and compositions for predicting disease susceptibility in a patient. In some embodiments, methods for the classification, prognosis, and diagnosis of cancers are provided. In other embodiments, the present invention provides statistical methods for building a gene-expression-based classifier that may be employed for predicting disease susceptibility in a patient, for classifying carcinomas, and for the prognosis of clinical outcomes.
Description

The application contains a sequence listing which has been submitted via CD-R in lieu of a printed paper copy, and is hereby incorporated by reference in its entirety. The CD-Rs, recorded on Mar. 9, 2005, are labeled “CRF,” “Copy 1,” and “Copy 2,” and each contains only one identical 1.95 MB file (38271767.APP).


FIELD OF THE INVENTION

The present invention relates generally to systems, compositions, and methods for predicting disease susceptibility in a patient.


BACKGROUND

Mutations in p53 are thought to occur in more than 50% of human cancers and are most frequently observed in the DNA binding and transactivation domains, underscoring the importance of its transcriptional activity in suppressing tumor development. In sporadic breast cancers, unlike most cancer types, p53 mutations are only observed in approximately 20% of cases. However, the fact that breast cancer is frequently observed in individuals with germline mutations of p53 (i.e., Li-Fraumeni syndrome) suggests a particularly important role for p53 inactivation in breast carcinogenesis, and perhaps a similarly important role for other factors capable of compromising p53 function.


For example, the reduced transcriptional activation of p53 following hypermethylation and subsequent inhibition of the HOXA5 transcription factor has recently been implicated as a possible epigenetic mechanism in reducing p53 expression in breast cancers. In both breast tumors and other cancer types, amplification and overexpression of the MDM2 gene, whose product promotes p53 degradation, have been implicated in oncogenesis. Moreover, both deletion and epigenetic silencing of the p14ARF gene, a negative regulator of MDM2, have been observed in various cancer types. Thus, p53 deficiency in breast carcinogenesis can potentially arise from a number of mechanisms other than p53 gene mutation.


There is evidence that the p53 status has prognostic significance in a number of cancer types and in particular breast cancer. In breast cancer, p53 mutations confer worse overall and disease-free survival, and a higher incidence of tumor recurrence, independent of other risk factors. Recent evidence suggests that p53 inactivation renders breast tumors resistant to certain DNA-damaging chemotherapies and endocrine therapies presumably through loss of p53-dependent apoptosis.


However, in all of these studies, the prognostic capability and degree of therapeutic resistance of the p53 mutants was found to depend largely on mutant-specific attributes, such as the type of mutations or the precise domain in which the mutation occurs. Importantly, this latter observation is consistent with findings from previous studies showing that not all p53 mutations have equal effects: some simply confer loss of function, while others have a dominant negative effect (such as trans-dominant suppression of wildtype p53 or oncogenic gain of function), while still others show only a partial loss of function where, for example, only a small subset of p53 downstream transcriptional target genes are dysregulated. For these reasons, no single molecular assessment of p53 status appears to provide an absolute indication of the complete p53 function.


There is a need for methods that better assess the effects of different p53 mutations on cell function in general and gene expression in particular, in an effort to enable better cancer prognosis and diagnosis.


SUMMARY

Accordingly, the present invention provides methods, systems, and compositions that provide a more useful measure of in vivo p53 functionality. These methods, systems, and compositions may be employed for the classification, prognosis, and diagnosis of cancers.


In one aspect of the present invention there is provided a method for predicting disease outcome in a patient, the method comprising the steps of: obtaining gene expression profiles from a plurality of genes from tumor samples, wherein said tumor samples may be mutant or wildtype for the p53 gene; comparing said gene expression profiles to determine which genes are differentially expressed in the mutant or wildtype tumors; deriving from said differentially expressed genes a set of genes to predict p53 mutational status; and using the set of genes to predict disease outcome in the patient.


In another aspect of the present invention there is provided a method for predicting disease outcome in a late-stage breast cancer patient, the method comprising the steps of: obtaining gene expression profiles from a plurality of genes from tumor samples, wherein said tumor samples may be mutant or wildtype for the p53 gene; comparing said gene expression profiles to determine which genes are differentially expressed in the mutant or wildtype tumors; deriving from said differentially expressed genes a set of genes to predict p53 mutational status; and using the set of genes to predict disease outcome in the late-stage breast cancer patient wherein the set of genes are selected from the group consisting of GenBank accession numbers: BG271923 (SEQ ID NO: 22), NM002466 (SEQ ID NO: 31), D38553 (SEQ ID NO: 11), NM000909 (SEQ ID NO: 9), NM024843 (SEQ ID NO: 1), R73030 (SEQ ID NO: 29), NM003226 (SEQ ID NO: 28), AW299538 (SEQ ID NO: 5) and AI990465 (SEQ ID NO: 25).


In yet another aspect of the present invention there is provided a method for predicting clinical outcome in an early-stage, locally-treated breast cancer patient, the method comprising the steps of: obtaining gene expression profiles from a plurality of genes from tumor samples, wherein said tumor samples may be mutant or wildtype for the p53 gene; comparing said gene expression profiles to determine which genes are differentially expressed in the mutant or wildtype tumors; deriving from said differentially expressed genes a set of genes to predict p53 mutational status; and using the set of genes to predict disease outcome in the early-stage, locally-treated breast cancer patient wherein the set of genes are selected from the group consisting of GenBank accession numbers: AI961235 (SEQ ID NO: 23), BG271923 (SEQ ID NO: 22), NM002466 (SEQ ID NO: 31), BC001651 (SEQ ID NO: 14), D38553 (SEQ ID NO: 11), AK000345 (SEQ ID NO: 26), BC004504 (SEQ ID NO: 8), NM000909 (SEQ ID NO: 9), NM024843 (SEQ ID NO: 1), R73030 (SEQ ID NO: 29), AI435828 (SEQ ID NO: 20), AI810764 (SEQ ID NO: 24), AI922323 (SEQ ID NO: 10), NM003225 (SEQ ID NO: 32), NM003226 (SEQ ID NO: 28), AW299538 (SEQ ID NO: 5), NM003462 (SEQ ID NO: 16), AI990465 (SEQ ID NO: 25), NM004392 (SEQ ID NO: 15), NM001267 (SEQ ID NO: 7) and AI826437 (SEQ ID NO: 3).


In a further aspect of the present invention there is provided a method for predicting clinical outcome in a liver cancer patient, the method comprising the steps of: obtaining gene expression profiles from a plurality of genes from tumor samples, wherein said tumor samples may be mutant or wildtype for the p53 gene; comparing said gene expression profiles to determine which genes are differentially expressed in the mutant or wildtype tumors; deriving from said differentially expressed genes a set of genes to predict p53 mutational status; and using the set of genes to predict disease outcome in the liver cancer patient wherein the set of genes are selected from the group consisting of GenBank accession numbers: NM002466 (SEQ ID NO: 31), BC001651 (SEQ ID NO: 14), D38553 (SEQ ID NO: 11), NM024843 (SEQ ID NO: 1), AI435828 (SEQ ID NO: 20), AI810764 (SEQ ID NO: 24), NM003226 (SEQ ID NO: 28) and AW299538 (SEQ ID NO: 5).


In a still further aspect of the present invention there is provided a method of identifying a group of genes for predicting disease outcome in a patient, the method comprising the steps of: obtaining gene expression profiles from a plurality of genes from tumor samples, wherein said tumor samples may be mutant or wildtype for the p53 gene; comparing said gene expression profiles to determine which genes are differentially expressed in the mutant or wildtype tumors; ranking the differentially expressed genes according to their ability to predict p53 mutational status; training the ranked genes to distinguish between mutant and wildtype p53 gene expression profiles; obtaining a p53 classifier including a set of genes capable of predicting p53 mutational status; validating the p53 classifier in independent datasets; and assessing the ability of the p53 classifier to predict disease outcome in the patient.


In another aspect of the present invention there is provided a computer system for predicting disease outcome in a patient, the computer system comprising: a computer having a processor and a memory, the memory having executable code stored thereon for execution by the processor for performing the steps of: obtaining gene expression profiles from a plurality of genes from tumor samples, wherein said tumor samples may be mutant or wildtype for the p53 gene; comparing said gene expression profiles to determine which genes are differentially expressed in the mutant or wildtype tumors; deriving from said differentially expressed genes a set of genes to predict p53 mutational status; and using the set of genes to predict disease outcome in the patient.


In yet another aspect of the present invention there is provided a diagnostic tool for predicting disease susceptibility in a patient comprising a plurality of genes capable of predicting p53 mutational status immobilized on a solid support.


In a still further aspect of the present invention there is provided a nucleic acid array for predicting disease susceptibility in a patient comprising a solid support and displayed thereon nucleic acid probes corresponding to genes capable of predicting p53 mutational status in the patient.


These aspects and embodiments are described in greater detail below.


Definitions


As used in this application, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise. For example, the term “an agent” includes a plurality of agents, including mixtures thereof.


An individual is not limited to a human being but may also be other organisms including but not limited to a mammal, invertebrate, plant, fungus, virus, bacteria, or one or more cells derived from any of the above.


As used herein the term “comprising” means “including”. Variations of the word “comprising”, such as “comprise” and “comprises”, have correspondingly varied meanings.


Throughout this disclosure, various aspects of this invention can be presented in a range format. It should be understood that the description in range format is merely for convenience and brevity and should not be construed as an inflexible limitation on the scope of the invention. Accordingly, the description of a range should be considered to have specifically disclosed all the possible subranges as well as individual numerical values within that range. For example, description of a range such as from 1 to 6 should be considered to have specifically disclosed subranges such as from 1 to 3, from 1 to 4, from 1 to 5, from 2 to 4, from 2 to 6, from 3 to 6 etc., as well as individual numbers within that range, for example, 1, 2, 3, 4, 5, and 6. This applies regardless of the breadth of the range.


As used herein, the term “histologic grade” or “tumor grade” refers to characteristics of tumors classified according to the Elston-Ellis system of grading tumors.


As used herein, “p53 status” refers to the mutational status of the p53 gene. A p53 mutant tumor contains a mutation in the p53 gene that alters the function of the protein. A p53 wildtype tumor contains no detectable mutation in the p53 gene.


As used herein, “Disease-specific survival” or DSS is a survival assessment where the end point being examined is death because of a disease, for example, breast cancer.


As used herein, “Disease-free survival” or DFS is a survival assessment where the end points are either tumor recurrence (i.e., the cancer comes back as the consequence of distant metastasis to other sites in the body) or death because of breast cancer without evidence of distant metastasis.
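
The difference between the two end points can be illustrated with a minimal sketch. The `FollowUp` record and its field names are hypothetical and are not taken from this disclosure; the sketch only assumes that each patient has a follow-up time, an optional time to recurrence, and a flag for death attributed to the disease.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class FollowUp:
    """Hypothetical per-patient follow-up record (illustrative field names)."""
    months_followed: float                 # total follow-up time
    months_to_recurrence: Optional[float]  # None if no recurrence was observed
    died_of_disease: bool                  # death attributed to the disease


def dss_endpoint(p: FollowUp) -> Tuple[float, bool]:
    """Disease-specific survival: the event is death because of the disease."""
    return p.months_followed, p.died_of_disease


def dfs_endpoint(p: FollowUp) -> Tuple[float, bool]:
    """Disease-free survival: the event is recurrence, or death from the
    disease when no recurrence was recorded."""
    if p.months_to_recurrence is not None:
        return p.months_to_recurrence, True
    return p.months_followed, p.died_of_disease


relapsed = FollowUp(months_followed=48, months_to_recurrence=20, died_of_disease=True)
print(dss_endpoint(relapsed))  # time and event flag for DSS
print(dfs_endpoint(relapsed))  # time and event flag for DFS
```

Each function returns a (time, event) pair, the form usually expected by survival-analysis routines; a patient with no event is treated as censored at the end of follow-up.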


As used herein, an “array” is an intentionally created collection of molecules which can be prepared either synthetically or biosynthetically. The molecules in the array can be identical or different from each other. The array can assume a variety of formats, e.g., libraries of soluble molecules; libraries of compounds tethered to resin beads, silica chips, or other solid supports.


As used herein, a “nucleic acid library or array” is an intentionally created collection of nucleic acids which can be prepared either synthetically or biosynthetically in a variety of different formats (e.g., libraries of soluble molecules; and libraries of oligonucleotides tethered to resin beads, silica chips, or other solid supports). Additionally, the term “array” is meant to include those libraries of nucleic acids which can be prepared by spotting nucleic acids of essentially any length (e.g., from 1 to about 1000 nucleotide monomers in length) onto a substrate. The term “nucleic acid” as used herein refers to a polymeric form of nucleotides of any length, either ribonucleotides, deoxyribonucleotides or peptide nucleic acids (PNAs), that comprise purine and pyrimidine bases, or other natural, chemically or biochemically modified, non-natural, or derivatized nucleotide bases. The backbone of the polynucleotide can comprise sugars and phosphate groups, as may typically be found in RNA or DNA, or modified or substituted sugar or phosphate groups. A polynucleotide may comprise modified nucleotides, such as methylated nucleotides and nucleotide analogs. The sequence of nucleotides may be interrupted by non-nucleotide components. Thus the terms nucleoside, nucleotide, deoxynucleoside and deoxynucleotide generally include analogs such as those described herein. These analogs are those molecules having some structural features in common with a naturally occurring nucleoside or nucleotide such that when incorporated into a nucleic acid or oligonucleotide sequence, they allow hybridization with a naturally occurring nucleic acid sequence in solution. Typically, these analogs are derived from naturally occurring nucleosides and nucleotides by replacing and/or modifying the base, the ribose or the phosphodiester moiety. The changes can be tailor made to stabilize or destabilize hybrid formation or enhance the specificity of hybridization with a complementary nucleic acid sequence as desired.


As used herein, the term “complementary” refers to the hybridization or base pairing between nucleotides or nucleic acids, such as, for instance, between the two strands of a double stranded DNA molecule or between an oligonucleotide primer and a primer binding site on a single stranded nucleic acid to be sequenced or amplified. Complementary nucleotides are, generally, A and T (or A and U), or C and G. Two single stranded RNA or DNA molecules are said to be complementary when the nucleotides of one strand, optimally aligned and compared and with appropriate nucleotide insertions or deletions, pair with at least about 80% of the nucleotides of the other strand, usually at least about 90% to 95%, and more preferably from about 98 to 100% of the nucleotides of the other strand. Alternatively, complementarity exists when an RNA or DNA strand will hybridize under selective hybridization conditions to its complement. Typically, selective hybridization will occur when there is at least about 65% complementarity over a stretch of at least 14 to 25 nucleotides, preferably at least about 75%, and more preferably at least about 90% complementarity.
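
The percentage criterion above can be illustrated with a minimal sketch. It makes a simplifying assumption that is not part of the definition: the two strands have equal length and are compared position by position in antiparallel orientation, with no insertions or deletions. The function name `percent_paired` is illustrative only.

```python
# Watson-Crick pairs named in the definition: A-T (or A-U) and C-G.
PAIRS = {("A", "T"), ("T", "A"), ("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}


def percent_paired(strand: str, other: str) -> float:
    """Percentage of positions in `strand` that pair with `other`.

    Both strands are written 5'->3'; `other` is reversed so that the two
    strands are compared in antiparallel orientation. Gaps are not handled.
    """
    if not strand or len(strand) != len(other):
        raise ValueError("strands must be non-empty and of equal length")
    antiparallel = other.upper()[::-1]
    paired = sum((a, b) in PAIRS for a, b in zip(strand.upper(), antiparallel))
    return 100.0 * paired / len(strand)


# One mismatch out of five positions gives 80% pairing, the lower bound
# stated above for two strands to be called complementary.
print(percent_paired("ATGCA", "TGCAA"))
```

A fuller treatment would first align the strands optimally, as the definition requires, before counting paired positions.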


As used herein, a “fragment,” “segment,” or “DNA segment” refers to a portion of a larger DNA polynucleotide or DNA. A polynucleotide, for example, can be broken up, or fragmented into, a plurality of segments. Various methods of fragmenting nucleic acids are well known in the art. These methods may be, for example, either chemical or physical in nature. Chemical fragmentation may include partial degradation with a DNase; partial depurination with acid; the use of restriction enzymes; intron-encoded endonucleases; DNA-based cleavage methods, such as triplex and hybrid formation methods, that rely on the specific hybridization of a nucleic acid segment to localize a cleavage agent to a specific location in the nucleic acid molecule; or other enzymes or compounds which cleave DNA at known or unknown locations. Physical fragmentation methods may involve subjecting the DNA to a high shear rate. High shear rates may be produced, for example, by moving DNA through a chamber or channel with pits or spikes, or forcing the DNA sample through a restricted size flow passage, e.g., an aperture having a cross sectional dimension in the micron or submicron scale. Other physical methods include sonication and nebulization. Combinations of physical and chemical fragmentation methods may likewise be employed such as fragmentation by heat and ion-mediated hydrolysis. See for example, Sambrook et al., “Molecular Cloning: A Laboratory Manual,” 3rd Ed. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (2001) (“Sambrook et al.”) which is incorporated herein by reference for all purposes. These methods can be optimized to digest a nucleic acid into fragments of a selected size range. Useful size ranges may be from 100, 200, 400, 700 or 1000 to 500, 800, 1500, 2000, 4000 or 10,000 base pairs. However, larger size ranges such as 4000, 10,000 or 20,000 to 10,000, 20,000 or 500,000 base pairs may also be useful.


As used herein, the term “hybridization” refers to the process in which two single-stranded polynucleotides bind non-covalently to form a stable double-stranded polynucleotide. The term “hybridization” may also refer to triple-stranded hybridization. The resulting (usually) double-stranded polynucleotide is a “hybrid.” The proportion of the population of polynucleotides that forms stable hybrids is referred to herein as the “degree of hybridization”. Hybridization conditions will typically include salt concentrations of less than about 1M, more usually less than about 500 mM and less than about 200 mM. Hybridization temperatures can be as low as 5° C., but are typically greater than 22° C., more typically greater than about 30° C., and preferably in excess of about 37° C. Hybridizations are usually performed under stringent conditions, i.e. conditions under which a probe will hybridize to its target subsequence. Stringent conditions are sequence-dependent and are different in different circumstances. Longer fragments may require higher hybridization temperatures for specific hybridization. As other factors may affect the stringency of hybridization, including base composition and length of the complementary strands, presence of organic solvents and extent of base mismatching, the combination of parameters is more important than the absolute measure of any one alone. Generally, stringent conditions are selected to be about 5° C. lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH. The Tm is the temperature (under defined ionic strength, pH and nucleic acid composition) at which 50% of the probes complementary to the target sequence hybridize to the target sequence at equilibrium.
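
As a worked illustration of selecting a temperature about 5° C. below the Tm, the sketch below estimates Tm with the Wallace rule (2° C. per A or T, 4° C. per G or C). This rule is a rough approximation for short oligonucleotides and is an assumption introduced here for illustration; it is not the definition of Tm given above, and it ignores ionic strength and pH. The function names are illustrative only.

```python
def wallace_tm_celsius(oligo: str) -> int:
    """Rough Tm estimate for a short DNA oligo: 2*(A+T) + 4*(G+C)."""
    seq = oligo.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if not seq or at + gc != len(seq):
        raise ValueError("expected a non-empty sequence of A, C, G and T")
    return 2 * at + 4 * gc


def stringent_temperature_celsius(oligo: str, offset: int = 5) -> int:
    """Hybridization temperature about `offset` degrees below the estimated Tm."""
    return wallace_tm_celsius(oligo) - offset


probe = "ATGCATGCATGCAT"  # 14-mer with 8 A/T and 6 G/C bases
print(wallace_tm_celsius(probe), stringent_temperature_celsius(probe))
```

For longer probes or non-standard buffers, a nearest-neighbor thermodynamic model would normally replace this rule of thumb.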


Typically, stringent conditions include salt concentration of at least 0.01 M to no more than 1 M Na ion concentration (or other salts) at a pH 7.0 to 8.3 and a temperature of at least 25° C. For example, conditions of 5× SSPE (750 mM NaCl, 50 mM NaPhosphate, 5 mM EDTA, pH 7.4) and a temperature of 25-30° C. are suitable for allele-specific probe hybridizations. For stringent conditions, see for example, Sambrook, Fritsch and Maniatis, “Molecular Cloning: A Laboratory Manual,” 2nd Ed., Cold Spring Harbor Press (1989) and Anderson, “Nucleic Acid Hybridization,” 1st Ed., BIOS Scientific Publishers Limited (1999), which are hereby incorporated by reference in their entireties for all purposes above.


As used herein, “hybridization probes” are nucleic acids (such as oligonucleotides) capable of binding in a base-specific manner to a complementary strand of nucleic acid. Such probes include peptide nucleic acids, as described in Nielsen et al., Science 254:1497-1500 (1991), Nielsen Curr. Opin. Biotechnol., 10:71-75 (1999) and other nucleic acid analogs and nucleic acid mimetics.


As used herein, “mRNA” or “mRNA transcripts” include, but are not limited to pre-mRNA transcript(s), transcript processing intermediates, mature mRNA(s) ready for translation and transcripts of the gene or genes, or nucleic acids derived from the mRNA transcript(s). Transcript processing may include splicing, editing and degradation. As used herein, a nucleic acid derived from an mRNA transcript refers to a nucleic acid for whose synthesis the mRNA transcript or a subsequence thereof has ultimately served as a template. Thus, a cDNA reverse transcribed from an mRNA, a cRNA transcribed from that cDNA, a DNA amplified from the cDNA, an RNA transcribed from the amplified DNA, etc., are all derived from the mRNA transcript and detection of such derived products is indicative of the presence and/or abundance of the original transcript in a sample. Thus, mRNA derived samples include, but are not limited to, mRNA transcripts of the gene or genes, cDNA reverse transcribed from the mRNA, cRNA transcribed from the cDNA, DNA amplified from the genes, RNA transcribed from amplified DNA, and the like.


As used herein, a “probe” is a molecule that can be recognized by a particular target. In some embodiments, a probe can be surface immobilized. Examples of probes that can be investigated by this invention include, but are not restricted to, agonists and antagonists for cell membrane receptors, toxins and venoms, viral epitopes, hormones (e.g. opioid peptides, steroids, etc.), hormone receptors, peptides, enzymes, enzyme substrates, cofactors, drugs, lectins, sugars, oligonucleotides, nucleic acids, oligosaccharides, proteins, and monoclonal antibodies.


As used herein, a “target” is a molecule that has an affinity for a given probe. Targets may be naturally-occurring or man-made molecules. Also, they can be employed in their unaltered state or as aggregates with other species. Targets may be attached, covalently or noncovalently, to a binding member, either directly or via a specific binding substance. Examples of targets which can be employed by this invention include, but are not restricted to, antibodies, cell membrane receptors, monoclonal antibodies and antisera reactive with specific antigenic determinants (such as on viruses, cells or other materials), drugs, oligonucleotides, nucleic acids, peptides, cofactors, lectins, sugars, polysaccharides, cells, cellular membranes, and organelles. Targets are sometimes referred to in the art as anti-probes. As the term targets is used herein, no difference in meaning is intended. A “Probe Target Pair” is formed when two macromolecules have combined through molecular recognition to form a complex.





BRIEF DESCRIPTION OF THE DRAWINGS

The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.


The accompanying drawings, which are incorporated in and form a part of this specification, illustrate embodiments of the invention and, together with the description, serve to explain the disclosed principles of the invention:



FIG. 1 shows hierarchical clustering of 257 tumors using the top 250 genes statistically correlated with p53 status for use in one disclosed embodiment of the invention.



FIG. 2 shows optimization and results of a gene classifier for p53 status in accordance with a disclosed embodiment of the invention.



FIG. 3 shows that genes of the classifier can predict p53 status in independent cDNA microarray datasets in accordance with a disclosed embodiment of the invention.



FIG. 4 shows that the p53 classifier has greater prognostic significance than p53 mutation status alone in accordance with a disclosed embodiment of the invention.



FIG. 5 shows that the p53 classifier has strong prognostic significance in an independent dataset of late-stage tumors in accordance with a disclosed embodiment of the invention.



FIG. 6 shows that the p53 classifier has greater prognostic significance than p53 mutation status in endocrine-treated patients in accordance with a disclosed embodiment of the invention.



FIG. 7 shows that the p53 classifier is prognostic of distant recurrence in an independent set of early-stage locally-treated breast tumors in accordance with a disclosed embodiment of the invention.



FIG. 8 shows that transcript levels of p53, its transcriptional targets, and its upstream effectors distinguish known and predicted classes in accordance with a disclosed embodiment of the invention.



FIGS. 9-508 each show the Genbank ID, Unigene ID, Unigene name, and sequence corresponding to the nucleic acid sequences shown in SEQ ID NO.'s 1-500, respectively.





DETAILED DESCRIPTION

The present invention has many preferred embodiments and relies on many patents, applications and other references for details known to those of skill in the art. Therefore, when a patent, application, or other reference is cited or repeated below, it should be understood that it is incorporated by reference in its entirety for all purposes as well as for the proposition that is recited.


Embodiments of the disclosed methods, systems, and compositions for classification, prognosis, and diagnosis of cancers will now be described. These methods, systems, and compositions provide a more useful measure of in vivo p53 functionality and thereby provide a better prognostic indicator of patient outcome as compared to p53 mutation status alone. Other advantages inherent in the disclosed embodiments of the methods, systems, and compositions will be apparent from the following description.


p53 mutations in cancer development and progression can result in trans-dominant suppression of the wild-type p53 allele conferring loss of p53 activity or an oncogenic gain of function independent of wildtype p53. Additionally, the altered activity of some effectors of p53 function, including those that directly influence p53 expression, may contribute to p53 deficiency recapitulating the p53-mutant phenotype. In breast cancer, these effects manifest in more aggressive tumors, therapeutic resistance, and poor clinical outcome.


In accordance with providing a more useful measure of in vivo p53 functionality, disclosed herein is a “p53 classifier”, an expression signature deduced from differences in the molecular configurations of p53 wildtype and mutant tumors. The classifier may comprise a defined number of genes, for example, at least 3 genes. In other embodiments, the classifier may comprise from about 3 genes to about 500 genes. Table 1 provides a listing of the 500 genes. In some embodiments, an optimized p53 classifier comprises 32 genes (Table 2). The optimized 32-gene classifier could distinguish p53 mutant and wildtype tumors with significant accuracy and could predict recurrence and survival in populations representing all therapeutic groups. Moreover, the p53 classifier was a more significant predictor of survival than p53 mutation status alone and remained significant by multivariate analysis independent of other clinical predictors where p53 mutation status did not. Furthermore, downregulation of p53 expression in the absence of mutations was sufficient to induce a mutant (mt) phenotype tumor behaviour in both transcriptional activity and clinical outcome.


In independent datasets of both breast and liver cancers, and regardless of other clinical features, subsets of the optimized p53 classifier could predict p53 status with significant accuracy. As a predictor of disease-specific survival (DSS), the classifier significantly outperformed p53 mutational status alone in both a large patient cohort with heterogeneous treatment, as well as in a set of patients who received postoperative adjuvant endocrine therapy alone.


Moreover, in an independent cDNA microarray study comprised mostly of stage 3 patients who received chemotherapy in the neoadjuvant setting, a 9-gene subset of the p53 classifier was a highly significant predictor of both disease-specific and disease-free survival. The genes of the p53 classifier could accurately discern not only which patients would relapse and die following chemotherapy, but also which late stage patients would survive their cancer.


A 21-gene subset of the classifier could also significantly distinguish molecular subgroups of early-stage radiation-treated patients who would go on to develop a distant metastasis within 5 years from those who would not.


Therefore, by defining among other aspects, a p53 classifier described herein, the methods, systems and compositions of the present invention demonstrate a much greater impact of p53 on human tumor behaviour than previously appreciated and thereby provide a better approach for clinically assessing p53 function.


One aspect of the present invention provides a method for predicting disease outcome in a patient, the method comprising the steps of: obtaining gene expression profiles from a plurality of genes from tumor samples, wherein said tumor samples may be mutant or wildtype for the p53 gene; comparing said gene expression profiles to determine which genes are differentially expressed in the mutant or wildtype tumors; deriving from said differentially expressed genes a set of genes to predict p53 mutational status; and using the set of genes to predict disease outcome in the patient. The disease outcome may be selected from the group consisting of disease-specific survival, disease-free survival, tumor recurrence and therapeutic response. The disease may be any cancer but is preferably breast cancer or liver cancer.
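
The comparison step can be sketched as follows. This is a simplified stand-in, not the procedure of the disclosed embodiments: genes are scored with a two-sample Welch t statistic between mutant and wildtype tumors, and the gene names and expression values are invented for illustration. The sketch assumes each group has at least two tumors and non-zero variance.

```python
from math import sqrt
from statistics import mean, variance
from typing import Dict, List, Sequence


def welch_t(mutant: Sequence[float], wildtype: Sequence[float]) -> float:
    """Welch two-sample t statistic (unequal variances)."""
    se = sqrt(variance(mutant) / len(mutant) + variance(wildtype) / len(wildtype))
    return (mean(mutant) - mean(wildtype)) / se


def rank_genes(expression: Dict[str, List[float]], is_mutant: List[bool]) -> List[str]:
    """Order genes from most to least differentially expressed.

    expression maps a gene name to one value per tumor; is_mutant flags,
    tumor by tumor, whether the sample carries a p53 mutation.
    """
    scores = {}
    for gene, values in expression.items():
        mt = [v for v, m in zip(values, is_mutant) if m]
        wt = [v for v, m in zip(values, is_mutant) if not m]
        scores[gene] = abs(welch_t(mt, wt))
    return sorted(scores, key=scores.get, reverse=True)


# Toy data: three mutant tumors followed by three wildtype tumors.
expression = {
    "gene_A": [5.0, 5.2, 4.8, 1.0, 1.1, 0.9],  # clearly higher in mutants
    "gene_B": [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],  # no group difference
}
is_mutant = [True, True, True, False, False, False]
print(rank_genes(expression, is_mutant))
```

The top of the resulting list is the candidate gene set from which a predictor of p53 mutational status would then be derived.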


The predicted p53 mutational status may be obtained by ranking the differentially expressed genes according to their association with p53 mutational status, ER (estrogen receptor) status and histologic grade of the tumor. A multivariate ranking procedure such as a Linear Model Fit may be employed to rank the genes. The ranked genes may be subjected to supervised learning to enable them to distinguish between mutant and wildtype gene expression profiles. An example of a supervised learning method that may be employed is Diagonal Linear Discriminant Analysis (DLDA).
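
A minimal sketch of Diagonal Linear Discriminant Analysis is given below, assuming equal class priors and a per-gene variance pooled across classes (the textbook form of DLDA). It is an illustration only; the function names, class labels and toy values are assumptions and do not reproduce the trained classifier of the disclosed embodiments.

```python
from typing import Dict, List, Tuple

Model = Tuple[Dict[str, List[float]], List[float]]


def fit_dlda(samples: List[List[float]], labels: List[str]) -> Model:
    """Estimate per-class gene means and pooled per-gene variances."""
    classes = sorted(set(labels))
    n_genes = len(samples[0])
    means: Dict[str, List[float]] = {}
    pooled_ss = [0.0] * n_genes
    for c in classes:
        rows = [s for s, l in zip(samples, labels) if l == c]
        means[c] = [sum(r[j] for r in rows) / len(rows) for j in range(n_genes)]
        for r in rows:
            for j in range(n_genes):
                pooled_ss[j] += (r[j] - means[c][j]) ** 2
    dof = len(samples) - len(classes)  # pooled degrees of freedom
    return means, [ss / dof for ss in pooled_ss]


def predict_dlda(model: Model, sample: List[float]) -> str:
    """Assign the class whose mean is nearest in variance-scaled distance."""
    means, variances = model

    def distance(c: str) -> float:
        return sum((x - m) ** 2 / v for x, m, v in zip(sample, means[c], variances))

    return min(means, key=distance)


# Toy training set: two genes measured in three mutant and three wildtype tumors.
samples = [[5.0, 1.0], [5.2, 1.2], [4.8, 0.8], [1.0, 3.0], [1.1, 3.1], [0.9, 2.9]]
labels = ["mt", "mt", "mt", "wt", "wt", "wt"]
model = fit_dlda(samples, labels)
print(predict_dlda(model, [4.5, 1.1]), predict_dlda(model, [1.2, 2.7]))
```

Because the covariance matrix is restricted to its diagonal, the method needs only one variance per gene, which keeps it usable when genes far outnumber tumors.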


In some embodiments, the set of genes with the ability to predict p53 mutational status may comprise at least 3 genes, preferably about 3-500 genes and most preferably about 32 genes. The 32 genes making up the optimized p53 classifier may be selected from the group comprising the list of genes in Table 1. In some embodiments, the 32 genes may include GenBank accession numbers: AI961235 (SEQ ID NO: 23), BG271923 (SEQ ID NO: 22), NM002466 (SEQ ID NO: 31), BC001651 (SEQ ID NO: 14), D38553 (SEQ ID NO: 11), AK000345 (SEQ ID NO: 26), AA742697 (SEQ ID NO: 21), AL080170 (SEQ ID NO: 30), BF245284 (SEQ ID NO: 27), BC004504 (SEQ ID NO: 8), H15261 (SEQ ID NO: 2), NM000909 (SEQ ID NO: 9), NM024843 (SEQ ID NO: 1), R73030 (SEQ ID NO: 29), NM030896 (SEQ ID NO: 17), AI435828 (SEQ ID NO: 20), AL512727 (SEQ ID NO: 6), AW242997 (SEQ ID NO: 18), AI810764 (SEQ ID NO: 24), AI922323 (SEQ ID NO: 10), AL360204 (SEQ ID NO: 13), NM003225 (SEQ ID NO: 32), NM003226 (SEQ ID NO: 28), AW299538 (SEQ ID NO: 5), NM003462 (SEQ ID NO: 16), AI990465 (SEQ ID NO: 25), NM004392 (SEQ ID NO: 15), NM001267 (SEQ ID NO: 7), AF269087 (SEQ ID NO: 4), AI826437 (SEQ ID NO: 3), AL355392 (SEQ ID NO: 12), and AU156421 (SEQ ID NO: 19).


The present invention also provides a method for predicting disease outcome in a late-stage breast cancer patient, the method comprising the steps of obtaining gene expression profiles from a plurality of genes from tumor samples, wherein said tumor samples may be mutant or wildtype for the p53 gene; comparing said gene expression profiles to determine which genes are differentially expressed in the mutant or wildtype tumors; deriving from said differentially expressed genes a set of genes to predict p53 mutational status; and using the set of genes to predict disease outcome in the late-stage breast cancer patient wherein the set of genes are selected from the group consisting of GenBank accession numbers: BG271923, NM002466, D38553, NM000909, NM024843, R73030, NM003226, AW299538 and AI990465. All GenBank accession numbers are associated with a sequence and a SEQ ID NO. as shown in FIGS. 9-508.


The present invention also provides a method for predicting clinical outcome in an early-stage, locally-treated breast cancer patient, the method comprising the steps of: obtaining gene expression profiles from a plurality of genes from tumor samples, wherein said tumor samples may be mutant or wildtype for the p53 gene; comparing said gene expression profiles to determine which genes are differentially expressed in the mutant or wildtype tumors; deriving from said differentially expressed genes a set of genes to predict p53 mutational status; and using the set of genes to predict disease outcome in the early-stage, locally-treated breast cancer patient wherein the set of genes are selected from the group consisting of GenBank accession numbers: AI961235, BG271923, NM002466, BC001651, D38553, AK000345, BC004504, NM000909, NM024843, R73030, AI435828, AI810764, AI922323, NM003225, NM003226, AW299538, NM003462, AI990465, NM004392, NM001267 and AI826437.


The present invention also provides a method for predicting clinical outcome in a liver cancer patient, the method comprising the steps of: obtaining gene expression profiles from a plurality of genes from tumor samples, wherein said tumor samples may be mutant or wildtype for the p53 gene; comparing said gene expression profiles to determine which genes are differentially expressed in the mutant or wildtype tumors; deriving from said differentially expressed genes a set of genes to predict p53 mutational status; and using the set of genes to predict disease outcome in the liver cancer patient wherein the set of genes are selected from the group consisting of GenBank accession numbers: NM002466, BC001651, D38553, NM024843, AI435828, AI810764, NM003226 and AW299538.


The present invention also provides a method of identifying a group of genes for predicting disease outcome in a patient, the method comprising the steps of: obtaining gene expression profiles from a plurality of genes from tumor samples, wherein said tumor samples may be mutant or wildtype for the p53 gene; comparing said gene expression profiles to determine which genes are differentially expressed in the mutant or wildtype tumors; ranking the differentially expressed genes according to their ability to predict p53 mutational status; training the ranked genes to distinguish between mutant and wildtype p53 gene expression profiles; obtaining a p53 classifier including a set of genes capable of predicting p53 mutational status; validating the p53 classifier in independent datasets; and assessing the ability of the p53 classifier to predict disease outcome in the patient.


In the above-disclosed method of identifying a group of genes for predicting disease outcome in a patient, the differentially expressed genes may be ranked by a multivariate ranking procedure according to their association with p53 status, ER (estrogen receptor) status and histologic grade of the tumor. The multivariate ranking procedure may be a Linear Model-Fit method or any other method known to one of skill in the art. The step of training may comprise employing a supervised learning method, such as Diagonal Linear Discriminant Analysis (DLDA) or any other supervised learning method known to one of skill in the art.


The p53 classifier disclosed above may comprise at least 3 genes, preferably between about 3-500 genes and more preferably about 32 genes. This 32-gene p53 classifier is an “optimized classifier” which may include genes selected from the group consisting of GenBank accession numbers: AI961235, BG271923, NM002466, BC001651, D38553, AK000345, AA742697, AL080170, BF245284, BC004504, H15261, NM000909, NM024843, R73030, NM030896, AI435828, AL512727, AW242997, AI810764, AI922323, AL360204, NM003225, NM003226, AW299538, NM003462, AI990465, NM004392, NM001267, AF269087, AI826437, AL355392 and AU156421.


The disease outcome may be selected from the group consisting of disease-specific survival, disease-free survival, tumor recurrence and therapeutic response. In one disclosed embodiment, a 9-gene partial classifier may predict clinical outcome in a late-stage breast cancer patient. The 9-gene partial classifier may include genes selected from the group consisting of GenBank accession numbers: BG271923, NM002466, D38553, NM000909, NM024843, R73030, NM003226, AW299538 and AI990465.


In another disclosed embodiment, a 21-gene partial classifier may predict clinical outcome in an early-stage, locally-treated breast cancer patient. The 21-gene partial classifier may include genes selected from the group consisting of GenBank accession numbers: AI961235, BG271923, NM002466, BC001651, D38553, AK000345, BC004504, NM000909, NM024843, R73030, AI435828, AI810764, AI922323, NM003225, NM003226, AW299538, NM003462, AI990465, NM004392, NM001267 and AI826437.


In yet another disclosed embodiment, an 8-gene partial classifier may predict clinical outcome in a liver cancer patient. The 8-gene partial classifier may include genes selected from the group consisting of GenBank accession numbers: NM002466, BC001651, D38553, NM024843, AI435828, AI810764, NM003226 and AW299538.


The present invention also provides a computer system for predicting disease outcome in a patient, the computer system comprising: a computer having a processor and a memory, the memory having executable code stored thereon for execution by the processor for performing the steps of: obtaining gene expression profiles from a plurality of genes from tumor samples, wherein said tumor samples may be mutant or wildtype for the p53 gene; comparing said gene expression profiles to determine which genes are differentially expressed in the mutant or wildtype tumors; deriving from said differentially expressed genes a set of genes to predict p53 mutational status; and using the set of genes to predict disease outcome in the patient.


The present invention also provides a diagnostic tool for predicting disease susceptibility in a patient comprising a plurality of genes capable of predicting p53 mutational status immobilized on a solid support. The solid support may be a microarray, for example. In one embodiment, the plurality of genes immobilized on the solid support may include genes selected from the group consisting of GenBank accession numbers: AI961235, BG271923, NM002466, BC001651, D38553, AK000345, AA742697, AL080170, BF245284, BC004504, H15261, NM000909, NM024843, R73030, NM030896, AI435828, AL512727, AW242997, AI810764, AI922323, AL360204, NM003225, NM003226, AW299538, NM003462, AI990465, NM004392, NM001267, AF269087, AI826437, AL355392 and AU156421. In another embodiment, the plurality of genes immobilized on the solid support may include genes selected from the group consisting of GenBank accession numbers: BG271923, NM002466, D38553, NM000909, NM024843, R73030, NM003226, AW299538 and AI990465. In yet another embodiment, the plurality of genes immobilized on the solid support may include genes selected from the group consisting of GenBank accession numbers: AI961235, BG271923, NM002466, BC001651, D38553, AK000345, BC004504, NM000909, NM024843, R73030, AI435828, AI810764, AI922323, NM003225, NM003226, AW299538, NM003462, AI990465, NM004392, NM001267 and AI826437. In a still further embodiment, the plurality of genes immobilized on the solid support may include genes selected from the group consisting of GenBank accession numbers: NM002466, BC001651, D38553, NM024843, AI435828, AI810764, NM003226 and AW299538.


The present invention also provides a nucleic acid array for predicting disease susceptibility in a patient comprising a solid support and displayed thereon nucleic acid probes corresponding to genes capable of predicting p53 mutational status in the patient. The nucleic acid array may comprise at least 8, 32, 100, 250 or 500 nucleic acid probes.


Thus, the disclosed methods, systems and compositions are capable of discerning p53-deficient from p53-enabled breast tumors and may be effective in gauging p53 activity in other cancer types. As many as 14% of breast tumors that are otherwise p53 wildtype at the DNA sequence level may be deficient for p53 by other means. Moreover, the classifier is a significant predictor of disease-specific survival and recurrence in various breast cancer populations and therefore will have clinical utility in predicting these endpoints, particularly in the context of therapeutic agents that function predominantly through p53-dependent cell death pathways.


EXAMPLES
Example 1
The Molecular Configurations of p53 Mutant and p53 Wildtype Tumors are Distinct

To gain insight into the molecular variation between p53 mutant (mt) and p53 wildtype (wt) breast tumors, high-density oligonucleotide microarrays were utilized to analyze a population-based series of 257 biopsies, all of which were previously sequenced for mutations in the p53 coding regions (Bergh, J., Norberg, T., Sjogren, S., Lindgren, A. & Holmberg, L. Complete sequencing of the p53 gene provides prognostic information in breast cancer patients, particularly in relation to adjuvant systemic therapy and radiotherapy. Nat Med 1, 1029-34 (1995), incorporated herein by reference).


The original patient material consisted of freshly frozen breast tumors from a population-based cohort of 315 women representing 65% of all breast cancers resected in Uppsala County during the time period Jan. 1, 1987 to Dec. 31, 1989 (Bergh et al., previously incorporated by reference). After surgery, the viable part of the fresh tumor was cut in two; one part was immediately frozen in isopentane and stored at −70° C. until analysis, and the other was fixed in 10% formalin and prepared for histopathologic examination. Frozen tumor tissue was available from 299 of the original 315 patients. Out of these, 270 had RNA of sufficient quantity and quality for microarray experiments, and after Affymetrix quality control, expression profiles of 260 tumors were further analysed. The present study was approved by the ethical committee at the Karolinska Institute.


Mutational analysis of the p53 gene (TP53) was carried out in the original 315 tumors as described previously in Bergh et al. (previously incorporated by reference). Among the 260 tumors included in the present study, 59 had p53 mutations found by cDNA sequence analysis of exons 2 to 11 (Bergh et al., previously incorporated by reference). In three samples p53 status could not be evaluated. Clinico-pathological characteristics were derived from the patient records and from routine clinical measurements at the time of diagnosis. Estrogen receptor status was determined by ligand binding assay as part of the routine clinical procedure. An experienced pathologist determined the Elston-Ellis grades of the tumors, classifying the tumors into low, medium and high-grade tumors (Elston, C. W. & Ellis, I. O. Pathological prognostic factors in breast cancer. I. The value of histological grade in breast cancer: experience from a large study with long-term follow-up. Histopathology 19, 403-10 (1991), incorporated herein by reference). Axillary lymph node metastases were found in 84 of these 260 patients while 166 were node-negative. Ten patients had unknown node status, as no axillary examination was performed due to advanced age or concomitant serious disease. Systemic adjuvant therapy was offered to all node-positive patients. In general, premenopausal women were offered chemotherapy and postmenopausal women received endocrine treatment. Out of the 260 patients included in the present study, 149 did not receive adjuvant therapy. Overall survival of the patients was based on information from the Swedish population registry, and date and cause of death were obtained from a review of the patient records in late 1999.


RNA from 59 tumors known to contain p53 mutations resulting in amino acid-level alterations, and from 198 tumors known to have wildtype p53, was analyzed on Affymetrix U133A and U133B arrays.


Extraction of total RNA was carried out using the Qiagen RNeasy Mini Kit (Qiagen, Germany). Frozen tumors were cut into small pieces and homogenized for around 30-40 seconds in test tubes (maximum 40 mg/tube) containing RLT buffer (RNeasy lysis buffer) with mercaptoethanol. The mixtures were then treated with Proteinase K for 10 minutes at 55° C., which in previous RNA extractions demonstrated improved RNA yield (Egyhazi, S. et al. Proteinase K added to the extraction procedure markedly increases RNA yield from primary breast tumors for use in microarray studies. Clin Chem 50, 975-6 (2004), incorporated herein by reference). In the following centrifugation steps on RNeasy columns, DNase treatment was also included to increase the RNA quality. The integrity of the RNA extracts was tested on an Agilent 2100 Bioanalyzer (Agilent Technologies, Rockville, Md., U.S.A), measuring the 28S:18S ribosomal RNA ratio. RNA extracts of high quality were stored at −70° C. until microarray analysis.


Preparation of in vitro transcription (IVT) products (i.e., target) and oligonucleotide array hybridization and scanning were performed according to the Affymetrix protocol (Affymetrix Inc., Santa Clara, Calif., U.S.A). First-strand cDNA was synthesized from a starting amount of 2-5 μg total RNA using a T7-linked oligo-dT primer, followed by second-strand synthesis. Double-stranded cDNA was purified using phenol/chloroform extraction and phase lock gel. Biotinylated cRNA targets were prepared from the cDNA templates in IVT reactions. The labeled cRNA targets were purified using Qiagen RNeasy Mini Kit and subsequently chemically fragmented. Ten μg of the fragmented, biotinylated cRNA was hybridized to the Affymetrix oligonucleotide human array set, HG-U133A&B, which contains 45,000 probe sets representing more than 39,000 transcripts derived from approximately 33,000 well-substantiated human genes. Hybridization was carried out in a hybridization oven at 45° C. and rotation was set at 60 rpm for 16 h. The arrays were washed and stained in the Fluidics Station 400 (Affymetrix Inc., Santa Clara, Calif., U.S.A) in accordance with the Affymetrix protocol. Staining was carried out using streptavidin-phycoerythrin (SAPE, final concentration of 10 μg/ml) and signal amplification with a biotinylated anti-streptavidin antibody and a second SAPE staining. The arrays were washed and scanned according to the manufacturer's instructions.


The raw expression data was processed using Microarray Suite 5.0 software (Affymetrix Inc., Santa Clara, Calif., U.S.A) and normalized using the global mean method. For each microarray, probeset signal values were scaled by adjusting the mean log intensity to a target signal value of 500. Samples with suboptimal average signal intensities were re-labeled and re-hybridized on new arrays. If microarray artifacts were visible, the samples were re-hybridized on new chips using the same fragmented probe, or alternatively, if the defective areas were small, the affected probes were censored from further analysis. The normalized expression data from both U133A and B chips were combined and natural log transformed.
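The scaling and log-transformation steps described above can be illustrated with a minimal sketch. It assumes that scaling means multiplying all signals on one array by a single factor so that their plain arithmetic mean equals the target of 500; the exact averaging rule used by Microarray Suite 5.0 may differ. The function names and the signal values are hypothetical and serve only as an illustration.

```python
import math

TARGET_SIGNAL = 500.0  # target signal value stated in the text


def scale_to_target(signals, target=TARGET_SIGNAL):
    """Rescale one array's signals by a single factor so their mean equals target.

    Assumption: a plain arithmetic mean is used as the array average.
    """
    mean_signal = sum(signals) / len(signals)
    factor = target / mean_signal
    return [value * factor for value in signals]


def normalize_array(signals, target=TARGET_SIGNAL):
    """Global-mean scale one array, then take the natural log of each signal."""
    return [math.log(value) for value in scale_to_target(signals, target)]


# Hypothetical probeset signals for one array (illustrative values only).
raw_signals = [120.0, 480.0, 900.0, 2500.0]
scaled_signals = scale_to_target(raw_signals)
log_signals = normalize_array(raw_signals)
```

Because every array is brought to the same mean before the log transform, signals from different arrays become comparable on a common scale.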


The extent to which gene expression patterns could distinguish p53 mt and wt tumors was first investigated. By Wilcoxon rank-sum test, 3,330 Affymetrix probe-sets representing ~2,770 distinct genes (according to UniGene build #167) were identified whose expression patterns distinguished p53 mt and wt tumors with a false discovery rate (FDR)-adjusted p value of p<0.001. A number of these genes are known transcriptional targets of p53, including PERP, RRM2, SEMA3B, TAP1, GTSE1, CHEK1, and CHEK2. Shown in FIG. 1 is the result of hierarchical cluster analysis using the top 250 genes, all of which are associated with p53 status with FDR p<5.9×10^-8. As expected from the gene selection criteria, the majority of p53 mt and wt tumors clustered into separate tumor groups. Of two predominant cluster nodes, 90% of the p53 mutants were found in one cluster (i.e., the “mutant-like” cluster), while 77% of p53 wt tumors segregated with the other (the “wildtype-like” cluster).
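The per-gene test and the FDR adjustment mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually used: the rank-sum p-value is computed with the normal approximation and no tie correction, and the FDR adjustment is assumed to be the Benjamini-Hochberg step-up procedure, which the text does not name. Function names are hypothetical.

```python
import math


def rank_sum_pvalue(x, y):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation, no tie correction)."""
    pooled = sorted((value, group) for group, values in enumerate((x, y)) for value in values)
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        midrank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = midrank
        i = j + 1
    n1, n2 = len(x), len(y)
    w = sum(rank for rank, (_, group) in zip(ranks, pooled) if group == 0)
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    return math.erfc(abs(z) / math.sqrt(2))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda idx: pvalues[idx])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

In use, `rank_sum_pvalue` would be applied to each probe-set's mt and wt expression values, and the resulting list of p-values passed through `benjamini_hochberg` before applying a cutoff such as 0.001.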


The hierarchical structure of the gene expression profiles was next investigated. As in the tumors, two predominant clusters were observed: one consisting of ~200 genes more highly expressed in the mutant-like tumor cluster, and the other representing ~50 genes more highly expressed in the wildtype-like cluster. Within the former, the genes most highly correlated with p53 mutant status were associated with cell cycle progression, including CDC2, CDC20, CCNB1, CCNB2, CKS2, CDCA1, CDCA3, CDCA8, CENPA, TOP2A, PTTG1 and MCM6. This finding is consistent with the observation that wt p53 has a negative regulatory effect on cell cycle genes. Among the genes more highly expressed in the wildtype-like cluster were several estrogen-regulated and ER status-associated genes, including STC2, NCOR1, and ADRA2A.


Further examination of the tumors revealed that in addition to p53 status, the predominant tumor clusters were also correlated with other clinical features, namely estrogen receptor (ER) status and tumor grade. The estrogen receptor status of a cell has been found to be correlated with cancer in several instances. Normal breast cells usually have receptors for estrogen. However, cancer cells arising in the breast do not always have receptors for estrogen. Breast cancers that have estrogen receptors are said to be “estrogen receptor-positive,” while those breast cancers that do not possess estrogen receptors are “estrogen receptor-negative.” In estrogen receptor-positive cancers, cancer cell growth is under the control of estrogen. In contrast, the growth of estrogen receptor-negative cancer cells is not governed by estrogen.



FIG. 1 shows hierarchical clustering of 257 tumors using the top 250 genes statistically correlated with p53 status. Tumors are represented in columns, genes are represented in rows. The degree of color saturation reflects the magnitude of the log expression signal; red hues denote higher expression levels while green hues indicate lower expression levels. The top row of black vertical bars indicates which breast tumors possess p53 mutations. The second row of bars indicates tumors that are ER positive. The third row of bars reflects histologic grade (Elston-Ellis grading system); green bars=grade I, blue bars=grade II, and red bars=grade III.


Segregating with the mutant-like cluster were 86% of estrogen receptor-negative (ER−) tumors (pcs=1.7×10^-10), 96% of grade III tumors (pcs=2.5×10^-19) and only 3% of grade I tumors (pfe=6.9×10^-15). This result is due, in part, to the fact that the p53 mutants in this study are positively correlated with ER negativity (pcs=1.7×10^-6) and grade III status (pcs=1.2×10^-11), and is consistent with previous reports demonstrating that p53 mutant breast cancers are significantly correlated with negative ER status and higher tumor grade. See, for example, Cattoretti, G., Rilke, F., Andreola, S., D'Amato, L. & Delia, D. P53 expression in breast cancer. Int J Cancer 41, 178-83 (1988); Isola, J., Visakorpi, T., Holli, K. & Kallioniemi, O. P. Association of overexpression of tumor suppressor protein p53 with rapid cell proliferation and poor prognosis in node-negative breast cancer patients. J Natl Cancer Inst 84, 1109-14 (1992); Andersen, T. I. et al. Prognostic significance of TP53 alterations in breast carcinoma. Br J Cancer 68, 540-8 (1993) and Bhargava, V. et al. The association of p53 immunopositivity with tumor proliferation and other prognostic indicators in breast cancer. Mod Pathol 7, 361-8 (1994), all of which are incorporated herein by reference.


However, it was also observed that among the p53 wt tumors within the mutant-like cluster there was likewise a significant over-representation of ER− tumors (pcs=2.0×10^-6) and grade III tumors (pfe=7.1×10^-11). Thus, by univariate statistical analysis, a large number of genes highly associated with p53 status have been identified that are capable of segregating tumors in a manner correlated not only with p53 status but also with histologic grade and ER status.


Example 2
A Gene Expression Classifier for Predicting p53 Deficiency

The finding that a fraction of p53 wt tumors were found to cluster together with the majority of p53 mutants suggests the possibility that these tumors may in fact be p53 deficient through mechanisms other than p53 mutation. Conversely, the discovery of p53 mutants with molecular configurations reminiscent of most wt tumors suggests that these tumors might in fact express functionally intact p53. However, the tumor group assignments in this case were based on genes selected by a univariate ranking procedure that did not account for the association of p53 status with ER and grade status. This raised the possibility that, to some extent, the selected genes included those that are mostly grade and/or ER associated, which may have biased the clustering of the tumors towards these properties rather than p53 status, per se.


Therefore, a robust gene expression-based classifier for predicting p53 status was developed by designing a predictive model including a multivariate linear regression method known as linear model-fit (LMF) for ranking p53 status-correlated genes independent of histologic grade and ER status.



FIG. 2 shows optimization and results of a gene-based classifier for p53 status. Diagonal Linear Discriminant Analysis (DLDA) was employed for the supervised learning of p53 status using gene expression profiles ranked by the Linear Model-Fit method. (A): Analysis of overlap between grade/estrogen receptor (ER)-correlated genes and p53-correlated genes ranked by Wilcoxon rank-sum test or Linear Model fit. The heat maps indicate the number of genes correlated with tumor grade (upper heat map) or ER status (lower heat map) in 100-gene bins (rows) and also correlated with p53 status (columns; ranked in 50-gene bins); p53 correlated genes were ranked by LMF=Linear Model-Fit or WR=Wilcoxon rank-sum; grade correlated genes were ranked by KW=Kruskal-Wallis, and ER correlated genes by WR. (B): The accuracy of the classifier is plotted as a function of the number of genes used to build the classifier; the optimal classifier consisted of 32 genes and misclassified a total of 40 tumors. (C): The results of the classifier applied to the Uppsala dataset (257 tumors) using leave-one-out cross validation. Unigene symbols (build #167), Genbank accession numbers, and Affymetrix probe IDs (A.=U133A; B.=U133B) are shown.


For gene selection, a linear model was fitted to the gene expression data with expression level as the response, and p53 status, ER status and grade status as the predictor variables. As an initial filter for removing genes not well correlated with the predictor variables, all genes with a p-value fit greater than 0.001 were excluded. Using ER and grade as additional predictors allowed for filtering out genes whose expression patterns could be mostly explained by either ER or grade status. When applied, the LMF ranking procedure markedly reduced the rank of many known cell cycle-regulated genes compared to the univariate Wilcoxon rank-sum (WR) method, indicating that these genes are best explained by high grade rather than p53 status (FIG. 2A, upper panel). Conversely, it was observed that ER-associated genes moved up in the top ranked p53-associated genes by LMF, presumably because their lower ranking by WR resulted from a large number of more highly ranked grade-associated genes (FIG. 2A, lower panel).
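The per-gene model described above (expression as the response; p53 status, ER status and grade as predictors) can be sketched with ordinary least squares. This is a minimal illustration with several assumptions: grade is coded as a single numeric variable, the covariates and expression values are invented, the function name is hypothetical, and the model-fit p-value filter is omitted because it would require an F-distribution routine. The p53 coefficient that each fit returns is the quantity by which the genes are subsequently ranked.

```python
def fit_linear_model(y, predictors):
    """Ordinary least squares for y ~ intercept + predictors.

    Returns the coefficients [intercept, b1, b2, ...] by solving the
    normal equations with Gauss-Jordan elimination and partial pivoting.
    """
    rows = [[1.0] + [float(v) for v in p] for p in predictors]
    k = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


# Hypothetical covariates per tumor: (p53 mutant 0/1, ER positive 0/1, grade 1-3).
covariates = [(0, 0, 1), (0, 1, 1), (1, 0, 3), (1, 1, 2),
              (0, 1, 2), (1, 0, 1), (0, 0, 3), (1, 1, 3)]
# Invented expression values: gene_a tracks p53, gene_b tracks grade only.
expression = {
    "gene_a": [2.0 + 3.0 * p for p, e, g in covariates],
    "gene_b": [1.0 + 2.0 * g for p, e, g in covariates],
    "gene_c": [1.0 - 1.5 * p + 0.5 * e for p, e, g in covariates],
}
p53_coefficient = {
    name: fit_linear_model(values, covariates)[1]
    for name, values in expression.items()
}
# Rank genes by the absolute value of the p53 coefficient, largest first.
ranked = sorted(p53_coefficient, key=lambda name: abs(p53_coefficient[name]), reverse=True)
```

In this toy data the grade-driven gene drops to the bottom of the ranking even though it would correlate with p53 status in a univariate test, which is the behavior the multivariate ranking is meant to provide.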


For class prediction purposes, the genes were ranked in decreasing order of the absolute value of the p53 status coefficient. For building the classifier, a variant of the maximum likelihood method, DLDA (diagonal linear discriminant analysis), was employed. This method had previously been applied to class determination problems using microarray data, as described, for example, in Dudoit, S., Fridlyand, J. & Speed, T. P. Comparison of discrimination methods for the classification of tumors using gene expression data. Journal of the American Statistical Association 97, 77-87 (2002), incorporated herein by reference. The set of predictor genes with greatest classification accuracy was chosen by leave-one-out cross validation.
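A compact sketch of DLDA with leave-one-out cross validation is given below for illustration. It assumes the common formulation in which each class has its own per-gene mean, all classes share one pooled per-gene variance (a diagonal covariance matrix), class priors are ignored, and a sample is assigned to the class with the smallest variance-scaled squared distance. The function names and the two-gene toy data are hypothetical.

```python
def train_dlda(samples, labels):
    """Estimate per-class gene means and pooled per-gene variances."""
    classes = sorted(set(labels))
    n_genes = len(samples[0])
    means = {}
    pooled_ss = [0.0] * n_genes
    for c in classes:
        members = [s for s, lab in zip(samples, labels) if lab == c]
        means[c] = [sum(col) / len(members) for col in zip(*members)]
        for s in members:
            for j in range(n_genes):
                pooled_ss[j] += (s[j] - means[c][j]) ** 2
    dof = len(samples) - len(classes)
    variances = [ss / dof for ss in pooled_ss]
    return means, variances


def predict_dlda(model, x):
    """Assign x to the class with the smallest variance-scaled squared distance."""
    means, variances = model

    def score(c):
        return sum((xj - mj) ** 2 / vj for xj, mj, vj in zip(x, means[c], variances))

    return min(means, key=score)


def loocv_errors(samples, labels):
    """Count misclassifications when each sample is held out in turn."""
    errors = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if predict_dlda(train_dlda(train_x, train_y), samples[i]) != labels[i]:
            errors += 1
    return errors


# Invented two-gene expression profiles for four mt and four wt tumors.
profiles = [[5.0, 1.0], [5.5, 1.2], [4.8, 0.9], [5.2, 1.1],
            [2.0, 3.0], [2.2, 3.3], [1.9, 2.8], [2.1, 3.1]]
status = ["mt"] * 4 + ["wt"] * 4
n_errors = loocv_errors(profiles, status)
```

Repeating `loocv_errors` on the top-ranked k genes for a range of k, and keeping the k with the fewest errors, is one way to choose the size of the predictor set.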


The accuracy of the classifier as a function of the number of genes it comprised is plotted in FIG. 2B. Of particular note was the observation that the accuracy of the tumor classification was highly stable, varying by only 2.7% (i.e., 7 tumors) regardless of whether the classifier comprised 7 genes or 500 genes. Genes in the 500-gene classifier are shown in Table 1 below. The optimal classifier, however, was achieved at 32 genes (Table 2), whereby 40 tumors (15.6%) were misclassified. 28 of the wt tumors (14%) were classified as mutant-like, while 12 mutants (20%) were misclassified as wildtype-like (FIG. 2C).
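The percentages quoted above follow directly from the tumor counts given in the text (59 mutant and 198 wildtype tumors). The short check below recomputes them; the variable names are illustrative only.

```python
# Counts stated in the text.
n_mutant = 59
n_wildtype = 198
wt_called_mutant_like = 28
mt_called_wildtype_like = 12

n_total = n_mutant + n_wildtype
n_misclassified = wt_called_mutant_like + mt_called_wildtype_like

overall_error_pct = 100 * n_misclassified / n_total
wt_error_pct = 100 * wt_called_mutant_like / n_wildtype
mt_error_pct = 100 * mt_called_wildtype_like / n_mutant
# Spread in accuracy across classifier sizes: 7 tumors out of the total.
stability_pct = 100 * 7 / n_total

print(n_total, n_misclassified)
print(round(overall_error_pct, 1), round(wt_error_pct), round(mt_error_pct))
print(round(stability_pct, 1))
```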














TABLE 1







Genbank
UniGene




Rank
Affymetrix
(decimals
Cluster ID

UniGene


Order
Probeset ID
removed)
(build #173)
UniGene Name
Symbol




















1
A.217889_s
NM_024843
Hs.31297
cytochrome b reductase 1
CYBRD1



at


2
B.243929_at
H15261
Hs.21948
Transcribed sequences


3
B.229975_at
AI826437
Hs.283417
Transcribed sequences


4
B.223864_at
AF269087
Hs.326736
ankyrin repeat domain 30A
ANKRD30A


5
B.227081_at
AW299538
Hs.75528
nucleolar GTPase
HUMAUAN







TIG


6
A.215014_at
AL512727
Hs.232127
MRNA; cDNA DKFZp547P042






(from clone DKFZp547P042)


7
A.206869_at
NM_001267
Hs.97220
Chondroadherin
CHAD


8
A.221585_at
BC004504
Hs.331904
calcium channel, voltage-
CACNG4






dependent,






gamma subunit 4


9
A.205440_s
NM_000909
Hs.519057
neuropeptide Y receptor Y1
NPY1R



at


10
B.228969_at
AI922323
Hs.226391
anterior gradient 2 homolog
AGR2






(Xenopus laevis)


11
A.212949_at
D38553
Hs.308045
barren homolog (Drosophila)
BRRN1


12
B.226067_at
AL355392
Data not





found


13
B.232855_at
AL360204
Hs.283853
MRNA full length insert cDNA






clone






EUROIMAGE 980547


14
A.221520_s
BC001651
Hs.48855
Cell division cycle associated 8
CDCA8



at


15
A.205472_s
NM_004392
Hs.63931
Dachshund homolog 1
DACH1



at


(Drosophila)


16
A.205186_at
NM_003462
Hs.406050
Dynein, axonemal, light
DNALI1






intermediate






Polypeptide 1


17
A.221275_s
NM_030896
Data not



at

found


18
B.229030_at
AW242997
Data not





found


19
B.233413_at
AU156421
Hs.518736
CDNA FLJ13457 fis, clone






PLACE1003343


20
A.203438_at
AI435828
Hs.155223
stanniocalcin 2
STC2


21
B.230378_at
AA742697
Hs.62492
secretoglobin, family 3A,
SCGB3A1






member 1


22
B.238581_at
BG271923
Hs.237809
guanylate binding protein 5
GBP5


23
B.235343_at
AI961235
Hs.96885
hypothetical protein FLJ12505
FLJ12505


24
B.229150_at
AI810764
Hs.102406
Transcribed sequences


25
A.205734_s
AI990465
Hs.38070
lymphoid nuclear protein related
LAF4



at


to AF4


26
A.214079_at
AK000345
Hs.272499
Dehydrogenase/reductase (SDR
DHRS2






family)






member 2


27
B.238746_at
BF245284
Hs.354427
Transcribed sequence with weak






similarity






to protein ref: NP_286085.1 (E. coli)






beta-D-galactosidase






[Escherichia coli O157: H7






EDL933]


28
A.204623_at
NM_003226
Data not





found


29
B.230863_at
R73030
Hs.252938
low density lipoprotein-related
LRP2






protein 2


30
A.215047_at
AL080170
Data not





found


31
A.201710_at
NM_002466
Hs.179718
v-myb myeloblastosis viral
MYBL2






oncogene






homolog (avian)-like 2


32
A.205009_at
NM_003225
Data not





found


33
A.207750_at
NM_018510
Data not





found


34
B.237339_at
AI668620
Hs.144151
Transcribed sequences


35
A.220540_at
NM_022358
Hs.528664
potassium channel, subfamily K,
KCNK15






member 15


36
B.223062_s
BC004863
Hs.286049
phosphoserine aminotransferase 1
PSAT1



at


37
A.204508_s
BC001012
Hs.512620
carbonic anhydrase XII
CA12



at


38
A.214451_at
NM_003221
Hs.33102
transcription factor AP-2 beta
TFAP2B






(activating enhancer binding






protein 2 beta)


39
A.202870_s
NM_001255
Hs.82906
CDC20 cell division cycle 20
CDC20



at


homolog






(S. cerevisiae)


40
B.236641_at
AW183154
Hs.3104
kinesin family member 14
KIF14


41
A.219197_s
AI424243
Hs.435861
signal peptide, CUB domain,
SCUBE2



at


EGF-like 2


42
A.207183_at
NM_006143
Hs.92458
G protein-coupled receptor 19
GPR19


43
A.220414_at
NM_017422
Hs.180142
calmodulin-like 5
CALML5


44
A.205354_at
NM_000156
Hs.81131
guanidinoacetate N-
GAMT






methyltransferase


45
A.201755_at
NM_006739
Hs.77171
MCM5 minichromosome
MCM5






maintenance






deficient 5, cell division cycle 46






(S. cerevisiae)


46
A.209459_s
AF237813
Hs.1588
4-aminobutyrate
ABAT



at


aminotransferase


47
B.225516_at
AA876372
Hs.432978
solute carrier family 7 (cationic
SLC7A2






amino






acid transporter, y+ system),






member 2


48
A.204558_at
NM_003579
Hs.66718
RAD54-like (S. cerevisiae)
RAD54L


49
B.224428_s
AY029179
Hs.435733
cell division cycle associated 7
CDCA7



at


50
B.228854_at
AI492388
Hs.356349
zinc finger protein 145
ZNF145






(Kruppel-like,






expressed in promyelocytic






leukemia)


51
A.208502_s
NM_002653
Hs.84136
paired-like homeodomain
PITX1



at


transcription factor 1


52
B.226936_at
BG492359
Hs.35962
CDNA clone IMAGE: 4448513,






partial cds


53
B.230021_at
AI638593
Hs.441708
hypothetical protein MGC45866
MGC45866


54
A.206799_at
NM_006551
Hs.204096
secretoglobin, family 1D,
SCGB1D2






member 2


55
A.202410_x
NM_000612
Hs.349109
insulin-like growth factor 2
IGF2



at


(somatomedin A)


56
A.206509_at
NM_002652
Hs.99949
prolactin-induced protein
PIP


57
A.204885_s
NM_005823
Hs.408488
Mesothelin
MSLN



at


58
A.201496_x
AI889739
Hs.78344
myosin, heavy polypeptide 11,
MYH11



at


smooth






muscle


59
A.206401_s
J03778
Hs.101174
microtubule-associated protein
MAPT



at


tau


60
A.204734_at
NM_002275
Hs.80342
keratin 15
KRT15


61
A.204014_at
NM_001394
Hs.417962
dual specificity phosphatase 4
DUSP4


62
A.204775_at
NM_005441
Hs.75238
chromatin assembly factor 1,
CHAF1B






subunit B (p60)


63
A.215356_at
AK023134
Hs.130675
hypothetical gene FLJ13072
FLJ13072


64
B.243049_at
AI791225
Hs.444098
MRNA; cDNA DKFZp434I1226






(from clone DKFZp434I1226)


65
B.223721_s
AF176013
Hs.260720
DnaJ (Hsp40) homolog,
DNAJC12



at


subfamily C,






member 12


66
A.219918_s
NM_018123
Data not



at

found


67
B.243735_at
N58363
Hs.8739
signal transducer and activator
STATIP1






of transcription






3 interacting protein 1


68
A.214188_at
AW665096
Hs.15299
HMBA-inducible
HIS1


69
B.226980_at
AK001166
Hs.421337
DEP domain containing 1B
DEPDC1B


70
A.203071_at
NM_004636
Hs.82222
sema domain, immunoglobulin
SEMA3B






domain (Ig),






short basic domain, secreted,






(semaphorin) 3B


71
A.206204_at
NM_004490
Hs.411881
growth factor receptor-bound
GRB14






protein 14


72
A.205979_at
NM_002407
Hs.97644
secretoglobin, family 2A,
SCGB2A1






member 1


73
A.208335_s_at
NM_002036
Hs.517102
Duffy blood group
FY


74
B.227550_at
AW242720
Hs.388347
MRNA; cDNA DKFZp686J0156 (from clone DKFZp686J0156)


75
A.220187_at
NM_024636
Hs.44208
likely ortholog of mouse tumor necrosis-alpha-induced adipose-related protein
FLJ23153


76
B.226473_at
BE514414
Hs.103305
hypothetical protein MGC10561
MGC10561


77
A.204822_at
NM_003318
Hs.169840
TTK protein kinase
TTK


78
A.204724_s_at
NM_001853
Hs.126248
collagen, type IX, alpha 3
COL9A3


79
A.205240_at
NM_013296
Hs.278338
G-protein signalling modulator 2 (AGS3-like, C. elegans)
GPSM2



80
A.205898_at
U20350
Hs.78913
chemokine (C-X3-C motif) receptor 1
CX3CR1


81
B.223381_at
AF326731
Hs.234545
cell division cycle associated 1
CDCA1


82
A.209243_s_at
AF208967
Hs.201776
paternally expressed 3
PEG3


83
A.204146_at
BE966146
Data not found


84
B.228273_at
BG165011
Hs.528654
hypothetical protein FLJ11029
FLJ11029


85
A.204162_at
NM_006101
Hs.414407
kinetochore associated 2
KNTC2


86
A.204914_s_at
AI360875
Hs.432638
SRY (sex determining region Y)-box 11
SOX11


87
A.209309_at
D90427
Hs.512643
alpha-2-glycoprotein 1, zinc
AZGP1


88
A.205048_s_at
NM_003832
Data not found


89
B.227419_x_at
AW964972
Hs.361171
placenta-specific 9
PLAC9


90
B.232944_at
AK024132
Hs.525858
MRNA; cDNA DKFZp686I18125 (from clone DKFZp686I18125)


91
B.224753_at
BE614410
Hs.434886
cell division cycle associated 5
CDCA5


92
A.210051_at
U78168
Hs.8578
Rap guanine nucleotide exchange factor (GEF) 3
RAPGEF3


93
A.215616_s_at
AB020683
Hs.301011
jumonji domain containing 2B
JMJD2B


94
A.210272_at
M29873
Hs.415794
cytochrome P450, family 2, subfamily B, polypeptide 7 pseudogene
CYP2B7


95
B.222608_s_at
AK023208
Hs.62180
anillin, actin binding protein (scraps homolog, Drosophila)
ANLN



96
B.240724_at
AI668629
Hs.25345
Transcribed sequences


97
B.228554_at
AL137566
Hs.32405
MRNA; cDNA DKFZp686A0815 (from clone DKFZp686A0815)


98
A.205280_at
NM_000824
Hs.32973
glycine receptor, beta
GLRB


99
B.238659_at
AA760689
Hs.210532
KIAA0141 gene product
KIAA0141


100
B.238116_at
AW959427
Hs.98849
dynein, cytoplasmic, light polypeptide 2B
DNCL2B


101
A.212448_at
AB007899
Hs.249798
neural precursor cell expressed, developmentally down-regulated 4-like
NEDD4L


102
B.235572_at
AI469788
Hs.381225
kinetochore protein Spc24
Spc24


103
A.209603_at
AI796169
Hs.169946
GATA binding protein 3
GATA3


104
A.205358_at
NM_000826
Hs.335051
glutamate receptor, ionotropic, AMPA 2
GRIA2


105
A.202095_s_at
NM_001168
Hs.1578
baculoviral IAP repeat-containing 5 (survivin)
BIRC5


106
A.211470_s_at
AF186255
Hs.38084
sulfotransferase family, cytosolic, 1C, member 1
SULT1C1


107
A.205350_at
NM_004378
Hs.346950
cellular retinoic acid binding protein 1
CRABP1


108
A.205890_s_at
NM_006398
Hs.44532
ubiquitin D
UBD


109
A.209680_s_at
BC000712
Hs.20830
kinesin family member C1
KIFC1


110
B.240192_at
AI631850
Hs.158992
FLJ45983 protein
FLJ45983


111
A.205225_at
NM_000125
Hs.1657
estrogen receptor 1
ESR1


112
B.235545_at
AI810054
Hs.445098
DEP domain containing 1
DEPDC1


113
B.224210_s_at
BC001147
Hs.436924
peroxisomal membrane protein 4, 24 kDa
PXMP4


114
B.229381_at
AI732488
Hs.29190
hypothetical protein MGC24047
MGC24047


115
A.210523_at
D89675
Hs.87223
bone morphogenetic protein receptor, type IB
BMPR1B


116
A.204641_at
NM_002497
Hs.153704
NIMA (never in mitosis gene a)-related kinase 2
NEK2


117
B.227764_at
AA227842
Hs.21929
hypothetical protein MGC52057
MGC52057


118
B.238900_at
BE669692
Data not found


119
A.202580_x_at
NM_021953
Hs.511941
forkhead box M1
FOXM1


120
A.205366_s_at
NM_018952
Hs.147465
homeo box B6
HOXB6


121
B.227966_s_at
AA524895
Hs.449141
Hypothetical protein LOC285103, mRNA (cDNA clone IMAGE: 5273139), partial cds


122
B.228069_at
AL138828
Data not found


123
A.210163_at
AF030514
Hs.103982
chemokine (C-X-C motif) ligand 11
CXCL11


124
A.204855_at
NM_002639
Hs.55279
serine (or cysteine) proteinase inhibitor, clade B (ovalbumin), member 5
SERPINB5


125
B.229390_at
AV734646
Hs.381220
Full length insert cDNA clone ZA84A12


126
A.203213_at
AL524035
Hs.334562
cell division cycle 2, G1 to S and G2 to M
CDC2


127
A.219555_s_at
NM_018455
Hs.283532
uncharacterized bone marrow protein BM039
BM039


128
B.227282_at
AB037734
Hs.4993
protocadherin 19
PCDH19


129
A.220085_at
NM_018063
Hs.203963
helicase, lymphoid-specific
HELLS


130
A.203256_at
NM_001793
Hs.191842
cadherin 3, type 1, P-cadherin (placental)
CDH3


131
B.234992_x_at
BG170335
Hs.293257
epithelial cell transforming sequence 2 oncogene
ECT2


132
A.204825_at
NM_014791
Hs.184339
maternal embryonic leucine zipper kinase
MELK


133
A.204126_s_at
NM_003504
Hs.114311
CDC45 cell division cycle 45-like (S. cerevisiae)
CDC45L


134
A.218663_at
NM_022346
Hs.528669
chromosome condensation protein G
HCAP-G


135
B.239962_at
AA972452
Hs.292072
Transcribed sequences


136
A.205046_at
NM_001813
Hs.75573
centromere protein E, 312 kDa
CENPE


137
B.235717_at
AA180985
Hs.285574
zinc finger protein 229
ZNF229


138
B.233154_at
AK022197
Hs.130581
CDNA FLJ12135 fis, clone MAMMA1000307


139
A.206754_s_at
NM_000767
Hs.1360
cytochrome P450, family 2, subfamily B, polypeptide 6
CYP2B6


140
A.204533_at
NM_001565
Hs.413924
chemokine (C-X-C motif) ligand 10
CXCL10


141
A.212925_at
AA143765
Hs.439180
chromosome 19 open reading frame 21
C19orf21


142
B.223229_at
AB032931
Hs.5199
HSPC150 protein similar to ubiquitin-conjugating enzyme
HSPC150


143
A.206599_at
NM_004695
Hs.90911
solute carrier family 16 (monocarboxylic acid transporters), member 5
SLC16A5


144
A.208103_s_at
NM_030920
Hs.385913
acidic (leucine-rich) nuclear phosphoprotein 32 family, member E
ANP32E


145
A.217953_at
AW189430
Hs.348921
PHD finger protein 3
PHF3


146
A.219686_at
NM_018401
Hs.58241
serine/threonine kinase 32B
STK32B


147
A.217276_x_at
AL590118
Hs.301947
kraken-like
dJ222E13.1


148
B.234863_x_at
AK026197
Hs.272027
F-box protein 5
FBXO5


149
B.240465_at
BF508074
Data not found


150
A.218308_at
NM_006342
Hs.104019
transforming, acidic coiled-coil containing protein 3
TACC3


151
A.206157_at
NM_002852
Hs.2050
pentaxin-related gene, rapidly induced by IL-1 beta
PTX3


152
A.209368_at
AF233336
Hs.212088
epoxide hydrolase 2, cytoplasmic
EPHX2


153
B.230856_at
AI073396
Hs.9398
WD40 repeat protein Interacting with phosphoInositides of 49 kDa
WIPI49


154
A.201890_at
NM_001034
Hs.226390
ribonucleotide reductase M2 polypeptide
RRM2


155
A.205364_at
NM_003500
Hs.9795
acyl-Coenzyme A oxidase 2, branched chain
ACOX2


156
B.225911_at
AL138410
Hs.282832
hypothetical protein LOC255743
LOC255743


157
B.244696_at
AI033582
Hs.372254
Transcribed sequences


158
A.218730_s_at
NM_014057
Hs.109439
osteoglycin (osteoinductive factor, mimecan)
OGN


159
A.219498_s_at
NM_018014
Hs.314623
B-cell CLL/lymphoma 11A (zinc finger protein)
BCL11A


160
A.203702_s_at
AL043927
Hs.169910
tubulin tyrosine ligase-like family, member 4
TTLL4


161
A.206045_s_at
NM_003787
Hs.23567
nucleolar protein 4
NOL4


162
A.219919_s_at
NM_018276
Hs.29173
slingshot homolog 3 (Drosophila)
SSH3


163
A.215779_s_at
BE271470
Data not found


164
B.230966_at
AI859620
Hs.437023
interleukin 4 induced 1
IL4I1


165
A.206378_at
NM_002411
Hs.46452
secretoglobin, family 2A, member 2
SCGB2A2


166
A.221562_s_at
AF083108
Hs.511950
sirtuin (silent mating type information regulation 2 homolog) 3 (S. cerevisiae)
SIRT3


167
A.221258_s_at
NM_031217
Hs.301052
kinesin family member 18A
DKFZP434G2226


168
A.221577_x_at
AF003934
Hs.296638
growth differentiation factor 15
GDF15


169
B.235709_at
H37811
Hs.20575
growth arrest-specific 2 like 3
GAS2L3


170
B.235171_at
AI354636
Data not found


171
A.207437_at
NM_006491
Hs.292511
neuro-oncological ventral antigen 1
NOVA1


172
A.203638_s_at
NM_022969
Hs.404081
fibroblast growth factor receptor 2 (bacteria-expressed kinase, keratinocyte growth factor receptor, craniofacial dysostosis 1, Crouzon syndrome, Pfeiffer syndrome, Jackson-Weiss syndrome)
FGFR2


173
A.218542_at
NM_018131
Hs.14559
chromosome 10 open reading frame 3
C10orf3


174
A.217613_at
AW173720
Hs.176227
hypothetical protein FLJ11155
FLJ11155


175
B.241310_at
AI685841
Hs.161354
Transcribed sequences


176
A.205234_at
NM_004696
Hs.351306
solute carrier family 16 (monocarboxylic acid transporters), member 4
SLC16A4


177
A.203726_s_at
NM_000227
Hs.83450
laminin, alpha 3
LAMA3


178
A.221436_s_at
NM_031299
Hs.30114
cell division cycle associated 3
CDCA3


179
A.205242_at
NM_006419
Hs.100431
chemokine (C-X-C motif) ligand 13 (B-cell chemoattractant)
CXCL13


180
A.218726_at
NM_018410
Hs.104859
hypothetical protein DKFZp762E1312
DKFZp762E1312


181
A.218856_at
NM_016629
Data not found


182
B.226661_at
T90295
Data not found


183
A.218741_at
NM_024053
Hs.208912
chromosome 22 open reading frame 18
C22orf18


184
A.206201_s_at
NM_005924
Hs.77858
mesenchyme homeo box 2 (growth arrest-specific homeo box)
MEOX2


185
B.236184_at
AI798959
Hs.131686
Transcribed sequences


186
A.220651_s_at
NM_018518
Hs.198363
MCM10 minichromosome maintenance deficient 10 (S. cerevisiae)
MCM10


187
A.216331_at
AK022548
Hs.74369
integrin, alpha 7
ITGA7


188
B.232105_at
AU148391
Hs.181245
MRNA; cDNA DKFZp686B15184 (from clone DKFZp686B15184)


189
B.226907_at
N32557
Hs.192822
protein phosphatase 1, regulatory (inhibitor) subunit 14C
PPP1R14C


190
B.234976_x_at
BG324504
Hs.321127
solute carrier family 4, sodium bicarbonate cotransporter, member 5
SLC4A5


191
A.211323_s_at
L38019
Hs.149900
inositol 1,4,5-triphosphate receptor, type 1
ITPR1


192
A.206391_at
NM_002888
Hs.82547
retinoic acid receptor responder (tazarotene induced) 1
RARRES1


193
A.222348_at
AW971134
Hs.212787
KIAA0303 protein
KIAA0303


194
B.235845_at
AI380207
Hs.368802
Sp5 transcription factor
SP5


195
B.239233_at
AA744613
Hs.292925
KIAA1212
KIAA1212


196
A.208383_s_at
NM_002591
Hs.1872
phosphoenolpyruvate carboxykinase 1 (soluble)
PCK1


197
A.214440_at
NM_000662
Hs.155956
N-acetyltransferase 1 (arylamine N-acetyltransferase)
NAT1


198
B.230456_at
BE501559
Hs.380824
NS5ATP13TP2 protein
NS5ATP13TP2


199
A.219650_at
NM_017669
Data not found


200
A.210052_s_at
AF098158
Hs.9329
TPX2, microtubule-associated protein homolog (Xenopus laevis)
TPX2


201
A.204468_s_at
NM_005424
Hs.78824
tyrosine kinase with immunoglobulin and epidermal growth factor homology domains
TIE


202
A.209531_at
BC001453
Hs.26403
glutathione transferase zeta 1 (maleylacetoacetate isomerase)
GSTZ1


203
A.217014_s_at
AC004522
Data not found


204
B.227155_at
R10289
Hs.3844
LIM domain only 4
LMO4


205
A.213520_at
NM_004260
Hs.31442
RecQ protein-like 4
RECQL4


206
B.241505_at
BF513468
Data not found


207
A.213451_x_at
BE044614
Hs.411644
tenascin XB
TNXB


208
A.214389_at
AI733515
Hs.148907
hypothetical protein MGC52019
MGC52019


209
B.235229_at
AI694413
Data not found


210
A.203571_s_at
NM_006829
Hs.511763
chromosome 10 open reading frame 116
C10orf116


211
B.237168_at
AA708016
Data not found


212
A.203915_at
NM_002416
Hs.77367
chemokine (C-X-C motif) ligand 9
CXCL9


213
B.224509_s_at
BC006399
Hs.155839
reticulon 4 interacting protein 1
RTN4IP1


214
A.206093_x_at
NM_007116
Data not found


215
A.205613_at
NM_016524
Hs.258326
B/K protein
LOC51760


216
B.236885_at
AI651930
Data not found


217
B.236341_at
AI733018
Hs.247824
cytotoxic T-lymphocyte-associated protein 4
CTLA4


218
A.221854_at
AI378979
Hs.313068
plakophilin 1 (ectodermal dysplasia/skin fragility syndrome)
PKP1


219
A.201291_s_at
NM_001067
Hs.156346
topoisomerase (DNA) II alpha 170 kDa
TOP2A


220
B.232734_at
AK023230
Hs.139709
hypothetical protein FLJ12572
FLJ12572


221
A.214053_at
AW772192
Hs.7888
CDNA FLJ44318 fis, clone TRACH3000780


222
B.231195_at
AI492376
Data not found


223
A.212956_at
AB020689
Hs.411317
KIAA0882 protein
KIAA0882


224
A.214404_x_at
AI307915
Hs.79414
SAM pointed domain containing ets transcription factor
SPDEF


225
B.237086_at
AI693336
Hs.163484
forkhead box A1
FOXA1


226
A.205948_at
NM_007050
Hs.225952
protein tyrosine phosphatase, receptor type, T
PTPRT


227
A.214745_at
AW665865
Hs.193143
KIAA1069 protein
KIAA1069


228
A.208029_s_at
NM_018407
Hs.296398
lysosomal associated protein transmembrane 4 beta
LAPTM4B


229
A.205569_at
NM_014398
Hs.10887
lysosomal-associated membrane protein 3
LAMP3


230
B.235046_at
AA456099
Hs.176376
Transcribed sequences


231
A.203130_s_at
NM_004522
Data not found


232
B.238584_at
W52934
Hs.113009
hypothetical protein FLJ22527
FLJ22527


233
A.220986_s_at
NM_030953
Hs.169333
tigger transposable element derived 6
TIGD6


234
A.205023_at
D14134
Hs.446554
RAD51 homolog (RecA homolog, E. coli) (S. cerevisiae)
RAD51


235
B.237048_at
AW451103
Hs.71371
Clone IMAGE: 4797878, mRNA, partial cds


236
B.225400_at
BF111780
Hs.440663
chromosome 1 open reading frame 19
C1orf19


237
A.206134_at
NM_014479
Hs.145296
ADAM-like, decysin 1
ADAMDEC1


238
A.214469_at
NM_021052
Hs.121017
histone 1, H2ae
HIST1H2AE


239
A.202188_at
NM_014669
Hs.295014
nucleoporin 93 kDa
NUP93


240
A.204678_s_at
U90065
Hs.376874
potassium channel, subfamily K, member 1
KCNK1


241
B.231517_at
AW243917
Hs.196566
ZYG-11A early embryogenesis protein mRNA, complete cds


242
A.210387_at
BC001131
Data not found


243
B.223623_at
AF325503
Hs.43125
esophageal cancer related gene 4 protein
ECRG4


244
B.228729_at
N90191
Hs.23960
cyclin B1
CCNB1


245
A.204904_at
NM_002060
Hs.296310
gap junction protein, alpha 4, 37 kDa (connexin 37)
GJA4


246
B.237301_at
BF433570
Hs.144479
Transcribed sequences


247
B.239623_at
N93197
Hs.49573
CDNA FLJ44606 fis, clone BRACE2005991


248
B.242601_at
AA600175
Hs.443169
hypothetical protein LOC253012
LOC253012


249
B.223861_at
AL136755
Hs.298312
HORMA domain containing protein
NOHMA


250
A.213122_at
AI096375
Hs.173094
TSPY-like 5
TSPYL5


251
A.204482_at
NM_003277
Hs.505337
claudin 5 (transmembrane protein deleted in velocardiofacial syndrome)
CLDN5


252
B.240512_x_at
H10766
Hs.23406
potassium channel tetramerisation domain containing 4
KCTD4


253
A.209642_at
AF043294
Hs.287472
BUB1 budding uninhibited by benzimidazoles 1 homolog (yeast)
BUB1


254
B.239669_at
AW006409
Hs.532143
Transcribed sequences


255
B.243028_x_at
BE045392
Data not found


256
A.210721_s_at
AB040812
Hs.32539
p21(CDKN1A)-activated kinase 7
PAK7


257
A.215942_s_at
BF973178
Hs.122552
G-2 and S-phase expressed 1
GTSE1


258
B.222895_s_at
AA918317
Hs.57987
B-cell CLL/lymphoma 11B (zinc finger protein)
BCL11B


259
A.203708_at
NM_002600
Hs.188
phosphodiesterase 4B, cAMP-specific (phosphodiesterase E4 dunce homolog, Drosophila)
PDE4B



260
B.235178_x_at
AL120674
Data not found


261
B.236471_at
AI949827
Hs.404741
nuclear factor (erythroid-derived 2)-like 3
NFE2L3


262
A.220024_s_at
NM_020956
Hs.205457
periaxin
PRX


263
A.213711_at
NM_002281
Hs.170925
keratin, hair, basic, 1
KRTHB1


264
A.204766_s_at
NM_002452
Hs.413078
nudix (nucleoside diphosphate linked moiety X)-type motif 1
NUDT1


265
B.227182_at
AW966474
Hs.88417
sushi domain containing 3
SUSD3


266
A.220061_at
NM_017888
Hs.122939
hypothetical protein FLJ20581
FLJ20581


267
A.220117_at
NM_024697
Hs.99256
hypothetical protein FLJ22419
FLJ22419


268
B.237395_at
AV700083
Hs.176588
cytochrome P450, family 4, subfamily Z, polypeptide 1
CYP4Z1


269
B.226034_at
BE222344
Hs.346735
Clone IMAGE: 3881549, mRNA


270
A.207038_at
NM_004694
Hs.42645
solute carrier family 16 (monocarboxylic acid transporters), member 6
SLC16A6


271
B.238541_at
BE544855
Hs.236572
CDNA clone IMAGE: 5265729, partial cds


272
A.207702_s_at
NM_012301
Hs.22599
atrophin-1 interacting protein 1
AIP1


273
B.236496_at
AW006352
Hs.159643
chromosome 14 open reading frame 66
C14orf66


274
A.215300_s_at
AK022172
Hs.396595
flavin containing monooxygenase 5
FMO5


275
A.219580_s_at
NM_024780
Hs.145807
transmembrane channel-like 5
TMC5


276
B.230469_at
AW665138
Hs.58559
pleckstrin homology domain containing, family K member 1
PLEKHK1


277
B.243636_s_at
AI042373
Hs.132917
Transcribed sequences


278
A.203764_at
NM_014750
Hs.77695
discs, large homolog 7 (Drosophila)
DLG7


279
A.209936_at
AF107493
Hs.439480
RNA binding motif protein 5
RBM5


280
A.207961_x_at
NM_022870
Data not found


281
B.233059_at
AK026384
Hs.199776
potassium inwardly-rectifying channel, subfamily J, member 3
KCNJ3


282
A.221583_s_at
AI129381
Hs.354740
potassium large conductance calcium-activated channel, subfamily M, alpha member 1
KCNMA1


283
B.228762_at
AW151924
Hs.159142
lunatic fringe homolog (Drosophila)
LFNG


284
A.219415_at
NM_020659
Hs.268728
tweety homolog 1 (Drosophila)
TTYH1


285
A.203397_s_at
BF063271
Hs.278611
UDP-N-acetyl-alpha-D-galactosamine:polypeptide N-acetylgalactosaminyltransferase 3 (GalNAc-T3)
GALNT3


286
A.206091_at
NM_002381
Hs.278461
matrilin 3
MATN3


287
A.217562_at
BF589529
Hs.497208
DBCCR1-like
DBCCR1L


288
B.229764_at
AW629527
Hs.338851
FLJ41238 protein
FLJ41238


289
B.232544_at
AU144916
Hs.222056
CDNA FLJ11572 fis, clone HEMBA1003373


290
A.203819_s_at
AU160004
Hs.79440
IGF-II mRNA-binding protein 3
IMP-3


291
A.206102_at
NM_021067
Data not found


292
A.210738_s_at
AF011390
Hs.5462
solute carrier family 4, sodium bicarbonate cotransporter, member 4
SLC4A4


293
B.236285_at
AI631846
Hs.137007
hypothetical protein BC009980
LOC113730


294
A.209800_at
AF061812
Hs.432448
keratin 16 (focal non-epidermolytic palmoplantar keratoderma)
KRT16


295
A.218211_s_at
NM_024101
Hs.297405
Melanophilin
MLPH


296
B.223361_at
AF116682
Hs.238205
chromosome 6 open reading frame 115
C6orf115


297
B.242776_at
AA584428
Hs.12742
zinc finger, CCHC domain containing 6
ZCCHC6


298
A.221909_at
BF984207
Data not found


299
A.209408_at
U63743
Hs.69360
kinesin family member 2C
KIF2C


300
A.215812_s_at
U41163
Data not found


301
B.232238_at
AK001380
Hs.121028
asp (abnormal spindle)-like, microcephaly associated (Drosophila)
ASPM


302
B.223126_s_at
AF312864
Hs.12532
chromosome 1 open reading frame 21
C1orf21


303
A.212141_at
X74794
Hs.460184
MCM4 minichromosome maintenance deficient 4 (S. cerevisiae)
MCM4


304
A.222325_at
AW974812
Hs.433049
Transcribed sequences


305
B.224314_s_at
AF277174
Hs.130946
egl nine homolog 1 (C. elegans)
EGLN1


306
A.207470_at
NM_017535
Hs.194369
arginine-glutamic acid dipeptide (RE) repeats
RERE


307
B.228504_at
AI828648
Hs.406684
sodium channel, voltage-gated, type VII, alpha
SCN7A


308
B.228245_s_at
AW594320
Hs.405557
ovostatin 2
OVOS2


309
A.213712_at
BF508639
Hs.58488
catenin (cadherin-associated protein), alpha-like 1
CTNNAL1


310
A.213998_s_at
AW188131
Hs.250696
DEAD (Asp-Glu-Ala-Asp) box polypeptide 17
DDX17


311
B.230323_s_at
AW242836
Hs.355663
hypothetical protein BC016153
LOC120224


312
A.212713_at
R72286
Hs.296049
microfibrillar-associated protein 4
MFAP4


313
B.230316_at
R49343
Hs.430576
SEC14-like 2 (S. cerevisiae)
SEC14L2


314
A.32128_at
Y13710
Hs.16530
chemokine (C-C motif) ligand 18 (pulmonary and activation-regulated)
CCL18


315
B.236718_at
AI278445
Hs.43334
Transcribed sequence with weak similarity to protein sp: P39189 (H. sapiens) ALU2_HUMAN Alu subfamily SB sequence Contamination warning entry


316
B.227030_at
BG231773
Hs.371680
CDNA FLJ46579 fis, clone THYMU3042758


317
B.235658_at
AW058580
Hs.151444
Transcribed sequences


318
B.230622_at
BE552393
Hs.100469
myeloid/lymphoid or mixed-lineage leukemia (trithorax homolog, Drosophila); translocated to, 4
MLLT4


319
A.205213_at
NM_014716
Hs.337242
centaurin, beta 1
CENTB1


320
A.221754_s_at
AI341234
Hs.6191
coronin, actin binding protein, 1B
CORO1B


321
A.214612_x_at
U10691
Data not found


322
A.203463_s_at
H05668
Hs.7407
epsin 2
EPN2


323
B.237350_at
AW027968
Hs.454465
Similar to CDNA sequence BC021608 (LOC143941), mRNA


324
A.220789_s_at
NM_004749
Hs.231411
transforming growth factor beta regulator 4
TBRG4


325
A.208496_x_at
NM_003534
Hs.247813
histone 1, H3g
HIST1H3G


326
A.202992_at
NM_000587
Hs.78065
complement component 7
C7


327
A.210432_s_at
AF225986
Hs.300717
sodium channel, voltage-gated, type III, alpha
SCN3A


328
B.239525_at
AI733041
Hs.374649
hypothetical protein DKFZp547A023
DKFZp547A023


329
B.244344_at
AW135316
Hs.105448
protein kinase, lysine deficient 4
PRKWNK4


330
B.236773_at
AI635931
Hs.147613
Transcribed sequences


331
A.207118_s_at
NM_004659
Hs.211819
matrix metalloproteinase 23B
MMP23B


332
B.228558_at
AL518291
Data not found


333
B.230269_at
AI963605
Hs.406256
Transcribed sequences


334
B.228262_at
AW237462
Hs.127951
hypothetical protein FLJ14503
FLJ14503


335
B.238878_at
AA496211
Hs.157208
aristaless related homeobox
ARX


336
B.228559_at
BF111626
Hs.55028
CDNA clone IMAGE: 6043059, partial cds


337
A.204542_at
NM_006456
Hs.288215
sialyltransferase 7 ((alpha-N-acetylneuraminyl-2,3-beta-galactosyl-1,3)-N-acetyl galactosaminide alpha-2,6-sialyltransferase) B
SIAT7B


338
B.224839_s_at
BF310919
Hs.355862
glutamic pyruvate transaminase (alanine aminotransferase) 2
GPT2


339
A.209755_at
AF288395
Hs.158244
nicotinamide nucleotide adenylyltransferase 2
NMNAT2


340
B.229019_at
AI694320
Hs.6295
zinc finger protein 533
ZNF533


341
A.218039_at
NM_016359
Hs.279905
nucleolar and spindle associated protein 1
NUSAP1


342
A.205947_s_at
NM_003382
Hs.170560
vasoactive intestinal peptide receptor 2
VIPR2


343
B.244107_at
AW189097
Hs.444393
Transcribed sequences


344
B.228241_at
AI827789
Hs.100686
breast cancer membrane protein 11
BCMP11


345
A.204750_s_at
BF196457
Hs.95612
desmocollin 2
DSC2


346
A.204130_at
NM_000196
Hs.1376
hydroxysteroid (11-beta) dehydrogenase 2
HSD11B2


347
A.220119_at
NM_022140
Hs.104746
erythrocyte membrane protein band 4.1 like 4A
EPB41L4A


348
B.230238_at
AI744123
Hs.13308
hypothetical protein LOC134548
LOC134548


349
A.204719_at
NM_007168
Hs.58351
ATP-binding cassette, sub-family A (ABC1), member 8
ABCA8


350
A.219961_s_at
NM_018474
Hs.436632
chromosome 20 open reading frame 19
C20orf19


351
A.219132_at
NM_021255
Hs.44038
pellino homolog 2 (Drosophila)
PELI2


352
A.220584_at
NM_025094
Data not found


353
B.227350_at
AI807356
Hs.127797
CDNA FLJ11381 fis, clone HEMBA1000501


354
B.230800_at
AV699353
Hs.443428
adenylate cyclase 4
ADCY4


355
A.204709_s_at
NM_004856
Hs.270845
kinesin family member 23
KIF23


356
B.243526_at
AI968904
Hs.174373
hypothetical protein LOC349136
LOC349136


357
A.219491_at
NM_024036
Hs.148438
leucine rich repeat and fibronectin type III domain containing 4
LRFN4


358
A.204686_at
NM_005544
Hs.390242
insulin receptor substrate 1
IRS1


359
B.228066_at
AI870951
Hs.445574
Transcribed sequence with weak similarity to protein pir: I37984 (H. sapiens) I37984 keratin 9, type I, cytoskeletal - human


360
A.206795_at
NM_004101
Hs.42502
coagulation factor II (thrombin) receptor-like 2
F2RL2


361
A.209464_at
AB011446
Hs.442658
aurora kinase B
AURKB


362
B.229082_at
AI141520
Data not found


363
B.240304_s_at
BG484769
Hs.115838
CDNA FLJ44282 fis, clone TRACH2003516


364
B.227702_at
AA557324
Hs.439760
cytochrome P450, family 4, subfamily X, polypeptide 1
CYP4X1


365
B.235077_at
BF956762
Hs.418271
maternally expressed 3
MEG3


366
A.202705_at
NM_004701
Hs.194698
cyclin B2
CCNB2


367
A.209616_s_at
S73751
Hs.278997
carboxylesterase 1 (monocyte/macrophage serine esterase 1)
CES1


368
A.211441_x_at
AF280113
Hs.306220
cytochrome P450, family 3, subfamily A, polypeptide 43
CYP3A43


369
B.241861_at
R89089
Data not found


370
B.228425_at
BF056746
Hs.516311
MRNA; cDNA DKFZp686E10196 (from clone DKFZp686E10196); complete cds


371
A.213938_at
Z38645
Hs.476384
CAZ-associated structural protein
CAST


372
A.202409_at
X07868
Data not found


373
A.219115_s_at
NM_014432
Hs.288240
Interleukin 20 receptor, alpha
IL20RA


374
A.39248_at
N74607
Hs.234642
Aquaporin 3
AQP3


375
B.227232_at
T58044
Data not found


376
B.230319_at
AI222435
Hs.90250
CDNA FLJ36413 fis, clone THYMU2010816.


377
A.203287_at
NM_005558
Hs.18141
Ladinin 1
LAD1


378
A.218009_s_at
NM_003981
Hs.344037
Protein regulator of cytokinesis 1
PRC1


379
A.222351_at
AW009884
Hs.431156
Protein phosphatase 2 (formerly 2A), regulatory subunit A (PR 65), beta isoform
PPP2R1B


380
A.204794_at
NM_004418
Hs.1183
Dual specificity phosphatase 2
DUSP2


381
A.211456_x_at
AF333388
Data not found


382
A.206296_x_at
NM_007181
Hs.95424
Mitogen-activated protein kinase kinase kinase kinase 1
MAP4K1


383
A.205357_s_at
NM_000685
Hs.197063
Angiotensin II receptor, type 1
AGTR1


384
B.244385_at
AA766126
Data not found


385
A.202235_at
NM_003051
Hs.75231
Solute carrier family 16 (monocarboxylic acid transporters), member 1
SLC16A1


386
B.240422_at
AI935710
Hs.530456
Transcribed sequences


387
B.230644_at
AI375083
Hs.31522
Leucine rich repeat and fibronectin type III domain containing 5
LRFN5


388
A.220238_s_at
NM_018846
Hs.376793
Kelch-like 7 (Drosophila)
KLHL7


389
B.235004_at
AI677701
Hs.201619
RNA binding motif protein 24
RBM24


390
A.201397_at
NM_006623
Hs.3343
Phosphoglycerate dehydrogenase
PHGDH


391
A.208010_s_at
NM_012411
Hs.87860
Protein tyrosine phosphatase, non-receptor type 22 (lymphoid)
PTPN22


392
A.210138_at
AF074979
Hs.141492
Regulator of G-protein signalling 20
RGS20


393
A.203828_s_at
NM_004221
Hs.943
Natural killer cell transcript 4
NK4


394
A.205862_at
NM_014668
Hs.438037
GREB1 protein
GREB1


395
A.219984_s_at
NM_020386
Hs.36761
HRAS-like suppressor
HRASLS


396
A.203358_s_at
NM_004456
Hs.444082
Enhancer of zeste homolog 2 (Drosophila)
EZH2


397
B.232570_s_at
AL356755
Data not found


398
A.212613_at
AI991252
Hs.376046
Butyrophilin, subfamily 3, member A2
BTN3A2


399
B.238077_at
T75480
Hs.13982
Potassium channel tetramerisation domain containing 6
KCTD6


400
A.217023_x_at
AF099143
Data not found


401
B.242093_at
AW263497
Hs.97774
Synaptotagmin-like 5
SYTL5


402
B.232979_at
AK000839
Hs.306410
CDNA FLJ20832 fis, clone ADKA03033


403
B.232286_at
AA572675
Hs.188173
CDNA FLJ12187 fis, clone MAMMA1000831


404
A.203223_at
NM_004703
Hs.390163
Rabaptin, RAB GTPase binding effector protein 1
RABEP1


405
B.225834_at
AL135396
Hs.339665
Similar to RIKEN cDNA 2700049P18 gene
MGC57827


406
A.205591_at
NM_006334
Hs.74376
Olfactomedin 1
OLFM1


407
B.228058_at
AI559190
Hs.105887
Similar to common salivary protein 1
LOC124220


408
A.207828_s_at
NM_005196
Data not found


409
A.222379_at
AI002715
Hs.348522
Potassium voltage-gated channel, Isk-related family, member 4
KCNE4


410
A.210084_x_at
AF206665
Hs.405479
Tryptase, alpha
TPS1


411
B.233249_at
AU155297
Hs.287562
CDNA FLJ13313 fis, clone OVARC1001489


412
B.232948_at
AU147218
Hs.297369
CDNA FLJ12111 fis, clone MAMMA1000025


413
B.229033_s_at
AA143060
Hs.454758
Melanoma associated antigen (mutated) 1
MUM1


414
B.229623_at
BF508344
Hs.112742
CDNA clone IMAGE: 6301163, containing frame-shift errors


415
A.222339_x_at
AI054381
Hs.293379
Transcribed sequences


416
A.205347_s_at
NM_021992
Hs.56145
Thymosin, beta, identified in neuroblastoma cells
TMSNB


417
B.229245_at
AA535361
Hs.343666
Phosphoinositol 3-phosphate-binding protein-3
PEPP3


418
B.225491_at
AL157452
Hs.349088
Solute carrier family 1 (glial high affinity glutamate transporter), member 2
SLC1A2


419
B.239594_at
BF110735
Data not found


420
A.213906_at
AW592266
Hs.300592
v-myb myeloblastosis viral oncogene homolog (avian)-like 1
MYBL1


421
B.223757_at
AF305836
Hs.406958
Deiodinase, iodothyronine, type III opposite strand
DIO3OS


422
B.242296_x_at
BF594828
Hs.91145
Transcribed sequences


423
B.236312_at
AA938184
Hs.44380
Transcribed sequence with weak similarity to protein ref: NP_071385.1 (H. sapiens) hypothetical protein FLJ20958 [Homo sapiens]


424
B.227529_s_at
BF511276
Hs.197081
A kinase (PRKA) anchor protein (gravin) 12
AKAP12


425
A.221928_at
AI057637
Hs.234898
acetyl-Coenzyme A carboxylase beta
ACACB


426
B.244013_at
AI084430
Hs.113919
Hypothetical protein LOC374969
LOC374969


427
A.219769_at
NM_020238
Hs.142179
inner centromere protein antigens 135/155 kDa
INCENP


428
B.239758_at
AI142126
Hs.26125
Transcribed sequences


429
B.239913_at
AI421796
Hs.132591
solute carrier family 10 (sodium/bile acid cotransporter family), member 4
SLC10A4


430
A.211226_at
AF080586
Hs.158351
galanin receptor 2
GALR2


431
A.206023_at
NM_006681
Hs.418367
Neuromedin U
NMU


432
A.210538_s_at
U37546
Data not found


433
B.232277_at
AA643687
Hs.149425
solute carrier family 28 (sodium-coupled nucleoside transporter), member 3
SLC28A3


434
A.207339_s_at
NM_002341
Hs.376208
Lymphotoxin beta (TNF superfamily, member 3)
LTB


435
A.37145_at
M85276
Data not found


436
B.243837_x_at
AA639707
Hs.443239
Transcribed sequences


437
A.221198_at
NM_021920
Data not found


438
B.233442_at
AU147500
Hs.287499
CDNA FLJ12196 fis, clone MAMMA1000867


439
B.232545_at
AF176701
Hs.442734
F-box and leucine-rich repeat protein 9
FBXL9


440
B.238323_at
BG387172
Hs.528776
TEA domain family member 2
TEAD2


441
B.231993_at
AK026784
Hs.301296
CDNA: FLJ23131 fis, clone LNG08502


442
B.224212_s_at
AF169689
Hs.247734
Protocadherin alpha 2
PCDHA2


443
B.231560_at
D59759
Data not found


444
A.201195_s_at
AB018009
Hs.184601
solute carrier family 7 (cationic amino acid transporter, y+ system), member 5
SLC7A5


445
B.239185_at
AI284184
Hs.388917
ATP-binding cassette, sub-family A (ABC1), member 9
ABCA9


446
B.232776_at
AU145289
Hs.193223
CDNA FLJ11646 fis, clone HEMBA1004394


447
A.212865_s_at
BF449063
Hs.512555
collagen, type XIV, alpha 1 (undulin)
COL14A1


448
B.228750_at
AI693516
Hs.28625
Transcribed sequences


449
B.241577_at
AI732794
Data not found


450
A.209125_at
J00269
Data not found


451
B.238898_at
BG028463
Hs.163734
Transcribed sequences


452
A.203548_s_at
BF672975
Hs.180878
lipoprotein lipase
LPL


453
B.230363_s_at
BE858808
Hs.52463
inositol polyphosphate-5-phosphatase F
INPP5F


454
A.221111_at
NM_018402
Hs.272350
interleukin 26
IL26


455
B.226597_at
AI348159
Hs.76277
polyposis locus protein 1-like 1
DP1L1


456
A.218169_at
NM_018052
Hs.445061
Hypothetical protein FLJ10305
FLJ10305


457
A.206107_at
NM_003834
Hs.65756
regulator of G-protein signalling 11
RGS11


458
B.230158_at
AA758751
Hs.484250
Hypothetical protein FLJ32949
FLJ32949


459
B.244706_at
AA521309
Hs.380763
similar to hypothetical protein
LOC115294






FLJ10883


460
B.228648_at
AA622495
Hs.10844
leucine-rich alpha-2-
LRG1






glycoprotein 1


461
B.237047_at
AI678049
Hs.508819
CDNA FLJ40458 fis, clone






TESTI2041778


462
A.205671_s
NM_002120
Hs.1802
major histocompatibility
HLA-DOB



at


complex, class II,






DO beta


463
A.217167_x
AJ252550
Data not



at

found


464
A.205399_at
NM_004734
Hs.21355
Doublecortin and CaM kinase-
DCAMKL1






like 1


465
B.236646_at
BE301029
Hs.226422
Hypothetical protein FLJ31166
FLJ31166


466
A.203354_s
AW117368
Hs.408177
ADP-ribosylation factor guanine
EFA6R



at


nucleotide






factor 6


467
B.237252_at
AW119113
Hs.2030
Thrombomodulin
THBD


468
A.206341_at
NM_000417
Hs.130058
interleukin 2 receptor, alpha
IL2RA


469
A.210525_x
BC001787
Hs.123232
Chromosome 14 open reading
C14orf143



at


frame 143


470
A.214897_at
AB007975
Hs.492779
MRNA, chromosome 1 specific






transcript






KIAA0506.


471
A.203362_s
NM_002358
Hs.79078
MAD2 mitotic arrest deficient-
MAD2L1



at


like 1 (yeast)


472
B.230874_at
AI241896
Hs.48653
CDNA FLJ39593 fis, clone






SKNSH2001222


473
B.224396_s
AF316824
Hs.435655
asporin (LRR class 1)
ASPN



at


474
A.208305_at
NM_000926
Hs.2905
Progesterone receptor
PGR


475
B.223867_at
AF334676
Hs.414648
tektin 3
TEKT3


476
A.211363_s
AF109294
Hs.459541
Methylthioadenosine
MTAP



at


phosphorylase


477
B.232267_at
AL162032
Hs.23644
G protein-coupled receptor 133
GPR133


478
B.244121_at
BE835502
Data not





found


479
B.242808_at
AI733287
Hs.203755
Transcribed sequence with






moderate similarity






to protein sp: P12947






(H. sapiens)






RL31_HUMAN 60S ribosomal






protein L31


480
A.215465_at
AL080207
Hs.134585
ATP-binding cassette, sub-
ABCA12






family A (ABC1),






member 12


481
A.210244_at
U19970
Hs.51120
Cathelicidin antimicrobial
CAMP






peptide


482
A.204603_at
NM_003686
Hs.47504
Exonuclease 1
EXO1


483
B.232986_at
AC074331
Data not





found


484
B.225241_at
BG253437
Hs.356289
steroid sensitive gene 1
URB


485
B.230760_at
BF592062
Hs.169859
zinc finger protein, Y-linked
ZFY


486
A.209480_at
M16276
Hs.409934
major histocompatibility
HLA-DQB1






complex,






class II, DQ beta 1


487
A.206664_at
NM_001041
Hs.429596
Sucrase-isomaltase (alpha-
SI






glucosidase)


488
A.206291_at
NM_006183
Hs.80962
Neurotensin
NTS


489
A.222085_at
AW452357
Hs.27373
Hypothetical gene supported by
LOC400451






AK075564;






BC060873


490
A.214899_at
AC007842
Data not





found


491
B.240174_at
BF512871
Hs.193522
Transcribed sequence with






moderate






Similarity to protein sp: P39188






(H. sapiens)






ALU1_HUMAN Alu subfamily






J sequence






Contamination warning entry


492
A.219148_at
NM_018492
Hs.104741
T-LAK cell-originated protein
TOPK






kinase


493
B.226303_at
AA706788
Hs.46531
Phosphoglucomutase 5
PGM5


494
B.222848_at
BC005400
Hs.164018
Leucine zipper protein FKSG14
FKSG14


495
A.202270_at
NM_002053
Hs.62661
Guanylate binding protein 1,
GBP1






interferon-inducible,






67 kDa


496
A.205266_at
NM_002309
Hs.2250
leukemia inhibitory factor
LIF






(cholinergic






differentiation factor)


497
B.239008_at
AW606588
Hs.430335
Transcribed sequence with weak






similarity to protein sp: P39195






(H. sapiens) ALU8_HUMAN






Alu subfamily SX sequence






contamination warning entry


498
B.228194_s
AI675836
Hs.348923
sortilin-related VPS10 domain
SORCS1



at


containing






receptor 1


499
A.215514_at
AL080072
Hs.21195
MRNA; cDNA






DKFZp564M0616 (from clone






DKFZp564M0616)


500
A.219010_at
NM_018265
Hs.73239
Hypothetical protein FLJ10901
FLJ10901










The 500-gene classifier: The genes are ranked according to their correlation with p53 status. The genes are identified by their GenBank Accession Nos., Affymetrix Probeset IDs, Unigene IDs, Unigene Names and Unigene Symbols.


For sequences and SEQ ID NOs for the genes described in Table 1, see FIGS. 9-508 in which each of the sequences for the above genes is shown and is associated with a GenBank Accession No., Unigene ID, and/or a Unigene Name, and a SEQ ID NO.


Example 3
The p53 Classifier has Significant Accuracy in Two Independent Datasets

The performance of the p53 classifier in the context of independent datasets was then evaluated. FIG. 3 shows that genes of the classifier can predict p53 status in independent cDNA microarray datasets. (A) A 9-gene subset of the 32-gene classifier can predict p53 status in an independent breast cancer dataset. 9 genes of our classifier were selected based on their presence in 50% or more of the tumors. The tumors used in the analysis were required to have expression data present for >50% of the genes. (B) An 8-gene subset of the p53 classifier can predict p53 status in an independent liver cancer dataset. 8 overlapping genes were selected based on their presence in 90% or more of the tumors. The tumors used in the analysis were required to have expression data present for >50% of the genes. (A&B) Black vertical bars indicate p53 mutant status. Gene symbols (Unigene build #167) and corresponding IMAGE clone IDs (from the original studies) are listed. The hierarchical clustergrams are shown. Genes (rows) and tumors (columns) were clustered. In the tumor dendrograms, the green branch denotes the wildtype-like configurations, and the red branch the mutant-like profiles.


Two publicly available microarray datasets in which p53 status was known were therefore accessed: a breast cancer study by Sorlie et al. (Sorlie, T. et al. Repeated observation of breast tumor subtypes in independent gene expression data sets. Proc Natl Acad Sci USA 100, 8418-23 (2003), incorporated herein by reference) and a liver cancer study by Chen et al. (Chen, X. et al. Gene expression patterns in human liver cancers. Mol Biol Cell 13, 1929-39 (2002), incorporated herein by reference). Both studies were conducted on cDNA microarray platforms.


In the Sorlie dataset, 69 breast tumors were sequenced for p53 mutations. This subset of tumors was queried for the availability of expression data corresponding to the genes of the classifier. Twenty-eight genes in the classifier mapped to UniGene IDs (build #167). Though over half of these genes mapped to the Sorlie et al. microarray, few were expressed in the majority of the tumors, and a number of tumors possessed measurements for fewer than half of the genes. Only 9 genes in the classifier were found to correspond to cDNA probes (representing 9 different genes) having expression measurements present in >50% of the tumors, where the tumors possessed measurements for >50% of the genes (resulting in a subset of 44 well-sampled tumors). Using this 9-gene subset of the classifier to hierarchically cluster the tumors (FIG. 3A), 77% of the p53 mt tumors clustered into one branch, and 77% of the wildtypes clustered into the other (pcs=3.0×10⁻⁴), recapitulating the robust predictive capability of the classifier.
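The presence-based filtering described above can be sketched as follows. This is a minimal illustration only, not the procedure actually used: the function name `filter_by_presence`, the representation of missing measurements as `None`, and the order of filtering (genes first, then tumors on the retained genes) are all assumptions, and the toy matrix is hypothetical.

```python
from typing import List, Optional, Tuple

# Rows are genes, columns are tumors; None marks a missing measurement.
Matrix = List[List[Optional[float]]]


def filter_by_presence(
    matrix: Matrix, gene_min: float = 0.5, tumor_min: float = 0.5
) -> Tuple[List[int], List[int]]:
    """Return indices of genes and tumors that pass the presence thresholds.

    Assumption: genes are filtered first (the fraction of tumors with a value
    must exceed gene_min), then tumors are filtered on the retained genes
    (the fraction of retained genes with a value must exceed tumor_min).
    """
    n_tumors = len(matrix[0]) if matrix else 0
    genes = [
        g
        for g, row in enumerate(matrix)
        if n_tumors and sum(v is not None for v in row) / n_tumors > gene_min
    ]
    tumors = [
        t
        for t in range(n_tumors)
        if genes
        and sum(matrix[g][t] is not None for g in genes) / len(genes) > tumor_min
    ]
    return genes, tumors


# Hypothetical toy data: 3 genes x 4 tumors.
toy = [
    [1.2, None, 0.4, 0.9],    # present in 3 of 4 tumors: kept
    [None, None, None, 0.1],  # present in 1 of 4 tumors: dropped
    [0.3, None, 0.8, None],   # present in 2 of 4 tumors: dropped (not > 50%)
]
kept_genes, kept_tumors = filter_by_presence(toy)
```

Because the two criteria depend on each other, a different filtering order or an iterative scheme could retain a slightly different subset; the single pass shown here is only one reasonable reading.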


A cDNA microarray-based liver cancer dataset in which p53 status was ascertained by immunohistochemistry (IHC) of p53 protein levels (Chen, X. et al. Gene expression patterns in human liver cancers. Mol Biol Cell 13, 1929-39 (2002), incorporated herein by reference) was next analyzed. Here, 8 classifier genes could be mapped to all 59 tumors assayed for p53 status (with each gene having data present in 90% or more of all tumors, and where each tumor contained data for >50% of the genes). With statistical significance similar to that seen in the breast cancer dataset (i.e., pfe=3.5×10⁻⁴), this 8-gene subset of the classifier was able to cluster the HCC samples into two predominant clusters correlated with p53 status: 87% of the mutants in one cluster, and 61% of the wildtypes in the other (FIG. 3B). Together, these observations suggest that the genes comprising the p53 classifier are robust in their ability to classify not only breast tumors based on p53 status, but also liver cancers, and therefore may have generalizable utility in predicting p53 status in other cancer types.
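The two-branch hierarchical clustering used in both datasets can be illustrated with a small sketch. It assumes a distance of 1 minus the Pearson correlation and average linkage, and that the input profiles are already log-transformed and mean-centered. It is a plain-Python stand-in and does not reproduce the Cluster and TreeView software; the function names and the toy profiles are hypothetical.

```python
import math
from typing import List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_linkage(profiles: List[List[float]], n_clusters: int = 2) -> List[List[int]]:
    """Agglomerative clustering with distance 1 - r and average linkage.

    Repeatedly merges the two clusters whose members have the smallest mean
    pairwise distance until n_clusters remain.
    """
    dist = [[1.0 - pearson(a, b) for b in profiles] for a in profiles]
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                total = sum(dist[a][b] for a in clusters[i] for b in clusters[j])
                d = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical tumor profiles over four genes (one list per tumor).
toy_profiles = [
    [1.0, 2.0, 3.0, 4.0],
    [1.1, 2.1, 2.9, 4.2],
    [4.0, 3.0, 2.0, 1.0],
    [3.9, 3.1, 2.2, 0.8],
]
clusters = average_linkage(toy_profiles)
```

Cutting the tree at two clusters mirrors the two predominant branches read from the dendrograms; in practice the branches were identified from the full clustergram rather than from a fixed cut.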













TABLE 2





Genbank Accession No. | Affymetrix Probeset ID | UniGene ID (build #171) | UniGene Name (build #167) | UniGene Symbol
AI961235 | B.235343_at | Hs.96885 | Hypothetical protein FLJ12505 | FLJ12505
BG271923 | B.238581_at | Hs.237809 | Guanylate binding protein 5 | GBP5
NM_002466 | A.201710_at | Hs.179718 | v-myb myeloblastosis viral oncogene homolog (avian)-like 2 | MYBL2
BC001651 | A.221520_s_at | Hs.48855 | Cell division cycle associated 8 | CDCA8
D38553 | A.212949_at | Hs.308045 | Barren homolog (Drosophila) | BRRN1
AK000345 | A.214079_at | Hs.272499 | Dehydrogenase/reductase (SDR family) member 2 | DHRS2
AA742697 | B.230378_at | Hs.62492 | Secretoglobin, family 3A, member 1 | SCGB3A1
AL080170 | A.215047_at | | |
BF245284 | B.238746_at | Hs.354427 | Transcribed sequences |
BC004504 | A.221585_at | Hs.331904 | Calcium channel, voltage-dependent, gamma subunit 4 | CACNG4
H15261 | B.243929_at | Hs.21948 | Transcribed sequences |
NM_000909 | A.205440_s_at | Hs.519057 | Neuropeptide Y receptor Y1 | NPY1R
NM_024843 | A.217889_s_at | Hs.31297 | Cytochrome b reductase 1 | CYBRD1
R73030 | B.230863_at | Hs.252938 | Low density lipoprotein-related protein 2 | LRP2
NM_030896 | A.221275_s_at | | |
AI435828 | A.203438_at | Hs.155223 | Stanniocalcin 2 | STC2
AL512727 | A.215014_at | Hs.232127 | MRNA; cDNA DKFZp547P042 (from clone DKFZp547P042) |
AW242997 | B.229030_at | | |
AI810764 | B.229150_at | Hs.102406 | Transcribed sequences |
AI922323 | B.228969_at | Hs.226391 | Anterior gradient 2 homolog (Xenopus laevis) | AGR2
AL360204 | B.232855_at | Hs.283853 | MRNA full length insert cDNA clone EUROIMAGE 980547 |
NM_003225 | A.205009_at | Hs.350470 | Trefoil factor 1 (breast cancer, estrogen-inducible sequence expressed in) | TFF1
NM_003226 | A.204623_at | Hs.82961 | Trefoil factor 3 (intestinal) | TFF3
AW299538 | B.227081_at | Hs.75528 | Nucleolar GTPase | HUMAUANTIG
NM_003462 | A.205186_at | Hs.406050 | Dynein, axonemal, light intermediate polypeptide 1 | DNALI1
AI990465 | A.205734_s_at | Hs.38070 | Lymphoid nuclear protein related to AF4 | LAF4
NM_004392 | A.205472_s_at | Hs.63931 | Dachshund homolog (Drosophila) | DACH1
NM_001267 | A.206869_at | Hs.97220 | Chondroadherin | CHAD
AF269087 | B.223864_at | Hs.326736 | Breast cancer antigen NY-BR-1 | NY-BR-1
AI826437 | B.229975_at | Hs.283417 | Transcribed sequences |
AL355392 | B.226067_at | | |
AU156421 | B.233413_at | Hs.518736 | CDNA FLJ13457 fis, clone PLACE1003343. |










Optimized 32-gene p53 Classifier: The genes are identified by their GenBank Accession Nos., Affymetrix Probeset IDs, Unigene IDs, Unigene Names and Unigene Symbols.


Example 4
The p53 Classifier is a Greater Prognostic Indicator of Patient Outcome than p53 Mutation Status Alone

It is widely accepted that in breast cancer and other tumor types p53 status is prognostic of clinical outcomes such as tumor recurrence, patient survival, and therapeutic response. The hypothesis that a classifier based on p53 activity would out-perform p53 mutation status alone as a prognostic indicator of clinical outcomes was tested. FIG. 4 shows that the p53 classifier has greater prognostic significance than p53 mutation status alone. Kaplan-Meier survival curves are shown for patients classified according to (A) p53 mutation status, (B&C) the p53 classifier, or (D) both. The clinical endpoint was death from breast cancer (i.e., disease-specific survival). In A, B, and D, all 257 patients were assessed; in C, only the 198 patients with p53 wildtype tumors were assessed. The Wald test (pw) was used to assess significance of the hazard ratios (HR).


The classifier and sequence-level p53 mutation status were compared with respect to their abilities to predict disease-specific survival (DSS) in all 257 patients of the Uppsala cohort regardless of treatment type or clinical stage.


The significance of the hazard ratio generated using the p53 classifier to segregate patients was an order of magnitude greater than that obtained using p53 mutation status alone (pw=0.00057 versus pw=0.012, respectively) (FIG. 4 A&B); notably, this improved p-value was statistically significant at pmc=0.0046. Furthermore, the p53 classifier could also significantly segregate patients into low and high risk groups in the subset of 198 women confirmed by sequencing to have wildtype p53 (pw=0.016) (FIG. 4C) indicating that those with p53 wt tumors classified as mutant-like have poorer DSS than those with wt tumors of the wt-like class. In FIG. 4D, survival curves among all four tumor subgroups were compared. Notably, it was observed that patients with p53 mt or wt tumors classified as mt-like (green and blue curves, respectively) have similar overall survival curves, while the twelve with p53 mt tumors classified as wt-like (red curve) show a survival curve that falls between that of the group with mutant-like p53 mt tumors (green curve) and that of the group with wt-like p53 wt tumors (black curve) and is not significantly different from either curve (pw=0.47 for mt/mt-like comparison and pw=0.37 for wt/wt-like comparison).


Next, the prognostic significance of the classifier on the Sorlie et al. cDNA microarray dataset was examined (Sorlie, T. et al. Repeated observation of breast tumor subtypes in independent gene expression data sets. Proc Natl Acad Sci USA 100, 8418-23 (2003), incorporated herein by reference). FIG. 5 shows that the p53 classifier has strong prognostic significance in an independent dataset of late-stage tumors. Tumors were hierarchically classified according to the 9-gene partial classifier described in FIG. 3 and analyzed for correlations with survival outcomes: (A) hierarchical clustergram of 76 tumors from the Sorlie et al. dataset; the black branch of the tumor dendrogram denotes the wildtype-like configuration, and the red branch the mutant-like profile. Shown are Kaplan-Meier estimates for (B) disease-specific survival and (C) disease-free survival, where patient groups were determined according to the green and red branches of the tumor dendrogram in (A).


Here, the 9-gene partial classifier, which could distinguish both mt and wt tumors with 77% accuracy, was used to hierarchically cluster 76 well-sampled tumor specimens with associated patient survival information (FIG. 5A). Importantly, the majority of these tumors (>80%) are derived from two independent prospective studies on chemotherapeutic response of stage III patients with locally advanced breast cancer (T3/T4 and/or N2). The tumors clustered into two predominant branches with 31 tumors in the wt-like cluster and 44 tumors in the mutant-like cluster. Grouping the patients according to these tumor profiles, the Kaplan-Meier survival curves for disease-specific and disease-free survival (FIGS. 5B&C) were both highly significant in this cohort (pw=0.00008 (DSS) and pw=0.00005 (DFS)). Remarkably, the 31 patients in the p53 wt-like cluster showed a 90% probability of surviving their breast cancer for a period of 7 years compared to a 35% probability of 7-year survival for the 44 patients in the p53 mt-like group (FIG. 5B). Thus, in this predominantly stage III patient population, the partial classifier can accurately predict not only which patients will relapse and die, but also which late stage patients will survive their cancer.
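Survival probabilities of the kind quoted above (for example, the probability of surviving 7 years) are read from Kaplan-Meier curves. A minimal sketch of the product-limit estimator is given below for illustration; the function name and the toy follow-up data are hypothetical, and the sketch does not include the Wald test of the hazard ratio.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each distinct event time.

    events[i] is 1 if the endpoint (e.g., death from disease) was observed
    at times[i], and 0 if the patient was censored at that time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Gather every patient whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            # Patients censored at t still count as at risk at t.
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve


# Hypothetical follow-up times (years) and event indicators for five patients.
curve = kaplan_meier([2, 3, 3, 5, 7], [1, 0, 1, 1, 0])
```

The survival probability at a given horizon is the value of the last step at or before that time; censored patients lower the number at risk without producing a step.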


For hierarchical cluster analysis, log expression values were mean centered and normalized, and genes and tumors were clustered using the Pearson correlation metric and average linkage (Cluster and TreeView software courtesy of Dr. Michael Eisen; software available on Lawrence Berkeley National Laboratory, UC Berkeley's website). For survival analysis, patients were stratified according to the p53 classifier output or, as in one case, according to p53 mutation status. The Kaplan-Meier estimate was used to compute survival curves for the different patient groups and the Wald test was used to assess the statistical significance of the resultant hazard ratio. The FIG. 4 survival analysis assesses the probability of achieving, by chance alone, the more significant Wald p-value of 0.00057 generated using the group assignments as determined by the p53 classifier (panel B) compared to p=0.012 using p53 status alone (panel A). In 100,000 iterative runs, 40 tumors were randomly selected (i.e., the number of tumors that differed in group assignment between panels A and B), their p53 status inverted, and the Wald p-values computed for each run. A p-value ≤0.00057 was obtained only 564 times. The Monte Carlo p-value for this observation is estimated to be 0.0046.
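The Monte Carlo procedure described above can be sketched generically. In the sketch below, `pvalue_fn` is a hypothetical placeholder for whatever routine computes the Wald p-value from a set of group assignments (for example, a Cox proportional hazards fit, which is not implemented here); the function name, parameters, and default values are assumptions made for illustration.

```python
import random
from typing import Callable, List, Sequence


def monte_carlo_pvalue(
    labels: Sequence[int],
    n_flip: int,
    pvalue_fn: Callable[[List[int]], float],
    target_p: float,
    n_iter: int = 1000,
    seed: int = 0,
) -> float:
    """Estimate how often random status inversions reach target_p or better.

    In each iteration, n_flip distinct samples are chosen at random, their
    binary status (0/1) is inverted, and pvalue_fn is evaluated on the
    perturbed labels. The return value is the fraction of iterations whose
    p-value is <= target_p.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_iter):
        perturbed = list(labels)  # copy so the caller's labels are untouched
        for idx in rng.sample(range(len(perturbed)), n_flip):
            perturbed[idx] = 1 - perturbed[idx]
        if pvalue_fn(perturbed) <= target_p:
            hits += 1
    return hits / n_iter
```

Under this reading, `labels` would hold the sequence-based p53 status of each patient, `n_flip` the number of tumors whose assignment differs between the two groupings, and `target_p` the Wald p-value obtained with the classifier.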


For association tests (i.e., to ascertain the significance of the number of observed events in two or more groups), the Chi-square test was employed. When the number of events was sufficiently small (<5) in any category, Fisher's Exact test was applied instead of Chi-square test.
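The choice between the two association tests can be illustrated for a 2×2 table. The sketch below is an assumption-laden illustration: it treats "number of events <5 in any category" as any observed cell count below 5, uses a two-sided Fisher's Exact test and an uncorrected Pearson Chi-square test with one degree of freedom, and assumes non-zero row and column totals. The function names are hypothetical.

```python
import math
from typing import Tuple


def fisher_exact_2x2(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's Exact p-value for the table [[a, b], [c, d]]."""
    r1, r2, c1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell.
        return math.comb(r1, x) * math.comb(r2, c1 - x) / math.comb(n, c1)

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # Sum all tables that are no more probable than the observed one.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson Chi-square p-value (1 degree of freedom, no continuity correction)."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of a Chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(stat / 2.0))


def association_pvalue(a: int, b: int, c: int, d: int) -> Tuple[str, float]:
    """Use Fisher's Exact test when any cell count is below 5, else Chi-square."""
    if min(a, b, c, d) < 5:
        return "fisher", fisher_exact_2x2(a, b, c, d)
    return "chi-square", chi_square_2x2(a, b, c, d)
```

A one-sided variant of Fisher's Exact test would sum only one tail of the hypergeometric distribution and can give a smaller p-value than the two-sided form shown here.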


For the statistical analysis of expression levels for p53 downstream target genes and upstream effectors, two-tailed two-group t-tests were employed to determine differentially expressed genes between the p53 wt and mt tumors (FIG. 8). One-tailed two-group t-tests were performed for comparisons between the p53 wt tumors in the mt-like class and the p53 wt tumors in the wt-like class (and vice versa) to test whether the genes were significantly differentially expressed in the same direction (or opposite direction) as that observed between the p53 wildtypes and mutants.


It would be evident to one of skill in the art that the method embodiments of the present invention are not limited to the statistical methods disclosed herein. Embodiments of the present invention encompass equivalent analytical methods. The p-value abbreviations used herein include:


pwr=Wilcoxon rank-sum test


pt=T test


pcs=Chi-square test


pfe=Fisher's Exact test


pw=Wald test


pmc=Monte Carlo estimate


Promoter analysis for p53 binding sites was performed on each of the classifier genes with a known transcription start site (TSS). BEARR (Vega, V. B., Bangarusamy, D. K., Miller, L. D., Liu, E. T. & Lin, C. Y. BEARR: Batch Extraction and Analysis of cis-Regulatory Regions. Nucleic Acids Res 32, W257-60 (2004), incorporated herein by reference) was used to extract promoter sequences (3000 bp upstream to 500 bp downstream of the TSS) and predict putative binding sites using the P53 position weight matrix obtained from TRANSFAC (Kel, A. E. et al. MATCH: A tool for searching transcription factor binding sites in DNA sequences. Nucleic Acids Res 31, 3576-9 (2003), incorporated herein by reference) version 6.0 (Matrix accession: M00272) as well as a simple pattern search based on the canonical p53 binding site consensus 5′-RRRCWWGYYYN(0-13)RRRCWWGYYY-3′ (el-Deiry, W. S., Kern, S. E., Pietenpol, J. A., Kinzler, K. W. & Vogelstein, B. Definition of a consensus binding site for p53. Nat Genet 1, 45-9 (1992), incorporated herein by reference).
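The simple pattern search for the canonical consensus can be sketched with a regular expression, expanding the IUPAC codes R (A or G), W (A or T), and Y (C or T) and allowing a spacer of 0 to 13 bases between the two half-sites. This is an illustration only; it is neither BEARR nor the TRANSFAC matrix search, it scans the given strand only, and the function name and test sequences are hypothetical.

```python
import re
from typing import List, Tuple

# IUPAC nucleotide codes used in the consensus RRRCWWGYYY.
IUPAC = {"R": "[AG]", "W": "[AT]", "Y": "[CT]"}
HALF_SITE = "".join(IUPAC.get(ch, ch) for ch in "RRRCWWGYYY")

# Two half-sites separated by 0-13 bases. The lookahead lets candidate sites
# that start at different positions be reported even when they overlap.
P53_SITE = re.compile(rf"(?=({HALF_SITE}[ACGT]{{0,13}}{HALF_SITE}))")


def find_p53_sites(seq: str) -> List[Tuple[int, str]]:
    """Return (0-based start, matched sequence) for each candidate site.

    At most one match is reported per start position (the one found first
    by the greedy spacer); only the supplied strand is searched.
    """
    seq = seq.upper()
    return [(m.start(), m.group(1)) for m in P53_SITE.finditer(seq)]


# Hypothetical sequence: two adjacent half-sites flanked by T's.
sites = find_p53_sites("TTT" + "AGACATGCCC" + "AGACATGCCC" + "TTT")
```

To cover both strands, the same search could be repeated on the reverse complement of each promoter sequence.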


Example 5
The p53-Deficiency Classifier, but not p53 Status Alone, is Significantly Correlated with Outcome in Endocrine-Treated Patients

To further test the robustness of the classifier in predicting patient outcome, its performance in other relevant therapeutic treatment groups was analyzed. Recently, it has been observed that p53 mt breast tumors show greater resistance to endocrine therapy than p53 wt tumors, and this has been explained, in part, by the uncoupling of p53-dependent apoptosis in the resistant tumors (Berns, E. M. et al. Complete sequencing of TP53 predicts poor response to systemic therapy of advanced breast cancer. Cancer Res 60, 2155-62 (2000), incorporated herein by reference). To test the ability of the classifier to predict outcome in a hormone therapy-specific patient cohort, a subpopulation of the Uppsala cohort consisting of 68 ER+ patients who received only adjuvant tamoxifen treatment following surgery was examined. FIG. 6 shows that the p53 classifier has greater prognostic significance than p53 mutation status in endocrine-treated patients. Sixty-eight ER+, endocrine-treated patients were classified according to (A) p53 mutation status or (B) the p53 classifier and analyzed for correlations with disease-specific survival (DSS). Kaplan-Meier survival estimates are shown. As shown in the survival analysis in FIGS. 6A&B, it was observed that the classifier was a significant predictor of disease-specific survival (pw=0.047), while p53 mutation status alone was not (pw=0.395).


Next, the prognostic performance of the classifier on a set of 97 breast tumors published by van't Veer et al (van't Veer, L. J. et al. Gene expression profiling predicts clinical outcome of breast cancer. Nature 415, 530-6 (2002), incorporated herein by reference) was examined. FIG. 7 shows that the p53 classifier is prognostic of distant recurrence in an independent set of early-stage locally-treated breast tumors. 97 tumors from a Dutch cohort (van't Veer, L. J. et al. Gene expression profiling predicts clinical outcome of breast cancer. Nature 415, 530-6 (2002), incorporated herein by reference) of early-stage patients treated with postoperative adjuvant radiotherapy and followed for a period of at least 5 years were hierarchically clustered using a set of probes corresponding to 21 genes of the optimized classifier. The predominant cluster nodes are demarcated by color and “C” designations (i.e., C1-C5). Black arrows correspond to tumors from patients who developed a distant metastasis (DM) within 5 years. Gene symbols and corresponding Genbank accession numbers are shown. Hierarchical clustering was performed as described previously.


Here, all of the samples were controlled for clinical uniformity, i.e., <5 cm in size (T1/T2), with no advanced disease (pN0), from patients less than 55 years of age at diagnosis, treated by surgery and subsequent radiotherapy only (with the exception of 5 patients who received adjuvant systemic therapy). From the 32-gene classifier, 24 probes corresponding to 21 genes could be mapped to all 97 tumors with survival information. Upon clustering the tumors, approximately 4 clusters with similar average distance correlations were observed that significantly distinguished patients who would develop a distant metastasis within 5 years (pfe=2.2×10⁻⁴) (FIG. 7). Notably, of the 26 tumors in cluster 1, which bear the molecular configuration of p53 mt-like tumors, 73% had a distant metastasis within 5 years, compared to 26% of 39 tumors in cluster 3, which most closely resemble the p53 wt-like molecular configuration. These findings suggest that the p53 classifier is prognostic of tumor recurrence in early stage, locally-treated breast cancer.


Example 6
Analysis of Classifier Gene Functions

To gain some mechanistic insights, the functional annotations of the classifier genes were analyzed for clues to explain the correlation between their expression levels and p53 status and patient outcome. Surprisingly, it was found that none of the classifier genes are known transcriptional targets of p53, nor have they been previously implicated in the p53 pathway. Promoter analysis of the 21 genes with defined promoter regions revealed no evidence of the canonical p53 binding site, or recently described novel p53 binding sites, within any of the promoters.


Twelve of the genes are of unknown function. However, of the characterized genes, a number are associated with cell growth and proliferation (MYBL2, TFF1, BRRN1, CHAD, SCGB3A1, DACH, CDCA8), transcription (LAF4, NY-BR-1, DACH, MYBL2), ion transport (CACNG4, CYBRD1, LRP2), and breast cancer biology (SCGB3A1, TFF1, STC2, NY-BR-1, AGR2). Speculatively, some of these genes may contribute mechanistically to the poor prognosis of the p53 mutant-like tumors. For example, MYBL2, which was observed to be upregulated in the p53 mutant-like tumors, is a growth-promoting transcription factor closely related to the c-MYB oncogene. It maps to a chromosomal region frequently amplified in breast cancer (20q13) and has previously been reported to be overexpressed in breast cancer cell lines and sporadic ovarian carcinomas (Forozan, F. et al. Comparative genomic hybridization analysis of 38 breast cancer cell lines: a basis for interpreting complementary DNA microarray data. Cancer Res 60, 4519-25 (2000) and Tanner, M. M. et al. Frequent amplification of chromosomal region 20q12-q13 in ovarian cancer. Clin Cancer Res 6, 1833-9 (2000), both of which are incorporated herein by reference). SCGB3A1 (HIN1), which was observed to be downregulated in the p53 mutant-like tumors, is a putative tumor suppressor gene that can inhibit breast cancer cell growth when overexpressed and has been found to be transcriptionally silenced by hypermethylation of its promoter in early stages of breast tumorigenesis (Krop, I. E. et al. HIN-1, a putative cytokine highly expressed in normal but not cancerous mammary epithelial cells. Proc Natl Acad Sci USA 98, 9796-801 (2001), incorporated herein by reference).


Example 7
Nature of Misclassified Tumors

It was observed that a number of cancers with wild type p53 sequence status were classified as p53 mutant by expression profiling using the 32-gene classifier. If the “misclassified” p53 wt tumors were in fact p53 deficient, they would possess certain molecular characteristics reflective of perturbations of the p53 pathway, and these characteristics would be found in the majority of p53 mutant tumors. First, the possibility that p53 deficiency could result from reduced transcript levels, either by transcriptional repression of the p53 gene (TP53) or by the shortening of its mRNA half-life, was considered. The t test was used to compare the relative expression levels of TP53 (using the TP53 probe-sets present on the microarray) among the different tumor classes (FIG. 8). Indeed, consistent with this hypothesis, it was observed that the overall expression level of TP53 was significantly reduced in the 28 wt tumors classified as mt-like compared to the remaining 170 wt tumors classified as wt-like (pt=1.4×10⁻⁴). No statistically significant difference in expression levels was observed between the p53 mt tumors correctly classified as mt-like and all wt tumors, consistent with the fact that TP53 mRNA levels are not commonly reduced in p53 mutant breast cancers.



FIG. 8 shows that transcript levels of p53, its transcriptional targets, and its upstream effectors distinguish known and predicted classes. Expression levels of p53 pathway-relevant genes were examined. The statistical significance of transcript levels between the different tumor classes was determined by t test and is shown in a summary table to the right of the figure. The 4 tumor classes are as follows: 1) 47 p53 mt tumors classified as mutant, 2) 28 p53 wt tumors classified as mutant, 3) 170 p53 wt tumors classified as wildtype, and 4) 12 p53 mt tumors classified as wildtype. Statistical measurements in the summary shown in grey did not reach significance at p<0.05.


Table 3 shows a comparative analysis of p53 mutations. (I) Severe mutations were defined as insertions, deletions, or stop codons. Of the remaining missense point mutations (mpms; 11 in the wt-like group, 27 in the mt-like group), the frequencies of occurrence were determined for (II) the most common missense point mutations in p53 as defined by the IARC TP53 Mutation Database (available online on the website of the International Agency for Research on Cancer, IARC), and (III) mutants previously shown, in vitro, to possess dominant negative activity. P-values were calculated using Fisher's Exact test.


This strategy was applied to known transcriptional targets of p53, which were hypothesized to show altered transcription in p53-deficient tumors to some extent. Indeed, a number of p53 target genes demonstrated altered patterns of expression (FIG. 8). The TP53-inducible genes TP53INP1, SEMA3B, PMAIP1 (NOXA), FDXR, CCNG1, and LRDD, all of which contain functional p53-binding sites in their promoters, showed significantly lower expression in the 28 wt tumors classified as mt-like compared to the other wildtypes (all at pt<0.05). Moreover, all but one of these genes were also significantly reduced in the p53 mt tumors classified as mt-like (compared to all wt tumors); and in all but two cases, these genes showed significantly higher expression in the 12 mt tumors classified as wt-like when compared to the other mutants.


CHEK1 and CHEK2, both positive upstream effectors of p53 that phosphorylate p53 and thereby promote its stabilization, are known to be transcriptionally repressed by p53. A significant increase in the mRNA levels of these genes in both the p53 wt and mt tumors of the mutant-like class was observed. It was also observed that the 12 mt tumors misclassified as wildtype-like displayed significantly lower expression of these genes compared to the other 47 p53 mutants. Notably, no differential expression of the p53-regulated genes CDKN1A (p21), GADD45, PPM1D (WIP1), TP53I3 (PIG3), TNFRSF6, BBC3 (PUMA), APAF1 or BCL2 was observed in these breast tumor specimens.


Taken together, these data suggest that the classifier can distinguish tumors based on some aspects of p53 transcriptional activity that are inhibited in both the p53 mutant and wildtype tumors of the mutant-like class, yet operative in the p53 wildtype tumors (and to some extent the 12 p53 mutant tumors) of the wildtype-like class.


Perhaps paradoxically, it was observed that the p53-inducible genes PERP, BAX and SFN (14-3-3 sigma) were all expressed at significantly higher levels in the 28 misclassified wt tumors, rather than at lower levels like their inducible gene counterparts described above. However, the significant overexpression of these genes in the p53 mt tumors classified as mutant-like was also observed, suggesting that in breast cancer, these genes may be induced by alternate regulatory mechanisms in the context of mutant or deficient p53.


Intriguingly, another positive upstream effector of p53, ATR, which is thought to enhance p53 activity in a manner similar to that of CHEK1 and CHEK2, was also found expressed at significantly higher levels in the p53 mutants and p53 wt tumors of the mutant-like class, even though this gene is not known to be modulated in a p53-dependent manner. Of note, no significant differences in the expression levels of the upstream effectors, ATM or PRKDC (DNA-PK) were observed.


The expression levels of other upstream modulators of p53 activity were then examined in order to ascertain possible alternate mechanisms by which p53 expression and activity might be reduced in the mutant-like p53 wt tumors. First, it was observed that several known positive regulators of p53 transactivation were significantly reduced in both the wildtypes and mutants of the mutant-like class including HOXA5, USF1, EGR1 and TP53BP1. HOXA5, USF1, and EGR1 are all transcription factors known to bind the p53 promoter and enhance its expression. Interestingly, deficiencies in all three have previously been implicated in breast carcinogenesis. Recently the coordinate loss of both p53 and HOXA5 mRNA and protein expression was observed in a panel of human breast cancer cell lines, and the HOXA5 promoter was found to be methylated in 16 of 20 p53-negative human breast tumors. USF1, which is structurally related to the c-Myc oncoprotein, has been found to have reduced transcriptional activity in breast cancer cell lines, and has recently been shown to activate the expression of estrogen receptor alpha. EGR1, a DNA damage-responsive gene with antiproliferative and apoptotic functions, can inhibit tumorigenicity when exogenously expressed in human breast cancer cells, and has been observed to have reduced expression in human and mouse breast cancer cell lines and tumors. TP53BP1 is not thought to be a transcription factor, but rather a BRCT domain-containing substrate of ATM that is phosphorylated in response to DNA damage. This gene product is known to bind the central DNA-binding domain of p53 and thus enhance the transcriptional activation of p53 target genes. A significantly reduced expression of all four genes in the 28 p53 wt tumors classified as mutant-like was found, and in the cases of USF1 and TP53BP1, significantly higher expression in the p53 mutants classified as wildtype-like. 
Interestingly, their expression levels were also significantly lower in the 47 p53 mt tumors classified as mutant-like, suggesting a possible positive feedback loop whereby wildtype p53 can enhance expression of these genes and impaired p53 cannot. Together, these observations suggest that, acting either separately or in combination, these genes may be important for intact p53 activity in the breast and, when transcriptionally silenced, may contribute to p53 deficiency.


Finally, the expression of several known negative regulators of p53 activity was examined. Notably, MDM2, which negatively regulates p53 through phosphorylation-mediated degradation of the p53 protein, and whose overexpression at the protein level has been implicated in a variety of cancers, was not found to be differentially expressed at the transcript level in the experiments described herein. However, both PLK1 and GTSE1 were. The M-phase regulator PLK1 has recently been shown to bind to the DNA-binding domain of p53 and thus inhibit its transcriptional activity in vitro. GTSE1 (B99) binds the C-terminal regulatory domain of p53, causing the inhibition of p53 transactivation function as well as a reduction of intracellular levels of p53 protein. Intriguingly, the transcripts of both genes were among the most significantly overexpressed in both p53 wt and mt tumors of the mt-like class, suggesting a possible role for these gene products in suppression of p53 function in breast carcinogenesis.


The spectrum of p53 mutations was next analyzed for correlations that might explain the misclassification of the 12 p53-mutant tumors as wildtype-like. First, it was observed that only one mutation was common to the wildtype-like and the mutant-like tumors: a Tyr>Cys substitution at amino acid 220 in the DNA-binding domain. Of the 47 p53 mt tumors correctly classified as mutants, it was observed that 42% (20/47) possessed “severe” mutations, defined as insertions (n=2), deletions (n=11) and stop codons (n=7) (Table 3-I), resulting in frameshifts and subsequent truncation, whereas in the 12 mutants classified as wildtype-like, only 1 (8%) contained a severe mutation: a 3-bp insertion in the DNA-binding domain resulting in the in-frame addition of a glycine residue (Fisher's exact test p-value, pfe=0.025). Using the IARC TP53 Mutation Database (available online on the website of the International Agency for Research on Cancer, IARC), which, as of June 2003, has indexed 18,585 somatic and 225 germline mutations of p53, the frequencies of occurrence of the most common p53 mutations in human cancer (representing ˜20% of all p53 mutations; Table 3-II) in the 12 wt-like mutants and the 47 mt-like mutants were compared. None of the common mutations were found to overlap with the subset of 11 missense point mutations (mpms) in the wt-like group, compared to 9 of 27 in the mt-like group (pfe=0.029). The mpms in each tumor group were then cross-compared with the IARC TP53 Mutation Database's comprehensive listing of 418 mutants previously analyzed for dominant negative function in at least one of 44 previously published studies. As Table 3-III shows, it was found that only one of the 11 mpms among the 12 wt-like mutants had been demonstrated previously to have dominant negative activity, compared to 12 of 27 within the mt-like group (pfe=0.039).
Together, these data suggest that at the sequence level, the 12 p53 mutants classified as wildtype-like may in fact comprise mostly “benign” p53 mutant forms compared to the 47 classified as mutant-like, in agreement with their molecular consistencies with the majority of p53 wt tumors in our expression analyses.












TABLE 3

| mutation type | 12 wt-like tumors | 47 mt-like tumors | p-value |
|---|---|---|---|
| I. severe mutations: | 1 | 20 | 0.025 |
| deletions | 0 | 11 | |
| stop codons | 0 | 7 | |
| insertions | 1 | 2 | |
| | (11 tumors with mpms) | (27 tumors with mpms) | |
| II. Common missense pt. mutations: | 0 | 9 | 0.029 |
| 175 (Arg->His) | 0 | 2 | |
| 248 (Arg->Gln) | 0 | 3 | |
| 248 (Arg->Trp) | 0 | 2 | |
| 273 (Arg->His) | 0 | 0 | |
| 273 (Arg->Cys) | 0 | 2 | |
| 282 (Arg->Trp) | 0 | 0 | |
| III. pt. mutations with known dominant negative function: | 1 | 12 | 0.039 |










Comparative analysis of p53 mutations. (I) Severe mutations were defined as insertions, deletions, or stop codons. Of the remaining missense point mutations (mpms; 11 in the wt-like group, 27 in the mt-like group) we determined the frequency of occurrence of (II) the most common missense point mutations in p53 as defined by the IARC TP53 Mutation Database (http://www.iarc.fr/p53/index.html), and (III) mutants previously shown, in vitro, to possess dominant negative activity. P-values were calculated using Fisher's Exact test.
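The Fisher's Exact test p-values in Table 3 can be checked with a short calculation. The sketch below is illustrative only and is not part of the original analysis: the function name is hypothetical, and it assumes a one-sided (lower-tail) test, since the text does not state whether the test was one- or two-sided. Each 2x2 table is built from the counts in Table 3 (tumors with and without the feature, in the wt-like and mt-like groups).

```python
from math import comb

def fisher_exact_lower_tail(a, b, c, d):
    """One-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Returns P(X <= a), where X is hypergeometric: the count in the top-left
    cell when the row and column totals are held fixed.
    """
    row1 = a + b                 # size of the first group (wt-like tumors)
    col1 = a + c                 # total tumors carrying the feature
    total = a + b + c + d
    lowest = max(0, row1 - (total - col1))  # smallest feasible top-left count
    favourable = sum(
        comb(col1, k) * comb(total - col1, row1 - k)
        for k in range(lowest, a + 1)
    )
    return favourable / comb(total, row1)

# Counts from Table 3, ordered as:
# (wt-like with feature, wt-like without, mt-like with feature, mt-like without).
# The one-sided alternative is an assumption, not stated in the source.
table3 = {
    "I. severe mutations": (1, 11, 20, 27),
    "II. common missense point mutations": (0, 11, 9, 18),
    "III. dominant negative point mutations": (1, 10, 12, 15),
}
p_values = {name: fisher_exact_lower_tail(*cells) for name, cells in table3.items()}
for name, p in p_values.items():
    print(f"{name}: p = {p:.3f}")
```

Under the one-sided assumption, the three values round to 0.025, 0.029 and 0.039, in agreement with the p-value column of Table 3.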


The practice of the present invention may employ conventional biology methods, software and systems known to the skilled artisan. The foregoing examples have described methods for predicting disease outcome in a patient. In another aspect, there is also provided a computer system for predicting disease outcome in a patient. The computer system may comprise a computer having a processor and a memory, the memory having executable code stored thereon for execution by the processor for performing the steps of obtaining gene expression profiles from a plurality of genes from tumor samples, wherein said tumor samples may be mutant or wildtype for the p53 gene; comparing said gene expression profiles to determine which genes are differentially expressed in the mutant or wildtype tumors; deriving from said differentially expressed genes a set of genes to predict p53 mutational status; and using the set of genes to predict disease outcome in the patient.
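By way of illustration only, the comparing, deriving and predicting steps recited above might be sketched as follows. This is a minimal sketch under stated assumptions, not the disclosed implementation: the function names and the toy expression values are hypothetical, genes are ranked by a simple univariate standardized mean difference rather than the multivariate Linear Model-Fit procedure, and classification uses a diagonal linear discriminant rule with equal class priors.

```python
from statistics import mean

def fit_dlda(X, y):
    """Return per-class centroids and per-gene variances pooled across classes."""
    classes = sorted(set(y))
    n_genes = len(X[0])
    centroids = {
        c: [mean(row[g] for row, lab in zip(X, y) if lab == c) for g in range(n_genes)]
        for c in classes
    }
    dof = len(X) - len(classes)  # degrees of freedom for the pooled variance
    variances = [
        sum((row[g] - centroids[lab][g]) ** 2 for row, lab in zip(X, y)) / dof
        for g in range(n_genes)
    ]
    return centroids, variances

def rank_genes(centroids, variances, eps=1e-9):
    """Rank gene indices by standardized mutant-vs-wildtype mean difference.

    Simplified univariate stand-in for a multivariate ranking procedure.
    """
    def score(g):
        return abs(centroids["mt"][g] - centroids["wt"][g]) / (variances[g] ** 0.5 + eps)
    return sorted(range(len(variances)), key=score, reverse=True)

def predict(x, centroids, variances, genes, eps=1e-9):
    """Assign x to the class whose centroid is nearest in variance-scaled distance."""
    def distance(c):
        return sum((x[g] - centroids[c][g]) ** 2 / (variances[g] + eps) for g in genes)
    return min(centroids, key=distance)

# Hypothetical toy data: rows are tumors, columns are three made-up genes.
X = [
    [5.0, 1.0, 3.0],
    [5.5, 1.2, 2.0],
    [4.8, 0.9, 4.0],
    [2.0, 1.1, 3.5],
    [2.2, 1.0, 2.5],
    [1.9, 0.8, 3.0],
]
y = ["mt", "mt", "mt", "wt", "wt", "wt"]  # p53 status by sequencing

centroids, variances = fit_dlda(X, y)
signature = rank_genes(centroids, variances)[:1]  # keep only the top-ranked gene
print(signature, predict([4.9, 1.0, 3.0], centroids, variances, signature))
```

In practice the derived gene set would be validated on independent datasets before being related to outcome, for example by comparing survival between the predicted mutant-like and wildtype-like groups.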


A suitable computer system may be a general purpose computer such as a PC or a Macintosh, for example. Computer software products of the invention typically include a computer readable medium having computer-executable instructions for performing the logic steps of the method of the invention. Suitable computer readable media include floppy disk, CD-ROM/DVD/DVD-ROM, hard-disk drive, flash memory, ROM/RAM, magnetic tapes, etc. The computer executable instructions may be written in a suitable computer language or a combination of several languages. Basic computational biology methods are described in, e.g., Setubal and Meidanis, Introduction to Computational Molecular Biology (PWS Publishing Company, Boston, 1997); Salzberg, Searls, Kasif (Eds.), Computational Methods in Molecular Biology (Elsevier, Amsterdam, 1998); Rashidi and Buehler, Bioinformatics Basics: Applications in Biological Science and Medicine (CRC Press, London, 2000); and Ouellette and Baxevanis, Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins (Wiley & Sons, Inc., 2nd Ed., 2001).


Additionally, the present invention may have preferred embodiments that include methods for providing genetic information over networks such as the Internet.


Additionally, some embodiments of the present invention may provide a plurality of pharmaceutical targets for designing chemotherapeutic drugs for a variety of cancers. For example, the 32 genes most correlated with p53 mutational status could serve as potential molecular targets for chemotherapy. Chemotherapy drugs (cytotoxics) and antihormonal treatments are commonly used to treat cancers. In several patients, however, treatment regimens involving cytotoxics and antihormonals have been known to cause mild to severe side effects. In breast cancer, for example, these side effects include vomiting, nausea, alopecia and fatigue. The future of effective treatment for cancer thus resides with drugs that are more specific for their targets. According to some studies, about 68% of breast cancer drugs in the clinical developmental pipeline are of the targeted class. Therefore, molecular signatures such as those embodied in certain aspects of the present invention will provide important leads or will prove to be targets in their own right for targeted chemotherapeutic drugs.


In conclusion, the disclosed embodiments of the present invention define a gene expression signature that can predict p53 status and survival in human breast tumors (the p53 signature or classifier). In independent datasets of both breast and liver cancers, and regardless of other clinical features, subsets of the p53 signature can predict p53 status with significant accuracy. As a predictor of disease-specific survival (DSS), the signature significantly outperformed p53 mutation status alone in a large patient cohort with heterogeneous treatment. The p53 signature could significantly distinguish patients having more or less benefit from systemic adjuvant therapies and loco-regional radiotherapy. Though the p53 pathway may be compromised at some level in most human cancers, analysis of transcripts involved in the p53 pathway suggests that the p53 expression signature defines an operational configuration of this pathway in breast tumors (more so than p53 mutation status alone) that impacts patient survival and therapeutic response. In cancer, it is clear that not all p53 mutations have equal effects: some simply confer loss of function, while others have a dominant negative effect (such as trans-dominant suppression of wildtype p53 or oncogenic gain of function), while still others show only a partial loss of function where, for example, only a small subset of p53 downstream transcriptional target genes are dysregulated. For these reasons, no single molecular assessment of p53 status appears to provide an absolute indication of the complete p53 function. The embodiments disclosed herein suggest that by looking at the downstream indicators of p53 function, the functional status of p53 may be ascertained more precisely than by sequencing or biochemical means.


It is to be understood that the above description is intended to be illustrative and not restrictive. Many variations of the invention will be apparent to those of skill in the art upon reviewing the above description. The scope of the invention should be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled. All cited references, including patent and non-patent literature, are incorporated herewith by reference in their entireties for all purposes.

Claims
  • 1. A method for predicting disease outcome, the method comprising the steps of: obtaining gene expression profiles from a plurality of genes from tumor samples, wherein said tumor samples may be mutant or wildtype for the p53 gene; comparing said gene expression profiles to determine which genes are differentially expressed in the tumor samples that may be mutant or wild type for the p53 gene; deriving from said differentially expressed genes a set of sequences to predict p53 mutational status; and assessing the ability of the set of sequences based on microarray analysis and Kaplan-Meier analysis to predict disease outcome, wherein the sequences consist of SEQ ID NO: 22, SEQ ID NO: 31, SEQ ID NO: 11, SEQ ID NO: 9, SEQ ID NO: 1, SEQ ID NO: 29, SEQ ID NO: 28, SEQ ID NO: 5 and SEQ ID NO: 25 and wherein the disease is late-stage breast cancer.
  • 2. The method of claim 1 wherein disease outcome is selected from the group consisting of disease-specific survival, disease-free survival, tumor recurrence and therapeutic response.
  • 3. The method of claim 1 wherein predicted p53 mutational status is obtained by ranking the differentially expressed genes according to their association with p53 mutational status, ER status and histologic grade of the tumor.
  • 4. The method of claim 3 wherein the genes are ranked according to a multivariate ranking procedure.
  • 5. The method of claim 4 wherein the multivariate ranking procedure is Linear Model-Fit.
  • 6. The method of claim 3 wherein predicted p53 mutational status is obtained by employing a supervised learning method.
  • 7. The method of claim 6 wherein the supervised learning method is Diagonal Linear Discriminant Analysis.
  • 8. The method of claim 2 wherein the disease outcome is disease-specific survival.
  • 9. A method of identifying a group of sequences for predicting disease outcome in a patient, the method comprising the steps of: obtaining gene expression profiles from a plurality of genes from tumor samples, wherein said tumor samples may be mutant or wildtype for the p53 gene; comparing said gene expression profiles to determine which genes are differentially expressed in the tumor samples that may be mutant or wild type for the p53 gene; ranking the differentially expressed genes according to their ability to predict p53 mutational status; employing a supervised learning method to distinguish between mutant and wildtype p53 gene expression profiles; obtaining a p53 classifier including a set of sequences capable of predicting p53 mutational status; validating the p53 classifier in independent datasets; and assessing the ability of the p53 classifier based on microarray analysis and Kaplan-Meier analysis to predict disease outcome in the patient, wherein the p53 classifier includes sequences consisting of SEQ ID NO: 22, SEQ ID NO: 31, SEQ ID NO: 11, SEQ ID NO: 9, SEQ ID NO: 1, SEQ ID NO: 29, SEQ ID NO: 28, SEQ ID NO: 5, and SEQ ID NO: 25 and wherein the disease is late-stage breast cancer.
  • 10. The method of claim 9 wherein the differentially expressed genes are ranked by a multivariate ranking procedure according to their association with p53 status, ER status and histologic grade of the tumor.
  • 11. The method of claim 10 wherein the multivariate ranking procedure is a Linear Model-Fit.
  • 12. The method of claim 9 wherein the supervised learning method is a Diagonal Linear Discriminant Analysis.
  • 13. The method of claim 9 wherein disease outcome is selected from the group consisting of disease-specific survival, disease-free survival, tumor recurrence and therapeutic response.
  • 14. The method of claim 13 wherein the disease outcome is disease-specific survival.
  • 15. A computer system for predicting disease outcome in a patient, the computer system comprising: a computer having a processor and a memory, the memory having executable code stored thereon for execution by the processor for performing the steps of: obtaining gene expression profiles from a plurality of genes from tumor samples, wherein said tumor samples may be mutant or wildtype for the p53 gene; comparing said gene expression profiles to determine which genes are differentially expressed in the tumor samples that may be mutant or wild type for the p53 gene; deriving from said differentially expressed genes a set of sequences to predict p53 mutational status; and assessing the ability of the set of sequences based on microarray analysis and Kaplan-Meier analysis to predict disease outcome in the patient, wherein the set includes sequences consisting of SEQ ID NO: 22, SEQ ID NO: 31, SEQ ID NO: 11, SEQ ID NO: 9, SEQ ID NO: 1, SEQ ID NO: 29, SEQ ID NO: 28, SEQ ID NO: 5, and SEQ ID NO: 25 and wherein the disease is late-stage breast cancer.
  • 16. The method of claim 15 wherein the disease outcome is disease-specific survival.
  • 17. A method for predicting disease outcome for a late-stage breast cancer patient, the method comprising the steps of obtaining tumor tissue from the late-stage breast cancer patient; extracting RNA from the tumor tissue; determining by an empirical method if the RNA from the tumor tissue expresses a set of nucleotide sequences consisting of SEQ ID NO: 22, SEQ ID NO: 31, SEQ ID NO: 11, SEQ ID NO: 9, SEQ ID NO: 1, SEQ ID NO: 29, SEQ ID NO: 28, SEQ ID NO: 5 and SEQ ID NO: 25; and predicting the disease outcome for the late-stage breast cancer patient based on the determination.
  • 18. The method of claim 17, wherein the disease outcome is disease-specific survival.
  • 19. The method of claim 17, wherein the set of sequences is immobilized on a solid support.
  • 20. The method of claim 19, wherein the solid support is a microarray.
  • 21. The method of claim 17, wherein the disease outcome is disease-free survival.
  • 22. The method of claim 17, wherein the tumor tissue is frozen prior to RNA extraction from the tumor tissue.
  • 23. The method of claim 17, wherein the determination step is performed using cRNA.
  • 24. The method of claim 17, wherein the empirical method is hybridization of the RNA from the tumor tissue to the set of nucleotide sequences.
US Referenced Citations (6)
Number Name Date Kind
6208983 Parra et al. Mar 2001 B1
6306087 Barnhill et al. Oct 2001 B1
6468476 Friend et al. Oct 2002 B1
6714925 Barnhill et al. Mar 2004 B1
6757412 Parsons et al. Jun 2004 B1
20040058340 Dai et al. Mar 2004 A1
Foreign Referenced Citations (1)
Number Date Country
WO 2004065583 Aug 2004 WO
Related Publications (1)
Number Date Country
20060074565 A1 Apr 2006 US