GENE EXPRESSION PROFILING OF PRIMARY BREAST CARCINOMAS USING ARRAYS OF CANDIDATE GENES

TECHNICAL FIELD

This disclosure relates to polynucleotide analysis and, in particular, to polynucleotide expression profiling of carcinomas using arrays of candidate polynucleotides.

BACKGROUND

Pathologists and clinicians in charge of the management of breast cancer patients are facing two major problems, namely the extensive heterogeneity of the disease and the lack of factors—among conventional histological and clinical features—predicting with reliability the evolution of the disease and its sensitivity to cancer therapies. Breast tumors of the same apparent prognostic type vary widely in their responsiveness to therapy and consequent survival of the patient. New prognostic and predictive factors are needed to allow an individualization of therapy for each patient.

Great hope is currently being placed on molecular studies, which address the problem in a global fashion. Methods such as cytogenetics, comparative genomic hybridization, and whole-genome allelotyping have addressed the issue at the genome level. Currently, the modifications that take place in human tumors at the level of transcription can also be studied on a large, unprecedented scale, using new methods such as cDNA arrays that allow quantitative measurement of the mRNA expression levels of many genes simultaneously. Thus, it would be advantageous to provide a means to assess the capacity of cDNA array testing in clinical practice to better classify a heterogeneous cancer into tumor subtypes with more homogeneous clinical outcomes, and to identify new potential prognostic factors and therapeutic targets.

SUMMARY

We provide a method for the molecular characterization of a carcinoma comprising the steps of:

(i) detecting in tumor cells corresponding to breast tumor cells at least one polynucleotide selected from a first group comprising: -EST T89980 (SEQ ID No: 16), -SOX4 (SEQ ID No: 22, SEQ ID No: 23, SEQ ID No: 24), -ENPP2 (SEQ ID No: 39, SEQ ID No: 40, SEQ ID No: 41), -MUC1 (SEQ ID No: 57, SEQ ID No: 58), -GATA3 (SEQ ID No: 76, SEQ ID No: 77, SEQ ID No: 78), -TOP2B (SEQ ID No: 82, SEQ ID No: 83), -IL2RB (SEQ ID No: 97, SEQ ID No: 98, SEQ ID No: 99), -ERBB2 (SEQ ID No: 118, SEQ ID No: 119), -EGFR (SEQ ID No: 135, SEQ ID No: 136, SEQ ID No: 137), -THBS1 (SEQ ID No: 216, SEQ ID No: 217), -PPP2R2C (SEQ ID No: 238, SEQ ID No: 239), -ATF3 (SEQ ID No: 250, SEQ ID No: 251, SEQ ID No: 252), -KIAA1075 (SEQ ID No: 322, SEQ ID No: 323), -CDH1 (SEQ ID No: 326, SEQ ID No: 327, SEQ ID No: 328), -ZNF144 (SEQ ID No: 329, SEQ ID No: 330), -GSTP1 (SEQ ID No: 334, SEQ ID No: 335, SEQ ID No: 336), -CD44 (SEQ ID No: 374, SEQ ID No: 375, SEQ ID No: 376), -GZMA (SEQ ID No: 402, SEQ ID No: 403), -EST T80406 (SEQ ID No: 430), and -ESTs H30141 & H27466 (SEQ ID No: 438, SEQ ID No: 439), and determining the expression level of the at least one polynucleotide from the first group to differentiate a tumor in which a lymph node has been invaded by a tumor cell from a tumor in which a lymph node has not been invaded by a tumor cell;

(ii) detecting in tumor cells corresponding to breast tumor cells at least one polynucleotide selected from a second group comprising: -SOX4 (SEQ ID No: 22, SEQ ID No: 23, SEQ ID No: 24), -CSF1 (SEQ ID No: 48, SEQ ID No: 49, SEQ ID No: 50), -VIL2 (SEQ ID No: 51, SEQ ID No: 52, SEQ ID No: 53), -IGF2 (SEQ ID No: 59, SEQ ID No: 60, SEQ ID No: 61), -KIAA0427 (SEQ ID No: 65, SEQ ID No: 66, SEQ ID No: 67), -MYC (SEQ ID No: 73, SEQ ID No: 74, SEQ ID No: 75), -GATA3 (SEQ ID No: 76, SEQ ID No: 77, SEQ ID No: 78), -TOP2B (SEQ ID No: 82, SEQ ID No: 83), -ERBB2 (SEQ ID No: 118, SEQ ID No: 119), -EGFR (SEQ ID No: 135, SEQ ID No: 136, SEQ ID No: 137), -CRABP2 (SEQ ID No: 156, SEQ ID No: 157, SEQ ID No: 158), -GZMB (SEQ ID No: 178, SEQ ID No: 179), -IGKC (SEQ ID No: 186), -ANG (SEQ ID No: 194, SEQ ID No: 195), -EFNA1 (SEQ ID No: 226, SEQ ID No: 227), -MYBL2 (SEQ ID No: 308, SEQ ID No: 309, SEQ ID No: 310), -CDH1 (SEQ ID No: 326, SEQ ID No: 327, SEQ ID No: 328), -MST1 (SEQ ID No: 331, SEQ ID No: 332, SEQ ID No: 333), -MYB (SEQ ID No: 354, SEQ ID No: 355), -XBP1 (SEQ ID No: 385, SEQ ID No: 386, SEQ ID No: 387), -SRF (SEQ ID No: 391, SEQ ID No: 392, SEQ ID No: 393), -SOX9 (SEQ ID No: 394, SEQ ID No: 395), and -ESTs H21879 & H21880 (SEQ ID No: 433, SEQ ID No: 434), and determining the expression level of the at least one polynucleotide from the second group to distinguish tumors sensitive to anthracycline from tumors insensitive to anthracycline;

(iii) detecting in tumor cells corresponding to breast tumor cells at least one polynucleotide selected from a third group comprising: -CTSB (SEQ ID No: 30, SEQ ID No: 31), -VIL2 (SEQ ID No: 51, SEQ ID No: 52, SEQ ID No: 53), -MUC1 (SEQ ID No: 57, SEQ ID No: 58), -EMR1 (SEQ ID No: 62, SEQ ID No: 63, SEQ ID No: 64), -KIAA0427 (SEQ ID No: 65, SEQ ID No: 66, SEQ ID No: 67), -GATA3 (SEQ ID No: 76, SEQ ID No: 77, SEQ ID No: 78), -PRLR (SEQ ID No: 94, SEQ ID No: 95, SEQ ID No: 96), -GATA3 (SEQ ID No: 100, SEQ ID No: 101, SEQ ID No: 78), -TC21 (SEQ ID No: 106, SEQ ID No: 107, SEQ ID No: 108), -BCL2 (SEQ ID No: 115, SEQ ID No: 116, SEQ ID No: 117), -CRABP2 (SEQ ID No: 156, SEQ ID No: 157, SEQ ID No: 158), -ANG (SEQ ID No: 194, SEQ ID No: 195), -EGF (SEQ ID No: 199, SEQ ID No: 200), -THBS1 (SEQ ID No: 216, SEQ ID No: 217), -EDNRA (SEQ ID No: 228, SEQ ID No: 229), -SMARCA2 (SEQ ID No: 235, SEQ ID No: 236, SEQ ID No: 237), -ABCB1 (SEQ ID No: 257, SEQ ID No: 258), -BIRC4 (SEQ ID No: 273, SEQ ID No: 274), -DAPS (SEQ ID No: 275, SEQ ID No: 276), -GNRH1 (SEQ ID No: 277, SEQ ID No: 278), -EST 897218 (SEQ ID No: 296, SEQ ID No: 297), -BS69 (SEQ ID No: 342, SEQ ID No: 343, SEQ ID No: 344), -MYB (SEQ ID No: 354, SEQ ID No: 355), -CTSB (SEQ ID No: 361, SEQ ID No: 31), -MLANA (SEQ ID No: 362, SEQ ID No: 363, SEQ ID No: 364), -APR-1 (SEQ ID No: 365, SEQ ID No: 366, SEQ ID No: 367), -CDKN3 (SEQ ID No: 377, SEQ ID No: 378, SEQ ID No: 379), -XBP1 (SEQ ID No: 385, SEQ ID No: 386, SEQ ID No: 387), -CDH15 (SEQ ID No: 396, SEQ ID No: 397, SEQ ID No: 398), -EST W73386 (SEQ ID No: 401), -ILF1 (SEQ ID No: 406, SEQ ID No: 407, SEQ ID No: 408), -ARHGDIA (SEQ ID No: 409, SEQ ID No: 410, SEQ ID No: 411), -C4A (SEQ ID No: 412, SEQ ID No: 413), -ESR1 (SEQ ID No: 420, SEQ ID No: 421, SEQ ID No: 422), -PBX1 (SEQ ID No: 423, SEQ ID No: 424, SEQ ID No: 425), -GLI3 (SEQ ID No: 426, SEQ ID No: 427, SEQ ID No: 428), -ESTs 1-124628 & H24592 (SEQ ID No: 435, SEQ ID No: 436), and -EST H28056 (SEQ ID No: 437), and determining the expression level of the at least one polynucleotide from the third group to classify good and poor prognosis primary breast tumors.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows an example of differential gene expression between normal breast tissue (NB) and breast tumor samples.

FIG. 2 is a representation of expression levels of 176 genes in normal breast tissue (NB) and 34 samples of breast carcinoma.

FIG. 3 shows a prognostic classification of breast cancer by gene expression profiling.

FIG. 4 shows the correlation of GATA3 (SEQ ID No: 78) expression with ER phenotype.

DETAILED DESCRIPTION

In the context of this disclosure, a number of terms shall be utilized.

The term “polynucleotide” refers to a polymer of RNA or DNA that is single- or double-stranded, optionally containing synthetic, non-natural or altered nucleotide bases. A polynucleotide in the form of a polymer of DNA may be comprised of one or more segments of cDNA, genomic DNA or synthetic DNA.

The term “subsequence” refers to a sequence of nucleic acids that comprises a part of a longer sequence of nucleic acids.

The term “immobilized on a support” means bound directly or indirectly thereto including attachment by covalent binding, hydrogen bonding, ionic interaction, hydrophobic interaction or otherwise.

Breast cancer is characterized by an important histoclinical heterogeneity that currently hampers the selection of the most appropriate treatment for each case. This problem could be solved by the identification of new parameters that better predict the natural history of the disease and its sensitivity to treatment. An important object of this disclosure relates to a large-scale molecular characterization of breast cancer that could help in prediction, prognosis and cancer treatment.

An important aspect of this disclosure relates to the use of cDNA arrays, which allows quantitative study of mRNA expression levels of 188 candidate genes in 34 consecutive primary breast carcinomas in three areas: comparison of tumor samples, correlations of molecular data with conventional histoclinical prognostic features, and gene correlations. The experiments revealed extensive heterogeneity of breast tumors at the transcriptional level. A hierarchical clustering algorithm identified two molecularly distinct subgroups of tumors characterized by a different clinical outcome after chemotherapy. This outcome could not have been predicted by the commonly used histoclinical parameters. No correlation was found with the age of patients, tumor size, histological type or grade. However, gene expression differed between tumors with and without lymph node metastasis and according to the estrogen receptor status: ERBB2 (SEQ ID No: 119) expression was strongly correlated with the lymph node status (p≤0.0001) and that of GATA3 (SEQ ID No: 78) with the presence of estrogen receptors (p≤0.001). Thus, the experimental results identified new ways to group tumors according to outcome and new potential targets of carcinogenesis. They show that the systematic use of cDNA array testing holds great promise to improve the classification of breast cancer in terms of prognosis and chemosensitivity and to provide new potential therapeutic targets.

DNA arrays consist of large numbers of DNA molecules spotted in a systematic order on a solid support or substrate such as a nylon membrane, glass slide, glass beads, a membrane on a glass support, or a silicon chip. Depending on the size of each DNA spot on the array, DNA arrays can be categorized as microarrays (each DNA spot has a diameter less than 250 microns) and macroarrays (spot diameter is greater than 300 microns). When the solid substrate used is small in size, arrays are also referred to as DNA chips. Depending on the spotting technique used, the number of spots on a glass microarray can range from hundreds to thousands.

DNA microarrays serve a variety of purposes, including gene expression profiling, de novo gene sequencing, gene mutation analysis, gene mapping and genotyping. cDNA microarrays are printed with distinct cDNA clones isolated from cDNA libraries. Therefore, each spot represents an expressed gene, since it is derived from a distinct mRNA.

Typically, a method of monitoring gene expression involves (1) providing a pool of sample polynucleotides comprising RNA transcript(s) of one or more target gene(s) or nucleic acids derived from the RNA transcript(s); (2) reacting, for example by hybridizing, the sample polynucleotides with an array of probes (for example, polynucleotides obtained from a polynucleotide library), including control probes; and (3) detecting the reacted/hybridized polynucleotides. Detection can also involve calculating/quantifying a relative expression (transcription) level.

We provide a polynucleotide library useful in the molecular characterization of a carcinoma, said library comprising a pool of polynucleotide sequences or subsequences thereof wherein said sequences or subsequences are either underexpressed or overexpressed in tumor cells, further wherein said sequences or subsequences correspond substantially to any of the polynucleotide sequences set forth in any of SEQ ID Nos: 1-468 in annex or the complement thereof.

Complementary sequences (“complements”) having a high degree of homology with the above sequences could also be used to carry out our molecular characterization, in particular when those sequences present one or a few point mutations when compared with any one of the sequences represented by SEQ ID Nos: 1-468.

A particular embodiment of this disclosure relates to a polynucleotide library of sequences or subsequences corresponding substantially to any combination of at least one polynucleotide sequence selected among those included in each one of predefined polynucleotide sequence sets 1 to 188 as defined in Table 4.

A polynucleotide sequence library useful for our method can also comprise any sequence located between the 3′ end and the 5′ end of each polynucleotide sequence set as defined in Table 4, allowing the complete detection of the implicated gene.

We also provide a polynucleotide library useful to differentiate a normal cell from a cancer cell wherein the pool of polynucleotide sequences or subsequences correspond substantially to any combination of at least one polynucleotide sequence selected among those included in each one of predefined polynucleotide sequence sets indicated in Table 5.

Preferably the polynucleotide library useful to differentiate a normal cell from a cancer cell corresponds substantially to any combination of at least one polynucleotide sequence selected among those included in each one of predefined polynucleotide sequence sets indicated in Table 5A, and of at least one polynucleotide sequence selected among those included in each one of predefined polynucleotide sequence sets indicated in Table 5B.

The detection of an overexpression of genes identified with sets of polynucleotide sequences defined in Table 5A, together with detection of an underexpression of genes identified with sets of polynucleotide sequences defined in Table 5B allows distinction between normal patients and patients suffering from tumor pathology.

We further provide a polynucleotide library useful to detect a hormone-sensitive tumor cell wherein the pool of polynucleotide sequences or subsequences correspond substantially to any combination of at least one polynucleotide sequence selected among those included in each one of predefined polynucleotide sequence sets defined in Table 6.

Preferably the polynucleotide library useful to detect a hormone-sensitive tumor cell corresponds substantially to any combination of at least one polynucleotide sequence selected among those included in each one of predefined polynucleotide sequence sets defined in Table 6A together with at least one polynucleotide sequence selected among those included in each one of predefined polynucleotide sequence sets defined in Table 6B.

The detection of an overexpression of genes identified with sets of polynucleotide sequences defined in Table 6A, together with detection of an underexpression of genes identified with sets of polynucleotide sequences defined in Table 6B allows distinction between patients having a hormone-sensitive tumor and patients having a hormone-resistant tumor.

We also provide a polynucleotide library useful to differentiate a tumor in which a lymph node has been invaded by a tumor cell from a tumor in which a lymph node has not been invaded by a tumor cell wherein the pool of polynucleotide sequences or subsequences correspond substantially to any combination of at least one polynucleotide sequence selected among those included in each one of predefined polynucleotide sequence sets defined in Table 7.

Preferably, the polynucleotide library useful to differentiate a tumor in which a lymph node has been invaded by a tumor cell from a tumor in which a lymph node has not been invaded by a tumor cell corresponds substantially to any combination of at least one polynucleotide sequence selected among those included in each one of predefined polynucleotide sequence sets defined in Table 7A together with at least one polynucleotide sequence selected among those included in each one of predefined polynucleotide sequence sets defined in Table 7B.

The detection of an overexpression of genes identified with sets of polynucleotide sequences defined in Table 7A, together with detection of an underexpression of genes identified with sets of polynucleotide sequences defined in Table 7B allows distinction between patients having a tumor in which a lymph node has been invaded by a tumor cell and patients having a tumor in which a lymph node has not been invaded by a tumor cell.

We further provide a polynucleotide library useful to differentiate anthracycline-sensitive tumors from anthracycline-insensitive tumors wherein the pool of polynucleotide sequences or subsequences correspond substantially to any combination of at least one polynucleotide sequence selected among those included in each one of predefined polynucleotide sequence sets defined in Table 8.

Preferably, the polynucleotide library useful to differentiate anthracycline-sensitive tumors from anthracycline-insensitive tumors corresponds substantially to any combination of at least one polynucleotide sequence selected among those included in each one of predefined polynucleotide sequence sets defined in Table 8A together with at least one polynucleotide sequence selected among those included in each one of predefined polynucleotide sequence sets defined in Table 8B.

The detection of an overexpression of genes identified with sets of polynucleotide sequences defined in Table 8A, together with detection of an underexpression of genes identified with sets of polynucleotide sequences defined in Table 8B allows distinction between patients having an anthracycline-sensitive tumor and patients having an anthracycline-insensitive tumor.

We provide a polynucleotide library useful to classify good and poor prognosis primary breast tumors wherein the pool of polynucleotide sequences or subsequences correspond substantially to any combination of at least one polynucleotide sequence selected among those included in each one of predefined polynucleotide sequence sets defined in Table 9.

Preferably, the polynucleotide library useful to classify good and poor prognosis primary breast tumors corresponds substantially to any combination of at least one polynucleotide sequence selected among those included in each one of predefined polynucleotide sequence sets defined in Table 9A together with at least one polynucleotide sequence selected among those included in each one of predefined polynucleotide sequence sets defined in Table 9B.

The detection of an overexpression of genes identified with sets of polynucleotide sequences defined in Table 9A, together with detection of an underexpression of genes identified with sets of polynucleotide sequences defined in Table 9B allows classification of patients as having good or poor prognosis primary breast tumors.
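As an illustration of how a pair of tables of this kind (an "A" table of overexpressed sets and a "B" table of underexpressed sets) could be applied, the following Python sketch lists the genes that deviate from a reference level in the expected direction. It is a minimal sketch only: the function name, the gene-to-level dictionaries, the placeholder gene names and the default 2-fold threshold are assumptions made for the example and are not taken from Tables 5 to 9.

```python
def signature_calls(sample, reference, over_set, under_set, fold=2.0):
    """List genes that deviate from the reference in the expected direction.

    sample, reference: dicts mapping gene name -> expression level (> 0).
    over_set: genes expected to be overexpressed (an "A" table).
    under_set: genes expected to be underexpressed (a "B" table).
    fold: hypothetical threshold used to call a difference.
    """
    over_hits = [g for g in over_set
                 if g in sample and g in reference
                 and sample[g] >= fold * reference[g]]
    under_hits = [g for g in under_set
                  if g in sample and g in reference
                  and sample[g] * fold <= reference[g]]
    return over_hits, under_hits


# Hypothetical levels and gene names, for illustration only.
reference = {"GENE_A1": 1.0, "GENE_A2": 2.0, "GENE_B1": 4.0}
sample = {"GENE_A1": 3.0, "GENE_A2": 2.5, "GENE_B1": 1.0}
over, under = signature_calls(sample, reference,
                              over_set=["GENE_A1", "GENE_A2"],
                              under_set=["GENE_B1"])
print(over, under)
```

How many concordant genes are required before a sample is assigned to a class is a design choice that this sketch deliberately leaves open.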

In a preferred embodiment, the tumor cells presenting underexpressed or overexpressed sequences from our polynucleotide library are breast tumor cells.

In a particular embodiment, the polynucleotides of our polynucleotide library are immobilized on a solid support in order to form a polynucleotide array, and said solid support is selected from the group consisting of a nylon membrane, a nitrocellulose membrane, a glass slide, glass beads, a membrane on a glass support and a silicon chip.

Another object of this disclosure concerns a polynucleotide array, useful for prognosis or diagnosis of a tumor, bearing at least one immobilized polynucleotide library set as previously defined.

We also provide a polynucleotide array useful to differentiate a normal cell from a cancer cell bearing any combination of at least one polynucleotide sequence selected among those included in each one of predefined polynucleotide sequence sets indicated in Table 5.

Preferably the polynucleotide array useful to differentiate a normal cell from a cancer cell bears any combination of at least one polynucleotide sequence selected among those included in each one of predefined polynucleotide sequence sets indicated in Table 5A, and of at least one polynucleotide sequence selected among those included in each one of predefined polynucleotide sequence sets indicated in Table 5B.

This disclosure relates also to a polynucleotide array useful to detect a hormone-sensitive tumor cell bearing any combination of at least one polynucleotide sequence selected among those included in each one of predefined polynucleotide sequence sets defined in Table 6.

Preferably the polynucleotide array useful to detect a hormone-sensitive tumor cell bears any combination of at least one polynucleotide sequence selected among those included in each one of predefined polynucleotide sequence sets defined in Table 6A together with at least one polynucleotide sequence selected among those included in each one of predefined polynucleotide sequence sets defined in Table 6B.

We also provide a polynucleotide array useful to differentiate a tumor in which a lymph node has been invaded by a tumor cell from a tumor in which a lymph node has not been invaded by a tumor cell bearing any combination of at least one polynucleotide sequence selected among those included in each one of predefined polynucleotide sequence sets defined in Table 7.

Preferably, the polynucleotide array useful to differentiate a tumor in which a lymph node has been invaded by a tumor cell from a tumor in which a lymph node has not been invaded by a tumor cell bears any combination of at least one polynucleotide sequence selected among those included in each one of predefined polynucleotide sequence sets defined in Table 7A together with at least one polynucleotide sequence selected among those included in each one of predefined polynucleotide sequence sets defined in Table 7B.

We also provide a polynucleotide array useful to differentiate anthracycline-sensitive tumors from anthracycline-insensitive tumors bearing any combination of at least one polynucleotide sequence selected among those included in each one of predefined polynucleotide sequence sets defined in Table 8.

Preferably, the polynucleotide array useful to differentiate anthracycline-sensitive tumors from anthracycline-insensitive tumors bears any combination of at least one polynucleotide sequence selected among those included in each one of predefined polynucleotide sequence sets defined in Table 8A together with at least one polynucleotide sequence selected among those included in each one of predefined polynucleotide sequence sets defined in Table 8B.

This disclosure also concerns a polynucleotide array useful to classify good and poor prognosis primary breast tumors bearing any combination of at least one polynucleotide sequence selected among those included in each one of predefined polynucleotide sequence sets defined in Table 9.

Preferably, the polynucleotide array useful to classify good and poor prognosis primary breast tumors bears any combination of at least one polynucleotide sequence selected among those included in each one of predefined polynucleotide sequence sets defined in Table 9A together with at least one polynucleotide sequence selected among those included in each one of predefined polynucleotide sequence sets defined in Table 9B.

We also provide a method for detecting differentially expressed polynucleotide sequences that are correlated with a cancer, said method comprising:

(a) obtaining a polynucleotide sample from a patient;

(b) reacting the polynucleotide sample obtained in step (a) with a probe immobilized on a solid support wherein said probe comprises any of the polynucleotide sequences of the libraries previously defined or an expression product encoded by any of the polynucleotide sequences of the libraries previously defined; and

(c) detecting the reaction product of step (b).

Preferably, the polynucleotide sample obtained at step (a) is labeled before its reaction at step (b) with the probe immobilized on a solid support.

The label of the polynucleotide sample is selected from the group consisting of radioactive, colorimetric, enzymatic, molecular amplification, bioluminescent or fluorescent labels.

In a particular embodiment the reaction product of step (c) is quantified by further comparison of said reaction product to a control sample.

In a first embodiment, the polynucleotide sample isolated from the patient and obtained at step (a) is either RNA or mRNA.

In another embodiment, the polynucleotide sample isolated from the patient is cDNA obtained by reverse transcription of the mRNA.

Preferably the reaction step (b) of the method for detecting differentially expressed polynucleotide sequences comprises a hybridization of the sample RNA obtained from the patient with the probe.

Preferably the sample RNA is labeled before hybridization with the probe and the label is selected from the group consisting of radioactive, colorimetric, enzymatic, molecular amplification, bioluminescent or fluorescent labels.

This method for detecting differentially expressed polynucleotide sequences is particularly useful for detecting, diagnosing, staging, monitoring, predicting, preventing or treating conditions associated with cancer, and particularly breast cancer.

The method for detecting differentially expressed polynucleotide sequences is also particularly useful when the product encoded by any of the polynucleotide sequence or subsequence set is involved in a receptor-ligand reaction on which detection is based.

This disclosure is also related to a method for screening an anti-tumor agent comprising the above-depicted method for detecting differentially expressed polynucleotide sequences wherein the sample has been treated with the anti-tumor agent to be screened.

In a particular embodiment the method for screening an anti-tumor agent comprises detecting polynucleotide sequences reacting with at least one library of polynucleotides or polynucleotide sequence set as previously defined or of products encoded by said library in a sample obtained from a patient.

Tumor Samples and RNA Extraction

To avoid any selection bias as to the type and size of the tumors, the RNAs to be tested were prepared from unselected samples. Samples of primary invasive breast carcinomas were collected from 34 patients undergoing surgery at the Institut Paoli-Calmettes. After surgical resection, the tumors were macrodissected: a section was taken for the pathologist's diagnosis and an adjacent piece was quickly frozen in liquid nitrogen for molecular analyses. The median age of patients at the time of diagnosis was 55 years (range 39-83) and most of them were post-menopausal. Tumors were classified according to the WHO histological typing of breast tumors as: 29 ductal carcinomas, 2 lobular carcinomas, 1 mixed ductal and lobular carcinoma, and 2 medullary carcinomas. They varied in size (less than or equal to 20 mm, n=13; between 20 and 50 mm, n=18; greater than 50 mm, n=3), axillary lymph node status (negative: 19 tumors, positive: 15 tumors), SBR grading (I: 3 tumors, II: 20 tumors, III: 10 tumors, not evaluable: 1 tumor), and estrogen receptor (ER) status evaluated by immunohistochemical assay (23 ER-positive, 11 ER-negative). The ER positivity cutoff value was 10%. Adjuvant treatment with radiotherapy and, when necessary, multi-agent anthracycline-based chemotherapy (n=16) was given to patients according to local practice.

Total RNA was extracted from tumor samples by standard methods (43). Total RNA from normal breast tissue was obtained from Clontech (Palo Alto, Calif.): RNA was isolated from 8 tissue specimens from Caucasian females, age range 23-47. RNA integrity was controlled by denaturing formaldehyde agarose gel electrophoresis and Northern blots using a 28S-specific oligonucleotide.

cDNA Arrays Preparation

Gene expression was analyzed by hybridization of arrays with radioactive probes. The arrays contained PCR products of 5 control clones, and 180 IMAGE human cDNA clones selected with practical criteria (3′ sequence of mRNA, same cloning vector, host bacteria and insert size). This represented 176 genes (4 genes were represented by 2 different clones): 121 with proven or putative implication in cancer and 55 implicated in immune reactions. Their identity was verified by 5′ tag-sequencing of plasmid DNA and comparison with sequences in the EST (dbEST) and nucleotide (GenBank) databases at the NCBI. Identity was confirmed for all but 14 clones without significant gene similarity, which were referenced by their GenBank accession number. The control clones were: Arabidopsis thaliana cytochrome c554 gene (used for hybridization signal normalization), 3 poly(A) sequences of different sizes and the vector pT7T3D (negative controls).

PCR amplification, purification, and robotic spotting of PCR products onto Hybond-N+ membranes (Amersham) were done according to described protocols (4). All PCR products were spotted in duplicate. For normalization purposes, the c554 gene was spotted 96 times, scattered over the whole membrane.

cDNA Array Hybridizations

Hybridizations were done successively with a vector oligonucleotide (to precisely determine the amount of target DNA accessible to hybridization in each spot), then, after stripping of the vector probe, with complex probes made from the RNAs (4). Each complex probe was hybridized to a distinct filter. Probes were prepared from total RNA with an excess of oligo(dT25) to saturate the poly(A) tails of the messengers, and to ensure that the reverse transcribed product did not contain long poly(T) sequences. A precise amount of c554 mRNA was added to the total RNA before labeling to allow normalization of the data.

Five µg of total RNA (~100 ng of mRNA) from tissue samples were used for each labeling. Probe preparation and hybridization of the membranes were done according to known procedures (http://tagc.univ-mrs.fr/pub/Cancer/). Hybridization was done in excess of target (~15 ng of DNA in each spot) and binding of cDNAs to the targets was linear and proportional to the quantity of cDNA in the probe.

Detection and Quantification of cDNA Array Hybridization Signals

Quantitative data were obtained using an imaging plate device. Hybridization signal detection with a FUJI BAS 1500 machine and quantification with the HDG Analyzer software (Genomic Solutions, Ann Arbor, Mich.) were done as previously described. Quantification was done by integrating all spot pixel intensities and subtracting a spot background value determined in the neighboring area. Spots were located with a Laplacian transformation. Spot background level was the median intensity of all the pixels present in a small window centered on the spot and which were not part of any spot (44). Quantified data were normalized in three steps and expressed as absolute gene expression levels (i.e. in percentage of abundance of individual mRNA with respect to mRNA within the sample), as described (4).
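The quantification rule described above (integrate the spot pixels, then subtract a background equal to the median of the neighboring pixels that belong to no spot) can be sketched in Python as follows. This is not the HDG Analyzer implementation: the function name, the data layout and the choice to subtract the background once per spot pixel are assumptions made for the illustration, and spot location by Laplacian transformation and the three normalization steps are not reproduced.

```python
from statistics import median


def quantify_spot(image, spot_pixels, window, all_spot_pixels):
    """Return the background-subtracted integrated intensity of one spot.

    image: 2D list of pixel intensities, indexed image[row][col].
    spot_pixels: set of (row, col) pixels belonging to this spot.
    window: (row_min, row_max, col_min, col_max), inclusive bounds of the
        small window centered on the spot.
    all_spot_pixels: set of (row, col) pixels belonging to any spot.
    """
    r0, r1, c0, c1 = window
    background_values = [
        image[r][c]
        for r in range(r0, r1 + 1)
        for c in range(c0, c1 + 1)
        if (r, c) not in all_spot_pixels
    ]
    # Background level: median of the window pixels that are in no spot.
    background = median(background_values)
    # Integrate the spot, then remove the background once per spot pixel
    # (an assumption about how the subtraction is applied).
    integrated = sum(image[r][c] for r, c in spot_pixels)
    return integrated - background * len(spot_pixels)


# Toy 4 x 4 image: a 2 x 2 spot of intensity 10 on a background of 2.
image = [
    [2, 2, 2, 2],
    [2, 10, 10, 2],
    [2, 10, 10, 2],
    [2, 2, 2, 2],
]
spot = {(1, 1), (1, 2), (2, 1), (2, 2)}
signal = quantify_spot(image, spot, (0, 3, 0, 3), spot)
print(signal)
```

Using the median rather than the mean for the background keeps a few bright stray pixels in the window from inflating the estimate.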

Array Data Analysis

Before analysis of the results, the reproducibility of the experiments was verified by comparing duplicate spots, or one hybridization with the same probe on two independent arrays, or two independent hybridizations with probes prepared from the same RNA. In every case, the results showed good reproducibility with respective correlation coefficients of 0.95, 0.98 and 0.98 (data not shown). Moreover, genes represented by two different clones on the array, such as CDK4 (SEQ ID No: 288) or ETV5 (SEQ ID No: 300), displayed similar expression profiles for the two clones in all samples. This reproducibility was sufficient to consider a 2-fold expression difference as significantly differential.
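The two checks mentioned above, a correlation coefficient between replicate measurements and a 2-fold criterion for calling a difference, can be sketched in Python as follows. The function names and the example values are hypothetical, and the handling of a zero expression level is an assumption of this sketch rather than something stated in the text.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def is_differential(level_1, level_2, fold=2.0):
    """True if two non-negative levels differ by at least `fold`."""
    high, low = max(level_1, level_2), min(level_1, level_2)
    if low <= 0:
        # Assumption: any signal against a zero level counts as a difference.
        return high > 0
    return high / low >= fold


# Hypothetical duplicate-spot measurements for three genes.
spot_a = [1.0, 2.0, 3.0]
spot_b = [2.0, 4.0, 6.0]
print(pearson_r(spot_a, spot_b))
print(is_differential(1.0, 2.0), is_differential(1.0, 1.9))
```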

For graphical representation, data were displayed as absolute expression levels (FIG. 2a). For better visualization of clustering, results were log-transformed and displayed as relative values median-centered in each row and in each column (FIG. 2b). Hierarchical clustering was applied to the tissue samples and the genes using the Cluster program developed by Eisen (45) (average linkage clustering using Pearson correlation as similarity metric). Results in FIGS. 2 and 3 were displayed with the TreeView program (45).
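The preprocessing step described above (log transformation followed by median centering of rows and columns) can be sketched in Python as follows. The base of the logarithm is not stated in the text, so base 2 is an assumption here, as are the function name and the toy matrix; the sketch performs a single centering pass and does not perform the hierarchical clustering itself, which was done with the Cluster program.

```python
from math import log2
from statistics import median


def log_median_center(matrix):
    """Log2-transform, then median-center each row and each column.

    matrix: list of rows (genes), each a list of strictly positive
    expression levels (one per sample).
    """
    logged = [[log2(v) for v in row] for row in matrix]
    # Center each row (gene) on its own median.
    row_centered = []
    for row in logged:
        m = median(row)
        row_centered.append([v - m for v in row])
    # Center each column (sample) on its own median.
    col_medians = [median(col) for col in zip(*row_centered)]
    return [[v - m for v, m in zip(row, col_medians)]
            for row in row_centered]


# Toy matrix: 2 genes x 3 samples (hypothetical values).
centered = log_median_center([[1, 2, 4], [4, 4, 4]])
print(centered)
```

Because the column pass can shift the row medians slightly, tools that offer this operation may repeat the two passes several times; one pass is shown for brevity.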

Subsequent analysis was done using Excel software (Microsoft) and statistical analyses with the SPSS software. Metastasis-free survival and overall survival were measured from diagnosis until the first metastatic relapse or death respectively. They were estimated with the Kaplan-Meier method and compared between groups with the Log-Rank test. Correlations of gene pairs based on expression profiles were measured with the correlation coefficient r. The search for genes with expression levels correlated with tumor parameters was done in several successive steps.
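For readers unfamiliar with the Kaplan-Meier method, a minimal sketch of the product-limit estimator is given below. It is not the SPSS procedure used for the analysis, the function name is hypothetical, and any data passed to it in the test or in use are illustrative only.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each time with an event.

    times: follow-up in months for each patient.
    events: 1 if the event (metastatic relapse or death) occurred at that
    time, 0 if the patient was censored (event-free at last follow-up).
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # events and censored patients leave the risk set
    return curve
```

Censored patients reduce the number at risk without lowering the curve, which is why groups with different follow-up durations can still be compared.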

First, genes were detected by comparing their median expression level in the two subgroups of tumors discordant according to the parameter of interest. The median values rather than the mean values were used because of the high variability of the expression levels for many genes, resulting in a standard deviation of expression level similar or superior to the mean value and making comparisons with means impossible. Second, these detected genes were inspected visually on graphics, and finally, an appropriate statistical analysis was applied to those that were convincing to validate the correlation. Comparison of GATA3 (SEQ ID No: 78) expression between ER-positive tumors and ER-negative tumors was validated using a Mann-Whitney test. Correlation coefficients were used to compare the gene expression levels to the number of axillary nodes involved.
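The first screening step can be sketched as a ratio of subgroup medians. The function name is hypothetical, and returning `None` when the reference median is zero is an assumption made to avoid a division by zero.

```python
from statistics import median

def median_ratio(group_a, group_b):
    """Ratio of the median expression levels of two tumor subgroups.

    Returns None when the median of group_b is zero (ratio undefined).
    """
    reference = median(group_b)
    return median(group_a) / reference if reference else None
```

A single very high value, such as the 90.0 in `[1.0, 4.0, 90.0]`, moves the mean to about 31.7 while the median stays at 4.0, which illustrates why medians were preferred for genes with highly variable expression.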

Northern Blot Analysis

Seventy-nine breast tumors, including 22 of the 34 tested on the arrays, were analyzed for GATA3 (SEQ ID No: 78) expression by Northern blot hybridization. RNA extraction from tumor samples and Northern blots were done as previously described (43). The GATA3 probe was prepared from the IMAGE cDNA clone 129757 (SEQ ID No: 78), which corresponds to the 3′ region (from +843 to +1689) of the GATA3 cDNA sequence (GenBank accession no. X55122). The insert (846 bp) was obtained by digestion of the clone with EcoRI and PacI enzymes. Northern blots were stripped and re-hybridized using a β-actin probe (46).

FIG. 1 shows an example of differential gene expression between normal breast tissue (NB) and breast tumor samples. Each cDNA array on Nylon filter was hybridized with a complex probe made from 5 μg of total RNA. The top image corresponds to the whole membrane. For the two bottom images, only the tight portion of the membranes is shown. Numbers below the spots indicate housekeeping genes (1, GAPDH and 2, actin), negative control clones (3, 4 and 5) and examples of genes differentially expressed between NB and breast tumor (6, stromelysin3 (SEQ ID No: 346); 7, ERBB2 (SEQ ID No: 119); 8, MYBL2 (SEQ ID No: 310); 9, FOS (SEQ ID No: 318); 10, TGFβR3; 11, desmin (SEQ ID No: 170)), and between ER− breast tumor and ER+ breast tumor (12, GATA3).

FIG. 2 is a representation of expression levels of 176 genes in normal breast tissue (NB) and 34 samples of breast carcinoma. Each column corresponds to a single tissue, and each row to a single gene. (a) The results are expressed as percentage abundance of individual mRNA within the sample, and are represented using a gray color scale. The color scale (log scale with a 3-fold interval) indicated at the bottom left ranges from light gray (expression level ≧0.001%) to dark gray (expression level ≧3%). White squares indicate clones with undetectable expression levels. The tissue samples are arbitrarily ordered and the clones are ordered from top to bottom according to increasing median expression levels. Horizontal black arrows on the right of the figure mark three clones with highly variable expression levels between the tumors (stromelysin3 (SEQ ID No: 346), IGF2 (SEQ ID No: 61), GATA3 (SEQ ID No: 78) from top to bottom). (b) The results are shown as differential expression levels (relative to the median value of each row and each column) and are represented with a gray scale indicated at the bottom left ranging from 1/100 to 100 fold changes (gray squares: missing data). Lighter gray indicates a decrease in expression, whereas dark gray represents increased expression. Black represents no change in expression levels. Eighteen clones with median expression level equal to zero in the 34 tumors are omitted. The clustering program arranges samples (n=35) along the horizontal axis so that those with the most similar expression profiles are placed adjacent to each other. Similarly, clones (n=162) are near each other along the vertical axis if they show a strong expression profile correlation across all tissues. The length of the branches of the dendrograms capturing respectively the samples (top) and the clones (left) reflects the similarity of the related elements. Two groups of tumors are separated and color coded: group A and group B. 
Numerically identified horizontal arrows from 1 to 7 on the right of the figure respectively mark three genes with highly variable expression levels between the tumors (IGF2 (SEQ ID No: 61) (arrow #1), GATA3 (SEQ ID No: 78) (arrow #2), stromelysin3 (SEQ ID No: 346) (arrow #3) from top to bottom) and four pairs of different clones representing four genes (arrows #4-7). The upper portion of FIG. 2b (approximately above the position of arrow #3) shows a grouping of genes with generally increased expression in Tumor Group A, whereas Tumor Group B shows decreased expression for those genes. The lower portion of FIG. 2b shows a grouping of genes with decreased expression in Tumor Group A that have increased expression in Tumor Group B. (c) Zoom representation of group A from FIG. 2b, excluding the two outlier tumors at the right. The clustering separates two subgroups of tumors, A1 and A2. The dotted branches correspond to tumors associated with metastatic relapse and death. Follow-up was longer in A2 than in A1 (median 81 months for A2 versus 47 months for A1).

FIG. 3 is a prognostic classification of breast cancer by gene expression profiling showing that gene expression-based tumor classification correlates with clinical outcome. The 12 samples of group A (see FIGS. 2b and 2c) were reclustered using the top 32 differentially expressed genes between A1 and A2 subgroups. Data were displayed as in FIG. 2b and shown with the same gray color key. The hierarchical clustering was applied to expression data from the 23 clones, out of 32, whose expression levels presented an at least two-fold change in at least two samples (out of 12). Two subgroups of tumors A1 and A2 are shown as well as two groups of differentially expressed clones. The dotted branches of tumor cluster A1 correspond to samples associated with metastatic relapse and death. FIG. 3a shows a two-dimensional representation of hierarchical clustering results shown in FIGS. 2a and 2b. The analysis delineates 4 groups of tumors A, B, C and D. Squares indicate patients alive at last follow-up visit and triangles indicate patients who died. Three classes of patients with a statistically different clinical outcome were defined according to gene expression profiles: class A (n=16), class B+C (n=34), class D (n=5). FIG. 3b illustrates a Kaplan-Meier plot of overall survival of the 3 classes of patients (p≦0.005, log-rank test). FIG. 3c illustrates a Kaplan-Meier plot of metastasis-free survival of the 3 classes of patients (p≦0.05, log-rank test).

FIG. 4 shows the correlation of GATA3 (SEQ ID No: 78) expression with ER phenotype. (a) The expression levels of GATA3 in 34 breast cancer samples (y axis) monitored by cDNA array analysis are reported in percentage of abundance of individual mRNA with respect to mRNA within the sample (log scale). GATA3 is significantly overexpressed in the ER-positive tumors (n=23) versus the ER-negative tumors (n=11) using the Mann-Whitney test (p=0.0004). The expression level of GATA3 in normal breast tissue is reported on the right (NB). (b) Northern blot analysis of GATA3 in normal breast sample (NB) and 9 breast cancer samples (AT: tumor analyzed with cDNA array and Northern blot; NT: tumor analyzed with Northern blot). Blots were probed successively with cDNA from GATA3 (top) and β-actin (bottom). ER status is indicated for each tumor sample.

Data Representation

FIG. 1 shows examples of hybridizations of cDNA arrays with probes made from RNA extracted from normal breast tissue and breast tumors.

The crude results of all hybridizations were processed to be presented either as absolute or relative values in schematic figures. The normalization procedure allowed display of absolute values expressed in percent of abundance of mRNA in the probe as shown in FIG. 2a. Each level of the blue color ladder represents a 3-fold interval of absolute abundance of mRNA. Each column corresponds to a tissue sample and each row to a gene. For graphic purposes, genes were ordered from top to bottom according to increasing median expression levels. Tumor samples were not ordered. The values in each sample displayed a wide range of intensities (3 decades in log scale) corresponding to expression levels ranging from approximately 0.002% to 5% of mRNA abundance. Many genes (see for example stromelysin3 (SEQ ID No: 346), IGF2 (SEQ ID No: 61) and GATA3 (SEQ ID No: 78), arrows) displayed highly variable expression levels across all tumor samples, scattered over the whole dynamic range of values. A representation of relative values is shown in FIG. 2b. Absolute values were log-transformed, omitting 18 clones whose median intensity was equal to zero across all tissues. Data for each of the 162 remaining clones were then median-centered, as well as data for each sample, so that the relative variation was shown, rather than the absolute intensity. A color scale was used to display data: red for expression level higher than the median and green for expression level lower than the median. The magnitude of the deviation from the median was represented by the color intensity. A hierarchical clustering program was then applied to group the 35 samples according to their overall gene expression profiles, and to group the 162 clones on the basis of similarity of their expression levels in all tissues. This resulted in a picture highlighting groups of correlated tissues and groups of correlated genes as depicted by dendrograms.
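The grouping step can be illustrated by a small agglomerative clustering sketch that uses 1 minus the Pearson correlation as distance and average linkage between clusters. This is a generic textbook formulation written for illustration, not the code of the Cluster program, and the function names are hypothetical. Profiles with no variation are assumed to have been removed, since their correlation is undefined.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def average_linkage(profiles):
    """Cluster named profiles; return the dendrogram as nested tuples.

    profiles: dict mapping a sample (or clone) name to its list of values.
    """
    def dist(a, b):
        return 1 - pearson_r(profiles[a], profiles[b])

    # Each node is (subtree, list of leaf names under that subtree).
    nodes = [(name, [name]) for name in profiles]
    while len(nodes) > 1:
        best = None
        for i in range(len(nodes)):
            for j in range(i + 1, len(nodes)):
                mi, mj = nodes[i][1], nodes[j][1]
                # Average linkage: mean distance over all cross-cluster pairs.
                d = sum(dist(a, b) for a in mi for b in mj) / (len(mi) * len(mj))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = ((nodes[i][0], nodes[j][0]), nodes[i][1] + nodes[j][1])
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [merged]
    return nodes[0][0]
```

Applying the same function to the rows and then to the columns of the median-centered matrix yields the two dendrograms, one for clones and one for samples.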

Breast Tumor Classification

As shown in FIG. 2b, the clustering algorithm identified two groups of samples, designated A (n=15, including normal breast, NB) and B (n=20). These groups were similar with respect to patient age, menopausal status at diagnosis, SBR grading and tumor pathological size. However, 72% of tumors in group A were node-positive and 75% in group B were node-negative. Moreover, 80% of the tumors in group B were estrogen receptor (ER) positive and 50% in group A were ER-negative. With a median follow-up of 44 months after diagnosis, overall survival was different between A and B groups: 5 women died in A (median follow-up 58 months) and 1 in B (median follow-up 40 months). But the frequency of metastatic relapse was relatively similar in the two groups, with 5 women who relapsed in A and 6 in B. Because the time between the diagnosis of metastasis and last follow-up is too short in B, a longer follow-up is needed to determine if these two different groups, defined with expression profiles, have really a different outcome with respect to overall survival.

In the group A of 15 samples, three samples (normal breast and two tumors) were different from each other and from the other 12 samples. The latter constituted two subgroups of tumors, A1 (n=6) and A2 (n=6), which could be further separated by clustering as shown in FIG. 2c. The 12 tumors had a uniformly high risk of metastatic relapse according to conventional prognostic features as shown in Table 1. Most of them had received comparable adjuvant anthracycline-based chemotherapy after surgery, with more women treated in the A1 subgroup. Interestingly, these two subgroups, which could not be distinguished with commonly used histoclinical features, had a very different clinical outcome: there were 4 metastatic relapses and 4 deaths in A1 (median follow-up: 44 months). In contrast and despite a longer median follow-up (90 months), no metastasis or death occurred in A2. This resulted in a significantly better metastasis-free survival (p≦0.01) and overall survival (p≦0.005) for group A2 than for group A1 tumors. No such subgrouping could be done in B.
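The comparison of survival between two subgroups relies on the log-rank test. A minimal two-group sketch is given below for illustration. It is a generic formulation, not the SPSS procedure, and the function name is hypothetical. The p-value uses the chi-square distribution with one degree of freedom, and the sketch assumes that at least one event occurs while both groups still have patients at risk.

```python
from math import erfc, sqrt

def logrank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi_square, p_value), 1 d.f."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    observed_a = expected_a = variance = 0.0
    for t in sorted({t for t, e, _ in data if e}):
        n = sum(1 for ti, _, _ in data if ti >= t)                # at risk, both groups
        n_a = sum(1 for ti, _, g in data if ti >= t and g == 0)   # at risk, group A
        d = sum(1 for ti, e, _ in data if ti == t and e)          # events at time t
        d_a = sum(1 for ti, e, g in data if ti == t and e and g == 0)
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    chi2 = (observed_a - expected_a) ** 2 / variance
    # For 1 degree of freedom, P(X > chi2) equals erfc(sqrt(chi2 / 2)).
    return chi2, erfc(sqrt(chi2 / 2))
```

With small groups such as those compared here, the chi-square approximation is rough, so a p-value computed this way should be read as indicative.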

TABLE 1

Subgroup                       A1    A1    A1    A1    A1    A1    A2    A2    A2    A2    A2    A2
Tumor position in the cluster  1     2     3     4     5     6     7     8     9     10    11    12
Age, years                     46    58    60    63    51    58    46    47    50    47    46    66
Nodal status                   1     0     0     16    13    37    10    4     1     2     0     0
Histological size, mm          60    20    26    35    20    30    27    25    30    25    20    22
SBR grade                      II    III   II    III   II    III   II    II    II    II    II    III
ER status                      neg   neg   neg   neg   neg   neg   pos   neg   pos   pos   pos   pos
Adjuvant chemotherapy          yes   yes   no    yes   yes   yes   yes   yes   no    yes   no    no
Metastasis                     yes   no    yes   yes   no    yes   no    no    no    no    no    no
Follow-up, months              58    106   35    47    41    31    85    98    95    49    19    141
Patient status                 D     A     D     D     A     D     A     A     A     A     A     A

Patient characteristics in subgroups A1 and A2. The 12 tumors are numbered from 1 to 12 according to their position from left to right in the clustering graphic displayed in FIG. 3. Adjuvant chemotherapy was anthracycline-based. In the line concerning the patient status, A means alive and D means death from cancer progression.

Genes responsible for group A substructure were searched. These are potentially relevant to the prognosis and the sensitivity to chemotherapy in these tumors. Thirty-two genes out of 188 were identified by comparing their median expression level in A1 vs A2. Then, the 12 tumors were reclustered using the expression profiles of these genes as shown in FIG. 3. The same subgroups A1 and A2 were evident and separated by 2 groups of genes: as expected, high expression of ERBB2 (SEQ ID No: 119), MYC (SEQ ID No: 75) and EGFR (SEQ ID No: 137) was associated with bad prognosis subgroup A1 (6-8), and that of E-cadherin (SEQ ID No: 328) and the proto-oncogene MYB (SEQ ID No: 355) with good prognosis subgroup A2 (9, 10). For most of the other genes, these results may stimulate new investigations. Differentiation state is a good prognostic factor in breast cancer and, accordingly, genes associated with cell differentiation, such as GATA3 (SEQ ID No: 78) (11) and CRABP2 (SEQ ID No: 158) (12), had a high level of expression in the better outcome group. The high expression of Ephrin-A1 mRNA in the bad prognosis subgroup suggests a role of this growth factor in breast cancer and can be paralleled with its up-regulation during melanoma progression (13).

Differential Gene Expression between Normal Breast and Breast Tumors

To identify genes differentially expressed between breast tumors (T) and normal breast (NB), the NB value for each gene was compared to its expression level in each tumor. When the expression level of a gene in NB was undetectable, only qualitative information could be deduced and the mRNA was considered as differentially expressed if the signal intensity in the tumor was superior to the reproducibility threshold (0.002% of mRNA abundance). In the other cases, differential expression was defined by an at least 2-fold expression difference. Also, the number of tumors where it was over- or underexpressed was measured. Table 2 shows a list of the top 20 over- and underexpressed genes. For these genes, the T/NB ratio is reported, where T represents their median expression value in the 34 tumors. This ratio ranged from 2.70 (ABCC5; SEQ ID No: 325) to 17.76 (GATA3; SEQ ID No: 78) for the overexpressed genes, and from 0.00 (desmin; SEQ ID No: 170) to 0.29 (APC; SEQ ID No: 56) for the underexpressed genes.
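The decision rules described above can be summarized in a short sketch. The threshold and fold-change values are taken from the text; the function names and the returned labels are hypothetical.

```python
from statistics import median

THRESHOLD = 0.002  # reproducibility threshold, in % of mRNA abundance

def call(tumor, normal, fold=2.0):
    """Classify one tumor level against normal breast: 'over', 'under' or None."""
    if normal == 0:
        # NB undetectable: only a qualitative call is possible.
        return "over" if tumor > THRESHOLD else None
    if tumor >= fold * normal:
        return "over"
    if tumor * fold <= normal:
        return "under"
    return None

def summarize(tumor_levels, normal):
    """Count over/under calls across tumors and compute the T/NB ratio."""
    calls = [call(t, normal) for t in tumor_levels]
    ratio = median(tumor_levels) / normal if normal else None
    return {"over": calls.count("over"),
            "under": calls.count("under"),
            "T/NB": ratio}
```

The count of calls corresponds to column N of Table 2, and a gene that is undetectable in NB has no numerical ratio, as is the case for MYBL2.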

TABLE 2

Clone ID  Gene/Protein Identity                               Gene symbol  Chrom. location  N    T/NB

Overexpressed genes

154343    Granzyme H                                          GZMH         14q11.2          32   9.51
235947    Stromelysin 3                                       STMY3        22q11.2          31   15.92
207378    MYB Related Protein B                               MYBL2        20q13.1          31   (a)
153275    Cellular Retinoic Acid Binding Protein 2            CRABP2       1q21.3           29   7.16
129757    GATA-binding protein 3                              GATA3        10p15            28   17.76
120649    T-Lymphocyte surface CD2 antigen                    CD2          1p13.1           28   7.54
109677    CREB Binding Protein                                CREBBP       16p13.3          28   5.08
172152    EGFR-binding protein GRB2                           GRB2         17q24-q25        28   5.00
66969     Transcription factor RELB                           RELB         19               28   3.61
182007    ETS-Related Transcription Factor ELF1               ELF1         13q13            27   3.58
153446    LIM domain protein RIL                              RIL          5q31.1           26   4.03
203394    ETS Variant gene 5 (ETS-related molecule)           ETV5         3q28             25   3.67
160963    Thrombospondin 1                                    THBS1        15q15            25   3.39
188393    POU domain, class 2, transcription Factor 2         POU2F2       19               24   4.02
187822    Integrin, beta 2                                    ITGB2        21q22.3          24   3.01
243907    Nuclear Factor of Activating T cell Subunit p45     NF45         1                24   2.84
158347    EST H27202                                          EST                           23   2.91
230933    EST AW184517                                        EST                           22   2.85
212366    ATP-Binding Cassette, sub-family C (CFTR/MRP), 5    ABCC5        3q27             22   2.70
149401    Cathepsin D                                         CTSD         11p15.5          21   2.97

Underexpressed genes

153854    Desmin                                              DES          2q35             34   0.00
208717    P55-C-FOS proto-oncogene protein                    FOS          14q24.3          33   0.05
159093    Transcription Factor AF4                            TFAP4        16p13            33   0.11
124340    Tenascin XA                                         TNXA         6p21.3           33   0.14
133738    Prolactin                                           PRL          6p22.2-p21.3     32   0.00
133891    Chorionic Somatomammotropin Hormone 1               CSH1         17q22-q24        32   0.00
151501    Tyrosine Kinase Receptor TEK                        TEK          9p21             32   0.00
183030    Activating Transcription Factor 3                   ATF3         1                32   0.07
120916    Phosphodiesterase I                                 PDNP2        8q24.1           32   0.14
155716    EST R72075                                          EST                           31   0.00
208118    Transforming Growth Factor Beta Receptor Type III   TGFBR3       1p33-p32         31   0.14
187547    Diphtheria Toxin Receptor                           DTR          5q23             31   0.17
108490    HIV-1 Rev Binding protein                           HRB          2q36             31   0.20
147002    B-cell CLL/lymphoma 2                               BCL2         18q21.3          31   0.26
182610    Microsomal Glutathione S Transferase 1              MGST1        12p12.3-p12.1    31   0.28
152802    Phospholipase A2 Membrane Associated, group IIA     PLA2G2A      1p35             30   0.03
183087    Interleukin 3 Receptor Alpha chain                  IL3RA        Xp22.3; Yp13.3   30   0.24
108571    Retinoblastoma-Like 2 (p130)                        RBL2         16q12.2          29   0.28
125294    Adenomatous Polyposis Coli Protein                  APC          5q21-q22         29   0.29
151767    FASL Receptor                                       TNFRSF6      10q24.1          28   0.27

List of the genes that show the most frequent differential expression between normal breast tissue and 34 breast carcinomas as measured by cDNA array analysis. N indicates the number of tumor samples where the gene is dysregulated (fold change ≥ 2) compared to normal breast tissue. T/NB represents the ratio: median expression level in 34 breast tumors/expression level in normal breast. (a) MYBL2 transcript displayed a median expression level of 0.025% in breast tumors and was undetectable in NB.

High expression of mucin 1 (SEQ ID No: 58), NM23, ERBB2 (SEQ ID No: 119), FGFR1 (SEQ ID No: 182) and FGFR2 (SEQ ID No: 15), MYC (SEQ ID No: 75), stromelysin3 (SEQ ID No: 346), cathepsin D (SEQ ID No: 128) and downregulation of FOS (SEQ ID No: 318), APC (SEQ ID No: 56), RBL2, FAS, BCL2 (SEQ ID No: 117) were found, reflecting what is known about their biology in cancer. GATA3 (SEQ ID No: 78), which codes for a member of the GATA family of zinc finger transcription factors, and CRABP2 (SEQ ID No: 158), encoding one of the two cellular retinoic acid-binding proteins, showed high expression of mRNA, extending previous results on cDNA arrays (4).

Differential Gene Expression Among Various Breast Tumors and Correlation with Histoclinical Prognostic Parameters

To search for potential prognostic markers in breast cancer, genes with expression levels correlated with conventional histoclinical prognostic parameters were looked for: age of patients, axillary node status, tumor size, histological grade and ER status. No significant correlation was found with age, tumor size and histological grade. However, the expression profiles of some genes correlated with ER status and axillary node involvement.

To identify genes potentially relevant to the hormone-responsive phenotype, the gene expression profiles in ER-positive breast cancers (n=23) versus ER-negative breast cancers (n=11) were compared. Sixteen clones displayed a median intensity of 0 in both groups. Twenty-five presented a fold change superior to 2. Table 3a displays the top 10 over- and underexpressed genes. Among them, the most differentially expressed was GATA3 (SEQ ID No: 78) with a median intensity ratio ER+/ER− of 28.6 and a value for the first quartile of ER-positive tumors superior (5-fold) to the value of the third quartile of the ER-negative tumors as shown in FIG. 4a. The high expression of GATA3 in ER-positive tumors was statistically significant using a Mann-Whitney test (p≦0.001). All ER-positive tumors and only 18% of ER-negative tumors displayed a GATA3 expression level greatly superior (fold change >3) to the normal breast value. Furthermore, GATA3 expression was analyzed by Northern blot hybridization (FIG. 4b) in a panel of 79 breast cancers (21 ER-negative tumors and 58 ER-positive tumors), including 22 of the tumors analyzed with cDNA arrays. It confirmed the array results for those 22 tumors as well as the strong correlation between ER status and GATA3 RNA expression (Mann-Whitney test, p≦0.0001).
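The Mann-Whitney comparison between ER-positive and ER-negative tumors can be sketched as follows. The sketch uses the normal approximation without tie or continuity correction, so its p-value is approximate for small groups; it is not the SPSS procedure, and the function name is hypothetical.

```python
from math import erfc, sqrt

def mann_whitney(x, y):
    """Two-sided Mann-Whitney U test; returns (U for x, approximate p-value)."""
    pooled = sorted((v, g) for g, vals in enumerate((x, y)) for v in vals)
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values share the mean of the ranks i+1 .. j.
        avg_rank = (i + 1 + j) / 2
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(x), len(y)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    z = (u - n1 * n2 / 2) / sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u, erfc(abs(z) / sqrt(2))
```

Because the test works on ranks, it is insensitive to the wide dynamic range of expression levels, which fits the preference for medians expressed earlier.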

TABLE 3a

Clone ID  Gene/Protein Identity                     Gene symbol    ER+/ER−

129757    GATA-binding protein 3                    GATA3          28.6
356763    Granzyme A                                GZMA           5.7
248613    MYB proto-oncogene                        MYB            3.4
211999    KIAA1075 protein                          KIAA1075       3.3
235947    Stromelysin 3                             STMY3          3.1
229839    Macrophage Stimulating 1                  MST1           2.8
153275    Cellular Retinoic Acid Binding Protein 2  CRABP2         2.7
301950    X-box Binding Protein 1                   XBP1           2.7
205314    Tumor Protein p53                         TP53           2.5
126233    Insulin-like Growth Factor 2              IGF2           2.4
66322     CD3G antigen, Gamma                       CD3G           0.0
195022    Interleukin 2 Receptor Gamma chain        IL2RG          0.0
111461    SOX4 Protein                              SOX4           0.4
151475    Epidermal Growth Factor Receptor          EGFR           0.5
195022    Interleukin 2 Receptor Beta chain         IL2RB          0.5
130788    Topoisomerase (DNA) II beta (180 kD)      TOP2B          0.6
323948    SOX9 Protein                              SOX9           0.6
183641    S100 calcium-binding protein Beta         S100B          0.6
246620    EST N53133                                EST            0.6
231424    Glutathione S Transferase Pi              GSTP1          0.6

To search for genes whose expression profile was correlated with axillary lymph node status, a strong prognostic factor in breast cancer, the group of node-negative tumors (n=19) was compared with the group of tumors with massive axillary extension (10 or more positive nodes). Furthermore, because survival decreases with the increase of the number of tumor-involved lymph nodes and because the expression measurements were quantitative, correlation between the expression levels of these genes and the number of tumor-involved nodes (quantitative variables) was determined. Table 3b shows a list of the top 10 over- and underexpressed genes between these 2 groups. Most of these genes have not been previously reported as associated with node status, but some of these results are in agreement with literature data. The gene encoding the tyrosine kinase receptor ERBB2 (SEQ ID No: 119) was the most significantly overexpressed gene in node-positive tumors and displayed the highest correlation coefficient (r=0.68; p≦0.0001).

TABLE 3b

Clone ID  Gene/Protein Identity                     Gene symbol    N−/10N+

129757    GATA-binding protein 3                    GATA3          11.0
160963    Thrombospondin 1                          THBS1          6.6
151475    Epidermal Growth Factor Receptor          EGFR           5.4
120916    Phosphodiesterase I                       PDNP2          4.9
183030    Activating Transcription Factor 3         ATF3           4.6
211999    KIAA1075 protein                          KIAA1075       4.5
110480    Nuclear Factor 1 A-type                   NF1A           4.5
182264    P-Selectin                                SELP           4.4
356763    Granzyme A                                GZMA           4.3
214008    E-cadherin                                CDH1           4.0
147016    ERBB2 Receptor Protein-Tyrosine Kinase    ERBB2          0.2
179197    Protein Phosphatase PP2A, 55 kD Subunit   PP2A BR gamma  0.2
231424    Glutathione S Transferase Pi              GSTP1          0.4
111461    SOX4 Protein                              SOX4           0.4
195022    Interleukin 2 Receptor Beta chain         IL2RB          0.4
220451    Zinc Finger protein 144                   ZNF144         0.5
125413    Mucin 1                                   MUC1           0.6
290007    CD44 antigen, epithelial form             CD44           0.6
108571    Retinoblastoma-Like 2 (p130)              RBL2           0.7
130788    Topoisomerase (DNA) II Beta (180 kD)      TOP2B          0.7

List of genes differentially expressed between ER-positive and ER-negative breast tumors (a) and between axillary lymph node-negative tumors and tumors with 10 or more involved lymph nodes (b).

Gene Clusters

Gene clustering from FIG. 2b showed groups of genes with correlated expression across samples. When different clones represented the same gene, they were clustered next to each other (red arrows). Correlation coefficients between gene pairs in the 34 tumors were often high (1% of the 13,041 gene pairs showed a correlation coefficient superior to 0.95; not shown). An example of highly correlated gene expression is that of BCL2 (SEQ ID No: 117) and RBL2. Such correlated expression, although it has not been described in the literature, probably reflects a common mechanism of regulation for these two genes. Furthermore, these genes also exhibited significant correlated expression with other genes such as PPP2CA (SEQ ID No: 184), AKT2 (SEQ ID No: 254), PRKCSH (SEQ ID No: 264) or TNFRSF6/FAS (SEQ ID No: 143). In particular, a striking correlated expression between BCL2 and FAS could be observed (r=0.91; data not shown). The exact meaning of this correlation is unknown, although it may reflect the necessary balance between apoptosis and anti-apoptosis for cell survival.
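The figure of 13,041 gene pairs is the number of distinct pairs that can be formed from the 162 clustered clones, and the search for highly correlated pairs can be sketched as below. The function names are hypothetical, and each profile is assumed to vary across samples so that the correlation is defined.

```python
from itertools import combinations
from math import comb, sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def correlated_pairs(profiles, cutoff=0.95):
    """List the gene pairs whose correlation across samples exceeds `cutoff`."""
    return [(a, b) for a, b in combinations(sorted(profiles), 2)
            if pearson_r(profiles[a], profiles[b]) > cutoff]

# Number of distinct pairs among the 162 clustered clones.
print(comb(162, 2))  # 13041
```

About 130 pairs therefore passed the 0.95 cutoff (1% of 13,041), which is a small enough set to inspect gene by gene.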

Although in human cancer the proportion of changes that is reflected at the RNA level is not known, monitoring gene expression patterns appears as a very promising way of increasing the knowledge of the disease. Several different types of cancer have been investigated using cDNA arrays: cervical (14), hepatocellular (15), ovarian (16), colon (17) and renal carcinomas (18), glioblastomas (19), melanomas (20) (21), rhabdomyosarcomas (22), acute leukemias (23) and lymphomas (24). In breast cancer, pioneering studies have yielded the first expression patterns (4, 25-31). They have in particular addressed the important issue of molecular differences in hormone-responsive and non-responsive breast tumors. Thus, Yang et al. (28) and Hoch et al. (25) compared expression profiles of breast carcinoma cell lines known to represent these two categories and identified a few genes with differential expression. One of these genes was GATA3. In these studies, cell lines were mostly used and tumor samples were rarely tested and generally in small numbers. The first study analyzing the expression profiles of a large series of breast cancers was published recently (32), but no correlation with clinical outcome was mentioned.

Several interesting points can be made based on the present experimentation. First, the differences in expression patterns among the tumors provided molecular transcriptional evidence of the histoclinical heterogeneity of breast cancer. This diversity was multifactorial, linked to many different genes, highlighting the interest of high throughput analysis in this context. It was possible, with a hierarchical clustering program integrating the expression profiles, to separate normal breast tissue from most tumors and, moreover, to identify two different groups of tumors. Most importantly, two different subgroups of tumors with a very distinct clinical outcome that could not be predicted with classical prognostic factors have been identified by clustering. Indeed, all these tumors had a theoretically bad prognosis as evaluated by current histoclinical tools. All these patients would be at the present time treated with adjuvant chemotherapy, but without the capacity for the physicians to identify patients who will benefit from this treatment and those who will not benefit.

Gene expression profiles were able to make this discrimination. Such predictive tools have important therapeutic implications. Patients with features of poor prognosis are candidates for other treatment than standard chemotherapy, avoiding loss of time and toxicities related to first-line chemotherapy. These results suggest that the histoclinical category of poor prognosis breast cancer, currently treated with adjuvant anthracycline-based chemotherapy, groups together at least two molecularly distinct subgroups of tumors with different outcome which would require distinct chemotherapy regimens. Expression profiles could thus provide a new and more accurate way of classifying breast tumors of poor prognosis and managing patients.

Similarly, despite molecular heterogeneity, significant correlations between the expression level of genes (GATA3 (SEQ ID No: 78), ERBB2 (SEQ ID No: 119)) and histological tumor parameters were identified. The ER-positivity in breast cancer has been correlated with tumor differentiation, low proliferating rate, favorable prognosis and response to hormonal therapy. The relation between hormone sensitivity of breast cancer and ER status is not perfect, and it is possible that some genes related to ER expression are more important than ER to characterize the hormone-sensitive phenotype. These genes could serve as predictive factors to guide the therapy.

GATA3 mRNA expression was highly correlated with ER status. GATA3, which is not estrogen-regulated (25), is a transcription factor that could regulate the expression of genes involved in the ER-positive phenotype. Among the other genes that were found associated with ER status during the experimental work leading to our disclosure, some, such as MYB (SEQ ID No: 355) (10), stromelysin 3 (SEQ ID No: 346) (33), and CRABP2 (SEQ ID No: 158) (34), have been previously reported expressed at high levels in ER-positive breast tumors. The higher levels of TP53 mRNA in ER-positive tumors studied were surprising, although in agreement with a recent study (27). Most studies concerning TP53 expression analyzed the protein level rather than the mRNA level, and TP53 protein levels are classically negatively correlated with the ER status (35). The high expression of CRABP2 could be related to the better differentiated status of the ER-positive tumors. The low expression of the three immunity-related genes IL2RB (SEQ ID No: 99), IL2RG (SEQ ID No: 281) and CD3G (SEQ ID No: 416) may be related to the low lymphoid infiltration in these well differentiated tumors. ERBB2 high expression in breast cancer has been associated with a poor prognosis and some resistance to hormonal therapy and chemotherapy (36). It is involved in the regulation of cellular differentiation, adhesion, and motility. The motility-enhancing activity of ERBB2 (37) could be responsible for the increased metastatic potential and the unfavorable prognosis of the breast tumors that overexpress ERBB2. The low expression of E-cadherin (SEQ ID No: 328) and thrombospondin 1 (SEQ ID No: 217) in node-positive tumors is consistent with their putative role in different steps of metastatic spread: E-cadherin is an epithelial cell adhesion molecule whose disturbance is a prerequisite for the release of invasive cells in carcinomas (38) and thrombospondin 1 inhibits angiogenesis (39).
Similarly, the high expression of the molecule surface antigen Mucin 1 in node-positive tumors (40) can reduce cell-cell interactions facilitating cell detachment and metastasis. CD44 (SEQ ID No: 376), encoding a transmembrane glycoprotein involved in cell adhesion and lymph node homing (41) was expressed at high levels in node-positive tumors as well as GSTP1 (SEQ ID No: 336) (Glutathione-S-Transferase Pi), recently reported associated with increased tumor size (27).

Second, there were a number of genes with highly correlated expression patterns. Gene correlations have already been reported with larger series of genes, essentially under dynamic experimental conditions (42) and recently in steady states (17). Here, correlations were based on expression profiles of a relatively small but selected series of genes and in steady states represented by different breast tumors. Gene correlations are potentially useful tools for cancer research in two ways: i) they can provide information about the general regulation circuitry of a cancerous cell, allowing the identification of regulatory elements controlling expression networks; ii) they offer the possibility of reducing the complexity of the system analyzed by replacing, for example, the intensities of a large number of genes present in a gene cluster by their respective mean intensities.
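The complexity reduction mentioned in point ii) can be illustrated by collapsing each gene cluster to a single profile made of the per-sample mean of its members. This is a minimal sketch; the function name and the cluster labels are hypothetical.

```python
def cluster_means(profiles, clusters):
    """Replace each gene cluster by the per-sample mean of its member genes.

    profiles: dict mapping a gene name to its list of values across samples.
    clusters: dict mapping a cluster label to the list of its gene names.
    """
    return {
        label: [sum(col) / len(col)
                for col in zip(*(profiles[g] for g in genes))]
        for label, genes in clusters.items()
    }
```

Downstream analyses can then operate on a handful of cluster profiles instead of on every individual clone.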

Finally, these results highlight the great potential of cDNA arrays in cancer research. The gene expression profiles confirmed the heterogeneity of breast cancer and, most importantly, allowed us to identify, among a series of poor-prognosis breast tumors, two subtypes of the disease not yet recognized with the usual histoclinical parameters but with a different clinical outcome after adjuvant chemotherapy. Furthermore, this disclosure allows detection of genes whose expression was correlated with classical prognostic factors.

Table 4 displays a library of polynucleotides SEQ ID No: 1 to SEQ ID No: 468 corresponding to a population of polynucleotide sequences underexpressed or overexpressed in cells derived from tumors, more particularly breast tumors, and their respective complements.

TABLE 4

CORRELATION BETWEEN SEQ ID NO AS FILED WITH US PROVISIONAL APPLICATION NO. 60/254,090 AND SEQ ID NO FILED WITH NEW APPLICATION

| Gene symbol | No | Name | Image | Provisional Seq3′ | Provisional Seq5′ | Current Seq3′ | Current Seq5′ | Current (mRNA) |
|---|---|---|---|---|---|---|---|---|
| GATA3 | 1 | GATA-binding protein 3 (GATA3) | 129757 | SEQ ID NO: 1 | | SEQ ID NO: 76 | SEQ ID NO: 77 | SEQ ID NO: 78 |
| MYB | 2 | v-myb avian myeloblastosis viral oncogene homolog (MYB) | 248613 | | SEQ ID NO: 2 | 0 | SEQ ID NO: 354 | SEQ ID NO: 355 |
| KIAA1075 | 3 | KIAA1075 protein | 211999 | SEQ ID NO: 3 | SEQ ID NO: 4 | SEQ ID NO: 322 | SEQ ID NO: 323 | 0 |
| STMY3 | 4 | matrix metalloproteinase 11 (stromelysin 3) (MMP11) (ex STMY3) | 235947 | SEQ ID NO: 5 | | SEQ ID NO: 345 | 0 | SEQ ID NO: 346 |
| HGFL | 5 | macrophage-stimulating protein (MST1) (ex HGFL) | 229839 | SEQ ID NO: 6 | SEQ ID NO: 7 | SEQ ID NO: 331 | SEQ ID NO: 332 | SEQ ID NO: 333 |
| CRABP | 6 | cellular retinoic acid-binding protein 2 (CRABP2) | 153275 | SEQ ID NO: 8 | SEQ ID NO: 9 | SEQ ID NO: 156 | SEQ ID NO: 157 | SEQ ID NO: 158 |
| XBP1 | 7 | X-box binding protein 1 (XBP1) | 301950 | SEQ ID NO: 10 | SEQ ID NO: 11 | SEQ ID NO: 385 | SEQ ID NO: 386 | SEQ ID NO: 387 |
| TP53 | 8 | tumor protein p53 (Li-Fraumeni syndrome) (TP53) | 205314 | | SEQ ID NO: 12 | SEQ ID NO: 442 | 0 | 0 |
| IGF2 | 9 | insulin-like growth factor 2 (somatomedin A) (IGF2) | 126233 | SEQ ID NO: 13 | SEQ ID NO: 14 | SEQ ID NO: 59 | SEQ ID NO: 60 | SEQ ID NO: 61 |
| CD3G | 10 | CD3G antigen, gamma polypeptide (TiT3 complex) (CD3G) | 66322 | SEQ ID NO: 15 | SEQ ID NO: 16 | SEQ ID NO: 414 | SEQ ID NO: 415 | SEQ ID NO: 416 |
| IL2RG | 11 | interleukin 2 receptor, gamma (severe combined immunodeficiency) (IL2RG) | 195022 | SEQ ID NO: 17 | SEQ ID NO: 18 | SEQ ID NO: 279 | SEQ ID NO: 280 | SEQ ID NO: 281 |
| SOX4 | 12 | SRY (sex determining region Y)-box 4 (SOX4) | 111461 | SEQ ID NO: 19 | SEQ ID NO: 20 | SEQ ID NO: 22 | SEQ ID NO: 23 | SEQ ID NO: 24 |
| EGFR | 13 | epidermal growth factor receptor (avian erythroblastic) | 151475 | SEQ ID NO: 21 | SEQ ID NO: 22 | SEQ ID NO: 135 | SEQ ID NO: 136 | SEQ ID NO: 137 |
| TOP2B | 14 | topIIb mRNA for topoisomerase IIb. | 130788 | | SEQ ID NO: 23 | 0 | SEQ ID NO: 82 | SEQ ID NO: 83 |
| S100B | 15 | S100 calcium-binding protein, beta (neural) (S100B) | 183641 | | SEQ ID NO: 24 | 0 | SEQ ID NO: 255 | SEQ ID NO: 256 |
| EST N53133 | 16 | EST N53133 | 246620 | SEQ ID NO: 25 | | SEQ ID NO: 352 | 0 | SEQ ID NO: 353 |
| GSTP1 | 17 | glutathione S-transferase pi (GSTP1) | 231424 | SEQ ID NO: 26 | SEQ ID NO: 27 | SEQ ID NO: 334 | SEQ ID NO: 335 | SEQ ID NO: 336 |
| THBS1 | 18 | thrombospondin 1 (THBS1) | 160963 | SEQ ID NO: 28 | | SEQ ID NO: 216 | 0 | SEQ ID NO: 217 |
| PDNP2 | 19 | ectonucleotide pyrophosphatase/phosphodiesterase 2 (autotaxin) (ENPP2) (ex PDNP2) | 120916 | SEQ ID NO: 29 | SEQ ID NO: 30 | SEQ ID NO: 39 | SEQ ID NO: 40 | SEQ ID NO: 41 |
| ATF3 | 20 | activating transcription factor 3 (ATF3) | 183030 | SEQ ID NO: 31 | SEQ ID NO: 32 | SEQ ID NO: 250 | SEQ ID NO: 251 | SEQ ID NO: 252 |
| NF1A | 21 | (ex NF1A) | 110480 | SEQ ID NO: 33 | | SEQ ID NO: 16 | 0 | 0 |
| SELP | 22 | selectin P (granule membrane protein 140 kD, antigen CD62) (SELP) | 182264 | | SEQ ID NO: 34 | SEQ ID NO: 438 | SEQ ID NO: 439 | 0 |
| CDH1 | 23 | cadherin 1, E-cadherin (epithelial) (CDH1) | 214008 | SEQ ID NO: 35 | SEQ ID NO: 36 | SEQ ID NO: 326 | SEQ ID NO: 327 | SEQ ID NO: 328 |
| ERBB2 | 24 | v-erb-b2 avian erythroblastic leukemia viral oncogene homolog 2 (neuro/glioblastoma derived oncogene homolog) (ERBB2) | 147016 | SEQ ID NO: 37 | | 0 | SEQ ID NO: 118 | SEQ ID NO: 119 |
| PP2A BR gamma | 25 | (PP2A BR gamma) | 179197 | SEQ ID NO: 38 | SEQ ID NO: 39 | SEQ ID NO: 238 | SEQ ID NO: 239 | 0 |
| ZNF144 | 26 | zinc finger protein 144 (Mel-18) (ZNF144) | 220451 | SEQ ID NO: 40 | SEQ ID NO: 41 | 0 | SEQ ID NO: 329 | SEQ ID NO: 330 |
| MUC1 | 27 | mucin 1, transmembrane (MUC1) | 125413 | | SEQ ID NO: 42 | 0 | SEQ ID NO: 57 | SEQ ID NO: 58 |
| CD44 | 28 | CD44E (epithelial form) | 290007 | SEQ ID NO: 43 | SEQ ID NO: 44 | SEQ ID NO: 374 | SEQ ID NO: 375 | SEQ ID NO: 376 |
| PLA2G2A | 29 | phospholipase A2, group IIA (platelets, synovial fluid) (PLA2G2A), nuclear gene encoding mitochondrial protein | 152802 | SEQ ID NO: 45 | SEQ ID NO: 46 | SEQ ID NO: 147 | SEQ ID NO: 148 | SEQ ID NO: 149 |
| ACVRL1 | 30 | activin A receptor type II-like 1 (ACVRL1) | 153350 | SEQ ID NO: 47 | SEQ ID NO: 48 | SEQ ID NO: 159 | SEQ ID NO: 160 | SEQ ID NO: 161 |
| AXL | 31 | AXL receptor tyrosine kinase (AXL) | 112500 | SEQ ID NO: 49 | SEQ ID NO: 50 | SEQ ID NO: 27 | SEQ ID NO: 28 | SEQ ID NO: 29 |
| PKU-ALPHA | 32 | KU-alpha, partial cds (new gene symbol Tlk2) | 109569 | | SEQ ID NO: 51 | 0 | SEQ ID NO: 5 | SEQ ID NO: 6 |
| ABCC5 | 33 | ATP-binding cassette, sub-family C (CFTR/MRP), member 5 (ABCC5) | 212366 | | SEQ ID NO: 52 | 0 | SEQ ID NO: 324 | SEQ ID NO: 325 |
| EDNRB | 34 | endothelin receptor type B (EDNRB), transcript variant 1 | 154244 | | SEQ ID NO: 53 | 0 | SEQ ID NO: 176 | SEQ ID NO: 177 |
| DTR | 35 | diphtheria toxin receptor (heparin-binding epidermal) | 187547 | | SEQ ID NO: 54 | 0 | SEQ ID NO: 265 | SEQ ID NO: 266 |
| IGF1R | 36 | insulin-like growth factor 1 receptor (IGF1R) | 150361 | | SEQ ID NO: 55 | 0 | SEQ ID NO: 129 | SEQ ID NO: 130 |
| KIAA0427 | 37 | KIAA0427 | 127507 | SEQ ID NO: 56 | SEQ ID NO: 57 | SEQ ID NO: 65 | SEQ ID NO: 66 | SEQ ID NO: 67 |
| CD69 | 38 | CD69 antigen (p60, early T-cell activation antigen) | 276727 | | SEQ ID NO: 58 | 0 | SEQ ID NO: 370 | SEQ ID NO: 371 |
| FGFR4 | 39 | fibroblast growth factor receptor 4 (FGFR4) | 116781 | SEQ ID NO: 59 | SEQ ID NO: 60 | SEQ ID NO: 36 | SEQ ID NO: 37 | SEQ ID NO: 38 |
| EST T85683 | 40 | EST T85683 cathepsin B (CTSB) | 112622 | | SEQ ID NO: 61 | 0 | SEQ ID NO: 30 | SEQ ID NO: 31 |
| EST R00569 | 41 | EST R00569 IL2-inducible T-cell kinase (ITK) | 123871 | | SEQ ID NO: 62 | 0 | SEQ ID NO: 44 | SEQ ID NO: 45 |
| TGFBR3 | 42 | transforming growth factor, beta receptor III (TGFBR3) | 208118 | SEQ ID NO: 63 | SEQ ID NO: 64 | SEQ ID NO: 311 | SEQ ID NO: 312 | SEQ ID NO: 313 |
| INSR | 43 | insulin receptor (INSR) | 151149 | | SEQ ID NO: 65 | 0 | SEQ ID NO: 131 | SEQ ID NO: 132 |
| MARK3 | 44 | MAP/microtubule affinity-regulating kinase 3 (MARK3) | 110599 | SEQ ID NO: 66 | SEQ ID NO: 67 | #N/A | #N/A | #N/A |
| TIMP2 | 45 | tissue inhibitor of metalloproteinase 2 (TIMP2) | 131504 | | SEQ ID NO: 68 | 0 | SEQ ID NO: 86 | SEQ ID NO: 87 |
| EST R85557 | 46 | EST R85557 thrombospondin 3 (THBS3) | 180219 | SEQ ID NO: 69 | | SEQ ID NO: 240 | 0 | SEQ ID NO: 241 |
| GNRH1 | 47 | gonadotropin-releasing hormone 1 (GNRH1) | 192688 | | SEQ ID NO: 70 | 0 | SEQ ID NO: 277 | SEQ ID NO: 278 |
| FGFR2 | 48 | fibroblast growth factor receptor 2 (FGFR2) | 110387 | SEQ ID NO: 71 | SEQ ID NO: 72 | SEQ ID NO: 13 | SEQ ID NO: 14 | SEQ ID NO: 15 |
| NFKB2 | 49 | NFKB2 | 114879 | SEQ ID NO: 73 | | SEQ ID NO: 35 | 0 | 0 |
| VIL2 | 50 | villin 2 (ezrin) (VIL2) | 124701 | SEQ ID NO: 74 | SEQ ID NO: 75 | SEQ ID NO: 51 | SEQ ID NO: 52 | SEQ ID NO: 53 |
| ENG | 51 | endoglin (ENG) | 156979 | SEQ ID NO: 76 | SEQ ID NO: 77 | SEQ ID NO: 196 | SEQ ID NO: 197 | SEQ ID NO: 198 |
| EPHA2 | 52 | EphA2 (EPHA2) | 162004 | SEQ ID NO: 78 | | SEQ ID NO: 221 | 0 | SEQ ID NO: 222 |
| CREM | 53 | cAMP responsive element modulator (CREM) | 258584 | SEQ ID NO: 79 | SEQ ID NO: 80 | SEQ ID NO: 358 | SEQ ID NO: 359 | SEQ ID NO: 360 |
| ETV5-a | 54 | ets variant gene 5 (ETV5) | 270549 | SEQ ID NO: 81 | SEQ ID NO: 82 | SEQ ID NO: 368 | SEQ ID NO: 369 | SEQ ID NO: 300 |
| EST N68536 | 55 | EST N68536 MAX-interacting protein 1 (MXI1) | 298242 | SEQ ID NO: 83 | SEQ ID NO: 84 | 0 | SEQ ID NO: 380 | SEQ ID NO: 381 |
| EST R81126 | 56 | EST R81126 lymphotoxin beta receptor (LTBR) | 146635 | SEQ ID NO: 85 | SEQ ID NO: 86 | SEQ ID NO: 114 | 0 | 0 |
| POU2F2 | 57 | (POU2F2) | 188393 | SEQ ID NO: 87 | SEQ ID NO: 88 | SEQ ID NO: 271 | 0 | SEQ ID NO: 272 |
| FLI1 | 58 | Friend leukemia virus integration 1 (FLI1) | 198144 | SEQ ID NO: 89 | SEQ ID NO: 90 | SEQ ID NO: 293 | SEQ ID NO: 294 | SEQ ID NO: 295 |
| TIE | 59 | tyrosine kinase with immunoglobulin and epidermal growth factor homology domains (TIE) | 144081 | | SEQ ID NO: 91 | 0 | SEQ ID NO: 109 | SEQ ID NO: 110 |
| PRLR | 60 | prolactin receptor (PRLR) | 138788 | SEQ ID NO: 92 | SEQ ID NO: 93 | SEQ ID NO: 94 | SEQ ID NO: 95 | SEQ ID NO: 96 |
| PPP3CA | 61 | protein phosphatase 3 (formerly 2B), catalytic subunit, gamma isoform (calcineurin A gamma) (PPP3CC) (ex PPP3CA) | 110481 | SEQ ID NO: 94 | SEQ ID NO: 95 | SEQ ID NO: 17 | SEQ ID NO: 18 | SEQ ID NO: 19 |
| PTPN2 | 62 | protein tyrosine phosphatase, non-receptor type 2 (PTPN2) | 161451 | SEQ ID NO: 96 | SEQ ID NO: 97 | SEQ ID NO: 218 | SEQ ID NO: 219 | SEQ ID NO: 220 |
| PGF | 63 | placental growth factor, vascular endothelial growth factor-related protein (PGF) | 139326 | | SEQ ID NO: 98 | 0 | SEQ ID NO: 102 | SEQ ID NO: 103 |
| TNFAIP3 | 64 | tumor necrosis factor, alpha-induced protein 3 (TNFAIP3) | 309943 | SEQ ID NO: 99 | | SEQ ID NO: 388 | SEQ ID NO: 389 | SEQ ID NO: 390 |
| PHB | 65 | PHB (prohibitin) | 236008 | SEQ ID NO: 100 | | SEQ ID NO: 347 | SEQ ID NO: 348 | SEQ ID NO: 349 |
| RIL | 66 | LIM domain protein (RIL) | 153446 | | SEQ ID NO: 101 | 0 | SEQ ID NO: 162 | SEQ ID NO: 163 |
| MYBL2 | 67 | v-myb avian myeloblastosis viral oncogene homolog-like 2 (MYBL2) | 207378 | SEQ ID NO: 102 | SEQ ID NO: 103 | SEQ ID NO: 308 | SEQ ID NO: 309 | SEQ ID NO: 310 |
| RELB | 68 | v-rel avian reticuloendotheliosis viral oncogene homolog B (nuclear factor of kappa light polypeptide gene enhancer in B-cells 3) (RELB) | 66969 | SEQ ID NO: 104 | SEQ ID NO: 105 | SEQ ID NO: 417 | SEQ ID NO: 418 | SEQ ID NO: 419 |
| EST R97218 | 69 | EST R97218 | 200394 | SEQ ID NO: 106 | | SEQ ID NO: 296 | SEQ ID NO: 297 | 0 |
| GZMH | 70 | granzyme B (granzyme 2, cytotoxic T-lymphocyte-associated serine esterase 1) (GZMB) (ex GZMH) | 154343 | SEQ ID NO: 107 | | SEQ ID NO: 178 | 0 | SEQ ID NO: 179 |
| MYC | 71 | c-myc proto-oncogene | 129438 | SEQ ID NO: 108 | SEQ ID NO: 109 | SEQ ID NO: 73 | SEQ ID NO: 74 | SEQ ID NO: 75 |
| CASP1 | 72 | caspase 4, apoptosis-related cysteine protease (CASP4) (ex CASP1) | 131502 | | SEQ ID NO: 110 | SEQ ID NO: 84 | 0 | SEQ ID NO: 85 |
| SYK | 73 | spleen tyrosine kinase (SYK) | 128142 | SEQ ID NO: 111 | SEQ ID NO: 112 | SEQ ID NO: 68 | SEQ ID NO: 69 | SEQ ID NO: 70 |
| EST H27202 | 74 | EST H27202 transcription factor E1AF gene | 158347 | SEQ ID NO: 113 | SEQ ID NO: 114 | SEQ ID NO: 204 | SEQ ID NO: 205 | 0 |
| HRB | 75 | syndecan 1 (SDC1) (ex HRB) | 108490 | SEQ ID NO: 115 | SEQ ID NO: 116 | SEQ ID NO: 1 | 0 | SEQ ID NO: 2 |
| SHC1 | 76 | p66shc (SHC) | 153548 | | SEQ ID NO: 117 | 0 | SEQ ID NO: 164 | SEQ ID NO: 165 |
| CSF1 | 77 | colony stimulating factor 1 (CSF1) | 124554 | SEQ ID NO: 118 | SEQ ID NO: 119 | SEQ ID NO: 48 | SEQ ID NO: 49 | SEQ ID NO: 50 |
| UBE3A | 78 | ubiquitin protein ligase E3A (UBE3A) | 141924 | | SEQ ID NO: 120 | 0 | SEQ ID NO: 104 | SEQ ID NO: 105 |
| FKHR | 79 | forkhead box O1A (rhabdomyosarcoma) (FOXO1A) (ex FKHR) | 151247 | | SEQ ID NO: 121 | 0 | SEQ ID NO: 133 | SEQ ID NO: 134 |
| CSF1R | 80 | colony stimulating factor 1 receptor (CSF1R) | 196282 | SEQ ID NO: 122 | | SEQ ID NO: 291 | 0 | SEQ ID NO: 292 |
| IFI75 | 81 | interferon-induced protein 75 (IFI75) | 205612 | SEQ ID NO: 123 | SEQ ID NO: 124 | SEQ ID NO: 305 | SEQ ID NO: 306 | SEQ ID NO: 307 |
| GATA1 | 82 | GATA-binding protein 1 (globin transcription factor 1) (GATA1) | 109093 | | SEQ ID NO: 125 | 0 | SEQ ID NO: 3 | SEQ ID NO: 4 |
| STAT1 | 83 | signal transducer and activator of transcription 1 (STAT1) | 110101 | | SEQ ID NO: 126 | 0 | SEQ ID NO: 11 | SEQ ID NO: 12 |
| CREBBP | 84 | CREB binding protein (Rubinstein-Taybi syndrome) (CREBBP) | 109677 | SEQ ID NO: 127 | SEQ ID NO: 128 | SEQ ID NO: 7 | SEQ ID NO: 8 | 0 |
| IL7R | 85 | interleukin 7 receptor (IL7R) | 129059 | | SEQ ID NO: 129 | 0 | SEQ ID NO: 71 | SEQ ID NO: 72 |
| ANXA7 | 86 | annexin A7 (ANXA7) | 160580 | | SEQ ID NO: 130 | 0 | SEQ ID NO: 214 | SEQ ID NO: 215 |
| TNXA | 87 | tenascin XA (TNXA) | 124340 | | SEQ ID NO: 131 | 0 | SEQ ID NO: 46 | SEQ ID NO: 47 |
| CNBP1 | 88 | zinc finger protein 9 (a cellular retroviral nucleic acid binding protein) (ZNF9) (ex CNBP1) | 251963 | SEQ ID NO: 132 | | SEQ ID NO: 356 | 0 | SEQ ID NO: 357 |
| CDK4-a | 89 | cyclin-dependent kinase 4 (CDK4) | 204586 | SEQ ID NO: 133 | SEQ ID NO: 134 | SEQ ID NO: 301 | SEQ ID NO: 302 | SEQ ID NO: 288 |
| CSNK2B | 90 | gene for casein kinase II subunit beta (EC 2.7.1.37) | 153879 | | SEQ ID NO: 135 | 0 | SEQ ID NO: 171 | SEQ ID NO: 172 |
| EFNA1 | 91 | ephrin-A1 (EFNA1) | 162997 | | SEQ ID NO: 136 | 0 | SEQ ID NO: 226 | SEQ ID NO: 227 |
| SELE | 92 | selectin E (endothelial adhesion molecule 1) (SELE) | 186132 | SEQ ID NO: 137 | SEQ ID NO: 138 | SEQ ID NO: 259 | SEQ ID NO: 260 | SEQ ID NO: 261 |
| APC | 93 | adenomatosis polyposis coli (APC) | 125294 | SEQ ID NO: 139 | SEQ ID NO: 140 | SEQ ID NO: 54 | SEQ ID NO: 55 | SEQ ID NO: 56 |
| FAK | 94 | PTK2 protein tyrosine kinase 2 (PTK2) (ex FAK) | 195731 | | SEQ ID NO: 141 | 0 | SEQ ID NO: 284 | SEQ ID NO: 285 |
| FOS-a | 95 | v-fos FBJ murine osteosarcoma viral oncogene homolog (FOS) | 208717 | | SEQ ID NO: 142 | 0 | SEQ ID NO: 317 | SEQ ID NO: 318 |
| FGFR1 | 96 | fibroblast growth factor receptor (FGFr) | 154472 | SEQ ID NO: 143 | SEQ ID NO: 144 | SEQ ID NO: 180 | SEQ ID NO: 181 | SEQ ID NO: 182 |
| MC1R | 97 | melanocortin 1 receptor (alpha melanocyte stimulating hormone receptor) (MC1R) | 155691 | | SEQ ID NO: 145 | 0 | SEQ ID NO: 187 | SEQ ID NO: 188 |
| PCNA | 98 | proliferating cell nuclear antigen (PCNA) | 232941 | SEQ ID NO: 146 | SEQ ID NO: 147 | SEQ ID NO: 339 | SEQ ID NO: 340 | SEQ ID NO: 341 |
| DDT | 99 | D-dopachrome tautomerase (DDT) | 132109 | SEQ ID NO: 148 | SEQ ID NO: 149 | SEQ ID NO: 88 | SEQ ID NO: 89 | SEQ ID NO: 90 |
| GRB2 | 100 | growth factor receptor-bound protein 2 (GRB2) | 172152 | SEQ ID NO: 150 | SEQ ID NO: 151 | SEQ ID NO: 230 | SEQ ID NO: 231 | SEQ ID NO: 232 |
| AMFR | 101 | autocrine motility factor receptor (AMFR) | 146280 | SEQ ID NO: 152 | SEQ ID NO: 153 | SEQ ID NO: 111 | SEQ ID NO: 112 | SEQ ID NO: 113 |
| ITGB2 | 102 | integrin, beta 2 (antigen CD18 (p95), lymphocyte function-associated antigen 1; macrophage antigen 1 (mac-1) beta subunit) (ITGB2) | 187822 | SEQ ID NO: 154 | | 0 | SEQ ID NO: 267 | SEQ ID NO: 268 |
| JUND | 103 | jun D proto-oncogene (JUND) | 175421 | SEQ ID NO: 155 | | SEQ ID NO: 233 | 0 | SEQ ID NO: 234 |
| NF45 | 104 | interleukin enhancer binding factor 2 (ILF2) (ex NF45) | 243907 | | SEQ ID NO: 156 | 0 | SEQ ID NO: 350 | SEQ ID NO: 351 |
| PPP4C | 105 | protein phosphatase 4 (formerly X) (PPP4C) | 114097 | SEQ ID NO: 157 | SEQ ID NO: 158 | SEQ ID NO: 32 | SEQ ID NO: 33 | SEQ ID NO: 34 |
| EMS1 | 106 | ATX1 (antioxidant protein 1, yeast) homolog 1 (ATOX1) (ex EMS1) | 149172 | SEQ ID NO: 159 | | SEQ ID NO: 123 | SEQ ID NO: 124 | SEQ ID NO: 125 |
| BCL2 | 107 | B-cell CLL/lymphoma 2 (BCL2), nuclear gene encoding mitochondrial protein, transcript variant alpha | 147002 | SEQ ID NO: 160 | SEQ ID NO: 161 | SEQ ID NO: 115 | SEQ ID NO: 116 | SEQ ID NO: 117 |
| MGST1 | 108 | protein phosphatase 1, catalytic subunit, alpha isoform (PPP1CA) (ex MGST1) | 182610 | SEQ ID NO: 162 | SEQ ID NO: 163 | SEQ ID NO: 248 | 0 | SEQ ID NO: 249 |
| PDGFRB | 109 | platelet-derived growth factor receptor, beta polypeptide (PDGFRB) | 158976 | | SEQ ID NO: 164 | 0 | SEQ ID NO: 208 | SEQ ID NO: 209 |
| ANXA11 | 110 | annexin A11 (ANXA11) | 158892 | | SEQ ID NO: 165 | 0 | SEQ ID NO: 206 | SEQ ID NO: 207 |
| GPX1 | 111 | histocompatibility class II antigen gamma chain (CD74) (ex GPX1 Glutathione S transferase) | 159809 | | SEQ ID NO: 166 | 0 | SEQ ID NO: 212 | SEQ ID NO: 213 |
| CFR-1 | 112 | Golgi apparatus protein 1 (GLG1) (ex CFR-1) | 153974 | SEQ ID NO: 167 | SEQ ID NO: 168 | SEQ ID NO: 173 | SEQ ID NO: 174 | SEQ ID NO: 175 |
| BTF3L3 | 113 | basic transcription factor 3 (BTF3) | 195889 | SEQ ID NO: 169 | | SEQ ID NO: 289 | 0 | SEQ ID NO: 290 |
| EST R55460 | 114 | EST R55460 | 154997 | | SEQ ID NO: 170 | 0 | SEQ ID NO: 185 | 0 |
| AKT2 | 115 | v-akt murine thymoma viral oncogene homolog 2 (AKT2) | 183552 | SEQ ID NO: 171 | | SEQ ID NO: 253 | 0 | SEQ ID NO: 254 |
| CDKN1A | 116 | cyclin-dependent kinase inhibitor (CDKN1A) | 152524 | SEQ ID NO: 172 | SEQ ID NO: 173 | SEQ ID NO: 144 | SEQ ID NO: 145 | SEQ ID NO: 146 |
| PPP2CA | 117 | protein phosphatase 2 (formerly 2A), catalytic subunit, alpha isoform (PPP2CA) | 154685 | SEQ ID NO: 174 | SEQ ID NO: 175 | 0 | SEQ ID NO: 183 | SEQ ID NO: 184 |
| MDM2 | 118 | mouse double minute 2, human homolog of; p53-binding protein (MDM2), transcript variant MDM2 | 148052 | SEQ ID NO: 176 | | 0 | SEQ ID NO: 120 | SEQ ID NO: 121 |
| TNFRSF6 | 119 | tumor necrosis factor receptor superfamily, member 6 (TNFRSF6) | 151767 | SEQ ID NO: 177 | SEQ ID NO: 178 | SEQ ID NO: 141 | SEQ ID NO: 142 | SEQ ID NO: 143 |
| CNTFR | 120 | ciliary neurotrophic factor receptor (CNTFR) | 156431 | | SEQ ID NO: 179 | 0 | SEQ ID NO: 192 | SEQ ID NO: 193 |
| JUNB | 121 | jun B proto-oncogene (JUNB) | 153213 | SEQ ID NO: 180 | SEQ ID NO: 181 | SEQ ID NO: 153 | SEQ ID NO: 154 | SEQ ID NO: 155 |
| CCND1 | 122 | cyclin D1 (PRAD1: parathyroid adenomatosis 1) (CCND1) | 110022 | SEQ ID NO: 182 | | SEQ ID NO: 9 | 0 | SEQ ID NO: 10 |
| TDPX1 | 123 | peroxiredoxin 2 (PRDX2) (ex TDPX1) | 208439 | SEQ ID NO: 183 | SEQ ID NO: 184 | SEQ ID NO: 314 | SEQ ID NO: 315 | SEQ ID NO: 316 |
| GRB7 | 124 | growth factor receptor-bound protein 7 (GRB7) | 130323 | SEQ ID NO: 185 | SEQ ID NO: 186 | SEQ ID NO: 79 | SEQ ID NO: 80 | SEQ ID NO: 81 |
| RBBP7 | 125 | retinoblastoma-binding protein 7 (RBBP7) | 210874 | SEQ ID NO: 187 | SEQ ID NO: 188 | SEQ ID NO: 319 | SEQ ID NO: 320 | SEQ ID NO: 321 |
| TIMP1 | 126 | tissue inhibitor of metalloproteinase 1 (erythroid potentiating activity, collagenase inhibitor) (TIMP1) | 162246 | SEQ ID NO: 189 | SEQ ID NO: 190 | SEQ ID NO: 223 | SEQ ID NO: 224 | SEQ ID NO: 225 |
| YES1 | 127 | v-yes-1 Yamaguchi sarcoma viral oncogene homolog 1 (YES1) | 204634 | SEQ ID NO: 191 | | SEQ ID NO: 303 | 0 | SEQ ID NO: 304 |
| RNF5 | 128 | ring finger protein 5 (RNF5) | 112098 | | SEQ ID NO: 192 | 0 | SEQ ID NO: 25 | SEQ ID NO: 26 |
| PRKCSH | 129 | protein kinase C substrate 80K-H (PRKCSH) | 187232 | | SEQ ID NO: 193 | 0 | SEQ ID NO: 263 | SEQ ID NO: 264 |
| CTSD | 130 | cathepsin D (lysosomal aspartyl protease) (CTSD) | 149401 | SEQ ID NO: 194 | SEQ ID NO: 195 | SEQ ID NO: 126 | SEQ ID NO: 127 | SEQ ID NO: 128 |
| NEO1 | 131 | neogenin (chicken) homolog 1 (NEO1) | 188380 | | SEQ ID NO: 196 | 0 | SEQ ID NO: 269 | SEQ ID NO: 270 |
| GAPD-a | 132 | glyceraldehyde-3-phosphate dehydrogenase (GAPD) | 152847 | SEQ ID NO: 197 | | SEQ ID NO: 150 | SEQ ID NO: 151 | SEQ ID NO: 152 |
| ACTG1 | 133 | actin, gamma 1 (ACTG1) | 182291 | SEQ ID NO: 198 | SEQ ID NO: 199 | SEQ ID NO: 242 | SEQ ID NO: 243 | SEQ ID NO: 244 |
| ITGA6 | 134 | integrin, alpha 6 (ITGA6) | 182431 | SEQ ID NO: 200 | SEQ ID NO: 201 | SEQ ID NO: 245 | SEQ ID NO: 246 | SEQ ID NO: 247 |
| GAPD-b | 135 | glyceraldehyde-3-phosphate dehydrogenase (GAPD) | 153607 | SEQ ID NO: 202 | SEQ ID NO: 203 | SEQ ID NO: 166 | SEQ ID NO: 167 | SEQ ID NO: 152 |
| ETV5-b | 136 | ets variant gene 5 (ets-related molecule) (ETV5) | 203394 | SEQ ID NO: 204 | SEQ ID NO: 205 | SEQ ID NO: 298 | SEQ ID NO: 299 | SEQ ID NO: 300 |
| CDK4-b | 137 | cyclin-dependent kinase 4 (CDK4) | 195800 | SEQ ID NO: 206 | SEQ ID NO: 207 | SEQ ID NO: 286 | SEQ ID NO: 287 | SEQ ID NO: 288 |
| FOS-b | 138 | v-fos FBJ murine osteosarcoma viral oncogene homolog (FOS) | 363796 | SEQ ID NO: 208 | SEQ ID NO: 209 | SEQ ID NO: 404 | SEQ ID NO: 405 | SEQ ID NO: 318 |
| HOXA5 | 139 | homeobox protein (HOX-1.3) (ex Hox A5) | 300564 | SEQ ID NO: 210 | SEQ ID NO: 211 | SEQ ID NO: 382 | SEQ ID NO: 383 | SEQ ID NO: 384 |
| RELA | 140 | NF-kappa-B transcription factor p65 DNA binding subunit (ex RELa) | 122056 | SEQ ID NO: 212 | | SEQ ID NO: 42 | 0 | SEQ ID NO: 43 |
| SU11 | 141 | S100 calcium-binding protein A11 (calgizzarin) (S100A11) | 155345 | SEQ ID NO: 213 | SEQ ID NO: 214 | SEQ ID NO: 186 | 0 | 0 |
| ANG | 142 | angiogenin, ribonuclease, RNase A family, 5 (ANG) | 156720 | | SEQ ID NO: 215 | 0 | SEQ ID NO: 194 | SEQ ID NO: 195 |
| ITGA6 | 143 | integrin, alpha 6 (ITGA6) | 182431 | SEQ ID NO: 216 | SEQ ID NO: 217 | SEQ ID NO: 245 | SEQ ID NO: 246 | SEQ ID NO: 247 |
| PRMT2 | 144 | HMT1 (hnRNP methyltransferase, S. cerevisiae)-like 1 (HRMT1L1) (ex PRMT2) | 158038 | SEQ ID NO: 218 | SEQ ID NO: 219 | SEQ ID NO: 201 | SEQ ID NO: 202 | SEQ ID NO: 203 |
| EST R55460 | 145 | EST R55460 | 154997 | | SEQ ID NO: 220 | 0 | SEQ ID NO: 185 | 0 |
| GZMA | 146 | granzyme A (granzyme 1, cytotoxic T-lymphocyte-associated serine esterase 3) (GZMA) | 356763 | SEQ ID NO: 221 | SEQ ID NO: 222 | SEQ ID NO: 402 | 0 | SEQ ID NO: 403 |
| SOX9 | 147 | SRY (sex-determining region Y)-box 9 (campomelic dysplasia, autosomal sex-reversal) (SOX9) | 323948 | SEQ ID NO: 223 | | SEQ ID NO: 394 | 0 | SEQ ID NO: 395 |
| SRF | 148 | serum response factor (c-fos serum response element-binding transcription factor) (SRF) | 321329 | | SEQ ID NO: 224 | SEQ ID NO: 391 | SEQ ID NO: 392 | SEQ ID NO: 393 |
| EDN1 | 149 | endothelin 1 (EDN1) | 153424 | SEQ ID NO: 225 | | #N/A | #N/A | #N/A |
| PTPN6 | 150 | protein tyrosine phosphatase, non-receptor type 6 (PTPN6) | 66778 | SEQ ID NO: 226 | | #N/A | #N/A | #N/A |
| TFAP4 | 151 | transcription factor AP-4 (activating enhancer binding protein 4) (TFAP4) | 159093 | SEQ ID NO: 227 | | 0 | SEQ ID NO: 210 | SEQ ID NO: 211 |
| ELF1 | 152 | Human cis-acting sequence. Elf-1 | 182007 | SEQ ID NO: 228 | | SEQ ID NO: 437 | 0 | 0 |
| CD2 | 153 | CD2 antigen (p50), sheep red blood cell receptor (CD2) | 120649 | SEQ ID NO: 229 | | SEQ ID NO: 431 | 0 | 0 |
| CCND2 | 154 | cyclin D2 (CCND2) | 175256 | SEQ ID NO: 230 | | #N/A | #N/A | #N/A |
| IL3RA | 155 | interleukin 3 receptor (hIL-3Ra) | 183087 | SEQ ID NO: 231 | | SEQ ID NO: 440 | SEQ ID NO: 441 | 0 |
| JUP | 156 | junction plakoglobin (JUP) | 157958 | SEQ ID NO: 232 | | #N/A | #N/A | #N/A |
| RBL2 | 157 | retinoblastoma-like 2 (p130) (RBL2) | 108571 | SEQ ID NO: 233 | | SEQ ID NO: 430 | 0 | 0 |
| HOXA4 | 158 | homeo box A4 (HOXA4) | 110731 | SEQ ID NO: 234 | | SEQ ID NO: 20 | SEQ ID NO: 21 | 0 |
| ACY1 | 159 | aminoacylase 1 (ACY1) | 160764 | SEQ ID NO: 235 | | SEQ ID NO: 435 | SEQ ID NO: 436 | 0 |
| GADD45A | 160 | growth arrest and DNA-damage-inducible, alpha (GADD45A) | 115176 | SEQ ID NO: 236 | | #N/A | #N/A | #N/A |
| nm23 | 161 | non-metastatic cells 1, protein (NM23A) expressed in (NME1) | 174388 | SEQ ID NO: 237 | | #N/A | #N/A | #N/A |
| BBC1 | 162 | ribosomal protein L13 (RPL13) (ex BBC1) | 178317 | SEQ ID NO: 238 | | #N/A | #N/A | #N/A |
| VEGFB | 163 | vascular endothelial growth factor B (VEGFB) | 162499 | SEQ ID NO: 239 | | #N/A | #N/A | #N/A |
| LAMR1 | 164 | laminin receptor 1 (67 kD, ribosomal protein SA) (LAMR1) | 199837 | SEQ ID NO: 240 | | #N/A | #N/A | #N/A |
| IL2RB | 165 | interleukin 2 receptor, beta (IL2RB) | 139073 | SEQ ID NO: 241 | SEQ ID NO: 242 | SEQ ID NO: 97 | SEQ ID NO: 98 | SEQ ID NO: 99 |
| DES | 166 | desmin | 153854 | SEQ ID NO: 243 | | SEQ ID NO: 168 | SEQ ID NO: 169 | SEQ ID NO: 170 |
| PRL | 167 | prolactin | 133738 | SEQ ID NO: 244 | | SEQ ID NO: 91 | SEQ ID NO: 92 | SEQ ID NO: 93 |
| CSH1 | 168 | Chorionic somatomammotropin hormone 1 (placental lactogen) = LACTOGEN Precursor | 133891 | | SEQ ID NO: 245 | SEQ ID NO: 432 | 0 | 0 |
| TEK | 169 | tyrosine protein kinase receptor | 151501 | SEQ ID NO: 246 | SEQ ID NO: 247 | SEQ ID NO: 138 | SEQ ID NO: 139 | SEQ ID NO: 140 |
| Nrg1 | 170 | neuregulin 1 (EST R72075) | 155716 | SEQ ID NO: 248 | SEQ ID NO: 249 | SEQ ID NO: 189 | SEQ ID NO: 190 | SEQ ID NO: 191 |
| PLAT | none | no EST or mRNA | 160149 | | | SEQ ID NO: 433 | SEQ ID NO: 434 | 0 |
| EST AW184517 | none | | image ? | | | | | |

Table 5 hereunder displays subpopulations of polynucleotide sequences useful for distinguishing a person without cancer from a cancer patient.

TABLE 5

| Gene symbol | No | Name | Seq3′ | Seq5′ | Ref |
|---|---|---|---|---|---|
| HRB | 1 | hiv-1 rev binding protein | SEQ ID NO: 1 | | SEQ ID NO: 2 |
| EST T81919 | 4 | ests, weakly similar to alu7_human alu subfamily sq sequence contamination warning entry [h. sapiens] | SEQ ID NO: 7 | SEQ ID NO: 8 | |
| ENPP2 | 18 | ectonucleotide pyrophosphatase/phosphodiesterase 2 (autotaxin) | SEQ ID NO: 39 | SEQ ID NO: 40 | SEQ ID NO: 41 |
| TNXB | 21 | tenascin xb | | SEQ ID NO: 46 | SEQ ID NO: 47 |
| APC | 24 | adenomatosis polyposis coli | SEQ ID NO: 54 | SEQ ID NO: 55 | SEQ ID NO: 56 |
| GATA3 | 32 | gata-binding protein 3 | SEQ ID NO: 76 | SEQ ID NO: 77 | SEQ ID NO: 78 |
| PRL | 38 | prolactin | SEQ ID NO: 91 | SEQ ID NO: 92 | SEQ ID NO: 93 |
| BCL2 | 48 | b-cell cll/lymphoma 2 | SEQ ID NO: 115 | SEQ ID NO: 116 | SEQ ID NO: 117 |
| CTSD | 53 | cathepsin d (lysosomal aspartyl protease) | SEQ ID NO: 126 | SEQ ID NO: 127 | SEQ ID NO: 128 |
| TEK | 58 | tek tyrosine kinase, endothelial (venous malformations, multiple cutaneous and mucosal) | SEQ ID NO: 138 | SEQ ID NO: 139 | SEQ ID NO: 140 |
| TNFRSF6 | 59 | tumor necrosis factor receptor superfamily, member 6 | SEQ ID NO: 141 | SEQ ID NO: 142 | SEQ ID NO: 143 |
| PLA2G2A | 61 | phospholipase a2, group iia (platelets, synovial fluid) | SEQ ID NO: 147 | SEQ ID NO: 148 | SEQ ID NO: 149 |
| CRABP2 | 64 | cellular retinoic acid-binding protein 2 | SEQ ID NO: 156 | SEQ ID NO: 157 | SEQ ID NO: 158 |
| RIL | 66 | lim domain protein | | SEQ ID NO: 162 | SEQ ID NO: 163 |
| DES | 69 | desmin | SEQ ID NO: 168 | SEQ ID NO: 169 | SEQ ID NO: 170 |
| GZMB | 73 | granzyme b (granzyme 2, cytotoxic t-lymphocyte-associated serine esterase 1) | SEQ ID NO: 178 | | SEQ ID NO: 179 |
| ETV4 | 85 | ets variant gene 4 (e1a enhancer-binding protein, e1af) | SEQ ID NO: 204 | SEQ ID NO: 205 | |
| WBSCR14 | 88 | williams-beuren syndrome chromosome region 14 | | SEQ ID NO: 210 | SEQ ID NO: 211 |
| THBS1 | 91 | thrombospondin 1 | SEQ ID NO: 216 | | SEQ ID NO: 217 |
| GRB2 | 97 | growth factor receptor-bound protein 2 | SEQ ID NO: 230 | SEQ ID NO: 231 | SEQ ID NO: 232 |
| RAD9 | 104 | rad9 (s. pombe) homolog | SEQ ID NO: 248 | | SEQ ID NO: 249 |
| ATF3 | 105 | activating transcription factor 3 | SEQ ID NO: 250 | SEQ ID NO: 251 | SEQ ID NO: 252 |
| DTR | 112 | diphtheria toxin receptor (heparin-binding epidermal growth factor-like growth factor) | | SEQ ID NO: 265 | SEQ ID NO: 266 |
| ITGB2 | 113 | integrin, beta 2 (antigen cd18 (p95), lymphocyte function-associated antigen 1; macrophage antigen 1 (mac-1) beta subunit) | | SEQ ID NO: 267 | SEQ ID NO: 268 |
| POU2F2 | 115 | pou domain, class 2, transcription factor 2 | SEQ ID NO: 271 | | SEQ ID NO: 272 |
| MYBL2 | 131 | v-myb avian myeloblastosis viral oncogene homolog-like 2 | SEQ ID NO: 308 | SEQ ID NO: 309 | SEQ ID NO: 310 |
| TGFBR3 | 132 | transforming growth factor, beta receptor iii (betaglycan, 300 kd) | SEQ ID NO: 311 | SEQ ID NO: 312 | SEQ ID NO: 313 |
| FOS | 134 | v-fos fbj murine osteosarcoma viral oncogene homolog | | SEQ ID NO: 317 | SEQ ID NO: 318 |
| ABCC5 | 137 | atp-binding cassette, sub-family c (cftr/mrp), member 5 | | SEQ ID NO: 324 | SEQ ID NO: 325 |
| MMP11 | 145 | matrix metalloproteinase 11 (stromelysin 3) | SEQ ID NO: 345 | | SEQ ID NO: 346 |
| ILF2 | 147 | interleukin enhancer binding factor 2, 45 kd | | SEQ ID NO: 350 | SEQ ID NO: 351 |
| ETV5 | 155 | ets variant gene 5 (ets-related molecule) | SEQ ID NO: 368 | SEQ ID NO: 369 | SEQ ID NO: 300 |
| RELB | 175 | v-rel avian reticuloendotheliosis viral oncogene homolog b (nuclear factor of kappa light polypeptide gene enhancer in b-cells 3) | SEQ ID NO: 417 | SEQ ID NO: 418 | SEQ ID NO: 419 |
| EST T80406 | 180 | similar to SP:S36648 S36648 RB2/P130 PROTEIN | SEQ ID NO: 430 | | |
| EST T95640 | 181 | similar to gb:M16336 T-CELL SURFACE ANTIGEN CD2 | SEQ ID NO: 431 | | |
| EST R28523 | 182 | similar to placental lactogen (CSH1) | SEQ ID NO: 432 | | |
| EST H28056 | 185 | Homo sapiens E74-like factor 1 (ets domain transcription factor) (ELF1) | SEQ ID NO: 437 | | |
| ESTs H42957 & H42888 | 187 | Human interleukin 3 receptor (hIL-3Ra) | SEQ ID NO: 440 | SEQ ID NO: 441 | |

Tables 5A and 5B hereunder display two subpopulations, corresponding to the top 5 overexpressed and the top 5 underexpressed polynucleotide sequences, that are particularly useful for distinguishing a person without cancer from a cancer patient.

TABLE 5A

Overexpressed genes: top 5

| Gene symbol | No | Name | Seq3′ | Seq5′ | Ref |
|---|---|---|---|---|---|
| GATA3 | 32 | gata-binding protein 3 | SEQ ID NO: 76 | SEQ ID NO: 77 | SEQ ID NO: 78 |
| GZMB | 73 | granzyme b (granzyme 2, cytotoxic t-lymphocyte-associated serine esterase 1) | SEQ ID NO: 178 | | SEQ ID NO: 179 |
| MYBL2 | 131 | v-myb avian myeloblastosis viral oncogene homolog-like 2 | SEQ ID NO: 308 | SEQ ID NO: 309 | SEQ ID NO: 310 |
| MMP11 | 145 | matrix metalloproteinase 11 (stromelysin 3) | SEQ ID NO: 345 | | SEQ ID NO: 346 |
| EST T95640 | 181 | similar to gb:M16336 T-CELL SURFACE ANTIGEN CD2 | SEQ ID NO: 431 | | |

TABLE 5B

Underexpressed genes: top 5

| Gene symbol | No | Name | Seq3′ | Seq5′ | Ref |
|---|---|---|---|---|---|
| PRL | 38 | prolactin | SEQ ID NO: 91 | SEQ ID NO: 92 | SEQ ID NO: 93 |
| TEK | 58 | tek tyrosine kinase, endothelial (venous malformations, multiple cutaneous and mucosal) | SEQ ID NO: 138 | SEQ ID NO: 139 | SEQ ID NO: 140 |
| PLA2G2A | 61 | phospholipase a2, group iia (platelets, synovial fluid) | SEQ ID NO: 147 | SEQ ID NO: 148 | SEQ ID NO: 149 |
| DES | 69 | desmin | SEQ ID NO: 168 | SEQ ID NO: 169 | SEQ ID NO: 170 |
| EST R28523 | 182 | similar to placental lactogen (CSH1) | SEQ ID NO: 432 | | |
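As a purely illustrative sketch of how the two gene sets of Tables 5A and 5B might be combined into a single number per sample, the following computes the mean log ratio of the overexpressed set minus that of the underexpressed set. The scoring rule, the function name `signature_score`, the input format (log2 ratios of sample to reference intensity), and the example values are all assumptions made for illustration; this disclosure does not prescribe this calculation or any decision threshold.

```python
from statistics import mean

# Gene symbols as listed in Tables 5A and 5B. The scoring rule below is an
# illustrative assumption, not a method described in this disclosure.
OVEREXPRESSED = ["GATA3", "GZMB", "MYBL2", "MMP11", "EST T95640"]
UNDEREXPRESSED = ["PRL", "TEK", "PLA2G2A", "DES", "EST R28523"]


def signature_score(log_ratios):
    """Mean log ratio of the overexpressed set minus that of the underexpressed set.

    log_ratios maps a gene symbol to log2(sample / reference).
    Genes absent from the mapping are skipped.
    """
    up = [log_ratios[g] for g in OVEREXPRESSED if g in log_ratios]
    down = [log_ratios[g] for g in UNDEREXPRESSED if g in log_ratios]
    if not up or not down:
        raise ValueError("need at least one measured gene from each set")
    return mean(up) - mean(down)


# Hypothetical measurements for one sample (only some genes measured).
sample = {"GATA3": 1.2, "MMP11": 0.8, "PRL": -1.0, "DES": -0.6}
score = signature_score(sample)
print(score)
```

A larger score means the sample leans toward the tumor-like pattern of the two tables; any cut-off would have to be calibrated on reference samples.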

Table 6 hereunder relates to subpopulations of polynucleotide sequences interesting to detect hormone-sensitive tumors allowing distinction between ER+ and ER− samples.

TABLE 6

| Gene symbol | No | Name | Seq3′ | Seq5′ | Ref |
|---|---|---|---|---|---|
| SOX4 | 11 | sry (sex determining region y)-box 4 | SEQ ID NO: 22 | SEQ ID NO: 23 | SEQ ID NO: 24 |
| IGF2 | 26 | insulin-like growth factor 2 (somatomedin a) | SEQ ID NO: 59 | SEQ ID NO: 60 | SEQ ID NO: 61 |
| GATA3 | 32 | gata-binding protein 3 | SEQ ID NO: 76 | SEQ ID NO: 77 | SEQ ID NO: 78 |
| TOP2B | 34 | topoisomerase (dna) ii beta (180 kd) | | SEQ ID NO: 82 | SEQ ID NO: 83 |
| IL2RB | 40 | interleukin 2 receptor, beta | SEQ ID NO: 97 | SEQ ID NO: 98 | SEQ ID NO: 99 |
| EGFR | 57 | epidermal growth factor receptor (avian erythroblastic leukemia viral (v-erb-b) oncogene homolog) | SEQ ID NO: 135 | SEQ ID NO: 136 | SEQ ID NO: 137 |
| CRABP2 | 64 | cellular retinoic acid-binding protein 2 | SEQ ID NO: 156 | SEQ ID NO: 157 | SEQ ID NO: 158 |
| S100B | 107 | s100 calcium-binding protein, beta (neural) | | SEQ ID NO: 255 | SEQ ID NO: 256 |
| IL2RG | 119 | interleukin 2 receptor, gamma (severe combined immunodeficiency) | SEQ ID NO: 279 | SEQ ID NO: 280 | SEQ ID NO: 281 |
| KIAA1075 | 136 | kiaa1075 protein | SEQ ID NO: 322 | SEQ ID NO: 323 | |
| MST1 | 140 | macrophage stimulating 1 (hepatocyte growth factor-like) | SEQ ID NO: 331 | SEQ ID NO: 332 | SEQ ID NO: 333 |
| GSTP1 | 141 | glutathione s-transferase pi | SEQ ID NO: 334 | SEQ ID NO: 335 | SEQ ID NO: 336 |
| MMP11 | 145 | matrix metalloproteinase 11 (stromelysin 3) | SEQ ID NO: 345 | | SEQ ID NO: 346 |
| FLJ11307 | 148 | hypothetical protein flj11307 | SEQ ID NO: 352 | | SEQ ID NO: 353 |
| MYB | 149 | v-myb avian myeloblastosis viral oncogene homolog | | SEQ ID NO: 354 | SEQ ID NO: 355 |
| XBP1 | 162 | x-box binding protein 1 | SEQ ID NO: 385 | SEQ ID NO: 386 | SEQ ID NO: 387 |
| SOX9 | 165 | sry (sex determining region y)-box 9 (campomelic dysplasia, autosomal sex-reversal) | SEQ ID NO: 394 | | SEQ ID NO: 395 |
| GZMA | 169 | granzyme a (granzyme 1, cytotoxic t-lymphocyte-associated serine esterase 3) | SEQ ID NO: 402 | | SEQ ID NO: 403 |
| CD3G | 174 | cd3g antigen, gamma polypeptide (tit3 complex) | SEQ ID NO: 414 | SEQ ID NO: 415 | SEQ ID NO: 416 |
| EST H57912 | 188 | Human tumor protein p53 (Li-Fraumeni syndrome) (TP53) | SEQ ID NO: 442 | | |

Tables 6A and 6B hereunder relate to two subpopulations of polynucleotide sequences particularly useful for detecting hormone-sensitive tumors, allowing distinction between ER+ and ER− samples.

TABLE 6A

Overexpressed genes: top 5 ER+/ER−

| Gene symbol | CL No | Name | Seq3′ | Seq5′ | Ref |
|---|---|---|---|---|---|
| GATA3 | 32 | gata-binding protein 3 | SEQ ID NO: 76 | SEQ ID NO: 77 | SEQ ID NO: 78 |
| KIAA1075 | 136 | kiaa1075 protein | SEQ ID NO: 322 | SEQ ID NO: 323 | |
| MMP11 | 145 | matrix metalloproteinase 11 (stromelysin 3) | SEQ ID NO: 345 | | SEQ ID NO: 346 |
| MYB | 149 | v-myb avian myeloblastosis viral oncogene homolog | | SEQ ID NO: 354 | SEQ ID NO: 355 |
| GZMA | 169 | granzyme a (granzyme 1, cytotoxic t-lymphocyte-associated serine esterase 3) | SEQ ID NO: 402 | | SEQ ID NO: 403 |

TABLE 6B

Underexpressed genes: top 5

| Gene symbol | No | Name | Seq3′ | Seq5′ | Ref |
|---|---|---|---|---|---|
| SOX4 | 11 | sry (sex determining region y)-box 4 | SEQ ID NO: 22 | SEQ ID NO: 23 | SEQ ID NO: 24 |
| IL2RB | 40 | interleukin 2 receptor, beta | SEQ ID NO: 97 | SEQ ID NO: 98 | SEQ ID NO: 99 |
| EGFR | 57 | epidermal growth factor receptor (avian erythroblastic leukemia viral (v-erb-b) oncogene homolog) | SEQ ID NO: 135 | SEQ ID NO: 136 | SEQ ID NO: 137 |
| IL2RG | 119 | interleukin 2 receptor, gamma (severe combined immunodeficiency) | SEQ ID NO: 279 | SEQ ID NO: 280 | SEQ ID NO: 281 |
| CD3G | 174 | cd3g antigen, gamma polypeptide (tit3 complex) | SEQ ID NO: 414 | SEQ ID NO: 415 | SEQ ID NO: 416 |

Table 7 hereunder relates to subpopulations of polynucleotide sequences useful for distinguishing tumors in which a lymph node has been invaded by a tumor cell from tumors in which a lymph node has not been so invaded.

TABLE 7

| Gene symbol | CL No | Name | Seq3′ | Seq5′ | Ref |
|---|---|---|---|---|---|
| EST T89980 | 8 | ests | SEQ ID NO: 16 | | |
| SOX4 | 11 | sry (sex determining region y)-box 4 | SEQ ID NO: 22 | SEQ ID NO: 23 | SEQ ID NO: 24 |
| ENPP2 | 18 | ectonucleotide pyrophosphatase/phosphodiesterase 2 (autotaxin) | SEQ ID NO: 39 | SEQ ID NO: 40 | SEQ ID NO: 41 |
| MUC1 | 25 | mucin 1, transmembrane | | SEQ ID NO: 57 | SEQ ID NO: 58 |
| GATA3 | 32 | gata-binding protein 3 | SEQ ID NO: 76 | SEQ ID NO: 77 | SEQ ID NO: 78 |
| TOP2B | 34 | topoisomerase (dna) ii beta (180 kd) | | SEQ ID NO: 82 | SEQ ID NO: 83 |
| IL2RB | 40 | interleukin 2 receptor, beta | SEQ ID NO: 97 | SEQ ID NO: 98 | SEQ ID NO: 99 |
| ERBB2 | 49 | v-erb-b2 avian erythroblastic leukemia viral oncogene homolog 2 (neuro/glioblastoma derived oncogene homolog) | | SEQ ID NO: 118 | SEQ ID NO: 119 |
| EGFR | 57 | epidermal growth factor receptor (avian erythroblastic leukemia viral (v-erb-b) oncogene homolog) | SEQ ID NO: 135 | SEQ ID NO: 136 | SEQ ID NO: 137 |
| THBS1 | 91 | thrombospondin 1 | SEQ ID NO: 216 | | SEQ ID NO: 217 |
| PPP2R2C | 100 | protein phosphatase 2 (formerly 2a), regulatory subunit b (pr 52), gamma isoform | SEQ ID NO: 238 | SEQ ID NO: 239 | |
| ATF3 | 105 | activating transcription factor 3 | SEQ ID NO: 250 | SEQ ID NO: 251 | SEQ ID NO: 252 |
| KIAA1075 | 136 | kiaa1075 protein | SEQ ID NO: 322 | SEQ ID NO: 323 | |
| CDH1 | 138 | cadherin 1, type 1, e-cadherin (epithelial) | SEQ ID NO: 326 | SEQ ID NO: 327 | SEQ ID NO: 328 |
| ZNF144 | 139 | zinc finger protein 144 (mel-18) | | SEQ ID NO: 329 | SEQ ID NO: 330 |
| GSTP1 | 141 | glutathione s-transferase pi | SEQ ID NO: 334 | SEQ ID NO: 335 | SEQ ID NO: 336 |
| CD44 | 158 | cd44 antigen (homing function and indian blood group system) | SEQ ID NO: 374 | SEQ ID NO: 375 | SEQ ID NO: 376 |
| GZMA | 169 | granzyme a (granzyme 1, cytotoxic t-lymphocyte-associated serine esterase 3) | SEQ ID NO: 402 | | SEQ ID NO: 403 |
| EST T80406 | 180 | similar to SP:S36648 S36648 RB2/P130 PROTEIN | SEQ ID NO: 430 | | |
| ESTs H30141 & H27466 | 186 | Homo sapiens selectin P | SEQ ID NO: 438 | SEQ ID NO: 439 | |

Tables 7A and 7B hereunder relate to two subpopulations of polynucleotide sequences particularly interesting to distinguish tumors in which a lymph node has been invaded by a tumor cell from tumors in which a lymph node has not been so invaded.

TABLE 7A

Overexpressed genes: top 5

Gene symbol	No	Name	Seq3′	Seq5′	Ref
ENPP2	18	ectonucleotide pyrophosphatase/phosphodiesterase 2 (autotaxin)	SEQ ID NO: 39	SEQ ID NO: 40	SEQ ID NO: 41
GATA3	32	gata-binding protein 3	SEQ ID NO: 76	SEQ ID NO: 77	SEQ ID NO: 78
EGFR	57	epidermal growth factor receptor (avian erythroblastic leukemia viral (v-erb-b) oncogene homolog)	SEQ ID NO: 135	SEQ ID NO: 136	SEQ ID NO: 137
THBS1	91	thrombospondin 1	SEQ ID NO: 216		SEQ ID NO: 217
ATF3	105	activating transcription factor 3	SEQ ID NO: 250	SEQ ID NO: 251	SEQ ID NO: 252

TABLE 7B

Underexpressed genes: top 5

Gene symbol	No	Name	Seq3′	Seq5′	Ref
SOX4	11	sry (sex determining region y)-box 4	SEQ ID NO: 22	SEQ ID NO: 23	SEQ ID NO: 24
IL2RB	40	interleukin 2 receptor, beta	SEQ ID NO: 97	SEQ ID NO: 98	SEQ ID NO: 99
ERBB2	49	v-erb-b2 avian erythroblastic leukemia viral oncogene homolog 2 (neuro/glioblastoma derived oncogene homolog)		SEQ ID NO: 118	SEQ ID NO: 119
PPP2R2C	100	protein phosphatase 2 (formerly 2a), regulatory subunit b (pr 52), gamma isoform	SEQ ID NO: 238	SEQ ID NO: 239
GSTP1	141	glutathione s-transferase pi	SEQ ID NO: 334	SEQ ID NO: 335	SEQ ID NO: 336

Table 8 hereunder relates to subpopulations of polynucleotide sequences particularly interesting to distinguish tumors sensitive to anthracycline from tumors insensitive to anthracycline.

TABLE 8

A1/A2

Gene symbol	No	Name	Seq3′	Seq5′	Ref
SOX4	11	sry (sex determining region y)-box 4	SEQ ID NO: 22	SEQ ID NO: 23	SEQ ID NO: 24
CSF1	22	colony stimulating factor 1 (macrophage)	SEQ ID NO: 48	SEQ ID NO: 49	SEQ ID NO: 50
VIL2	23	villin 2 (ezrin)	SEQ ID NO: 51	SEQ ID NO: 52	SEQ ID NO: 53
IGF2	26	insulin-like growth factor 2 (somatomedin a)	SEQ ID NO: 59	SEQ ID NO: 60	SEQ ID NO: 61
KIAA0427	28	kiaa0427 gene product	SEQ ID NO: 65	SEQ ID NO: 66	SEQ ID NO: 67
MYC	31	v-myc avian myelocytomatosis viral oncogene homolog	SEQ ID NO: 73	SEQ ID NO: 74	SEQ ID NO: 75
GATA3	32	gata-binding protein 3	SEQ ID NO: 76	SEQ ID NO: 77	SEQ ID NO: 78
TOP2B	34	topoisomerase (dna) ii beta (180 kd)		SEQ ID NO: 82	SEQ ID NO: 83
ERBB2	49	v-erb-b2 avian erythroblastic leukemia viral oncogene homolog 2 (neuro/glioblastoma derived oncogene homolog)		SEQ ID NO: 118	SEQ ID NO: 119
EGFR	57	epidermal growth factor receptor (avian erythroblastic leukemia viral (v-erb-b) oncogene homolog)	SEQ ID NO: 135	SEQ ID NO: 136	SEQ ID NO: 137
CRABP2	64	cellular retinoic acid-binding protein 2	SEQ ID NO: 156	SEQ ID NO: 157	SEQ ID NO: 158
GZMB	73	granzyme b (granzyme 2, cytotoxic t-lymphocyte-associated serine esterase 1)	SEQ ID NO: 178		SEQ ID NO: 179
IGKC	77	immunoglobulin kappa constant	SEQ ID NO: 186
ANG	81	angiogenin, ribonuclease, rnase a family, 5		SEQ ID NO: 194	SEQ ID NO: 195
EFNA1	95	ephrin-a1		SEQ ID NO: 226	SEQ ID NO: 227
MYBL2	131	v-myb avian myeloblastosis viral oncogene homolog-like 2	SEQ ID NO: 308	SEQ ID NO: 309	SEQ ID NO: 310
CDH1	138	cadherin 1, type 1, e-cadherin (epithelial)	SEQ ID NO: 326	SEQ ID NO: 327	SEQ ID NO: 328
MST1	140	macrophage stimulating 1 (hepatocyte growth factor-like)	SEQ ID NO: 331	SEQ ID NO: 332	SEQ ID NO: 333
MYB	149	v-myb avian myeloblastosis viral oncogene homolog		SEQ ID NO: 354	SEQ ID NO: 355
XBP1	162	x-box binding protein 1	SEQ ID NO: 385	SEQ ID NO: 386	SEQ ID NO: 387
SRF	164	serum response factor (c-fos serum response element-binding transcription factor)	SEQ ID NO: 391	SEQ ID NO: 392	SEQ ID NO: 393
SOX9	165	sry (sex determining region y)-box 9 (campomelic dysplasia, autosomal sex-reversal)	SEQ ID NO: 394		SEQ ID NO: 395
ESTs H21879 & H21880	183	Homo sapiens plasminogen activator (PLAT)	SEQ ID NO: 433	SEQ ID NO: 434

Tables 8A and 8B hereunder relate to two subpopulations of polynucleotide sequences particularly interesting to distinguish tumors sensitive to anthracycline from tumors insensitive to anthracycline.

TABLE 8A

Overexpressed genes: top 5

Gene symbol	No	Name	Seq3′	Seq5′	Ref
GATA3	32	gata-binding protein 3	SEQ ID NO: 76	SEQ ID NO: 77	SEQ ID NO: 78
KIAA1075	136	kiaa1075 protein	SEQ ID NO: 322	SEQ ID NO: 323
MMP11	145	matrix metalloproteinase 11 (stromelysin 3)	SEQ ID NO: 345		SEQ ID NO: 346
MYB	149	v-myb avian myeloblastosis viral oncogene homolog		SEQ ID NO: 354	SEQ ID NO: 355
GZMA	169	granzyme a (granzyme 1, cytotoxic t-lymphocyte-associated serine esterase 3)	SEQ ID NO: 402		SEQ ID NO: 403

TABLE 8B

Underexpressed genes: top 5

Gene symbol	No	Name	Seq3′	Seq5′	Ref
SOX4	11	sry (sex determining region y)-box 4	SEQ ID NO: 22	SEQ ID NO: 23	SEQ ID NO: 24
IL2RB	40	interleukin 2 receptor, beta	SEQ ID NO: 97	SEQ ID NO: 98	SEQ ID NO: 99
EGFR	57	epidermal growth factor receptor (avian erythroblastic leukemia viral (v-erb-b) oncogene homolog)	SEQ ID NO: 135	SEQ ID NO: 136	SEQ ID NO: 137
IL2RG	119	interleukin 2 receptor, gamma (severe combined immunodeficiency)	SEQ ID NO: 279	SEQ ID NO: 280	SEQ ID NO: 281
CD3G	174	cd3g antigen, gamma polypeptide (tit3 complex)	SEQ ID NO: 414	SEQ ID NO: 415	SEQ ID NO: 416

Tables 9, 9A and 9B hereunder relate to subpopulations of polynucleotide sequences particularly interesting in classifying good and poor prognosis primary breast tumors.

TABLE 9

Gene symbol	SET No	Name	Seq3′	Seq5′	Ref
CTSB	14	cathepsin b		SEQ ID NO: 30	SEQ ID NO: 31
VIL2	23	villin 2 (ezrin)	SEQ ID NO: 51	SEQ ID NO: 52	SEQ ID NO: 53
MUC1	25	mucin 1, transmembrane		SEQ ID NO: 57	SEQ ID NO: 58
EMR1	27	egf-like module containing, mucin-like hormone receptor-like sequence 1	SEQ ID NO: 62	SEQ ID NO: 63	SEQ ID NO: 64
KIAA0427	28	kiaa0427 gene product	SEQ ID NO: 65	SEQ ID NO: 66	SEQ ID NO: 67
GATA3	32	gata-binding protein 3	SEQ ID NO: 76	SEQ ID NO: 77	SEQ ID NO: 78
PRLR	39	prolactin receptor	SEQ ID NO: 94	SEQ ID NO: 95	SEQ ID NO: 96
GATA3	41	gata-binding protein 3	SEQ ID NO: 100	SEQ ID NO: 101	SEQ ID NO: 78
TC21	44	oncogene tc21	SEQ ID NO: 106	SEQ ID NO: 107	SEQ ID NO: 108
BCL2	48	b-cell cll/lymphoma 2	SEQ ID NO: 115	SEQ ID NO: 116	SEQ ID NO: 117
GATA3	51	gata-binding protein 3	SEQ ID NO: 122		SEQ ID NO: 78
CRABP2	64	cellular retinoic acid-binding protein 2	SEQ ID NO: 156	SEQ ID NO: 157	SEQ ID NO: 158
ANG	81	angiogenin, ribonuclease, rnase a family, 5		SEQ ID NO: 194	SEQ ID NO: 195
EGF	83	epidermal growth factor (beta-urogastrone)	SEQ ID NO: 199		SEQ ID NO: 200
THBS1	91	thrombospondin 1	SEQ ID NO: 216		SEQ ID NO: 217
EDNRA	96	endothelin receptor type a	SEQ ID NO: 228		SEQ ID NO: 229
SMARCA2	99	swi/snf related, matrix associated, actin dependent regulator of chromatin, subfamily a, member 2	SEQ ID NO: 235	SEQ ID NO: 236	SEQ ID NO: 237
ABCB1	108	atp-binding cassette, sub-family b (mdr/tap), member 1	SEQ ID NO: 257		SEQ ID NO: 258
EGF	110	epidermal growth factor (beta-urogastrone)	SEQ ID NO: 262		SEQ ID NO: 200
BIRC4	116	baculoviral iap repeat-containing 4	SEQ ID NO: 273		SEQ ID NO: 274
DAP3	117	death associated protein 3	SEQ ID NO: 275		SEQ ID NO: 276
GNRH1	118	gonadotropin-releasing hormone 1 (luteinizing-releasing hormone)		SEQ ID NO: 277	SEQ ID NO: 278
DAP3	120	death associated protein 3	SEQ ID NO: 282	SEQ ID NO: 283	SEQ ID NO: 276
EST R97218	126	ests, highly similar to tvhume hepatocyte growth factor receptor precursor [h. sapiens]	SEQ ID NO: 296	SEQ ID NO: 297
BCL2	142	b-cell cll/lymphoma 2	SEQ ID NO: 337	SEQ ID NO: 338	SEQ ID NO: 117
BS69	144	adenovirus 5 e1a binding protein	SEQ ID NO: 342	SEQ ID NO: 343	SEQ ID NO: 344
MYB	149	v-myb avian myeloblastosis viral oncogene homolog		SEQ ID NO: 354	SEQ ID NO: 355
CTSB	152	cathepsin b	SEQ ID NO: 361		SEQ ID NO: 31
MLANA	153	melan-a	SEQ ID NO: 362	SEQ ID NO: 363	SEQ ID NO: 364
APR-1	154	apr-1 protein	SEQ ID NO: 365	SEQ ID NO: 366	SEQ ID NO: 367
TC21	157	oncogene tc21	SEQ ID NO: 372	SEQ ID NO: 373	SEQ ID NO: 108
CDKN3	159	cyclin-dependent kinase inhibitor 3 (cdk2-associated dual specificity phosphatase)	SEQ ID NO: 377	SEQ ID NO: 378	SEQ ID NO: 379
XBP1	162	x-box binding protein 1	SEQ ID NO: 385	SEQ ID NO: 386	SEQ ID NO: 387
CDH15	166	cadherin 15, m-cadherin (myotubule)	SEQ ID NO: 396	SEQ ID NO: 397	SEQ ID NO: 398
BCL2	167	b-cell cll/lymphoma 2	SEQ ID NO: 399	SEQ ID NO: 400	SEQ ID NO: 117
EST W73386	168	ests	SEQ ID NO: 401
ILF1	171	interleukin enhancer binding factor 1	SEQ ID NO: 406	SEQ ID NO: 407	SEQ ID NO: 408
ARHGDIA	172	rho gdp dissociation inhibitor (gdi) alpha	SEQ ID NO: 409	SEQ ID NO: 410	SEQ ID NO: 411
C4A	173	complement component 4a	SEQ ID NO: 412		SEQ ID NO: 413
ESR1	176	estrogen receptor 1	SEQ ID NO: 420	SEQ ID NO: 421	SEQ ID NO: 422
PBX1	177	pre-b-cell leukemia transcription factor 1	SEQ ID NO: 423	SEQ ID NO: 424	SEQ ID NO: 425
GLI3	178	gli-kruppel family member gli3 (greig cephalopolysyndactyly syndrome)	SEQ ID NO: 426	SEQ ID NO: 427	SEQ ID NO: 428
ILF1	179	interleukin enhancer binding factor 1	SEQ ID NO: 429		SEQ ID NO: 408
ESTs H24628 & H24592	184	Homo sapiens aminoacylase 1 (ACY1)	SEQ ID NO: 435	SEQ ID NO: 436
EST H28056	185	Homo sapiens E74-like factor 1 (ets domain transcription factor) (ELF1)	SEQ ID NO: 437

TABLE 9A

Gene symbol	SET No	Name	Seq3′	Seq5′	Ref
VIL2	23	villin 2 (ezrin)	SEQ ID NO: 51	SEQ ID NO: 52	SEQ ID NO: 53
MUC1	25	mucin 1, transmembrane		SEQ ID NO: 57	SEQ ID NO: 58
GATA3	32	gata-binding protein 3	SEQ ID NO: 76	SEQ ID NO: 77	SEQ ID NO: 78
GATA3	41	gata-binding protein 3	SEQ ID NO: 100	SEQ ID NO: 101	SEQ ID NO: 78
BCL2	48	b-cell cll/lymphoma 2	SEQ ID NO: 115	SEQ ID NO: 116	SEQ ID NO: 117
GATA3	51	gata-binding protein 3	SEQ ID NO: 122		SEQ ID NO: 78
CRABP2	64	cellular retinoic acid-binding protein 2	SEQ ID NO: 156	SEQ ID NO: 157	SEQ ID NO: 158
ANG	81	angiogenin, ribonuclease, rnase a family, 5		SEQ ID NO: 194	SEQ ID NO: 195
EGF	83	epidermal growth factor (beta-urogastrone)	SEQ ID NO: 199		SEQ ID NO: 200
THBS1	91	thrombospondin 1	SEQ ID NO: 216		SEQ ID NO: 217
SMARCA2	99	swi/snf related, matrix associated, actin dependent regulator of chromatin, subfamily a, member 2	SEQ ID NO: 235	SEQ ID NO: 236	SEQ ID NO: 237
EGF	110	epidermal growth factor (beta-urogastrone)	SEQ ID NO: 262		SEQ ID NO: 200
BIRC4	116	baculoviral iap repeat-containing 4	SEQ ID NO: 273		SEQ ID NO: 274
BCL2	142	b-cell cll/lymphoma 2	SEQ ID NO: 337	SEQ ID NO: 338	SEQ ID NO: 117
BS69	144	adenovirus 5 e1a binding protein	SEQ ID NO: 342	SEQ ID NO: 343	SEQ ID NO: 344
MYB	149	v-myb avian myeloblastosis viral oncogene homolog		SEQ ID NO: 354	SEQ ID NO: 355
XBP1	162	x-box binding protein 1	SEQ ID NO: 385	SEQ ID NO: 386	SEQ ID NO: 387
BCL2	167	b-cell cll/lymphoma 2	SEQ ID NO: 399	SEQ ID NO: 400	SEQ ID NO: 117
ILF1	171	interleukin enhancer binding factor 1	SEQ ID NO: 406	SEQ ID NO: 407	SEQ ID NO: 408
ARHGDIA	172	rho gdp dissociation inhibitor (gdi) alpha	SEQ ID NO: 409	SEQ ID NO: 410	SEQ ID NO: 411
C4A	173	complement component 4a	SEQ ID NO: 412		SEQ ID NO: 413
ESR1	176	estrogen receptor 1	SEQ ID NO: 420	SEQ ID NO: 421	SEQ ID NO: 422
PBX1	177	pre-b-cell leukemia transcription factor 1	SEQ ID NO: 423	SEQ ID NO: 424	SEQ ID NO: 425
GLI3	178	gli-kruppel family member gli3 (greig cephalopolysyndactyly syndrome)	SEQ ID NO: 426	SEQ ID NO: 427	SEQ ID NO: 428
ILF1	179	interleukin enhancer binding factor 1	SEQ ID NO: 429		SEQ ID NO: 408
ESTs H24628 & H24592	184	Homo sapiens aminoacylase 1 (ACY1)	SEQ ID NO: 435	SEQ ID NO: 436
EST H28056	185	Homo sapiens E74-like factor 1 (ets domain transcription factor) (ELF1)	SEQ ID NO: 437

TABLE 9B

Gene symbol	SET No	Name	Seq3′	Seq5′	Ref
GATA3	51	gata-binding protein 3	SEQ ID NO: 122		SEQ ID NO: 78
CRABP2	64	cellular retinoic acid-binding protein 2	SEQ ID NO: 156	SEQ ID NO: 157	SEQ ID NO: 158
ANG	81	angiogenin, ribonuclease, rnase a family, 5		SEQ ID NO: 194	SEQ ID NO: 195
EGF	83	epidermal growth factor (beta-urogastrone)	SEQ ID NO: 199		SEQ ID NO: 200
THBS1	91	thrombospondin 1	SEQ ID NO: 216		SEQ ID NO: 217
SMARCA2	99	swi/snf related, matrix associated, actin dependent regulator of chromatin, subfamily a, member 2	SEQ ID NO: 235	SEQ ID NO: 236	SEQ ID NO: 237
EGF	110	epidermal growth factor (beta-urogastrone)	SEQ ID NO: 262		SEQ ID NO: 200
BIRC4	116	baculoviral iap repeat-containing 4	SEQ ID NO: 273		SEQ ID NO: 274
BCL2	142	b-cell cll/lymphoma 2	SEQ ID NO: 337	SEQ ID NO: 338	SEQ ID NO: 117
BS69	144	adenovirus 5 e1a binding protein	SEQ ID NO: 342	SEQ ID NO: 343	SEQ ID NO: 344
MYB	149	v-myb avian myeloblastosis viral oncogene homolog		SEQ ID NO: 354	SEQ ID NO: 355
XBP1	162	x-box binding protein 1	SEQ ID NO: 385	SEQ ID NO: 386	SEQ ID NO: 387
BCL2	167	b-cell cll/lymphoma 2	SEQ ID NO: 399	SEQ ID NO: 400	SEQ ID NO: 117
ILF1	171	interleukin enhancer binding factor 1	SEQ ID NO: 406	SEQ ID NO: 407	SEQ ID NO: 408
ARHGDIA	172	rho gdp dissociation inhibitor (gdi) alpha	SEQ ID NO: 409	SEQ ID NO: 410	SEQ ID NO: 411
C4A	173	complement component 4a	SEQ ID NO: 412		SEQ ID NO: 413
ESR1	176	estrogen receptor 1	SEQ ID NO: 420	SEQ ID NO: 421	SEQ ID NO: 422
PBX1	177	pre-b-cell leukemia transcription factor 1	SEQ ID NO: 423	SEQ ID NO: 424	SEQ ID NO: 425
GLI3	178	gli-kruppel family member gli3 (greig cephalopolysyndactyly syndrome)	SEQ ID NO: 426	SEQ ID NO: 427	SEQ ID NO: 428
ILF1	179	interleukin enhancer binding factor 1	SEQ ID NO: 429		SEQ ID NO: 408
ESTs H24628 & H24592	184	Homo sapiens aminoacylase 1 (ACY1)	SEQ ID NO: 435	SEQ ID NO: 436
EST H28056	185	Homo sapiens E74-like factor 1 (ets domain transcription factor) (ELF1)	SEQ ID NO: 437

So, a preferred DNA array comprises at least one polynucleotide sequence selected among those included in each one of predefined polynucleotide sequences indicated in Table 9A and at least one polynucleotide sequence selected among those included in each one of predefined polynucleotide sequences indicated in Table 9B.

Such DNA arrays are particularly useful to distinguish patients having a high risk (bad result) from those having a good prognosis (good result).
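By way of illustration only, the composition rule stated above can be checked programmatically. The following Python sketch assumes one possible reading of that rule, namely that an array qualifies when it carries at least one sequence from every predefined set (row) of a table. The function names, the data layout, and the three-row excerpt of Table 9B are hypothetical choices made for this example and are not part of the disclosed method.

```python
# Illustrative excerpt only: three rows copied from Table 9B.
# Keys are SET numbers; values are the SEQ ID NO values listed for that set.
TABLE_9B_EXCERPT = {
    51: {122, 78},        # GATA3
    64: {156, 157, 158},  # CRABP2
    81: {194, 195},       # ANG
}


def covers_every_set(array_seq_ids, table):
    """Return True if the array holds at least one SEQ ID from every set in `table`.

    This encodes an assumed reading of "at least one polynucleotide sequence
    selected among those included in each one of" the predefined sets.
    """
    present = set(array_seq_ids)
    return all(present & seq_ids for seq_ids in table.values())


def missing_sets(array_seq_ids, table):
    """Return the sorted SET numbers that have no sequence on the array."""
    present = set(array_seq_ids)
    return sorted(no for no, seq_ids in table.items() if not present & seq_ids)
```

Under this reading, an array spotted with SEQ ID NO: 78, 156 and 195 would cover all three excerpted sets, whereas an array lacking both SEQ ID NO: 194 and 195 would be reported as missing set 81.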

REFERENCES

1. DeRisi, J., Penland, L., Brown, P. O., Bittner, M. L., Meltzer, P. S., Ray, M., Chen, Y., Su, Y. A., and Trent, J. M. (1996) Use of a cDNA microarray to analyze gene expression patterns in human cancer. Nat Genet, 14, 457-460.

2. Jordan, B. R. (1998) Large-scale expression measurement by hybridization methods: from high-density membranes to “DNA chips”. J Biochem (Tokyo), 124, 251-258.

3. Nguyen, C., Rocha, D., Granjeaud, S., Baldit, M., Bernard, K., Naquet, P., and Jordan, B. R. (1995) Differential gene expression in the murine thymus assayed by quantitative hybridization of arrayed cDNA clones. Genomics, 29, 207-216.

4. Bertucci, F., Van Hulst, S., Bernard, K., Loriod, B., Granjeaud, S., Tagett, R., Starkey, M., Nguyen, C., Jordan, B., and Birnbaum, D. (1999) Expression scanning of an array of growth control genes in human tumor cell lines. Oncogene, 18, 3905-3912.

5. Bertucci, F., Bernard, K., Loriod, B., Chang, Y. C., Granjeaud, S., Birnbaum, D., Nguyen, C., Peck, K., and Jordan, B. R. (1999) Sensitivity issues in DNA array-based expression measurements and performance of nylon microarrays for small samples. Hum Mol Genet, 8, 1715-1722.

6. Ross, J. S. and Fletcher, J. A. (1999) The HER-2/neu oncogene: prognostic factor, predictive factor and target for therapy. Semin Cancer Biol, 9, 125-138.

7. Scorilas, A., Trangas, T., Yotis, J., Pateras, C., and Talieri, M. (1999) Determination of c-myc amplification and overexpression in breast cancer patients: evaluation of its prognostic value against c-erbB-2, cathepsin-D and clinicopathological characteristics using univariate and multivariate analysis. Br J Cancer, 81, 1385-1391.

8. Fox, S. B., Smith, K., Hollyer, J., Greenall, M., Hastrich, D., and Harris, A. L. (1994) The epidermal growth factor receptor as a prognostic marker: results of 370 patients and review of 3009 patients. Breast Cancer Res Treat, 29, 41-49.

9. Heimann, R., Lan, F., McBride, R., and Hellman, S. (2000) Separating favorable from unfavorable prognostic markers in breast cancer: the role of E-cadherin. Cancer Res, 60, 298-304.

10. Guerin, M., Sheng, Z. M., Andrieu, N., and Riou, G. (1990) Strong association between c-myb and oestrogen-receptor expression in human breast cancer. Oncogene, 5, 131-135.

11. Lim, K. C., Lakshmanan, G., Crawford, S. E., Gu, Y., Grosveld, F., and Engel, J. D. (2000) Gata3 loss leads to embryonic lethality due to noradrenaline deficiency of the sympathetic nervous system. Nat Genet, 25, 209-212.

12. Mills, K. J., Vollberg, T. M., Nervi, C., Grippo, J. F., Dawson, M. I., and Jetten, A. M. (1996) Regulation of retinoid-induced differentiation in embryonal carcinoma PCC4.aza1R cells: effects of retinoid-receptor selective ligands. Cell Growth Differ, 7, 327-337.

13. Easty, D. J., Hill, S. P., Hsu, M. Y., Fallowfield, M. E., Florenes, V. A., Herlyn, M., and Bennett, D. C. (1999) Up-regulation of ephrin-A1 during melanoma progression. Int J Cancer, 84, 494-501.

14. Shim, C., Zhang, W., Rhee, C. H., and Lee, J. H. (1998) Profiling of differentially expressed genes in human primary cervical cancer by complementary DNA expression array. Clin Cancer Res, 4, 3045-3050.

15. Tsou, A. P., Wu, K. M., Tsen, T. Y., Chi, C. W., Chiu, J. H., Lui, W. Y., Hu, C. P., Chang, C., Chou, C. K., and Tsai, S. F. (1998) Parallel hybridization analysis of multiple protein kinase genes: identification of gene expression patterns characteristic of human hepatocellular carcinoma. Genomics, 50, 331-340.

16. Schummer, M., Ng, W. V., Bumgarner, R. E., Nelson, P. S., Schummer, B., Bednarski, D. W., Hassell, L., Baldwin, R. L., Karlan, B. Y., and Hood, L. (1999) Comparative hybridization of an array of 21,500 ovarian cDNAs for the discovery of genes overexpressed in ovarian carcinomas. Gene, 238, 375-385.

17. Alon, U., Barkai, N., Notterman, D. A., Gish, K., Ybarra, S., Mack, D., and Levine, A. J. (1999) Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc Natl Acad Sci USA, 96, 6745-6750.

18. Moch, H., Schraml, P., Bubendorf, L., Mirlacher, M., Kononen, J., Gasser, T., Mihatsch, M. J., Kallioniemi, O. P., and Sauter, G. (1999) High-throughput tissue microarray analysis to evaluate genes uncovered by cDNA microarray screening in renal cell carcinoma. Am J Pathol, 154, 981-986.

19. Rhee, C. H., Hess, K., Jabbur, J., Ruiz, M., Yang, Y., Chen, S., Chenchik, A., Fuller, G. N., and Zhang, W. (1999) cDNA expression array reveals heterogeneous gene expression profiles in three glioblastoma cell lines. Oncogene, 18, 2711-2717.

20. Huang, F., Adelman, J., Jiang, H., Goldstein, N. I., and Fisher, P. B. (1999) Identification and temporal expression pattern of genes modulated during irreversible growth arrest and terminal differentiation in human melanoma cells. Oncogene, 18, 3546-3552.

21. Bittner, M., Meltzer, P., Chen, Y., Jiang, Y., Seftor, E., Hendrix, M., Radmacher, M., Simon, R., Yakhini, Z., Ben-Dor, A., Sampas, N., Dougherty, E., Wang, E., Marincola, F., Gooden, C., Lueders, J., Glatfelter, A., Pollock, P., Carpten, J., Gillanders, E., Leja, D., Dietrich, K., Beaudry, C., Berens, M., Alberts, D., and Sondak, V. (2000) Molecular classification of cutaneous malignant melanoma by gene expression profiling. Nature, 406, 536-540.

22. Khan, J., Simon, R., Bittner, M., Chen, Y., Leighton, S. B., Pohida, T., Smith, P. D., Jiang, Y., Gooden, G. C., Trent, J. M., and Meltzer, P. S. (1998) Gene expression profiling of alveolar rhabdomyosarcoma with cDNA microarrays. Cancer Res, 58, 5009-5013.

23. Golub, T. R., Slonim, D. K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J. P., Coller, H., Loh, M. L., Downing, J. R., Caligiuri, M. A., Bloomfield, C. D., and Lander, E. S. (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science, 286, 531-537.

24. Alizadeh, A. A., Eisen, M. B., Davis, R. E., Ma, C., Lossos, I. S., Rosenwald, A., Boldrick, J. C., Sabet, H., Tran, T., Yu, X., Powell, J. I., Yang, L., Marti, G. E., Moore, T., Hudson, J., Jr., Lu, L., Lewis, D. B., Tibshirani, R., Sherlock, G., Chan, W. C., Greiner, T. C., Weisenburger, D. D., Armitage, J. O., Warnke, R., and Staudt, L. M. (2000) Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature, 403, 503-511.

25. Hoch, R. V., Thompson, D. A., Baker, R. J., and Weigel, R. J. (1999) GATA-3 is expressed in association with estrogen receptor in breast cancer. Int J Cancer, 84, 122-128.

26. Hilsenbeck, S. G., Friedrichs, W. E., Schiff, R., O'Connell, P., Hansen, R. K., Osborne, C. K., and Fuqua, S. A. (1999) Statistical analysis of array expression data as applied to the problem of tamoxifen resistance. J Natl Cancer Inst, 91, 453-459.

27. Martin, K. J., Kritzman, B. M., Price, L. M., Koh, B., Kwan, C. P., Zhang, X., Mackay, A., O'Hare, M. J., Kaelin, C. M., Mutter, G. L., Pardee, A. B., and Sager, R. (2000) Linking gene expression patterns to therapeutic groups in breast cancer. Cancer Res, 60, 2232-2238.

28. Yang, G. P., Ross, D. T., Kuang, W. W., Brown, P. O., and Weigel, R. J. (1999) Combining SSH and cDNA microarrays for rapid identification of differentially expressed genes. Nucleic Acids Res, 27, 1517-1523.

29. Perou, C. M., Jeffrey, S. S., van de Rijn, M., Rees, C. A., Eisen, M. B., Ross, D. T., Pergamenschikov, A., Williams, C. F., Zhu, S. X., Lee, J. C., Lashkari, D., Shalon, D., Brown, P. O., and Botstein, D. (1999) Distinctive gene expression patterns in human mammary epithelial cells and breast cancers. Proc Natl Acad Sci USA, 96, 9212-9217.

30. Nacht, M., Ferguson, A. T., Zhang, W., Petroziello, J. M., Cook, B. P., Gao, Y. H., Maguire, S., Riley, D., Coppola, G., Landes, G. M., Madden, S. L., and Sukumar, S. (1999) Combining serial analysis of gene expression and array technologies to identify genes differentially expressed in breast cancer. Cancer Res, 59, 5464-5470.

31. Sgroi, D. C., Teng, S., Robinson, G., LeVangie, R., Hudson, J. R., Jr., and Elkahloun, A. G. (1999) In vivo gene expression profile analysis of human breast cancer progression. Cancer Res, 59, 5656-5661.

32. Perou, C. M., Sorlie, T., Eisen, M. B., van de Rijn, M., Jeffrey, S. S., Rees, C. A., Pollack, J. R., Ross, D. T., Johnsen, H., Akslen, L. A., Fluge, O., Pergamenschikov, A., Williams, C., Zhu, S. X., Lonning, P. E., Borresen-Dale, A. L., Brown, P. O., and Botstein, D. (2000) Molecular portraits of human breast tumours. Nature, 406, 747-752.

33. Hahnel, E., Harvey, J. M., Joyce, R., Robbins, P. D., Sterrett, G. F., and Hahnel, R. (1993) Stromelysin-3 expression in breast cancer biopsies: clinico-pathological correlations. Int J Cancer, 55, 771-774.

34. Skoog, L., Humla, S., Klintenberg, C., Pasqual, M., and Wallgren, A. (1985) Receptors for retinoic acid and retinol in human mammary carcinomas. Eur J Cancer Clin Oncol, 21, 901-906.

35. Thor, A. D., Moore, D. H., II, Edgerton, S. M., Kawasaki, E. S., Reihsaus, E., Lynch, H. T., Marcus, J. N., Schwartz, L., Chen, L. C., Mayall, B. H., et al. (1992) Accumulation of p53 tumor suppressor gene protein: an independent marker of prognosis in breast cancers. J Natl Cancer Inst, 84, 845-855.

36. Allred, D. C., Harvey, J. M., Berardo, M., and Clark, G. M. (1998) Prognostic and predictive factors in breast cancer by immunohistochemical analysis. Mod Pathol, 11, 155-168.

37. Spencer, K. S., Graus-Porta, D., Leng, J., Hynes, N. E., and Klemke, R. L. (2000) ErbB2 is necessary for induction of carcinoma cell invasion by ErbB family receptor tyrosine kinases. J Cell Biol, 148, 385-397.

38. Behrens, J. (1993) The role of cell adhesion molecules in cancer invasion and metastasis. Breast Cancer Res Treat, 24, 175-184.

39. Roberts, D. D. (1996) Regulation of tumor growth and metastasis by thrombospondin-1. Faseb J, 10, 1183-1191.

40. Taylor-Papadimitriou, J., Burchell, J., Miles, D. W., and Dalziel, M. (1999) MUC1 and cancer. Biochim Biophys Acta, 1455, 301-313.

41. Sneath, R. J. and Mangham, D. C. (1998) The normal structure and function of CD44 and its role in neoplasia. Mol Pathol, 51, 191-200.

42. Iyer, V. R., Eisen, M. B., Ross, D. T., Schuler, G., Moore, T., Lee, J. C. F., Trent, J. M., Staudt, L. M., Hudson, J., Jr., Boguski, M. S., Lashkari, D., Shalon, D., Botstein, D., and Brown, P. O. (1999) The transcriptional program in the response of human fibroblasts to serum. Science, 283, 83-87.

43. Theillet, C., Adelaide, J., Louason, G., Bonnet-Dorion, F., Jacquemier, J., Adnane, J., Longy, M., Katsaros, D., Sismondi, P., Gaudray, P., et al. (1993) FGFR1 and PLAT genes and DNA amplification at 8p12 in breast and ovarian cancers. Genes Chromosomes Cancer, 7, 219-226.

44. Granjeaud, S., Nguyen, C., Rocha, D., Luton, R., and Jordan, B. R. (1996) From hybridization image to numerical values: a practical, high throughput quantification system for high density filter hybridizations. Genet Anal, 12, 151-162.

45. Eisen, M. B., Spellman, P. T., Brown, P. O., and Botstein, D. (1998) Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci USA, 95, 14863-14868.

46. Ferrari, S., Battini, R., and Cossu, G. (1990) Differentiation-dependent expression of apolipoprotein A-I in chicken myogenic cells in culture. Dev Biol, 140, 430-436.


	Number	Date	Country
Parent	10007926	Dec 2001	US
Child	12903594		US
