This disclosure relates to polynucleotide analysis and, in particular, to polynucleotide expression profiling of carcinomas using arrays of candidate polynucleotides.
Pathologists and clinicians in charge of the management of breast cancer patients are facing two major problems, namely the extensive heterogeneity of the disease and the lack of factors—among conventional histological and clinical features—predicting with reliability the evolution of the disease and its sensitivity to cancer therapies. Breast tumors of the same apparent prognostic type vary widely in their responsiveness to therapy and consequent survival of the patient. New prognostic and predictive factors are needed to allow an individualization of therapy for each patient.
Great hope is currently being placed on molecular studies, which address the problem in a global fashion. Methods such as cytogenetics, comparative genomic hybridization, and whole-genome allelotyping have addressed the issue at the genome level. Currently, the modifications that take place in human tumors at the level of transcription can also be studied on a large, unprecedented scale, using new methods such as cDNA arrays that allow quantitative measurement of the mRNA expression levels of many genes simultaneously. Thus, it would be advantageous to provide a means to assess the capacity of cDNA array testing in clinical practice to better classify a heterogeneous cancer into tumor subtypes with more homogeneous clinical outcomes, and to identify new potential prognostic factors and therapeutic targets.
We provide a method for the molecular characterization of a carcinoma comprising the steps of:
(i) detecting in tumor cells corresponding to breast tumor cells at least one polynucleotide selected from a first group comprising: -EST T89980 (SEQ ID No: 16), -SOX4 (SEQ ID No: 22, SEQ ID No: 23, SEQ ID No: 24), -ENPP2 (SEQ ID No: 39, SEQ ID No: 40, SEQ ID No: 41), -MUC1 (SEQ ID No: 57, SEQ ID No: 58), -GATA3 (SEQ ID No: 76, SEQ ID No: 77, SEQ ID No: 78), -TOP2B (SEQ ID No: 82, SEQ ID No: 83), -IL2RB (SEQ ID No: 97, SEQ ID No: 98, SEQ ID No: 99), -ERBB2 (SEQ ID No: 118, SEQ ID No: 119), -EGFR (SEQ ID No: 135, SEQ ID No: 136, SEQ ID No: 137), -THBS1 (SEQ ID No: 216, SEQ ID No: 217), -PPP2R2C (SEQ ID No: 238, SEQ ID No: 239), -ATF3 (SEQ ID No: 250, SEQ ID No: 251, SEQ ID No: 252), -KIAA1075 (SEQ ID No: 322, SEQ ID No: 323), -CDH1 (SEQ ID No: 326, SEQ ID No: 327, SEQ ID No: 328), -ZNF144 (SEQ ID No: 329, SEQ ID No: 330), -GSTP1 (SEQ ID No: 334, SEQ ID No: 335, SEQ ID No: 336), -CD44 (SEQ ID No: 374, SEQ ID No: 375, SEQ ID No: 376), -GZMA (SEQ ID No: 402, SEQ ID No: 403), -EST T80406 (SEQ ID No: 430), and -ESTs H30141 & H27466 (SEQ ID No: 438, SEQ ID No: 439), and determining the expression level of the at least one polynucleotide from the first group to differentiate a tumor in which a lymph node has been invaded by a tumor cell from a tumor in which a lymph node has not been invaded by a tumor cell;
(ii) detecting in tumor cells corresponding to breast tumor cells at least one polynucleotide selected from a second group comprising: -SOX4 (SEQ ID No: 22, SEQ ID No: 23, SEQ ID No: 24), -CSF1 (SEQ ID No: 48, SEQ ID No: 49, SEQ ID No: 50), -VIL2 (SEQ ID No: 51, SEQ ID No: 52, SEQ ID No: 53), -IGF2 (SEQ ID No: 59, SEQ ID No: 60, SEQ ID No: 61), -KIAA0427 (SEQ ID No: 65, SEQ ID No: 66, SEQ ID No: 67), -MYC (SEQ ID No: 73, SEQ ID No: 74, SEQ ID No: 75), -GATA3 (SEQ ID No: 76, SEQ ID No: 77, SEQ ID No: 78), -TOP2B (SEQ ID No: 82, SEQ ID No: 83), -ERBB2 (SEQ ID No: 118, SEQ ID No: 119), -EGFR (SEQ ID No: 135, SEQ ID No: 136, SEQ ID No: 137), -CRABP2 (SEQ ID No: 156, SEQ ID No: 157, SEQ ID No: 158), -GZMB (SEQ ID No: 178, SEQ ID No: 179), -IGKC (SEQ ID No: 186), -ANG (SEQ ID No: 194, SEQ ID No: 195), -EFNA1 (SEQ ID No: 226, SEQ ID No: 227), -MYBL2 (SEQ ID No: 308, SEQ ID No: 309, SEQ ID No: 310), -CDH1 (SEQ ID No: 326, SEQ ID No: 327, SEQ ID No: 328), -MST1 (SEQ ID No: 331, SEQ ID No: 332, SEQ ID No: 333), -MYB (SEQ ID No: 354, SEQ ID No: 355), -XBP1 (SEQ ID No: 385, SEQ ID No: 386, SEQ ID No: 387), -SRF (SEQ ID No: 391, SEQ ID No: 392, SEQ ID No: 393), -SOX9 (SEQ ID No: 394, SEQ ID No: 395), and -ESTs H21879 & H21880 (SEQ ID No: 433, SEQ ID No: 434), and determining the expression level of the at least one polynucleotide from the second group to distinguish tumors sensitive to anthracycline from tumors insensitive to anthracycline;
(iii) detecting in tumor cells corresponding to breast tumor cells at least one polynucleotide selected from a third group comprising: -CTSB (SEQ ID No: 30, SEQ ID No: 31), -VIL2 (SEQ ID No: 51, SEQ ID No: 52, SEQ ID No: 53), -MUC1 (SEQ ID No: 57, SEQ ID No: 58), -EMR1 (SEQ ID No: 62, SEQ ID No: 63, SEQ ID No: 64), -KIAA0427 (SEQ ID No: 65, SEQ ID No: 66, SEQ ID No: 67), -GATA3 (SEQ ID No: 76, SEQ ID No: 77, SEQ ID No: 78), -PRLR (SEQ ID No: 94, SEQ ID No: 95, SEQ ID No: 96), -GATA3 (SEQ ID No: 100, SEQ ID No: 101, SEQ ID No: 78), -TC21 (SEQ ID No: 106, SEQ ID No: 107, SEQ ID No: 108), -BCL2 (SEQ ID No: 115, SEQ ID No: 116, SEQ ID No: 117), -CRABP2 (SEQ ID No: 156, SEQ ID No: 157, SEQ ID No: 158), -ANG (SEQ ID No: 194, SEQ ID No: 195), -EGF (SEQ ID No: 199, SEQ ID No: 200), -THBS1 (SEQ ID No: 216, SEQ ID No: 217), -EDNRA (SEQ ID No: 228, SEQ ID No: 229), -SMARCA2 (SEQ ID No: 235, SEQ ID No: 236, SEQ ID No: 237), -ABCB1 (SEQ ID No: 257, SEQ ID No: 258), -BIRC4 (SEQ ID No: 273, SEQ ID No: 274), -DAPS (SEQ ID No: 275, SEQ ID No: 276), -GNRH1 (SEQ ID No: 277, SEQ ID No: 278), -EST 897218 (SEQ ID No: 296, SEQ ID No: 297), -BS69 (SEQ ID No: 342, SEQ ID No: 343, SEQ ID No: 344), -MYB (SEQ ID No: 354, SEQ ID No: 355), -CTSB (SEQ ID No: 361, SEQ ID No: 31), -MLANA (SEQ ID No: 362, SEQ ID No: 363, SEQ ID No: 364), -APR-1 (SEQ ID No: 365, SEQ ID No: 366, SEQ ID No: 367), -CDKN3 (SEQ ID No: 377, SEQ ID No: 378, SEQ ID No: 379), -XBP1 (SEQ ID No: 385, SEQ ID No: 386, SEQ ID No: 387), -CDH15 (SEQ ID No: 396, SEQ ID No: 397, SEQ ID No: 398), -EST W73386 (SEQ ID No: 401), -ILF1 (SEQ ID No: 406, SEQ ID No: 407, SEQ ID No: 408), -ARHGDIA (SEQ ID No: 409, SEQ ID No: 410, SEQ ID No: 411), -C4A (SEQ ID No: 412, SEQ ID No: 413), -ESR1 (SEQ ID No: 420, SEQ ID No: 421, SEQ ID No: 422), -PBX1 (SEQ ID No: 423, SEQ ID No: 424, SEQ ID No: 425), -GLI3 (SEQ ID No: 426, SEQ ID No: 427, SEQ ID No: 428), -ESTs 1-124628 & H24592 (SEQ ID No: 435, SEQ ID No: 436), and -EST H28056 (SEQ ID No: 437), and determining the expression levels of the at least one polynucleotide from the third group to classify good and poor prognosis primary breast tumors.
In the context of this disclosure, a number of terms shall be utilized.
The term “polynucleotide” refers to a polymer of RNA or DNA that is single-stranded, optionally containing synthetic, non-natural or altered nucleotide bases. A polynucleotide in the form of a polymer of DNA may be comprised of one or more segments of cDNA, genomic DNA or synthetic DNA.
The term “subsequence” refers to a sequence of nucleic acids that comprises a part of a longer sequence of nucleic acids.
The term “immobilized on a support” means bound directly or indirectly thereto including attachment by covalent binding, hydrogen bonding, ionic interaction, hydrophobic interaction or otherwise.
Breast cancer is characterized by an important histoclinical heterogeneity that currently hampers the selection of the most appropriate treatment for each case. This problem could be solved by the identification of new parameters that better predict the natural history of the disease and its sensitivity to treatment. An important object of this disclosure relates to a large-scale molecular characterization of breast cancer that could help in prediction, prognosis and cancer treatment.
An important aspect of this disclosure relates to the use of cDNA arrays, which allows quantitative study of mRNA expression levels of 188 candidate genes in 34 consecutive primary breast carcinomas in three areas: comparison of tumor samples, correlations of molecular data with conventional histoclinical prognostic features and gene correlations. The experiments revealed extensive heterogeneity of breast tumors at the transcriptional level. A hierarchical clustering algorithm identified two molecularly distinct subgroups of tumors characterized by a different clinical outcome after chemotherapy. This outcome could not have been predicted by the commonly used histoclinical parameters. No correlation was found with the age of patients, tumor size, histological type and grade. However, expression of genes was differential in tumors with lymph node metastasis and according to the estrogen receptor status; ERBB2 (SEQ ID No: 119) expression was strongly correlated with the lymph node status (p≦0.0001) and that of GATA3 (SEQ ID No: 78) with the presence of estrogen receptors (p≦0.001). Thus, experimental results identified new ways to group tumors according to outcome and new potential targets of carcinogenesis. They show that the systematic use of cDNA array testing holds great promise to improve the classification of breast cancer in terms of prognosis and chemosensitivity and to provide new potential therapeutic targets.
DNA arrays consist of large numbers of DNA molecules spotted in a systematic order on a solid support or substrate such as a nylon membrane, glass slide, glass beads, a membrane on a glass support, or a silicon chip. Depending on the size of each DNA spot on the array, DNA arrays can be categorized as microarrays (each DNA spot has a diameter less than 250 microns) and macroarrays (spot diameter is greater than 300 microns). When the solid substrate used is small in size, arrays are also referred to as DNA chips. Depending on the spotting technique used, the number of spots on a glass microarray can range from hundreds to thousands.
DNA microarrays serve a variety of purposes, including gene expression profiling, de novo gene sequencing, gene mutation analysis, gene mapping and genotyping. cDNA microarrays are printed with distinct cDNA clones isolated from cDNA libraries. Therefore, each spot represents an expressed gene, since it is derived from a distinct mRNA.
Typically, a method of monitoring gene expression involves (1) providing a pool of sample polynucleotides comprising RNA transcript(s) of one or more target gene(s) or nucleic acids derived from the RNA transcript(s); (2) reacting, such as hybridizing the sample polynucleotide to an array of probes (for example, polynucleotides obtained from a polynucleotide library) (including control probes) and (3) detecting the reacted/hybridized polynucleotides. Detection can also involve calculating/quantifying a relative expression (transcription) level.
We provide a polynucleotide library useful in the molecular characterization of a carcinoma, said library comprising a pool of polynucleotide sequences or subsequences thereof wherein said sequences or subsequences are either underexpressed or overexpressed in tumor cells, further wherein said sequences or subsequences correspond substantially to any of the polynucleotide sequences set forth in any of SEQ ID Nos: 1-468 in annex or the complement thereof.
Complementary sequences (“complements”) having a high degree of homology with the above sequences could also be used to carry out our molecular characterization, in particular when those sequences present one or a few point mutations when compared with any one of the sequences represented by SEQ ID Nos: 1-468.
A particular embodiment of this disclosure relates to a polynucleotide library of sequences or subsequences corresponding substantially to any combination of at least one polynucleotide sequence selected among those included in each one of predefined polynucleotide sequence sets 1 to 188 as defined in Table 4.
A polynucleotide sequence library useful for our purposes can also comprise any sequence lying between the 3′ end and the 5′ end of each polynucleotide sequence set as defined in Table 4, allowing the complete detection of the implicated gene.
We also provide a polynucleotide library useful to differentiate a normal cell from a cancer cell wherein the pool of polynucleotide sequences or subsequences corresponds substantially to any combination of at least one polynucleotide sequence selected among those included in each one of predefined polynucleotide sequence sets indicated in Table 5.
Preferably the polynucleotide library useful to differentiate a normal cell from a cancer cell corresponds substantially to any combination of at least one polynucleotide sequence selected among those included in each one of predefined polynucleotide sequence sets indicated in Table 5A, and of at least one polynucleotide sequence selected among those included in each one of predefined polynucleotide sequence sets indicated in Table 5B.
The detection of an overexpression of genes identified with sets of polynucleotide sequences defined in Table 5A, together with detection of an underexpression of genes identified with sets of polynucleotide sequences defined in Table 5B allows distinction between normal patients and patients suffering from tumor pathology.
We further provide a polynucleotide library useful to detect a hormone-sensitive tumor cell wherein the pool of polynucleotide sequences or subsequences corresponds substantially to any combination of at least one polynucleotide sequence selected among those included in each one of predefined polynucleotide sequence sets defined in Table 6.
Preferably, the polynucleotide library useful to detect a hormone-sensitive tumor cell corresponds substantially to any combination of at least one polynucleotide sequence selected among those included in each one of predefined polynucleotide sequence sets defined in Table 6A together with at least one polynucleotide sequence selected among those included in each one of predefined polynucleotide sequence sets defined in Table 6B.
The detection of an overexpression of genes identified with sets of polynucleotide sequences defined in Table 6A, together with detection of an underexpression of genes identified with sets of polynucleotide sequences defined in Table 6B allows distinction between patients having a hormone-sensitive tumor and patients having a hormone-resistant tumor.
We also provide a polynucleotide library useful to differentiate a tumor in which a lymph node has been invaded by a tumor cell from a tumor in which a lymph node has not been invaded by a tumor cell wherein the pool of polynucleotide sequences or subsequences corresponds substantially to any combination of at least one polynucleotide sequence selected among those included in each one of predefined polynucleotide sequence sets defined in Table 7.
Preferably, the polynucleotide library useful to differentiate a tumor in which a lymph node has been invaded by a tumor cell from a tumor in which a lymph node has not been invaded by a tumor cell corresponds substantially to any combination of at least one polynucleotide sequence selected among those included in each one of predefined polynucleotide sequence sets defined in Table 7A together with at least one polynucleotide sequence selected among those included in each one of predefined polynucleotide sequence sets defined in Table 7B.
The detection of an overexpression of genes identified with sets of polynucleotide sequences defined in Table 7A, together with detection of an underexpression of genes identified with sets of polynucleotide sequences defined in Table 7B allows distinction between patients having a tumor in which a lymph node has been invaded by a tumor cell and patients having a tumor in which a lymph node has not been invaded by a tumor cell.
We further provide a polynucleotide library useful to differentiate anthracycline-sensitive tumors from anthracycline-insensitive tumors wherein the pool of polynucleotide sequences or subsequences corresponds substantially to any combination of at least one polynucleotide sequence selected among those included in each one of predefined polynucleotide sequence sets defined in Table 8.
Preferably, the polynucleotide library useful to differentiate anthracycline-sensitive tumors from anthracycline-insensitive tumors corresponds substantially to any combination of at least one polynucleotide sequence selected among those included in each one of predefined polynucleotide sequence sets defined in Table 8A together with at least one polynucleotide sequence selected among those included in each one of predefined polynucleotide sequence sets defined in Table 8B.
The detection of an overexpression of genes identified with sets of polynucleotide sequences defined in Table 8A, together with detection of an underexpression of genes identified with sets of polynucleotide sequences defined in Table 8B allows distinction between patients having an anthracycline-sensitive tumor and patients having an anthracycline-insensitive tumor.
We provide a polynucleotide library useful to classify good and poor prognosis primary breast tumors wherein the pool of polynucleotide sequences or subsequences corresponds substantially to any combination of at least one polynucleotide sequence selected among those included in each one of predefined polynucleotide sequence sets defined in Table 9.
Preferably, the polynucleotide library useful to classify good and poor prognosis primary breast tumors corresponds substantially to any combination of at least one polynucleotide sequence selected among those included in each one of predefined polynucleotide sequence sets defined in Table 9A together with at least one polynucleotide sequence selected among those included in each one of predefined polynucleotide sequence sets defined in Table 9B.
The detection of an overexpression of genes identified with sets of polynucleotide sequences defined in Table 9A, together with detection of an underexpression of genes identified with sets of polynucleotide sequences defined in Table 9B allows classification of patients having good or poor prognosis primary breast tumors.
In a preferred embodiment, the tumor cells presenting underexpressed or overexpressed sequences from our polynucleotide library are breast tumor cells.
In a particular embodiment, the polynucleotides of our polynucleotide library are immobilized on a solid support in order to form a polynucleotide array, and said solid support is selected from the group consisting of a nylon membrane, nitrocellulose membrane, glass slide, glass beads, membranes on glass support and a silicon chip.
We also provide a polynucleotide array, useful for prognosis or diagnosis of a tumor, bearing at least one immobilized polynucleotide library set as previously defined.
We also provide a polynucleotide array useful to differentiate a normal cell from a cancer cell, bearing any combination of at least one polynucleotide sequence selected among those included in each one of predefined polynucleotide sequence sets indicated in Table 5.
Preferably the polynucleotide array useful to differentiate a normal cell from a cancer cell bears any combination of at least one polynucleotide sequence selected among those included in each one of predefined polynucleotide sequence sets indicated in Table 5A, and of at least one polynucleotide sequence selected among those included in each one of predefined polynucleotide sequence sets indicated in Table 5B.
This disclosure relates also to a polynucleotide array useful to detect a hormone-sensitive tumor cell bearing any combination of at least one polynucleotide sequence selected among those included in each one of predefined polynucleotide sequence sets defined in Table 6.
Preferably the polynucleotide array useful to detect a hormone-sensitive tumor cell bears any combination of at least one polynucleotide sequence selected among those included in each one of predefined polynucleotide sequence sets defined in Table 6A together with at least one polynucleotide sequence selected among those included in each one of predefined polynucleotide sequence sets defined in Table 6B.
We also provide a polynucleotide array useful to differentiate a tumor in which a lymph node has been invaded by a tumor cell from a tumor in which a lymph node has not been invaded by a tumor cell bearing any combination of at least one polynucleotide sequence selected among those included in each one of predefined polynucleotide sequence sets defined in Table 7.
Preferably, the polynucleotide array useful to differentiate a tumor in which a lymph node has been invaded by a tumor cell from a tumor in which a lymph node has not been invaded by a tumor cell bears any combination of at least one polynucleotide sequence selected among those included in each one of predefined polynucleotide sequence sets defined in Table 7A together with at least one polynucleotide sequence selected among those included in each one of predefined polynucleotide sequence sets defined in Table 7B.
We also provide a polynucleotide array useful to differentiate anthracycline-sensitive tumors from anthracycline-insensitive tumors bearing any combination of at least one polynucleotide sequence selected among those included in each one of predefined polynucleotide sequence sets defined in Table 8.
Preferably, the polynucleotide array useful to differentiate anthracycline-sensitive tumors from anthracycline-insensitive tumors bears any combination of at least one polynucleotide sequence selected among those included in each one of predefined polynucleotide sequence sets defined in Table 8A together with at least one polynucleotide sequence selected among those included in each one of predefined polynucleotide sequence sets defined in Table 8B.
This disclosure also concerns a polynucleotide array useful to classify good and poor prognosis primary breast tumors bearing any combination of at least one polynucleotide sequence selected among those included in each one of predefined polynucleotide sequence sets defined in Table 9.
Preferably, the polynucleotide array useful to classify good and poor prognosis primary breast tumors bears any combination of at least one polynucleotide sequence selected among those included in each one of predefined polynucleotide sequence sets defined in Table 9A together with at least one polynucleotide sequence selected among those included in each one of predefined polynucleotide sequence sets defined in Table 9B.
We also provide a method for detecting differentially expressed polynucleotide sequences that are correlated with a cancer, said method comprising:
(a) obtaining a polynucleotide sample from a patient;
(b) reacting the polynucleotide sample obtained in step (a) with a probe immobilized on a solid support wherein said probe comprises any of the polynucleotide sequences of the libraries previously defined or an expression product encoded by any of the polynucleotide sequences of the libraries previously defined; and
(c) detecting the reaction product of step (b).
Preferably, the polynucleotide sample obtained at step (a) is labeled before its reaction at step (b) with the probe immobilized on a solid support.
The label of the polynucleotide sample is selected from the group consisting of radioactive, colorimetric, enzymatic, molecular amplification, bioluminescent or fluorescent labels.
In a particular embodiment the reaction product of step (c) is quantified by further comparison of said reaction product to a control sample.
In a first embodiment, the polynucleotide sample isolated from the patient and obtained at step (a) is either RNA or mRNA.
In another embodiment, the polynucleotide sample isolated from the patient is cDNA obtained by reverse transcription of the mRNA.
Preferably the reaction step (b) of the method for detecting differentially expressed polynucleotide sequences comprises a hybridization of the sample RNA issued from patient with the probe.
Preferably, the sample RNA is labeled before hybridization with the probe and the label is selected from the group consisting of radioactive, colorimetric, enzymatic, molecular amplification, bioluminescent and fluorescent labels.
This method for detecting differentially expressed polynucleotide sequences is particularly useful for detecting, diagnosing, staging, monitoring, predicting, preventing or treating conditions associated with cancer, and particularly breast cancer.
The method for detecting differentially expressed polynucleotide sequences is also particularly useful when the product encoded by any of the polynucleotide sequence or subsequence set is involved in a receptor-ligand reaction on which detection is based.
This disclosure is also related to a method for screening an anti-tumor agent comprising the above-depicted method for detecting differentially expressed polynucleotide sequences wherein the sample has been treated with the anti-tumor agent to be screened.
In a particular embodiment the method for screening an anti-tumor agent comprises detecting polynucleotide sequences reacting with at least one library of polynucleotides or polynucleotide sequence set as previously defined or of products encoded by said library in a sample obtained from a patient.
To avoid any bias of selection as to the type and size of the tumors, the RNAs to be tested were prepared from unselected samples. Samples of primary invasive breast carcinomas were collected from 34 patients undergoing surgery at the Institute Paoli-Calmette. After surgical resection, the tumors were macrodissected: a section was taken for the pathologist's diagnosis and an adjacent piece was quickly frozen in liquid nitrogen for molecular analyses. The median age of patients at the time of diagnosis was 55 years (range 39-83) and most of them were post-menopausal. Tumors were classified according to the WHO histological typing of breast tumors as: 29 ductal carcinomas, 2 lobular carcinomas, 1 mixed ductal and lobular carcinoma, and 2 medullary carcinomas. They varied in size (less than or equal to 20 mm, n=13; between 20 and 50 mm, n=18; greater than 50 mm, n=3), axillary lymph node status (negative: 19 tumors, positive: 15 tumors), SBR grading (I: 3 tumors, II: 20 tumors, III: 10 tumors, not evaluable: 1 tumor), and estrogen receptor (ER) status evaluated by immunohistochemical assay (23 ER-positive, 11 ER-negative). The ER positivity cutoff value was 10%. Adjuvant treatment with radiotherapy and, when necessary, multi-agent anthracycline-based chemotherapy (n=16) was given to patients according to local practice.
Total RNA was extracted from tumor samples by standard methods (43). Total RNA from normal breast tissue was obtained from Clontech (Palo Alto, Calif.): RNA was isolated from 8 tissue specimens from Caucasian females, age range 23-47. RNA integrity was controlled by denaturing formaldehyde agarose gel electrophoresis and Northern blots using a 28S-specific oligonucleotide.
cDNA Arrays Preparation
Gene expression was analyzed by hybridization of arrays with radioactive probes. The arrays contained PCR products of 5 control clones, and 180 IMAGE human cDNA clones selected with practical criteria (3′ sequence of mRNA, same cloning vector, host bacteria and insert size). This represented 176 genes (4 genes were represented by 2 different clones): 121 with proven or putative implication in cancer and 55 implicated in immune reactions. Their identity was verified by 5′ tag-sequencing of plasmid DNA and comparison with sequences in the EST (dbEST) and nucleotide (GenBank) databases at the NCBI. Identity was confirmed for all but 14 clones without significant gene similarity, which were referenced by their GenBank accession number. The control clones were: Arabidopsis thaliana cytochrome c554 gene (used for hybridization signal normalization), 3 poly(A) sequences of different sizes and the vector pT7T3D (negative controls).
PCR amplification, purification, and robotic spotting of PCR products onto Hybond-N+ membranes (Amersham) were done according to described protocols (4). All PCR products were spotted in duplicate. For normalization purposes, the c554 gene was spotted 96-fold, scattered over the whole membrane.
cDNA Array Hybridizations
Hybridizations were done successively with a vector oligonucleotide (to precisely determine the amount of target DNA accessible to hybridization in each spot), then, after stripping of the vector probe, with complex probes made from the RNAs (4). Each complex probe was hybridized to a distinct filter. Probes were prepared from total RNA with an excess of oligo(dT25) to saturate the poly(A) tails of the messengers, and to ensure that the reverse transcribed product did not contain long poly(T) sequences. A precise amount of c554 mRNA was added to the total RNA before labeling to allow normalization of the data.
Five µg of total RNA (~100 ng of mRNA) from tissue samples were used for each labeling. Probe preparation and hybridization of the membranes were done according to known procedures (http://tagc.univ-mrs.fr/pub/Cancer/). Hybridization was done in excess of target (~15 ng of DNA in each spot) and binding of cDNAs to the targets was linear and proportional to the quantity of cDNA in the probe.
Detection and Quantification of cDNA Array Hybridization Signals
Quantitative data were obtained using an imaging plate device. Hybridization signal detection with a FUJI BAS 1500 machine and quantification with the HDG Analyzer software (Genomic Solutions, Ann Arbor, Mich.) were done as previously described. Quantification was done by integrating all spot pixel intensities and subtracting a spot background value determined in the neighboring area. Spots were located with a Laplacian transformation. Spot background level was the median intensity of all the pixels present in a small window centered on the spot and which were not part of any spot (44). Quantified data were normalized in three steps and expressed as absolute gene expression levels (i.e. in percentage of abundance of individual mRNA with respect to mRNA within the sample), as described (4).
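By way of illustration only, the background-subtracted quantification described above can be sketched in Python as follows. The function name `quantify_spot`, the boolean-mask representation of spots, the window size, and the choice to scale the background value by the number of spot pixels are assumptions made for this sketch; they are not taken from the HDG Analyzer software.

```python
from statistics import median


def quantify_spot(image, spot_mask, any_spot_mask, center, half_window):
    """Integrate the pixels of one spot and subtract a local background.

    image         : 2-D list of pixel intensities
    spot_mask     : 2-D list of bools, True where a pixel belongs to this spot
    any_spot_mask : 2-D list of bools, True where a pixel belongs to any spot
    center        : (row, col) of the spot center
    half_window   : half-size of the background window (assumed parameter)
    """
    rows, cols = len(image), len(image[0])
    r0, c0 = center
    # Background: median of the window pixels that are not part of any spot.
    background_pixels = [
        image[r][c]
        for r in range(max(0, r0 - half_window), min(rows, r0 + half_window + 1))
        for c in range(max(0, c0 - half_window), min(cols, c0 + half_window + 1))
        if not any_spot_mask[r][c]
    ]
    background = median(background_pixels) if background_pixels else 0.0
    spot_pixels = [
        image[r][c] for r in range(rows) for c in range(cols) if spot_mask[r][c]
    ]
    # Assumption: the background is removed once per integrated spot pixel.
    return sum(spot_pixels) - background * len(spot_pixels)


# Hypothetical 5 x 5 image: a 3 x 3 spot of intensity 60 on a background of 10.
image = [[10] * 5 for _ in range(5)]
mask = [[False] * 5 for _ in range(5)]
for r in range(1, 4):
    for c in range(1, 4):
        image[r][c] = 60
        mask[r][c] = True

signal = quantify_spot(image, mask, mask, center=(2, 2), half_window=2)
```

Using the median rather than the mean for the background keeps the estimate robust to a few bright pixels from neighboring spots that fall inside the window.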
Before analysis of the results, the reproducibility of the experiments was verified by comparing duplicate spots, or one hybridization with the same probe on two independent arrays, or two independent hybridizations with probes prepared from the same RNA. In every case, the results showed good reproducibility with respective correlation coefficients of 0.95, 0.98 and 0.98 (data not shown). Moreover, genes represented by two different clones on the array, such as CDK4 (SEQ ID No: 288) or ETV5 (SEQ ID No: 300), displayed similar expression profiles for the two clones in all samples. This reproducibility was sufficient to consider a 2-fold expression difference as significantly differential.
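The reproducibility check and the 2-fold criterion described above can be sketched as follows. This is a minimal illustration that assumes the correlation coefficient is the Pearson coefficient, which the text does not specify; the function names and the example values are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def is_differential(level_a, level_b, fold=2.0):
    """True when two positive expression levels differ by at least `fold`."""
    high, low = max(level_a, level_b), min(level_a, level_b)
    return low > 0 and high / low >= fold


# Hypothetical signals for the two duplicate spots of five clones.
duplicate_1 = [0.010, 0.052, 0.110, 0.004, 0.230]
duplicate_2 = [0.012, 0.049, 0.118, 0.005, 0.221]
r = pearson_r(duplicate_1, duplicate_2)
```

A level of zero is treated here as not comparable by ratio; such cases call for a separate detection threshold rather than a fold change.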
For graphical representation, data were displayed as absolute expression levels (
Subsequent analysis was done using Excel software (Microsoft) and statistical analyses with the SPSS software. Metastasis-free survival and overall survival were measured from diagnosis until the first metastatic relapse or death respectively. They were estimated with the Kaplan-Meier method and compared between groups with the Log-Rank test. Correlations of gene pairs based on expression profiles were measured with the correlation coefficient r. The search for genes with expression levels correlated with tumor parameters was done in several successive steps.
First, genes were detected by comparing their median expression level in the two subgroups of tumors discordant according to the parameter of interest. The median values rather than the mean values were used because of the high variability of the expression levels for many genes, resulting in a standard deviation of expression level similar to or greater than the mean value and making comparisons with means impossible. Second, these detected genes were inspected visually on graphics, and finally, an appropriate statistical analysis was applied to those that were convincing to validate the correlation. Comparison of GATA3 (SEQ ID No: 78) expression between ER-positive tumors and ER-negative tumors was validated using a Mann-Whitney test. Correlation coefficients were used to compare the gene expression levels to the number of axillary nodes involved.
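The first screening step and the rank-based comparison mentioned above can be sketched as follows. The 2-fold ratio used to flag a gene is an assumption borrowed from the reproducibility criterion, since the text does not state a cutoff for this step; the gene and sample names are hypothetical, and only the U statistic is computed, not its p-value.

```python
from statistics import median


def median_screen(expression, group_a, group_b, min_ratio=2.0):
    """Flag genes whose subgroup medians differ by at least `min_ratio`.

    expression : dict mapping gene -> dict mapping sample -> expression level
    group_a/b  : lists of sample names forming the two subgroups
    """
    hits = {}
    for gene, levels in expression.items():
        median_a = median(levels[s] for s in group_a)
        median_b = median(levels[s] for s in group_b)
        low, high = sorted((median_a, median_b))
        # Genes with a zero median in one subgroup are skipped in this sketch.
        if low > 0 and high / low >= min_ratio:
            hits[gene] = (median_a, median_b)
    return hits


def mann_whitney_u(x, y):
    """Mann-Whitney U statistic of sample x versus y (ties count one half)."""
    return sum(
        1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y
    )


# Hypothetical data: two genes measured in four tumors.
expression = {
    "GENE_X": {"t1": 0.10, "t2": 0.12, "t3": 0.02, "t4": 0.03},
    "GENE_Y": {"t1": 0.05, "t2": 0.06, "t3": 0.05, "t4": 0.07},
}
hits = median_screen(expression, ["t1", "t2"], ["t3", "t4"])
u = mann_whitney_u([0.10, 0.12], [0.02, 0.03])
```

In practice the U statistic would be converted to a p-value with an established statistics package; the pure-Python version is shown only to make the ranking logic explicit.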
Seventy-nine breast tumors, including 22 of the 34 tested on the arrays, were analyzed for GATA3 (SEQ ID No: 78) expression by Northern blot hybridization. RNA extraction from tumor samples and Northern blots were done as previously described (43). The GATA3 probe was prepared from the IMAGE cDNA clone 129757 (SEQ ID No: 78), which corresponds to the 3′ region (from +843 to +1689) of the GATA3 cDNA sequence (GenBank accession no. X55122). The insert (846 bp) was obtained by digestion of the clone with EcoRI and PacI enzymes. Northern blots were stripped and re-hybridized using a β-actin probe (46).
The crude results of all hybridizations were processed to be presented either as absolute or relative values in schematic figures. The normalization procedure allowed display of absolute values expressed in percent of abundance of mRNA in the probe as shown in
As shown in
In the group A of 15 samples, three samples (normal breast and two tumors) were different from each other and from the other 12 samples. The latter constituted two subgroups of tumors, A1 (n=6) and A2 (n=6), which could be further separated by clustering as shown in
Genes responsible for group A substructure were searched. These are potentially relevant to the prognosis and the sensitivity to chemotherapy in these tumors. Thirty-two genes out of 188 were identified by comparing their median expression level in A1 vs A2. Then, the 12 tumors were reclustered using the expression profiles of these genes as shown in
Differential Gene Expression Between Normal Breast and Breast Tumors
To identify genes differentially expressed between breast tumors (T) and normal breast (NB), the NB value for each gene was compared to its expression level in each tumor. When the expression level of a gene in NB was undetectable, only qualitative information could be deduced and the mRNA was considered as differentially expressed if the signal intensity in the tumor was above the reproducibility threshold (0.002% of mRNA abundance). In the other cases, differential expression was defined by an at least 2-fold expression difference. For each gene, the number of tumors in which it was over- or underexpressed was also counted. Table 2 shows a list of the top 20 over- and underexpressed genes. For these genes, the T/NB ratio is reported, where T represents their median expression value in the 34 tumors. This ratio ranged from 2.70 (ABCC5; SEQ ID No: 325) to 17.76 (GATA3; SEQ ID No: 78) for the overexpressed genes, and from 0.00 (desmin; SEQ ID No: 170) to 0.29 (APC; SEQ ID No: 56) for the underexpressed genes.
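The decision rule described above can be sketched as follows, by way of example only. The 0.002% threshold and the 2-fold cutoff come from the text; the function names, the encoding of an undetectable NB signal as 0, and the sample values are assumptions made for illustration.

```python
from statistics import median

REPRODUCIBILITY_THRESHOLD = 0.002  # percent of mRNA abundance (from the text)
FOLD_CUTOFF = 2.0                  # at least 2-fold difference (from the text)


def classify(tumor_level, normal_level):
    """Call a gene 'over', 'under' or 'unchanged' in one tumor versus NB.

    Assumption: an undetectable NB signal is encoded as 0.
    """
    if normal_level == 0:
        # Only qualitative information is available in this case.
        if tumor_level > REPRODUCIBILITY_THRESHOLD:
            return "over"
        return "unchanged"
    ratio = tumor_level / normal_level
    if ratio >= FOLD_CUTOFF:
        return "over"
    if ratio <= 1 / FOLD_CUTOFF:
        return "under"
    return "unchanged"


def summarize(tumor_levels, normal_level):
    """Count over/underexpressing tumors and compute the T/NB ratio,
    where T is the median expression level across tumors."""
    calls = [classify(t, normal_level) for t in tumor_levels]
    ratio = median(tumor_levels) / normal_level if normal_level > 0 else None
    return {
        "n_over": calls.count("over"),
        "n_under": calls.count("under"),
        "t_nb_ratio": ratio,
    }


# Hypothetical expression levels (percent of mRNA abundance) in four tumors.
print(summarize([0.05, 0.005, 0.10, 0.02], normal_level=0.02))
```

The T/NB ratio is left undefined (`None`) when the NB signal is undetectable, mirroring the statement that only qualitative information can be deduced in that case.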
High expression of mucin 1 (SEQ ID No: 58), NM23, ERBB2 (SEQ ID No: 119), FGFR1 (SEQ ID No: 182) and FGFR2 (SEQ ID No: 15), MYC (SEQ ID No: 75), stromelysin 3 (SEQ ID No: 346) and cathepsin D (SEQ ID No: 128), and downregulation of FOS (SEQ ID No: 318), APC (SEQ ID No: 56), RBL2, FAS and BCL2 (SEQ ID No: 117) were found, reflecting what is known about their biology in cancer. GATA3 (SEQ ID No: 78), which codes for a member of the GATA family of zinc finger transcription factors, and CRABP2 (SEQ ID No: 158), encoding one of the two cellular retinoic acid-binding proteins, showed high expression of mRNA, extending previous results on cDNA arrays (4).
Differential Gene Expression Among Various Breast Tumors and Correlation with Histoclinical Prognostic Parameters
To search for potential prognostic markers in breast cancer, genes with expression levels correlated with conventional histoclinical prognostic parameters were looked for: age of patients, axillary node status, tumor size, histological grade and ER status. No significant correlation was found with age, tumor size and histological grade. However, the expression profiles of some genes correlated with ER status and axillary node involvement.
To identify genes potentially relevant to the hormone-responsive phenotype, the gene expression profiles in ER-positive breast cancers (n=23) versus ER-negative breast cancers (n=11) were compared. Sixteen clones displayed a median intensity of 0 in both groups. Twenty-five presented a fold change superior to 2. Table 3a displays the top 10 over- and underexpressed genes. Among them, the most differentially expressed was GATA3 (SEQ ID No: 78) with a median intensity ratio ER+/ER− of 28.6 and a value for the first quartile of ER-positive tumors superior (5-fold) to the value of the third quartile of the ER-negative tumors as shown in
To search for genes whose expression profile was correlated with axillary lymph node status, a strong prognostic factor in breast cancer, the group of node-negative tumors (n=19) was compared with the group of tumors with massive axillary extension (10 or more positive nodes). Furthermore, because survival decreases with the increase of the number of tumor-involved lymph nodes and because the expression measurements were quantitative, correlation between the expression levels of these genes and the number of tumor-involved nodes (quantitative variables) was determined. Table 3b shows a list of the top 10 over- and underexpressed genes between these 2 groups. Most of these genes have not been previously reported as associated with node status, but some of these results are in agreement with literature data. The gene encoding the tyrosine kinase receptor ERBB2 (SEQ ID No: 119) was the most significantly overexpressed gene in node-positive tumors and displayed the highest correlation coefficient (r=0.68; p≦0.0001).
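The correlation between expression level and number of involved nodes can be illustrated with the sketch below. It assumes that the coefficient r referred to is the Pearson product-moment coefficient, which the text does not state explicitly; the function name `pearson_r` and the sample values are hypothetical, and no p-value is computed.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences.

    Assumption: 'r' in the text denotes the Pearson coefficient.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Hypothetical data: involved nodes per tumor and expression of one gene.
nodes = [0, 0, 2, 5, 11, 14]
expression = [0.01, 0.02, 0.02, 0.04, 0.09, 0.08]
print(round(pearson_r(nodes, expression), 2))
```

Using the node count as a quantitative variable, rather than a positive/negative label, lets every tumor contribute to the estimate, including those with an intermediate number of involved nodes.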
Gene clustering from
Although in human cancer the proportion of changes that is reflected at the RNA level is not known, monitoring gene expression patterns appears as a very promising way of increasing the knowledge of the disease. Several different types of cancer have been investigated using cDNA arrays: cervical (14), hepatocellular (15), ovarian (16), colon (17) and renal carcinomas (18), glioblastomas (19), melanomas (20) (21), rhabdomyosarcomas (22), acute leukemias (23) and lymphomas (24). In breast cancer, pioneering studies have yielded the first expression patterns (4, 25-31). They have in particular addressed the important issue of molecular differences in hormone-responsive and non-responsive breast tumors. Thus, Yang et al. (28) and Hoch et al. (25) compared expression profiles of breast carcinoma cell lines known to represent these two categories and identified a few genes with differential expression. One of these genes was GATA3. In these studies, cell lines were mostly used and tumor samples were rarely tested and generally in small numbers. The first study analyzing the expression profiles of a large series of breast cancers was published recently (32), but no correlation with clinical outcome was mentioned.
Several interesting points can be made based on the present experimentation. First, the differences in expression patterns among the tumors provided molecular transcriptional evidence of the histoclinical heterogeneity of breast cancer. This diversity was multifactorial, linked to many different genes, highlighting the interest of high throughput analysis in this context. It was possible, with a hierarchical clustering program integrating the expression profiles, to separate normal breast tissue from most tumors and, moreover, to identify two different groups of tumors. Most importantly, two different subgroups of tumors with a very distinct clinical outcome that could not be predicted with classical prognostic factors have been identified by clustering. Indeed, all these tumors had a theoretically bad prognosis as evaluated by current histoclinical tools. All these patients would be at the present time treated with adjuvant chemotherapy, but without the capacity for the physicians to identify patients who will benefit from this treatment and those who will not benefit.
Gene expression profiles were able to make this discrimination. Such predictive tools have important therapeutic implications. Patients with features of poor prognosis are candidates for other treatment than standard chemotherapy, avoiding loss of time and toxicities related to first-line chemotherapy. These results suggest that the histoclinical category of poor prognosis breast cancer, currently treated with adjuvant anthracycline-based chemotherapy, groups together at least two molecularly distinct subgroups of tumors with different outcome which would require distinct chemotherapy regimens. Expression profiles could thus provide a new and more accurate way of classifying breast tumors of poor prognosis and managing patients.
Similarly, despite molecular heterogeneity, significant correlations between the expression level of genes (GATA3 (SEQ ID No: 78), ERBB2 (SEQ ID No: 119)) and histological tumor parameters were identified. The ER-positivity in breast cancer has been correlated with tumor differentiation, low proliferating rate, favorable prognosis and response to hormonal therapy. The relation between hormone sensitivity of breast cancer and ER status is not perfect, and it is possible that some genes related to ER expression are more important than ER to characterize the hormone-sensitive phenotype. These genes could serve as predictive factors to guide the therapy.
GATA3 mRNA expression was highly correlated with ER status. GATA3, which is not estrogen-regulated (25), is a transcription factor that could regulate the expression of genes involved in the ER-positive phenotype. Among the other genes that were found associated with ER status during the experimental work leading to our disclosure, some, such as MYB (SEQ ID No: 355) (10), stromelysin 3 (SEQ ID No: 346) (33), and CRABP2 (SEQ ID No: 158) (34), have been previously reported expressed at high levels in ER-positive breast tumors. The higher levels of TP53 mRNA in ER-positive tumors studied were surprising, although in agreement with a recent study (27). Most studies concerning TP53 expression analyzed the protein level rather than the mRNA level, and TP53 protein levels are classically negatively correlated with the ER status (35). The high expression of CRABP2 could be related to the better differentiated status of the ER-positive tumors. The low expression of the three immunity-related genes IL2RB (SEQ ID No: 99), IL2RG (SEQ ID No: 281) and CD3G (SEQ ID No: 416) may be related to the low lymphoid infiltration in these well differentiated tumors. ERBB2 high expression in breast cancer has been associated with a poor prognosis and some resistance to hormonal therapy and chemotherapy (36). It is involved in the regulation of cellular differentiation, adhesion, and motility. The motility-enhancing activity of ERBB2 (37) could be responsible for the increased metastatic potential and the unfavorable prognosis of the breast tumors that overexpress ERBB2. The low expression of E-cadherin (SEQ ID No: 328) and thrombospondin 1 (SEQ ID No: 217) in node-positive tumors is consistent with their putative role in different steps of metastatic spread: E-cadherin is an epithelial cell adhesion molecule whose disturbance is a prerequisite for the release of invasive cells in carcinomas (38), and thrombospondin 1 inhibits angiogenesis (39).
Similarly, the high expression of the molecule surface antigen Mucin 1 in node-positive tumors (40) can reduce cell-cell interactions facilitating cell detachment and metastasis. CD44 (SEQ ID No: 376), encoding a transmembrane glycoprotein involved in cell adhesion and lymph node homing (41) was expressed at high levels in node-positive tumors as well as GSTP1 (SEQ ID No: 336) (Glutathione-S-Transferase Pi), recently reported associated with increased tumor size (27).
Second, there were a number of genes with highly correlated expression patterns. Gene correlations have already been reported with larger series of genes, essentially under dynamic experimental conditions (42) and recently in steady states (17). Here, correlations were based on expression profiles of a relatively small but selected series of genes and in steady states represented by different breast tumors. Gene correlations are potentially useful tools for cancer research in two ways: i) they can provide information about the general regulation circuitry of a cancerous cell, allowing the identification of regulatory elements controlling expression networks; ii) they offer the possibility of reducing the complexity of the system analyzed by replacing, for example, the intensities of a large number of genes present in a gene cluster by their respective mean intensities.
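The complexity reduction described in point ii) can be sketched as below, as an illustration only. The data layout (a mapping from gene name to per-sample intensities), the function name `cluster_mean_profile` and the gene names are hypothetical.

```python
def cluster_mean_profile(expression, cluster_genes):
    """Replace the profiles of the genes in a cluster by one mean profile.

    expression    : dict mapping gene name -> list of per-sample intensities
    cluster_genes : names of the genes belonging to the cluster
    Returns one list holding, for each sample, the mean intensity over the
    cluster's genes. (Hypothetical layout chosen for illustration.)
    """
    if not cluster_genes:
        raise ValueError("cluster must contain at least one gene")
    profiles = [expression[g] for g in cluster_genes]
    n_samples = len(profiles[0])
    if any(len(p) != n_samples for p in profiles):
        raise ValueError("all genes must be measured in the same samples")
    return [sum(p[i] for p in profiles) / len(profiles) for i in range(n_samples)]


# Hypothetical intensities of three genes across three tumors.
expression = {
    "geneA": [1.0, 2.0, 3.0],
    "geneB": [3.0, 4.0, 5.0],
    "geneC": [10.0, 10.0, 10.0],
}
print(cluster_mean_profile(expression, ["geneA", "geneB"]))  # [2.0, 3.0, 4.0]
```

A cluster of highly correlated genes is thereby summarized by a single variable per sample, which reduces the number of quantities carried into any subsequent analysis.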
Finally, these results highlight the great potential of cDNA array in cancer research. The gene expression profiles confirmed the heterogeneity of breast cancer, and most importantly allowed us to identify, among a series of poor prognosis breast tumors, two subtypes of the disease not yet recognized with usual histoclinical parameters but with a different clinical outcome after adjuvant chemotherapy. Furthermore, this disclosure allows detection of genes of which expression was correlated with classical prognostic factors.
Table 4 displays a library of polynucleotides SEQ ID No: 1 to SEQ ID No: 468 corresponding to a population of polynucleotide sequences underexpressed or overexpressed in cells derived from tumors, more particularly breast tumors, and their respective complements.
S. cerevisiae)-like 1 (HRMT1L1)
Table 5 hereunder displays subpopulations of polynucleotide sequences interesting to distinguish a person without cancer from a cancer patient.
Homo sapiens E74-like factor 1 (ets domain
Tables 5A and 5B hereunder display two subpopulations corresponding to the 5 top overexpressed and to the 5 top underexpressed polynucleotide sequences particularly interesting to distinguish a person without cancer from a cancer patient.
Table 6 hereunder relates to subpopulations of polynucleotide sequences interesting to detect hormone-sensitive tumors allowing distinction between ER+ and ER− samples.
Tables 6A and 6B hereunder relate to two subpopulations of polynucleotide sequences particularly interesting to detect hormone-sensitive tumors allowing distinction between ER+ and ER− samples.
Table 7 hereunder relates to subpopulations of polynucleotide sequences interesting to distinguish tumors in which a lymph node has been invaded by a tumor cell from tumors in which a lymph node has not been so invaded.
Homo sapiens selectin P
Tables 7A and 7B hereunder relate to two subpopulations of polynucleotide sequences particularly interesting to distinguish tumors in which a lymph node has been invaded by a tumor cell from tumors in which a lymph node has not been so invaded.
Table 8 hereunder relates to subpopulations of polynucleotide sequences particularly interesting to distinguish tumors sensitive to anthracycline from tumors insensitive to anthracycline.
Homo sapiens plasminogen activator (PLAT)
Tables 8A and 8B hereunder relate to two subpopulations of polynucleotide sequences particularly interesting to distinguish tumors sensitive to anthracycline from tumors insensitive to anthracycline.
Tables 9, 9A and 9B hereunder relate to subpopulations of polynucleotide sequences particularly interesting in classifying good and poor prognosis primary breast tumors.
Homo sapiens aminoacylase 1 (ACY1).
Homo sapiens E74-like factor 1 (ets domain
So, a preferred DNA array comprises at least one polynucleotide sequence selected among those included in each one of predefined polynucleotide sequences indicated in Table 9A and at least one polynucleotide sequence selected among those included in each one of predefined polynucleotide sequences indicated in Table 9B.
Such DNA arrays are particularly useful to distinguish patients having a high risk (bad result) from those having a good prognosis (good result).
This is a divisional application of U.S. Ser. No. 10/007,926, filed Dec. 7, 2001, which is based on U.S. Ser. No. 60/254,090 filed Dec. 8, 2000, which are hereby incorporated by reference in their entirety.
Number | Date | Country
---|---|---
60254090 | Dec 2000 | US

Relation | Number | Date | Country
---|---|---|---
Parent | 10007926 | Dec 2001 | US
Child | 12903594 | | US