Breast cancer is a major health concern and one of the most prevalent forms of cancer in women. Breast cancer has the second highest mortality rate among cancers in women, and about 15% of cancer-related deaths in women are due to breast cancer (SEER Cancer Statistics Review 1975-2005, NCI, Ries, L. A. G., et al., (eds) (2008)). It has been estimated that about 13% of women born in the United States will be diagnosed with breast cancer in their lifetime (SEER Cancer Statistics Review 1975-2005, NCI, Ries, L. A. G., et al., (eds) (2008)). Current approaches to diagnosing breast cancer (in particular, to identifying women at an increased likelihood of recurrence), to selecting treatment and to monitoring the progress of treatment regimens include assessing the presence of certain tumor markers in breast tissue biopsies. However, such techniques may be inaccurate in detecting breast cancer and in assessing therapy options. Thus, there is a need to develop new, improved and effective methods of identifying a woman having an increased likelihood of recurrence of breast cancer, which may guide therapy selection and prognosis.
The present invention relates to methods of optimizing treatment of a human having an estrogen-receptor positive breast cancer.
In an embodiment, the invention is a method of optimizing treatment of a human having breast cancer, comprising the step of measuring a level of expression of genes selected from the group consisting of ESR1, GABRP, RABEP1, SLC39A6, TCEAL1, ATAD2, PTP4A2, LRBA and SLC43A3 in a breast cancer tissue sample from the human, wherein underexpression of GABRP, RABEP1, SLC39A6 and PTP4A2 in the sample in combination with overexpression of ESR1, TCEAL1, ATAD2, LRBA and SLC43A3 in the sample identifies a human that has an increased likelihood of recurrence of the breast cancer that would potentially benefit from a therapy to decrease the likelihood of recurrence of the breast cancer.
In another embodiment, the invention is a method of optimizing treatment of a human having breast cancer, comprising the step of measuring a level of expression of genes selected from the group consisting of GABRP, TRIM29, RABEP1, SLC39A6, TCEAL1, PLK1 and CX3CL1 in a breast cancer tissue sample from the human, wherein underexpression of GABRP, TRIM29, RABEP1 and SLC39A6 in the sample in combination with overexpression of TCEAL1, PLK1 and CX3CL1 in the sample identifies a human that has an increased likelihood of recurrence of the breast cancer that would potentially benefit from a therapy to decrease the likelihood of recurrence of the breast cancer.
In a further embodiment, the invention is a method of optimizing treatment of a human having breast cancer, comprising the step of measuring a level of expression of genes selected from the group consisting of TBC1D9, RABEP1, SLC39A6, FUT8 and PTP4A2 in a breast cancer tissue sample from the human, wherein underexpression of TBC1D9, RABEP1, SLC39A6, FUT8 and PTP4A2 in the sample identifies a human that has an increased likelihood of recurrence of the breast cancer that would potentially benefit from a therapy to decrease the likelihood of recurrence of the breast cancer.
In still another embodiment, the invention is a method of optimizing treatment of a human having breast cancer, comprising the step of measuring a level of expression of genes selected from the group consisting of RABEP1, SLC39A6, FUT8 and PTP4A2 in a breast cancer tissue sample from the human, wherein underexpression of RABEP1, SLC39A6, FUT8 and PTP4A2 in the sample identifies a human that has an increased likelihood of recurrence of the breast cancer that would potentially benefit from a therapy to decrease the likelihood of recurrence of the breast cancer and thereby increase the likelihood of survival of the human.
In an additional embodiment, the invention is a method of optimizing treatment of a human having breast cancer, comprising the step of measuring a level of expression of genes selected from the group consisting of ESR1, GABRP, RABEP1, SLC39A6, TCEAL1, ATAD2, PTP4A2, LRBA and SLC43A3 in a breast cancer tissue sample from the human, wherein overexpression of GABRP, RABEP1, SLC39A6 and PTP4A2 in the sample in combination with underexpression of ESR1, TCEAL1, ATAD2, LRBA and SLC43A3 in the sample identifies a human that has a decreased likelihood of recurrence of the breast cancer that would potentially benefit from a therapy to treat the breast cancer.
In yet another embodiment, the invention is a method of optimizing treatment of a human having breast cancer, comprising the step of measuring a level of expression of genes selected from the group consisting of GABRP, TRIM29, RABEP1, SLC39A6, TCEAL1, PLK1 and CX3CL1 in a breast cancer tissue sample from the human, wherein overexpression of GABRP, TRIM29, RABEP1 and SLC39A6 in the sample in combination with underexpression of TCEAL1, PLK1 and CX3CL1 in the sample identifies a human that has a decreased likelihood of recurrence of the breast cancer that would potentially benefit from a therapy to treat the breast cancer.
An additional embodiment of the invention is a method of optimizing treatment of a human having breast cancer, comprising the step of measuring a level of expression of genes selected from the group consisting of TBC1D9, RABEP1, SLC39A6, FUT8 and PTP4A2 in a breast cancer tissue sample from the human, wherein overexpression of TBC1D9, RABEP1, SLC39A6, FUT8 and PTP4A2 in the sample identifies a human that has a decreased likelihood of recurrence of the breast cancer that would potentially benefit from a therapy to treat the breast cancer.
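Each embodiment above amounts to matching a measured expression pattern against a fixed gene panel. The Python sketch below shows one way such matching could be encoded. It assumes, as one option given in this description, that each gene's expression is supplied as a ratio to a reference RNA, with a ratio above 1 read as overexpression and below 1 as underexpression. The panel labels (`nine_gene`, `seven_gene`, etc.), the function names, and the rule that every gene must agree are hypothetical illustrations, not a validated clinical classifier. Because only the increased-likelihood pattern is described for the four-gene panel, the sketch does not return a decreased-likelihood call for that panel.

```python
from typing import Dict, Optional, Set, Tuple

# Panels from the embodiments above, written as the pattern associated with an
# INCREASED likelihood of recurrence: (genes underexpressed, genes overexpressed).
# The panel names are hypothetical labels.
SIGNATURES: Dict[str, Tuple[Set[str], Set[str]]] = {
    "nine_gene": ({"GABRP", "RABEP1", "SLC39A6", "PTP4A2"},
                  {"ESR1", "TCEAL1", "ATAD2", "LRBA", "SLC43A3"}),
    "seven_gene": ({"GABRP", "TRIM29", "RABEP1", "SLC39A6"},
                   {"TCEAL1", "PLK1", "CX3CL1"}),
    "five_gene": ({"TBC1D9", "RABEP1", "SLC39A6", "FUT8", "PTP4A2"}, set()),
    "four_gene": ({"RABEP1", "SLC39A6", "FUT8", "PTP4A2"}, set()),
}
# Panels for which the mirror-image (decreased likelihood) pattern is described.
MIRROR_DESCRIBED = {"nine_gene", "seven_gene", "five_gene"}


def direction(ratio: float) -> Optional[str]:
    """'over' if the ratio to the reference exceeds 1, 'under' if it is below 1."""
    if ratio > 1.0:
        return "over"
    if ratio < 1.0:
        return "under"
    return None  # exactly 1.0: neither over- nor underexpressed


def match_signature(panel: str, ratios: Dict[str, float]) -> Optional[str]:
    """Return 'increased', 'decreased', or None if no described pattern matches."""
    under, over = SIGNATURES[panel]
    missing = (under | over) - set(ratios)
    if missing:
        raise KeyError(f"missing measurements for: {sorted(missing)}")
    calls = {gene: direction(ratios[gene]) for gene in under | over}
    if all(calls[g] == "under" for g in under) and all(calls[g] == "over" for g in over):
        return "increased"
    if (panel in MIRROR_DESCRIBED
            and all(calls[g] == "over" for g in under)
            and all(calls[g] == "under" for g in over)):
        return "decreased"
    return None


# Hypothetical ratios for the nine-gene panel.
sample = {g: 0.5 for g in ("GABRP", "RABEP1", "SLC39A6", "PTP4A2")}
sample.update({g: 2.0 for g in ("ESR1", "TCEAL1", "ATAD2", "LRBA", "SLC43A3")})
print(match_signature("nine_gene", sample))  # prints increased
```

A sample in which only some genes follow the pattern returns None here; how partial matches should be weighted is not specified in this description.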
The methods of the invention can be employed to optimize treatment of breast cancer in a human. Advantages of the claimed invention include, for example, relatively rapid determination of changes in gene expression in small amounts of tissue (e.g., fresh or frozen biopsies) by detecting changes in relatively few genes (e.g., 10, 9, 7, 5 or 4), which can improve the accuracy of identifying humans with an increased risk of recurrence of the breast cancer. The claimed methods can be employed in optimizing treatment of breast cancer, thereby helping to avoid recurrence of the disease, serious illness consequent to the disease and death.
The features and other details of the invention, either as steps of the invention or as combinations of parts of the invention, will now be more particularly described and pointed out in the claims. It will be understood that the particular embodiments of the invention are shown by way of illustration and not as limitations of the invention. The principal features of this invention can be employed in various embodiments without departing from the scope of the invention.
The methods described herein are generally directed to methods of optimizing treatment of a human with breast cancer. Recurrence of breast cancer in a human can lead to prolonged illness, uncertain clinical outcome and mortality. The methods described herein can facilitate careful clinical management and optimal treatment of humans with breast cancer, which can decrease the likelihood of recurrence of the breast cancer and of death consequent to the breast cancer.
In an embodiment, the invention is a method of optimizing treatment of a human having breast cancer, comprising the step of measuring a level of expression of genes selected from the group consisting of ESR1, GABRP, RABEP1, SLC39A6, TCEAL1, ATAD2, PTP4A2, LRBA and SLC43A3 in a breast cancer tissue sample from the human, wherein underexpression of GABRP, RABEP1, SLC39A6 and PTP4A2 in the sample in combination with overexpression of ESR1, TCEAL1, ATAD2, LRBA and SLC43A3 in the sample identifies a human that has an increased likelihood of recurrence of the breast cancer that would potentially benefit from a therapy to decrease the likelihood of recurrence of the breast cancer.
In another embodiment, the invention is a method of optimizing treatment of a human having breast cancer, comprising the step of measuring a level of expression of genes selected from the group consisting of GABRP, TRIM29, RABEP1, SLC39A6, TCEAL1, PLK1 and CX3CL1 in a breast cancer tissue sample from the human, wherein underexpression of GABRP, TRIM29, RABEP1 and SLC39A6 in the sample in combination with overexpression of TCEAL1, PLK1 and CX3CL1 in the sample identifies a human that has an increased likelihood of recurrence of the breast cancer that would potentially benefit from a therapy to decrease the likelihood of recurrence of the breast cancer.
In a further embodiment, the invention is a method of optimizing treatment of a human having breast cancer, comprising the step of measuring a level of expression of genes selected from the group consisting of TBC1D9, RABEP1, SLC39A6, FUT8 and PTP4A2 in a breast cancer tissue sample from the human, wherein underexpression of TBC1D9, RABEP1, SLC39A6, FUT8 and PTP4A2 in the sample identifies a human that has an increased likelihood of recurrence of the breast cancer that would potentially benefit from a therapy to decrease the likelihood of recurrence of the breast cancer.
In still another embodiment, the invention is a method of optimizing treatment of a human having breast cancer, comprising the step of measuring a level of expression of genes selected from the group consisting of RABEP1, SLC39A6, FUT8 and PTP4A2 in a breast cancer tissue sample from the human, wherein underexpression of RABEP1, SLC39A6, FUT8 and PTP4A2 in the sample identifies a human that has an increased likelihood of recurrence of the breast cancer that would potentially benefit from a therapy to decrease the likelihood of recurrence of the breast cancer and thereby increase the likelihood of survival of the human.
In an additional embodiment, the invention is a method of optimizing treatment of a human having breast cancer, comprising the step of measuring a level of expression of genes selected from the group consisting of ESR1, GABRP, RABEP1, SLC39A6, TCEAL1, ATAD2, PTP4A2, LRBA and SLC43A3 in a breast cancer tissue sample from the human, wherein overexpression of GABRP, RABEP1, SLC39A6 and PTP4A2 in the sample in combination with underexpression of ESR1, TCEAL1, ATAD2, LRBA and SLC43A3 in the sample identifies a human that has a decreased likelihood of recurrence of the breast cancer that would potentially benefit from a therapy to treat the breast cancer.
In still another embodiment, the invention is a method of optimizing treatment of a human having breast cancer, comprising the step of measuring a level of expression of genes selected from the group consisting of GABRP, TRIM29, RABEP1, SLC39A6, TCEAL1, PLK1 and CX3CL1 in a breast cancer tissue sample from the human, wherein overexpression of GABRP, TRIM29, RABEP1 and SLC39A6 in the sample in combination with underexpression of TCEAL1, PLK1 and CX3CL1 in the sample identifies a human that has a decreased likelihood of recurrence of the breast cancer that would potentially benefit from a therapy to treat the breast cancer.
An additional embodiment of the invention is a method of optimizing treatment of a human having breast cancer, comprising the step of measuring a level of expression of genes selected from the group consisting of TBC1D9, RABEP1, SLC39A6, FUT8 and PTP4A2 in a breast cancer tissue sample from the human, wherein overexpression of TBC1D9, RABEP1, SLC39A6, FUT8 and PTP4A2 in the sample identifies a human that has a decreased likelihood of recurrence of the breast cancer that would potentially benefit from a therapy to treat the breast cancer.
“Optimizing treatment,” as used herein, means identifying a therapy (e.g., chemotherapy, radiation therapy or any combination of therapies) that has the greatest chance of eliminating the breast cancer or causing remission of the breast cancer, as assessed by, for example, the presence or absence of breast cancer cells in biopsies, and of preventing metastasis of the breast cancer. Malignant breast tumors can form metastases in non-breast tissues and organs by entering the systemic circulatory system (arteries, veins) or the lymphatic circulatory system. The methods described herein can be employed to optimize treatment to prevent or minimize metastases of a malignant breast tumor.
“Would potentially benefit,” as used herein, means that the breast cancer in the human may go into remission or be substantially eliminated, or that the disease may be palliated.
“An increased likelihood of recurrence of breast cancer,” as used herein, means that the human has had at least one incident of a diagnosis of breast cancer and has an elevated probability of the breast cancer returning. For example, in a meta-analysis (from seven different studies) of more than about 3,500 patients who had received some type of post-surgical adjuvant therapy for breast cancer, the risk of cancer recurrence was greatest during the first two years following surgery. After this period, the risk of recurrence decreased steadily until year five, after which it declined slowly and averaged about 4.3% per year (Saphner T, et al., J Clin Oncol. 14:2738-2746 (1996)). Some breast cancer recurrences in this study occurred more than about five years after surgery, between about six and about 12 years after surgery, even in patients who typically would be considered at low risk for recurrence because their cancer had not spread to the lymph nodes at the time of diagnosis (node-negative). These findings show that, through at least about 12 years of follow-up, the risk of breast cancer recurrence remains appreciable, and even some patients considered at low risk have some risk of the cancer recurring.
“Increased likelihood of survival,” as used herein, means that the human who has had at least one incident of a diagnosis of breast cancer has an elevated probability of surviving.
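To make the point about persistent late risk concrete, the short Python sketch below converts an average annual recurrence rate into a cumulative probability over several years. It treats the cited average of about 4.3% per year as a constant annual risk for a patient who is still recurrence-free at year five. That is a simplifying assumption introduced here for illustration; the actual annual risk varies from year to year.

```python
def cumulative_risk(annual_rate: float, years: int) -> float:
    """Probability of at least one recurrence over `years` consecutive years.

    Simplifying assumption: the same annual risk applies in every year to a
    patient who has not yet had a recurrence.
    """
    if not 0.0 <= annual_rate <= 1.0:
        raise ValueError("annual_rate must be a probability in [0, 1]")
    if years < 0:
        raise ValueError("years must be non-negative")
    # Probability of remaining recurrence-free each year, compounded over the period.
    return 1.0 - (1.0 - annual_rate) ** years


# Years 6 through 12 after surgery (7 years) at an average of about 4.3% per year.
risk = cumulative_risk(0.043, years=7)
print(f"{risk:.1%}")  # prints 26.5%
```

Under this simplified model, roughly one in four patients who are recurrence-free at year five would be expected to recur by year 12.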
Expression of the genes in the methods of the invention can be identified by detecting mRNA for the genes or the protein product of the gene (see, for example, U.S. Patent Application Nos. US 2005/0095607, US 2005/0100933 and US 2005/0208500, the teachings of all of which are hereby incorporated by reference in their entirety). In an embodiment, expression of the genes described herein can be assessed by measuring the messenger RNA (mRNA) of the gene in the breast cancer sample. Techniques to identify mRNA are known in the art and include, for example, qPCR, as described infra.
Expression of the genes in the methods described herein can be assessed by Northern blot analyses. Expression of genes in the methods described herein may also be assessed by amplifying a nucleic acid sequence of the gene and detecting the amplified nucleic acid by well-established methods, such as the polymerase chain reaction (PCR), including quantitative PCR (qPCR), reverse transcription PCR (RT-PCR), and real-time PCR (including as a means of measuring the initial amounts of mRNA copies for each sequence in a sample), real-time RT-PCR or real-time qPCR. Exemplary implementations of such detection methods include the use of one or two primers that are complementary to portions of a gene of interest, as described herein, where the primers are used to prime nucleic acid synthesis. The newly synthesized nucleic acids are optionally labeled and may be detected directly or by hybridization to a gene or mRNA. The newly synthesized nucleic acids may be contacted with polynucleotides of a breast tissue sample under conditions that allow for their hybridization. Additional methods to detect the expression of genes in the methods described herein include RNase protection assays, including liquid phase hybridizations, and in situ hybridization of cells.
Quantitative polymerase chain reaction (qPCR), also known as real-time PCR, is a modification of the PCR technique that is used to measure the quantity of a specific RNA molecule present in a sample with a high degree of sensitivity (Ding, C., et al., J. Biochem Mol. Biol., 37(1):1-10 (2004)). This is accomplished by first reverse transcribing the RNA to complementary DNA (cDNA) and then amplifying the gene of interest with target-specific primers. The amount of DNA is measured after each cycle of PCR by use of fluorescent markers, such as TAQMAN® probes (Applied Biosystems), SYBR® Green, or molecular beacons. qPCR is one of the most widely used methods of studying specific gene expression in a variety of organisms, tissues, and cells.
Competitive PCR, which utilizes a DNA standard containing a point mutation to differentiate it from the gene of interest, can also be employed to assess expression of genes in the methods of the invention. The point mutation either creates or removes a restriction site, allowing the standard to be distinguished from the target gene. The cDNA and the DNA standard are co-amplified in the same PCR reaction. The resulting products are treated with a restriction enzyme and then analyzed by gel electrophoresis, ion pair reversed phase high performance liquid chromatography (IP-RP-HPLC), or matrix assisted laser desorption ionization time of flight mass spectrometry (MALDI-TOF MS) (Ding, C., et al., J. Biochem Mol Biol., 37(1):1-10 (2004)). Since the amount of DNA standard is known, the concentration of the cDNA target can be calculated.
Many genomic studies use discovery-based tools to determine global genomic differences between two or more test groups. One of the most widely used methodologies has been microarray gene chips, which span an organism's genome in order to study various aspects, such as gene copy number, single nucleotide polymorphisms (SNPs), comparative genomic hybridization (CGH) and, most commonly, variations in gene expression. Although each type of microarray is designed to study a particular aspect of genomics, they function by similar means: they contain thousands of probes directed at sequences spanning the genome. When a test sample is hybridized to the chip, the bound sample can be detected by Cy5/Cy3 fluorescence or by biotin labeling detected with streptavidin conjugated to a fluorescent compound. With the development of this complex and powerful technology, there was a great need for bioinformatics tools to analyze the vast amounts of information obtained. Software programs, such as GeneSpring GX (Agilent Technologies), GENECHIP™ (Affymetrix), and Partek GS (Partek Incorporated), have been developed to help decipher the massive data sets obtained from global gene expression studies.
Gene expression for use in the methods described herein can also be assessed by differential display. In this technique, mRNA is reverse transcribed using three anchored oligo(dT) primers that differ in the base adjacent to the poly(dT) sequence. The resulting cDNA is then further amplified with short (about 13 bp) random primers. The resulting PCR products are labeled with either radioisotopes or fluorescent dyes and separated by polyacrylamide gel electrophoresis (PAGE). When two cDNA samples are displayed on the gel side-by-side, changes in gene expression can be detected (Ding, C., et al., J Biochem Mol Biol., 37(1):1-10 (2004)). By utilizing laboratory automation technologies, the entire genome can be covered with a few hundred reactions. Another technique is serial analysis of gene expression (SAGE), which utilizes double-stranded cDNA made with biotinylated oligo(dT) primers. The cDNA is digested with a restriction enzyme, and the 3′ ends are recovered with streptavidin beads. The cDNA is then ligated to linker sequences containing a specific restriction site, at which the enzyme cleaves 14 bp downstream of the site. This yields a linker attached to a 10-base gene-specific tag, which is then cloned into a plasmid and sequenced. The frequencies of the gene-specific tags are used to estimate gene expression levels.
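The final calculation in competitive PCR is a simple proportion. The Python sketch below assumes, as is usual for this technique, that the target and the mutant standard amplify with equal efficiency, so the ratio of their end products equals the ratio of their starting amounts. The function name and the example signal values are hypothetical; "signal" stands for whatever quantity the readout provides (band intensity, chromatogram peak area or mass-spectrum peak area).

```python
def competitive_pcr_target_amount(standard_amount: float,
                                  target_signal: float,
                                  standard_signal: float) -> float:
    """Estimate the starting amount of target cDNA in a competitive PCR.

    Assumes the target and the mutant standard amplify with equal efficiency,
    so the ratio of their end products equals the ratio of their starting
    amounts. The result is in the same units as `standard_amount`.
    """
    if standard_amount <= 0 or standard_signal <= 0:
        raise ValueError("standard_amount and standard_signal must be positive")
    if target_signal < 0:
        raise ValueError("target_signal must be non-negative")
    return standard_amount * (target_signal / standard_signal)


# Hypothetical readout: 10,000 copies of standard were added, and after
# restriction digestion the target product signal is twice that of the standard.
print(competitive_pcr_target_amount(10_000, target_signal=2.0, standard_signal=1.0))
# prints 20000.0
```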
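The last step of SAGE, estimating expression from tag frequencies, can be sketched in a few lines of Python. This sketch assumes the 10-base tags have already been extracted from the sequenced clones; mapping tags to genes is a separate step not shown. The tag sequences and the scaling to tags per million are illustrative choices, not part of the described method.

```python
from collections import Counter
from typing import Dict, Iterable


def sage_tag_abundance(tags: Iterable[str], per: int = 1_000_000) -> Dict[str, float]:
    """Estimate relative transcript abundance from sequenced SAGE tags.

    Each tag's count divided by the total number of tags sequenced approximates
    the relative abundance of the corresponding transcript. Counts are scaled to
    `per` total tags (tags per million by default) so that libraries of
    different sizes can be compared.
    """
    counts = Counter(tags)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tag: n * per / total for tag, n in counts.items()}


# Hypothetical 10-base tags read from sequenced clones.
library = ["GTGAAACCCC", "TTCATACACC", "GTGAAACCCC", "CCCATCGTCC"]
print(sage_tag_abundance(library, per=100))
# prints {'GTGAAACCCC': 50.0, 'TTCATACACC': 25.0, 'CCCATCGTCC': 25.0}
```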
Increases (up-regulation of expression, also referred to as “overexpression”) and decreases (down-regulation of expression, also referred to as “underexpression”) of genes in the methods described herein may be expressed as a ratio of expression in a cancerous breast cell to expression in a reference, such as Universal Human Reference RNA (Stratagene, La Jolla, Calif.) (also referred to herein as a “control”). For example, a gene can be considered up-regulated if the median expression value relative to a control, such as a Universal Human Reference RNA, is above one (1). Likewise, a gene can be considered down-regulated if the median expression value relative to a control, such as a Universal Human Reference RNA, is less than one (1), as described herein. Expression levels can be readily determined by quantitative methods as described herein, such as nucleic acid amplification assays. The methods described herein can identify over-expression (increases) or under-expression (decreases) of genes compared to a Universal Human Reference RNA control. Over-expression or under-expression can be correlated with patient characteristics (e.g., age, menopausal stage, disease-free status) and breast cancer characteristics (e.g., grade, stage, estrogen receptor status, progesterone receptor status).
Over- and underexpression of the genes described herein can be assessed by determining the Hazard Ratio (HR) by the methods described herein. An HR less than one (1) indicates that the gene is overexpressed, and an HR over one (1) indicates that the gene is underexpressed.
Expression of the genes described herein can be assessed as a ratio of the expression of the gene in a breast tissue sample from the mammal to that in a control tissue sample, such as a sample from another mammal with breast cancer, a sample from the same mammal from a previous breast cancer incident, or a sample from a mammal without breast cancer (also referred to herein as “normal” or “non-cancerous”). For example, an increase in the ratio of expression of the gene in the breast tissue sample from the mammal compared to a non-cancerous sample may indicate an increased likelihood of recurrence of the breast cancer. The ratios of increased expression can be about 1.1, about 1.2, about 1.3, about 1.4, about 1.5, about 1.6, about 1.7, about 1.8, about 1.9, about 2, about 2.5, about 3, about 3.5, about 4, about 4.5, about 5, about 5.5, about 6, about 6.5, about 7, about 7.5, about 8, about 8.5, about 9, about 9.5, about 10, about 15, about 20, about 30, about 40, about 50, about 60, about 70, about 80, about 90, about 100, about 150, about 200, about 300, about 400, about 500, about 600, about 700, about 800, about 900 or about 1000. For example, a ratio of 2 is a 100% (or a two-fold) increase in expression. Likewise, a decrease in gene expression can be indicated by ratios of about 0.9, about 0.8, about 0.7, about 0.6, about 0.5, about 0.4, about 0.3, about 0.2, about 0.1, about 0.05, about 0.01, about 0.005, about 0.001, about 0.0005, about 0.0001, about 0.00005, about 0.00001, about 0.000005 or about 0.000001, which may indicate a decreased likelihood of recurrence of breast cancer in the mammal.
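The median rule for calling a gene up- or down-regulated can be sketched in Python as follows. This sketch assumes the input values are already expressed relative to the control (so 1.0 means no change) and that the median is taken over, for example, replicate measurements or a group of samples; the description does not specify what the median is taken over, and the function name is hypothetical.

```python
from statistics import median
from typing import Sequence


def regulation_call(relative_expression: Sequence[float]) -> str:
    """Call a gene up- or down-regulated from expression ratios to a control.

    `relative_expression` holds expression values already divided by the
    control (e.g., Universal Human Reference RNA), so 1.0 means no change.
    """
    if not relative_expression:
        raise ValueError("need at least one relative expression value")
    m = median(relative_expression)
    if m > 1.0:
        return "up-regulated"
    if m < 1.0:
        return "down-regulated"
    return "unchanged"


print(regulation_call([1.8, 0.9, 2.4]))  # prints up-regulated (median 1.8)
print(regulation_call([0.4, 0.7, 1.1]))  # prints down-regulated (median 0.7)
```

Using the median rather than the mean keeps a single outlying measurement from flipping the call.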
Similarly, increases and decreases in expression of the genes described herein can be expressed based upon percent or fold changes over expression in non-cancerous cells or a control, such as a Universal Human Reference RNA. Increases can be, for example, about 10, about 20, about 30, about 40, about 50, about 60, about 70, about 80, about 90, about 100, about 120, about 140, about 160, about 180 or about 200% relative to expression levels in non-cancerous cells or a control. Alternatively, fold increases may be of about 1, about 1.5, about 2, about 2.5, about 3, about 3.5, about 4, about 4.5, about 5, about 5.5, about 6, about 6.5, about 7, about 7.5, about 8, about 8.5, about 9, about 9.5 or about 10 fold over expression levels in non-cancerous cells. Likewise, decreases may be of about 10, about 20, about 30, about 40, about 50, about 55, about 60, about 65, about 70, about 75, about 80, about 85, about 90, about 95, about 98, about 99 or 100% relative to expression levels in non-cancerous cells or a control.
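The two paragraphs above express the same change either as a ratio, a fold change or a percent change. The small Python helper below shows the conversion under the convention stated in the text, where a ratio of 2 is a two-fold, or 100%, increase. The function name and output wording are illustrative.

```python
def describe_change(ratio: float) -> str:
    """Express a sample/control expression ratio as a fold or percent change."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    if ratio >= 1.0:
        # A ratio of r is an r-fold level, i.e. a (r - 1) * 100 % increase.
        return f"{ratio:g}-fold ({(ratio - 1.0) * 100:.0f}% increase)"
    # A ratio below 1 is a (1 - r) * 100 % decrease.
    return f"{(1.0 - ratio) * 100:.0f}% decrease"


print(describe_change(2))     # prints 2-fold (100% increase)
print(describe_change(0.25))  # prints 75% decrease
```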
Exemplary methods for relative gene expression analysis include the ΔΔCt method. The threshold cycle (Ct value) is the cycle of amplification at which the PCR instrument system recognizes an increase in the signal (e.g., SYBR® Green fluorescence) associated with the exponential increase of the PCR product during the log-linear phase of nucleic acid amplification. The Ct value of each target gene is compared to that of a housekeeping gene, such as glyceraldehyde-3-phosphate dehydrogenase (GAPDH) or β-actin, to obtain the ΔCt value (Ct of the target minus Ct of the housekeeping gene), which normalizes for variation in the amount of RNA between different samples. The ΔCt value of each gene is then compared to that of a calibrator, such as Universal Human Reference RNA (Stratagene, La Jolla, Calif.), to obtain a ΔΔCt value. Since each cycle of amplification doubles the amount of PCR product, the expression level of a target gene relative to that of the calibrator, expressed as relative gene expression, is calculated as 2^(−ΔΔCt).
In one embodiment, the breast tissue sample is a laser capture microdissection (LCM) breast tissue sample. LCM is known in the art and is described herein. LCM can yield collections of varying cell types (e.g., epithelial, stromal, smooth muscle) in varying numbers, such as about 100 cells, about 1000 cells, about 2000 cells or about 5000 cells. LCM can be employed to prepare a breast tissue sample that includes relatively pure populations of a single cell type, such as an epithelial cell, a stromal cell or a smooth muscle cell.
LCM systems include the PIXCELL IIe™ LCM System and Image Archiving Workstation (Arcturus Bioscience, Inc.), which uses a thermal-sensitive film placed over the cells of interest. When the infrared laser is fired from above, the film melts onto the cells of interest and, as it resolidifies, encapsulates those cells (Sluka P, et al., Prog Histochem Cytochem; 42(4):173-201 (2008)).
The P.A.L.M. (P.A.L.M. Microlaser Technologies, Bernried, Germany) instrument uses both laser microdissection and pressure catapulting (Burgemeister, R., J. Histochem. Cytochem 53(3):409-412 (2005)). An ultraviolet laser fires from below the tissue to cut through the region containing the cells of interest, and a second firing catapults the cells up off the slide. The Leica (Wetzlar, Germany) AS laser microdissection (LMD) instrument does not use a glass slide, and the dissected cells drop into a collection tube. Molecular Machine & Industries (MMI, Glattbrugg, Switzerland) has developed two instruments, the mmi CELLCUT™ and the mmi SMARTCUT™. Both instruments allow microdissection of single cells or groups of cells, which are collected using an adhesive cap rather than by catapulting. The VERITAS™, a relatively new instrument from Molecular Devices (Sunnyvale, Calif.), combines the technologies of laser capture and laser cutting and uses both an ultraviolet and an infrared laser to perform the microdissection [46].
In another embodiment, the breast tissue sample is an intact tissue section breast tissue sample. Intact tissue sections can be prepared employing established techniques. For example, an intact tissue section can be prepared by freezing a breast tissue sample obtained from a biopsy in O.C.T. (Optimum Cutting Temperature) compound and cryo-sectioning the intact breast tissue sample. The frozen intact tissue section is then placed on a glass slide and stained with hematoxylin and eosin to assess structural integrity. Additional frozen intact tissue sections are prepared for total RNA extraction and purification, and the RNA is analyzed by quantitative polymerase chain reaction (qPCR), as described infra.
The breast tissue sample can be a biopsy sample that includes at least one member selected from the group consisting of breast epithelial cells, breast stromal cells and breast smooth muscle cells, which can include breast cancer cells of these tissue types. The breast tissue sample can be a breast biopsy that includes a carcinoma (ductal, lobular, medullary and/or tubular carcinoma). The breast tissue sample can be a breast biopsy that includes stroma. The breast tissue sample can be subjected to laser capture microdissection (LCM) in which relatively pure populations of carcinoma cells (cancerous cells of breast epithelium) and/or relatively pure populations of stromal cells are obtained. “Relatively pure,” as used herein in reference to a carcinoma or stromal breast tissue sample, means that the sample is about 95%, about 98%, about 99% or about 100% one cell type (e.g., carcinoma or stroma).
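The ΔΔCt arithmetic described above can be written as a short Python function. This sketch assumes roughly 100% amplification efficiency for both the target and the housekeeping gene (the product doubles every cycle, as the paragraph states); the function name and the example Ct values are hypothetical.

```python
def relative_expression_ddct(ct_target_sample: float,
                             ct_housekeeping_sample: float,
                             ct_target_calibrator: float,
                             ct_housekeeping_calibrator: float) -> float:
    """Relative expression of a target gene by the 2^(-ddCt) method.

    Assumes ~100% amplification efficiency (product doubles every cycle) for
    both the target and the housekeeping gene.
    """
    # Normalize each target Ct to the housekeeping gene (e.g., GAPDH or beta-actin).
    delta_ct_sample = ct_target_sample - ct_housekeeping_sample
    delta_ct_calibrator = ct_target_calibrator - ct_housekeeping_calibrator
    # Compare the normalized sample with the calibrator (e.g., reference RNA).
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values: after normalization, the target crosses threshold
# 2 cycles earlier in the tumor sample than in the calibrator (~4x expression).
print(relative_expression_ddct(24.0, 18.0, 26.0, 18.0))  # prints 4.0
```

A result above 1 corresponds to overexpression relative to the calibrator and a result below 1 to underexpression.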
The breast tissue sample employed in the methods described herein can include homogenates of breast cancer biopsies, which include populations of different cell types (e.g., epithelial, stromal, smooth muscle).
The breast cancer tissue sample can be from a pre-menopausal human or a post-menopausal human.
The breast cancer tissue sample employed in the methods of the invention can be a breast cancer tissue sample, such as a primary breast cancer tissue sample, from a human who is lymph node negative (i.e., the breast cancer has not spread to the lymph nodes) and whose breast cancer is estrogen receptor positive; or it can be a breast cancer tissue sample from a human with lymph node positive breast cancer (i.e., the breast cancer has spread to the lymph nodes) that is estrogen receptor positive.
The breast cancer tissue sample can be from a human with stage 1 (I), 2 (II), 3 (III) or 4 (IV) estrogen-receptor positive breast cancer or a human with stage 1, 2, 3 or 4 estrogen-receptor positive and progesterone-receptor positive breast cancer.
The American Joint Committee on Cancer (AJCC) staging of breast cancer is based on a scale of 0 to 4 (0 to IV), with Stage 0 having the best prognosis and Stage IV the worst. There are multiple sub-classifications within each stage (Robbins and Cotran Pathologic Basis of Disease, 7th ed., Kumar, V., et al. (eds), Elsevier Saunders (2005)). Patients who present with ductal carcinoma in situ (DCIS) or lobular carcinoma in situ (LCIS) are considered Stage 0. An invasive carcinoma of less than about 2 cm in the greatest dimension with no lymph node involvement is considered Stage I. An invasive carcinoma of less than about 5 cm in the greatest dimension with about 1 to about 3 positive lymph nodes is considered Stage II. Stage III refers to any of the following: an invasive carcinoma of less than about 5 cm in the greatest dimension with four or more involved axillary lymph nodes; an invasive carcinoma greater than about 5 cm in the greatest dimension with nodal involvement; an invasive carcinoma with at least about 10 involved axillary lymph nodes; an invasive carcinoma with involvement of the ipsilateral internal mammary lymph nodes; or an invasive carcinoma with skin involvement, chest wall fixation or inflammatory carcinoma. Stage IV refers to a breast carcinoma with distant metastases (Robbins and Cotran Pathologic Basis of Disease, 7th ed., Kumar, V., Abbas, A. K. and Fausto, N. (eds), Elsevier Saunders (2005)).
Clinical staging of breast cancer is an estimate of the extent of the cancer based on the results of a physical exam, imaging tests (e.g., x-rays, CT scans) and often biopsies of affected areas. Blood tests can also be used in staging.
Pathological staging can be done on patients who have had surgery to remove or explore the extent of the cancer, which can be combined with clinical staging (e.g., physical exam, imaging tests). In some cases, the pathological stage may be different from the clinical stage. For example, surgery may reveal that the cancer has spread beyond that predicted from a clinical exam.
In an embodiment, the methods of the invention measure expression of genes in a breast cancer sample from a human that has an estrogen-receptor positive breast cancer (referred to herein as “ER+”). In a further embodiment, the breast cancer sample is from a human that has a progesterone-receptor positive breast cancer (referred to herein as “PR+”). In still another embodiment, the breast cancer sample is from a human that has an estrogen-receptor positive and progesterone-receptor positive (referred to herein as “ER+/PR+”) breast cancer. Estrogen Receptor (ER) is also referred to herein as “ESR.” Progesterone Receptor is also referred to herein as “PGR” or “PR.”
The ESR measured can be expression of at least one member selected from the group consisting of ESR1 (also referred to as “estrogen receptor alpha”) gene expression and ESR2 (also referred to as “estrogen receptor beta”) gene expression.
“Estrogen-receptor positive breast cancer,” as used herein, means that the levels of estrogen receptor protein in the breast cancer sample or biopsy are greater than about 10 fmol/mg protein (e.g., about 10 fmol/mg protein by ligand binding assay or about 15 fmol/mg protein by EIA) by established techniques, such as at least one member selected from the group consisting of radioligand binding, Enzyme ImmunoAssay (EIA) and semi-quantitative immunohistochemical assay (see, for example, Wittliff, J. L., et al., Steroid and Peptide Hormone Receptors: Methods, Quality Control and Clinical Use. In: K. I. Bland and E. M. Copeland III (eds.), The Breast: Comprehensive Management of Benign and Malignant Diseases, Chapter 25, pp. 458-498, Philadelphia, Pa.: W. B. Saunders Co. (1998)).
“Progestin-receptor positive breast cancer,” as used herein, means that the levels of progestin receptor protein in the breast cancer sample or biopsy measure greater than about 10 fmol/mg protein (e.g., about 10 fmol/mg protein by ligand binding assay or about 15 fmol/mg protein by EIA) by established techniques, such as at least one member selected from the group consisting of radioligand binding, EIA and semi-quantitative immunohistochemical assay (see, for example, Wittliff, J. L., et al., Steroid and Peptide Hormone Receptors: Methods, Quality Control and Clinical Use. In: K. I. Bland and E. M. Copeland III (eds.), The Breast: Comprehensive Management of Benign and Malignant Diseases, Chapter 25, pp. 458-498, Philadelphia, Pa.: W. B. Saunders Co. (1998)).
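The receptor-positivity definitions above amount to an assay-dependent threshold rule. The following Python sketch restates them, assuming the approximate cut-offs given in the text (greater than about 10 fmol/mg protein by ligand binding assay and about 15 fmol/mg protein by EIA). The dictionary and function names are hypothetical, and the semi-quantitative immunohistochemical assay, which is not reported in fmol/mg, is deliberately left out.

```python
# Approximate positivity cut-offs in fmol receptor per mg protein, taken from the
# definitions in the text. Keys and function names are hypothetical illustrations.
POSITIVITY_CUTOFF_FMOL_PER_MG = {
    "ligand_binding": 10.0,  # radioligand binding assay
    "eia": 15.0,             # enzyme immunoassay
}


def receptor_positive(level_fmol_per_mg: float, assay: str) -> bool:
    """Return True if the receptor level exceeds the cut-off for the given assay."""
    if assay not in POSITIVITY_CUTOFF_FMOL_PER_MG:
        raise ValueError(f"unsupported assay: {assay!r}")
    return level_fmol_per_mg > POSITIVITY_CUTOFF_FMOL_PER_MG[assay]


def receptor_status_label(er_positive: bool, pr_positive: bool) -> str:
    """Combine ER and PR calls into labels such as 'ER+/PR+' or 'ER+/PR-'."""
    er = "ER+" if er_positive else "ER-"
    pr = "PR+" if pr_positive else "PR-"
    return f"{er}/{pr}"


# Example: ER measured by ligand binding, PR measured by EIA.
er = receptor_positive(42.0, "ligand_binding")
pr = receptor_positive(12.0, "eia")
print(receptor_status_label(er, pr))  # ER+/PR-
```

Because the cut-off depends on the assay, the assay type is an explicit argument rather than a single global threshold.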
Humans whose treatment is optimized by the methods described herein can have an estrogen-receptor positive breast cancer that is a primary estrogen-receptor positive breast cancer (i.e., cancer arising from breast tissue, such as epithelial tissue) or a secondary estrogen-receptor positive breast cancer (i.e., cancer arising from an organ other than breast tissue that metastasizes to breast tissue).
The methods described herein can further include the step of treating the human with a therapy that decreases the likelihood of recurrence of the breast cancer. The therapy may increase the likelihood of survival of the human. The selection of therapy will depend on, for example, the stage of the breast cancer, the expression of particular genes, the age of the human, overall health status, current treatment, and the ER and PR status of the breast cancer. Therapies can include at least one member selected from the group consisting of surgery, radiation therapy, chemotherapy and, for ER+, PR+ or ER+/PR+ breast cancers, endocrine therapy. For example, polychemotherapy with at least 4 cycles of one member selected from the group consisting of cyclophosphamide in combination with methotrexate and fluorouracil (CMF); doxorubicin in combination with fluorouracil and cyclophosphamide (FAC); and fluorouracil in combination with epirubicin and cyclophosphamide (FEC) (see, for example, Early Breast Cancer Trialists' Collaborative Group (EBCTCG), Lancet 365(9472):1687-717 (2005)) may be used as a therapy to optimize treatment of humans with ER+ and PR+ breast cancers. Chemotherapy may be combined with radiation therapy and/or endocrine therapy. Endocrine therapy, such as treatment with at least one member selected from the group consisting of an estrogen receptor antagonist, an aromatase inhibitor and a selective estrogen receptor modulator (“SERM”), could be employed in humans having ER-positive breast cancer. Alternatively, to optimize treatment of the breast cancer, chemoendocrine therapies may be employed in combination with endocrine adjuvant therapies, for example, in humans identified by the methods of the invention who have lymph node-negative breast cancers.
“Selective estrogen receptor modulator (SERM),” as used herein, refers to nonsteroidal and steroidal compounds that interact with the estrogen receptor to affect or mediate the action of estrogens, such as 17β-estradiol. The administration of a SERM may provide the benefits of estrogens without the potentially adverse risk of increased cell proliferation in estrogen-responsive tissues, such as breast and uterine epithelium. A selective estrogen receptor modulator therapy, such as 2-[4-[1,2-di(phenyl)but-1-enyl]phenoxy]-N,N-dimethylethanamine therapy (e.g., TAMOXIFEN™ therapy), can be employed alone or in combination with other treatments (e.g., chemotherapy, radiation therapy) when the methods of the invention identify a human who has an increased likelihood of recurrence and has or had an ER-positive breast cancer.
Radiation therapy has generally been employed as a treatment for relatively large breast tumors and for breast cancers in humans with at least four (4) positive lymph nodes. Humans identified by the methods described herein who can potentially benefit from a therapy to decrease the likelihood of recurrence of the breast cancer, in particular lymph node-negative humans (also referred to herein as “patients”), may have optimized therapies that include more aggressive therapy, such as radiation, even if the clinical profile (for example, small tumor, low lymph node involvement) would not otherwise lend itself to radiation therapy.
For ER+ breast cancers, identification by the methods of the invention of humans with an increased risk of recurrence of the breast cancer can result in treatments that are customized to the patient and may be more clinically aggressive than those for patients who do not have an increased likelihood of recurrence of the breast cancer. Thus, treatment of humans having an increased likelihood of recurrence of the breast cancer can be a more aggressive therapy.
The methods described herein can further include the step of administering at least one alternative therapy to the human, alone or in combination with 2-[4-[1,2-di(phenyl)but-1-enyl]phenoxy]-N,N-dimethylethanamine (TAMOXIFEN™) therapy, thereby treating the human for the estrogen-receptor positive breast cancer. An exemplary alternative therapy can include at least one aromatase inhibitor (Mauri, D., et al., J. Natl. Cancer Inst. 98:1285-1291 (2006)), e.g., anastrozole (ARIMIDEX™; 2-[3-(1-cyano-1-methyl-ethyl)-5-(1H-1,2,4-triazol-1-ylmethyl)phenyl]-2-methyl-propanenitrile). Selective estrogen receptor modulators may also be considered, for example, TOREMIFENE™ (IUPAC designation 2-(para-((Z)-4-chloro-1,2-diphenyl-1-butenyl)phenoxy)-N,N-dimethylethylamine) (Pagani, O., et al., Ann. Oncol. 15:1749-1759 (2004)) and RALOXIFENE™ (EVISTA®; [6-hydroxy-2-(4-hydroxyphenyl)-1-benzothiophen-3-yl]-[4-(2-piperidin-1-ium-1-ylethoxy)phenyl]methanone chloride; IUPAC designation (2-(4-hydroxyphenyl)-6-hydroxybenzo(b)thien-3-yl)(4-(2-(1-piperidinyl)ethoxy)phenyl)methanone).
“Alternative therapy,” as used herein, means a treatment other than 2-[4-[1,2-di(phenyl)but-1-enyl]phenoxy]-N,N-dimethylethanamine therapy (i.e., TAMOXIFEN™; IUPAC designation (Z)-2-(para-(1,2-diphenyl-1-butenyl)phenoxy)-N,N-dimethylethylamine), also referred to as NOLVADEX™. “Alternative therapy” is also referred to herein as a “therapy that is alternative to.” The alternative therapy can be administered alone or in combination (e.g., before, during or after) with chemotherapy, radiation therapy and therapy with estrogen-receptor antagonists, such as 2-[4-[1,2-di(phenyl)but-1-enyl]phenoxy]-N,N-dimethylethanamine.
Optimization of treatment, by the methods described herein, of humans who have ER+ and/or PR+, lymph node-negative breast cancers may include the use of TAMOXIFEN™ alone as a maintenance therapy after surgical removal of the tumor or after a course of adjuvant chemotherapy (e.g., CMF, FAC, FEC).
Employing the methods described herein, a patient can be identified who has a “high risk” of recurrence (i.e., the breast cancer sample has the expression profile of a particular gene subset as described herein), indicating that the patient should receive more aggressive therapies (a term used by oncologists to describe, for example, dose escalations). Thus, a patient with lymph node-negative cancer would be a candidate for therapy regimens selected for patients with lymph node-positive cancer, which include multiple courses of polychemotherapy and/or external beam radiation therapy. Various polychemotherapy regimens are used at the discretion of the oncologist depending upon the collective characteristics of the lesion, the patient's parameters and health status and other features, and would be within the knowledge and medical expertise of one skilled in the art. The regimens could include TAC (docetaxel plus doxorubicin and cyclophosphamide).
Conversely, the methods of the invention can be employed to identify patients who are less likely to have a recurrence of a breast cancer.
In addition, for humans having lymph node-positive cancers, including breast cancers that are ER+ and/or PR+, the expression profiles of the genes employed in the methods described herein may indicate that the human has a “low risk” of recurrence. Thus, even though the patient is lymph node-positive, the patient may benefit from a less aggressive treatment (e.g., polychemotherapy alone or radiation therapy alone).
Thus, the expression of the genes described herein may predict the survival and prognosis of the human. For example, the methods described herein identify a human who has an increased likelihood of recurrence of breast cancer, which may indicate an increased likelihood of death. Likewise, employing the methods described herein, a human may be identified who has a relatively low likelihood of recurrence of breast cancer, which may indicate increased survival.
The methods of the invention can be employed to predict, for example, local recurrence of primary breast carcinoma and regional or distant metastases from primary breast carcinoma, which may provide a prognostic evaluation of overall survival probabilities at the time of diagnosis of primary breast carcinoma. The methods of the invention can be employed to optimize therapeutic regimens for treatment of the breast cancer, which would be customized to the patient by one of skill in the art based on factors such as age, health history, other diseases and family history. The gene expression profiles described herein may also provide biomarkers for assessing disease progression and response in human cancers other than breast cancer (e.g., ovarian, uterine, colon).
Several methods to predict the likelihood of recurrence of breast cancer have been described, including ONCOTYPE DX™, MAMMA PRINT® and BREAST BIOCLASSIFIER™. However, such tests are based on samples obtained for analysis by various methods (e.g., cell lines, fixed tissues) and assess a relatively large number of genes (e.g., 21 genes, 97 genes) and, thus, are not suitable for routine screening.
The methods described herein provide a clinically relevant subset of genes in a tissue biopsy that predicts breast cancer behavior. A gene subset of about 10, 9, 7, 5 or 4 genes is commercially feasible for development of a molecular diagnostic acceptable to clinicians, pathologists and laboratory medicine specialists. The methods of the invention may be performed quickly on tissue biopsies, and the entire panel of genomic biomarkers may be measured simultaneously in conventional formats, e.g., qPCR or hybridization arrays.
Few genomic tests are currently available in the clinical laboratory setting, and few technical staff have experience in the isolation, purification and amplification of labile mRNA for technologies such as qPCR and microarrays. Use of molecular diagnostic technologies can provide standardized methods for tissue collection that preserve the integrity of the biological macromolecules (DNA, RNA, protein) within the cells, allowing for more accurate detection.
“Breast cancer behavior,” as used herein, means, for example, whether the breast cancer will result in an increased likelihood of recurrence of the breast cancer, whether the human has an increased likelihood of survival or death, and the selection of a course of treatment for the breast cancer.
The methods described herein may be used in combination with other methods of diagnosing breast cancer to more accurately identify a mammal at an increased risk for recurrence of breast cancer. For example, the methods described herein may be employed in combination or in tandem with assessments of the presence or absence of Ki-67, an antigen that is present in all stages of the cell cycle except G0 and can be employed as a marker for tumor cell proliferation, and of prognostic markers (including oncogenes, tumor suppressor genes and angiogenesis markers) such as p53, p27, Cathepsin D, pS2, the multi-drug resistance (MDR) gene and CD31. Alone or in combination with other clinical correlates of breast cancer, the methods described herein may increase the accuracy of detection of breast cancer, in particular in mammals who have had one or more incidents of breast cancer, thereby optimizing treatment of the breast cancer to decrease the likelihood of recurrence of the breast cancer.
In an additional embodiment, the invention is an immobilized collection (microarray) of the genes, such as a gene chip, for ease of processing in the methods described herein. Gene chips that include the genes described herein can permit high-throughput screening of numerous breast tissue samples. The genes identified in the methods described herein can be chemically attached to locations on an immobilized collection, such as a coated quartz surface. Nucleic acids from breast tissue samples can be prepared as described herein and hybridized to the immobilized genes, and the expression of the genes determined.
In another embodiment, the invention includes kits to perform the methods described herein.
The teachings of all patents, published applications and references cited herein, and of U.S. patent application Ser. No. 12/630,212 (Publication No: 2010/0112592) and Patent Cooperation Treaty Application No: PCT/US2009/060506 (WO 2010/045234), are incorporated by reference in their entirety.
RNA was isolated from tissue sections of 126 de-identified frozen biopsies of invasive ductal carcinoma using the RNeasy® Mini kit (Qiagen) and analyzed for quality and quantity using the BIOANALYZER (Agilent). cDNA for qPCR measurements was prepared in Tris-HCl buffer containing KCl, MgCl2, DTT (Invitrogen), dNTPs (Invitrogen), RNasin® (Promega) and Superscript® RT III (Invitrogen). qPCR reactions were performed using Power Sybr® Green PCR Master Mix (Applied Biosystems), forward/reverse primers and cDNA obtained from the reverse transcription reaction. Relative gene expression was calculated with the ddCt method, using β-actin as the reference gene and Universal Human Reference RNA (Stratagene) as a calibrator. qPCR reactions were performed in triplicate with duplicate wells in each 384-well plate, to ensure reproducibility.
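The ddCt (ΔΔCt) calculation referred to above can be sketched as follows. This assumes the usual 2^-ΔΔCt formulation with approximately 100% amplification efficiency for both the target and β-actin assays, and it averages replicate Ct values before taking differences. The function name and argument layout are hypothetical and do not represent the analysis software actually used.

```python
from statistics import mean


def relative_expression_ddct(target_sample, reference_sample,
                             target_calibrator, reference_calibrator):
    """Relative expression by the 2^-ddCt method (hypothetical helper).

    Each argument is a list of replicate Ct values:
      target_*    -> gene of interest (e.g., RABEP1)
      reference_* -> reference gene (beta-actin in the text)
      *_sample    -> tumor specimen; *_calibrator -> reference RNA used as calibrator
    Assumes roughly 100% amplification efficiency for both assays.
    """
    # Normalize the target to the reference gene within each specimen.
    dct_sample = mean(target_sample) - mean(reference_sample)
    dct_calibrator = mean(target_calibrator) - mean(reference_calibrator)
    # Express the specimen relative to the calibrator.
    ddct = dct_sample - dct_calibrator
    return 2.0 ** (-ddct)


# A target that reaches threshold 2 cycles earlier (relative to beta-actin) than in
# the calibrator corresponds to roughly 4-fold higher relative expression.
fold = relative_expression_ddct([24.1, 24.0, 23.9], [18.0, 18.1, 17.9],
                                [26.0, 26.1, 25.9], [18.0, 18.0, 18.0])
print(round(fold, 2))
```

If assay efficiencies differ appreciably from 100%, an efficiency-corrected model would be needed instead of the fixed base of 2.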
Gene expression results from qPCR were correlated with disease-free and overall survival outcome data. Expression of TBC1D9, RABEP1, SLC39A6, FUT8, and PTP4A2 correlated with disease-free survival in univariate Cox proportional hazards analyses (P<0.05). Expression of RABEP1, SLC39A6, FUT8, and PTP4A2 appeared to be related to overall survival in univariate analysis (P<0.05). Multivariate analyses were performed with backwards stepwise selection to predict disease-free survival using expression levels of ESR1, GABRP, RABEP1, SLC39A6, TCEAL1, ATAD2, PTP4A2, LRBA, and SLC43A3. ROC curves were constructed to illustrate the sensitivity and specificity of the model for disease-free and overall survival, with areas under the curves equal to 0.78 and 0.76, respectively. Consideration of additional parameters, e.g., estrogen and progestin receptor status, menopausal status and lymph node involvement, did not improve the model.
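The area under a ROC curve can be computed directly from a model's risk scores as the probability that a randomly chosen patient who had the event is scored higher than a randomly chosen patient who did not (the Mann-Whitney formulation). The sketch below assumes the outcome is treated as a binary label at a fixed follow-up point and ignores censoring; the text does not state how censored cases were handled, so this illustrates the quantity rather than reconstructing the reported 0.78 and 0.76 values. The function name is hypothetical.

```python
def roc_auc(scores, labels):
    """AUC via pairwise comparison (hypothetical helper; censoring ignored).

    scores: model risk scores, higher = higher predicted risk of the event
    labels: 1 if the event (e.g., recurrence) occurred, 0 otherwise
    Tied scores between an event and a non-event count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(roc_auc([0.9, 0.2, 0.5, 0.1], [1, 1, 0, 0]))  # 0.75
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of patients with and without the event.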
A molecular signature consisting of expression profiles of candidate genes was identified in a multivariate Cox proportional hazards model of breast cancer recurrence. The model also predicted overall survival.
SPSS statistical software was used to perform multivariate Cox regressions (with forward and backward stepwise selection) to obtain an optimal model for predicting patient survival (i.e., clinical outcome of breast cancer patients).
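Backward stepwise selection starts from the full set of candidate genes and removes one variable at a time for as long as doing so improves the fit. SPSS applies its own removal criteria based on significance tests; the hypothetical sketch below swaps in a generic, caller-supplied criterion where lower is better (for example, the Akaike information criterion of a fitted Cox model) and does not fit any model itself.

```python
def backward_stepwise(candidates, criterion):
    """Greedy backward elimination (hypothetical helper).

    candidates: variable names, e.g. gene symbols.
    criterion:  caller-supplied function(frozenset of names) -> score, lower is
                better (assumed here to be something like AIC of a fitted model).
    Starts from the full set and repeatedly drops the variable whose removal most
    improves the criterion; stops when no single removal helps.
    """
    current = frozenset(candidates)
    best = criterion(current)
    while len(current) > 1:
        # Score every model that drops exactly one variable.
        trials = [(criterion(current - {v}), v) for v in sorted(current)]
        score, drop = min(trials)
        if score >= best:
            break  # no removal improves the criterion
        current, best = current - {drop}, score
    return current, best


# Toy criterion for illustration only: penalize model size, reward two genes.
def toy_criterion(selected):
    return len(selected) - 10 * len(selected & {"ESR1", "GABRP"})


print(backward_stepwise(["ESR1", "GABRP", "X", "Y"], toy_criterion))
```

Keeping the criterion pluggable separates the search strategy from the model fitting, so the same loop could wrap any survival model that reports a comparable fit statistic.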
Survival analyses of individual genes of both the carcinoma and stromal subsets revealed that over-expression of TBC1D9 and TPBG in the carcinoma cells was associated with decreased disease-free and overall survival.
Individual expression levels of TBC1D9, CENPA, MELK, ATAD2, MCM6, YBX1, GMPS, and CKS2 in the stromal cells were associated with poor prognosis of breast cancer. These results indicate that over-expression of each of these 8 genes in stromal cells is correlated with an increased likelihood of death due to breast cancer.
Over-expression of TBC1D9 in either LCM-procured carcinoma cells or surrounding stromal cells appears to be associated with poor survival.
Each of the 32 candidate genes was evaluated using clinical follow-up and microarray results from LCM-procured carcinoma cell preparations from 247 patient specimens. Examination of the entire 22,000-gene microarray results from carcinoma cells revealed that individual expression levels of twelve genes in the “stromal subset” (i.e., FUT8, MELK, PFKP, PLK1, ATAD2, XBP1, BUB1, GATA3, MAPRE2, GMPS, CKS2, and SLC43A3) were independently associated with disease-free or overall survival.
Examination of these same 22,000-gene microarray results from LCM-procured carcinoma cells revealed that individual expression levels of ten genes in the “cancer subset” (i.e., EVL, NAT1, ESR1, TBC1D9, SCUBE2, IL6ST, RABEP1, TPBG, TCEAL1, and DSC2) were independently associated with disease-free or overall survival of breast cancer patients.
Expression levels of seven genes (NAT1, ESR1, SCUBE2, FUT8, PTP4A2, LRBA, and MAPRE2) appear to be highly correlated with other genes in the 32 gene set. Each of these seven genes exhibited expression levels related to those of another gene when examined as gene pairs (Pearson correlation used as statistic). Each of the seven genes correlated as pairs with more than 20 of the other genes in the 32 gene set. Expression levels of estrogen and progestin receptor mRNA were highly correlated with ER and PR protein levels of these known tumor markers using Pearson correlations and linear regressions.
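The gene-pair analysis above can be sketched as counting, for each gene, how many other genes it correlates with across the same specimens. The text does not give the correlation cut-off used to call a pair correlated, so `r_cutoff` below is a hypothetical parameter; `min_partners=20` mirrors the "more than 20 of the other genes" criterion. All function names are hypothetical.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    if va == 0 or vb == 0:
        raise ValueError("correlation undefined for constant expression")
    return cov / math.sqrt(va * vb)


def correlated_partner_counts(expr, r_cutoff=0.5):
    """expr maps gene symbol -> expression values across the same specimens.

    Returns gene -> number of other genes with |r| >= r_cutoff.
    r_cutoff is a hypothetical threshold; the source does not state one.
    """
    counts = {g: 0 for g in expr}
    for g, h in combinations(expr, 2):
        if abs(pearson(expr[g], expr[h])) >= r_cutoff:
            counts[g] += 1
            counts[h] += 1
    return counts


def highly_connected_genes(expr, r_cutoff=0.5, min_partners=20):
    """Genes correlated with more than min_partners other genes, sorted by name."""
    counts = correlated_partner_counts(expr, r_cutoff)
    return sorted(g for g, c in counts.items() if c > min_partners)
```

Using the absolute value of r counts strongly anti-correlated pairs as well, which seems appropriate when looking for shared pathway membership; a signed threshold would be an equally reasonable alternative.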
When genes were individually stratified by median expression level and individually analyzed by Kaplan-Meier survival plots, stratification at the median expression level of SCUBE2 significantly separated patients into good and poor prognosis groups for disease-free survival, while GABRP, TBC1D9, SLC39A6, MELK, MCM6, and PTP4A2 were associated with disease-free and overall survival (P value less than 0.10).
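Median stratification followed by Kaplan-Meier analysis can be sketched as below: patients are split at the median expression of one gene, and the product-limit estimate S(t) is computed within each group. This is a minimal illustration with hypothetical function names; it omits the log-rank test that would normally compare the two curves, and it assigns values equal to the median to the high-expression group, which is an arbitrary choice.

```python
from statistics import median


def kaplan_meier(times, events):
    """Product-limit estimate of S(t) (hypothetical helper).

    Returns a list of (time, survival) pairs at each distinct event time.
    events: 1 = event observed (e.g., recurrence), 0 = censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = n_leaving = 0
        # Group all patients whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            surv *= 1.0 - n_events / at_risk
            curve.append((t, surv))
        at_risk -= n_leaving  # events and censored cases both leave the risk set
    return curve


def median_split(expr):
    """Split patient indices at the median expression into (high, low) groups.

    Values equal to the median go to the high group (an arbitrary choice).
    """
    m = median(expr)
    high = [i for i, v in enumerate(expr) if v >= m]
    low = [i for i, v in enumerate(expr) if v < m]
    return high, low
```

For example, `high, low = median_split(expr)` followed by `kaplan_meier([times[i] for i in high], [events[i] for i in high])` gives the curve for the high-expression group; censored patients at a tied time are kept in the risk set for that time, which is the usual convention.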
Several genes (GABRP for nodal status; NAT1, CENPA, and BUB1 for tumor grade; ESR1, SCUBE2, RABEP1, SLC39A6, TCEAL1, and XBP1 for ER status; SLC39A6 and PTP4A2 for PR status) appear to distinguish good and poor prognosis groups in specific patient populations better than in the entire population.
Expression of TBC1D9, RABEP1, SLC39A6, FUT8, and PTP4A2 correlated independently with disease-free survival using univariate Cox Regression analyses (P less than 0.05).
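A univariate Cox proportional hazards analysis fits h(t|x) = h0(t)·exp(βx) for a single gene's expression x by maximizing the partial likelihood. The pure-Python sketch below uses Newton-Raphson with the Breslow approximation for tied event times and reports a two-sided Wald P value. It is an illustration under those assumptions, not the SPSS procedure used in the study, and it does not guard against divergence when a covariate perfectly separates early and late events. The function name is hypothetical.

```python
import math


def cox_univariate(times, events, x, max_iter=50, tol=1e-9):
    """Univariate Cox model h(t|x) = h0(t) * exp(beta * x) (hypothetical helper).

    times  : follow-up time per patient
    events : 1 if the event (e.g., recurrence) was observed, 0 if censored
    x      : covariate per patient (e.g., relative expression of one gene)
    Fitted by Newton-Raphson; tied event times use the Breslow approximation.
    Returns (beta, hazard_ratio, two_sided_wald_p).
    """
    n = len(times)
    beta, info = 0.0, 0.0
    for _ in range(max_iter):
        score, info = 0.0, 0.0
        for i in range(n):
            if not events[i]:
                continue  # censored patients contribute only through risk sets
            # Risk set: patients still under observation at this event time.
            s0 = s1 = s2 = 0.0
            for j in range(n):
                if times[j] >= times[i]:
                    w = math.exp(beta * x[j])
                    s0 += w
                    s1 += w * x[j]
                    s2 += w * x[j] * x[j]
            xbar = s1 / s0
            score += x[i] - xbar             # gradient of the log partial likelihood
            info += s2 / s0 - xbar * xbar    # observed information
        if info <= 0.0:
            raise ValueError("non-positive information; is x constant?")
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    # Standard error from the information at the last iterate (final step is tiny).
    z = beta * math.sqrt(info)
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, math.exp(beta), p


beta, hr, p = cox_univariate([1, 2, 3, 4, 5, 6], [1, 1, 1, 1, 1, 1], [2, 0, 1, 2, 0, 1])
print(f"beta={beta:.3f} HR={hr:.2f} P={p:.3f}")
```

A positive β (hazard ratio above 1) means higher expression is associated with earlier events, matching the direction in which "over-expression correlated with decreased survival" is described elsewhere in this section.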
Expression of RABEP1, SLC39A6, FUT8, and PTP4A2 appeared related to overall survival using univariate analysis (P less than 0.05).
Multivariate Cox proportional hazards models, performed with backwards stepwise selection in the entire population, predicted disease-free survival using expression levels of nine genes (ESR1, GABRP, RABEP1, SLC39A6, TCEAL1, ATAD2, PTP4A2, LRBA, and SLC43A3). ROC curves indicated the sensitivity and specificity of the model for disease-free and overall survival.
Results described herein identified small, biologically significant and clinically relevant gene sets that form the basis for a commercial test for assessing risk of breast cancer recurrence. The small number of genes in the clinically relevant subsets, together with the availability of technology for constructing an instrument for measuring gene expression, allows development of a readily available test to predict risk of recurrence of breast cancer at the time of surgical removal of the primary cancer. The ability to determine a gene expression profile in a hospital laboratory setting avoids the necessity for a “send-out test.”
Gene sets identified in previous studies distinguishing subtypes are too complex for routine use in breast cancer management. To assess clinical relevance, a smaller set of 32 candidate genes was identified. Procedures refined for processing human tissue biopsies for microgenomics revealed that gene expression levels measured by qPCR were similar in LCM-procured carcinoma cells and in intact tissue. However, LCM appeared essential when studying gene expression in stromal cells, since greater differences from intact tissue were observed. Survival analyses revealed that over-expression of each of eight genes in stromal cells correlated with decreased patient survival.
Examination of microarray results from carcinoma cells indicated that expression of twelve genes in the “stromal subset” was also clinically relevant, suggesting the importance of measuring gene expression in both carcinoma and stromal cells. After qPCR validation, the distribution and expression levels of each gene were determined by qPCR in 126 breast carcinoma specimens. Although seven genes exhibited a bimodal distribution, this bimodality was not significant in survival analyses. Expression levels of seven genes were correlated with those of more than 20 other genes, suggesting pathway associations. Expression of TBC1D9, RABEP1, SLC39A6, FUT8, and PTP4A2 correlated independently with disease-free survival using univariate Cox Regression analyses, while that of RABEP1, SLC39A6, FUT8, and PTP4A2 appeared related to overall survival. Several genes, individually stratified by median expression level and Kaplan-Meier analysis, distinguished good and poor prognosis groups in specific patient populations better than in the entire population. Multivariate Cox proportional hazards models predicted disease-free survival using expression levels of nine genes (ESR1, GABRP, RABEP1, SLC39A6, TCEAL1, ATAD2, PTP4A2, LRBA, and SLC43A3). ROC curves illustrated the sensitivity and specificity of the model for disease-free and overall survival. Small, clinically relevant gene sets are being developed as a commercial test for assessing risk of breast cancer recurrence. Prediction of risk of recurrence at the time of surgical removal of the primary breast cancer will facilitate treatment planning and disease surveillance, resulting in improved clinical care.
Breast cancer represents a prevalent disease in which genomic approaches have been employed, with the hope of improving the understanding, treatment and prevention of the disease. This has become a major health concern, because it is the most prevalent form of cancer in women in the United States. The American Cancer Society estimates that about 192,370 new cases of breast cancer will be diagnosed in 2009, and about 15 percent of cancer deaths (estimated at 40,170) in women will be due specifically to breast cancer in 2009, which is the second highest mortality of all cancer types. It is estimated that about 13.4 percent of women born in the United States today will be diagnosed with breast cancer at some point in their lives.
There are many different prognostic and predictive factors utilized when assessing breast cancer patients, since the outcome varies significantly. Generally, the prognosis is based on the pathological attributes of the primary tumor and the axillary lymph nodes. The major prognostic factors include 1) whether the disease is confined to ducts and lobules by the basement membrane (in situ) or invades the surrounding tissues; 2) whether distant metastases are present in the patient; 3) whether the carcinoma has spread to the lymph nodes; 4) the size of the primary tumor; 5) presence of locally advanced disease; and 6) presence of inflammatory carcinoma [8]. These major prognostic factors are the strongest predictors of death from breast cancer and are incorporated into the American Joint Committee on Cancer (AJCC) staging system [9]. The AJCC staging is on a scale of 0-4, with 0 having the best prognosis and 4 having the worst. Patients who present with ductal carcinoma in situ (DCIS) or lobular carcinoma in situ (LCIS) are considered Stage 0. An invasive carcinoma of less than 2 cm in the greatest dimension with no lymph node involvement is considered Stage 1. An invasive carcinoma of less than 5 cm in the greatest dimension with 1-3 positive lymph nodes, or greater than 5 cm in the greatest dimension without lymph node involvement, is considered Stage 2. Stage 3 refers to an invasive carcinoma of 5 cm or less in the greatest dimension with four or more axillary lymph nodes involved, an invasive carcinoma greater than 5 cm in the greatest dimension with nodal involvement, an invasive carcinoma with 10 or more involved axillary lymph nodes, an invasive carcinoma with involvement of ipsilateral internal lymph nodes, or an invasive carcinoma with skin involvement, chest wall fixation or inflammatory carcinoma. Stage 4 refers to any breast carcinoma with distant metastases present (derived from [8; 9]).
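The simplified stage rules summarized above can be encoded as a small decision function. This sketch follows only the description in the preceding paragraph, not the full AJCC TNM system; combinations the summary does not cover (for example, a 2-5 cm node-negative tumor) return None rather than a guessed stage. The function and parameter names are hypothetical.

```python
def simplified_stage(in_situ, distant_metastases, size_cm, positive_axillary_nodes,
                     internal_nodes=False, skin_or_chest_wall=False,
                     inflammatory=False):
    """Stage 0-4 following only the simplified rules summarized in the text.

    Hypothetical helper, not the full AJCC TNM system. Returns None for
    combinations that the summary does not address.
    """
    if distant_metastases:
        return 4  # any carcinoma with distant metastases
    if in_situ:
        return 0  # DCIS or LCIS
    if internal_nodes or skin_or_chest_wall or inflammatory:
        return 3
    if positive_axillary_nodes >= 4:
        return 3  # four or more (including ten or more) involved axillary nodes
    if size_cm > 5 and positive_axillary_nodes > 0:
        return 3  # larger than 5 cm with nodal involvement
    if size_cm < 2 and positive_axillary_nodes == 0:
        return 1
    if size_cm < 5 and 1 <= positive_axillary_nodes <= 3:
        return 2
    if size_cm > 5 and positive_axillary_nodes == 0:
        return 2
    return None  # e.g., 2-5 cm without nodal involvement is not described above


print(simplified_stage(False, False, 3.0, 2))  # 2
```

Checking distant metastases and the special Stage 3 features first keeps the remaining size and node rules from needing to exclude those cases individually.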
There are additional prognostic factors that are used to determine which therapies may best benefit the patient. These include 1) histology of the primary tumor; 2) tumor grade, which assesses the degree of differentiation of the cells within the tumor; 3) presence of estrogen receptors (ER) and progestin receptors (PR) in the tumor, which determines whether a patient is a candidate for hormone therapy, such as tamoxifen (NOLVADEX™), anastrozole (ARIMIDEX™), etc.; 4) over-expression of HER-2/neu oncoprotein, which determines whether a patient is a candidate for antibody therapy, such as trastuzumab (HERCEPTIN™); 5) lymphovascular invasion; 6) proliferation rate of the cells; and 7) the DNA content of the tumor cells [8; 10-12].
Applying genomic and proteomic approaches to the study of human cancer has been complicated by some fundamental problems of tissue collection and handling, as well as by the need for reliable methods for extracting, purifying, amplifying and analyzing RNA for gene expression profiling. These problems are compounded by the cellular heterogeneity of the breast tissue biopsies used in such studies, compared with studies involving animal models or homogeneous cell lines grown in culture. For example, analysis of the levels or activities of certain tumor markers is currently performed using either biochemical or immunohistochemical methodologies (e.g., [10; 11]). If the analyte is measured in a biochemical assay, a tissue biopsy consisting of a heterogeneous cell population is homogenized, and the final concentration of the analyte from the cancer cells is diluted by proteins released from non-cancerous cells (e.g., surrounding stroma, epithelium and connective tissue cells). Therefore, the measured analyte concentration is likely to be biased by the surrounding cell types, complicating interpretation of the results. While some tumor markers present in tissue biopsies have been used to guide treatment of ER-positive patients with tamoxifen and of patients with tumors over-expressing HER-2/neu with HERCEPTIN™, many questions regarding analyte expression in cancer still remain.
Breast carcinoma tissue biopsies are composed not only of the carcinoma cells themselves but also of infiltrating endothelial cells, fibroblasts, macrophages and lymphocytes. The stroma surrounding the cancer cells provides the vascular support and extracellular matrix molecules required for tumor growth and progression [12]. There has recently been growing evidence of the importance of stromal cell contributions to the developing tumor (e.g., [12-28]). An early investigation of breast tumor stromal and epithelial cell lines derived from human tissues indicated that the enzyme aromatase is present in stroma within breast tumors and suggested that estrogen synthesis within the tumor may modulate growth by a paracrine mechanism [29]. Another study investigated differences in gene expression between breast carcinoma cells and the surrounding stromal cells and detected a number of genes that may aid in understanding stromal responses to the presence of a nearby tumor [23]. Cancer progression may involve the ability of matrix metalloproteinases (MMPs) to degrade the basement membrane.
In many solid tumors, MMPs are produced by the surrounding stromal cells rather than by the tumor cells themselves [27]. It has been determined that small differences in either stromal or tumor expression of certain MMPs (MMP-2/TIMP-2 or MMP-14) are associated with cancer progression [30]. Stromal cells have also been shown to promote tumor growth and angiogenesis by secreting elevated amounts of SDF-1/CXCL12, which binds to its cognate receptor CXCR4 expressed on the surface of tumor cells [24].
Experiments were performed to optimize the yield and analysis of mRNA obtained from small quantities of cancer tissue. These included tissue preparation, LCM techniques, RNA extraction, purification and amplification, as well as development of quality control analyses at each step of the procedure.
To evaluate differences between cell types, either whole tissue specimens or cells of interest isolated by LCM were extracted for DNA, RNA or protein analyses [37; 38; 135-137].
Before the handling of any patient encoded information or results, Collaborative Institutional Training Initiative (CITI) training and Health Insurance Portability and Accountability Act (HIPAA) certification were obtained. All specimens and follow-up information were de-identified and encoded in the Tumor Marker™ database, and no identifiers were used in any part of this research as indicated in Institutional Review Board (IRB) protocols #334.05 and 583.06. Proper tissue procurement, specimen handling and cryopreservation were essential for the collection of quality information from these analyses (e.g., [11; 135]). As described by Wittliff and Erlander [38], archival biopsy specimens used in this study were expeditiously removed without trauma during the surgical procedure. Specimens were chilled on ice, and then trimmed of obvious necrotic tissue, leaving normal tissue present with the lesion in question. Tissue specimens were either frozen on dry ice in the pathology suite within 20-30 min of collection or rapidly transported chilled in a Petri dish or plastic bag immersed in ice prior to cryopreservation and frozen section preparation in the LCM laboratory, to retain the biological integrity of macromolecules [38]. Procedures avoiding RNase and DNA contamination were employed, i.e., cleaning of bench area and utensils with RNase Away (Molecular BioProducts) or RNase Zap (Ambion). With the sensitive technologies of genomics and proteomics requiring nondestructive isolation of pure cell populations, new surgical pathology approaches and methods have been developed as recommended by Cole et al. [34] and Wittliff et al. [11; 37; 38].
Specimens were processed according to accepted biohazard policies in clean rooms/benches prepared to reduce RNase and DNA contamination, frozen in Optimal Cutting Temperature (O.C.T.) compound (TISSUETEK® OCT medium, VWR Scientific Products Corp.) and stored at −86° C. until sectioning and microdissection. At that time, frozen sections were collected on sterile, uncharged microscope slides that were kept frozen until use.
Frozen sections mounted on uncoated glass slides were handled according to established procedures depending upon the type of staining reagent (e.g., [37; 38; 71; 138]). The intercalating dye ToPro3 (Molecular Probes, Inc., Eugene, Oreg.), which binds to double-stranded nucleic acids and exhibits peak fluorescence at 661 nm, has been used in previous studies to assess the integrity of DNA in vivo in LCM-procured cells [38].
Prior to analyses in an RNase-free setting, the structural status of the tissue was evaluated after sectioning and staining with hematoxylin and eosin (H & E), using a modified staining protocol (Table 1) [38; 138]. This modified protocol shortened the time required, and thus reduced RNA degradation, while staining the sections adequately for visualization of cell types. The slides were prepared for the LCM process by dehydration with absolute ethanol and coating of the tissue sections with xylenes, which helped prevent rehydration. In an H & E stained tissue section from a representative breast cancer specimen, in which carcinoma cells were prevalent and had invaded the adjacent stroma, the structural integrity of the tissue section indicated that the biopsy was acceptable for LCM and gene expression analyses. Immunohistochemistry (IHC) of protein analytes (e.g., estrogen receptor, progestin receptor, HER-2/neu and epidermal growth factor (EGF) receptor) has been performed in previous studies [38] of invasive ductal carcinoma, using mouse monoclonal antibodies TAB250 and AB10 (Clone 111.6) against HER-2/neu protein and EGF receptor, respectively, to guide selection of cells exhibiting particular protein analytes of clinical interest. IHC staining of either analyte occurs primarily at the cell membrane of carcinoma cells. The HISTOGENE™ Frozen Tissue Staining Kit (Arcturus Bioscience) and an LCM Staining Kit (Ambion, Austin, Tex.) have been specially developed to aid visualization of cells while minimizing degradation of RNA for laser capture [139].
Analysis of the intact tissue section is vitally important to ensure extraction of high-quality RNA of sufficient quantity prior to the tedious LCM process. For these quality control studies, tissue was processed in an RNase-free manner and stained by H & E with a protocol identical to that employed for tissue sections used for LCM. This quality control step ensures there is no difference attributable to the staining step in the extent of RNA degradation in each of the sample preparations, i.e., intact tissue section and microdissected cells. However, H & E staining may alter the quantity of RNA extracted relative to that of unstained sections.
Gene expression analyses of intact tissue sections were warranted. Two methods of preparing intact tissue sections from frozen biopsies were refined [38]. The first involved preparation of frozen tissue sections in the cryostat (−20° to −25° C.) without the use of a glass slide. As a tissue section was cut (7-25 μm), it formed a "curl" which was placed directly into an RNase-free microcentrifuge tube for nucleic acid or protein extraction. This simple procedure has the advantage of allowing collection and storage at −80° C. of multiple samples from the same tissue specimen. Additionally, samples from a multitude of specimens may be prepared and stored so that they can be processed simultaneously for RNA or protein extraction, ensuring uniform handling. The other method involved the collection of frozen tissue sections (5-10 μm) on RNase-free, uncharged glass slides in the cryostat (−20° to −25° C.), which were then stored at −80° C. without cover-slips. To ensure there was no contact between frozen tissue sections, slides were stored in 100-count slide boxes.
Maintaining the integrity of labile mRNA is paramount to obtaining high-quality results from qPCR and microarray analyses. When using frozen tissue “curls,” 350 μl of extraction buffer (RLT with β-mercaptoethanol) from the QIAGEN (Valencia, Calif.) RNEASY® RNA isolation kit was added to the microcentrifuge tube and incubated on ice for 5 min and mixed briefly using a VORTEX GENIE™, before centrifugation to sediment the cell debris and O.C.T. embedding compound. These and all subsequent RNA isolation and characterization steps were conducted in an RNase-free setting.
As in the procedure for extracting frozen tissue "curls," H & E staining was unnecessary for tissue sections collected on uncharged slides. However, when RNA was prepared from tissue sections collected on uncharged slides, the sections were fixed in 70% ethanol for 1 min at 25° C. before the O.C.T. embedding compound was removed by dipping briefly in RNase-free water. In the absence of H & E staining, the slides were then transferred stepwise into 95% ethanol, then through four successive tubes of 100% ethanol, before brief exposure to 100% xylene in two separate tubes. After the slide was dried at room temperature for 2-3 min, the fixed, unstained tissue section was ready for preparation of "scraped" samples.
In contrast to RNA preparation from “curls,” fixed tissue sections from frozen samples collected on slides were “scraped” from the slide surface by placing a small amount (175 μl) of the same extraction buffer onto the tissue section, then scraping the section with an RNase-free pipet tip to loosen it from the slide, while drawing the tissue suspension into the pipet tip. This step was repeated with the same volume of extraction buffer to remove any tissue fragments remaining on the slide.
Using either extraction technique, RNA was extracted using the QIAGEN RNEASY® RNA isolation kit, which included spin columns, a DNase treatment step, a series of washes and an elution to purify the RNA from the samples. Typically, 10-200 ng total RNA were isolated from a single 7 μm gross tissue section (Table 2). If only a small amount of RNA (e.g., less than 1 ng for downstream microarray analyses, or less than 10 ng for downstream qPCR analyses) remained intact in this assessment of sample quality, then subsequent LCM procedures were not warranted.
Quality of RNA was evaluated by a variety of procedures, including the Agilent RNA 6000 Nano or Pico Kits with the BIOANALYZER™ Instrument (Agilent Technologies). The BIOANALYZER™ can provide a numerical RNA Integrity Number (RIN) of the total RNA after electrophoretic separation, which utilizes 18S and 28S rRNA profiles to provide a quantitative assessment of the quality of RNA in the sample [140]. In general, a RIN value greater than 7 is correlated with high-quality RNA acceptable for genomic analyses.
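The two acceptance criteria mentioned so far (a RIN above 7, and at least about 1 ng of intact RNA for microarray or 10 ng for qPCR before LCM is worthwhile) can be combined into a single screening step. The sketch below is illustrative only: combining both criteria into one gate, and the names `RnaQC` and `passes_qc`, are assumptions rather than part of the described protocol.

```python
from dataclasses import dataclass

# Thresholds taken from the text: RIN > 7 generally indicates high-quality
# RNA; below about 1 ng (microarray) or 10 ng (qPCR) of intact RNA,
# further LCM was not considered warranted.
MIN_RIN = 7.0
MIN_INTACT_RNA_NG = {"microarray": 1.0, "qpcr": 10.0}


@dataclass
class RnaQC:
    """Hypothetical record of one extract's QC results."""
    sample_id: str
    rin: float            # RNA Integrity Number reported by the instrument
    intact_rna_ng: float  # estimated intact total RNA in the extract (ng)


def passes_qc(qc: RnaQC, application: str) -> bool:
    """Return True if the extract meets both the integrity criterion and
    the minimum-quantity criterion for the intended application."""
    if application not in MIN_INTACT_RNA_NG:
        raise ValueError(f"unknown application: {application!r}")
    return qc.rin > MIN_RIN and qc.intact_rna_ng >= MIN_INTACT_RNA_NG[application]
```

For example, with hypothetical values:

```python
samples = [RnaQC("A", 8.2, 150.0), RnaQC("B", 8.5, 5.0), RnaQC("C", 5.1, 120.0)]
print([s.sample_id for s in samples if passes_qc(s, "qpcr")])  # ['A']
```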
The NANODROP™ Instrument (Nanodrop Technologies, Wilmington, Del.) determines RNA quantity and purity based on absorbance at 260 nm and 280 nm, with the added feature that only 1 μl of sample is required. Intact RNA can also be assessed using reverse transcription and qPCR. Since gene sequences contained in fragmented, degraded mRNA will not amplify, an estimate of total intact RNA can be determined from a standard curve of Universal Human Reference RNA (Stratagene, La Jolla, Calif.).
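One way to read an intact-RNA estimate off such a standard curve is to fit Ct against log10 of the reference RNA input and then interpolate each unknown's Ct. The minimal sketch below assumes the same amplicon is measured in the reference dilution series and the unknowns; the dilution values and function names are hypothetical.

```python
import math


def fit_standard_curve(inputs_ng, cts):
    """Least-squares fit of Ct = slope * log10(input_ng) + intercept."""
    xs = [math.log10(q) for q in inputs_ng]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_intact_rna(ct, slope, intercept):
    """Interpolate the reference-RNA-equivalent input (ng) for an unknown."""
    return 10 ** ((ct - intercept) / slope)


# Hypothetical 10-fold dilution series of the reference RNA.
slope, intercept = fit_standard_curve([100, 10, 1, 0.1], [18.0, 21.32, 24.64, 27.96])
print(round(estimate_intact_rna(22.98, slope, intercept), 2))  # 3.16
```

Because only reverse-transcribed, amplifiable template contributes to the Ct, this estimate is expected to be lower than absorbance-based totals.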
Serial sections of each frozen tissue biopsy were either stained with H & E or left unstained. Total RNA was extracted as described above from each pair of sections, and the mRNA quantity for each preparation was determined by qPCR, using β-actin as a reference gene, to evaluate the influence of H & E staining. Results are representative of the range of RNA recoveries in the H & E stained sections compared to those of unstained sections.
Cells of interest were microdissected using the PIXCELL IIe™ with CAPSURE™ LCM Caps (Molecular Devices), which permitted collection of intact cells on the surface transfer film of the cap. For documentation purposes, a "Map" image was taken at 10× magnification, while LCM was performed at 20× magnification. Carcinoma or stromal cells completely removed by LCM were deposited on the surface of the LCM cap.
Carcinoma and stromal cells were removed independently from heterocellular regions and procured cleanly for retention on the LCM caps. During collection of carcinoma cells, loosely bound stromal cells were also transferred to the LCM cap. If necessary, CAPSURE™ Pads (Arcturus Bioscience) were used to remove such contaminating cells and debris from the CAPSURE™ LCM Caps prior to nucleic acid extraction.
Stromal cells adhering to the LCM-procured carcinoma cells bound to the film surface were removed by treatment with a CAPSURE™ Pad, after which only carcinoma cells were retained on the cap surface.
RNA Isolation and Characterization from LCM-Procured Cells
Total RNA from laser captured cells was isolated using the PICOPURE® RNA Isolation kits (Molecular Devices), which were optimized for cells procured by LCM. This procedure utilizes a DNase (Qiagen) digestion step to eliminate DNA contamination. Typically, 1-6 ng of total RNA were extracted from LCM-procured cells using 50 μl XB BUFFER™ (Arcturus), compared to 10-200 ng total RNA from a single 7 μm intact tissue section, in agreement with earlier studies [37; 38]. To demonstrate the yield and integrity of RNA obtained from either tissue sections or LCM, serial sections of a single specimen of representative invasive ductal carcinoma of the breast were prepared and one section was left unstained, while another was stained with H & E (Table 3). The third section was subjected to LCM for procurement of cancer cells only (2221 laser pulses). The representative results shown in Table 3 are typical of the greatest differences observed between total RNA quantities extracted from H & E stained sections compared to unstained sections. As predicted, the quantity of total RNA in the LCM-procured cell preparation varied with the number of cells captured. Other kits designed for isolation of total RNA from small samples (e.g., those obtained by LCM) are also commercially available, including RNAQUEOUS™-MicroKit (Ambion), ARRAYPURE™ (Epicentre, Madison, Wis.), PURELINK™ (Invitrogen, Carlsbad, Calif.) and CELLSDIRECT™ (Invitrogen). Although their use was explored, the PICOPURE® kits provided optimal and reproducible results. After total RNA was isolated from the sample, characterization analyses (e.g., quality and quantity) were performed before proceeding to gene expression analyses, such as qPCR or microarray.
In order to analyze gene expression by qPCR, cDNA must be reverse transcribed from the isolated total RNA. Two types of primers may be used for reverse transcription reactions: random hexamers or oligo (dT) primers (e.g., [84]). Random hexamers prime reverse transcription of most RNA species, including mRNA, tRNA and rRNA, while oligo (dT) primers preferentially prime mRNA because they anneal to its poly (A) tail [84]. A study by Hembruff et al. [84] found that oligo (dT) primers were superior to random hexamers after RNA isolation by the RNEASY® method, because expression of the S28 reference gene was less variable, independent of the method of qPCR detection (i.e., Sybr green or TAQMAN® probes). Oligo (dT) primers were used with LCM-procured cells because of the need for linear amplification prior to microarray analysis [37; 38].
Total RNA extracted from either the intact tissue section or LCM-procured cells was reverse transcribed in a solution of 250 mM Tris-HCl buffer, pH 8.3, containing 375 mM KCl and 15 mM MgCl2 (Invitrogen), 0.1 M DTT (dithiothreitol, Invitrogen), 10 mM dNTPs (Invitrogen), 20 U/reaction of RNASIN™ ribonuclease inhibitor (Promega, Madison, Wis.) and 200 U/reaction of SUPERSCRIPT™ III RT (reverse transcriptase, Invitrogen) with 5 ng T7 primers. The cDNA obtained from this reverse transcription reaction was diluted 10-fold in 2 ng/μl polyinosinic acid and used in qPCR reactions. Other commercial kits for cDNA synthesis were also explored: ISCRIPT™ (Biorad), TRANSCRIPTOR™ (Roche Diagnostics, Indianapolis, Ind.) and MONSTERSCRIPT™ (Epicentre). A methodology designed by Miltenyi Biotec (μMACS™), which uses magnetic bead-based isolation of RNA followed by a reverse transcription reaction, provides cDNA by a simple procedure in a significantly shorter period of time. However, SUPERSCRIPT™ III RT (Invitrogen) provided the greatest latitude in preparation and use of cDNA for a variety of applications.
qPCR Analyses of Gene Expression
The qPCR reactions were performed either in a 96-well plate using a total volume of 25 μl/well or in a 384-well plate using a total volume of 10 μl/well. The reactions contained POWER SYBR™ Green PCR Master Mix (Applied Biosystems, Foster City, Calif.), forward primer, reverse primer and diluted cDNA obtained from the reverse transcription reaction. SYBR green is a fluorophore that binds to the double-stranded DNA produced during each cycle of amplification [84]. Many other SYBR Green master mixes are also commercially available, such as FASTSTART™ (Roche Diagnostics), ISCRIPT™ (BioRad) and TAQURATE™ (Epicentre). Reactions can also be performed using fluorescent probes, such as TAQMAN® (Applied Biosystems), which provide a high degree of sensitivity and specificity. However, studies performed by Hembruff et al. determined that the sensitivity of Sybr green was sufficiently high and that it was the preferred method of product detection due to its lower cost [84]. Primers used in these investigations were designed with PRIMER EXPRESS™ (Applied Biosystems), although both primers and probes may also be purchased pre-designed from a commercial source, such as Applied Biosystems. When a T7 (oligo (dT)) primer was used in the reverse transcription reaction, primers were designed for sequences closer to the 3′ end of the transcript, due to degradation which may occur near the 5′ terminus.
The threshold cycle number (Ct value) is the cycle of amplification at which the qPCR system detects an increase in signal (i.e., Sybr green fluorescence) associated with exponential growth of the PCR product during the log-linear phase. These Ct values were compared to those of a reference gene, such as glyceraldehyde phosphate dehydrogenase (GAPDH) or β-actin (ACTB), to obtain a ΔCt value (ΔCt = Ct of the gene of interest − Ct of the reference gene) [141; 142]. Amplification of the reference gene also serves as a positive control for efficiency of the qPCR reaction. Expression of the gene of interest (as a ΔCt value) was then compared to that of the same gene in the calibrator, i.e., Universal Human Reference RNA (Stratagene, La Jolla, Calif.), to obtain a ΔΔCt value (ΔΔCt = ΔCt of the sample − ΔCt of the calibrator). This ΔΔCt value is then converted to a relative expression level for the gene of interest (relative gene expression = 2^(−ΔΔCt)). This method of analysis is known as the ΔΔCt method of calculating relative gene expression [141].
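The ΔΔCt arithmetic described above can be written out directly. This is a minimal sketch; it assumes, as the ΔΔCt method does, that the gene of interest and the reference gene amplify with close to 100% efficiency, and the Ct values in the example are hypothetical.

```python
def relative_expression(ct_target, ct_ref, ct_target_cal, ct_ref_cal):
    """Relative expression of a gene of interest by the ddCt method.

    ct_target, ct_ref: Ct of the gene of interest and of the reference gene
    (e.g. ACTB) in the sample. ct_target_cal, ct_ref_cal: the same two genes
    measured in the calibrator (Universal Human Reference RNA in the text).
    """
    delta_ct_sample = ct_target - ct_ref            # normalize to reference gene
    delta_ct_cal = ct_target_cal - ct_ref_cal       # same normalization in calibrator
    delta_delta_ct = delta_ct_sample - delta_ct_cal
    return 2.0 ** (-delta_delta_ct)                 # assumes doubling per cycle


print(relative_expression(24.0, 18.0, 26.0, 18.0))  # 4.0
```

Here the gene reaches threshold two cycles earlier (after normalization) in the sample than in the calibrator, which corresponds to roughly fourfold higher relative expression.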
In preparation for genomics studies using LCM-procured cells, RNA yield and integrity analyses of the cognate intact tissue section must be performed. If a direct comparison is to be made between LCM-procured cells and intact tissue, the specimens should be treated identically, including the thickness of the tissue section and the staining protocol. However, if gene expression is to be determined only on intact tissue sections, it is preferable to use "tissue curls" as described above, maintaining consistent procedures with each tissue biopsy. Although considerable variation was noted in the cellular content and contaminating elements of the various human breast carcinoma biopsies investigated, use of these tissue preparation and processing protocols appeared to enhance the reproducibility of the results.
Since there are many techniques for determining the quality and quantity of total RNA, experiments were conducted to select the optimal method (Table 4) for confirming the minimal yield of high-quality RNA necessary for downstream applications, e.g., qPCR. As shown in Table 4, the quantity and quality of RNA obtained from eleven different representative breast tissue specimens were measured using three independent methods: Agilent BIOANALYZER™, NANODROP™, and qPCR with a known Universal Human Reference RNA (Stratagene). A comparison of these methods gave highly variable results, as expected with these completely different technologies (Table 4). However, there appeared to be greater agreement in the estimates of total RNA using the Agilent BIOANALYZER™ compared to those from the NANODROP™ Instrument. Values obtained from qPCR were much lower, apparently because only mRNA that has been reverse transcribed is measured. For the examples shown in Table 4, 8 of the 11 samples evaluated had sufficient intact RNA (greater than about 10 ng/μl, as estimated by the BIOANALYZER™) for qPCR analysis of specific genes, amplification for microarray hybridization, or LCM and RNA extraction.
Since the BIOANALYZER™ provides a reproducible estimate of RNA quality and quantity in a sample, unlike the NANODROP™ instrument, and is considerably less expensive and time-consuming than qPCR, the BIOANALYZER™ was employed in the standardized protocol. Representative BIOANALYZER™ profiles from analyses of total RNA extracted from tissue sections of four different human breast carcinoma specimens showed varying yields and quality. For example, one extract produced a low RNA yield (10 ng/μl) of high quality (28S/18S=1.1), a second produced a low RNA yield (12 ng/μl) of poor quality (28S/18S=0.0), a third produced a high RNA yield (195 ng/μl) of the highest quality (28S/18S=1.0), and a fourth produced a high RNA yield (157 ng/μl) that was degraded (28S/18S=0.3). A similar instrument, the EXPERION (BioRad, Hercules, Calif.), also provides rapid and reproducible separation and analysis of protein and nucleic acid samples, with similar data analyses, including concentration, 28S/18S ratio and an RQI (RNA quality indicator) value.
If the yield of RNA is low or of marginal quality, additional tissue sections or LCM-procured cells may be processed from serial tissue sections in different regions of the O.C.T. block, and the RNA extracted may be pooled. Using this approach, few human breast carcinoma specimens have been rejected. If necessary, the isolated RNA may be concentrated using a SPEEDVAC™ (Savant), or similar product.
Assessment of Yield and Integrity of RNA from LCM-Procured Cells
The ability to procure homogeneous sub-populations of normal stromal and malignant cells, and to generate genomic and proteomic results from each cell type, advances understanding of the underlying causes of tumor formation. Furthermore, this approach permits tracking of cell progression to a metastatic phenotype at the molecular level. To examine gene expression in carcinoma and stromal cells from a breast cancer biopsy, frozen tissue blocks were processed as serial 7 μm sections.
Although the PIXCELL IIe™ LCM System and Image Archiving Workstation (Arcturus Bioscience, Inc.) was employed because it was the only instrument available, other systems have been developed for collecting cells from tissue sections. The P.A.L.M. (P.A.L.M. Microlaser Technologies, Bernried, Germany) instrument uses both laser microdissection and pressure catapulting. Molecular Machines & Industries (MMI, Glattbrugg, Switzerland) has developed two instruments, the mmi CELLCUT™ and the mmi SMARTCUT™, which procure cells of quality similar to that of the PIXCELL IIe™. A new-generation LCM instrument, the VERITAS™, was developed by Arcturus Bioscience to combine the technologies of laser capture and laser cutting, using both ultraviolet and infrared lasers [46]. Each of these instruments allows microdissection of either single cells or groups of cells [45].
Gene Expression Analyses by qPCR
The choice of a reference gene is vitally important for normalizing data obtained in qPCR reactions. The reference gene chosen must be evenly expressed across samples and amplify with the same efficiency as the genes of interest, to ensure that differences observed in the genes of interest reflect the biological status of the specimen. Although one investigation [142] reported that more than 90% of gene expression studies published in high-impact journals before 1999 used GAPD, ACTB, 18S or 28S rRNA as single genes for normalization, other investigators question whether any single gene is ideal (e.g., [84; 143]), suggesting instead the use of total RNA or panels of reference genes. Although most studies focused on identifying genes whose expression levels remained constant in a variety of cell types, when a single tissue or cell type is studied, the reference gene should remain constant in that particular tissue (e.g., [143]). This may be confirmed by analyses of several tissue samples, each with a known RNA concentration, as suggested by Suzuki [142]. To assess the stability of the reference gene in breast tissue, the following study was performed. Each of eight RNA samples of a breast tissue panel was diluted to the same concentration and re-quantified by spectroscopy (NANODROP™) to confirm the concentrations. The RNA was reverse transcribed and subjected to qPCR for the reference gene of interest, such as ACTB (Table 6). Results from these eight samples gave an average Ct value of 18.58 with a standard deviation of 0.54, indicating relatively little variation in ACTB expression among samples. Thus, ACTB was employed as the reference gene in the standardized protocol for breast tissue.
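The stability check above reduces to the mean and sample standard deviation of the reference-gene Ct values across equal RNA inputs. A minimal sketch follows; the acceptance limit `max_sd=1.0` is a hypothetical cutoff not stated in the text, and the Ct values are invented for illustration (they are not the Table 6 data).

```python
import statistics


def reference_gene_summary(cts):
    """Mean and sample standard deviation of reference-gene Ct values
    measured in equal amounts of RNA from different specimens."""
    return statistics.mean(cts), statistics.stdev(cts)


def is_stable(cts, max_sd=1.0):
    """Hypothetical acceptance rule: treat the gene as a suitable reference
    if its Ct standard deviation does not exceed max_sd cycles."""
    _, sd = reference_gene_summary(cts)
    return sd <= max_sd


# Hypothetical Ct values for eight equal-input breast tissue RNA samples.
actb_cts = [18.1, 18.9, 18.4, 19.2, 18.0, 18.7, 18.3, 19.0]
mean_ct, sd_ct = reference_gene_summary(actb_cts)
print(f"mean={mean_ct:.2f}, sd={sd_ct:.2f}, stable={is_stable(actb_cts)}")
```

Because each cycle corresponds to roughly a twofold difference in template, a standard deviation well under one cycle indicates that the candidate gene varies little among specimens.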
To ensure accuracy of gene expression measurements, genes of interest should have similar amplification efficiencies. Representative standard curves were prepared to compare amplification efficiencies.
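Amplification efficiency is commonly derived from the slope of a standard curve (Ct versus log10 input) as E = 10^(−1/slope) − 1, so a slope near −3.32 corresponds to about 100% efficiency. The sketch below applies that relation; the 0.10 tolerance in `similar_efficiency` and the example slopes are hypothetical choices, not values from this study.

```python
import math


def slope_of(inputs_ng, cts):
    """Least-squares slope of Ct against log10(input) for a dilution series."""
    xs = [math.log10(q) for q in inputs_ng]
    mx = sum(xs) / len(xs)
    my = sum(cts) / len(cts)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, cts))
            / sum((x - mx) ** 2 for x in xs))


def efficiency(slope):
    """Amplification efficiency from a standard-curve slope (1.0 means 100%)."""
    return 10 ** (-1.0 / slope) - 1.0


def similar_efficiency(eff_a, eff_b, tolerance=0.10):
    """Hypothetical rule: efficiencies within `tolerance` are treated as similar."""
    return abs(eff_a - eff_b) <= tolerance


ref_slope = slope_of([100, 10, 1, 0.1], [18.0, 21.32, 24.64, 27.96])
print(round(efficiency(ref_slope), 3))                             # about 1.0
print(similar_efficiency(efficiency(ref_slope), efficiency(-3.45)))  # True
```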
Another validation of qPCR results was performed using dissociation (melt) curve analysis. At the conclusion of PCR amplification of target genes, an additional anneal and melt cycle was performed on the PCR products over an extended period of time, with fluorescence measured throughout the cycle. A single fluorescence peak indicated a single PCR product, while multiple peaks indicated the formation of additional products, such as primer dimers or non-specific products, as suggested by Bookout [144].
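Dissociation curves are usually inspected as the negative derivative of fluorescence with respect to temperature (−dF/dT), in which each distinct product appears as a peak at its melting temperature. The sketch below counts such peaks using a simple finite-difference derivative; the relative height cutoff `min_rel_height` is a hypothetical parameter for ignoring baseline ripple, and instrument software may smooth and call peaks differently.

```python
def count_melt_peaks(temps, fluorescence, min_rel_height=0.1):
    """Count peaks in -dF/dT computed from a melt (dissociation) curve.

    temps: increasing temperatures (deg C); fluorescence: signal at each
    temperature. A local maximum is counted only if it reaches at least
    min_rel_height times the tallest peak.
    """
    # Finite-difference estimate of -dF/dT between adjacent readings.
    deriv = [-(f2 - f1) / (t2 - t1)
             for t1, t2, f1, f2 in zip(temps, temps[1:], fluorescence, fluorescence[1:])]
    if len(deriv) < 3:
        return 0
    tallest = max(deriv)
    if tallest <= 0:
        return 0
    peaks = 0
    for i in range(1, len(deriv) - 1):
        is_local_max = deriv[i] > deriv[i - 1] and deriv[i] >= deriv[i + 1]
        if is_local_max and deriv[i] >= min_rel_height * tallest:
            peaks += 1
    return peaks
```

A result of 1 corresponds to the single-product case described above; 2 or more suggests primer dimers or non-specific products.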
It is widely accepted that many investigations of genomics and proteomics of human tissues utilized biopsy specimens collected, stored, and processed using a variety of conditions, many of which were unstandardized. The concern is of such a magnitude that the National Cancer Institute has established “Best Practices for Biospecimen Resources” focusing on collection of human tissue specimens and associated data for research purposes. In the current investigation, procedures and conditions were refined [37; 38] for processing de-identified human tissue biopsies in preparation for microgenomic-based investigations in an RNase-free setting. These include the establishment of standardized protocols for RNA purification and amplification using both frozen tissue sections and LCM-procured cells.
It was demonstrated that total RNA extracted from either thin tissue sections or individual cell populations (e.g., carcinoma or stromal cells) was of high quality and provided meaningful results. Furthermore, standardized conditions were developed to improve RNA yields from LCM-procured cells, as well as from thin (7-10 μm) intact tissue sections, such that microgenomic analyses could be performed reproducibly. Results were obtained demonstrating that ACTB was a valid reference gene for normalization of qPCR results, since its expression levels remained constant among a wide variety of human breast carcinomas, and its efficiency of amplification was similar to those of target genes. Nucleic acid dissociation curve analyses confirmed the quality of PCR products formed for analysis of gene expression. Collectively, these results confirm that the procedures for tissue and cell processing for subsequent isolation of intact mRNA were applicable for assessing the expression of candidate genes.
Global gene expression using microarrays has been explored as a means to determine molecular profiles reflecting breast cancer behavior (e.g., [41; 47; 48; 50-73]). Expression profiles are proposed to provide a more accurate prediction of the clinical course of breast cancers than indicated by conventional tumor markers. However, there is great variation in methods and platforms utilized to obtain these gene expression profiles of cancer, including the use of breast cancer cell lines (e.g., [55; 134]), whole tissue extraction (e.g., [65; 73]), and LCM-procured cells (e.g., [41; 57; 70; 71]). In an attempt to identify a small, clinically relevant gene set, numerous “molecular signatures” of breast cancer reported to be related to clinical behavior were investigated (e.g., [41; 47; 48; 54; 55; 62-65; 67; 70]).
The eleven gene signatures described supra, without bias of gene selection, were investigated to derive a subset of candidate genes for development of a predictive test of risk of breast cancer recurrence.
GenBank Accession numbers (NCBI) of genes from studies of interest [47; 48; 54; 55; 62-64; 67; 70; 71; 75] were entered into the UniGene database (NCBI), which separates the GenBank sequences into a non-redundant set of gene-oriented clusters. There are 123,891 sequence entries for Homo sapiens. Each UniGene Cluster contains sequences that represent a unique gene, which has a specific identifier. Once the appropriate UniGene identifier is known, the gene sets can be sorted by the UniGene identifier and analyzed. For example, epidermal growth factor receptor (EGFR) has a GenBank Accession number of NM_201284. Entry of this Accession number into the UniGene database identifies UniGene Cluster Hs.488293 Homo sapiens Epidermal growth factor receptor (erythroblastic leukemia viral (v-erb-b) oncogene homolog, avian) (EGFR). Twenty-six mRNA sequences have been entered, including NM_201284. In addition, 335 expressed sequence tag (EST) sequences have been entered. Using this approach, one may identify a variety of sequences associated with a single gene (Table 7).
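Collapsing accession numbers into gene-oriented clusters amounts to grouping each accession under its cluster identifier. In the sketch below, the dictionary `ACCESSION_TO_UNIGENE` is a hypothetical stand-in for the database query: only the NM_201284 to Hs.488293 (EGFR) pair comes from the text, and the other keys and values are placeholders.

```python
from collections import defaultdict

# Hypothetical lookup standing in for a UniGene query. Only the
# NM_201284 -> Hs.488293 (EGFR) pair is taken from the text.
ACCESSION_TO_UNIGENE = {
    "NM_201284": "Hs.488293",
    "EST_EGFR_PLACEHOLDER": "Hs.488293",
    "OTHER_PLACEHOLDER": "Hs.PLACEHOLDER_A",
}


def collapse_to_clusters(accessions, lookup):
    """Group accessions by cluster ID; accessions missing from the lookup
    are returned separately so they can be resolved by hand."""
    clusters = defaultdict(list)
    unmapped = []
    for acc in accessions:
        cluster = lookup.get(acc)
        if cluster is None:
            unmapped.append(acc)
        else:
            clusters[cluster].append(acc)
    return dict(clusters), unmapped
```

Keying later comparisons on the cluster ID rather than the accession prevents one gene from being counted several times when different studies report different sequences for it.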
To illustrate the sequence relationships described in the three independent studies, GenBank Accession Numbers or gene IDs were matched to the cognate gene. Once the UniGene identifiers were compiled into a Microsoft Excel spreadsheet, they were imported into Microsoft Access, where they were analyzed collectively. A Tier 1 level of comparison identified any gene that appeared in at least two molecular signatures, while a Tier 2 comparison identified any gene that appeared in at least three signatures. To identify genes that appear most relevant in breast carcinoma cells compared to those of surrounding stromal cells, the Tier 2 genes were separated into two groups. One group contains genes that appeared in the gene sets described by Wittliff and co-workers [41; 70] using only carcinoma cells procured by LCM, while the other group, derived by elimination, was composed of genes that did not appear in their "cancer" gene sets. This latter group, which was tentatively assigned to stromal cells, was explored for its contribution to breast cancer behavior.
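The Tier 1 and Tier 2 comparisons, and the split of Tier 2 genes by elimination, are set operations over gene identifiers. The sketch below reproduces that logic in Python rather than Microsoft Access; the function names and the toy signatures (GENE_A to GENE_F) are hypothetical and are not data from the study.

```python
from collections import Counter


def tier_genes(signatures, min_count):
    """Genes appearing in at least `min_count` signatures.

    signatures: mapping of signature name -> iterable of gene identifiers
    (e.g. UniGene cluster IDs). Duplicates within one signature count once.
    """
    counts = Counter(g for genes in signatures.values() for g in set(genes))
    return {g for g, n in counts.items() if n >= min_count}


def split_tier2(tier2, lcm_carcinoma_sets):
    """Split Tier 2 genes into those found in LCM carcinoma gene sets and the
    remainder, which is tentatively assigned to stromal cells."""
    carcinoma_genes = set().union(*lcm_carcinoma_sets)
    in_carcinoma = tier2 & carcinoma_genes
    return in_carcinoma, tier2 - in_carcinoma


# Toy input for illustration only.
sigs = {
    "s1": ["GENE_A", "GENE_B", "GENE_C", "GENE_D"],
    "s2": ["GENE_A", "GENE_B", "GENE_E"],
    "s3": ["GENE_A", "GENE_C", "GENE_E", "GENE_F"],
    "s4": ["GENE_D", "GENE_F"],
    "s5": ["GENE_C", "GENE_B"],
}
tier2 = tier_genes(sigs, 3)
carcinoma, stromal = split_tier2(tier2, [["GENE_A", "GENE_B", "GENE_D"]])
print(sorted(carcinoma), sorted(stromal))  # ['GENE_A', 'GENE_B'] ['GENE_C']
```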
The 12 molecular signatures [47; 48; 54; 55; 62-65; 67; 70], reporting 2604 total UniGene sequences, were compared. While 354 genes appeared in at least two of the signatures reported to be clinically relevant, only 32 genes appeared in at least three of these signatures (Table 8). Of these 32 genes, only 14 were reported in studies using LCM-procured carcinoma cells (Table 9), while 18 were not (Table 10). This supports the suggestion that cells surrounding a malignant lesion are important in cancer progression (e.g., [12-30; 32]), since the 18 genes were identified as clinically relevant in at least three independent investigations using intact tissue. Some of these genes are reported (e.g., [11; 148-152]) to play a role in tumorigenesis or progression (e.g., ESR1 and NAT1), while others appear not to be associated with tumorigenesis (Table 11).
To investigate relationships of genes with known biological pathways and functions, the gene lists were imported into INGENUITY® (Ingenuity Systems), which is a software package that builds relevance networks based on published literature. The list of 32 genes was divided into 3 networks of biological interactions. The first network has pathways involved in cancer, respiratory disease and cell death, and includes 13 genes (BUB1, CKS2, EVL, FUT8, GATA3, GMPS, LRBA, PFKP, PTP4A2, RABEP1, SLC43A3, TBC1D9, and TRIM29) out of the 32 gene set. The other genes appearing in this network (CASP3, CLEC4E, CTSC, EGFR, IL6, IL13, JAKMIP2, LPAR3, MIA2, NR3C1, NSMAF, PDGF-CC, RB1, SBNO2, SCGB3A1, SLC16A6, SLC39A14, SLC7A7, TGFB1, TIMD4, TNS4, and TPST2) may be additional candidates for future investigations. Interestingly, IL6 appears in this network, but its receptor IL6ST, which is in the 32 gene set, does not.
The second network involves pathways associated with cellular growth and proliferation, hematological system development and function, and hematopoiesis, and includes 12 genes (ATAD2, CENPA, CX3CL1, ESR1, IL6ST, MAPRE2, MCM6, MELK, NAT1, PLK1, ST8SIA1, and XBP1) of the 32 gene set. This network also includes NFkB and the proteasome, which are known to be involved in tumorigenesis [229; 230]. The additional components of this network (5430435G22RIK, APOBEC3G, BCL2L14, CARD10, CDC25B, Cdc25B/C, DOK5, ERK, FSH, HSPA13, IL1F8, IL1F9, MAPK6, MT3, NFkB (complex), PIF, PRKX, Proteasome, RAB33B, SLC12A7, STK10, STK24, and TFF2) may be additional candidates for investigation.
Network 3 includes pathways associated with cancer, cellular compromise, and genetic disorders, and includes 7 genes (DSC2, GABRP, SCUBE2, SLC39A6, TCEAL1, TPBG, and YBX1) of the 32 gene set. The other genes appearing in this network (AATK, ATP6V1F, BAI2, C22ORF28, CD1B, DHRS3, DUSP11, FMR1, GABRE, HECW2, HNF4A, LAD1, MIRN18A, N4BP2L2, OAS3, PEMT, RBM7, RTP3, SCUBE1, SHISA5, TMEM49, TMEM176B, TNF, TP73, TRIM15, ZBTB11, ZNF175, and ZNF318) may be candidates for future investigations.
It was determined that 21 of the 32 genes (ATAD2, BUB1, CENPA, CKS2, CX3CL1, ESR1, GABRP, GATA3, GMPS, IL6ST, MELK, PFKP, PLK1, RABEP1, SCUBE2, SLC39A6, ST8SIA1, TBC1D9, TPBG, XBP1, and YBX1) had known associations with cancer in general, and several were associated with specific cancer types, including six genes (ESR1, GATA3, PLK1, SCUBE2, SLC39A6, and TBC1D9) associated with breast cancer (Table 12). Associations of genes with various cellular functions involved in cancer progression were also determined (Table 13). Six genes (ESR1, GABRP, IL6ST, PLK1, ST8SIA1, and XBP1) were involved in growth, while six genes (ATAD2, CKS2, ESR1, IL6ST, PLK1, and ST8SIA1) were found to be involved in proliferation pathways. There were four genes (CKS2, ESR1, PLK1, and XBP1) associated with cell cycle progression, two genes (ESR1 and ST8SIA1) associated with development, and two genes (IL6ST and ST8SIA1) involved in cell morphology-related functions. Additionally, there were associations with cellular processes that are negative regulators of cancer progression, such as differentiation (ESR1, IL6ST, and ST8SIA1) and apoptosis (ESR1, PLK1, XBP1, and YBX1).
Several reports of the published molecular signatures of breast cancer utilized in development of this 32 gene set also performed pathway analysis of their molecular signatures (e.g., Jansen et al. [54] and Wang et al. [67]) to identify relationships between those gene sets and other published works. Utilization of this pathway analysis software revealed that a number of the genes from the signatures were involved in similar pathways, e.g., cell death, cell cycle, and proliferation, although different genes in the pathways were identified in different molecular signatures. Collectively, this information provides insight into cellular mechanisms by which these genes interact, while providing candidate molecular targets and pathways for devising therapeutic approaches.
Thus, the gene signatures described herein were investigated collectively, without bias in gene selection, to derive a subset of candidate genes whose utility as a predictive test of the risk of breast cancer recurrence could then be tested.
To evaluate the clinical relevance of the gene sets described above, the expression results for those genes were first analyzed for reproducibility to ensure the quality of the data used for clinical correlations. Both the levels and the distributions of gene expression were measured in intact tissue sections before proceeding to investigate the two gene sets representative of the corresponding cell types procured by LCM [231].
Reproducibility of qPCR Analyses
Real-time quantitative polymerase chain reaction (qPCR) on the ABI Prism 7900HT system (Applied Biosystems) was used for quantitative examination of the gene transcripts of interest. Cells from either intact tissue sections or LCM-procured preparations were lysed, and the extracts were examined for transcription of the candidate genes. RNA from each cell type was extracted and isolated with the Arcturus PICOPURE™ kit (LCM-procured cells) or the QIAGEN RNEASY™ RNA isolation kit (intact tissue sections), following the procedures described herein.
After isolation from LCM-procured cells, RNA quality and quantity were evaluated with the Agilent RNA 6000 Pico Kit and the BIOANALYZER™ Instrument (Agilent Technologies) before reverse transcription and qPCR. Multiple microdissections (2-3 LCM caps) from a tissue section were pooled to obtain enough RNA that a linear amplification step was unnecessary prior to qPCR. The amount of total RNA required per qPCR reaction was 10 ng from LCM-procured carcinoma cells and 1 ng from LCM-procured stromal cells. Total RNA was then reverse transcribed to cDNA and analyzed by qPCR. For ΔΔCt calculations, the concentration of the calibrator (i.e., cDNA obtained by reverse transcription of Universal Human Reference RNA (Stratagene)) was adjusted to be similar to that of the experimental reactions on the same qPCR plate.
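The ΔΔCt (comparative Ct) normalization against the calibrator can be sketched as follows. This is a minimal illustration, not the exact calculation used in these studies: it assumes a single endogenous reference gene for normalization (none is named here), an amplification efficiency of 2 (perfect doubling per cycle), and hypothetical Ct values; the function names are illustrative.

```python
def delta_ct(ct_target: float, ct_reference: float) -> float:
    """ΔCt: Ct of the target gene minus Ct of the reference gene in the same cDNA."""
    return ct_target - ct_reference


def relative_expression(sample_target: float, sample_reference: float,
                        calibrator_target: float, calibrator_reference: float,
                        efficiency: float = 2.0) -> float:
    """Relative quantity of the target in a sample versus the calibrator.

    ΔΔCt = ΔCt(sample) - ΔCt(calibrator); relative quantity = efficiency ** -ΔΔCt.
    efficiency=2.0 assumes the PCR product doubles every cycle.
    """
    ddct = (delta_ct(sample_target, sample_reference)
            - delta_ct(calibrator_target, calibrator_reference))
    return efficiency ** (-ddct)


if __name__ == "__main__":
    # Hypothetical Ct values. Sample ΔCt = 24 - 18 = 6; calibrator ΔCt = 26 - 18 = 8.
    # ΔΔCt = -2, so the sample holds 2 ** 2 = 4 times as much target as the calibrator.
    print(relative_expression(24.0, 18.0, 26.0, 18.0))  # 4.0
```

Matching the calibrator concentration to the experimental reactions, as described above, keeps the calibrator Ct values in the same range as the sample Ct values, where the constant-efficiency assumption is most reasonable.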
Extensive quality control experiments were performed to assess reproducibility of the qPCR results. Four serial tissue sections from each of three specimens were prepared and processed concurrently, through scraping, RNA isolation, reverse transcription and qPCR analyses of the genes in the cancer subset. The qPCR reactions were performed in triplicate with duplicate wells in each 384-well plate. A second quality control evaluation involved RNA extraction and qPCR analyses of three tissue sections of each of six different specimens, each section processed and evaluated independently on different days to ascertain inter-assay variation. Furthermore, each specimen was analyzed in triplicate by qPCR with duplicate wells in each 384-well plate.
T-tests and analyses of variance (ANOVA) were performed in either MICROSOFT® Excel or GRAPHPAD PRISM® Version 4 (GraphPad Software, La Jolla, Calif.). Univariate Cox regressions were performed with the SPSS® 17.0 statistical package (SPSS Inc., Chicago, Ill.), a comprehensive system for advanced statistics that is widely used to extract information from large population-based data sets. Survival calculations were performed on log2-transformed relative gene expression data.
Intra- and Inter-Assay Reproducibility of qPCR Results
Before undertaking analyses of gene expression in numerous tissue specimens with valuable clinical follow-up, extensive quality control experiments were performed as described herein. The qPCR reactions gave the levels of reproducibility illustrated in
The coefficient of variation (CV; the standard deviation divided by the mean, expressed as a percent) was calculated for the expression of each gene to identify its relative variability (Table 14). The majority of genes analyzed showed a CV of less than 50%, which represents an acceptable level of relative variability for results from this complex platform [233-235]. Higher CV values generally came from genes with low expression levels, for which any measured difference produces a proportionally larger CV. For the representative specimen shown, the average CV across the 14 genes was 42% (Table 14). These analyses were repeated in two additional breast specimens with similar results, giving average CVs of 55% and 33% across the genes examined (data not presented).
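The CV summary can be sketched as below. The replicate values are hypothetical, and the use of the sample (n−1) standard deviation is an assumption, since the type of standard deviation is not stated. The example uses two genes with the same absolute spread but different mean expression, which illustrates why low-expressing genes produce larger CVs.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """CV in percent: sample standard deviation / mean * 100."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return stdev(values) / m * 100.0


def summarize_cv(replicates_by_gene):
    """Per-gene CVs plus their average across genes for one specimen."""
    cvs = {gene: coefficient_of_variation(vals)
           for gene, vals in replicates_by_gene.items()}
    return cvs, mean(list(cvs.values()))


if __name__ == "__main__":
    # Hypothetical relative-expression replicates with the same absolute
    # spread (SD = 0.1) but different means.
    replicates = {
        "high_expresser": [1.0, 1.1, 0.9],  # CV about 10%
        "low_expresser": [0.2, 0.3, 0.1],   # CV about 50%
    }
    per_gene, average = summarize_cv(replicates)
    for gene, cv in per_gene.items():
        print(f"{gene}: {cv:.1f}%")
    print(f"average CV: {average:.1f}%")
```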
Another level of quality control was undertaken by qPCR analyses of three serial tissue sections of each of six different specimens, with each section processed and evaluated independently on different days to ascertain inter-assay variation. RNA from each specimen was analyzed by qPCR as described in Methods and Materials. These data were then evaluated and compared between tissue sections (
The breast carcinoma specimens selected for this critical study were representative of the biopsies received in a typical hospital pathology laboratory. Specifically, tissues exhibiting a broad range of carcinoma to non-carcinoma elements were examined to ensure that test development was not biased by the cellular composition of the specimen (Table 15).
To evaluate expression of the 14 genes in the carcinoma subset and the 18 genes in the stromal subset, tissues containing a variety of cell types were selected for LCM (Table 15). The quantity of each cell type within a tissue section (expressed as a percent) was estimated by light microscopy after H & E staining. On average, carcinoma cells made up 61% of the total cells in the tissues evaluated (range of 10-95%), and stromal cells made up 22% (range of 5-50%). Expression levels of the carcinoma-subset genes are predicted to be similar in intact tissue sections and LCM-procured carcinoma cells when the section contains 95% carcinoma cells. Similarly, if expression of a gene from the stromal subset indeed comes principally from stromal cells, its expression level should be greatly enriched by LCM procurement compared with its level in the intact tissue section.
Specifically, specimen “u” from Table 15 contained 10% carcinoma cells, 50% stromal cells, and 40% fibrous stroma; specimen “w” contained 50% carcinoma cells, 5% inflammatory cells, 40% stromal cells, and 5% fibrous stroma; specimen “y” contained 30% carcinoma cells, 15% stromal cells, and 55% fibrous stroma; and specimen “ad” contained 90% carcinoma cells, 5% inflammatory cells, and 5% stromal cells.
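The prediction that LCM enrichment depends on cell composition can be made concrete with a simple linear mixing model, in which expression measured in an intact section is the cell-fraction-weighted average of expression in each cell type. This is an illustrative assumption only: it ignores differences in RNA yield per cell (such as the larger carcinoma nuclei noted in this section), and it assumes that cell types not listed for a gene, including fibrous stroma, do not express it. Only the cell fractions of specimens “u” and “ad” come from Table 15; the expression values and function names are hypothetical.

```python
import math


def intact_expression(fractions, expression):
    """Expected expression in an intact section under a linear mixing model.

    fractions  -- cell type -> fraction of cells in the section (sums to 1)
    expression -- cell type -> per-cell expression level; any cell type that
                  is missing is assumed not to express the gene
    """
    total = sum(fractions.values())
    if not math.isclose(total, 1.0, abs_tol=1e-6):
        raise ValueError(f"cell fractions sum to {total}, expected 1.0")
    return sum(f * expression.get(cell, 0.0) for cell, f in fractions.items())


def expected_lcm_enrichment(target_cell, fractions, expression):
    """Fold enrichment expected when target_cell is procured by LCM."""
    return expression[target_cell] / intact_expression(fractions, expression)


if __name__ == "__main__":
    # Cell fractions of specimens "u" and "ad" as listed in Table 15.
    specimen_u = {"carcinoma": 0.10, "stromal": 0.50, "fibrous_stroma": 0.40}
    specimen_ad = {"carcinoma": 0.90, "inflammatory": 0.05, "stromal": 0.05}
    # Hypothetical genes expressed in only one cell type.
    carcinoma_gene = {"carcinoma": 1.0}
    stromal_gene = {"stromal": 1.0}
    for name, spec in (("u", specimen_u), ("ad", specimen_ad)):
        print(name,
              round(expected_lcm_enrichment("carcinoma", spec, carcinoma_gene), 2),
              round(expected_lcm_enrichment("stromal", spec, stromal_gene), 2))
```

Under these assumptions, a carcinoma-restricted gene would be enriched only about 1.1-fold by LCM in specimen “ad” but 10-fold in specimen “u”, while a stromal-restricted gene would be enriched 20-fold in “ad”.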
To investigate these relationships, the gene subsets were analyzed using LCM-procured cell populations. Thirty-three samples of LCM-procured carcinoma cells were obtained for qPCR analyses of the carcinoma gene subset, and 23 samples of LCM-procured stromal cells were collected for qPCR analyses of the stromal gene subset. Gene expression levels of the two subsets in intact tissue sections were compared with those of the LCM-procured cell populations (representative specimens shown in
Results from a representative specimen (
As shown in
Because preliminary observations showed that gene expression levels in intact tissue relative to LCM-procured cells were highly variable among specimens with differing cell content, the following studies were performed. To evaluate differences in gene expression between intact tissue and either LCM-procured carcinoma or stromal cells, a wide variety of breast tissue specimens reflecting clinical reality were evaluated. Welch t-tests were performed comparing relative expression of the 14 gene subset in intact tissue sections with that in LCM-procured carcinoma cells for 33 specimens in three separate qPCR experiments (Table 16). Welch t-tests were also performed comparing relative expression of the 18 gene subset in intact tissue sections with that in LCM-procured stromal cells for 23 specimens in three separate qPCR experiments (Table 17). The number of specimens exhibiting a significant difference (P<0.05) in relative gene expression between the intact breast tissue section and the LCM-procured cells is shown. Fold change was calculated as the expression of the gene in the LCM-procured cells relative to that in the intact tissue, so that a positive fold change indicates greater expression in the LCM-procured cells. The averages and ranges of fold change observed across all samples analyzed are presented in Tables 16 and 17 to illustrate the large range of values observed.
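The Welch t-test used for these comparisons does not assume equal variances in the two groups being compared. A minimal sketch of the statistic and its Welch-Satterthwaite degrees of freedom follows; the triplicate values are hypothetical. Converting t to a P value requires the t distribution, which SciPy provides via `scipy.stats.ttest_ind(a, b, equal_var=False)`; that step is omitted here to keep the sketch dependency-free.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    a, b -- replicate measurements for two groups (at least two values each);
            the groups may have unequal variances and sizes.
    """
    se2_a = variance(a) / len(a)  # squared standard error of mean(a)
    se2_b = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df


if __name__ == "__main__":
    # Hypothetical triplicate relative-expression values for one gene.
    lcm_carcinoma = [2.0, 2.2, 1.8]
    intact_tissue = [1.0, 1.1, 0.9]
    t, df = welch_t(lcm_carcinoma, intact_tissue)
    print(f"t = {t:.2f}, df = {df:.2f}")
```

For these values t is about 7.75 with about 2.94 degrees of freedom; the fractional degrees of freedom reflect the unequal variances of the two groups.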
Overall, 21% of the breast biopsies exhibited significant differences in expression of the 14 carcinoma-subset genes between intact tissue and LCM-procured carcinoma cells. In contrast, 46% of the breast tissues exhibited significant differences in expression of the 18 stromal-subset genes between intact tissue and LCM-procured stromal cells. This implies that procuring stromal cells by LCM is more necessary when comparing their gene expression patterns with those of intact tissue sections than when making the same comparison for carcinoma cells. Because the tissue specimens used in this investigation are representative of those from clinical pathology laboratories, these differences in gene expression may be due to the lower content of stromal cells in biopsies removed to diagnose cancer (Table 15). Examining gene expression profiles in stromal cells is considerably more complex than in carcinoma cells, apparently owing to differences in the ratio of total cell volume (size) to nuclear volume (size). In addition, nuclei of breast carcinoma cells are larger than those of stromal cells, usually resulting in a greater quantity of RNA per collection. Among the possible explanations, the observed changes in gene expression relationships appear to be directly related to the heterogeneous cell composition of a tissue section.
Another interesting observation from the cancer gene subset is that the average fold change of individual gene expression between LCM-procured carcinoma cells and intact tissue was variable (6 positive and 8 negative relationships). In contrast, this relationship was negative for each of the 18 genes in the stromal subset. These surprising results suggest that either the decreased expression of all 18 genes is simply due to their down-regulation in stromal cells surrounding carcinoma cells, or that the 18 “stromal” genes are highly expressed in the other cell types, particularly in the carcinoma cells of an intact tissue section.
In order to address whether changes in the expression patterns observed in the genes of the carcinoma and stromal subsets are directly related to the cell content of the tissue, distributions of fold change in gene expression between LCM-procured cells and intact tissue were evaluated based on percent cell type present in the tissue specimen (Table 15 and
Collectively, these data suggest that expression of ST8SIA1 in carcinoma cells and of PLK1 in stromal cells is directly related to the cell type. In addition, differences for TRIM29 (P value=0.07) and IL6ST (P value=0.09) in carcinoma cells, as well as for PFKP in stromal cells (P value=0.06), approached significance based on t-test analyses (Table 18), suggesting that these genes may also be specific to their respective subsets. No statistically significant differences in fold change were observed for a number of genes using this type of analysis (e.g., MELK, MCM6, and GATA3 in Table 18), suggesting that LCM procurement of specific cell types did not enhance the expression results for those genes. However, analyses of other genes in the subsets revealed that LCM collection of specific cell types influenced measurements of gene expression (e.g., CENPA, BUB1, and YBX1; Table 18). Gene expression in specific cell types provides a more direct interpretation of genomic activity in a tissue section, except in tissue sections composed primarily of cells of a single type.
A subset of 14 genes was selected as candidates in carcinoma cells, while a subset of 18 genes was predicted to reflect expression in stromal cells. As described in Table 8, the genes evaluated were derived from 12 molecular signatures from 11 studies. The majority of the reports did not indicate whether individual expression levels were elevated or diminished. Furthermore, few reports have been published regarding the expression of genes in the specific cell types outlined in this Dissertation (e.g., [41; 57; 70]), or comparing gene expression in specific cell types with that in intact tissue. In these investigations, expression of each gene in both the putative cancer and stromal subsets was analyzed by qPCR in 12 individual breast tissue specimens, each of which was used to prepare an intact tissue section, LCM-procured carcinoma cells, and LCM-procured stromal cells. These 12 tissue specimens were representative of the variety of biopsies observed in the clinical setting. Selected results of these analyses are presented using the same three representative breast cancer biopsies described earlier in
Significant P values (P < 0.05): 0.03, 0.04
Using the biopsy from a 31-year-old patient with invasive ductal carcinoma, tissue sections were prepared that contained 95% carcinoma cells and 5% stromal cells. Relative expression of each gene in the entire 32 gene set was compared using RNA extracted from intact tissue, LCM-procured carcinoma cells, and LCM-procured stromal cells (
Table 19. Significant P values (P < 0.05): 0.014, 0.047, 0.005, 0.010, 0.005, 0.014, 0.005, 0.048, 0.012, 0.038, 0.047, 0.038, 0.006, 0.005, 0.008, 0.003, 0.034, 0.026, 0.031, 0.007, 0.007, 0.001, 0.027, 0.043, 0.040, 0.022, 0.007, 0.019, 0.025, 0.006, 0.001, 0.000, 0.002, 0.011, 0.001, 0.000, 0.014, 0.004, 0.003, 0.049, 0.001, 0.008, 0.024, 0.000, 0.038, 0.001, 0.002
Interestingly, when expression levels in the 14 cancer gene subset were determined in LCM-procured stromal cells compared to those of intact tissue (
In the final analyses of the cancer gene subset, expression was compared in the LCM-procured populations of carcinoma and stromal cells. Expression of 11 of the 14 genes (EVL, NAT1, ESR1, ST8SIA1, TRIM29, SCUBE2, IL6ST, RABEP1, SLC39A6, TPBG, and TCEAL1) gave a statistically significant difference (P value less than 0.05, Table 19). Each of these genes was over-expressed in the carcinoma cells compared to the stromal cells (3.0, 13.9, 2.7, 3.9, 2.0, 4.4, 34.0, 3.3, 5.7, 4.4, and 3.7-fold, respectively) as predicted.
For the stromal gene subset, expression levels of seven genes (PFKP, ATAD2, XBP1, YBX1, CX3CL1, MAPRE2, and SLC43A3) were statistically different between the LCM-procured carcinoma cell population and the intact tissue (
Expression levels of the stromal gene subset were determined in LCM-procured stromal cells and compared with those of intact tissue (
In the final analysis of this breast tissue specimen, expression of the 18 stromal gene subset was compared in the LCM-procured populations of carcinoma and stromal cells. Expression levels of 10 of the 18 genes (FUT8, CENPA, PLK1, ATAD2, XBP1, MCM6, GATA3, MAPRE2, CKS2, and SLC43A3) were statistically different (P value less than 0.05, Table 19) in the two cell types. Nine of these genes were over-expressed in the carcinoma cells compared to the stromal cells (5.0, 2.6, 2.4, 3.5, 4.5, 3.7, 36.9, −2.0, 9.0, and 5.1-fold, respectively). This observation indicates that the genes of the stromal gene subset are under-expressed in the stromal cells, which may be of clinical relevance.
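Negative fold-change values such as −2.0 above are consistent with a common reporting convention in which a ratio below 1 is expressed as its negative reciprocal, so that a two-fold decrease reads as −2.0 rather than 0.5. The sketch below assumes that convention, which is not stated explicitly in this section; the function name and input values are hypothetical.

```python
def signed_fold_change(numerator, denominator):
    """Signed fold change of numerator relative to denominator.

    Ratios >= 1 are reported unchanged; ratios < 1 are reported as the
    negative reciprocal (assumed convention), so a two-fold decrease is -2.0.
    """
    if numerator <= 0 or denominator <= 0:
        raise ValueError("expression levels must be positive")
    ratio = numerator / denominator
    return ratio if ratio >= 1 else -1.0 / ratio


if __name__ == "__main__":
    # Hypothetical relative expression in carcinoma vs stromal cells.
    print(signed_fold_change(5.0, 1.0))   # 5.0
    print(signed_fold_change(0.5, 1.0))   # -2.0
```

With this convention, increases and decreases of the same magnitude have the same absolute value, which makes lists of mixed over- and under-expression easier to compare.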
Using a biopsy specimen from a 44-year-old patient with invasive ductal carcinoma, serial tissue sections were prepared that contained 60% carcinoma cells and 30% stromal cells. Relative expression of each gene in the entire 32 gene set was compared using RNA from intact tissue, LCM-procured carcinoma cells, and LCM-procured stromal cells (
When expression levels of the 14 cancer gene subset were determined in LCM-procured stromal cells compared to those of intact tissue (
Table 20. Significant P values (P < 0.05): 0.011, 0.012, 0.043, 0.037, 0.022, 0.005, 0.034, 0.001, 0.000, 0.000, 0.026, 0.023, 0.004, 0.003, 0.015, 0.034, 0.032, 0.049, 0.006, 0.033, 0.047, 0.020, 0.047, 0.003, 0.001, 0.027, 0.037, 0.000, 0.021, 0.003, 0.000, 0.029, 0.001, 0.045, 0.003, 0.046, 0.000, 0.013, 0.002, 0.028, 0.022, 0.000, 0.029, 0.023, 0.026, 0.000, 0.016, 0.008, 0.011, 0.001
In order to determine differences in relative gene expression between intact tissue and LCM-procured cells, t-tests were performed with the results shown. The first 14 genes listed are from the carcinoma subset, while the remaining 18 genes are from the stromal subset. Values shown in bold indicate a P value of less than 0.05, and fold change observed for each gene is also shown. (* indicates expression was undetected)
In the final analyses of the cancer gene subset in this tissue specimen, expression was compared in the LCM-procured populations of carcinoma and stromal cells. Expression levels of 7 of the 14 genes (EVL, TRIM29, SCUBE2, IL6ST, SLC39A6, TPBG, and DSC2) were statistically different (P value less than 0.05, Table 20). Each of these genes was over-expressed in the carcinoma cells compared to the stromal cells (3.4, 30.3, 4.0, 17.7, 5.5, 3.1, and 1.8-fold, respectively) as predicted.
For the 18 stromal gene subset, expression levels of five genes (ATAD2, YBX1, MAPRE2, CKS2, and SLC43A3) were statistically different (3.4, 1.9, −1.9, 4.4, and 1.5-fold, respectively) between the LCM-procured carcinoma cell population and the intact tissue (
Interestingly, when expression levels in the stromal gene subset were determined in LCM-procured stromal cells compared to those of intact tissue (
In the final analyses of this tissue specimen, expression of the stromal gene subset was compared between the LCM-procured carcinoma and stromal cell populations. Eleven of the 18 genes (CENPA, MELK, PLK1, ATAD2, MCM6, BUB1, YBX1, MAPRE2, GMPS, CKS2, and SLC43A3) were significantly over-expressed in the carcinoma cells compared to the stromal cells (23.0, 14.0, 13.6, 14.4, 103.5, 161.0, 7.3, 12.9, 135.3, 112.8, and 7.0-fold, respectively; Table 20).
Using a tissue biopsy from a 69-year-old patient with invasive ductal carcinoma, tissue sections were prepared that contained 30% carcinoma cells and 30% stromal cells. Relative expression of the entire 32 gene set was compared using RNA from intact tissue, LCM-procured carcinoma cells, and LCM-procured stromal cells (
When expression levels of the 14 cancer gene subset were determined in LCM-procured stromal cells compared to those of intact tissue (
In the final analyses of this 14 gene subset, expression levels were compared in LCM-procured populations of carcinoma and stromal cells. Expression of 5 of the 14 genes (ESR1, TBC1D9, SCUBE2, IL6ST, and TCEAL1) gave a statistically significant difference (Table 21). Each of these genes was over-expressed in the carcinoma cells compared to the stromal cells (2.1, 8.4, 4.7, 3.1, and 2.2-fold, respectively), as predicted.
Focusing on the 18 stromal gene subset (
Table 21. Significant P values (P < 0.05): 0.005, 0.036, 0.027, 0.027, 0.032, 0.031, 0.019, 0.042, 0.044, 0.003, 0.038, 0.025, 0.022, 0.002, 0.000, 0.032, 0.021, 0.046, 0.042, 0.001, 0.035, 0.009, 0.004, 0.011, 0.012, 0.022, 0.040, 0.049, 0.041, 0.005, 0.018, 0.005, 0.022
When expression levels of the stromal gene subset were determined in LCM-procured stromal cells compared to those of intact tissue (
In the final analyses for this tissue specimen, expression of the 18 stromal gene subset was compared between the LCM-procured populations of carcinoma and stromal cells. Expression of 7 of the 18 genes (FUT8, CENPA, PLK1, XBP1, GATA3, CX3CL1, and MAPRE2) was statistically different (Table 21). Both over- and under-expression of these genes was observed in the carcinoma cells compared to the stromal cells (1.9, −2.7, 1.8, 8.0, 73.4, −11.2, and −4.2-fold, respectively).
To evaluate and interpret the large amount of data collected from these representative specimens and the other tissue sections evaluated, statistical differences in gene expression among intact tissue, LCM-procured carcinoma cells, and LCM-procured stromal cells were summarized (Tables 22 and 23). Gene expression was compared between the intact tissue section and the LCM-procured cell populations corresponding to the cancer and stromal gene subsets, and Welch t-tests were used to identify any gene whose expression was significantly different between the groups. Since genes of the two subsets are expressed differently in each patient specimen, as shown in
Genes of the carcinoma subset were expressed at levels that were statistically different between LCM-procured carcinoma cells and intact tissue in 21.4% of the specimens evaluated. Expression of those same 14 genes was also statistically different between LCM-procured stromal cells and intact tissue in 26.5% of the specimens evaluated (Table 22). The average fold change between each LCM-procured cell population and the intact tissue section indicated that, in general, these genes appear to be down-regulated to a greater extent in the stromal cells (average fold change of −21.6, compared to −0.1 in the carcinoma cells). A few genes of this subset rarely differed significantly between carcinoma cells and intact tissue, e.g., TPBG, which was significantly different in only two of the 33 specimens evaluated, and TCEAL1, which was significantly different in only three of the 33 specimens. Expression of ST8SIA1 and TPBG was statistically different from that of intact tissue in only one of the 14 LCM-procured stromal cell populations.
A similar evaluation was performed directly comparing the expression of genes in each subset in both LCM-procured carcinoma cells and stromal cells (Table 22). Expression of two of 14 genes of the carcinoma subset (GABRP and ST8SIA1) was statistically different in carcinoma cells compared to that of stromal cells, each in only a single tissue specimen. Thus, 12 of the genes in the cancer subset were differentially expressed in the two LCM-procured cell populations of 13 breast carcinoma specimens. The majority of the genes were over-expressed in the carcinoma cells compared to the stromal cells, which would be predicted from the earlier studies from Wittliff and co-workers [41; 57; 70] using LCM-procured carcinoma cells.
The following investigation of LCM-procured stromal cells represents a unique approach that has not previously been reported. Expression of the stromal-subset genes differed significantly between LCM-procured carcinoma cells and intact tissue in 33.4% of the tissue specimens evaluated. Those 18 genes also differed significantly between LCM-procured stromal cells and intact tissue in 45.7% of the specimens (Table 23). The average fold change in gene expression between the two LCM-procured cell populations and intact tissue shows that most of the genes were down-regulated in stromal cells (average fold change of −5.0, compared to −1.2 in the carcinoma cells). The GMPS and GATA3 genes of this stromal subset were expressed similarly in carcinoma cells and intact tissue in 13 specimens. However, many genes of the stromal subset were expressed at levels significantly different in LCM-procured stromal cell populations compared to the intact tissue (Table 23). To compare expression of the stromal gene subset between the specific cell types, LCM-procured carcinoma cells and stromal cells were compared directly (Table 23). Expression of SLC43A3 was statistically different between carcinoma cells and stromal cells in only two of 12 patient specimens, whereas the other 17 genes were differentially expressed in many tissue specimens. Carcinoma cells appeared to over-express many of the genes identified in the stromal subset.
Clinical Correlations with Gene Expression in Different Cell Types
In general, the genes of both the carcinoma and stromal subsets appear to be over-expressed in the carcinoma cells compared to the stromal cells. However, it should be noted that if under-expression of a gene in either subset is found to be clinically relevant, it is likely that the gene will be under-expressed to a greater extent in the stromal cell population. In order to address the clinical implications of gene expression in the individual cell types, survival analyses (i.e., Cox proportional hazards model) were performed on the expression levels of genes (Tables 24 and 25).
Cox regression survival analyses identified one gene (TBC1D9) whose expression appeared to be related to disease-free survival by univariate analysis (Table 24). In addition, expression levels of TPBG appeared to be related to overall survival. Over-expression of each of these genes was correlated with an increased likelihood of recurrence or death from breast cancer (HR=1.20 and 1.71, respectively). Hazard ratios greater than 1 indicate an increased likelihood of an event (i.e., breast cancer recurrence or death due to breast cancer). These correlations with survival indicate that expression levels of TBC1D9 and TPBG in the carcinoma cells are associated with the clinical outcome of cancer patients.
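Univariate Cox regression of log2-transformed relative expression, as described for these survival analyses, can be sketched from first principles. This is not the SPSS 17.0 procedure used here: it fits a single covariate by Newton-Raphson on the Cox partial likelihood, handles tied event times with the Breslow approximation (an assumption), reports the hazard ratio as exp(β), and uses a hypothetical cohort. Because expression is on a log2 scale, the hazard ratio is per two-fold increase in expression. The sketch does not guard against complete separation, in which the estimate diverges.

```python
import math


def cox_univariate(times, events, x, max_iter=50, tol=1e-9):
    """Fit h(t | x) = h0(t) * exp(beta * x) by Newton-Raphson.

    times  -- follow-up time for each patient
    events -- 1 if recurrence (or death) was observed, 0 if censored
    x      -- covariate, e.g. log2 relative expression of one gene
    Ties use the Breslow approximation: every patient whose follow-up time
    is >= an event time belongs to that event's risk set.
    Returns (beta, hazard_ratio, standard_error_of_beta).
    """
    n = len(times)
    beta = 0.0
    information = 0.0
    for _ in range(max_iter):
        score = 0.0        # first derivative of the log partial likelihood
        information = 0.0  # negative second derivative
        for i in range(n):
            if not events[i]:
                continue
            s0 = s1 = s2 = 0.0
            for j in range(n):
                if times[j] >= times[i]:
                    w = math.exp(beta * x[j])
                    s0 += w
                    s1 += w * x[j]
                    s2 += w * x[j] ** 2
            risk_mean = s1 / s0  # weighted mean covariate in the risk set
            score += x[i] - risk_mean
            information += s2 / s0 - risk_mean ** 2
        if information <= 0:
            raise ValueError("covariate has no variation within risk sets")
        step = score / information
        beta += step
        if abs(step) < tol:
            break
    return beta, math.exp(beta), 1.0 / math.sqrt(information)


if __name__ == "__main__":
    # Hypothetical cohort; relative expression is log2-transformed before fitting.
    relative_expr = [16, 8, 4, 2, 1, 0.5, 0.25, 1]
    months = [12, 20, 8, 40, 30, 60, 50, 70]
    recurred = [1, 1, 1, 0, 1, 0, 1, 0]
    log2_expr = [math.log2(v) for v in relative_expr]
    beta, hr, se = cox_univariate(months, recurred, log2_expr)
    print(f"beta = {beta:.3f}, HR per 2-fold increase = {hr:.2f}, SE = {se:.3f}")
```

The returned standard error can be used for a Wald test of β = 0 (z = β / SE); a hazard ratio above 1, as here, corresponds to higher expression being associated with earlier recurrence.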
Investigation of the expression of the 32 candidate genes as single variables in LCM-procured stromal cells gave Cox regressions identifying 6 genes (CENPA, MELK, ATAD2, MCM6, YBX1, and GMPS) that appeared to be related to disease-free survival by univariate analysis (Table 25). Over-expression of each of these genes was correlated with an increased likelihood of recurrence (HR=9.47, 16.30, 3.10, 1.92, 4.39, and 2.02, respectively). Expression levels of 5 genes (TBC1D9, MCM6, YBX1, GMPS, and CKS2) appeared to be related to overall survival. Over-expression of each of these genes was correlated with an increased likelihood of death due to breast cancer (HR=1.72, 1.77, 3.52, 2.78, and 1.89, respectively). These correlations with disease-free and overall survival indicate that expression levels of TBC1D9, CENPA, MELK, ATAD2, MCM6, YBX1, GMPS, and CKS2 in the stromal cells were associated with the clinical outcome of cancer patients. Interestingly, over-expression of TBC1D9, a member of a family of proteins known to stimulate the GTPase activity of RAB proteins [191], in either carcinoma cells or surrounding stromal cells appears to be associated with poor survival. Collectively, these results have refined the selection of genes composing molecular signatures for the individual cell types.
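Table 24. Univariate Cox regression of gene expression in LCM-procured carcinoma cells
Gene      Endpoint   P value   HR
TBC1D9    DFS        0.04      1.20
TPBG      OS         0.03      1.71

Table 25. Univariate Cox regression of gene expression in LCM-procured stromal cells
Gene      Endpoint   P value   HR
TBC1D9    OS         0.05      1.72
CENPA     DFS        0.05      9.47
MELK      DFS        0.04      16.30
ATAD2     DFS        0.02      3.10
MCM6      DFS        0.02      1.92
MCM6      OS         0.01      1.77
YBX1      DFS        0.02      4.39
YBX1      OS         0.01      3.52
GMPS      DFS        0.04      2.02
GMPS      OS         0.01      2.78
CKS2      OS         0.01      1.89

DFS, disease-free survival; OS, overall survival; HR, hazard ratio.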
Comparison of Results Obtained by qPCR and Microarray
Gene expression in the different cell types was investigated by analyses of both gene subsets using the raw microarray data obtained from the previous LCM studies [41; 57; 70; 71]. While LCM is a technique of considerable use in discovery-based studies (e.g., [37; 40]), the goal of this investigation is to establish a clinically relevant gene subset amenable to development of a commercial laboratory test. An analysis of 86 specimens was performed comparing the gene expression results from qPCR results of intact tissue to those in the microarray data obtained from LCM-procured carcinoma cells (
The expression results of several genes from the stromal cell subset also correlated reasonably well between qPCR analyses of intact tissue and microarray analyses of LCM-procured carcinoma cells. This implies that several genes within the “stromal cell subset” may, in fact, be expressed in both carcinoma and stromal cell types (e.g., qPCR analyses of XBP1, GATA3, and CENPA correlated with microarray data with r² values of 0.67, 0.54, and 0.51, respectively). These genes may have been filtered informatically during earlier studies by Wittliff and coworkers [41; 57; 70; 71], resulting in molecular signatures based on the hierarchical clustering and gene filtering algorithms employed.
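The agreement statistics cited here (r² and regression slope between qPCR results from intact tissue and microarray results from LCM-procured carcinoma cells) can be computed as sketched below. Treating the qPCR values as x and the microarray values as y, and using log2-scaled inputs, are assumptions; the paired values are hypothetical.

```python
import math
from statistics import mean


def agreement(x, y):
    """Pearson r, r squared, and least-squares slope of y on x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    return r, r * r, sxy / sxx


if __name__ == "__main__":
    # Hypothetical paired log2 values for one gene across five specimens.
    qpcr_intact = [-1.2, 0.3, 0.8, 1.9, 2.5]
    array_lcm = [-0.9, 0.1, 1.2, 1.5, 2.8]
    r, r2, slope = agreement(qpcr_intact, array_lcm)
    print(f"r = {r:.2f}, r^2 = {r2:.2f}, slope = {slope:.2f}")
```

Computing r² and slope per gene in this way yields the two sets of values (one per gene subset) that can then be compared between subsets with a t-test.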
In general, expression of the genes from the cancer cell subset correlated better with the microarray data than that of the genes from the stromal cell subset, as predicted (Table 26). A t-test comparing the correlation coefficients of the genes in the two subsets gave a P value of 0.001, indicating that the two subsets differ significantly in how closely their qPCR results agree with the microarray data. A t-test comparing the slopes of the regression analyses in each gene subset also gave a P value of less than 0.05, again indicating a statistically significant difference between the two gene subsets. The six genes that correlated best with the microarray data are listed in
Additional analyses were performed using microarray data obtained in a previous study of LCM-procured carcinoma cells from a larger sample of 247 breast cancer patients [41; 57; 70; 71]. Because a large number of patients were evaluated in that study, these analyses should have greater statistical power. Table 27 shows the results of univariate Cox regressions for disease-free and overall survival in these patients. Expression of fourteen genes (EVL, NAT1, TBC1D9, SCUBE2, TPBG, TCEAL1, DSC2, MELK, PFKP, PLK1, XBP1, GATA3, MAPRE2, and GMPS) was statistically significant (P value less than 0.05) for disease-free survival. Analyses of overall survival determined that expression levels of 21 genes (EVL, NAT1, ESR1, TBC1D9, SCUBE2, IL6ST, RABEP1, TPBG, TCEAL1, DSC2, FUT8, MELK, PLK1, ATAD2, XBP1, BUB1, GATA3, MAPRE2, GMPS, CKS2, and SLC43A3) were statistically significant (Table 27).
Since the gene expression results discussed in Table 27 were obtained in microarray studies using LCM-procured cancer cells, results illustrating the statistical significance of genes from the “stromal subset” lead to a conclusion that several of these genes (e.g., FUT8, MELK, PFKP, PLK1, ATAD2, XBP1, BUB1, GATA3, MAPRE2, GMPS, CKS2, and SLC43A3) are clinically relevant in the carcinoma cells and are not specific to the surrounding stromal cells.
In general, gene expression levels of the candidate genes appeared to be similar in LCM-procured populations of carcinoma cells and in intact tissue. This is likely due to a number of factors, including the observation that most of the carcinoma specimens used in these studies contained more cancer cells than cells of other types (Table 15). Each of the specimens examined in these investigations was collected as biopsy tissue for assessing the clinical pathology of the specimen to aid in diagnosis and treatment management. In addition, it is accepted (e.g., [8]) that carcinoma cells exhibit increased replication rates, leading to an increase in the amount of mRNA present compared to other cell types. Many breast carcinomas are aneuploid or polyploid and often exhibit larger nuclear to total cell volume ratios than non-cancerous cells. The observation that there are greater gene expression differences between stromal cells and intact tissue implies a requirement for LCM when studying gene expression in stromal cells. However, once a molecular signature is defined from experiments using individual carcinoma cells, use of the intact tissue section is warranted.
Survival analyses of individual genes of both the carcinoma and stromal subsets revealed that over-expression of TBC1D9 and TPBG in the carcinoma cells was associated with the clinical behavior of breast cancer, in that disease-free and overall survival were diminished. It was also discovered that individual expression levels of TBC1D9, CENPA, MELK, ATAD2, MCM6, YBX1, GMPS, and CKS2 in the stromal cells were associated with poor prognosis of breast cancer. These results represent a unique finding in that over-expression of each of these 8 genes in stromal cells was correlated with an increased likelihood of recurrence or death due to breast cancer. Interestingly, over-expression of TBC1D9 in either carcinoma cells or surrounding stromal cells appears to be associated with poor survival. Surprisingly, expression profiles of individual genes had predictive value, although the number of samples should be increased to reach the level of confidence necessary for a single-gene test.
In order to test the clinical validity of each of the 32 candidate genes validated by qPCR studies of this investigation, two approaches were undertaken. In the first, each of the 32 candidate genes was evaluated using clinical follow-up and microarray results from LCM-procured carcinoma cell preparations from 247 patient specimens [41; 57; 70; 71]. Examination of the entire 22,000 gene microarray results from carcinoma cells revealed that expression levels of twelve genes in the "stromal subset" (i.e., FUT8, MELK, PFKP, PLK1, ATAD2, XBP1, BUB1, GATA3, MAPRE2, GMPS, CKS2, and SLC43A3) were clinically relevant. Thus, it appears that expression of these genes is not limited to the stromal cells surrounding the carcinoma cells. Gene expression profiles of stromal cells, in addition to those of carcinoma cells, may therefore be assessed. Hence, a molecular signature containing genes from both cell types elevates the power to predict the clinical behavior of breast carcinoma.
0.012
0.83
0.003
0.78
0.003
0.89
0.002
0.87
0.025
0.93
0.005
0.87
0.002
0.85
0.040
0.92
0.020
0.90
0.019
0.74
0.018
0.78
0.003
0.77
0.002
0.73
0.040
0.86
0.008
0.80
0.038
1.13
0.001
1.26
0.007
0.76
0.018
1.20
0.004
1.28
0.046
1.18
0.001
1.68
0.038
1.45
0.009
1.30
0.028
0.86
0.008
0.81
0.037
1.24
0.007
0.86
0.002
0.83
0.017
1.26
0.002
1.41
0.015
1.34
0.001
1.57
0.020
1.27
0.019
1.40
P values represent the level of significance of expression for each gene, as a continuous variable. Expression of EVL, NAT1, TBC1D9, SCUBE2, TPBG, TCEAL1, DSC2, MELK, PFKP, PLK1, XBP1, GATA3, MAPRE2, and GMPS appear to be related to disease-free survival using univariate analysis. Expression of EVL, NAT1, ESR1, TBC1D9, SCUBE2, IL6ST, RABEP1, TPBG, TCEAL1, DSC2, FUT8, MELK, PLK1, ATAD2, XBP1, BUB1, GATA3, MAPRE2, GMPS, CKS2, and SLC43A3 appear to be related to overall survival.
Using the IRB-approved Biorepository and Database of the Hormone Receptor Laboratory, de-identified specimens of primary invasive ductal carcinoma were examined. Tissue-based properties (e.g., pathology of the cancer, grade, size, and tumor marker expression) and encoded patient-related characteristics (e.g., age, race, smoking status, menopausal status, stage, and nodal status) were used to examine the relationships between gene expression results and clinical parameters. One hundred twenty-six tissue specimens from biopsies of invasive ductal carcinoma were selected for investigation as described in Table 28. When selecting tissue specimens for studies predicting risk of recurrence, the length of clinical follow-up, the use of primary invasive breast carcinoma, and a substantial representation of both patients with recurrent disease and disease-free patients were taken into consideration. Tissue sections from breast cancer biopsies used for analyses of gene expression contained a median of about 60% carcinoma cells (range of about 10% to about 95%) and about 25% stromal cells (range of about 5% to about 65%).
Levels of mRNA expression were analyzed, while estrogen and progestin receptor protein levels were determined using either enzyme immunoassay (EIA) or ligand binding assay (LBA) and recorded in the Hormone Receptor Laboratory's Database. Briefly, both methods used chilled/frozen specimens that were sliced carefully with a scalpel on a Petri dish chilled on a frozen ice pack to maintain receptor integrity, and then homogenized at a ratio of 1 g wet weight per 10 ml of buffer (40 mM Tris-HCl, pH 7.4, containing 1.5 mM EDTA, 10% glycerol, 10 mM sodium molybdate, 10 mM monothioglycerol and 1 mM PMSF) [11; 135]. Extracts were prepared by centrifugation at 100,000×g for 30 min. The total protein concentration of the extract was determined with the Bradford method.
A complete ligand binding assay consisted of duplicate reactions at six increasing concentrations of radiolabeled ligand, with and without unlabeled inhibitor [11; 135; 243; 244]. Reactions were incubated overnight (12-18 hours) at 4° C. Unbound ligand was removed by addition of dextran-coated charcoal; after a 15 min incubation, samples were centrifuged at 3300×g for 15 min at 4° C. The supernatant was removed and its radioactivity was measured in a liquid scintillation counter [11; 135; 243; 244]. ER/PR levels, expressed as fmol/mg protein, were recorded using a clinical cutoff value of 10 fmol/mg protein [11; 135; 243; 244].
ER and PR levels were also determined by EIA using a kit formerly distributed by Abbott Laboratories. This protocol used beads coated with anti-ER or anti-PR monoclonal antibodies, which were incubated with the tissue extracts [11; 135; 245; 246]. Unbound material was aspirated and the beads were washed before incubation with anti-receptor antibodies conjugated to horseradish peroxidase. Color was developed and measured with a spectrophotometer at a wavelength of 492 nm [11; 135; 245; 246]. ER/PR levels, expressed as fmol/mg protein, were recorded using a clinical cutoff value of 15 fmol/mg protein [11; 135; 245; 246].
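Both assays end in the same arithmetic: a receptor concentration normalized to extract protein (fmol/mg) and compared with an assay-specific cutoff (10 fmol/mg for LBA, 15 fmol/mg for EIA). The Python sketch below illustrates only that step. It assumes that a value at or above the cutoff is scored as positive (the text gives only the cutoff values), and its function names are hypothetical; it is not the laboratory's reporting software.

```python
# Clinical cutoffs stated in the text, in fmol receptor per mg extract protein.
CUTOFFS_FMOL_PER_MG = {"LBA": 10.0, "EIA": 15.0}


def receptor_fmol_per_mg(receptor_fmol_per_ml: float, protein_mg_per_ml: float) -> float:
    """Normalize the receptor concentration of an extract to its total protein content."""
    if protein_mg_per_ml <= 0:
        raise ValueError("protein concentration must be positive")
    return receptor_fmol_per_ml / protein_mg_per_ml


def receptor_status(fmol_per_mg: float, assay: str) -> str:
    """Score ER or PR against the assay-specific cutoff.

    Assumption: values at or above the cutoff are called positive.
    """
    cutoff = CUTOFFS_FMOL_PER_MG[assay]
    return "positive" if fmol_per_mg >= cutoff else "negative"


# Hypothetical extract: 48 fmol/ml receptor in 4 mg/ml protein gives 12 fmol/mg
level = receptor_fmol_per_mg(48.0, 4.0)
print(level, receptor_status(level, "LBA"), receptor_status(level, "EIA"))
```

Because the two cutoffs differ, the same 12 fmol/mg value falls on opposite sides of them, so under this sketch the assay type has to travel with the number.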
Kaplan-Meier analyses estimate, over the range of follow-up times, the fraction of patients who have not experienced an event (i.e., disease recurrence or death). At each event time, the running estimate is multiplied by the proportion of patients still at risk who did not have the event, so patients whose follow-up ends without an event (censored patients) contribute until they leave the study [232; 241]. These calculations result in a plot depicting a decreasing step function, where steps occur when an event is recorded [241]. Comparison of survival curves produced from two strata is most commonly carried out using a log-rank test [232; 238]. This test generates a P value testing the null hypothesis that the survival curves are identical in the population as a whole [232].
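The product-limit calculation and the log-rank comparison described above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the GRAPHPAD PRISM® implementation: it assumes patients censored at an event time are still counted as at risk at that time (the usual convention), and it includes a hypothetical `split_at_median` helper for the above/below-median stratification used later in this section.

```python
import math
from collections import Counter
from statistics import median
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (time, estimated survival) at each distinct event time.

    events[i] is 1 if patient i had the event (recurrence or death), 0 if censored.
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(set(times)):
        d = deaths.get(t, 0)
        if d:
            surv *= 1.0 - d / at_risk  # probability of getting past t, given at risk at t
            curve.append((t, surv))
        at_risk -= sum(1 for ti in times if ti == t)  # events and censorings leave the risk set
    return curve


def log_rank(times_a, events_a, times_b, events_b) -> Tuple[float, float]:
    """Two-group log-rank test; returns (chi-square statistic with 1 df, P value)."""
    times = list(times_a) + list(times_b)
    events = list(events_a) + list(events_b)
    in_a = [True] * len(times_a) + [False] * len(times_b)
    obs_minus_exp = 0.0
    var = 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        n = sum(1 for ti in times if ti >= t)
        n_a = sum(1 for ti, a in zip(times, in_a) if ti >= t and a)
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)
        d_a = sum(1 for ti, e, a in zip(times, events, in_a) if ti == t and e and a)
        obs_minus_exp += d_a - d * n_a / n  # observed minus expected events in group A
        if n > 1:
            var += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp ** 2 / var
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 df


def split_at_median(values: Sequence[float]) -> List[bool]:
    """True for patients whose expression is above the cohort median."""
    m = median(values)
    return [v > m for v in values]
```

For example, `kaplan_meier([1, 2, 3, 4, 5], [1, 1, 0, 1, 0])` steps to 0.8, 0.6 and 0.3 at times 1, 2 and 4; the censoring at time 3 shrinks the risk set without producing a step.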
A Cox proportional hazards model can use continuous variables in either univariate or multivariate models and has the added benefit of providing an equation that fits the survival data of a population (i.e., $h_i(t) = h_0(t)\,e^{\beta x_i}$). An advantage of this form of analysis is that the baseline hazard $h_0(t)$ does not need to be known in order to estimate $\beta$, the coefficient of the variable being examined [238]. The main application of these survival analyses is to stratify patients by outcome and allow for better patient counseling and treatment decisions [242].
Normality tests, expression distribution plots, and Kaplan-Meier plots were performed in GRAPHPAD PRISM® Version 4 (GraphPad Software, La Jolla, Calif.). Pearson correlations, univariate Cox regressions, and multivariate Cox regressions were performed with the SPSS® 17.0 statistical package (SPSS Inc., Chicago, Ill.). Calculations and model development were performed using log2 transformations of relative gene expression data. Five patients who were never disease-free (Table 28) were omitted from Cox regressions of gene expression levels with disease-free survival.
In order to analyze patient survival outcomes with known characteristics of the study population, a percent survival analysis was performed for each category, including race, menopausal status, lymph node involvement, stage of the cancer and tumor grade (
Before the impact of gene expression on cancer recurrence and survival was analyzed, known prognostic factors, such as stage, grade and lymph node involvement, were evaluated by Kaplan-Meier survival plots using GRAPHPAD PRISM® software (
Kaplan-Meier analyses were then performed on tumor markers with known importance in breast cancer [20; 22; 24; 94] showing their relationships with disease-free survival (
In order to evaluate the distribution of individual gene expression levels in the biopsies from the patient population, the values were subjected to D'Agostino-Pearson normality tests using GRAPHPAD PRISM® to determine whether they were sampled from a Gaussian distribution [232]. Genes with statistically significant P values (less than 0.05) are likely to have a non-Gaussian expression distribution, while larger P values indicate that the gene expression levels are consistent with a Gaussian distribution. Results shown in Table 29 indicate that thirteen genes, NAT1, ESR1, GABRP, IL6ST, CENPA, ATAD2, XBP1, MCM6, PTP4A2, LRBA, GATA3, GMPS, and SLC43A3, exhibited distributions consistent with a non-Gaussian population. These genes were then evaluated to determine whether their expression exhibited bimodal distributions that could identify a clinically relevant cut-off value for survival analyses.
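The D'Agostino-Pearson test combines a transformed sample skewness and a transformed sample kurtosis into $K^2 = Z_{\mathrm{skew}}^2 + Z_{\mathrm{kurt}}^2$, which is referred to a chi-square distribution with 2 degrees of freedom. The standard-library Python sketch below follows the usual D'Agostino (skewness) and Anscombe-Glynn (kurtosis) transformations; it is an illustration, not GRAPHPAD PRISM®'s code, and `scipy.stats.normaltest` implements the same omnibus test for production use. The two demonstration samples are synthetic quantile sets, not study data.

```python
import math
from statistics import NormalDist
from typing import Sequence, Tuple


def dagostino_pearson(x: Sequence[float]) -> Tuple[float, float]:
    """D'Agostino-Pearson omnibus normality test; returns (K2, P value)."""
    n = len(x)
    if n < 8:
        raise ValueError("the skewness component needs at least 8 observations")
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n

    # Skewness component: D'Agostino transformation of sqrt(b1) to a standard normal Z
    sqrt_b1 = m3 / m2 ** 1.5
    y = sqrt_b1 * math.sqrt((n + 1) * (n + 3) / (6.0 * (n - 2)))
    beta2 = (3.0 * (n * n + 27 * n - 70) * (n + 1) * (n + 3)
             / ((n - 2.0) * (n + 5) * (n + 7) * (n + 9)))
    w2 = -1.0 + math.sqrt(2.0 * (beta2 - 1.0))
    delta = 1.0 / math.sqrt(0.5 * math.log(w2))
    alpha = math.sqrt(2.0 / (w2 - 1.0))
    z_skew = delta * math.asinh(y / alpha)

    # Kurtosis component: Anscombe-Glynn transformation of b2 to a standard normal Z
    b2 = m4 / m2 ** 2
    expected = 3.0 * (n - 1) / (n + 1)
    var_b2 = 24.0 * n * (n - 2) * (n - 3) / ((n + 1) ** 2 * (n + 3) * (n + 5))
    xk = (b2 - expected) / math.sqrt(var_b2)
    sqrt_beta1 = (6.0 * (n * n - 5 * n + 2) / ((n + 7) * (n + 9))
                  * math.sqrt(6.0 * (n + 3) * (n + 5) / (n * (n - 2) * (n - 3))))
    a = 6.0 + 8.0 / sqrt_beta1 * (2.0 / sqrt_beta1 + math.sqrt(1.0 + 4.0 / sqrt_beta1 ** 2))
    denom = 1.0 + xk * math.sqrt(2.0 / (a - 4.0))
    term2 = math.copysign(abs((1.0 - 2.0 / a) / denom) ** (1.0 / 3.0), denom)
    z_kurt = (1.0 - 2.0 / (9.0 * a) - term2) / math.sqrt(2.0 / (9.0 * a))

    k2 = z_skew ** 2 + z_kurt ** 2
    return k2, math.exp(-k2 / 2.0)  # upper tail of chi-square with 2 df


n = 50
probs = [(i + 0.5) / n for i in range(n)]
gaussian_like = [NormalDist().inv_cdf(p) for p in probs]  # normal quantiles
skewed = [-math.log(1.0 - p) for p in probs]              # exponential quantiles
print(dagostino_pearson(gaussian_like), dagostino_pearson(skewed))
```

The symmetric, normal-shaped sample gives a large P value, while the right-skewed sample is flagged as non-Gaussian, which is the same decision rule (P less than 0.05) applied to the expression values in Table 29.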
Expression levels and distribution of these thirteen genes from the 32 gene set were analyzed with dot plots [232; 249] using intact tissue sections of 126 invasive ductal carcinomas.
0.013
0.021
0.001
0.016
0.028
0.003
0.022
0.019
0.047
0.033
0.009
0.037
0.001
Gene expression levels were evaluated to determine if they were sampled from a population exhibiting Gaussian distribution using the D'Agostino-Pearson normality test in GRAPHPAD PRISM®. Genes exhibiting statistically significant P values (less than 0.05 and shown in bold) were likely to be from a non-Gaussian distribution, while those with larger P values are consistent with a Gaussian distribution.
Early indications of shared pathways and potential interaction with multiple pathways influencing cancer growth and behavior led us to investigate correlations of expression levels of combinations of genes in the 32 gene set. Previous studies [180; 203] have shown that genes from subsets identified herein (i.e., GATA3 and XBP1) are co-expressed with ESR1 and play important roles in the development of models predicting clinical outcomes. In order to compare expression patterns among genes in the 32 gene set, Pearson correlations, which indicate relationships between gene pairs, were performed, with the results shown in Table 30A-30H. A positive correlation coefficient indicates a positive relationship between the genes of a pair, and a negative coefficient indicates an inverse relationship between their expression levels (
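A Pearson coefficient for one gene pair can be computed directly from paired log2 expression values. The Python sketch below also attaches a P value using Fisher's z transform, which is a normal approximation swapped in here for brevity; the exact test used by statistical packages refers $t = r\sqrt{(n-2)/(1-r^2)}$ to a t distribution with $n - 2$ degrees of freedom. Function names are hypothetical.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between paired log2 expression values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def approx_p_fisher(r: float, n: int) -> float:
    """Approximate two-sided P value for H0: rho = 0 via Fisher's z transform (requires |r| < 1)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]), approx_p_fisher(0.62, 126))
```

Assuming all 126 specimens contributed to a pair, a coefficient of 0.62 (as in several Table 30 entries) yields an approximate P value far below 0.001, in line with the tabulated "0.000".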
To visualize gene associations, the expression levels of gene pairs were plotted against each other. Representative correlations of gene expression that were significant by Pearson correlation are shown in
0.62
0.000
0.72
0.000
−0.36
0.000
0.62
0.000
0.75
0.000
−0.44
0.000
0.72
0.000
0.75
0.000
−0.40
0.000
−0.36
0.000
−0.44
0.000
−0.40
0.000
−0.31
0.001
−0.41
0.000
−0.41
0.000
0.65
0.000
0.63
0.000
0.58
0.000
0.65
0.000
−0.39
0.000
−0.29
0.001
−0.26
0.003
−0.37
0.000
0.57
0.000
0.63
0.000
0.75
0.000
0.80
0.000
−0.37
0.000
0.34
0.000
0.37
0.000
0.48
0.000
0.65
0.000
0.62
0.000
0.71
0.000
−0.28
0.002
0.55
0.000
0.61
0.000
0.70
0.000
−0.27
0.003
0.50
0.000
0.60
0.000
0.70
0.000
0.44
0.000
0.61
0.000
0.66
0.000
−0.25
0.004
−0.25
0.006
0.27
0.003
0.57
0.000
0.53
0.000
0.61
0.000
−0.33
0.000
−0.25
0.005
0.24
0.007
−0.35
0.000
−0.24
0.007
0.35
0.000
−0.28
0.001
−0.31
0.001
0.26
0.004
0.63
0.000
0.66
0.000
0.82
0.000
−0.49
0.000
0.45
0.000
0.42
0.000
0.56
0.000
−0.23
0.009
0.34
0.000
0.44
0.000
0.34
0.000
0.43
0.000
−0.35
0.000
0.67
0.000
0.67
0.000
0.83
0.000
−0.44
0.000
−0.33
0.000
−0.32
0.000
0.50
0.000
−0.27
0.003
−0.23
0.010
0.45
0.000
−0.31
0.001
0.63
0.000
−0.29
0.001
0.63
0.000
−0.41
0.000
0.58
0.000
−0.26
0.003
0.75
0.000
−0.41
0.000
0.65
0.000
−0.37
0.000
0.80
0.000
0.65
0.000
−0.39
0.000
0.57
0.000
−0.37
0.000
−0.27
0.003
0.54
0.000
−0.38
0.000
−0.27
0.003
0.57
0.000
0.54
0.000
−0.23
0.010
−0.38
0.000
0.57
0.000
−0.23
0.010
0.80
0.000
0.43
0.000
0.78
0.000
0.63
0.000
−0.24
0.009
0.78
0.000
0.59
0.000
0.61
0.000
0.65
0.000
0.59
0.000
0.61
0.000
0.39
0.000
0.25
0.006
0.48
0.000
−0.28
0.002
0.76
0.000
0.54
0.000
0.23
0.010
−0.32
0.000
0.35
0.000
0.34
0.000
−0.39
0.000
0.31
0.001
0.50
0.000
−0.24
0.008
0.25
0.006
0.32
0.000
−0.46
0.000
0.68
0.000
−0.33
0.000
0.68
0.000
0.40
0.000
0.38
0.000
0.27
0.003
0.31
0.001
−0.25
0.006
0.75
0.000
0.43
0.000
0.36
0.000
0.47
0.000
−0.28
0.002
0.81
0.000
0.37
0.000
−0.42
0.000
0.74
0.000
−0.26
0.004
0.69
0.000
0.56
0.000
0.62
0.000
−0.29
0.001
0.27
0.003
0.41
0.000
0.41
0.000
0.30
0.001
0.35
0.000
0.52
0.000
0.59
0.000
0.34
0.000
0.65
0.000
0.55
0.000
0.50
0.000
0.37
0.000
0.62
0.000
0.61
0.000
0.60
0.000
0.48
0.000
0.71
0.000
0.70
0.000
0.70
0.000
−0.28
0.002
−0.27
0.003
−0.24
0.009
0.80
0.000
0.78
0.000
0.78
0.000
0.61
0.000
0.43
0.000
0.63
0.000
0.59
0.000
0.65
0.000
0.65
0.000
0.68
0.000
0.49
0.000
0.65
0.000
0.78
0.000
0.71
0.000
0.68
0.000
0.78
0.000
0.62
0.000
0.49
0.000
0.71
0.000
0.62
0.000
0.47
0.000
0.67
0.000
0.64
0.000
0.56
0.000
0.45
0.000
0.67
0.000
0.72
0.000
0.69
0.000
0.52
0.000
0.25
0.006
0.50
0.000
0.67
0.000
0.67
0.000
0.57
0.000
0.48
0.000
0.35
0.000
0.40
0.000
0.29
0.001
0.73
0.000
0.70
0.000
0.68
0.000
0.49
0.000
0.79
0.000
0.63
0.000
0.65
0.000
0.40
0.000
0.53
0.000
0.71
0.000
0.70
0.000
0.57
0.000
0.45
0.000
0.41
0.000
0.36
0.000
0.40
0.000
0.41
0.000
0.27
0.003
0.35
0.000
0.44
0.000
−0.25
0.004
0.57
0.000
0.61
0.000
0.53
0.000
−0.25
0.005
0.66
0.000
−0.25
0.006
0.61
0.000
0.27
0.003
−0.33
0.000
0.24
0.007
0.39
0.000
−0.28
0.002
0.59
0.000
0.25
0.006
0.76
0.000
0.48
0.000
0.23
0.010
0.61
0.000
0.54
0.000
−0.32
0.000
0.47
0.000
0.45
0.000
0.67
0.000
0.67
0.000
0.72
0.000
0.64
0.000
0.69
0.000
0.56
0.000
0.52
0.000
0.58
0.000
0.25
0.005
0.58
0.000
0.25
0.005
0.31
0.000
0.73
0.000
0.31
0.000
0.68
0.000
0.56
0.000
0.60
0.000
0.66
0.000
0.47
0.000
0.54
0.000
0.58
0.000
0.40
0.000
0.28
0.002
0.77
0.000
0.49
0.000
0.23
0.010
0.76
0.000
0.43
0.000
0.23
0.010
0.58
0.000
0.33
0.000
0.33
0.000
0.65
0.000
0.62
0.000
0.72
0.000
−0.25
0.005
0.27
0.002
0.30
0.001
0.55
0.000
0.42
0.000
0.34
0.000
0.41
0.000
0.39
0.000
0.52
0.000
0.39
0.000
0.26
0.004
0.66
0.000
0.49
0.000
0.37
0.000
−0.35
0.000
−0.28
0.001
−0.24
0.007
−0.31
0.001
0.35
0.000
0.26
0.004
0.35
0.000
0.31
0.001
0.25
0.006
0.34
0.000
0.50
0.000
0.32
0.000
−0.39
0.000
−0.24
0.008
0.25
0.006
0.31
0.000
0.73
0.000
0.31
0.000
0.68
0.000
0.56
0.000
0.49
0.000
0.70
0.000
0.54
0.000
0.49
0.000
0.39
0.000
0.25
0.006
0.70
0.000
0.39
0.000
0.58
0.000
0.54
0.000
0.25
0.006
0.58
0.000
−0.30
0.001
−0.39
0.000
0.59
0.000
0.35
0.000
0.62
0.000
0.43
0.000
0.73
0.000
0.37
0.000
0.75
0.000
0.55
0.000
0.25
0.004
0.57
0.000
0.36
0.000
0.60
0.000
0.49
0.000
0.30
0.001
0.33
0.000
0.46
0.000
0.33
0.000
0.33
0.000
0.43
0.000
0.59
0.000
0.42
0.000
0.55
0.000
0.44
0.000
0.54
0.000
0.46
0.000
0.37
0.000
0.51
0.000
0.49
0.000
0.54
0.000
0.29
0.001
0.63
0.000
0.45
0.000
0.66
0.000
0.42
0.000
0.82
0.000
0.56
0.000
−0.49
0.000
−0.46
0.000
0.27
0.003
0.68
0.000
0.40
0.000
0.75
0.000
−0.33
0.000
0.38
0.000
0.31
0.001
0.68
0.000
−0.25
0.006
0.43
0.000
0.50
0.000
0.48
0.000
0.29
0.001
0.73
0.000
0.67
0.000
0.35
0.000
0.70
0.000
0.67
0.000
0.40
0.000
0.68
0.000
0.57
0.000
0.49
0.000
0.60
0.000
0.49
0.000
0.47
0.000
0.40
0.000
0.23
0.010
0.66
0.000
0.54
0.000
0.28
0.002
0.76
0.000
0.58
0.000
0.77
0.000
−0.30
0.001
0.59
0.000
0.73
0.000
−0.39
0.000
0.35
0.000
0.37
0.000
0.62
0.000
0.75
0.000
0.25
0.004
0.43
0.000
0.55
0.000
0.62
0.000
0.81
0.000
0.60
0.000
0.81
0.000
0.39
0.000
0.62
0.000
0.60
0.000
0.39
0.000
0.71
0.000
0.71
0.000
0.36
0.000
0.48
0.000
0.53
0.000
0.31
0.000
0.72
0.000
0.84
0.000
0.64
0.000
−0.26
0.003
0.58
0.000
0.51
0.000
0.43
0.000
0.85
0.000
0.71
0.000
0.48
0.000
0.65
0.000
0.71
0.000
0.37
0.000
0.61
0.000
0.59
0.000
0.24
0.008
0.44
0.000
0.67
0.000
−0.23
0.009
0.34
0.000
0.67
0.000
−0.33
0.000
0.43
0.000
0.83
0.000
−0.32
0.000
0.34
0.000
−0.35
0.000
−0.44
0.000
0.50
0.000
0.36
0.000
−0.42
0.000
0.56
0.000
0.81
0.000
0.74
0.000
0.47
0.000
−0.26
0.004
0.62
0.000
−0.28
0.002
0.37
0.000
0.69
0.000
−0.29
0.001
0.79
0.000
0.53
0.000
0.63
0.000
0.71
0.000
0.65
0.000
0.70
0.000
0.40
0.000
0.57
0.000
0.33
0.000
0.62
0.000
−0.25
0.005
0.43
0.000
0.33
0.000
0.27
0.002
0.23
0.010
0.65
0.000
0.72
0.000
0.58
0.000
0.57
0.000
0.33
0.000
0.36
0.000
0.46
0.000
0.60
0.000
0.30
0.001
0.33
0.000
0.49
0.000
0.48
0.000
0.84
0.000
−0.26
0.003
0.71
0.000
0.53
0.000
0.71
0.000
0.31
0.000
0.36
0.000
0.72
0.000
0.64
0.000
0.47
0.000
0.54
0.000
0.54
0.000
0.47
0.000
0.56
0.000
0.42
0.000
0.36
0.000
0.69
0.000
0.44
0.000
0.60
0.000
0.70
0.000
0.58
0.000
−0.27
0.003
−0.23
0.010
0.45
0.000
0.27
0.003
0.52
0.000
0.41
0.000
0.30
0.001
0.41
0.000
0.35
0.000
0.59
0.000
0.45
0.000
0.41
0.000
0.41
0.000
0.27
0.003
0.36
0.000
0.35
0.000
0.40
0.000
0.30
0.001
0.55
0.000
0.41
0.000
0.39
0.000
0.49
0.000
0.42
0.000
0.39
0.000
0.26
0.004
0.34
0.000
0.52
0.000
0.66
0.000
0.37
0.000
0.33
0.000
0.59
0.000
0.54
0.000
0.51
0.000
0.42
0.000
0.49
0.000
0.43
0.000
0.55
0.000
0.46
0.000
0.54
0.000
0.44
0.000
0.37
0.000
0.29
0.001
0.58
0.000
0.85
0.000
0.65
0.000
0.61
0.000
0.51
0.000
0.71
0.000
0.71
0.000
0.59
0.000
0.43
0.000
0.48
0.000
0.37
0.000
0.24
0.008
0.56
0.000
0.69
0.000
0.60
0.000
0.70
0.000
0.42
0.000
0.44
0.000
0.36
0.000
0.58
0.000
0.49
0.000
0.43
0.000
0.55
0.000
0.49
0.000
0.56
0.000
0.60
0.000
0.43
0.000
0.56
0.000
0.34
0.000
0.55
0.000
0.60
0.000
0.34
0.000
Current clinical tests for ER and PR are based upon measurements of the protein in a tissue biopsy (e.g., [10; 11; 91; 135]). To assess the utility of mRNA measurements and their relationship to ER (
The log2 expression levels for ER and PR were then plotted for linear regression analyses (
Relationships of Gene Expression Levels with Clinical Characteristics
The expression of each candidate gene was analyzed for associations with the characteristics of each of the 126 patients, such as race, menopausal status, family history of breast cancer, stage of disease, tumor grade, nodal involvement, ER status, and PR status, using SPSS software (Table 31). Analyses of race, menopausal status, family history, nodal status, ER status and PR status were performed using a two-tailed t-test (equal variances not assumed), while stage and grade were analyzed by ANOVA. Expression of the genes outlined in Table 31 exhibited P values less than 0.05 when correlated with the characteristic indicated. Since t-tests do not provide information on the levels of expression of each gene in the different groups, log2 (relative gene expression) was graphed as box-and-whisker plots in GRAPHPAD PRISM®.
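The "equal variances not assumed" t-test corresponds to Welch's t-test, whose degrees of freedom come from the Welch-Satterthwaite formula. The minimal Python sketch below computes the statistic and degrees of freedom only; the hypothetical `welch_t` name is illustrative, and the two-sided P value would then be read from the t distribution, for example with `2 * scipy.stats.t.sf(abs(t), df)` or directly with `scipy.stats.ttest_ind(a, b, equal_var=False)`.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Two-sample t statistic without assuming equal variances, with Welch-Satterthwaite df."""
    va = variance(a) / len(a)  # squared standard error of mean(a); variance() is the sample variance
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical log2 expression values in two patient groups
print(welch_t([2.0, 4.0, 6.0], [1.0, 2.0, 3.0]))
```

The degrees of freedom are generally non-integer and shrink toward the smaller group's n - 1 when that group's variance dominates, which is why unequal group sizes such as n=30 versus n=51 do not bias the comparison.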
Analyses of race, menopausal status, family history, nodal status, ER status and PR status were performed using a two-tailed t-test, while stage and grade were analyzed by ANOVA. The expression levels of genes listed exhibited P values less than 0.05.
Gene expression differences in pre-menopausal (n=30) and post-menopausal (n=51) breast cancer patients are shown in
Differences in gene expression levels for cancer patients who were tobacco smokers (n=27) and those who were non-smokers (n=54) are shown in
Gene expression as a function of tumor grade is shown in
Differences in gene expression levels are shown in
Similar analyses were performed comparing gene expression levels in PR negative (n=43) and PR positive (n=83) patients (
Correlation of Expression Levels of Individual Genes with Clinical Outcome
In a preliminary correlation of gene expression with patient outcome, t-tests were performed comparing expression levels in patients exhibiting breast cancer recurrence with those in patients who remained disease-free (Table 32). In addition, gene expression in patients who did not die of their breast cancer was compared with that in patients who died of breast cancer (Table 33). Analyses of gene expression levels with patient recurrence identified two genes (ATAD2 and CX3CL1) with P values less than 0.1. Both ATAD2 and CX3CL1 exhibited a lower level of expression in patients who remained disease-free compared to those who had recurrences (Table 32). Similar analyses of gene expression levels with patient survival also identified two genes (PLK1 and CX3CL1) with P values less than 0.1. Both PLK1 and CX3CL1 exhibited a lower level of expression in patients who did not die of breast cancer compared to those who died of their cancer (Table 33). This observation contrasts with a study in prostate cancer [207], in which expression of CX3CL1 was associated with good patient prognosis. While the P values in these evaluations are not statistically significant in this most basic form of survival analysis, they suggest that these genes may prove useful for predicting disease recurrence and survival using more sophisticated methods. Expression of each gene was evaluated by Kaplan-Meier survival analyses using expression above and below median relative expression values to stratify patients (
Of the 32 genes evaluated individually in the gene subsets, only SCUBE2 exhibited a median expression level that significantly stratified the 126 patients into good and poor prognosis groups for disease recurrence (P value less than 0.05, Table 34). A hazard ratio of 1.8 was calculated for SCUBE2 expression between the prognosis groups, indicating that the poor prognosis group had a 1.8-fold greater hazard (instantaneous risk) of breast cancer recurrence than the good prognosis group. Although most of the individual genes tested did not show statistically significant correlations with recurrence and survival, many showed trends that separated patients into prognostic groups. Expression of six additional genes (GABRP, TBC1D9, SLC39A6, MELK, MCM6, and PTP4A2) appears to be associated with either disease-free or overall survival (P value less than 0.10). The hazard ratios for each gene are shown (Table 34). It should be noted that these hazard ratios can be taken as representative of the patient population only when the association of the gene with outcome is statistically significant. Expression of several of these genes (GABRP, TBC1D9, SLC39A6, MCM6, and PTP4A2) approached significance with hazard ratios above 1, indicating that elevated expression of the gene is related to decreased patient survival. However, elevated expression of MELK was correlated with increased disease-free and overall survival. Representative Kaplan-Meier plots of patients with disease-free and overall survival as a function of expression of single genes (GABRP, SCUBE2, SLC39A6, and MELK) are shown in
From evaluations of various patient and cancer features (Table 31), genes that were differentially expressed in relation to a particular characteristic were evaluated in the two populations. Two genes (GABRP and CENPA) that were differentially expressed between patients with lymph node-positive and lymph node-negative cancers were analyzed for patient survival. The relationship of GABRP expression with patient disease-free and overall survival is shown for all patients (
Similar analyses were performed for the genes altered in patients with different tumor grades (FIGS. 27-29-44).
Gene | Survival end point | P value | Hazard ratio
SCUBE2 | Disease-free survival | 0.043 | 1.80
Patients were stratified by median gene expression values, and Kaplan-Meier analyses were performed. P values indicating that either high or low expression of an individual gene was related to survival outcomes of breast cancer patients are shown with the hazard ratios. Values shown in bold indicate that a statistically significant difference (P value less than 0.05) was observed in patient survival between the strata.
Although 25 genes of the 32 gene set were associated with ER expression (Table 31), Kaplan-Meier analyses are shown for ESR1, SCUBE2, RABEP1, SLC39A6, TCEAL1, and XBP1 in ER positive or ER negative patients (
However, a surprising result was observed when only the ER (protein) negative population of patients was analyzed by Kaplan-Meier plots as a function of ESR1 gene expression (
Analyses of SCUBE2 gene expression in ER negative or ER positive patients were performed (
The analysis of TCEAL1 was interesting, because as indicated in the entire population (
Similar analyses of gene expression in PR negative or PR positive patients were performed for 21 genes differentially expressed between those patient cohorts (Table 31).
Expression of PTP4A2 was also analyzed based on a patient's PR status (
Survival Analyses of Genes Determined to have Bimodal Distributions
Since results presented in
Analyses of Continuous Survival Data with Univariate Cox Proportional Hazards Model
Cox proportional hazards models were fitted using SPSS® software because this modeling approach allows the use of continuous gene expression variables, without the requirement of group separation (e.g., above or below the median) for analysis [236-239; 249]. A simple proportional hazards model utilizes the following equation:
$$h[t(x)] = h_0(t)\exp(\beta x)$$
in which $h[t(x)]$ is the hazard rate for an individual with co-variate (i.e., gene expression level) $x$, $h_0(t)$ is the baseline hazard rate, and $\exp(\beta)$ is the hazard ratio [249]. P values are then calculated to determine whether the observed hazard ratio could be due to chance.
When the 32 genes were investigated as single variables, this method yielded 5 genes (TBC1D9, RABEP1, SLC39A6, FUT8, and PTP4A2) with P values less than 0.05 when analyzed for disease-free survival (Table 36). Over-expression of each of these genes was correlated with a decreased likelihood of breast cancer recurrence (HR=0.90, 0.80, 0.85, 0.78, and 0.81, respectively). Expression of RABEP1, SLC39A6, FUT8, and PTP4A2 appears to be related to overall survival using this univariate analysis (P value less than 0.05, Table 37). Over-expression of RABEP1, SLC39A6, FUT8, and PTP4A2 was correlated with a decreased likelihood of death from breast cancer (HR=0.81, 0.87, 0.82, and 0.81, respectively). Thus, over-expression of these genes individually forms the basis of a molecular signature predicting decreased risk of recurrence and death due to breast cancer. The ultimate goal of these collective studies is to develop clinically relevant, commercially available tests that may be used in hospital laboratories to aid in breast cancer management.
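Because the models were fitted to log2-transformed relative expression (as stated in the statistical methods), a one-unit increase in $x$ corresponds to a doubling of expression, and per-unit hazard ratios compound multiplicatively over larger fold changes. The short Python sketch below works this through for the RABEP1 disease-free survival hazard ratio of 0.80 quoted above, assuming that ratio is per log2 unit; the function name is hypothetical.

```python
import math


def hazard_ratio_for_fold_change(hr_per_log2_unit: float, fold_change: float) -> float:
    """Hazard ratio implied by a fold change in expression when the covariate is log2 expression."""
    # A fold change f shifts log2 expression by log2(f) units; the hazard ratio multiplies per unit.
    return hr_per_log2_unit ** math.log2(fold_change)


# RABEP1, disease-free survival: HR = 0.80 per log2 unit (value from the text)
for fold in (2, 4, 0.5):
    print(fold, round(hazard_ratio_for_fold_change(0.80, fold), 3))
```

Under that assumption, a four-fold higher RABEP1 level corresponds to a hazard ratio of 0.8² = 0.64, and a two-fold lower level to 1/0.8 = 1.25.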
Analyses of Survival Data with Multivariate Cox Proportional Hazards Model
In order to elucidate a clinically relevant multi-gene signature from the gene expression data obtained, SPSS® 17.0 software was used. Using imported relative gene expression data, the software fits a multivariate Cox proportional hazards model for a particular time-to-event variable (i.e., time until breast cancer recurrence or time until death due to breast cancer). The proportional hazards model uses the following equation:
$$h[t(x)] = h_0(t)\exp(\beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n)$$
in which $h[t(x)]$ is the hazard rate for an individual with co-variates (in this case, gene expression levels) $x_1, \ldots, x_n$, $h_0(t)$ is the baseline hazard rate, and $\exp(\beta_i)$ is the hazard ratio associated with co-variate $x_i$ [249]. P values are then calculated to determine whether each observed hazard ratio could be due to chance. This algorithm can then be used to predict that particular characteristic in additional samples based on their relative gene expression data.
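Table 36. Univariate Cox proportional hazards analysis of disease-free survival (genes with P less than 0.05)
Gene | P value | Hazard ratio
TBC1D9 | 0.015 | 0.90
RABEP1 | 0.009 | 0.80
SLC39A6 | 0.002 | 0.85
FUT8 | 0.003 | 0.78
PTP4A2 | 0.039 | 0.81

Table 37. Univariate Cox proportional hazards analysis of overall survival (genes with P less than 0.05)
Gene | P value | Hazard ratio
RABEP1 | 0.012 | 0.81
SLC39A6 | 0.011 | 0.87
FUT8 | 0.020 | 0.82
PTP4A2 | 0.029 | 0.81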
SPSS® uses two basic modes of model selection for proportional hazards: forward stepwise selection and backwards stepwise selection. The purpose of both methods is similar, in that unimportant covariates (i.e., genes) are discarded and those with a meaningful effect remain in the equation. The forward selection algorithm initially fits all possible single-covariate models of the response [249]. It selects the covariate with the lowest P value and includes it in the subsequent steps. In the second step, it fits all possible models containing the covariate from the first step plus one of the remaining covariates, selects the new covariate with the lowest P value, and includes it in the subsequent steps. This is repeated until none of the remaining covariates has a P value less than 0.05. The backwards stepwise selection algorithm begins with all the variables and eliminates the least significant covariate at each step [249]. The data are then refitted with the remaining variables, and the process is repeated until all remaining covariates in the equation have a P value below 0.1.
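The two selection loops described above can be sketched independently of the model being fitted. In the Python below, `fit_pvalues` is a hypothetical callable that fits a model containing exactly the listed covariates (in practice a Cox regression) and returns each covariate's P value; the entry (0.05) and removal (0.1) thresholds are the ones stated in the text, while the exact test statistic SPSS® uses for entry and removal is not specified here.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical model-fitting callable: covariates in the model -> {covariate: P value}
FitFn = Callable[[List[str]], Dict[str, float]]


def forward_select(candidates: Sequence[str], fit_pvalues: FitFn, p_enter: float = 0.05) -> List[str]:
    """Add, one at a time, the candidate with the lowest P value until none is below p_enter."""
    selected: List[str] = []
    remaining = list(candidates)
    while remaining:
        # P value of each remaining covariate when added to the current model
        trial = {c: fit_pvalues(selected + [c])[c] for c in remaining}
        best = min(trial, key=trial.get)
        if trial[best] >= p_enter:
            break
        selected.append(best)
        remaining.remove(best)
    return selected


def backward_eliminate(candidates: Sequence[str], fit_pvalues: FitFn, p_remove: float = 0.10) -> List[str]:
    """Start from all covariates; drop the least significant until all P values are below p_remove."""
    current = list(candidates)
    while current:
        pvals = fit_pvalues(current)
        worst = max(current, key=lambda c: pvals[c])
        if pvals[worst] < p_remove:
            break
        current.remove(worst)  # refit without it on the next pass
    return current


# Toy stand-in for a model fit: fixed P values regardless of the other covariates
TOY = {"A": 0.01, "B": 0.2, "C": 0.04, "D": 0.5}


def toy_fit(covs: List[str]) -> Dict[str, float]:
    return {c: TOY[c] for c in covs}


print(forward_select(list(TOY), toy_fit), backward_eliminate(list(TOY), toy_fit))
```

With real Cox fits the P value of each gene changes as other genes enter or leave the model, which is why the two procedures can end with different gene sets even on the same data.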
In order for unbiased internal validation of models, a Training Set population was used for model development, and a separate Test Set (patients not used for model development) was utilized for validation [242]. Using the log2 expression data from each of the 32 genes analyzed in intact tissue sections, the patient specimens were randomly placed into Training and Test Sets at a ratio of approximately 67% (80 patients) to 33% (41 patients), respectively. Using the Training Set data to predict disease recurrence, both forward stepwise selection (data not shown) and backwards stepwise selection (Table 38) were performed. Values of β represent the log relative risk, and P values represent the level of significance of expression for each gene, as a continuous variable. Expression levels of ESR1, GABRP, ST8SIA1, TBC1D9, SCUBE2, RABEP1, SLC39A6, TPBG, TCEAL1, BUB1, PTP4A2, LRBA, CX3CL1, MAPRE2, GMPS, CKS2, and SLC43A3 were utilized in this model of disease-free survival. Using the proportional hazards model, the following equation was developed for disease-free survival:
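$$h[t(x)] = h_0(t)\exp\big(0.255\,x_{\mathrm{ESR1}} - 0.483\,x_{\mathrm{GABRP}} + 0.792\,x_{\mathrm{ST8SIA1}} - 0.34\,x_{\mathrm{TBC1D9}} + 0.494\,x_{\mathrm{SCUBE2}} - 0.745\,x_{\mathrm{RABEP1}} - 0.376\,x_{\mathrm{SLC39A6}} - 0.476\,x_{\mathrm{TPBG}} + 0.378\,x_{\mathrm{TCEAL1}} + 0.528\,x_{\mathrm{BUB1}} - 0.716\,x_{\mathrm{PTP4A2}} + 0.587\,x_{\mathrm{LRBA}} + 0.387\,x_{\mathrm{CX3CL1}} - 0.365\,x_{\mathrm{MAPRE2}} - 0.598\,x_{\mathrm{GMPS}} + 0.823\,x_{\mathrm{CKS2}} + 0.487\,x_{\mathrm{SLC43A3}}\big)$$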
Hazard rates were calculated for each patient specimen in the Training Set, and patients were stratified by thirds into low, intermediate, and high risk populations (as suggested by Paik et al. [76] and Sparano and Paik [93]) and analyzed by Kaplan-Meier plots for disease-free and overall survival which gave P values less than 0.001 for each relationship (
Hazard rates were calculated for each patient specimen in the Test Set, and patients were stratified by thirds into low, intermediate, and high risk populations and analyzed by Kaplan-Meier plots for disease-free and overall survival that gave P values of 0.369 and 0.617, respectively (
Multivariate Cox models were designed to predict disease-free survival in an 80 patient training set population using backwards stepwise selection. Values of β represent the log relative risk, and P values represent the level of significance of expression for each gene, as a continuous variable. Expression levels of ESR1, GABRP, ST8SIA1, TBC1D9, SCUBE2, RABEP1, SLC39A6, TPBG, TCEAL1, BUB1, PTP4A2, LRBA, CX3CL1, MAPRE2, GMPS, CKS2, and SLC43A3 were utilized in this model of disease-free survival.
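The random Training/Test partition described above (80 and 41 patients for disease-free survival) can be sketched as a seeded shuffle followed by a cut. The patient identifiers, the seed, and the function name below are hypothetical; the text does not state how the randomization was performed.

```python
import random
from typing import List, Sequence, Tuple


def train_test_split(patient_ids: Sequence[str], n_train: int, seed: int) -> Tuple[List[str], List[str]]:
    """Randomly partition patients into a Training Set of n_train and a Test Set of the rest."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)  # a fixed seed makes the split reproducible
    return ids[:n_train], ids[n_train:]


patients = [f"P{i:03d}" for i in range(1, 122)]  # 121 hypothetical patient IDs
train, test = train_test_split(patients, n_train=80, seed=2009)
print(len(train), len(test))
```

Model selection and coefficient estimation would use only `train`; scoring `test` with the frozen coefficients is what keeps the Test Set evaluation unbiased.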
Using the Training Set (83 patients) data to predict overall survival, both forward stepwise selection (data not shown) and backwards stepwise selection (Table 39) were performed. Values of β represent the log relative risk, and P values represent the level of significance of expression for each gene, as a continuous variable. Expression levels of TRIM29, SCUBE2, SLC39A6, PTP4A2, LRBA, CX3CL1, and CKS2 were utilized in this model of overall survival. Using the proportional hazards model, the following equation was developed for overall survival:
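$$h[t(x)] = h_0(t)\exp\big(-0.224\,x_{\mathrm{TRIM29}} + 0.205\,x_{\mathrm{SCUBE2}} - 0.353\,x_{\mathrm{SLC39A6}} - 0.557\,x_{\mathrm{PTP4A2}} + 0.312\,x_{\mathrm{LRBA}} + 0.378\,x_{\mathrm{CX3CL1}} + 0.437\,x_{\mathrm{CKS2}}\big)$$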
Hazard rates were calculated for each patient specimen in the Training Set, and patients were stratified by thirds into low, intermediate, and high risk populations and analyzed by Kaplan-Meier plots for disease-free and overall survival which gave P values less than 0.001 (
Hazard rates were calculated for each patient specimen in the Test Set, and patients were stratified by thirds into low, intermediate, and high risk populations and analyzed by Kaplan-Meier plots for disease-free and overall survival giving P values of 0.252 and 0.717, respectively (
Multivariate Cox models were designed to predict overall survival in an 83 patient training set population using backwards stepwise selection. Values of β represent the log relative risk, and P values represent the level of significance of expression for each gene, as a continuous variable. Expression levels of TRIM29, SCUBE2, SLC39A6, PTP4A2, LRBA, CX3CL1, and CKS2 were utilized in this model of overall survival.
Multivariate Models Developed from the Entire Population
In order to improve accuracy of the multivariate models predicting recurrence and survival, expression levels from the entire population (121 patients) were used (Table 40). Of the 32 genes, expression levels of ESR1, GABRP, RABEP1, SLC39A6, TCEAL1, ATAD2, PTP4A2, LRBA, and SLC43A3 were utilized in this model of disease-free survival using backwards stepwise selection. Interestingly, these genes, with the exception of ATAD2, were also in the model developed from the Training Set population.
The following equation was developed for disease-free survival of the entire patient population:
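$$h(t \mid x) = h_0(t)\exp\left(0.147\,x_{\mathrm{ESR1}} - 0.119\,x_{\mathrm{GABRP}} - 0.537\,x_{\mathrm{RABEP1}} - 0.373\,x_{\mathrm{SLC39A6}} + 0.462\,x_{\mathrm{TCEAL1}} + 0.445\,x_{\mathrm{ATAD2}} - 0.437\,x_{\mathrm{PTP4A2}} + 0.296\,x_{\mathrm{LRBA}} + 0.429\,x_{\mathrm{SLC43A3}}\right)$$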
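The signs of the 9-gene coefficients show how the signature reads: genes with negative β (GABRP, RABEP1, SLC39A6, PTP4A2) raise the hazard when they are underexpressed, while genes with positive β (ESR1, TCEAL1, ATAD2, LRBA, SLC43A3) raise it when overexpressed. The sketch below breaks a patient's log relative hazard into per-gene terms. The SLC39A6 coefficient is taken from the gene listed in the model; expression scale and function names are assumptions for illustration.

```python
import math

# Coefficients from the entire-population disease-free survival equation.
DFS_BETAS = {
    "ESR1": 0.147, "GABRP": -0.119, "RABEP1": -0.537, "SLC39A6": -0.373,
    "TCEAL1": 0.462, "ATAD2": 0.445, "PTP4A2": -0.437, "LRBA": 0.296,
    "SLC43A3": 0.429,
}


def risk_contributions(expression):
    """Return each gene's term beta_i * x_i of the log relative hazard."""
    return {gene: beta * expression[gene] for gene, beta in DFS_BETAS.items()}


def dfs_relative_hazard(expression):
    """exp of the summed per-gene terms, i.e. h(t|x) / h0(t)."""
    return math.exp(sum(risk_contributions(expression).values()))
```

For example, lowering RABEP1 expression by one unit while holding the other genes fixed multiplies the hazard by exp(0.537), roughly 1.7.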
Hazard rates were calculated for each specimen, and patients were stratified by thirds into low, intermediate, and high risk populations and analyzed by Kaplan-Meier plots for disease-free and overall survival, which gave P values of less than 0.001.
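The Kaplan-Meier plots used for each risk group are step functions built from the product-limit estimator. The sketch below computes that estimator for one group from follow-up times and event indicators (recurrence or death observed versus censored). It is a textbook implementation, not the software used in the study.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each distinct event time.

    events[i] is 1 if the event was observed for patient i, 0 if censored.
    S(t) is the product over event times t_i <= t of (1 - d_i / n_i), where
    d_i is the number of events at t_i and n_i the number still at risk.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all patients (events and censorings) sharing this time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve
```

Patients censored at a given time are still counted as at risk for events at that same time, the usual convention for this estimator.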
Receiver operating characteristic (ROC) curves (
Multivariate Cox models were designed to predict disease-free survival in the entire 121 patient cohort using backwards stepwise selection. Values of β represent the log relative risk, and P values represent the level of significance of expression for each gene, as a continuous variable. Expression levels of ESR1, GABRP, RABEP1, SLC39A6, TCEAL1, ATAD2, PTP4A2, LRBA, and SLC43A3 were utilized in this model of disease-free survival.
A multivariate Cox model was designed to predict overall survival in the entire 126 patient cohort using backwards stepwise selection (Table 41). Of the 32 genes, expression levels of GABRP, TRIM29, RABEP1, SLC39A6, TCEAL1, PLK1, and CX3CL1 were utilized in this model of overall survival. The following equation was developed for overall survival of the entire patient population:
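$$h(t \mid x) = h_0(t)\exp\left(-0.121\,x_{\mathrm{GABRP}} - 0.112\,x_{\mathrm{TRIM29}} - 0.445\,x_{\mathrm{RABEP1}} - 0.173\,x_{\mathrm{SLC39A6}} + 0.436\,x_{\mathrm{TCEAL1}} + 0.501\,x_{\mathrm{PLK1}} + 0.26\,x_{\mathrm{CX3CL1}}\right)$$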
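Under the proportional hazards assumption, the ratio of two patients' hazards does not depend on time, because h0(t) cancels. The sketch below compares two patients with the entire-population overall survival coefficients, taking the first term as GABRP, one of the seven genes named for this model. Expression values and the function name are illustrative assumptions.

```python
import math

# Coefficients from the entire-population overall survival equation.
OS_BETAS = {
    "GABRP": -0.121, "TRIM29": -0.112, "RABEP1": -0.445, "SLC39A6": -0.173,
    "TCEAL1": 0.436, "PLK1": 0.501, "CX3CL1": 0.26,
}


def hazard_ratio(patient_a, patient_b):
    """h(t|x_a) / h(t|x_b) = exp(sum(beta_i * (x_a,i - x_b,i))).

    The baseline hazard h0(t) cancels, so the result is the same at
    every time t.
    """
    return math.exp(sum(b * (patient_a[g] - patient_b[g]) for g, b in OS_BETAS.items()))
```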
Hazard rates were calculated for each specimen, and patients were stratified by thirds into low, intermediate, and high risk populations and analyzed by Kaplan-Meier plots for disease-free and overall survival, giving P values of less than 0.001.
ROC curves (
Additional patient characteristics (e.g., menopausal status, race, family history, tumor grade, stage of disease, lymph node status, ER/PR status) were converted to numerical values and included in a multivariate Cox proportional hazards model [237]. This allowed the Cox proportional hazards model to incorporate all available information, both standard prognostic factors and gene expression, to predict each patient's clinical outcome as accurately as possible. However, backwards stepwise selection eliminated every one of these characteristics before the final model was reached, indicating that these features of the patient and their breast cancer were unnecessary for predicting recurrence and survival when the 9 gene signature was employed. Thus, the 9 gene signature, derived from a broad spectrum of invasive ductal carcinomas, predicted risk of recurrence as an independent prognostic test.
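Clinical characteristics must be numeric before they can enter a Cox model alongside gene expression. The sketch below shows one way to do that. The codes are hypothetical, since the document does not give the coding actually used; binary factors are mapped to 0/1, and grade and stage are assumed to be recorded as integers. A nominal factor with more than two categories, such as race, would typically be entered as a set of indicator variables (not shown).

```python
# Hypothetical coding scheme; the actual codes used are not given.
CODINGS = {
    "menopausal_status": {"pre": 0, "post": 1},
    "lymph_node_status": {"negative": 0, "positive": 1},
    "er_status": {"negative": 0, "positive": 1},
    "pr_status": {"negative": 0, "positive": 1},
    "family_history": {"no": 0, "yes": 1},
}


def encode_clinical(record):
    """Map binary clinical fields to 0/1 and pass ordinal fields through.

    Tumor grade and stage are assumed to be stored as integer codes.
    """
    encoded = {field: CODINGS[field][record[field]] for field in CODINGS}
    encoded["tumor_grade"] = int(record["tumor_grade"])
    encoded["stage"] = int(record["stage"])
    return encoded
```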
After qPCR validation of the 32 gene set and examination of these genes in LCM-procured carcinoma and stromal cells, as well as in intact tissue, a total of 126 breast carcinoma specimens were evaluated for each gene by qPCR. To ensure that the sample population was representative of breast carcinoma in general, patient survival was examined as a function of known prognostic factors. Survival outcomes were as expected, with the exception of nodal involvement, which was less significant than expected. This appears to be due to the patient selection required to complete the project described in Appendix I, which included equal numbers of patients with and without disease recurrence among lymph node negative and lymph node positive cancers.
The distribution of individual gene expression levels in the 126 breast cancers was examined. The expression distributions of thirteen genes (NAT1, ESR1, GABRP, IL6ST, CENPA, ATAD2, XBP1, MCM6, PTP4A2, LRBA, GATA3, GMPS, and SLC43A3) were indicative of non-Gaussian populations and were investigated for bimodality. Seven of these genes appeared to have a bimodal distribution, but the bimodality was not significant in survival analyses.
Expression levels of several genes appeared to be highly correlated with those of other genes in the 32 gene set. Seven genes (NAT1, ESR1, SCUBE2, FUT8, PTP4A2, LRBA, and MAPRE2) had expression levels correlated with those of more than 20 of the other genes within the 32 gene set. In addition, by Pearson correlation and linear regression, estrogen and progestin receptor mRNA expression levels were highly correlated with the protein levels of these known tumor markers (ER and PR, respectively).
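Counting how many other genes each gene is correlated with amounts to computing all pairwise Pearson coefficients across specimens and tallying the pairs that pass a criterion. The sketch below uses a fixed |r| threshold, which is an assumption; the study may instead have used a significance test, which the text does not specify.

```python
import math
from typing import Dict, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation coefficient of two samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlated_partner_counts(expr: Dict[str, Sequence[float]], threshold: float) -> Dict[str, int]:
    """For each gene, count the other genes whose |r| is at least threshold.

    expr maps gene name -> expression values across the same specimens.
    """
    genes = list(expr)
    counts = {g: 0 for g in genes}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            if abs(pearson(expr[g], expr[h])) >= threshold:
                counts[g] += 1
                counts[h] += 1
    return counts
```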
Genes were analyzed for association with known clinical characteristics, including race, menopausal status, family history, nodal status, ER status, and PR status, prior to correlation of expression levels with clinical outcome (i.e., disease-free and overall survival). Genes were stratified by median expression level and subjected to Kaplan-Meier survival analyses. Stratification at the median expression level of SCUBE2 significantly separated patients into good and poor prognosis groups for DFS, while six additional genes (GABRP, TBC1D9, SLC39A6, MELK, MCM6, and PTP4A2) appeared to be associated with DFS or OS (P value less than 0.10). Genes determined to be differentially expressed for a particular patient or cancer characteristic were evaluated in specific populations. Several genes (GABRP for nodal status; NAT1, CENPA, and BUB1 for tumor grade; ESR1, SCUBE2, RABEP1, SLC39A6, TCEAL1, and XBP1 for ER status; SLC39A6 and PTP4A2 for PR status) appear to distinguish between good and poor prognosis groups better in specific patient populations than in the entire population.
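Stratifying by median expression splits the specimens into a high-expression and a low-expression group, which are then compared with Kaplan-Meier curves. The sketch below performs the split for one gene. How samples falling exactly on the median were assigned is not stated, so placing them in the low group is an assumption.

```python
import statistics


def median_split(values):
    """Label each sample 'high' if above the median expression, else 'low'.

    Samples exactly at the median are assigned to 'low' (assumed tie rule).
    """
    cutoff = statistics.median(values)
    return ["high" if v > cutoff else "low" for v in values]
```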
Expression of each of 5 genes (TBC1D9, RABEP1, SLC39A6, FUT8, and PTP4A2) was individually associated with disease-free survival in univariate Cox regression analyses (P less than 0.05). Expression of 4 genes (RABEP1, SLC39A6, FUT8, and PTP4A2) appeared to be related to overall survival in univariate analysis (P less than 0.05). Surprisingly, expression profiles of individual genes had predictive value, although the level of confidence does not warrant their use in a single gene test.
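A univariate Cox regression estimates a single coefficient β for one gene by maximizing the Cox partial likelihood; exp(β) is then the hazard ratio per unit of expression. The document does not say how the models were fitted, so the sketch below is a minimal Newton-Raphson fit under the Breslow handling of tied event times. In practice a statistics package would be used, and this sketch does not guard against data for which the estimate diverges.

```python
import math
from typing import Sequence


def univariate_cox_beta(times: Sequence[float], events: Sequence[int],
                        x: Sequence[float], iterations: int = 25) -> float:
    """Estimate beta in h(t|x) = h0(t) * exp(beta * x) by Newton-Raphson
    on the Cox partial likelihood (Breslow handling of ties)."""
    beta = 0.0
    n = len(times)
    for _ in range(iterations):
        score = info = 0.0
        for i in range(n):
            if not events[i]:
                continue  # censored patients contribute only via risk sets
            # Risk set: patients still under follow-up at this event time.
            s0 = s1 = s2 = 0.0
            for j in range(n):
                if times[j] >= times[i]:
                    w = math.exp(beta * x[j])
                    s0 += w
                    s1 += w * x[j]
                    s2 += w * x[j] * x[j]
            mean = s1 / s0
            score += x[i] - mean             # first derivative
            info += s2 / s0 - mean * mean    # negative second derivative
        if info <= 0:
            break
        step = score / info
        beta += step
        if abs(step) < 1e-10:
            break
    return beta
```

A positive β (hazard ratio above 1) means higher expression is associated with earlier events; a negative β, as reported for FUT8, MCM6, GATA3, and TPBG, means the opposite.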
Multivariate Cox proportional hazards models of DFS and OS were initially developed in a Training Set patient population using backwards stepwise selection and tested in a separate Test Set population. Applied to the Test Set population, the DFS multivariate model gave P values of 0.16 for DFS and 0.36 for OS, and the OS model gave P values of 0.10 for DFS and 0.62 for OS.
Multivariate Cox proportional hazards models were performed with backwards stepwise selection in the entire population to predict disease-free survival using expression levels of 9 genes (ESR1, GABRP, RABEP1, SLC39A6, TCEAL1, ATAD2, PTP4A2, LRBA, and SLC43A3). ROC curves were constructed to illustrate the sensitivity and specificity of the model for disease-free and overall survival, with areas under the curves equal to 0.78 and 0.76, respectively. Although internal validation using Training and Test Sets is essential for model development, it is not a replacement for actual external validation using an independent patient population [242].
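The area under an ROC curve equals the probability that a randomly chosen patient with the event has a higher risk score than a randomly chosen patient without it. The sketch below computes the AUC that way from risk scores and binary outcome labels. Treating recurrence as a binary label ignores censoring; how the ROC analysis handled follow-up time is not stated, so this is a simplifying assumption.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison.

    labels: 1 = event (e.g. recurrence), 0 = no event. Ties between a
    positive and a negative score count one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 corresponds to chance ranking and 1.0 to perfect separation, which puts the reported values of 0.78 and 0.76 in context.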
Small, biologically significant and clinically relevant gene sets that can be developed as a commercial test for assessing risk of breast cancer recurrence are described herein. These gene sets can be evaluated on a flow-thru chip (TIPCHIP™) for use in the ZIPLEX® Automated Workstation (Xceed Molecular Corp.), which allows the analyses to be performed in a clinical laboratory, avoiding the need for a “send-out test.” Prediction of the risk of breast cancer recurrence at the time of surgical removal of the primary lesion will facilitate improved treatment planning and disease surveillance, resulting in improved care for these patients.
Genes were selected for subsequent analyses based on their occurrence in multiple signatures. Utilizing studies examining pure carcinoma cell populations procured by LCM (e.g., [41; 57; 70; 71]), 14 candidate carcinoma-associated genes were selected. Studies from intact tissue sections (e.g., [47; 48; 54; 55; 62-65; 67]) provided an additional subset of 18 candidate genes with clinically relevant differential expression inferred to occur in stromal cells.
Using an IRB-approved study, frozen sections from de-identified specimens (Tables 42 and 43) from patients diagnosed with invasive ductal or lobular carcinoma were utilized [37; 38]. H & E staining was performed as described [37; 38; 41], and procedures were conducted under RNase-free conditions.
RNA Extraction, Purification and qPCR Analysis
Total RNA was extracted from frozen tissue sections [37; 38] with the RNEASY® Mini Kit (Qiagen Inc., Valencia, Calif.). Integrity of RNA was analyzed with the Bioanalyzer 2100 (Agilent Technologies, Palo Alto, Calif.). Total RNA was reverse transcribed in 50 mM Tris-HCl buffer containing 37.5 mM KCl, 1.5 mM MgCl2, 10 mM DTT, 0.5 mM dNTPs (Invitrogen, Carlsbad, Calif.), 20 U RNASIN® (Promega, Madison, Wis.), 200 U SUPERSCRIPT RT III® (Invitrogen) and 5 ng of T7 primers or 166 ng of random hexamers.
RNA quantification and analyses were performed using triplicate cDNA preparations with qPCR in duplicate wells using the ABI PRISM® 7900HT (Applied Biosystems, Foster City, Calif.) with POWER SYBR® Green (Applied Biosystems) for detection. Universal Human Reference RNA (Stratagene, La Jolla, Calif.) was reverse transcribed and amplified along with test samples as both a positive control and as standards for quantification of RNA using β-actin as a reference gene, and relative gene expression was calculated using the ΔΔCt method.
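The ΔΔCt method normalizes a target gene's threshold cycle (Ct) to a reference gene (here β-actin) and then to a calibrator sample (here the Universal Human Reference RNA), giving a fold change of 2^(−ΔΔCt). The sketch below assumes roughly 100% amplification efficiency, which the formula requires, and simply averages replicate Ct values first; the exact averaging scheme across triplicate cDNA preparations and duplicate wells is not specified.

```python
from statistics import mean
from typing import Sequence


def relative_expression(target_sample: Sequence[float], ref_sample: Sequence[float],
                        target_calibrator: Sequence[float], ref_calibrator: Sequence[float]) -> float:
    """Fold change 2**(-ddCt) of a target gene in a sample.

    dCt  = mean Ct(target) - mean Ct(reference gene), per sample
    ddCt = dCt(sample) - dCt(calibrator)
    Assumes ~100% PCR efficiency for both target and reference genes.
    """
    d_ct_sample = mean(target_sample) - mean(ref_sample)
    d_ct_calibrator = mean(target_calibrator) - mean(ref_calibrator)
    dd_ct = d_ct_sample - d_ct_calibrator
    return 2.0 ** (-dd_ct)
```

A sample whose target Ct is two cycles earlier than the calibrator's, after normalization to β-actin, has ΔΔCt = −2 and therefore a 4-fold higher relative expression.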
Total RNA samples were analyzed for quality with the Agilent BIOANALYZER™, amplified and biotin-labeled by oligo-dT primed in vitro transcription. TipChip microarrays, samples, and reagents were loaded into specific microplate wells, and then hybridization, washing, chemiluminescent imaging and data reduction were performed automatically on the ZIPLEX® Automated Workstation.
The ZIPLEX® manifold picks up the TipChips and lowers them into specific wells where solutions are repeatedly aspirated and dispensed through the chips. Up to eight TipChips were hybridized and analyzed simultaneously in less than three hours. Tables of mean intensities and coefficients of variation of triplicate spots for each probe were output by the instrument and analyzed on an external computer.
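The instrument output described above summarizes each probe by the mean intensity and coefficient of variation (CV) of its triplicate spots. The sketch below computes both for one probe. Whether the instrument uses the sample or the population standard deviation is not stated; the sample standard deviation is assumed here.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def spot_summary(intensities: Sequence[float]) -> Tuple[float, float]:
    """Return (mean intensity, CV) for the replicate spots of one probe.

    CV = sample standard deviation / mean (sample SD is an assumption).
    """
    m = mean(intensities)
    return m, stdev(intensities) / m
```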
Multivariate analyses, including K-nearest neighbor, shrinking centroid, and discriminant analysis, were performed using PARTEK GENOMICS SUITE™ to determine the best-fit model for predicting breast cancer recurrence in a training set from each sample population. The best-fit models were then applied to the remaining samples (test set). Kaplan-Meier survival analyses were performed using PARTEK GENOMICS SUITE™ and GRAPHPAD PRISM™.
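Of the classifiers listed, K-nearest neighbor is the simplest to illustrate: a test sample is assigned the majority outcome of the k training samples with the most similar expression profiles. The sketch below is a generic version, not Partek's implementation. Euclidean distance, k = 3, and unnormalized inputs are assumptions, since those settings are not reported.

```python
import math
from collections import Counter
from typing import Sequence


def knn_predict(train_x: Sequence[Sequence[float]], train_y: Sequence[str],
                query: Sequence[float], k: int = 3) -> str:
    """Predict an outcome label for one expression profile by majority vote
    among its k nearest training profiles (Euclidean distance)."""
    distances = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]
```

Choosing an odd k for a two-class outcome avoids tied votes; in practice expression values are usually standardized per gene so that no single gene dominates the distance.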
Clinical Correlations of Gene Expression Results Obtained by qPCR
Kaplan-Meier survival curves (
Cox regression survival analyses (Table 44) were performed on expression of individual genes measured by qPCR. P values represent the level of significance of expression for each gene, as a continuous variable. Expression of 4 genes (B=FUT8, D=MCM6, L=GATA3, and C=TPBG) appeared to be related to disease-free survival in univariate analysis. Over-expression of each of these genes was associated with a decreased likelihood of recurrence (HR=0.79, 0.81, 0.89, and 0.80, respectively).
In order to predict breast cancer recurrence and survival, a multivariate model was developed using gene combinations from expression levels of the 32 gene set measured by qPCR. The multivariate model for disease-free survival (
Comparisons of Expression Results Obtained from qPCR and ZIPLEX®
Gene expression results obtained from qPCR or the ZIPLEX® Automated Workstation were correlated.
Kaplan-Meier survival curves (
Cox regression analyses (Table 45) were then performed on expression levels of individual genes measured by the ZIPLEX® Automated Workstation. Expression levels detected by probes of four different genes (S=DSC2, N=PFKP probes 1 and 2, K=MELK, and AE=SLC43A3) appear to be related to disease-free survival using univariate analysis. Over-expression of each of these probes was correlated with an increased likelihood of recurrence (HR=1.27, 1.23, 1.24, 1.27, and 1.49, respectively).
Probability of breast cancer recurrence and survival based on a model developed using gene combinations from the 32 gene set measured by the ZIPLEX® Automated Workstation (
The poor prognosis group had a 2.4-fold greater likelihood of breast cancer recurrence than the good prognosis group using this multivariate model based on gene expression levels determined by the ZIPLEX® platform.
P values represent the level of significance of expression for each gene, as a continuous variable. Expression of probes from 4 different genes (S=DSC2, N=PFKP probes 1 and 2, K=MELK, and AE=SLC43A3) appear to be related to disease-free survival using univariate analysis. Over-expression of each of these probes was correlated with an increased likelihood of recurrence (HR=1.27, 1.23, 1.24, 1.27, and 1.49, respectively).
A custom-designed “flow-thru” chip (TIPCHIP™) was created containing each of the 32 genes described above, as well as other genes identified in an independent study described in Patent Cooperation Treaty Application No. PCT/US2009/060506 (WO 2010/045234). Two independent molecular signatures were shown to be related to the clinical behavior of human breast cancer. One of these, based upon the gene subset described herein, predicts risk of breast cancer recurrence regardless of estrogen receptor status and nodal involvement.
While this invention has been particularly shown and described with references to example embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the scope of the invention encompassed by the appended claims.
This application claims the benefit of U.S. Provisional Application No. 61/224,115, filed on Jul. 9, 2009, the entire teachings of which are incorporated herein by reference.
Number | Date | Country
---|---|---
61224115 | Jul 2009 | US