A Biomarker is any indicia of the level of expression of an indicated Marker gene. The indicia can be direct or indirect and measure over- or under-expression of the gene given the physiologic parameters and in comparison to an internal control, normal tissue or another carcinoma. Biomarkers include, without limitation, nucleic acids (both over- and under-expression and direct and indirect). Using nucleic acids as Biomarkers can include any method known in the art including, without limitation, measuring DNA amplification, RNA, micro RNA, loss of heterozygosity (LOH), single nucleotide polymorphisms (SNPs, Brookes (1999)), microsatellite DNA, and DNA hypo- or hyper-methylation. Using proteins as Biomarkers can include any method known in the art including, without limitation, measuring amount, activity, modifications such as glycosylation, phosphorylation, ADP-ribosylation, ubiquitination, etc., and immunohistochemistry (IHC). Other Biomarkers include imaging, cell count and apoptosis Markers.
The indicated genes provided herein are those associated with a particular tumor or tissue type. A Marker gene may be associated with numerous cancer types but provided that the expression of the gene is sufficiently associated with one tumor or tissue type to be identified using the algorithm described herein to be specific for a particular origin, the gene can be used in the claimed invention to determine tissue of origin for a carcinoma of unknown primary origin (CUP). Numerous genes associated with one or more cancers are known in the art. The present invention provides preferred Marker genes and even more preferred Marker gene combinations. These are described herein in detail.
“Origin” as referred to in ‘tissue of origin’ means either the tissue type (lung, colon, etc.) or the histological type (adenocarcinoma, squamous cell carcinoma, etc.) depending on the particular medical circumstances and will be understood by anyone of skill in the art.
A Marker gene corresponds to the sequence designated by a SEQ ID NO when it contains that sequence. A gene segment or fragment corresponds to the sequence of such gene when it contains a portion of the referenced sequence or its complement sufficient to distinguish it as being the sequence of the gene. A gene expression product corresponds to such sequence when its RNA, mRNA, or cDNA hybridizes to the composition having such sequence (e.g. a probe) or, in the case of a peptide or protein, it is encoded by such mRNA. A segment or fragment of a gene expression product corresponds to the sequence of such gene or gene expression product when it contains a portion of the referenced gene expression product or its complement sufficient to distinguish it as being the sequence of the gene or gene expression product.
The inventive methods, compositions, articles, and kits described and claimed in this specification include one or more Marker genes. “Marker” or “Marker gene” is used throughout this specification to refer to genes and gene expression products that correspond with any gene the over- or under-expression of which is associated with a tumor or tissue type. The preferred Marker genes are described in more detail in Tables 1 and 15.
The present invention provides a method of diagnosing pancreatic cancers. The present invention thus provides methods for determining the direction of therapy by identifying pancreatic cancers potentially early enough to avoid resection thus allowing for chemotherapeutic regimens.
The present invention further provides compositions containing at least one isolated sequence selected from SEQ ID NOs: 39-41 and 43-45. The present invention further provides kits for conducting an assay according to the methods provided herein and further containing Biomarker detection reagents.
The present invention further provides methods for measuring gene expression by generating the amplicons of SEQ ID NOs: 42 and 46 to determine gene expression and comparing levels of at least one of these amplicons to normal tissue gene expression to diagnose pancreatic cancer.
The present invention further provides microarrays or gene chips for performing the methods described herein.
The present invention further provides diagnostic/prognostic portfolios containing isolated nucleic acid sequences, their complements, or portions thereof of a combination of genes as described herein where the combination is sufficient to measure or characterize gene expression in a biological sample having metastatic cells relative to cells from different carcinomas or normal tissue.
Any method described in the present invention can further include measuring expression of at least one gene constitutively expressed in the sample.
Preferably the Markers for pancreatic cancer are coagulation factor V (F5), prostate stem cell antigen (PSCA), integrin, β6 (ITGB6), kallikrein 10 (KLK10), claudin 18 (CLDN18), trio isoform (TR10), and hypothetical protein FLJ22041 similar to FK506 binding proteins (FKBP10). Preferably, Biomarkers for F5 and PSCA are measured together. Biomarkers for ITGB6, KLK10, CLDN18, TR10, and FKBP10 can be measured in addition to or in place of F5 and/or PSCA. F5 is described for instance by 20040076955; 20040005563; and WO2004031412. PSCA is described for instance by WO1998040403; 20030232350; and WO2004063355. ITGB6 is described for instance by WO2004018999; and 6339148. KLK10 is described for instance by WO2004077060; and 20030235820. CLDN18 is described for instance by WO2004063355; and WO2005005601. TR10 is described for instance by 20020055627. FKBP10 is described for instance by WO2000055320.
The invention further provides a method for providing a prognosis by determining the presence of pancreatic cancer according to the methods described herein and identifying the corresponding prognosis therefor.
The invention further provides a method for finding Biomarkers comprising determining the expression level of a Marker gene in a particular metastasis, measuring a Biomarker for the Marker gene to determine expression thereof, analyzing the expression of the Marker gene according to the methods described herein and determining if the Marker gene is effectively specific for pancreatic cancer.
The invention further provides compositions comprising at least one isolated sequence selected from SEQ ID NOs: 39-46.
The invention further provides kits, articles, microarrays or gene chips, and diagnostic/prognostic portfolios for conducting the assays described herein, and patient reports for reporting the results obtained by the present methods.
The mere presence or absence of particular nucleic acid sequences in a tissue sample has only rarely been found to have diagnostic or prognostic value. Information about the expression of various proteins, peptides or mRNA, on the other hand, is increasingly viewed as important. The mere presence of nucleic acid sequences having the potential to express proteins, peptides, or mRNA (such sequences referred to as “genes”) within the genome by itself is not determinative of whether a protein, peptide, or mRNA is expressed in a given cell. Whether or not a given gene capable of expressing proteins, peptides, or mRNA does so and to what extent such expression occurs, if at all, is determined by a variety of complex factors. Irrespective of difficulties in understanding and assessing these factors, assaying gene expression can provide useful information about the occurrence of important events such as tumorigenesis, metastasis, apoptosis, and other clinically relevant phenomena. Relative indications of the degree to which genes are active or inactive can be found in gene expression profiles. The gene expression profiles of this invention are used to provide a diagnosis and treat patients for CUP.
In the above methods, the sample can be prepared by any method known in the art including, but not limited to, bulk tissue preparation and laser capture microdissection. The bulk tissue preparation can be obtained for instance from a biopsy or a surgical specimen.
In the above methods, the gene expression measuring can also include measuring the expression level of at least one gene constitutively expressed in the sample.
In the above methods, the specificity is preferably at least about 40% and the sensitivity at least about 80%.
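The sensitivity and specificity thresholds stated above can be checked with a short calculation. The following is a minimal sketch; the function names and the example counts are hypothetical and are not taken from this specification.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of samples of the target origin that the assay detects."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of samples of other origins that the assay correctly rejects."""
    return true_neg / (true_neg + false_pos)


def meets_thresholds(sens: float, spec: float,
                     min_sens: float = 0.80, min_spec: float = 0.40) -> bool:
    """Compare against the preferred minimums (about 80% and about 40%)."""
    return sens >= min_sens and spec >= min_spec


# Hypothetical counts: 45 of 50 target samples detected,
# 60 of 100 non-target samples correctly rejected.
sens = sensitivity(true_pos=45, false_neg=5)
spec = specificity(true_neg=60, false_pos=40)
print(sens, spec, meets_thresholds(sens, spec))
```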
In the above methods, the pre-determined cut-off levels are at least about 1.5-fold over- or under-expression in the sample relative to benign cells or normal tissue.
In the above methods, the pre-determined cut-off levels correspond to over-expression in the sample having metastatic cells relative to benign cells or normal tissue that has at least a statistically significant p-value; preferably the p-value is less than 0.05.
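The two kinds of cut-off described above (a fold-change of at least about 1.5 in either direction, and a p-value below 0.05) can be sketched as simple predicates. The function names are hypothetical, and treating under-expression as a ratio at or below 1/1.5 is an assumption about how the fold cut-off is applied.

```python
def passes_fold_cutoff(sample: float, reference: float,
                       min_fold: float = 1.5) -> bool:
    """True if the sample is over- or under-expressed by at least min_fold."""
    if sample <= 0 or reference <= 0:
        raise ValueError("expression values must be positive")
    ratio = sample / reference
    return ratio >= min_fold or ratio <= 1.0 / min_fold


def passes_p_cutoff(p_value: float, alpha: float = 0.05) -> bool:
    """True if the p-value is below the significance level."""
    return p_value < alpha


print(passes_fold_cutoff(300.0, 200.0))  # 1.5-fold over-expression
print(passes_fold_cutoff(100.0, 200.0))  # 2-fold under-expression
print(passes_p_cutoff(0.03))
```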
In the above methods, gene expression can be measured by any method known in the art, including, without limitation, on a microarray or gene chip, nucleic acid amplification conducted by polymerase chain reaction (PCR) such as reverse transcription polymerase chain reaction (RT-PCR), measuring or detecting a protein encoded by the gene such as by an antibody specific to the protein or by measuring a characteristic of the gene such as DNA amplification, methylation, mutation and allelic variation. The microarray can be, for instance, a cDNA array or an oligonucleotide array. All these methods can further contain one or more internal control reagents.
The present invention provides a method of generating a pancreatic cancer prognostic patient report by determining the results of any one of the methods described herein and preparing a report displaying the results and patient reports generated thereby. The report can further contain an assessment of patient outcome and/or probability of risk relative to the patient population.
Sample preparation requires the collection of patient samples. Patient samples used in the inventive method are those that are suspected of containing diseased cells such as cells taken from a nodule in a fine needle aspirate (FNA) of tissue. Bulk tissue preparation obtained from a biopsy or a surgical specimen and laser capture microdissection are also suitable for use. Laser Capture Microdissection (LCM) technology is one way to select the cells to be studied, minimizing variability caused by cell type heterogeneity. Consequently, moderate or small changes in Marker gene expression between normal or benign and cancerous cells can be readily detected. Samples can also comprise circulating epithelial cells extracted from peripheral blood. These can be obtained according to a number of methods but the most preferred method is the magnetic separation technique described in U.S. Pat. No. 6,136,182. Once the sample containing the cells of interest has been obtained, a gene expression profile is obtained using a Biomarker, for genes in the appropriate portfolios.
Preferred methods for establishing gene expression profiles include determining the amount of RNA that is produced by a gene that can code for a protein or peptide. This is accomplished by reverse transcriptase PCR (RT-PCR), competitive RT-PCR, real time RT-PCR, differential display RT-PCR, Northern Blot analysis and other related tests. While it is possible to conduct these techniques using individual PCR reactions, it is best to amplify complementary DNA (cDNA) or complementary RNA (cRNA) produced from mRNA and analyze it via microarray. A number of different array configurations and methods for their production are known to those of skill in the art and are described in for instance, U.S. Pat. Nos. 5,445,934; 5,532,128; 5,556,752; 5,242,974; 5,384,261; 5,405,783; 5,412,087; 5,424,186; 5,429,807; 5,436,327; 5,472,672; 5,527,681; 5,529,756; 5,545,531; 5,554,501; 5,561,071; 5,571,639; 5,593,839; 5,599,695; 5,624,711; 5,658,734; and 5,700,637.
Microarray technology allows for the measurement of the steady-state mRNA level of thousands of genes simultaneously, thereby presenting a powerful tool for identifying effects such as the onset, arrest, or modulation of uncontrolled cell proliferation. Two microarray technologies are currently in wide use. The first are cDNA arrays and the second are oligonucleotide arrays. Although differences exist in the construction of these chips, essentially all downstream data analysis and output are the same. The products of these analyses are typically measurements of the intensity of the signal received from a labeled probe used to detect a cDNA sequence from the sample that hybridizes to a nucleic acid sequence at a known location on the microarray. Typically, the intensity of the signal is proportional to the quantity of cDNA, and thus mRNA, expressed in the sample cells. A large number of such techniques are available and useful. Preferred methods for determining gene expression can be found in U.S. Pat. Nos. 6,271,002; 6,218,122; 6,218,114; and 6,004,755.
Analysis of the expression levels is conducted by comparing such signal intensities. This is best done by generating a ratio matrix of the expression intensities of genes in a test sample versus those in a control sample. For instance, the gene expression intensities from a diseased tissue can be compared with the expression intensities generated from benign or normal tissue of the same type. A ratio of these expression intensities indicates the fold-change in gene expression between the test and control samples.
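A ratio matrix of this kind can be sketched as follows. This is an illustration only: the data layout (one list of test-sample intensities per gene and one control intensity per gene), the function name, and the intensity values are assumptions.

```python
def ratio_matrix(test: dict, control: dict) -> dict:
    """Return gene -> list of fold-change ratios (test / control).

    test:    gene -> list of intensities, one per test sample
    control: gene -> single control intensity (e.g. normal tissue)
    """
    ratios = {}
    for gene, intensities in test.items():
        reference = control[gene]
        if reference <= 0:
            raise ValueError(f"non-positive control intensity for {gene}")
        ratios[gene] = [value / reference for value in intensities]
    return ratios


# Hypothetical intensities for two genes across two test samples.
test = {"F5": [400.0, 100.0], "PSCA": [900.0, 300.0]}
control = {"F5": 200.0, "PSCA": 300.0}
print(ratio_matrix(test, control))
```

A ratio above one indicates higher expression in the test sample than in the control, and a ratio below one indicates lower expression.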
The selection can be based on statistical tests that produce ranked lists related to the evidence of significance for each gene's differential expression between factors related to the tumor's original site of origin. Examples of such tests include ANOVA and Kruskal-Wallis. The rankings can be used as weightings in a model designed to interpret the summation of such weights, up to a cutoff, as the preponderance of evidence in favor of one class over another. Previous evidence as described in the literature may also be used to adjust the weightings.
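As an illustration of ranking genes by a test such as Kruskal-Wallis, the sketch below computes the H statistic for each gene across tissue groups and sorts genes by it. It is a simplified version under stated assumptions: tied values receive average ranks but no tie correction is applied, no p-value is computed, and the gene names and values are hypothetical.

```python
def kruskal_wallis_h(groups: list) -> float:
    """Kruskal-Wallis H statistic (average ranks for ties, no tie correction)."""
    pooled = sorted((value, gi) for gi, g in enumerate(groups) for value in g)
    n = len(pooled)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    rank_sums = [0.0] * len(groups)
    for rank, (_, gi) in zip(ranks, pooled):
        rank_sums[gi] += rank
    total = sum(rs ** 2 / len(g) for rs, g in zip(rank_sums, groups))
    return 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)


def rank_genes(expression: dict) -> list:
    """Order genes from strongest to weakest evidence of group differences."""
    return sorted(expression,
                  key=lambda gene: kruskal_wallis_h(expression[gene]),
                  reverse=True)


# Hypothetical data: each gene maps to one list of values per tissue group.
expression = {
    "GENE_A": [[1, 2, 3], [4, 5, 6]],  # groups well separated
    "GENE_B": [[1, 3, 5], [2, 4, 6]],  # groups interleaved
}
print(rank_genes(expression))
```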
A preferred embodiment is to normalize each measurement by identifying a stable control set and scaling this set to zero variance across all samples. This control set is defined as any single endogenous transcript or set of endogenous transcripts affected by systematic error in the assay, and not known to change independently of this error. All Markers are adjusted by the sample specific factor that generates zero variance for any descriptive statistic of the control set, such as mean or median, or for a direct measurement. Alternatively, if the premise of variation of controls related only to systematic error is not true, yet the resulting classification error is less when normalization is performed, the control set will still be used as stated. Non-endogenous spike controls could also be helpful, but are not preferred.
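One way to read this normalization is sketched below, assuming the measurements are on a log scale (for example Ct values or log intensities) so that the sample specific factor is an additive offset, and assuming the mean is the descriptive statistic chosen for the control set. The function name, gene labels and values are hypothetical.

```python
from statistics import mean, pvariance


def normalize_to_controls(samples: list, control_genes: list) -> list:
    """Subtract each sample's control-set mean from all of its measurements.

    samples: list of dicts mapping gene -> log-scale measurement.
    After adjustment the control-set mean is zero in every sample,
    so its variance across samples is zero.
    """
    normalized = []
    for sample in samples:
        offset = mean(sample[g] for g in control_genes)
        normalized.append({g: v - offset for g, v in sample.items()})
    return normalized


# Hypothetical log-scale measurements for two samples.
samples = [
    {"beta_actin": 20.0, "PBGD": 24.0, "F5": 30.0},
    {"beta_actin": 22.0, "PBGD": 26.0, "F5": 29.0},
]
controls = ["beta_actin", "PBGD"]
adjusted = normalize_to_controls(samples, controls)
control_means = [mean(s[g] for g in controls) for s in adjusted]
print(adjusted, pvariance(control_means))
```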
Gene expression profiles can be displayed in a number of ways. The most common is to arrange raw fluorescence intensities or ratio matrix into a graphical dendrogram where columns indicate test samples and rows indicate genes. The data are arranged so genes that have similar expression profiles are proximal to each other. The expression ratio for each gene is visualized as a color. For example, a ratio less than one (down-regulation) appears in the blue portion of the spectrum while a ratio greater than one (up-regulation) appears in the red portion of the spectrum. Commercially available computer software programs are available to display such data including “Genespring” (Silicon Genetics, Inc.) and “Discovery” and “Infer” (Partek, Inc.).
In the case of measuring protein levels to determine gene expression, any method known in the art is suitable provided it results in adequate specificity and sensitivity. For example, protein levels can be measured by binding to an antibody or antibody fragment specific for the protein and measuring the amount of antibody-bound protein. Antibodies can be labeled by radioactive, fluorescent or other detectable reagents to facilitate detection. Methods of detection include, without limitation, enzyme-linked immunosorbent assay (ELISA) and immunoblot techniques.
Modulated genes used in the methods of the invention are described in the Examples. The genes that are differentially expressed are either up regulated or down regulated in patients with carcinoma of a particular origin relative to those with carcinomas from different origins. Up regulation and down regulation are relative terms meaning that a detectable difference (beyond the contribution of noise in the system used to measure it) is found in the amount of expression of the genes relative to some baseline. In this case, the baseline is determined based on the algorithm. The genes of interest in the diseased cells are then either up regulated or down regulated relative to the baseline level using the same measurement method. Diseased, in this context, refers to an alteration of the state of a body that interrupts or disturbs, or has the potential to disturb, proper performance of bodily functions as occurs with the uncontrolled proliferation of cells. Someone is diagnosed with a disease when some aspect of that person's genotype or phenotype is consistent with the presence of the disease. However, the act of conducting a diagnosis or prognosis may include the determination of disease/status issues such as determining the likelihood of relapse, type of therapy and therapy monitoring. In therapy monitoring, clinical judgments are made regarding the effect of a given course of therapy by comparing the expression of genes over time to determine whether the gene expression profiles have changed or are changing to patterns more consistent with normal tissue.
Genes can be grouped so that information obtained about the set of genes in the group provides a sound basis for making a clinically relevant judgment such as a diagnosis, prognosis, or treatment choice. These sets of genes make up the portfolios of the invention. As with most diagnostic Markers, it is often desirable to use the fewest number of Markers sufficient to make a correct medical judgment. This prevents a delay in treatment pending further analysis as well as unproductive use of time and resources.
One method of establishing gene expression portfolios is through the use of optimization algorithms such as the mean variance algorithm widely used in establishing stock portfolios. This method is described in detail in 20030194734. Essentially, the method calls for the establishment of a set of inputs (stocks in financial applications, expression as measured by intensity here) that will optimize the return (e.g., signal that is generated) one receives for using it while minimizing the variability of the return. Many commercial software programs are available to conduct such operations. “Wagner Associates Mean-Variance Optimization Application,” referred to as “Wagner Software” throughout this specification, is preferred. This software uses functions from the “Wagner Associates Mean-Variance Optimization Library” to determine an efficient frontier and optimal portfolios in the Markowitz sense (Markowitz (1952)). Use of this type of software requires that microarray data be transformed so that it can be treated as an input in the way stock return and risk measurements are used when the software is used for its intended financial analysis purposes.
The process of selecting a portfolio can also include the application of heuristic rules. Preferably, such rules are formulated based on biology and an understanding of the technology used to produce clinical results. More preferably, they are applied to output from the optimization method. For example, the mean variance method of portfolio selection can be applied to microarray data for a number of genes differentially expressed in subjects with cancer. Output from the method would be an optimized set of genes that could include some genes that are expressed in peripheral blood as well as in diseased tissue. If samples used in the testing method are obtained from peripheral blood and certain genes differentially expressed in instances of cancer could also be differentially expressed in peripheral blood, then a heuristic rule can be applied in which a portfolio is selected from the efficient frontier excluding those that are differentially expressed in peripheral blood. Of course, the rule can be applied prior to the formation of the efficient frontier by, for example, applying the rule during data pre-selection.
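The peripheral blood rule described above can be sketched as a filter over candidate portfolios. This is an illustration only: it assumes the efficient frontier is available as an ordered list of candidate portfolios, and the function name and gene names are hypothetical. It does not represent the Wagner Software.

```python
def select_portfolio(frontier: list, blood_expressed: set):
    """Return the first candidate portfolio containing no excluded genes.

    frontier:        candidate portfolios (lists of gene names), in order
                     of preference
    blood_expressed: genes differentially expressed in peripheral blood
    Returns None if every candidate contains an excluded gene.
    """
    excluded = set(blood_expressed)
    for portfolio in frontier:
        if not excluded.intersection(portfolio):
            return portfolio
    return None


# Hypothetical candidates and exclusion list.
frontier = [["GENE_A", "GENE_B"], ["GENE_A", "GENE_C"], ["GENE_D"]]
print(select_portfolio(frontier, {"GENE_B"}))
```

Applying the same exclusion set to the gene list before optimization corresponds to applying the rule during data pre-selection.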
Other heuristic rules can be applied that are not necessarily related to the biology in question. For example, one can apply a rule that only a prescribed percentage of the portfolio can be represented by a particular gene or group of genes. Commercially available software such as the Wagner Software readily accommodates these types of heuristics. This can be useful, for example, when factors other than accuracy and precision (e.g., anticipated licensing fees) have an impact on the desirability of including one or more genes.
The gene expression profiles of this invention can also be used in conjunction with other non-genetic diagnostic methods useful in cancer diagnosis, prognosis, or treatment monitoring. For example, in some circumstances it is beneficial to combine the diagnostic power of the gene expression based methods described above with data from conventional Markers such as serum protein Markers (e.g., Cancer Antigen 27.29 (“CA 27.29”)). A range of such Markers exists including such analytes as CA 27.29. In one such method, blood is periodically taken from a treated patient and then subjected to an enzyme immunoassay for one of the serum Markers described above. When the concentration of the Marker suggests the return of tumors or failure of therapy, a sample source amenable to gene expression analysis is taken. Where a suspicious mass exists, a fine needle aspirate (FNA) is taken and gene expression profiles of cells taken from the mass are then analyzed as described above. Alternatively, tissue samples may be taken from areas adjacent to the tissue from which a tumor was previously removed. This approach can be particularly useful when other testing produces ambiguous results.
Kits made according to the invention include formatted assays for determining the gene expression profiles. These can include all or some of the materials needed to conduct the assays such as reagents and instructions and a medium through which Biomarkers are assayed.
Articles of this invention include representations of the gene expression profiles useful for treating, diagnosing, prognosticating, and otherwise assessing diseases. These profile representations are reduced to a medium that can be automatically read by a machine such as computer readable media (magnetic, optical, and the like). The articles can also include instructions for assessing the gene expression profiles in such media. For example, the articles may comprise a CD ROM having computer instructions for comparing gene expression profiles of the portfolios of genes described above. The articles may also have gene expression profiles digitally recorded therein so that they may be compared with gene expression data from patient samples. Alternatively, the profiles can be recorded in different representational format. A graphical recordation is one such format. Clustering algorithms such as those incorporated in “DISCOVERY” and “INFER” software from Partek, Inc. mentioned above can best assist in the visualization of such data.
Different types of articles of manufacture according to the invention are media or formatted assays used to reveal gene expression profiles. These can comprise, for example, microarrays in which sequence complements or probes are affixed to a matrix to which the sequences indicative of the genes of interest combine creating a readable determinant of their presence. Alternatively, articles according to the invention can be fashioned into reagent kits for conducting hybridization, amplification, and signal generation indicative of the level of expression of the genes of interest for detecting cancer.
The following examples are provided to illustrate but not limit the claimed invention. All references cited herein are hereby incorporated herein by reference.
RNA was isolated from pancreatic tumor, normal pancreatic, lung, colon, breast and ovarian tissues using Trizol. The RNA was then used to generate amplified, labeled RNA (Lipshutz et al. (1999)) which was then hybridized onto Affymetrix U133A arrays. The data were then analyzed in two ways.
In the first method, this dataset was filtered to retain only those genes with at least two present calls across the entire dataset. This filtering left 14,547 genes. 2,736 genes were determined to be overexpressed in pancreatic cancer versus normal pancreas with a p value of less than 0.05. Forty five genes of the 2,736 were also overexpressed by at least two-fold compared to the maximum intensity found from lung and colon tissues. Finally, six probe sets were found which were overexpressed by at least two-fold compared to the maximum intensity found from lung, colon, breast, and ovarian tissues.
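The successive filters of the first method can be sketched as follows. The record layout and function name are hypothetical, and the sketch assumes that the mean pancreatic cancer intensity is the value compared with the maximum intensity from the other tissues; the text above does not specify which pancreatic statistic was used.

```python
def first_query(genes: list, min_present: int = 2,
                alpha: float = 0.05, min_fold: float = 2.0) -> list:
    """Apply the three filters in order and return the surviving gene names."""
    selected = []
    for gene in genes:
        # 1. At least two present calls across the entire dataset.
        if gene["present_calls"] < min_present:
            continue
        # 2. Overexpressed in pancreatic cancer versus normal pancreas, p < 0.05.
        if not (gene["p_value"] < alpha
                and gene["cancer_mean"] > gene["normal_mean"]):
            continue
        # 3. At least two-fold over the maximum intensity in the other tissues.
        if gene["cancer_mean"] < min_fold * max(gene["other_intensities"]):
            continue
        selected.append(gene["name"])
    return selected


# Hypothetical records.
genes = [
    {"name": "G1", "present_calls": 5, "p_value": 0.01,
     "cancer_mean": 900.0, "normal_mean": 200.0,
     "other_intensities": [300.0, 400.0]},
    {"name": "G2", "present_calls": 1, "p_value": 0.01,
     "cancer_mean": 900.0, "normal_mean": 200.0,
     "other_intensities": [100.0]},
    {"name": "G3", "present_calls": 4, "p_value": 0.01,
     "cancer_mean": 500.0, "normal_mean": 100.0,
     "other_intensities": [300.0]},
]
print(first_query(genes))
```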
In the second method, this dataset was filtered to retain only those genes with no more than two present calls in breast, colon, lung, and ovarian tissues. This filtering left 4,654 genes. 160 genes of the 4,654 genes were found to have at least two present calls in the pancreatic tissues (normal and cancer). Finally, eight probe sets were selected which showed the greatest differential expression between pancreatic cancer and normal tissues.
A total of 260 FFPE metastasis and primary tissues were acquired from a variety of commercial vendors. The samples tested included: 30 breast metastasis, 30 colorectal metastasis, 56 lung metastasis, 49 ovarian metastasis, 43 pancreas metastasis, 18 prostate primary and 2 prostate metastases and 32 other origins (6 stomach, 6 kidney, 3 larynx, 2 liver, 1 esophagus, 1 pharynx, 1 bile duct, 1 pleura, 3 bladder, 5 melanoma, 3 lymphoma).
RNA isolation from paraffin tissue sections was based on the methods and reagents described in the High Pure RNA Paraffin Kit manual (Roche) with the following modifications. Paraffin embedded tissue samples were sectioned according to size of the embedded metastasis (2-5 mm=9×10 μm, 6-8 mm=6×10 μm, 8-≧10 mm=3×10 μm), and placed in RNase/DNase 1.5 ml Eppendorf tubes. Sections were deparaffinized by incubation in 1 ml of xylene for 2-5 min at room temperature following a 10-20 second vortex. Tubes were then centrifuged and supernatant was removed and the deparaffinization step was repeated. After supernatant was removed, 1 ml of ethanol was added and sample was vortexed for 1 minute, centrifuged and supernatant removed. This process was repeated one additional time. Residual ethanol was removed and the pellet was dried in a 55° C. oven for 5-10 minutes and resuspended in 100 μl of tissue lysis buffer, 16 μl 10% SDS and 80 μl Proteinase K. Samples were vortexed and incubated in a thermomixer set at 400 rpm for 2 hours at 55° C. 325 μl binding buffer and 325 μl ethanol were added to each sample that was then mixed, centrifuged and the supernatant was added onto the filter column. Filter column along with collection tube were centrifuged for 1 minute at 8000 rpm and flow through was discarded. A series of sequential washes was performed (500 μl Wash Buffer I→500 μl Wash Buffer II→300 μl Wash Buffer II) in which each solution was added to the column, centrifuged and flow through discarded. Column was then centrifuged at maximum speed for 2 minutes, placed in a fresh 1.5 ml tube and 90 μl of elution buffer was added. RNA was obtained after a 1 minute incubation at room temperature followed by a 1 minute centrifugation at 8000 rpm. Sample was DNase treated with the addition of 10 μl DNase incubation buffer, 2 μl of DNase I and incubated for 30 minutes at 37° C.
DNase was inactivated following the addition of 20 μl of tissue lysis buffer, 18 μl 10% SDS and 40 μl Proteinase K. Again, 325 μl binding buffer and 325 μl ethanol were added to each sample that was then mixed, centrifuged and supernatant was added onto the filter column. Sequential washes and elution of RNA proceeded as stated above with the exception of 50 μl of elution buffer being used to elute the RNA. To eliminate glass fiber contamination carried over from the column, RNA was centrifuged for 2 minutes at full speed and supernatant was removed into a fresh 1.5 ml Eppendorf tube. Samples were quantified by OD 260/280 readings obtained by a spectrophotometer and samples were diluted to 50 ng/μl. The isolated RNA was stored in RNase-free water at −80° C. until use.
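The dilution to 50 ng/μl follows from conservation of mass (C1·V1 = C2·V2). The sketch below computes the volume of diluent to add; the function name and the example stock concentration and volume are hypothetical.

```python
def diluent_volume_ul(stock_ng_per_ul: float, stock_volume_ul: float,
                      target_ng_per_ul: float = 50.0) -> float:
    """Volume of diluent (μl) needed to bring the stock to the target."""
    if stock_ng_per_ul < target_ng_per_ul:
        raise ValueError("stock is already below the target concentration")
    final_volume_ul = stock_ng_per_ul * stock_volume_ul / target_ng_per_ul
    return final_volume_ul - stock_volume_ul


# Hypothetical example: 10 μl of RNA at 200 ng/μl.
print(diluent_volume_ul(200.0, 10.0))
```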
Appropriate mRNA reference sequence accession numbers in conjunction with Oligo 6.0 were used to develop TaqMan® CUP assays (lung Markers: human surfactant, pulmonary-associated protein B (HUMPSPBA), thyroid transcription factor 1 (TTF1), desmoglein 3 (DSG3), colorectal Marker: cadherin 17 (CDH17), breast Markers: mammaglobin (MG), prostate-derived ets transcription factor (PDEF), ovarian Marker: wilms tumor 1 (WT1), pancreas Markers: prostate stem cell antigen (PSCA), coagulation factor V (F5), prostate Marker kallikrein 3 (KLK3)) and housekeeping assays beta actin (β-Actin), hydroxymethylbilane synthase (PBGD). Primers and hydrolysis probes for each assay are listed in Table 2. Genomic DNA amplification was excluded by designing assays around exon-intron splicing sites. Hydrolysis probes were labeled at the 5′ nucleotide with FAM as the reporter dye and at 3′ nucleotide with BHQ1-TT as the internal quenching dye.
Quantitation of gene-specific RNA was carried out in a 384 well plate on the ABI Prism 7900HT sequence detection system (Applied Biosystems). For each thermo-cycler run calibrators and standard curves were amplified. Calibrators for each Marker consisted of target gene in vitro transcripts that were diluted in carrier RNA from rat kidney at 1×10^5 copies. Standard curves for housekeeping Markers consisted of target gene in vitro transcripts that were serially diluted in carrier RNA from rat kidney at 1×10^7, 1×10^5 and 1×10^3 copies. No target controls were also included in each assay run to ensure a lack of environmental contamination. All samples and controls were run in duplicate. qRTPCR was performed with general laboratory use reagents in a 10 μl reaction containing: RT-PCR Buffer (50 nM Bicine/KOH pH 8.2, 115 nM KAc, 8% glycerol, 2.5 mM MgCl2, 3.5 mM MnSO4, 0.5 mM each of dCTP, dATP, dGTP and dTTP), Additives (2 mM Tris-Cl pH 8, 0.2 mM Albumin Bovine, 150 mM Trehalose, 0.002% Tween 20), Enzyme Mix (2U Tth (Roche), 0.4 mg/μl Ab TP6-25), Primer and Probe Mix (0.2 μM Probe, 0.5 μM Primers). The following cycling parameters were followed: 1 cycle at 95° C. for 1 minute; 1 cycle at 55° C. for 2 minutes; Ramp 5%; 1 cycle at 70° C. for 2 minutes; and 40 cycles of 95° C. for 15 seconds, 58° C. for 30 seconds. After the PCR reaction was completed, baseline and threshold values were set in the ABI 7900HT Prism software and calculated Ct values were exported to Microsoft Excel.
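A standard curve from serial dilutions is commonly used by fitting Ct against log10 of the copy number and then interpolating unknowns. The specification does not state how the curves were fitted, so the following is only a sketch of that common approach; the function names and the Ct values are hypothetical.

```python
import math


def fit_standard_curve(copies: list, cts: list) -> tuple:
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cts) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def copies_from_ct(ct: float, slope: float, intercept: float) -> float:
    """Interpolate a copy number from a measured Ct."""
    return 10 ** ((ct - intercept) / slope)


# Dilution points as in the text; the Ct values are hypothetical.
standards = [1e7, 1e5, 1e3]
measured_ct = [18.0, 24.6, 31.2]
slope, intercept = fit_standard_curve(standards, measured_ct)
print(slope, intercept, copies_from_ct(24.6, slope, intercept))
```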
One-Step vs. Two-Step Reaction.
First strand synthesis was carried out using either 100 ng of random hexamers or gene specific primers per reaction. In the first step, 11.5 μl of Mix-1 (primers and 1 μg of total RNA) was heated to 65° C. for 5 minutes and then chilled on ice. 8.5 μl of Mix-2 (1× Buffer, 0.01 mM DTT, 0.5 mM each dNTPs, 0.25 U/μl RNasin®, 10U/μl Superscript III) was added to Mix-1 and incubated at 50° C. for 60 minutes followed by 95° C. for 5 minutes. The cDNA was stored at −20° C. until ready for use. qRTPCR for the second step of the two-step reaction was performed as stated above with the following cycling parameters: 1 cycle at 95° C. for 1 minute; 40 cycles of 95° C. for 15 seconds, 58° C. for 30 seconds. qRTPCR for the one-step reaction was performed exactly as stated in the preceding paragraph. Both the one-step and two-step reactions were performed on 100 ng of template (RNA/cDNA). After the PCR reaction was completed, baseline and threshold values were set in the ABI 7900HT Prism software and calculated Ct values were exported to Microsoft Excel.
For each sample, a ΔCt was calculated by taking the mean Ct of each CUP Marker and subtracting the mean Ct of an average of the housekeeping Markers (ΔCt=Ct(CUP Marker)−Ct(Ave. HK Marker)). The minimal ΔCt for each tissue of origin Marker set (lung, breast, prostate, colon, ovarian and pancreas) was determined for each sample. The tissue of origin with the overall minimal ΔCt was scored one and all other tissue of origins scored zero. Data were sorted according to pathological diagnosis. Partek Pro was populated with the modified feasibility data and an intensity plot was generated.
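The ΔCt normalization and minimal-ΔCt scoring described above can be sketched in Python as follows. This is an illustrative sketch only: the marker-to-tissue map follows the ten-Marker panel described elsewhere in this document, while the housekeeping names HK1 and HK2, the duplicate Ct values, and the function names are hypothetical.

```python
from statistics import mean

# Tissue-specific Marker sets (panel as described in this document).
TISSUE_MARKERS = {
    "lung": ["HUMSPB", "TTF1", "DSG3"],
    "colon": ["CDH17"],
    "breast": ["MG", "PDEF"],
    "ovarian": ["WT1"],
    "pancreas": ["PSCA", "F5"],
    "prostate": ["KLK3"],
}

def delta_cts(marker_cts, hk_cts):
    """dCt = mean Ct(Marker) - average of the mean housekeeping Cts."""
    hk_mean = mean(mean(reps) for reps in hk_cts.values())
    return {m: mean(reps) - hk_mean for m, reps in marker_cts.items()}

def score_tissues(dct):
    """Minimal dCt per tissue set; the overall minimum scores 1, others 0."""
    min_per_tissue = {
        tissue: min(dct[m] for m in markers)
        for tissue, markers in TISSUE_MARKERS.items()
    }
    best = min(min_per_tissue, key=min_per_tissue.get)
    scores = {tissue: int(tissue == best) for tissue in min_per_tissue}
    return min_per_tissue, scores

# Hypothetical duplicate Ct values for one sample.
hk_cts = {"HK1": [24.0, 24.2], "HK2": [25.8, 26.0]}
marker_cts = {
    "HUMSPB": [38.0, 38.4], "TTF1": [36.0, 36.2], "DSG3": [40.0, 40.0],
    "CDH17": [27.0, 27.4], "MG": [39.0, 39.2], "PDEF": [35.0, 35.4],
    "WT1": [37.0, 37.0], "PSCA": [33.0, 33.2], "F5": [31.0, 31.4],
    "KLK3": [40.0, 40.0],
}

min_per_tissue, scores = score_tissues(delta_cts(marker_cts, hk_cts))
```

Because a lower Ct means more template, the smallest ΔCt identifies the tissue set with the strongest relative expression; in this hypothetical sample that is the colon set.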
First, five pancreas Marker candidates were analyzed: prostate stem cell antigen (PSCA), serine proteinase inhibitor, clade A member 1 (SERPINA1), cytokeratin 7 (KRT7), matrix metalloprotease 11 (MMP11), and mucin4 (MUC4) (Varadhachary et al. (2004); Fukushima et al. (2004); Argani et al. (2001); Jones et al. (2004); Prasad et al. (2005); and Moniaux et al. (2004)) using DNA microarrays and a panel of 13 pancreatic ductal adenocarcinomas, five normal pancreas tissues, and 98 samples from breast, colorectal, lung, and ovarian tumors. Only PSCA demonstrated moderate sensitivity (six out of thirteen or 46% of pancreatic tumors were detected) at a high specificity (91 out of 98 or 93% were correctly identified as not being of pancreatic origin).
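The sensitivity and specificity figures quoted for PSCA follow directly from the stated counts. A short Python check, with function names chosen here for illustration:

```python
def sensitivity(true_pos, false_neg):
    """Fraction of true cases that were detected."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg, false_pos):
    """Fraction of non-cases correctly called negative."""
    return true_neg / (true_neg + false_pos)

# Counts stated in the text: 6 of 13 pancreatic tumors detected,
# 91 of 98 non-pancreatic tumors correctly called non-pancreatic.
psca_sensitivity = sensitivity(true_pos=6, false_neg=13 - 6)
psca_specificity = specificity(true_neg=91, false_pos=98 - 91)

print(f"sensitivity = {psca_sensitivity:.1%}, specificity = {psca_specificity:.1%}")
```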
Because pancreatic ductal adenocarcinoma develops from ductal epithelial cells that comprise only a small percentage of all pancreatic cells (with acinar cells and islet cells comprising the majority) and because pancreatic adenocarcinoma tissues contain a significant amount of adjacent normal tissue (Prasad et al. (2005); and Ishikawa et al. (2005)), it has been difficult to identify pancreatic cancer Markers (i.e., upregulated in cancer) which would also differentiate this organ from the other organs. For use in a CUP panel such differentiation is necessary. The first query method (see Materials and Methods) returned six probe sets: coagulation factor V (F5), a hypothetical protein FLJ22041 similar to FK506 binding proteins (FKBP10), β6 integrin (ITGB6), transglutaminase 2 (TGM2), heterogeneous nuclear ribonucleoprotein A0 (HNRP0), and BAX delta (BAX). The second query method (see Materials and Methods) returned eight probe sets: F5, TGM2, paired-like homeodomain transcription factor 1 (PITX1), trio isoform mRNA (TRIO), mRNA for p73H (p73), an unknown protein for MGC:10264 (SCD), and two probe sets for claudin18. F5 and TGM2 were present in both query results and, of the two, F5 looked the most promising.
Optimization of Sample Prep and qRTPCR Using FFPE Tissues.
Next, the RNA isolation and qRTPCR methods were optimized using fixed tissues before examining Marker panel performance. First, the effect of reducing the proteinase K incubation time from sixteen hours to three hours was analyzed. There was no effect on yield. However, some samples showed longer fragments of RNA when the shorter proteinase K step was used.
Next, three different methods of reverse transcription were compared: reverse transcription with random hexamers followed by qPCR (two step), reverse transcription with a gene-specific primer followed by qPCR (two step), and a one-step qRTPCR using gene-specific primers. RNA was isolated from eleven metastases, and Ct values were compared across the three methods for β-actin, human surfactant protein B (HUMSPB), and thyroid transcription factor (TTF).
Diagnostic Performance of a CUP qRTPCR Assay.
Next 12 qRTPCR reactions (10 Markers and two housekeeping genes) were performed on 239 FFPE metastases. The Markers used for the assay are shown in Table 2. The lung Markers were human surfactant pulmonary-associated protein B (HUMPSPB), thyroid transcription factor 1 (TTF1), and desmoglein 3 (DSG3). The colorectal Marker was cadherin 17 (CDH17). The breast Markers were mammaglobin (MG) and prostate-derived Ets transcription factor (PDEF). The ovarian Marker was Wilms tumor 1 (WT1). The pancreas Markers were prostate stem cell antigen (PSCA) and coagulation factor V (F5), and the prostate Marker was kallikrein 3 (KLK3). For gene descriptions, see Table 15.
Analysis of the normalized Ct values in a heat map revealed the high specificity of the breast and prostate Markers, moderate specificity of the colon, lung, and ovarian, and somewhat lower specificity of the pancreas Markers. Combining the normalized qRTPCR data with computational refinement improves the performance of the Marker panel. Results were obtained from the combined normalized qRTPCR data with the algorithm and the accuracy of the qRTPCR assay was determined.
In this example, microarray-based expression profiling was used on primary tumors to identify candidate Markers for use with metastases. The fact that primary tumors can be used to discover tumor of origin Markers for metastases is consistent with several recent findings. For example, Weigelt and colleagues have shown that gene expression profiles of primary breast tumors are maintained in distant metastases. Weigelt et al. (2003). Italiano and coworkers found that EGFR status, as assessed by IHC, was similar in 80 primary colorectal tumors and the 80 related metastases. Italiano et al. (2005). Only five of the 80 showed discordance in EGFR status. Italiano et al. (2005). Backus and colleagues identified putative Markers for detecting breast cancer metastasis using a genome-wide gene expression analysis of breast and other tissues and demonstrated that mammaglobin and CK19 detected clinically actionable metastasis in breast sentinel lymph nodes with 90% sensitivity and 94% specificity. Backus et al. (2005).
The microarray-based studies with primary tissue confirmed the specificity and sensitivity of known Markers. As a result, with the exception of F5, all of the Markers used have high specificity for the tissues studied here. Argani et al. (2001); Backus et al. (2005); Cunha et al. (2006); Borgono et al. (2004); McCarthy et al. (2003); Hwang et al. (2004); Fleming et al. (2000); Nakamura et al. (2002); and Khoor et al. (1997). A recent study determined that, using IHC, PSCA is overexpressed in prostate cancer metastases. Lam et al. (2005). Dennis et al. (2002) also demonstrated that PSCA could be used as a tumor of origin Marker for pancreas and prostate. As shown herein, strong expression of PSCA is found in some prostate tissues at the RNA level but, by including PSA in the assay, one can now segregate prostate and pancreatic cancers. A novel finding of this study was the use of F5 as a complementary (to PSCA) Marker for pancreatic tissue of origin. In both the microarray data set with primary tissue and the qRTPCR data set with FFPE metastases, F5 was found to complement PSCA.
Previous investigators have generated CUP assays using IHC or microarrays. Su et al. (2001); Ramaswamy et al. (2001); and Bloom et al. (2004). More recently, SAGE has been coupled to a small qRTPCR Marker panel. Dennis et al. (2002); and Buckhaults et al. (2003). This study is the first to combine microarray-based expression profiling with a small panel of qRTPCR assays. Microarray studies with primary tissue identified some, but not all, of the same tissue of origin Markers as those identified previously by SAGE studies. Some studies have demonstrated that a modest agreement between SAGE- and DNA microarray-based profiling data exists and that the correlation improves for genes with higher expression levels. van Ruissen et al. (2005); and Kim et al. (2003). For example, Dennis and colleagues identified PSA, MG, PSCA, and HUMSPB while Buckhaults and coworkers (Buckhaults et al. (2003)) identified PDEF. Executing the CUP assay using qRTPCR is preferred because it is a robust technology and may have performance advantages over IHC. Al-Mulla et al. (2005); and Haas et al. (2005). As shown herein, the qRTPCR protocol was improved through the use of gene-specific primers in a one-step reaction. This is the first demonstration of the use of gene-specific primers in a one-step qRTPCR reaction with FFPE tissue. Other investigators have either done a two-step qRTPCR (cDNA synthesis in one reaction followed by qPCR) or have used random hexamers or truncated gene-specific primers. Abrahamsen et al. (2003); Specht et al. (2001); Godfrey et al. (2000); Cronin et al. (2004); and Mikhitarian et al. (2004).
Pancreatic ductal adenocarcinoma develops from ductal epithelial cells that comprise only a small percentage of all pancreatic cells (with acinar and islet cells comprising the majority) in the normal pancreas. Furthermore, pancreatic adenocarcinoma tissues contain a significant amount of adjacent normal tissue. Prasad et al. (2005); and Ishikawa et al. (2005). Because of this the candidate pancreas Markers were enriched for genes elevated in pancreas adenocarcinoma relative to normal pancreas cells. The first query method returned six probe sets: coagulation factor V (F5), a hypothetical protein FLJ22041 similar to FK506 binding proteins (FKBP10), beta 6 integrin (ITGB6), transglutaminase 2 (TGM2), heterogeneous nuclear ribonucleoprotein A0 (HNRP0), and BAX delta (BAX). The second query method (see Materials and Methods section for details) returned eight probe sets: F5, TGM2, paired-like homeodomain transcription factor 1 (PITX1), trio isoform mRNA (TRIO), mRNA for p73H (p73), an unknown protein for MGC:10264 (SCD), and two probe sets for claudin18.
A total of 23 tissue specific Marker candidates were selected for further validation by qRT-PCR on metastatic carcinoma FFPE tissues. Marker candidates were tested on 205 FFPE samples comprising metastatic carcinomas from lung, pancreas, colon, breast, ovary and prostate, as well as prostate primary carcinomas. Table 4 provides the gene symbols of the tissue specific Markers selected for RT-PCR validation and also summarizes the results of testing performed with these Markers.
Out of 23 tested Markers, thirteen were rejected based on their cross reactivity, low expression level in the corresponding metastatic tissues, or redundancy. Ten Markers were selected for the final version of the assay. The lung Markers were human surfactant pulmonary-associated protein B (HUMPSPB), thyroid transcription factor 1 (TTF1), and desmoglein 3 (DSG3). The pancreas Markers were prostate stem cell antigen (PSCA) and coagulation factor V (F5), and the prostate Marker was kallikrein 3 (KLK3). The colorectal Marker was cadherin 17 (CDH17). The breast Markers were mammaglobin (MG) and prostate-derived Ets transcription factor (PDEF). The ovarian Marker was Wilms tumor 1 (WT1).
Optimization of sample preparation and qRT-PCR using FFPE tissues. Next, the RNA isolation and qRTPCR methods were optimized using fixed tissues before examining the performance of the Marker panel. First, the effect of reducing the proteinase K incubation time from sixteen hours to three hours was analyzed. There was no effect on yield. However, some samples showed longer fragments of RNA when the shorter proteinase K step was used.
Next, three different methods of reverse transcription were compared: reverse transcription with random hexamers followed by qPCR (two step), reverse transcription with a gene-specific primer followed by qPCR (two step), and a one-step qRTPCR using gene-specific primers. RNA was isolated from eleven metastases, and Ct values were compared across the three methods for Markers including β-actin and HUMSPB.
Diagnostic performance of optimized qRTPCR assay. 12 qRTPCR reactions (10 Markers and 2 housekeeping genes) were performed on a new set of 260 FFPE metastases. Twenty-one samples gave high Ct values for the housekeeping genes, so only 239 were used in a heat map analysis. Analysis of the normalized Ct values in a heat map revealed the high specificity of the breast and prostate Markers, moderate specificity of the colon, lung, and ovarian Markers, and somewhat lower specificity of the pancreas Markers.
Using expression values normalized to the average expression of two housekeeping genes, an algorithm to predict the metastasis tissue of origin was developed. The normalized qRTPCR data were combined with the algorithm, and the accuracy of the qRTPCR assay was determined by performing a leave-one-out cross-validation (LOOCV) test. For the six tissue types included in the assay, two error counts were estimated separately: the number of false-positive calls, in which a sample was wrongly predicted as another tumor type included in the assay (pancreas as colon, for example), and the number of times a sample was not predicted as any of the tissue types included in the assay (other). Results of the LOOCV are presented in Table 5.
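A leave-one-out cross-validation loop of the kind described above can be sketched in Python as follows. The classifier used in the assay is not specified in this passage, so a simple nearest-centroid classifier is substituted here purely as a stand-in; the two-feature toy data and all function names are hypothetical.

```python
from collections import defaultdict

def centroid_fit(samples):
    """Compute one mean feature vector (centroid) per class label."""
    sums = {}
    counts = defaultdict(int)
    for features, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {lab: [s / counts[lab] for s in vec] for lab, vec in sums.items()}

def centroid_predict(model, features):
    """Predict the label whose centroid is nearest (squared Euclidean)."""
    def sqdist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(model, key=lambda lab: sqdist(model[lab]))

def loocv_accuracy(samples):
    """Leave one sample out, train on the rest, test it; repeat for all."""
    correct = 0
    for i, (features, label) in enumerate(samples):
        training = samples[:i] + samples[i + 1:]
        model = centroid_fit(training)  # stand-in classifier (assumption)
        if centroid_predict(model, features) == label:
            correct += 1
    return correct / len(samples)

# Hypothetical normalized values: (lung feature, colon feature), lower = stronger.
toy_samples = [
    ([2.0, 9.0], "lung"), ([3.0, 10.0], "lung"), ([2.5, 9.5], "lung"),
    ([9.0, 2.0], "colon"), ([10.0, 3.0], "colon"), ([9.5, 2.5], "colon"),
    ([3.0, 9.0], "colon"),  # lung-like profile, so it is misclassified
]

accuracy = loocv_accuracy(toy_samples)
```

Each sample is scored by a model that never saw it, so the resulting accuracy (here six of seven toy samples) is less optimistic than accuracy measured on the training data.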
The tissue of origin was predicted correctly for 204 out of 260 tested samples, for an overall accuracy of 78%. A significant proportion of the false positive calls were due to the Markers' cross-reactivity in histologically similar tissues. For example, three squamous cell metastatic carcinomas originating from the pharynx, larynx and esophagus were wrongly predicted as lung due to DSG3 expression in these tissues. Positive expression of CDH17 in GI carcinomas other than colon, including stomach and pancreas, caused false classification of 4 out of 6 tested stomach and 3 out of 43 tested pancreatic cancer metastases as colon.
In addition to a LOOCV test, the data was randomly split into 3 separate pairs of training and test sets. Each split contained approximately 50% of the samples from each class. At 50/50 splits in three separate pairs of training and test sets, assay overall classification accuracies were 77%, 71% and 75%, confirming assay performance stability.
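The class-balanced 50/50 splitting described above can be sketched as follows. This is a minimal Python illustration under stated assumptions: the seeds, the toy label list, and the function name are hypothetical, and the text does not say how the random splits were actually drawn.

```python
import random
from collections import defaultdict

def stratified_half_split(labels, seed):
    """Split sample indices so about 50% of each class lands in the test set."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train, test = [], []
    for label, idxs in by_class.items():
        shuffled = idxs[:]
        rng.shuffle(shuffled)
        half = len(shuffled) // 2  # odd-sized classes put the extra in training
        test.extend(shuffled[:half])
        train.extend(shuffled[half:])
    return sorted(train), sorted(test)

# Hypothetical class labels for 17 samples.
toy_labels = ["lung"] * 10 + ["colon"] * 7

# Three separate random splits, mirroring the three training/test pairs.
splits = [stratified_half_split(toy_labels, seed) for seed in (0, 1, 2)]
```

Splitting within each class keeps rare tissue types represented in both halves, so the accuracies from the three splits remain comparable.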
Last, another independent set of 48 FFPE metastatic carcinomas that included metastatic carcinoma of known primary, CUP specimens with a tissue of origin diagnosis rendered by pathological evaluation including IHC, and CUP specimens that remained CUP after IHC testing were tested. The tissue of origin prediction accuracy was estimated separately for each category of samples. Table 6 summarizes the assay results.
The tissue of origin prediction was, with only a few exceptions, consistent with the known primary or tissue of origin diagnosis assessed by clinical/pathological evaluation including IHC. Similar to the training set, the assay was not able to differentiate squamous cell carcinomas originating from different sources and falsely predicted them as lung.
The assay also made putative tissue of origin diagnoses for eight out of eleven samples which remained CUP after standard diagnostic tests. One of the CUP cases was especially interesting. A male patient with a history of prostate cancer was diagnosed with metastatic carcinoma in lung and pleura. Serum PSA tests and IHC with PSA antibodies on metastatic tissue were negative, so the pathologist's diagnosis was CUP with an inclination toward gastrointestinal tumors. The assay strongly (posterior probability 0.99) predicted the tissue of origin as colon.
Discussion. In this study, microarray-based expression profiling on primary tumors was used to identify candidate Markers for use with metastases. The fact that primary tumors can be used to discover tumor of origin Markers for metastases is consistent with several recent findings. For example, Weigelt and colleagues have shown that gene expression profiles of primary breast tumors are maintained in distant metastases. Weigelt et al. (2003). Backus and colleagues identified putative Markers for detecting breast cancer metastasis using a genome-wide gene expression analysis of breast and other tissues and demonstrated that mammaglobin and CK19 detected clinically actionable metastasis in breast sentinel lymph nodes with 90% sensitivity and 94% specificity. Backus et al. (2005).
During the development of the assay, selection was focused on six cancer types, including lung, pancreas and colon which are among the most prevalent in CUP (Ghosh et al. (2005); and Pavlidis et al. (2005)) and breast, ovarian and prostate for which treatment could be potentially most beneficial for patients. Ghosh et al. (2005). However, additional tissue types and Markers can be added to the panel as long as the overall accuracy of the assay is not compromised and, if applicable, the logistics of the RTPCR reactions are not encumbered.
The microarray-based studies with primary tissue confirmed the specificity and sensitivity of known Markers. As a result, the majority of tissue specific Markers have high specificity for the tissues studied here. A recent study found that, using IHC, PSCA is overexpressed in prostate cancer metastases. Lam et al. (2005). Dennis et al. (2002) also demonstrated that PSCA could be used as a tumor of origin Marker for pancreas and prostate. Strong expression of PSCA in some prostate tissues at the RNA level was present but, due to the inclusion of PSA in the assay, prostate and pancreatic cancers can now be segregated. A novel finding of this study was the use of F5 as a complementary (to PSCA) Marker for pancreatic tissue of origin. In both the microarray data set with primary tissue and the qRTPCR data set with FFPE metastases, F5 was found to complement PSCA.
Previous investigators have generated CUP assays using IHC (Brown et al. (1997); DeYoung et al. (2000); and Dennis et al. (2005a)) or microarrays. Su et al. (2001); Ramaswamy et al. (2001); and Bloom et al. (2004). More recently, SAGE has been coupled to a small qRTPCR Marker panel. Dennis et al. (2002); and Buckhaults et al. (2003). This study is the first to combine microarray-based expression profiling with a small panel of qRTPCR assays. The microarray studies with primary tissue identified some, but not all, of the same tissue of origin Markers as those identified previously by SAGE studies. This finding is not surprising given studies that have demonstrated that a modest agreement between SAGE- and DNA microarray-based profiling data exists and that the correlation improves for genes with higher expression levels. van Ruissen et al. (2005); and Kim et al. (2003). For example, Dennis and colleagues identified PSA, MG, PSCA, and HUMSPB while Buckhaults and coworkers (Buckhaults et al. (2003)) identified PDEF. Execution of the CUP assay is preferably by qRTPCR because it is a robust technology and may have performance advantages over IHC. Al-Mulla et al. (2005); and Haas et al. (2005). Further, as shown herein, the qRTPCR protocol has been improved through the use of gene-specific primers in a one-step reaction. This is the first demonstration of the use of gene-specific primers in a one-step qRTPCR reaction with FFPE tissue. Other investigators have either done a two-step qRTPCR (cDNA synthesis in one reaction followed by qPCR) or have used random hexamers or truncated gene-specific primers. Abrahamsen et al. (2003); Specht et al. (2001); Godfrey et al. (2000); Cronin et al. (2004); and Mikhitarian et al. (2004).
In summary, the 78% overall accuracy of the assay for six tissue types compares favorably to other studies. Brown et al. (1997); DeYoung et al. (2000); Dennis et al. (2005a); Su et al. (2001); Ramaswamy et al. (2001); and Bloom et al. (2004).
In this study, a classifier was built using gene marker portfolios chosen by MVO, and this classifier was used to predict tissue of origin and cancer status for five major cancer types: breast, colon, lung, ovarian and prostate. Three hundred and seventy eight primary cancer, 23 benign proliferative epithelial lesion and 103 normal snap-frozen human tissue specimens were analyzed using the Affymetrix human U133A GeneChip. Leukocyte samples were also analyzed in order to subtract gene expression potentially masked by co-expression in leukocyte background cells. A novel MVO-based bioinformatics method was developed to select gene marker portfolios for tissue of origin and cancer status. The data demonstrated that a panel of 26 genes could be used as a classifier to accurately predict the tissue of origin and cancer status among the 5 cancer types. Thus a multi-cancer classification method is obtainable by determining gene expression profiles of a reasonably small number of gene markers.
Table 7 shows the Markers identified for the tissue origins indicated. For gene descriptions see Table 15.
The sample set included a total of 299 metastatic colon, breast, pancreas, ovary, prostate, lung and other carcinomas and primary prostate cancer samples. QC based on histological evaluation, RNA yield and expression of the control gene beta-actin was implemented. The "other" sample category included metastases originating from stomach (5), kidney (6), cholangio/gallbladder (4), liver (2), head and neck (4) and ileum (1) carcinomas and one mesothelioma. Table 8 summarizes the results.
Testing the above samples resulted in the narrowing of the Marker set to those in Table 9 with the results seen in Table 10.
The results showed that, out of 205 paraffin-embedded metastatic tumors, 166 samples (81%) had conclusive assay results (Table 11).
Of the false positive results, many derived from histologically and embryologically similar tissues (Table 12).
The following parameters were considered for the model development:
Separate markers on female and male sets and calculate CUP probability separately for male and female patients. The male set included: SP_B, TTF1, DSG3, PSCA, F5, PSA, HPT1; the female set included: SP_B, TTF1, DSG3, PSCA, F5, HPT1, MGB, PDEF, WT1. Background expression was excluded from the assay results: Lung: SP_B, TTF1, DSG3; Ovary: WT1; and Colon: HPT1.
The CUP model was adjusted to the CUP prevalence (%): lung 23, pancreas 16, colorectal 9, breast 3, ovarian 4, prostate 2, other 43. The prevalence for breast and ovarian was adjusted to 0% for male patients, and the prevalence for prostate was adjusted to 0% for female patients.
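One way to apply such prevalence adjustments is to treat the stated percentages as prior probabilities and combine them with per-tissue likelihoods by Bayes' rule. The Python sketch below is an illustration under stated assumptions, not the model used in the assay: renormalizing the remaining prevalences after zeroing the excluded tissues, the likelihood values, and the function names are all assumptions introduced here.

```python
# Prevalence percentages as stated in the text.
PREVALENCE = {
    "lung": 23, "pancreas": 16, "colorectal": 9, "breast": 3,
    "ovarian": 4, "prostate": 2, "other": 43,
}
# Tissues whose prevalence is set to 0% for each sex, per the text.
EXCLUDED = {"male": {"breast", "ovarian"}, "female": {"prostate"}}

def priors_for(sex):
    """Zero the excluded tissues, then renormalize to sum to 1 (assumption)."""
    raw = {t: (0.0 if t in EXCLUDED[sex] else float(p))
           for t, p in PREVALENCE.items()}
    total = sum(raw.values())
    return {t: p / total for t, p in raw.items()}

def posterior(likelihoods, priors):
    """Bayes' rule: posterior is proportional to likelihood times prior."""
    unnormalized = {t: likelihoods[t] * priors[t] for t in priors}
    z = sum(unnormalized.values())
    return {t: v / z for t, v in unnormalized.items()}

male_priors = priors_for("male")
female_priors = priors_for("female")

# Hypothetical likelihoods for one male sample with a breast-like profile.
example_likelihoods = {
    "lung": 0.05, "pancreas": 0.05, "colorectal": 0.10, "breast": 0.60,
    "ovarian": 0.05, "prostate": 0.05, "other": 0.10,
}
male_posterior = posterior(example_likelihoods, male_priors)
```

With a zero prior, a tissue cannot be predicted for that sex regardless of how strongly the Markers point to it, which is the practical effect of the sex-specific adjustment.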
The following steps were taken:
Place markers on similar scale.
Reduce number of variables from 12 to 8 by selecting minimum value from each tissue specific set.
Leave out 1 sample. Build model from remaining samples. Test left out sample. Repeat until 100% of samples are tested.
Randomly leave out ˜50% of samples (˜50% per tissue). Build model from remaining samples. Test ˜50% of samples. Repeat for 3 different random splits.
Classification accuracy was adjusted to cancer type prevalence to produce the results summarized in Table 13, with the raw data shown in Table 14.
Although the foregoing invention has been described in some detail by way of illustration and example for purposes of clarity of understanding, the descriptions and examples should not be construed as limiting the scope of the invention.
Al-Mulla et al. (2005) BRCA1 gene expression in breast cancer: a correlative study between real-time RT-PCR and immunohistochemistry J Histochem Cytochem 53:621-629
Argani et al. (2001) Discovery of new Markers of cancer through serial analysis of gene expression: prostate stem cell antigen is overexpressed in pancreatic adenocarcinoma Cancer Res 61:4320-4324
Borgono et al. (2004) Human tissue kallikreins: physiologic roles and applications in cancer Mol Cancer Res 2:257-280
Brown et al. (1997) Immunohistochemical identification of tumor Markers in metastatic adenocarcinoma. A diagnostic adjunct in the determination of primary site Am J Clin Pathol 107:12-19
Cunha et al. (2006) Tissue-specificity of prostate specific antigens: Comparative analysis of transcript levels in prostate and non-prostatic tissues Cancer Lett 236:229-238
Dennis et al. (2005a) Hunting the primary: novel strategies for defining the origin of tumors J Pathol 205:236-247
DeYoung et al. (2000) Immunohistologic evaluation of metastatic carcinomas of unknown origin: an algorithmic approach Semin Diagn Pathol 17:184-193
Hwang et al. (2004) Wilms tumor gene product: sensitive and contextually specific Marker of serous carcinomas of ovarian surface epithelial origin Appl Immunohistochem Mol Morphol 12:122-126
Italiano et al. (2005) Epidermal growth factor receptor (EGFR) status in primary colorectal tumors correlates with EGFR expression in related metastatic sites: biological and clinical implications Ann Oncol 16:1503-1507
Jones et al. (2004) Comprehensive analysis of matrix metalloproteinase and tissue inhibitor expression in pancreatic cancer: increased expression of matrix metalloproteinase-7 predicts poor survival Clin Cancer Res 10:2832-2845
McCarthy et al. (2003) Novel Markers of pancreatic adenocarcinoma in fine-needle aspiration: mesothelin and prostate stem cell antigen labeling increases accuracy in cytologically borderline cases Appl Immunohistochem Mol Morphol 11:238-243
van Ruissen et al. (2005) Evaluation of the similarity of gene expression data estimated with SAGE and Affymetrix GeneChips BMC Genomics 6:91
Lillemoe et al. (2000) Pancreatic cancer: state-of-the-art care CA Cancer J Clin 50:241-268
Gibson et al. (1978) Histological typing of tumors of the liver, biliary tract and pancreas WHO Geneva
This application claims the benefit of U.S. provisional patent application Ser. Nos. 60/718,501 filed Sep. 19, 2005; and 60/725,680 filed Oct. 12, 2005.