No government funds were used to make this invention.
Reference to a “Sequence Listing,” a table, or a computer program listing appendix submitted on a compact disc and an incorporation by reference of the material on the compact disc including duplicates and the files on each compact disc are hereby specified.
This application claims the benefit of U.S. Provisional Application No. 60/683,173, filed May 20, 2005.
There are approximately 25,600 new cases of thyroid carcinoma diagnosed in the United States each year, and 1,400 patients will die of the disease. About 75% of all thyroid cancers belong to the papillary thyroid carcinoma type. The rest consist of 10% follicular carcinoma, 5% to 9% medullary thyroid cancer, 1% to 2% anaplastic cancer, 1% to 3% lymphoma, and less than 1% sarcoma and other rare tumors. Usually a lump (nodule) in the thyroid is the first sign of thyroid cancer. There are 10 to 18 million people in the US with a single thyroid nodule, and approximately 490,000 nodules become clinically apparent each year. Fortunately, only about 5% of these nodules are cancerous.
The commonly used method for thyroid cancer diagnosis is fine needle aspiration (FNA) biopsy. FNA samples are examined cytologically to determine whether the nodules are benign or cancerous. The sensitivity and specificity of FNA range from 68% to 98% and from 72% to 100%, respectively, depending on the institution and the physician. Unfortunately, in 25% of cases the specimens are either inadequate for diagnosis or indeterminable by cytology. In current medical practice, patients with indeterminate results are sent to surgery, with the consequence that only 25% have cancer and 75% undergo unnecessary surgery. A molecular assay with high sensitivity and a better specificity (higher than 25%) would greatly improve the current diagnostic accuracy for thyroid cancer and avoid unnecessary surgery for non-cancerous patients.
Comparative genomic hybridization (CGH), serial analysis of gene expression (SAGE), and DNA microarrays have been used to identify genetic events occurring in thyroid cancers such as loss of heterozygosity, up- and down-regulation of genes, and genetic rearrangements. A PAX8 and PPARγ genetic rearrangement event has been demonstrated to be associated with follicular thyroid cancer (FTC). Rearrangement of the ret proto-oncogene is related to papillary thyroid cancer (PTC). Down-regulation of the thyroid peroxidase (TPO) gene is observed in both FTC and PTC. Galectin-3 was reported to be a candidate marker to differentiate malignant thyroid neoplasms from benign lesions. However, other studies demonstrate that Galectin-3 is not a cancer-specific marker. Many genes purported to be useful in thyroid cancer diagnosis lack the sensitivity and specificity required for an accurate molecular assay.
The present invention encompasses methods of diagnosing thyroid cancer by obtaining a biological sample from a patient; and measuring the expression levels in the sample of genes selected from the group consisting of those encoding mRNA: corresponding to SEQ ID NOs: 36, 53, 73, 211 and 242; and/or corresponding to SEQ ID NOs: 199, 207, 255 and 354; or recognized specifically by the probe sets selected from the group consisting of psids corresponding to SEQ ID NOs: 36, 53, 73, 211 and 242 as depicted in Table 25; and/or recognized specifically by the probe sets selected from the group consisting of psids corresponding to SEQ ID NOs: 199, 207, 255 and 354 as depicted in Table 25; where the gene expression levels above or below pre-determined cut-off levels are indicative of thyroid cancer.
The present invention encompasses methods of differentiating between thyroid carcinoma and benign thyroid diseases by obtaining a sample from a patient; and measuring the expression levels in the sample of genes selected from the group consisting of those encoding mRNA: corresponding to SEQ ID NOs: 36, 53, 73, 211 and 242; and/or corresponding to SEQ ID NOs: 199, 207, 255 and 354; or recognized specifically by the probe sets selected from the group consisting of psids corresponding to SEQ ID NOs: 36, 53, 73, 211 and 242 as depicted in Table 25; and/or recognized specifically by the probe sets selected from the group consisting of psids corresponding to SEQ ID NOs: 199, 207, 255 and 354 as depicted in Table 25; where the gene expression levels above or below pre-determined cut-off levels are indicative of thyroid carcinoma.
The present invention encompasses methods of testing indeterminate thyroid fine needle aspirate (FNA) thyroid nodule samples by: obtaining a sample from a patient; and measuring the expression levels in the sample of genes selected from the group consisting of those encoding mRNA: corresponding to SEQ ID NOs: 36, 53, 73, 211 and 242; and/or corresponding to SEQ ID NOs: 199, 207, 255 and 354; or recognized specifically by the probe sets selected from the group consisting of psids corresponding to SEQ ID NOs: 36, 53, 73, 211 and 242 as depicted in Table 25; and/or recognized specifically by the probe sets selected from the group consisting of psids corresponding to SEQ ID NOs: 199, 207, 255 and 354 as depicted in Table 25; where the gene expression levels above or below predetermined cut-off levels are indicative of thyroid cancer.
The present invention encompasses methods of determining thyroid cancer patient treatment protocol by: obtaining a biological sample from a thyroid cancer patient; and measuring the expression levels in the sample of genes selected from the group consisting of those encoding mRNA: corresponding to SEQ ID NOs: 36, 53, 73, 211 and 242; and/or corresponding to SEQ ID NOs: 199, 207, 255 and 354; or recognized specifically by the probe sets selected from the group consisting of psids corresponding to SEQ ID NOs: 36, 53, 73, 211 and 242 as depicted in Table 25; and/or recognized specifically by the probe sets selected from the group consisting of psids corresponding to SEQ ID NOs: 199, 207, 255 and 354 as depicted in Table 25; where the gene expression levels above or below pre-determined cut-off levels are sufficiently indicative of cancer to enable a physician to determine the type of surgery and/or therapy recommended to treat the disease.
The present invention encompasses methods of treating a thyroid cancer patient by obtaining a biological sample from a thyroid cancer patient; and measuring the expression levels in the sample of genes selected from the group consisting of those encoding mRNA: corresponding to SEQ ID NOs: 36, 53, 73, 211 and 242; and/or corresponding to SEQ ID NOs: 199, 207, 255 and 354; or recognized specifically by the probe sets selected from the group consisting of psids corresponding to SEQ ID NOs: 36, 53, 73, 211 and 242 as depicted in Table 25; and/or recognized specifically by the probe sets selected from the group consisting of psids corresponding to SEQ ID NOs: 199, 207, 255 and 354 as depicted in Table 25; where the gene expression levels above or below pre-determined cut-off levels are indicative of cancer; and treating the patient with a thyroidectomy if they are cancer positive.
The present invention encompasses methods of cross validating a gene expression profile for thyroid carcinoma patients by: a. obtaining gene expression data from a statistically significant number of patient biological samples; b. randomizing sample order; c. setting aside data from about 10%-50% of samples; d. computing, for the remaining samples, for factor of interest on all variables and selecting variables that meet a p-value cutoff (p); e. selecting variables that fit a prediction model using a forward search and evaluating the training error until it hits a predetermined error rate; f. testing the prediction model on the left-out 10%-50% of samples; g. repeating steps c.-g. with a new set of samples removed; and h. continuing steps c.-g. until 100% of samples have been tested and recording classification performance.
The present invention encompasses methods of independently validating a gene expression profile and gene profiles obtained thereby for thyroid carcinoma patients by obtaining gene expression data from a statistically significant number of patient biological samples; normalizing the source variabilities in the gene expression data; computing for factor of interest on all variables that were selected previously; and testing the prediction model on the sample and recording classification performance.
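The hold-out loop in steps b. through h. can be illustrated with a minimal Python sketch. It is offered only as an illustration under stated assumptions, not as the procedure actually used: the function names `cross_validate` and `train_midpoint_classifier` are hypothetical, variable selection and model fitting (steps d. and e.) are delegated to a caller-supplied training function, and the one-variable midpoint classifier is a stand-in for the forward-search model.

```python
import random


def cross_validate(samples, labels, train_fn, holdout_fraction=0.1, seed=0):
    """Return (true_label, predicted_label) pairs, one per sample.

    train_fn is a hypothetical callable: it receives training samples and
    labels and returns a predict(sample) function. Variable selection and
    model fitting (steps d. and e.) must happen inside train_fn so that
    held-out samples never influence the model.
    """
    if not 0.0 < holdout_fraction <= 0.5:
        raise ValueError("holdout_fraction must be in (0, 0.5]")
    order = list(range(len(samples)))
    random.Random(seed).shuffle(order)              # step b: randomize order
    fold_size = max(1, round(len(order) * holdout_fraction))
    results = []
    for start in range(0, len(order), fold_size):   # steps c, g, h
        held_out = order[start:start + fold_size]   # step c: set aside
        held_set = set(held_out)
        training = [i for i in order if i not in held_set]
        predict = train_fn([samples[i] for i in training],
                           [labels[i] for i in training])
        for i in held_out:                          # step f: test left-out
            results.append((labels[i], predict(samples[i])))
    return results


def train_midpoint_classifier(train_samples, train_labels):
    """Hypothetical stand-in model: one variable, cutoff between class means."""
    pos = [s[0] for s, y in zip(train_samples, train_labels) if y == 1]
    neg = [s[0] for s, y in zip(train_samples, train_labels) if y == 0]
    cutoff = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda sample: 1 if sample[0] > cutoff else 0
```

Because the training function is rebuilt for every fold, the selected variables may differ from fold to fold; the recorded pairs then estimate the performance of the whole selection-plus-fitting procedure rather than of a single fixed gene list.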
The present invention encompasses a method of generating a posterior probability score to enable diagnosis of thyroid carcinoma patients by: obtaining gene expression data from a statistically significant number of patient biological samples; applying linear discrimination analysis to the data to obtain selected genes; and applying weighted expression levels to the selected genes with discriminate function factor to obtain a prediction model that can be applied as a posterior probability score.
The present invention encompasses methods of generating a thyroid carcinoma prognostic patient report and reports obtained thereby, by obtaining a biological sample from the patient; measuring gene expression of the sample; applying a posterior probability thereto; and using the results obtained thereby to generate the report.
The present invention encompasses compositions containing at least one probe set selected from the group consisting of: SEQ ID NOs: 36, 53, 73, 211 and 242; and/or SEQ ID NOs: 199, 207, 255 and 354; or the psids corresponding to SEQ ID NOs: 36, 53, 73, 211 and 242; and/or SEQ ID NOs: 199, 207, 255 and 354 as depicted in Table 25.
The present invention encompasses kits for conducting an assay to determine thyroid carcinoma diagnosis in a biological sample containing: materials for detecting isolated nucleic acid sequences, their complements, or portions thereof of a combination of genes selected from the group consisting of those encoding mRNA: corresponding to SEQ ID NOs: 36, 53, 73, 211 and 242; and/or corresponding to SEQ ID NOs: 199, 207, 255 and 354; or recognized specifically by the probe sets selected from the group consisting of psids corresponding to SEQ ID NOs: 36, 53, 73, 211 and 242 as depicted in Table 25; and/or recognized specifically by the probe sets selected from the group consisting of psids corresponding to SEQ ID NOs: 199, 207, 255 and 354 as depicted in Table 25.
The present invention encompasses articles for assessing thyroid carcinoma status containing: materials for detecting isolated nucleic acid sequences, their complements, or portions thereof of a combination of genes selected from the group consisting of those encoding mRNA: corresponding to SEQ ID NOs: 36, 53, 73, 211 and 242; and/or corresponding to SEQ ID NOs: 199, 207, 255 and 354; or recognized specifically by the probe sets selected from the group consisting of psids corresponding to SEQ ID NOs: 36, 53, 73, 211 and 242 as depicted in Table 25; and/or recognized specifically by the probe sets selected from the group consisting of psids corresponding to SEQ ID NOs: 199, 207, 255 and 354 as depicted in Table 25.
The present invention encompasses microarrays or gene chips for performing the methods provided herein.
The present invention encompasses diagnostic/prognostic portfolios containing isolated nucleic acid sequences, their complements, or portions thereof of a combination of genes selected from the group consisting of those encoding mRNA: corresponding to SEQ ID NOs: 36, 53, 73, 211 and 242; and/or corresponding to SEQ ID NOs: 199, 207, 255 and 354; or recognized specifically by the probe sets selected from the group consisting of psids corresponding to SEQ ID NOs: 36, 53, 73, 211 and 242 as depicted in Table 25; and/or recognized specifically by the probe sets selected from the group consisting of psids corresponding to SEQ ID NOs: 199, 207, 255 and 354 as depicted in Table 25 where the combination is sufficient to characterize thyroid carcinoma status or risk of relapse in a biological sample.
FIG. 3a is an ROC curve of the 4-gene signature in 74 independent validation samples; FIG. 3b is an ROC curve of the 5-gene signature in 74 independent validation samples.
FIG. 4a is an ROC curve of the 4-gene signature that is normalized to the three-thyroid control genes; FIG. 4b is an ROC curve of the 5-gene signature that is normalized to the three-thyroid control genes.
FIG. 5a is an ROC curve of the 4-gene signature with one-round amplification in 47 thyroid samples; FIG. 5b is an ROC curve of the 4-gene signature with two-round amplification in 47 thyroid samples; FIG. 5c is an ROC curve of the 5-gene signature with one-round amplification in 47 thyroid samples; FIG. 5d is an ROC curve of the 5-gene signature with two-round amplification in 47 thyroid samples.
FIGS. 6a and 6b depict the ROC curves for cross validation with the 83 independent fresh frozen thyroid samples.
FIGS. 7a and 7b depict the ROC curves for signature validation with the 47 fine needle aspirate (FNA) thyroid samples.
FIGS. 8a and 8b depict the ROC curves for signature performance in 28 paired fresh frozen and FNA thyroid samples.
In this study the goal was to identify signatures that can be used in assays such as DNA chip-based assay to differentiate thyroid carcinomas from benign thyroid diseases. 31 primary papillary thyroid tumors, 21 follicular thyroid cancers, 33 follicular adenoma samples, and 13 benign thyroid diseases were analyzed by using the Affymetrix human U133A Gene Chip. Comparison of gene expression profiles between thyroid cancers and benign tissues has enabled us to identify two signatures: a 5-gene signature identified by percentile analysis and manual selection, and a 4-gene signature selected by Linear Discrimination Analysis (LDA) approach. These two signatures have the performance of sensitivity/specificity 92%/70% and 92%/61%, respectively, and have been validated in 74 independent thyroid samples. The results presented herein demonstrate that these candidate signatures facilitate the diagnosis of thyroid cancers with better sensitivity and specificity than currently available diagnostic procedures. These two signatures are suitable for use in testing indeterminate FNA samples.
By performing gene profiling on 98 representative thyroid benign and tumor samples on Affymetrix U133a chips, we have selected two gene signatures, a 5-gene signature and a 4-gene signature, for a thyroid FNA molecular assay. Signatures were selected to achieve the best sensitivity of the assay, at close to 95%. Except for fibronectin and thyroid peroxidase, the other seven genes from the two signatures have not been implicated previously in thyroid tumorigenesis. Both signatures have been validated with an independent set of 74 thyroid samples, and achieved performance equivalent to that in the 98 training samples. The performances of the two gene signatures are 92% sensitivity and 70%/61% specificity, respectively. When these two signatures are normalized to the specific thyroid control genes, the performances are improved relative to those of the non-normalized signatures. Furthermore, the signatures performed equivalently with two different target preparations, namely one-round amplification and two-round amplification. This validation is extremely important for thyroid assays that use FNA samples, which usually contain limited numbers of thyroid cells.
The mere presence or absence of particular nucleic acid sequences in a tissue sample has only rarely been found to have diagnostic or prognostic value. Information about the expression of various proteins, peptides or mRNA, on the other hand, is increasingly viewed as important. The mere presence of nucleic acid sequences having the potential to express proteins, peptides, or mRNA (such sequences referred to as “genes”) within the genome by itself is not determinative of whether a protein, peptide, or mRNA is expressed in a given cell. Whether or not a given gene capable of expressing proteins, peptides, or mRNA does so and to what extent such expression occurs, if at all, is determined by a variety of complex factors. Irrespective of difficulties in understanding and assessing these factors, assaying gene expression can provide useful information about the occurrence of important events such as tumorigenesis, metastasis, apoptosis, and other clinically relevant phenomena. Relative indications of the degree to which genes are active or inactive can be found in gene expression profiles. The gene expression profiles of this invention are used to provide a diagnosis and treat patients for thyroid cancer.
Sample preparation requires the collection of patient samples. Patient samples used in the inventive method are those that are suspected of containing diseased cells such as cells taken from a nodule in a fine needle aspirate (FNA) of thyroid tissue. Bulk tissue preparation obtained from a biopsy or a surgical specimen and laser capture microdissection are also suitable for use. Laser Capture Microdissection (LCM) technology is one way to select the cells to be studied, minimizing variability caused by cell type heterogeneity. Consequently, moderate or small changes in gene expression between normal or benign and cancerous cells can be readily detected. Samples can also comprise circulating epithelial cells extracted from peripheral blood. These can be obtained according to a number of methods but the most preferred method is the magnetic separation technique described in U.S. Pat. No. 6,136,182. Once the sample containing the cells of interest has been obtained, RNA is extracted and amplified and a gene expression profile is obtained, preferably via microarray, for genes in the appropriate portfolios.
The present invention encompasses methods of diagnosing thyroid cancer by obtaining a biological sample from a patient; and measuring the expression levels in the sample of genes from those encoding mRNA: corresponding to SEQ ID NOs: 36, 53, 73, 211 and 242; and/or corresponding to SEQ ID NOs: 199, 207, 255 and 354; or recognized specifically by the probe sets from psids corresponding to SEQ ID NOs: 36, 53, 73, 211 and 242 as depicted in Table 25; and/or recognized specifically by the probe sets from psids corresponding to SEQ ID NOs: 199, 207, 255 and 354 as depicted in Table 25; where the gene expression levels above or below pre-determined cut-off levels are indicative of thyroid cancer.
The present invention encompasses methods of differentiating between thyroid carcinoma and benign thyroid diseases by obtaining a sample from a patient; and measuring the expression levels in the sample of genes from those encoding mRNA: corresponding to SEQ ID NOs: 36, 53, 73, 211 and 242; and/or corresponding to SEQ ID NOs: 199, 207, 255 and 354; or recognized specifically by the probe sets from psids corresponding to SEQ ID NOs: 36, 53, 73, 211 and 242 as depicted in Table 25; and/or recognized specifically by the probe sets from psids corresponding to SEQ ID NOs: 199, 207, 255 and 354 as depicted in Table 25; where the gene expression levels above or below pre-determined cut-off levels are indicative of thyroid carcinoma.
The present invention encompasses methods of testing indeterminate thyroid fine needle aspirate (FNA) thyroid nodule samples by: obtaining a sample from a patient; and measuring the expression levels in the sample of genes from those encoding mRNA: corresponding to SEQ ID NOs: 36, 53, 73, 211 and 242; and/or corresponding to SEQ ID NOs: 199, 207, 255 and 354; or recognized specifically by the probe sets from psids corresponding to SEQ ID NOs: 36, 53, 73, 211 and 242 as depicted in Table 25; and/or recognized specifically by the probe sets from psids corresponding to SEQ ID NOs: 199, 207, 255 and 354 as depicted in Table 25; where the gene expression levels above or below pre-determined cut-off levels are indicative of thyroid cancer.
The present invention encompasses methods of determining thyroid cancer patient treatment protocol by: obtaining a biological sample from a thyroid cancer patient; and measuring the expression levels in the sample of genes from those encoding mRNA: corresponding to SEQ ID NOs: 36, 53, 73, 211 and 242; and/or corresponding to SEQ ID NOs: 199, 207, 255 and 354; or recognized specifically by the probe sets from psids corresponding to SEQ ID NOs: 36, 53, 73, 211 and 242 as depicted in Table 25; and/or recognized specifically by the probe sets from psids corresponding to SEQ ID NOs: 199, 207, 255 and 354 as depicted in Table 25; where the gene expression levels above or below pre-determined cut-off levels are sufficiently indicative of cancer to enable a physician to determine the type of surgery and/or therapy recommended to treat the disease.
The present invention encompasses methods of treating a thyroid cancer patient by obtaining a biological sample from a thyroid cancer patient; and measuring the expression levels in the sample of genes from those encoding mRNA: corresponding to SEQ ID NOs: 36, 53, 73, 211 and 242; and/or corresponding to SEQ ID NOs: 199, 207, 255 and 354; or recognized specifically by the probe sets from psids corresponding to SEQ ID NOs: 36, 53, 73, 211 and 242 as depicted in Table 25; and/or recognized specifically by the probe sets from psids corresponding to SEQ ID NOs: 199, 207, 255 and 354 as depicted in Table 25; where the gene expression levels above or below pre-determined cut-off levels are indicative of cancer; and treating the patient with thyroidectomy if they are cancer positive.
The SEQ ID NOs in the above methods can be 36, 53, 73, 211 and 242 or 199, 207, 255 and 354, or 45, 215, 65, 29, 190, 199, 207, 255 and 354.
The invention also encompasses the above methods containing the steps of further measuring the expression level of at least one gene encoding mRNA: corresponding to SEQ ID NOs: 142, 219 and 309; and/or corresponding to SEQ ID NOs: 9, 12 and 18; or recognized specifically by the probe sets from psids corresponding to SEQ ID NOs: 130, 190 and 276 as depicted in Table 25; and/or recognized specifically by the probe sets from psids corresponding to SEQ ID NOs: 9, 12 and 18 as depicted in Table 25. The invention also encompasses the above methods containing the steps of further measuring the expression level of at least one gene constitutively expressed in the sample.
Cadherin 3, type 1 (SEQ ID NO: 53) is mentioned in US20030194406; US 20050037439; and US 20040137539. Fibronectin (SEQ ID NO: 242) is mentioned in US6436642 and US20030104419. Secretory granule, neuroendocrine protein 1 (SEQ ID NO: 76) is mentioned in US20030232350; and US20040002067. Testican-1 (SEQ ID NO: 36) is mentioned in US20030108963; and US20050037463. Thyroid peroxidase (SEQ ID NO: 211) is mentioned in US6066449, US20030118553; US20030054571; WO9102061; and WO9856953. Chemokine C (C-C) motif ligand 18 (SEQ ID NO: 354) is mentioned in WO2005005601 and US20020114806. Pulmonary surfactant-associated protein B (SEQ ID NO: 355) is mentioned in US20030219760; and US20030232350. K+ channel beta subunit (SEQ ID NO: 207) is mentioned in US20030096782; and US 20020168638. Putative prostate cancer suppressor (SEQ ID NO: 178) is mentioned in WO2005020784. Bone marrow stromal cell antigen 1 (SEQ ID NO: 142) is mentioned in WO2004040014; and WO2005020784. Leucocyte immunoglobulin-like receptor-6b (SEQ ID NO: 219) is mentioned in US20030060614. Bridging integrator 2 (SEQ ID NO: 309) is mentioned in EP1393776; WO02057414; WO 0116158 and US6831063. Cysteine-rich, angiogenic inducer, 61 (SEQ ID NO: 9) is mentioned in WO2004030615; and WO9733995. Selenoprotein P, Plasma 1 (SEQ ID NO: 12) is mentioned in US20040241653 and WO2005015236. Insulin-like growth factor-binding protein 4 (SEQ ID NO: 18) is mentioned in WO2005015236; WO9203469; WO9203152; and EP0546053.
In this invention, the most preferred method for analyzing the gene expression pattern of a patient in the methods provided herein is through the use of a linear discrimination analysis program. The present invention encompasses a method of generating a posterior probability score to enable diagnosis of thyroid carcinoma patients by: obtaining gene expression data from a statistically significant number of patient biological samples; applying linear discrimination analysis to the data to obtain selected genes; and applying weighted expression levels to the selected genes with discriminate function factor to obtain a prediction model that can be applied as a posterior probability score. Other analytical tools, such as logistic regression and neural network approaches, can also be used to answer the same question.
For instance, the following can be used for linear discriminant analysis:
where,
I(psid)=The log base 2 intensity of the probe set enclosed in parenthesis.
d(CP)=The discriminant function for the cancer positive class
d(CN)=The discriminant function for the cancer negative class
P(CP)=The posterior p-value for the cancer positive class
P(CN)=The posterior p-value for the cancer negative class
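The quantities defined above can be related to one another with a minimal Python sketch. It assumes the standard linear discriminant analysis form, in which each class score is an intercept plus a weighted sum of log base 2 intensities and the posterior for a class is the exponential of its score divided by the sum of the exponentials of both scores. The function names are hypothetical, and the weights and intercepts are caller-supplied placeholders; the actual coefficients of the discriminant functions are not reproduced in this text.

```python
import math


def discriminant(intensities, weights, intercept):
    """Linear score: intercept + sum over psids of weight * I(psid).

    intensities and weights are dicts keyed by probe set id; the values
    passed in are placeholders supplied by the caller, not fitted values.
    """
    return intercept + sum(weights[p] * intensities[p] for p in weights)


def posterior_cancer_positive(d_cp, d_cn):
    """P(CP) = exp(d(CP)) / (exp(d(CP)) + exp(d(CN))).

    The larger score is subtracted first so that exp() cannot overflow.
    P(CN) is 1 - P(CP).
    """
    m = max(d_cp, d_cn)
    e_cp = math.exp(d_cp - m)
    e_cn = math.exp(d_cn - m)
    return e_cp / (e_cp + e_cn)
```

Under this assumed form, a sample would be called cancer positive when P(CP) exceeds a chosen cut-off; lowering the cut-off trades specificity for sensitivity.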
Numerous other well-known methods of pattern recognition are available. The following references provide some examples: Weighted Voting: Golub et al. (1999); Support Vector Machines: Su et al. (2001); and Ramaswamy et al. (2001); K-nearest Neighbors: Ramaswamy (2001); and Correlation Coefficients: van 't Veer et al. (2002).
Preferably, portfolios are established such that the combination of genes in the portfolio exhibits improved sensitivity and specificity relative to individual genes or randomly selected combinations of genes. In the context of the instant invention, the sensitivity of the portfolio can be reflected in the fold differences exhibited by a gene's expression in the diseased state relative to the normal state. Specificity can be reflected in statistical measurements of the correlation of the signaling of gene expression with the condition of interest. For example, standard deviation can be used as such a measurement. In considering a group of genes for inclusion in a portfolio, a small standard deviation in expression measurements correlates with greater specificity. Other measurements of variation such as correlation coefficients can also be used in this capacity. The invention also encompasses the above methods where the specificity is at least about 40%, at least about 50%, or at least about 60%. The invention also encompasses the above methods where the sensitivity is at least about 90% or at least about 92%.
The invention also encompasses the above methods where the comparison of expression patterns is conducted with pattern recognition methods. One method of the invention involves comparing gene expression profiles for various genes (or portfolios) to ascribe diagnoses. The gene expression profiles of each of the genes comprising the portfolio are fixed in a medium such as a computer readable medium. This can take a number of forms. For example, a table can be established into which the range of signals (e.g., intensity measurements) indicative of disease is input. Actual patient data can then be compared to the values in the table to determine whether the patient samples are normal, benign or diseased. In a more sophisticated embodiment, patterns of the expression signals (e.g., fluorescent intensity) are recorded digitally or graphically. The gene expression patterns from the gene portfolios used in conjunction with patient samples are then compared to the expression patterns.
Pattern comparison software can then be used to determine whether the patient samples have a pattern indicative of the disease. Of course, these comparisons can also be used to determine whether the patient is not likely to experience the disease. The expression profiles of the samples are then compared to the portfolio of a control cell. If the sample expression patterns are consistent with the expression pattern for cancer then (in the absence of countervailing medical considerations) the patient is treated as one would treat a thyroid cancer patient. If the sample expression patterns are consistent with the expression pattern from the normal/control cell then the patient is diagnosed negative for cancer.
Preferably, levels of up and down regulation are distinguished based on fold changes of the intensity measurements of hybridized microarray probes. A 1.5 fold difference is preferred for making such distinctions (or a p-value less than 0.05). That is, before a gene is said to be differentially expressed in diseased versus normal cells, the diseased cell is found to yield at least about 1.5 times more, or 1.5 times less intensity than the normal cells. The greater the fold difference, the more preferred is use of the gene as a diagnostic or prognostic tool. Genes selected for the gene expression profiles of this invention have expression levels that result in the generation of a signal that is distinguishable from those of the normal or non-modulated genes by an amount that exceeds background using clinical laboratory instrumentation.
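The 1.5-fold criterion can be expressed as a short Python sketch. It assumes that intensities are compared on the log base 2 scale and that "1.5 times less" means a diseased-to-normal ratio of 1/1.5 or lower; the function name `classify_regulation` and its return labels are hypothetical.

```python
import math


def classify_regulation(log2_diseased, log2_normal, fold_cutoff=1.5):
    """Label a gene 'up', 'down', or 'unchanged' from log2 intensities.

    A fold change of fold_cutoff corresponds to a log2 difference of
    log2(fold_cutoff), about 0.585 for the preferred 1.5-fold cutoff.
    """
    log2_cutoff = math.log2(fold_cutoff)
    diff = log2_diseased - log2_normal
    if diff >= log2_cutoff:
        return "up"
    if diff <= -log2_cutoff:
        return "down"
    return "unchanged"
```

Working on the log scale makes up- and down-regulation symmetric about zero, so a single cutoff serves both directions.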
Statistical values can be used to confidently distinguish modulated from non-modulated genes and noise. Statistical tests find the genes most significantly different between diverse groups of samples. The Student's T-test is an example of a robust statistical test that can be used to find significant differences between two groups. The lower the p-value, the more compelling the evidence that the gene is showing a difference between the different groups. Nevertheless, since microarrays measure more than one gene at a time, tens of thousands of statistical tests may be performed at one time. Because of this, one is likely to see small p-values just by chance, and adjustments for this using a Sidak correction as well as a randomization/permutation experiment can be made. A p-value less than 0.05 by the T-test is evidence that the gene is significantly different. More compelling evidence is a p-value less than 0.05 after the Sidak correction is factored in. For a large number of samples in each group, a p-value less than 0.05 after the randomization/permutation test is the most compelling evidence of a significant difference.
The present invention encompasses microarrays or gene chips for performing the methods provided herein. The microarrays can contain isolated nucleic acid sequences, their complements, or portions thereof of a combination of genes from those encoding mRNA: corresponding to SEQ ID NOs: 36, 53, 73, 211 and 242; and/or corresponding to SEQ ID NOs: 199, 207, 255 and 354; or recognized specifically by the probe sets from psids corresponding to SEQ ID NOs: 36, 53, 73, 211 and 242 as depicted in Table 25; and/or recognized specifically by the probe sets from psids corresponding to SEQ ID NOs: 199, 207, 255 and 354 as depicted in Table 25 where the combination is sufficient to characterize thyroid carcinoma or risk of relapse in a biological sample. The microarray preferably measures or characterizes at least about 1.5-fold over- or under-expression, or provides a statistically significant p-value for over- or under-expression, preferably a p-value less than 0.05. Preferably, the microarray contains a cDNA array or an oligonucleotide array and may contain one or more internal control reagents. One preferred internal control reagent is a method of detecting PAX8 gene expression which can be measured using SEQ ID NOs: 409-411.
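The two adjustments named above can be sketched in Python as follows. This is an illustration under stated assumptions, not the analysis actually performed: the function names are hypothetical, the Sidak adjustment uses its usual form 1 - (1 - p)^m for m tests, and the permutation test uses the absolute difference of group means as its statistic, which is one common choice among several.

```python
import random


def sidak_adjust(p_value, n_tests):
    """Sidak-adjusted p-value for one of n_tests independent tests."""
    return 1.0 - (1.0 - p_value) ** n_tests


def permutation_p_value(group_a, group_b, n_permutations=10000, seed=0):
    """Two-sided permutation p-value for a difference in group means.

    Group labels are shuffled n_permutations times; the p-value is the
    share of shuffles whose mean difference is at least as large as the
    observed one (with the usual +1 correction so it is never zero).
    """
    def mean(values):
        return sum(values) / len(values)

    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        if abs(mean(pooled[:n_a]) - mean(pooled[n_a:])) >= observed:
            hits += 1
    return (hits + 1) / (n_permutations + 1)
```

With tens of thousands of probe sets, an unadjusted p-value of 0.05 becomes essentially 1 after the Sidak adjustment, which is why only very small raw p-values survive the correction.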
Preferably, an oligonucleotide in the array corresponds to the 3′ non-coding region of the gene the expression of which is being measured.
Another parameter that can be used to select genes that generate a signal that is greater than that of the non-modulated gene or noise is the use of a measurement of absolute signal difference. Preferably, the signal generated by the modulated gene expression is at least 20% different than those of the normal or non-modulated gene (on an absolute basis). It is even more preferred that such genes produce expression patterns that are at least 30% different than those of normal or non-modulated genes.
Preferred methods for establishing gene expression profiles include determining the amount of RNA that is produced by a gene that can code for a protein or peptide. This is accomplished by reverse transcriptase PCR (RT-PCR), competitive RT-PCR, real time RT-PCR, differential display RT-PCR, Northern Blot analysis and other related tests. While it is possible to conduct these techniques using individual PCR reactions, it is best to amplify complementary DNA (cDNA) or complementary RNA (cRNA) produced from mRNA and analyze it via microarray.
A number of different array configurations and methods for their production are known to those of skill in the art and are described in U.S. patents such as: U.S. Pat. Nos. 5,445,934; 5,532,128; 5,556,752; 5,242,974; 5,384,261; 5,405,783; 5,412,087; 5,424,186; 5,429,807; 5,436,327; 5,472,672; 5,527,681; 5,529,756; 5,545,531; 5,554,501; 5,561,071; 5,571,639; 5,593,839; 5,599,695; 5,624,711; 5,658,734; and 5,700,637.
Microarray technology allows for the measurement of the steady-state mRNA level of thousands of genes simultaneously, thereby presenting a powerful tool for identifying effects such as the onset, arrest, or modulation of uncontrolled cell proliferation. Two microarray technologies are currently in wide use. The first are cDNA arrays and the second are oligonucleotide arrays. Although differences exist in the construction of these chips, essentially all downstream data analysis and output are the same. The products of these analyses are typically measurements of the intensity of the signal received from a labeled probe used to detect a cDNA sequence from the sample that hybridizes to a nucleic acid sequence at a known location on the microarray. Typically, the intensity of the signal is proportional to the quantity of cDNA, and thus mRNA, expressed in the sample cells. A large number of such techniques are available and useful. Preferred methods for determining gene expression can be found in U.S. Pat. Nos. 6,271,002; 6,218,122; 6,218,114; and 6,004,755.
Analysis of the expression levels is conducted by comparing such signal intensities. This is best done by generating a ratio matrix of the expression intensities of genes in a test sample versus those in a control sample. For instance, the gene expression intensities from a diseased tissue can be compared with the expression intensities generated from benign or normal tissue of the same type. A ratio of these expression intensities indicates the fold-change in gene expression between the test and control samples.
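For instance, a ratio matrix of this kind can be computed as sketched below. This is a minimal illustration under stated assumptions: the data layout (a mapping from gene to a list of intensities), the use of the mean control intensity as the baseline, and the function names are all hypothetical. The 1.5-fold default mirrors the preferred threshold mentioned above.

```python
def ratio_matrix(test, control):
    """Return {gene: [test intensity / mean control intensity, ...]}.

    test and control map each gene to a list of expression intensities,
    one per sample. A ratio above 1 indicates up-regulation in the test
    sample and a ratio below 1 indicates down-regulation.
    """
    ratios = {}
    for gene, test_values in test.items():
        baseline = sum(control[gene]) / len(control[gene])
        ratios[gene] = [value / baseline for value in test_values]
    return ratios

def is_modulated(ratio, threshold=1.5):
    """True when a ratio shows at least threshold-fold over- or under-expression."""
    return ratio >= threshold or ratio <= 1.0 / threshold
```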
Gene expression profiles can also be displayed in a number of ways. The most common method is to arrange raw fluorescence intensities or a ratio matrix into a graphical dendrogram where columns indicate test samples and rows indicate genes. The data are arranged so genes that have similar expression profiles are proximal to each other. The expression ratio for each gene is visualized as a color. For example, a ratio less than one (indicating down-regulation) may appear in the blue portion of the spectrum while a ratio greater than one (indicating up-regulation) may appear as a color in the red portion of the spectrum. Computer software programs for displaying such data are commercially available, including “GENESPRING” from Silicon Genetics, Inc. and “DISCOVERY” and “INFER” software from Partek, Inc.
Modulated genes used in the methods of the invention are described in the Examples. The genes that are differentially expressed are either up regulated or down regulated in patients with thyroid cancer relative to those with benign thyroid diseases. Up regulation and down regulation are relative terms meaning that a detectable difference (beyond the contribution of noise in the system used to measure it) is found in the amount of expression of the genes relative to some baseline. In this case, the baseline is the measured gene expression of a benign disease patient. The genes of interest in the diseased cells are then either up regulated or down regulated relative to the baseline level using the same measurement method. Diseased, in this context, refers to an alteration of the state of a body that interrupts or disturbs, or has the potential to disturb, proper performance of bodily functions as occurs with the uncontrolled proliferation of cells. Someone is diagnosed with a disease when some aspect of that person's genotype or phenotype is consistent with the presence of the disease. However, the act of conducting a diagnosis or prognosis includes the determination of disease/status issues such as determining the likelihood of relapse, type of therapy and therapy monitoring. In therapy monitoring, clinical judgments are made regarding the effect of a given course of therapy by comparing the expression of genes over time to determine whether the gene expression profiles have changed or are changing to patterns more consistent with normal tissue.
Genes can be grouped so that information obtained about the set of genes in the group provides a sound basis for making a clinically relevant judgment such as a diagnosis, prognosis, or treatment choice. These sets of genes make up the portfolios of the invention. As with most diagnostic markers, it is often desirable to use the fewest number of markers sufficient to make a correct medical judgment. This prevents a delay in treatment pending further analysis as well as unproductive use of time and resources.
One method of establishing gene expression portfolios is through the use of optimization algorithms such as the mean variance algorithm widely used in establishing stock portfolios. This method is described in detail in US patent publication number 20030194734. Essentially, the method calls for the establishment of a set of inputs (stocks in financial applications, expression as measured by intensity here) that will optimize the return (e.g., signal that is generated) one receives for using it while minimizing the variability of the return. Many commercial software programs are available to conduct such operations. “Wagner Associates Mean-Variance Optimization Application,” referred to as “Wagner Software” throughout this specification, is preferred. This software uses functions from the “Wagner Associates Mean-Variance Optimization Library” to determine an efficient frontier and optimal portfolios in the Markowitz sense. Use of this type of software requires that microarray data be transformed so that it can be treated as an input in the way stock return and risk measurements are used when the software is used for its intended financial analysis purposes.
The process of selecting a portfolio can also include the application of heuristic rules. Preferably, such rules are formulated based on biology and an understanding of the technology used to produce clinical results. More preferably, they are applied to output from the optimization method. For example, the mean variance method of portfolio selection can be applied to microarray data for a number of genes differentially expressed in subjects with cancer. Output from the method would be an optimized set of genes that could include some genes that are expressed in peripheral blood as well as in diseased tissue. If samples used in the testing method are obtained from peripheral blood and certain genes differentially expressed in instances of cancer could also be differentially expressed in peripheral blood, then a heuristic rule can be applied in which a portfolio is selected from the efficient frontier excluding those that are differentially expressed in peripheral blood. Of course, the rule can be applied prior to the formation of the efficient frontier by, for example, applying the rule during data pre-selection.
Other heuristic rules can be applied that are not necessarily related to the biology in question. For example, one can apply a rule that only a prescribed percentage of the portfolio can be represented by a particular gene or group of genes. Commercially available software such as the Wagner Software readily accommodates these types of heuristics. This can be useful, for example, when factors other than accuracy and precision (e.g., anticipated licensing fees) have an impact on the desirability of including one or more genes.
The gene expression profiles of this invention can also be used in conjunction with other non-genetic diagnostic methods useful in cancer diagnosis, prognosis, or treatment monitoring. For example, in some circumstances it is beneficial to combine the diagnostic power of the gene expression based methods described above with data from conventional markers such as serum protein markers (e.g., Cancer Antigen 27.29 (“CA 27.29”)). A range of such markers exists including such analytes as CA 27.29. In one such method, blood is periodically taken from a treated patient and then subjected to an enzyme immunoassay for one of the serum markers described above. When the concentration of the marker suggests the return of tumors or failure of therapy, a sample source amenable to gene expression analysis is taken. Where a suspicious mass exists, a fine needle aspirate (FNA) is taken and gene expression profiles of cells taken from the mass are then analyzed as described above. Alternatively, tissue samples may be taken from areas adjacent to the tissue from which a tumor was previously removed. This approach can be particularly useful when other testing produces ambiguous results.
The present invention encompasses methods of cross validating a gene expression profile and the profiles thus obtained, for thyroid carcinoma patients by: a. obtaining gene expression data from a statistically significant number of patient biological samples; b. randomizing sample order; c. setting aside data from about 10%-50% of samples; d. computing, for the remaining samples, for factor of interest on all variables and selecting variables that meet a p-value cutoff (p); e. selecting variables that fit a prediction model using a forward search and evaluating the training error until it hits a predetermined error rate; f. testing the prediction model on the left-out 10%-50% of samples; g. repeating steps c.-f. with a new set of samples removed; and h. continuing steps c.-g. until 100% of samples have been tested and recording classification performance. In this method, preferably, the gene expression data obtained in step h. is represented by genes from those encoding mRNA: corresponding to SEQ ID NOs: 1, 4, 7, 8, 10-11, 13-17, 19-24, 26-27, 29-31, 33-35, 37-38, 40-52, 54-72, 75-82, 84-135, 138-141, 144-151, 153-159, 161-162, 164, 166-173, 176-198, 200-201, 203-206, 208-209, 212-213, 215-218, 220-221, 223, 227-233, 235-241, 243-244, 246-249, 251, 253-254, 256-263, 265-289, 291-293, 295-308, 310-331, 333-341, 343-345, 347-348, 350-353 and 355-363; or recognized specifically by the probe sets from psids in Table 25 corresponding to SEQ ID NOs: 1, 4, 7, 8, 10-11, 13-17, 19-24, 26-27, 29-31, 33-35, 37-38, 40-52, 54-72, 75-82, 84-135, 138-141, 144-151, 153-159, 161-162, 164, 166-173, 176-198, 200-201, 203-206, 208-209, 212-213, 215-218, 220-221, 223, 227-233, 235-241, 243-244, 246-249, 251, 253-254, 256-263, 265-289, 291-293, 295-308, 310-331, 333-341, 343-345, 347-348, 350-353 and 355-363.
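The shape of the cross-validation loop in steps b. through h. can be sketched as follows. This is an illustrative sketch only, and it substitutes simpler stand-ins for two steps: variables are ranked by absolute difference in group means rather than by a p-value cutoff, and a nearest-centroid rule is used in place of the forward-search prediction model. All names and defaults are hypothetical.

```python
import random

def centroid(rows, features):
    """Mean value of each listed feature across the given rows."""
    return [sum(r[f] for r in rows) / len(rows) for f in features]

def cross_validate(samples, labels, n_folds=5, n_features=2, seed=0):
    """samples: list of equal-length intensity lists; labels: 0 or 1 per sample.

    Returns the fraction of held-out samples classified correctly once
    every sample has been held out exactly once.
    """
    order = list(range(len(samples)))
    random.Random(seed).shuffle(order)            # step b: randomize sample order
    folds = [order[i::n_folds] for i in range(n_folds)]
    all_features = range(len(samples[0]))
    correct = 0
    for held_out in folds:                        # step c: set a share of samples aside
        held = set(held_out)
        train = [i for i in order if i not in held]
        pos = [samples[i] for i in train if labels[i] == 1]
        neg = [samples[i] for i in train if labels[i] == 0]
        # Stand-in for step d: rank variables using the training samples only.
        gaps = [abs(a - b) for a, b in
                zip(centroid(pos, all_features), centroid(neg, all_features))]
        chosen = sorted(all_features, key=lambda f: -gaps[f])[:n_features]
        c_pos, c_neg = centroid(pos, chosen), centroid(neg, chosen)
        for i in held_out:                        # step f: test on the left-out samples
            x = [samples[i][f] for f in chosen]
            d_pos = sum((a - b) ** 2 for a, b in zip(x, c_pos))
            d_neg = sum((a - b) ** 2 for a, b in zip(x, c_neg))
            predicted = 1 if d_pos < d_neg else 0
            correct += int(predicted == labels[i])
    return correct / len(samples)                 # step h: overall performance
```

The point of the design is that variable selection happens inside each fold, so the left-out samples never influence which variables are chosen. With n_folds=5 each round sets aside 20% of the samples, which falls within the 10%-50% range stated above.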
The present invention encompasses methods of independently validating a gene expression profile and the profiles thus obtained, for thyroid cancer patients by obtaining gene expression data from a statistically significant number of patient biological samples; normalizing the source variabilities in the gene expression data; computing for factor of interest on all variables that were selected previously; and testing the prediction model on the sample and record classification performance. In this method, preferably, the gene expression data obtained in step d. is represented by genes from those encoding mRNA: corresponding to SEQ ID NOs: 1, 4, 7, 8, 10-11, 13-17, 19-24, 26-27, 29-31, 33-35, 37-38, 40-52, 54-72, 75-82, 84-135, 138-141, 144-151, 153-159, 161-162, 164, 166-173, 176-198, 200-201, 203-206, 208-209, 212-213, 215-218, 220-221, 223, 227-233, 235-241, 243-244, 246-249, 251, 253-254, 256-263, 265-289, 291-293, 295-308, 310-331, 333-341, 343-345, 347-348, 350-353 and 355-363; or recognized specifically by the probe sets from psids in Table 25 corresponding to SEQ ID NOs: 1, 4, 7, 8, 10-11, 13-17, 19-24, 26-27, 29-31, 33-35, 37-38, 40-52, 54-72, 75-82, 84-135, 138-141, 144-151, 153-159, 161-162, 164, 166-173, 176-198, 200-201, 203-206, 208-209, 212-213, 215-218, 220-221, 223, 227-233, 235-241, 243-244, 246-249, 251, 253-254, 256-263, 265-289, 291-293, 295-308, 310-331, 333-341, 343-345, 347-348, 350-353 and 355-363.
The present invention encompasses methods of generating a posterior probability to enable diagnosis of thyroid carcinoma patients by obtaining gene expression data from a statistically significant number of patient biological samples; applying linear discrimination analysis to the data to obtain selected genes; applying weighted expression levels to the selected genes with discriminate function factor to obtain a prediction model that can be applied as a posterior probability score. For instance, the following can be used for Linear Discriminant Analysis:
where,
I(psid)=The log base 2 intensity of the probe set enclosed in parenthesis.
d(CP)=The discriminant function for the cancer positive class
d(CN)=The discriminant function for the cancer negative class
P(CP)=The posterior p-value for the cancer positive class
P(CN)=The posterior p-value for the cancer negative class
The present invention encompasses methods of generating a thyroid carcinoma diagnostic patient report and reports obtained thereby, by obtaining a biological sample from the patient; measuring gene expression of the sample; applying a posterior probability score thereto; and using the results obtained thereby to generate the report. The report can also contain an assessment of patient outcome and/or probability of risk relative to the patient population.
The present invention encompasses compositions containing at least one probe set from: SEQ ID NOs: 36, 53, 73, 211 and 242; and/or SEQ ID NOs: 199, 207, 255 and 354; or the psids corresponding to SEQ ID NOs: 36, 53, 73, 211 and 242; and/or SEQ ID NOs: 199, 207, 255 and 354 as depicted in Table 25.
The present invention encompasses kits for conducting an assay to determine thyroid carcinoma diagnosis in a biological sample containing: materials for detecting isolated nucleic acid sequences, their complements, or portions thereof of a combination of genes from those encoding mRNA: corresponding to SEQ ID NOs: 36, 53, 73, 211 and 242; and/or corresponding to SEQ ID NOs: 199, 207, 255 and 354; or recognized specifically by the probe sets from psids corresponding to SEQ ID NOs: 36, 53, 73, 211 and 242 as depicted in Table 25; and/or recognized specifically by the probe sets from psids corresponding to SEQ ID NOs: 199, 207, 255 and 354 as depicted in Table 25. The SEQ ID NOs. can be 36, 53, 73, 211 and 242; 199, 207, 255 and 354; or 45, 215, 65, 29, 190, 199, 207, 255 and 354.
Kits made according to the invention include formatted assays for determining the gene expression profiles. These can include all or some of the materials needed to conduct the assays such as reagents and instructions and a medium through which nucleic acid sequences, their complements, or portions thereof are assayed.
Articles of this invention include representations of the gene expression profiles useful for treating, diagnosing, prognosticating, and otherwise assessing diseases. These profile representations are reduced to a medium that can be automatically read by a machine such as computer readable media (magnetic, optical, and the like). The articles can also include instructions for assessing the gene expression profiles in such media. For example, the articles may comprise a CD ROM having computer instructions for comparing gene expression profiles of the portfolios of genes described above. The articles may also have gene expression profiles digitally recorded therein so that they may be compared with gene expression data from patient samples. Alternatively, the profiles can be recorded in different representational format. A graphical recordation is one such format. Clustering algorithms such as those incorporated in “DISCOVERY” and “INFER” software from Partek, Inc. mentioned above can best assist in the visualization of such data.
Different types of articles of manufacture according to the invention are media or formatted assays used to reveal gene expression profiles. These can comprise, for example, microarrays in which sequence complements or probes are affixed to a matrix to which the sequences indicative of the genes of interest combine creating a readable determinant of their presence. Alternatively, articles according to the invention can be fashioned into reagent kits for conducting hybridization, amplification, and signal generation indicative of the level of expression of the genes of interest for detecting cancer.
The present invention encompasses articles for assessing thyroid carcinoma status containing: materials for detecting isolated nucleic acid sequences, their complements, or portions thereof of a combination of genes from those encoding mRNA: corresponding to SEQ ID NOs: 36, 53, 73, 211 and 242; and/or corresponding to SEQ ID NOs: 199, 207, 255 and 354; or recognized specifically by the probe sets from psids corresponding to SEQ ID NOs: 36, 53, 73, 211 and 242 as depicted in Table 25; and/or recognized specifically by the probe sets from psids corresponding to SEQ ID NOs: 199, 207, 255 and 354 as depicted in Table 25. The SEQ ID NOs. can be 36, 53, 73, 211 and 242; 199, 207, 255 and 354; or 45, 215, 65, 29, 190, 199, 207, 255 and 354.
The present invention encompasses diagnostic/prognostic portfolios containing isolated nucleic acid sequences, their complements, or portions thereof of a combination of genes from those encoding mRNA: corresponding to SEQ ID NOs: 36, 53, 73, 211 and 242; and/or corresponding to SEQ ID NOs: 199, 207, 255 and 354; or recognized specifically by the probe sets from psids corresponding to SEQ ID NOs: 36, 53, 73, 211 and 242 as depicted in Table 25; and/or recognized specifically by the probe sets from psids corresponding to SEQ ID NOs: 199, 207, 255 and 354 as depicted in Table 25 where the combination is sufficient to characterize thyroid carcinoma status or risk of relapse in a biological sample. Preferably, the portfolio measures or characterizes at least about 1.5-fold over- or under-expression or provides a statistically significant p-value over- or under-expression. Preferably, the p-value is less than 0.05.
The following examples are provided to illustrate but not limit the claimed invention. All references cited herein are hereby incorporated by reference herein.
Tissue samples
Fresh frozen thyroid benign disease, follicular adenoma, follicular carcinoma, and papillary carcinoma samples were obtained from different commercial vendors including Genomics Collaborative, Inc. (Cambridge, Mass.), Asterand (Detroit, Mich.), and Proteogenex (Los Angeles, Calif.). All samples were collected according to an Institutional Review Board-approved protocol. Patient demographic and pathology information was also collected. The histopathological features of each sample were reviewed to confirm diagnosis and to estimate sample preservation and tumor content.
RNA isolation
Standard TriZol protocol was used for all the RNA isolations. Tissue was homogenized in TriZol reagent (Invitrogen, Carlsbad, Calif.). Total RNA was isolated from TriZol and precipitated at −20° C. with isopropyl alcohol. RNA pellets were washed with 75% ethanol, dissolved in water and stored at −80° C. until use. RNA integrity was examined with Agilent 2100 Bioanalyzer RNA 6000 NanoAssay (Agilent Technologies, Palo Alto, Calif.).
Linear Discrimination Analysis
Linear Discriminant Analysis was performed using these steps: calculation of a common (pooled) covariance matrix and within-group means; calculation of the set of linear discriminant functions from the common covariance and the within-group means; and classification using the linear discriminant functions.
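The three steps just listed can be sketched, for illustration only, in plain Python. This is a textbook form of linear discriminant analysis and not necessarily the exact implementation used for the Examples. The function names, the data layout, and the default of estimating class priors from group sizes are assumptions.

```python
import math

def solve(matrix, vector):
    """Solve matrix @ x = vector by Gauss-Jordan elimination with partial pivoting."""
    n = len(vector)
    aug = [row[:] + [vector[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def fit_lda(groups, priors=None):
    """groups: {label: list of feature vectors}. Returns {label: (intercept, coefficients)}."""
    n_total = sum(len(rows) for rows in groups.values())
    dim = len(next(iter(groups.values()))[0])
    # Step 1a: within-group means.
    means = {k: [sum(r[j] for r in rows) / len(rows) for j in range(dim)]
             for k, rows in groups.items()}
    # Step 1b: common (pooled) covariance, within-group scatter over N minus group count.
    pooled = [[0.0] * dim for _ in range(dim)]
    for k, rows in groups.items():
        for r in rows:
            d = [r[j] - means[k][j] for j in range(dim)]
            for a in range(dim):
                for b in range(dim):
                    pooled[a][b] += d[a] * d[b] / (n_total - len(groups))
    # Step 2: one linear discriminant function per group.
    model = {}
    for k, rows in groups.items():
        prior = priors[k] if priors else len(rows) / n_total
        coef = solve(pooled, means[k])
        intercept = -0.5 * sum(c * m for c, m in zip(coef, means[k])) + math.log(prior)
        model[k] = (intercept, coef)
    return model

def classify(model, x):
    """Step 3: return the label whose discriminant function is largest at x."""
    scores = {k: b + sum(c * v for c, v in zip(w, x)) for k, (b, w) in model.items()}
    return max(scores, key=scores.get)
```

Each fitted function has the form of an intercept plus a weighted sum of log intensities, which is the same form as the d(CP) and d(CN) expressions used for the signatures.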
Plugging the chip intensity readings for each probe into the following equation can be used to derive the posterior probability of an unknown thyroid sample as either cancer positive or negative. For example, if a thyroid sample is tested with the assay and gives a p(CP)>0.5 this sample will be classified as thyroid cancer.
For the 4 gene signature:
d(CP)=−50.9964+0.220424(I(32128
d(CN)=−46.7445+0.374751(I(32128
For the 5 gene signature:
d(CP)=−135.931+4.737838(I(202363
d(CN)=−128.978+4.610498(I(202363
where,
I(psid)=The log base 2 intensity of the probe set enclosed in parenthesis.
d(CP)=The discriminant function for the cancer positive class
d(CN)=The discriminant function for the cancer negative class
P(CP)=The posterior p-value for the cancer positive class
P(CN)=The posterior p-value for the cancer negative class
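As a hedged illustration of how the two discriminant scores may be turned into a posterior value, the sketch below uses the standard LDA relationship P(CP) = exp(d(CP)) / (exp(d(CP)) + exp(d(CN))), which assumes that each discriminant function already includes its class prior term. The probe names and coefficients in the usage note are hypothetical placeholders and are not the values of the 4-gene or 5-gene signatures.

```python
import math

def discriminant(intercept, coefficients, log2_intensity):
    """Evaluate intercept + sum of coefficient * I(psid) over the listed probe sets."""
    return intercept + sum(c * log2_intensity[psid] for psid, c in coefficients.items())

def posterior_cancer_positive(d_cp, d_cn):
    """P(CP) from the two discriminant scores, computed in a numerically stable form."""
    diff = d_cn - d_cp
    if diff > 700.0:          # exp() would overflow; the posterior is effectively zero
        return 0.0
    return 1.0 / (1.0 + math.exp(diff))

def classify_sample(d_cp, d_cn, cutoff=0.5):
    """Call a sample cancer positive when P(CP) exceeds the cutoff."""
    return "cancer positive" if posterior_cancer_positive(d_cp, d_cn) > cutoff else "cancer negative"
```

For example, with placeholder values, `discriminant(-1.0, {"probe_a": 0.5}, {"probe_a": 8.0})` evaluates to 3.0, and a sample whose d(CP) exceeds its d(CN) is called cancer positive at the 0.5 cutoff.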
Two-Round aRNA Amplification
aRNA was amplified from 10 ng total RNA using the RiboBeast 2-Round Aminoallyl-aRNA Amplification kit (Epicentre, WI), a T7 based RNA linear amplification protocol, with some modifications. Total RNA was reverse transcribed using an oligo(dT) primer containing a T7 RNA polymerase promoter sequence and Superscript III RT. The second-strand synthesis was carried out using Bst DNA polymerase. An extra step of incubation with an exonuclease mix of Exo I and Exo VII was performed to reduce background. The double-stranded cDNA served as the template for T7-mediated linear amplification by in vitro transcription. For the second round of amplification, instead of using the RiboBeast reagents, the ENZO BioArray HighYield RNA Transcript Labeling kit (Affymetrix, CA) was used in place of the in vitro transcription step of Aminoallyl-aRNA. The aRNA was quantified by Agilent Nano Chip technology.
Labeled cRNA was prepared and hybridized with the high-density oligonucleotide array Hu133A Gene Chip (Affymetrix, Santa Clara, Calif.) containing a total of 22,000 probe sets. Hybridization was performed according to a standard protocol provided by the manufacturer. Arrays were scanned using Affymetrix protocols and scanners. For subsequent analysis, each probe set was considered as an independent gene. Expression values for each gene were calculated by using Affymetrix Gene Chip analysis software MAS 5.0. All chips were required to meet the following quality control standards: the percentage of “presence” calls, the scaling factor, the background level, and the noise level had to be within the range of the mean plus or minus 3 standard deviations. All chips used for subsequent analysis passed these quality control criteria. Sample collection for signature selection and independent validation is summarized in Table 1.
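The quality control rule described above can be sketched as follows. This is an illustration under stated assumptions: the metric names, the data layout, and the use of the sample standard deviation computed across all chips are hypothetical choices, not details given in the text.

```python
from statistics import mean, stdev

def chips_passing_qc(metrics, n_sd=3.0):
    """metrics: {chip_id: {metric_name: value}}.

    Returns the sorted ids of chips whose every metric lies within
    mean plus or minus n_sd standard deviations across all chips.
    """
    names = next(iter(metrics.values())).keys()
    passing = set(metrics)
    for name in names:
        values = [m[name] for m in metrics.values()]
        center, spread = mean(values), stdev(values)
        for chip, m in metrics.items():
            if abs(m[name] - center) > n_sd * spread:
                passing.discard(chip)
    return sorted(passing)
```

Note that with very few chips a single outlier cannot exceed 3 standard deviations, so a rule of this kind is only informative for reasonably large batches.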
A. Gene Selection
A total of 98 samples including 31 primary papillary thyroid tumors, 21 follicular thyroid cancers, 33 follicular adenoma, and 13 benign thyroid tissues were analyzed by using Affymetrix human U133A gene chips. Five gene selection criteria were applied to the entire data set to obtain a limited number of genes for subsequent gene marker or signature identification:
1. Genes with at least one “Present Call” in this sample set were considered.
2. Genes with more than one “Present Call” in 12 PBL samples were excluded.
3. Only genes with chip intensity larger than 200 in all samples were selected.
4. Using genes that passed the above three criteria, we performed a variety of analyses, as listed in Table 2, to identify genes that are either up-regulated or down-regulated in thyroid tumors.
5. Finally, genes with expression change greater than 1.4-fold were selected.
The final number of selected genes for signature identification is 322, described in Table 25, SEQ ID NOs: 1, 4, 7, 8, 10-11, 13-17, 19-24, 26-27, 29-31, 33-35, 37-38, 40-52, 54-72, 75-82, 84-135, 138-141, 144-151, 153-159, 161-162, 164, 166-173, 176-198, 200-201, 203-206, 208-209, 212-213, 215-218, 220-221, 223, 227-233, 235-241, 243-244, 246-249, 251, 253-254, 256-263, 265-289, 291-293, 295-308, 310-331, 333-341, 343-345, 347-348, 350-353 and 355-363. The data obtained from the 322 selected genes are provided in Table 3 and summarized in Table 4.
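Criteria 1, 2, 3 and 5 above lend themselves to a simple per-gene filter, sketched below for illustration. Criterion 4 (the analyses listed in Table 2) is not reproduced. The argument names, the "P" code for a present call, and the definition of fold change as the ratio of mean tumor to mean benign intensity in either direction are assumptions.

```python
def passes_filters(thyroid_calls, pbl_calls, intensities, tumor_mean, benign_mean,
                   min_intensity=200.0, min_fold=1.4):
    """Apply selection criteria 1, 2, 3 and 5 to a single gene (probe set).

    thyroid_calls and pbl_calls are lists of detection calls ("P" = present),
    intensities holds the chip intensity in every thyroid sample, and
    tumor_mean and benign_mean are the group mean intensities.
    """
    if "P" not in thyroid_calls:                           # criterion 1
        return False
    if sum(call == "P" for call in pbl_calls) > 1:         # criterion 2
        return False
    if not all(v > min_intensity for v in intensities):    # criterion 3
        return False
    fold = max(tumor_mean / benign_mean, benign_mean / tumor_mean)
    return fold > min_fold                                 # criterion 5
```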
B. Signature Identification using Linear Discrimination Analysis
We used a forward selection process that adds one gene at a time until the posterior error as evaluated by a linear discriminator is less than or equal to 0.1. A four-gene signature was discovered using this approach with the 322 genes. The identities of these 4 genes are listed in Tables 5 and 16 and their chip data are shown in Tables 6 and 7.
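The forward selection procedure can be sketched generically as below. This is a minimal illustration: the error function is supplied by the caller (in the setting described here it would be the posterior error of a linear discriminator fitted on the chosen genes), and the early stop when no candidate improves the error is an added safeguard that the text does not mention.

```python
def forward_select(candidates, error_of, target_error=0.1):
    """Greedy forward selection of genes.

    error_of(genes) must return an error rate between 0 and 1 for a list
    of genes. One gene is added per round, always the one that gives the
    lowest error, until the error is at or below target_error.
    """
    selected, remaining = [], list(candidates)
    current = 1.0
    while remaining and current > target_error:
        best = min(remaining, key=lambda g: error_of(selected + [g]))
        best_error = error_of(selected + [best])
        if best_error >= current:      # no candidate improves the model; stop
            break
        selected.append(best)
        remaining.remove(best)
        current = best_error
    return selected, current
```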
Leave One Out Cross Validation (LOOCV) resulted in 92% sensitivity and 61% specificity, shown in Table 4. The ROC curve gave an AUC of 0.897, as shown in
C. Manual Selection of Markers
Individual genes were selected with an aim to formulate a RT-PCR based assay. Comparison of gene expression profiles between thyroid cancers and non-cancer tissues has identified a five-gene signature from these 322 genes. The identities of these five genes are shown in Tables 8 and 16 and the chip data are shown in Table 9. The performance of this signature was assessed using LDA in the 98 samples, and the signature gives 92% sensitivity and 70% specificity, shown in Table 10. The ROC curve gave an AUC of 0.88, as shown in
D. Cross Validation with the 74 Independent Thyroid Samples
74 independent thyroid samples were processed and profiled with the U133a chip, and the chip data for these two signatures are shown in the Table 11. The performances of the 4-gene and the 5-gene signatures were assessed with LDA. Both signatures gave equivalent performance in these samples compared to the 98 training samples. The sensitivity and specificity for both signatures are shown in Table 12, and the ROC curves are demonstrated in
E. Control Gene Marker Identification
With the 98 thyroid samples and 12 PBL samples we selected two groups of genes as sampling controls. One group consists of genes that are expressed in thyroid but not in PBL; the second group includes genes that are expressed in PBL but not in thyroid. The full gene list and corresponding chip data are shown in Tables 13a and 13b. From these genes we selected six genes that are abundant and for which the difference between thyroid and PBL is relatively large. Their expression profile was validated in the 74 independent thyroid samples. The identities of these six genes are listed in Table 14 and their chip data are shown in Tables 15a and 15b.
F. Signature Normalized to Control Genes
We further examined our 5-gene and 4-gene signatures by normalizing these genes to the three selected thyroid control genes as an algorithm for gene chip data normalization. The average fluorescent intensities of the three thyroid control genes were used for signature gene signal normalization. The performance of both signatures improved slightly when these two signatures were normalized. The sensitivity and specificity of the two signatures are listed in Table 16, and the ROC curves are shown in
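One plausible reading of this normalization is sketched below for illustration: each signature gene's intensity is divided by the average intensity of the thyroid control genes measured on the same chip. Whether the normalization was a ratio of raw intensities or a difference of log intensities is not stated, so the ratio form is an assumption, as are the names used.

```python
def normalize_to_controls(signature, controls):
    """Divide each signature gene intensity by the mean control gene intensity.

    signature and controls map gene names to fluorescent intensities
    measured on the same chip.
    """
    reference = sum(controls.values()) / len(controls)
    return {gene: value / reference for gene, value in signature.items()}
```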
G. Signature Validation with Two-Round Amplified Probes
Because FNA samples may lack sufficient thyroid cells to provide enough probe material for hybridizing to the Affymetrix U133a gene chips after one round of amplification, we performed two-round amplification of the target RNAs with 47 samples that are among the 74 independent validation samples. The data obtained show that the performances of the 5-gene and 4-gene signatures are identical with either one-round or two-round amplifications. The ROC curves of the two gene signatures with two different target preparations are shown in
A. Cross Validation with the 83 Independent Fresh Frozen Thyroid Samples
83 independent thyroid samples were processed and profiled with the U133a chip. The number of samples in each category is listed in Table 17.
The performance of the 4-gene signature and the 5-gene signature was assessed with LDA using the same cut-off value as in the training set. Both signatures gave equivalent performance in these samples, and they are comparable with the performance in the 98-sample training set. The sensitivity and specificity for both signatures are shown in Table 18, and the ROC curves are demonstrated in
B. Signature Validation with the 47 Fine Needle Aspirate (FNA) Thyroid Samples
47 thyroid FNA samples were processed and profiled with the U133a chip. The number of samples in each category is listed in Table 19.
The performance of the 4-gene signature and the 5-gene signature was assessed with the LDA model. Both signatures gave equivalent performance in the FNA samples, and they are comparable with the performance in the 98-sample training set. The sensitivity and specificity for both signatures are shown in Table 20, and the ROC curves are demonstrated in
C. Signature Performance in 28 Paired Fresh Frozen and FNA Thyroid Samples
Within the 83 fresh frozen and the 47 FNA sample collections there are 28 paired samples, each pair from the same patient. The direct comparison of our signatures in these paired samples demonstrates how well the signature will translate into the final molecular assay. The performance of the 4-gene signature and the 5-gene signature was assessed with the LDA model. Both signatures gave equivalent performance in the fresh frozen and FNA samples. These results demonstrated that our 4-gene and 5-gene signatures can perform equally well in both sample types, and showed that the approach of using fresh frozen samples for gene/signature identification is valid. The sensitivity and specificity for both signatures are shown in Table 21, and the ROC curves are demonstrated in
Sample Acquisition:
In order to determine whether a subset of the gene profiles and/or controls would give adequate specificity and sensitivity with RT-PCR, the following experiment was performed. The following has the advantage of requiring only one round of RNA amplification.
A total of 107 thyroid biopsies were analyzed in our study: 26 follicular adenoma, 23 follicular carcinoma, 38 papillary carcinoma, 5 normal, 3 papillary carcinoma follicular variant, 3 Hashimoto thyroiditis, 2 microfollicular adenoma, 1 diffuse goiter, 1 goiter with papillary hyperplasia, 1 Hurthle cell adenoma, 1 hyperplasia with papillary structure, 1 multinodular goiter, 1 oncocytic hyperplasia, 1 thyroiditis. Total RNA was extracted using the Trizol reagent according to the manufacturer's instructions. RNA concentrations were determined by absorbance readings at 260 nm with the Gene-Spec (Hitachi) spectrophotometer. The isolated RNA was stored in RNase-free water at −80° C. until use.
Primer and Probe Design:
The primers and hydrolysis probes were designed using Oligo 6.0 and the GenBank sequences for thyroid cancer status markers (Table 22). These primers and probe sets were designed such that the annealing temperature of the primers was 62° C., that of the probes was 5-10° C. higher, and the amplicon size ranged from 100 to 150 bp. Genomic DNA amplification was excluded by designing our primers around exon-intron splicing sites. Hydrolysis probes were labeled at the 5′ nucleotide with FAM as the reporter dye and at the 3′ nucleotide with TAMRA as the quenching dye.
Real-Time Quantitative RT-PCR:
Gene specific real-time quantitative RT-PCR amplification of 21 thyroid cancer status genes and a housekeeping gene was performed using the TaqMan One-Step RT-PCR Master Mix (2×) (Applied Biosystems) and the ABI Prism 7900HT sequence detection system (Applied Biosystems). In a 25 μl one-step reaction total RNA (10 ng) was added to a mix that contained: 1×RT-PCR Master Mix, 0.25 U/μl Multiscribe Enzyme, 0.6 μM primers and 0.25 μM probe. Cycling parameters were 48° C. for 30 min and 95° C. for 10 min, followed by 40 cycles of 95° C. for 15 sec and 62° C. for 1 min. Real-time PCR monitoring was achieved by measuring fluorescent signal at the end of the annealing phase for each cycle. The number of cycles to reach the fluorescence threshold was defined as the cycle threshold (Ct value). To minimize the errors arising from the integrity of the RNA in the samples, β-actin mRNA was amplified as an internal reference. External standards were prepared by 10-fold serial dilutions of known thyroid cancer positive RNA and used to ensure linearity throughout our assays. Results were expressed as mean Ct values, and samples that had a standard deviation greater than one were excluded. The results are provided in Tables 23a, 23b, 23c, 24a and 24b.
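The handling of replicate Ct values described above can be sketched as follows, for illustration only. The exclusion rule (standard deviation greater than one) follows the text. Expressing a marker relative to the β-actin reference as a difference of mean Ct values is a common convention that is assumed here rather than stated in the text, and the function names are hypothetical.

```python
from statistics import mean, stdev

def summarize_ct(replicates, max_sd=1.0):
    """Return the mean Ct of the replicates, or None if their SD exceeds max_sd."""
    if len(replicates) > 1 and stdev(replicates) > max_sd:
        return None
    return mean(replicates)

def delta_ct(target_replicates, reference_replicates, max_sd=1.0):
    """Mean target Ct minus mean reference Ct, or None if either set is excluded.

    A larger value means the target transcript is less abundant relative
    to the reference transcript.
    """
    target = summarize_ct(target_replicates, max_sd)
    reference = summarize_ct(reference_replicates, max_sd)
    if target is None or reference is None:
        return None
    return target - reference
```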
The data show that the two-gene signature shown in Table 23b is not as sensitive and specific as the four-gene signature from which it was derived. Table 24 shows that the PAX8 gene is effective as a control for thyroid-specific tissue in an RT-PCR reaction.
Although the foregoing invention has been described in some detail by way of illustration and example for purposes of clarity of understanding, the descriptions and examples should not be construed as limiting the scope of the invention.
Number | Date | Country | |
---|---|---|---|
60683173 | May 2005 | US |