The present invention relates to the diagnosis of thyroid cancer and thyroid cancer types based on DNA methylation analysis.
Thyroid nodules are widespread: approximately 20% of people develop a palpable nodule during life, and up to 70% of adults have nodules detectable by sonography or autopsy. Although incidence is increasing, mainly due to improved diagnostic technologies, the mortality rate is decreasing and only 5-15% of those nodules prove to be malignant. In Austria, for instance, malignant nodules have a prevalence of 10-20%, with 9879 newly diagnosed cases in 2009, and women (n=7321) are at higher risk than men (n=2558). The current method of choice for thyroid nodule diagnosis is fine needle aspiration (FNA), followed by cytological assessment. FNA is recognized as a minimally invasive method for the evaluation of nodules, but the method is far from perfect in terms of specificity and sensitivity. FNA nevertheless contributes to improved diagnostics, as it helps to avoid diagnostic surgery in 62-85% of patients. However, it produces a large number of indeterminate or suspicious results. Patients with such an indeterminate diagnosis should be scheduled for diagnostic surgery, which goes along with either lobectomy or thyroidectomy in 20-30% of the cases due to confirmed malignancy. This leads to overtreatment of a high number of patients. The introduction of additional diagnostics to avoid unnecessary surgeries would also reduce costs for the health care system on a large scale. In the clinical setting the main challenge is the separation of follicular adenomas (FTA) from follicular carcinomas (FTC), which is very difficult by non-operative diagnostics.
In the past it has been clearly shown that molecular techniques like expression profiling or analyzing the DNA methylation profile can add substantial value to the discrimination of different tumor entities. Vierlinger et al. (BMC Med Genomics 2011, 4:30; and WO 2009/026605) executed a meta-analysis on 4 independent expression datasets for the identification of biomarkers for PTC. They showed that the expression profile of a single gene (SERPINA1) provides sufficient information to discriminate PTC from all other major histological thyroid entities with very high precision (sensitivity=1; specificity=0.90).
WO 2012/068400 focuses on miRNA expression analysis in the diagnosis of thyroid cancer.
WO 2010/086388 and WO 2010/086389 showed that DNA methylation analysis can be used in the diagnosis of various tumor diseases, especially lung cancer. This was done using a preselected marker set of high relevance in cancer settings.
Ryan et al., The Jour. of Clinic. Endocr. & Metab. 99 (2) (2014): E329-E337 relates to methylated CpG islands in case of PTC.
EP 2 518 166 A2 relates to marker sets for differential expression based thyroid cancer detection.
Probes for genetic testing are used on common platforms marketed by Illumina Inc., such as the Illumina HumanMethylation450 BeadChip (2011).
Rodriguez-Romero et al. (J. Clin. Endocrinol. Metab. 2013, 98:2811-2821) measured DNA methylation in thyroid nodules using a previous platform from Illumina which contained probes for 27000 CpG sites. They report 8613 CpG sites as differentially methylated at a p-value <0.05, but do not report any diagnostically relevant values (accuracies, AUC values, etc.). Furthermore, they do not report any combination of markers to be diagnostically relevant. Thus these data were of little practical use in the clinical setting.
Regardless of these advances, there remains a need for powerful diagnostic methods that provide high reliability and resolution, in particular in distinguishing subtypes of thyroid cancer.
The present invention provides a method of distinguishing a thyroid cancer type or risk thereof, comprising the step of determining the DNA methylation status of at least 3 thyroid cancer genes of a sample of a subject, wherein the at least 3 thyroid cancer genes are selected from three or more of the genes of table 1 and/or table 2, and comparing the methylation status of said genes with a control sample, thereby identifying thyroid cancer DNA in the sample, with the proviso that at least one thyroid cancer gene is selected from TREM1, LRP2, NEK11, ABTB2, ACOT7, ADM, ALOX5, ANKRD22, AXIN2, BHLHE40, C10orf107, C1orf21, C20orf5, CAPS, CHKA, CIITA, CIT, CLN5, COBL, COL22A1, CPLX2, DERL3, DNAH17, DNAH9, ELMO1, ELOVL5, ENO2, FAM20A, FMOD, FRMPD2, GALNT9, GJB6, GRIN2C, HK1, HLA-DOA, HOXD9, IFT140, IL17RD, IP6K3, ITM2C, ITPR1, KCNAB1, KCNN4, KRT80, LILRB1, LIPH, LOC100130238, LRRC23, LYSMD2, MACC1, MICALCL, MINA, MPPED2, MTSS1, MYO1G, NRXN2, NT5C2, NTSR1, PAG1, a PCDHA other than PCDHA13, PCNXL2, PCNXL2, PDZK1IP1, PDZRN4, PER1, PIM3, PRDM11, PRR7, PTHLH, PTPRF, RUNX2, SH2B3, SH3GL3, SLC22A9, SORBS2, SPC24, SUPT3H, SYN2, TFAP2B, TIMP4, TMC6, TMC8, TMEM204, TMOD2, TRIM29, UHRF1, WSCD2, ZSCAN18.
Surprisingly, although it prima facie appeared that Rodriguez-Romero et al. (supra) provided a thorough investigation of DNA methylation in thyroid cancer using DNA methylation analysis of various hypo- and hypermethylated genes, the genetic methylation markers and methylation patterns identified by the present invention differed significantly from the genes and patterns found by Rodriguez-Romero et al. The invention further improved prior art attempts by including reliable significance values.
The present invention provides an identifier based on DNA methylation distinguishing thyroid tumor types, including the differentiation of benign (FTA, SN) from malignant (FTC, PTC) cases and distinguishing FTCs from FTAs. The unique genetic markers are not only backed up by distinguishing DNA methylation patterns but also by their relevance towards mRNA expression. The information provided by the invention is usable in the clinic and can boost the current diagnostic procedures by aiding the cytological assessment, not only of indeterminate cases, resulting in higher discrimination power between benign and malignant cases, as well as between FTAs and FTCs. The inventive diagnosis allows improved patient treatment and patient care, towards personalized medicine.
Also disclosed are sets comprising probes or primers suitable for the inventive methods.
The following drawings form part of the present specification and are included to further demonstrate certain aspects of the present invention. The invention may be better understood by reference to one or more of these drawings in combination with the detailed description of specific embodiments presented herein.
The present invention provides methylation specific marker genes for use in methylation analysis and expression analysis in the diagnosis of thyroid cancer. These genes are given in tables 1 and 2. The inventive genes are identified in the tables by Gene Symbols (column 4) and by at least one chromosome position (columns 2 and 3), which identifies preferred potentially methylated nucleotide positions of these genes. The genes are further identified by the probe ids (column 1), which identify a CpG site (at the chromosome positions) in these genes, especially in their regulatory elements.
The one or more nucleic acids that are preferably determined according to the invention are given by reference to the chromosomal locus (column MAPINFO in tables 1 and 2), which together with the chromosome number (column CHR) refers to the hg19 human genome assembly (version “GRCh/hg19” of February 2009; see http://genome-euro.ucsc.edu) and identifies an exact position in the genome by a single base. Genetic references herein always refer to the hg19 human genome assembly. Probe sequences (according to probe ids) were made available by Illumina and published by Sandoval et al. (Epigenetics 2011; 6:692-702). In the tables, probe ids refer to the sequences represented on the array platform. Each one is used to interrogate a specific CpG site. Chromosome and Mapinfo uniquely identify the location of the first nt of each probe. Methylation of genomic regions near transcription start sites, CpG sites (including CpG islands and CpG shores) and in the first exon is usually associated with reduced gene expression. Methylation at other positions, e.g. in regulatory silencer elements or repressors, may lead to increased gene expression. The present invention is based on an analysis of the methylation status in a genetic region of these genes, such as in the promoter region or other regulatory regions, as well as regions in the open reading frame, including exon or intron portions. Regulatory genetic portions that are potentially methylated may be in 5′ (upstream) or 3′ (downstream) direction of the open reading frame (coding region). Novel genes or novel gene combinations (of which a minority of the individual genes might have been known before) are provided which provide an improvement in thyroid cancer or thyroid condition identification.
The present invention also relates to a set, such as in a kit, of primer and/or probes specific to potentially methylated regions of the inventive genes. Primers are preferably provided as primer pairs. The set is suitable for performing the inventive method, which primers and/or probes are specific for targeting a potentially methylated region in a DNA molecule of one or more of the genes selected from table 1 and/or table 2. Such a set can be a set of PCR primers or a microarray comprising the probes.
The following detailed description relates to all aspects of the invention likewise: The inventive method can be performed by any embodiment of set or the primers and/or probes and the inventive set can be used for or be suitable for, i.e. comprising the means for performing, any of the inventive methods. Of course all described embodiments can be combined with each other as is apparent to a skilled practitioner. Further aspects and embodiments are disclosed in the claims, which can be combined with any embodiment in other claims or described in the detailed description. Where claims require a proviso, subject matter of these claims is also disclosed without said proviso, as it may be disregarded in other embodiments.
The inventive genes of tables 1 and 2 are particularly: ABLIM3, ABTB2, ACOT7, ADM, ALOX5, ANKRD22, AXIN2, BHLHE40, C10orf107, C1orf21, C20orf5, CAPS, CDH13, CHKA, CIITA, CIT, CLN5, COBL, COL22A1, CPLX2, CYB561, DERL3, DNAH17, DNAH9, ELMO1, ELOVL5, ENO2, EPHA10, FAM20A, FMOD, FRMD4A, FRMPD2, GAD1, GALNT9, GJB6, GRIN2C, HK1, HLA-DOA, HOXB4, HOXD9, IFT140, IL17RD, IP6K3, IRF5, ITM2C, ITPR1, KCNAB1, KCNN4, KLK10, KRT80, LILRB1, LIPH, LOC100130238, LRP2, LRRC23, LYSMD2, MACC1, MICALCL, MINA, MIOX, MPPED2, MTSS1, MYO1G, NEK11, NRXN2, NT5C2, NTSR1, PAG1, PCDHA, PCNXL2, PCNXL2, PDZK1IP1, PDZRN4, PER1, PIM3, PRDM11, PRR7, PTHLH, PTPRF, RBP1, RUNX2, SH2B3, SH3GL3, SLC22A9, SORBS2, SPC24, STRA6, SUPT3H, SYN2, TBX2, TFAP2B, TIMP4, TMC6, TMC8, TMEM204, TMOD2, TREM1, TRIM29, UHRF1, WSCD2, ZIC1, ZSCAN18.
All genes of tables 1 and 2 are suitable to distinguish non-cancerous from cancerous indications, wherein table 1 is specialized for grouping non-cancerous and cancerous conditions together (e.g. normal samples, Struma nodosa (SN) and FTA as non-cancerous and PTC and FTC as cancerous) and table 2 is specialized to distinguish FTA and FTC. The markers of table 2 are preferably used to distinguish FTC from FTA in a sample from a patient which/who is suspected of having either FTC or FTA, e.g. as indicated in a previous thyroid or thyroid sample inspection.
The sample may be of a patient who has an enlarged thyroid gland, which may be due to non-cancerous nodes (e.g. SN or FTA) or due to a cancerous condition (e.g. FTC or PTC). The inventive method may also be used on a sample with any thyroid size for risk assessment and prognosis.
Preferably the genes are selected from List 1, which is: ABLIM3, ACOT7, ADM, ALOX5, ANKRD22, AXIN2, BHLHE40, C10orf107, C1orf21, CHKA, CIITA, CIT, COBL, CYB561, DNAH9, ELMO1, EPHA10, FAM20A, FMOD, GJB6, HK1, IFT140, TMEM204, IL17RD, IP6K3, IRF5, ITPR1, KCNAB1, KCNN4, KLK10, KRT80, LIPH, LRP2, MACC1, MICALCL, MINA, MIOX, MPPED2, MTSS1, MYO1G, NEK11, PAG1, PCNXL2, PDZK1IP1, PDZRN4, PIM3, PRDM11, PRR7, RUNX2, SORBS2, SPC24, STRA6, SUPT3H, RUNX2, SYN2, TIMP4, TBX2, TMC6, TMC8, TREM1, UHRF1, WSCD2 (genes of table 1); and List 2, which is: ACOT7, PTPRF, C1orf21, PCNXL2, GAD1, HOXD9, ITM2C, RBP1, ZIC1, KCNAB1, PCDHA, ABLIM3, CPLX2, HLA-DOA, TREM1, TFAP2B, ELOVL5, COBL, COL22A1, FRMD4A, FRMPD2, NT5C2, ABTB2, SLC22A9, NRXN2, TRIM29, LRRC23, ENO2, PTHLH, WSCD2, SH2B3, CIT, GALNT9, LOC100130238, CLN5, TMOD2, LYSMD2, SH3GL3, CDH13, PER1, HOXB4, AXIN2, GRIN2C, DNAH17, CAPS, SPC24, LILRB1, ZSCAN18, C20orf85, NTSR1, DERL3 (genes of table 2). Gene sequences and further information is available for each of these Gene Symbols at a human genome database, such as the hg19 human genome assembly version “GRCh/hg19” of February 2009.
Especially preferred are markers or marker combinations with high AUC values, such as marker genes TREM1, LRP2 or NEK11, each one independently: alone or in combination with any one of the markers of tables 1 and 2. Especially preferred is the 3-marker combination of TREM1, LRP2 and NEK11, alone or in combination with further markers, especially further markers of tables 1 or 2.
In all aspects and embodiments of the invention PCDHA, which stands for PCDHA complex (protocadherin alpha and subfamily C), is preferably determined at any one or more (e.g. 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14 or 15) of its members selected from PCDHA9, PCDHA6, PCDHA4, PCDHA13, PCDHAC1, PCDHA10, PCDHA8, PCDHA3, PCDHA1, PCDHA5, PCDHA12, PCDHAC2, PCDHA2, PCDHA7 and/or PCDHA11. The PCDHA is preferably a PCDHA other than PCDHA13, or a combination of such other PCDHAs.
Especially preferred, the genes include genes selected from ACOT7, C1orf21, PCNXL2, KCNAB1, ABLIM3, TREM1, COBL, WSCD2, CIT, AXIN2, SPC24 (genes of both tables 1 and 2).
In further preferred embodiments, the markers used in any embodiment of the invention do not require (or even—but not necessarily—exclude) markers ABLIM3, CYB561, EPHA10, IRF5, KLK10, MIOX, STRA6 and TBX2 (List 3a), or markers ZIC1, PCDHA13, ABLIM3, FRMD4A and HOXB4 (List 3b). In further—combinable with the above—preferred embodiments, also markers GAD1, RBP1, and CDH13 (List 3c) are not prescribed for use or even excluded. In further—combinable with the above—preferred embodiments, also markers KCNAB1 and LRP2 (List 3d) are not prescribed for use or even excluded. Preferably, at least one of genes ABTB2, ACOT7, ADM, ALOX5, ANKRD22, AXIN2, BHLHE40, C10orf107, C1orf21, C20orf5, CAPS, CHKA, CIITA, CIT, CLN5, COBL, COL22A1, CPLX2, DERL3, DNAH17, DNAH9, ELMO1, ELOVL5, ENO2, FAM20A, FMOD, FRMPD2, GALNT9, GJB6, GRIN2C, HK1, HLA-DOA, HOXD9, IFT140, IL17RD, IP6K3, ITM2C, ITPR1, KCNAB1, KCNN4, KRT80, LILRB1, LIPH, LOC100130238, LRP2, LRRC23, LYSMD2, MACC1, MICALCL, MINA, MPPED2, MTSS1, MYO1G, NEK11, NRXN2, NT5C2, NTSR1, PAG1, PCDHA (not PCDHA13), PCNXL2, PCNXL2, PDZK1IP1, PDZRN4, PER1, PIM3, PRDM11, PRR7, PTHLH, PTPRF, RUNX2, SH2B3, SH3GL3, SLC22A9, SORBS2, SPC24, SUPT3H, SYN2, TFAP2B, TIMP4, TMC6, TMC8, TMEM204, TMOD2, TREM1, TRIM29, UHRF1, WSCD2, ZSCAN18 is used or provided for with methylation specific probes or primers in the inventive set (but not necessarily in any embodiment of the invention; claim 1 is also specifically disclosed without the proviso).
Thus in preferred embodiments the inventive markers are of List 1a: ACOT7, ADM, ALOX5, ANKRD22, AXIN2, BHLHE40, C10orf107, C1orf21, CHKA, CIITA, CIT, COBL, DNAH9, ELMO1, FAM20A, FMOD, GJB6, HK1, IFT140, TMEM204, IL17RD, IP6K3, ITPR1, KCNAB1, KCNN4, KRT80, LIPH, LRP2, MACC1, MICALCL, MINA, MPPED2, MTSS1, MYO1G, NEK11, PAG1, PCNXL2, PDZK1IP1, PDZRN4, PIM3, PRDM11, PRR7, RUNX2, SORBS2, SPC24, SUPT3H, RUNX2, SYN2, TIMP4, TMC6, TMC8, TREM1, UHRF1, WSCD2; and
List 2a: ACOT7, PTPRF, C1orf21, PCNXL2, GAD1, HOXD9, ITM2C, RBP1, KCNAB1, PCDHA (excluding PCDHA13 or all PCDHA members), CPLX2, HLA-DOA, TREM1, TFAP2B, ELOVL5, COBL, COL22A1, FRMPD2, NT5C2, ABTB2, SLC22A9, NRXN2, TRIM29, LRRC23, ENO2, PTHLH, WSCD2, SH2B3, CIT, GALNT9, LOC100130238, CLN5, TMOD2, LYSMD2, SH3GL3, CDH13, PER1, AXIN2, GRIN2C, DNAH17, CAPS, SPC24, LILRB1, ZSCAN18, C20orf85, NTSR1, DERL3. List 1a and List 2a are based on List 1 and List 2, respectively, not including the above mentioned less-preferred markers.
Hyper- or hypomethylation of genes ABLIM3, CYB561, EPHA10, IRF5, KLK10, MIOX, STRA6 and TBX2, or markers ZIC1, PCDHA13, ABLIM3, FRMD4A and HOXB4 in connection with thyroid cancer has been mentioned in Rodriguez-Romero et al. (supra). Regrettably, Rodriguez-Romero et al. did not provide any particular information, like AUC, fold changes or significance, that would allow a diagnosis or thyroid cancer state investigation using these markers. The present invention can improve on Rodriguez-Romero et al. by providing improved embodiments with these markers; in other embodiments these markers are not necessarily used. Thus, if these markers are used or included in the set, it is preferred to do this in connection with any one of the preferred inventive embodiments, e.g. as defined in the dependent claims. Such preferred embodiments are e.g. using these markers in combination with any other combination of marker genes of tables 1 and 2 not of List 3a,b,c, possibly further not of List 3d; using these markers with probes specific for the potentially methylated regions as defined by the positions given in tables 1 and 2; detecting the methylation status of these genes in more than one potentially methylated region, such as 2 or 3 potentially methylated regions, such potentially methylated regions being preferably defined by the positions given in tables 1 and 2; using these markers of List 3a,b,c for distinguishing special thyroid conditions such as FTA from FTC; combining a methylation status analysis with a gene expression analysis; etc.
It is particularly preferred to determine more than one gene of the inventive table(s) in any embodiment of the invention, including the set, which may comprise primers and/or probes specific for potentially methylated regions of said more than one gene. Determining the methylation status may comprise determining the methylation status of at least 2, preferably of 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, at least 25, 30, 33, 35, 40, 45, 50 or more of the genes of said table(s) or list(s), e.g. of the combined tables 1 and 2, of table 1, of table 2 or list 1a or list 2a, e.g. of ABLIM3, ABTB2, ACOT7, ADM, ALOX5, ANKRD22, AXIN2, BHLHE40, C10orf107, C1orf21, C20orf85, CAPS, CDH13, CHKA, CIITA, CIT, CLN5, COBL, COL22A1, CPLX2, CYB561, DERL3, DNAH17, DNAH9, ELMO1, ELOVL5, ENO2, EPHA10, FAM20A, FMOD, FRMD4A, FRMPD2, GAD1, GALNT9, GJB6, GRIN2C, HK1, HLA-DOA, HOXB4, HOXD9, IFT140, IL17RD, IP6K3, IRF5, ITM2C, ITPR1, KCNAB1, KCNN4, KLK10, KRT80, LILRB1, LIPH, LOC100130238, LRP2, LRRC23, LYSMD2, MACC1, MICALCL, MINA, MIOX, MPPED2, MTSS1, MYO1G, NEK11, NRXN2, NT5C2, NTSR1, PAG1, PCDHA, PCNXL2, PCNXL2, PDZK1IP1, PDZRN4, PER1, PIM3, PRDM11, PRR7, PTHLH, PTPRF, RBP1, RUNX2, SH2B3, SH3GL3, SLC22A9, SORBS2, SPC24, STRA6, SUPT3H, SYN2, TBX2, TFAP2B, TIMP4, TMC6, TMC8, TMEM204, TMOD2, TREM1, TRIM29, UHRF1, WSCD2, ZIC1, ZSCAN18. It is possible to pick any small number from these subsets or the combined set, since a distinction between benign and malignant states or the diagnosis of cancer can also be performed with acceptable certainty. For example, in a preferred embodiment the inventive set or method comprises at least 3 (or any of the above mentioned numbers) of the genes as methylation markers. In fact, these markers can be chosen at random since the inventive tables have been thoroughly compiled to allow just that.
As said, these numbers are achieved by a random selection from the inventive tables. The result can be improved further by selecting marker combinations with high complementarity to lower the classification error (see.
Such methods include class comparisons wherein a specific p-value is selected, e.g. a p-value below 0.1, preferably below 0.08, more preferred below 0.06, in particular preferred below 0.05, below 0.04, below 0.02, most preferred below 0.01.
Preferably the correlated results for each marker or gene are rated by their correct correlation to thyroid cancer positive state, preferably by p-value test or t-value test or F-test. Rated (best first, i.e. low p- or t-value) markers are then subsequently selected and added to the marker combination until a certain diagnostic value is reached, e.g. the herein mentioned at least 60%, at least 70%, at least 80%, at least 90% or at least 95% (or more) correct classification of thyroid cancer.
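The rank-and-add procedure described above can be sketched as follows. This is only a minimal illustration under stated assumptions: the marker names (m1, m2, m3), the p-values and the methylation values are made up, and a simple nearest-centroid rule evaluated on the same samples stands in for whatever classifier and validation scheme is actually used.

```python
from statistics import mean

def nearest_centroid_accuracy(samples, labels, markers):
    """Fraction of samples assigned to their own class by a nearest-centroid
    rule on the chosen markers (evaluated on the same data; illustrative only)."""
    classes = sorted(set(labels))
    centroids = {
        c: [mean(s[m] for s, l in zip(samples, labels) if l == c) for m in markers]
        for c in classes
    }
    correct = 0
    for s, l in zip(samples, labels):
        dist = {
            c: sum((s[m] - cm) ** 2 for m, cm in zip(markers, centroids[c]))
            for c in classes
        }
        if min(dist, key=dist.get) == l:
            correct += 1
    return correct / len(samples)

def select_markers(p_values, samples, labels, target=0.9):
    """Add markers best-first (lowest p-value) until the target accuracy is reached."""
    chosen, acc = [], 0.0
    for marker in sorted(p_values, key=p_values.get):
        chosen.append(marker)
        acc = nearest_centroid_accuracy(samples, labels, chosen)
        if acc >= target:
            break
    return chosen, acc

# Hypothetical methylation values per sample; not data from the tables.
samples = [
    {"m1": 0.2, "m2": 0.10, "m3": 0.5},
    {"m1": 0.6, "m2": 0.15, "m3": 0.5},
    {"m1": 0.5, "m2": 0.80, "m3": 0.5},
    {"m1": 0.8, "m2": 0.85, "m3": 0.5},
]
labels = ["benign", "benign", "malignant", "malignant"]
p_values = {"m1": 0.001, "m2": 0.01, "m3": 0.4}  # assumed, precomputed elsewhere

chosen, acc = select_markers(p_values, samples, labels, target=0.9)
print(chosen, acc)
```

In this toy data the best-ranked marker alone misclassifies two samples, so the second-ranked marker is added before the target is met; the third marker is never needed.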
Class Comparison procedures include identification of genes that were differentially methylated among the two or more classes using a random-variance t-test. The random-variance t-test is an improvement over the standard separate t-test as it permits sharing information among genes about within-class variation without assuming that all genes have the same variance (Wright G. W. and Simon R, Bioinformatics 19:2448-2455, 2003). Genes were considered statistically significant if their p value was less than a certain value, e.g. 0.1 or 0.01. A stringent significance threshold can be used to limit the number of false positive findings. A global test can also be performed to determine whether the methylation profiles differed between the classes by permuting the labels of which arrays corresponded to which classes. For each permutation, the p-values can be re-computed and the number of genes significant at the e.g. 0.01 level can be noted. The proportion of the permutations that give at least as many significant genes as with the actual data is then the significance level of the global test. If there are more than 2 classes, then the “F-test” instead of the “t-test” should be used.
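The permutation-based global test can be illustrated with a short sketch. It makes simplifying assumptions that should be read as such: a plain Welch t statistic with a fixed cutoff (the hypothetical parameter t_cutoff) replaces the random-variance t-test and its p-value threshold, only two classes are handled, and the methylation values are invented.

```python
import math
import random
from statistics import mean, variance

def welch_t(a, b):
    """Welch t statistic for two groups of values."""
    diff = mean(a) - mean(b)
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se

def count_significant(data, labels, t_cutoff):
    """Number of genes whose |t| between class 0 and class 1 reaches the cutoff."""
    n = 0
    for values in data:  # one row of per-sample values per gene
        a = [v for v, l in zip(values, labels) if l == 0]
        b = [v for v, l in zip(values, labels) if l == 1]
        if abs(welch_t(a, b)) >= t_cutoff:
            n += 1
    return n

def global_test(data, labels, t_cutoff=3.0, n_perm=1000, seed=0):
    """Proportion of label permutations giving at least as many
    significant genes as the actual labelling."""
    rng = random.Random(seed)
    observed = count_significant(data, labels, t_cutoff)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if count_significant(data, shuffled, t_cutoff) >= observed:
            hits += 1
    return observed, hits / n_perm

# Invented values: gene 0 differs between the classes, gene 1 does not.
data = [
    [0.10, 0.12, 0.11, 0.80, 0.82, 0.79],
    [0.50, 0.40, 0.60, 0.50, 0.45, 0.55],
]
labels = [0, 0, 0, 1, 1, 1]
observed, global_p = global_test(data, labels)
print(observed, global_p)
```

With only three samples per class there are few distinct label splits, so the global significance level cannot become very small here; realistic sample sizes are needed for a meaningful result.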
Class Prediction includes the step of specifying a significance level to be used for determining the genes that will be included in the subset. Genes that are differentially methylated between the classes at a univariate parametric significance level less than the specified threshold are included in the set. It doesn't matter whether the specified significance level is small enough to exclude enough false discoveries. In some problems better prediction can be achieved by being more liberal about the gene sets used as features. The sets may be more biologically interpretable and clinically applicable, however, if fewer genes are included.
To prevent an increase in the number of members of the subset, only marker genes with a significance value of at most 0.1, preferably at most 0.08, even more preferred at most 0.06, at most 0.05, at most 0.04, at most 0.02, or most preferred at most 0.01 are selected.
Since the combination should be small, it is preferred that not more than 10000, not more than 5000, not more than 2500, not more than 2000, not more than 1500, not more than 1000, not more than 800, not more than 600, or not more than 400, preferably not more than 350, not more than 300, not more than 250, not more than 200, not more than 150, not more than 100, not more than 80, not more than 60, or not more than 40, preferably not more than 30, in particular preferred not more than 20, marker genes are used according to the inventive method or in the inventive set, not counting controls for methylation testing or for gene expression testing. In particular the set of the present invention provides fewer primer pairs and/or probes than these numbers in order to reduce manufacturing costs in addition to the above reasons.
In preferred embodiments, the inventive diagnosis using DNA methylation data is combined with an expression analysis of the genes used in the methylation status analysis or any one or more of the genes of tables 1 and 2, or lists 1a or 2a. E.g. the method may further comprise determining the gene expression of at least one of said genes of table 1 and/or 2, wherein a differential expression as compared to a normal sample indicates thyroid cancer or the risk thereof. Differential expression may be an increased or decreased expression. Such directions of differential expression are indicated in
The methylation status can be determined by any method known in the art, including methylation dependent bisulfite deamination (and consequently the identification of mC (methylated C) changes by any known method, including PCR and hybridization techniques). Preferably, the methylation status is determined by methylation specific PCR analysis, methylation specific digestion analysis and either or both of hybridisation analysis to non-digested or digested fragments or PCR amplification analysis of non-digested fragments. The methylation status can also be determined by any probes suitable for determining the methylation status including DNA, RNA, PNA, LNA probes which optionally may further include methylation specific moieties.
As further explained below, the methylation status can in particular be determined by using hybridisation probes or amplification primers (preferably PCR primers) specific for methylated regions of the inventive marker genes. Discrimination between methylated and non-methylated genes, including the determination of the methylation amount or ratio, can be performed by using e.g. either one of these tools.
The determination using only specific primers aims at specifically amplifying methylated (or in the alternative non-methylated) DNA. This can be facilitated by using (methylation dependent) bisulfite deamination, methylation specific enzymes or by using methylation specific nucleases to digest methylated (or alternatively non-methylated) regions, so that consequently only the non-methylated (or alternatively methylated) DNA is obtained. By using a genome chip (or simply a gene chip including hybridization probes for the marker genes), all amplification or non-digested products are detected. I.e. discrimination between methylated and non-methylated states as well as gene selection (the inventive set or subset) is performed before the step of detection on a chip.
Alternatively it is possible to use universal primers and amplify a multitude of potentially methylated genetic regions (including the genetic markers of the invention) which are, as described, either amplified in a methylation specific manner or digested, and then use a set of hybridisation probes for the characteristic markers on e.g. a chip for detection. I.e. gene selection is performed on the chip.
Either set, a set of probes or a set of primers, can be used to obtain the relevant methylation data of the genes of the present invention. Of course, both sets can be used.
The method according to the present invention may be performed by any method suitable for the detection of methylation of the marker genes. In order to provide a robust and optionally re-useable test format, the determination of the gene methylation is preferably performed with a DNA-chip, real-time PCR, or a combination thereof. The DNA chip can be a commercially available general gene chip (also comprising a number of spots for the detection of genes not related to the present method) or a chip specifically designed for the method according to the present invention (which predominantly comprises marker gene detection spots).
Preferably the methylated DNA of the sample is detected by a multiplexed hybridization reaction. In further embodiments a methylated DNA is preamplified prior to hybridization, preferably also prior to methylation specific amplification, or digestion. Preferably, also the amplification reaction is multiplexed (e.g. multiplex PCR).
Preferred DNA methylation analyses use bisulfite deamination-based methylation detection or methylation sensitive restriction enzymes. Preferably the restriction enzyme-based strategy is used for elucidation of DNA methylation changes. Further methods to determine methylated DNA are e.g. given in EP 1 369 493 A1 or U.S. Pat. No. 6,605,432. Combining restriction digestion and multiplex PCR amplification with a targeted microarray-hybridization is a particularly advantageous strategy to perform the inventive methylation test using the inventive markers. A microarray-hybridization step can be used for reading out the PCR results. For the analysis of the hybridization data, statistical approaches for class comparisons and class prediction can be used.
The inventive methods (for the screening of subsets or for diagnosis or prognosis of a disease or tumor type) are particularly suitable to detect low amounts of methylated DNA of the inventive marker genes. Preferably the DNA amount in the sample is below 500 ng, below 400 ng, below 300 ng, below 200 ng, below 100 ng, below 50 ng or even below 25 ng. The inventive method is particularly suitable to detect low concentrations of methylated DNA of the inventive marker genes. Preferably the DNA amount in the sample is below 500 ng, below 400 ng, below 300 ng, below 200 ng, below 100 ng, below 50 ng or even below 25 ng, per ml sample.
The inventive method may comprise comparing the methylation status with the status of a confirmed thyroid cancer or thyroid cancer type positive and/or negative state. The control may be of a healthy subject or devoid of significant cancer signatures, such as healthy tissue of a healthy subject or SN or FTA.
It is particularly preferred that a negative control is used. The inventive diagnosis may be based on increased methylation of the inventive marker genes. In comparison with other controls a decreased methylation may be detected. Markers with increased or decreased methylation in case of cancer or any given thyroid type are shown in tables 1 and 2. The invention may comprise the step of comparing the methylation status with the status of a confirmed thyroid cancer positive and/or negative state, preferably selected from a normal control, FTA, FTC and PTC, preferably wherein the control comprises a healthy thyroid nodule or no nodule.
Surprisingly, a particular benefit is obtained by the use of more than one probe or primer (or primer pair) for each gene: determining the methylation status for more than one marker of one gene, such as CpG sites, islands or shores, improves the classification rate, despite the fact that the expression level of the same gene is influenced. Thus in preferred embodiments the method comprises determining the methylation status for at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 or more genes in at least two (e.g. 2, 3 or more) potentially methylated regions of each gene. These genes may be the ones selected as discussed above of tables 1 and 2. For the inventive set this means that at least 2 probes or primers are included for the mentioned gene(s).
Preferably determining the methylation status comprises comparing a methylation-status specific signal with a methylation-status unspecific signal at a preselected potentially methylated region of said gene. In such embodiments, the inventive methylation status determinations may include generating a signal of a methylation specific probe, i.e. a probe that causes a different signal in dependence of the methylation status, and a methylation status indifferent probe, i.e. a probe which does not distinguish between the methylation states, also referred to as “methylation unspecific”. The ratio of the signal of the methylation specific probe to the signal of the methylation indifferent probe can be used as an indicator of the methylation status of a target nucleic acid. This ratio is also referred to as “beta difference”. Using such a ratio has the benefit of normalizing the signal data and cancelling noise and unwanted signal interferences that are similar for the methylation specific probe and the methylation indifferent probe. Of course this embodiment is not limited to probes but equally applies to any other means of generating methylation dependent and methylation indifferent signals from a target nucleic acid, such as when using primer extension reactions, such as PCR.
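As a small numerical illustration of the signal ratio just described, the following sketch divides a methylation specific signal by a methylation unspecific signal and shows that a disturbance scaling both signals equally cancels out. The function name, the intensities and the scaling factor are assumptions chosen for illustration, not values from any platform.

```python
def methylation_ratio(specific_signal, unspecific_signal):
    """Ratio of the methylation specific signal to the methylation unspecific signal."""
    if unspecific_signal <= 0:
        raise ValueError("unspecific signal must be positive")
    return specific_signal / unspecific_signal

# Hypothetical raw intensities for one CpG site in one sample.
raw = methylation_ratio(300.0, 1000.0)

# A disturbance that multiplies both signals by the same made-up factor
# leaves the ratio essentially unchanged.
scaled = methylation_ratio(300.0 * 1.7, 1000.0 * 1.7)

print(raw, scaled)
```

Additive background would not cancel in this way; the sketch only covers interference that affects both signals proportionally.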
The sample of the subject can be a thyroid tissue sample, preferably a biopsy sample, especially a needle aspiration sample. The control sample may be selected from the same type.
In preferred embodiments of the invention, combinable with any one of the other embodiments and gene selections mentioned above, the methylation status of said genes is determined in an upstream region of the open reading frame of the marker genes, in particular a promoter region. In addition or alternatively, it may be determined in a) a nucleic acid defined by the chromosomal locus as identified in table 1 or table 2; b) a CpG site encompassing the nucleic acid a), or c) one or more nucleic acids within at most 1000 nucleotides in length distanced from said nucleic acid a). The one or more nucleic acids preferably determined according to the invention are given by reference to the chromosomal locus (column MAPINFO in tables 1 and 2), which together with the chromosome number (column CHR) refers to the hg19 human genome assembly (version “GRCh/hg19” of February 2009; see http://genome-euro.ucsc.edu) and identifies an exact position in the genome by a single base. A further preferred nucleic acid or CpG locus for detection may be within the vicinity of the more preferred nucleic acid locus that includes the position of the chromosomal locus as identified in table 1 or table 2, e.g. within at most 800, at most 600, at most 500, at most 400, at most 300, at most 200, or at most 100 nucleotides in length distanced from said nucleic acid a).
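The distance criterion can be sketched as a simple coordinate check. This is an illustrative sketch only: the `Locus` type, the function name and the example coordinates are hypothetical, and both positions are assumed to be given on the same assembly (hg19) as chromosome (CHR) and single-base position (MAPINFO).

```python
from typing import NamedTuple


class Locus(NamedTuple):
    chrom: str  # chromosome, as in column CHR
    pos: int    # single-base position, as in column MAPINFO


def within_window(candidate, reference, max_distance=1000):
    """True if candidate lies on the same chromosome as reference and at most
    max_distance nucleotides away from it."""
    return (candidate.chrom == reference.chrom
            and abs(candidate.pos - reference.pos) <= max_distance)


# Hypothetical reference locus and candidate CpG positions.
reference = Locus("1", 1_000_000)
near = Locus("1", 1_000_800)
far = Locus("1", 1_001_001)
```

Narrower preferred windows (e.g. at most 100 nucleotides) correspond to a smaller `max_distance` argument.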
In a further aspect, the present invention provides a set of nucleic acid primers, primer pairs or hybridization probes being specific for a potentially methylated region of marker genes being suitable to diagnose or predict thyroid cancer according to any method of the invention. E.g. the set may comprise probes or primers or primer pairs for genes ABLIM3, ABTB2, ACOT7, ADM, ALOX5, ANKRD22, AXIN2, BHLHE40, C10orf107, C1orf21, C20orf85, CAPS, CDH13, CHKA, CIITA, CIT, CLN5, COBL, COL22A1, CPLX2, CYB561, DERL3, DNAH17, DNAH9, ELMO1, ELOVL5, ENO2, EPHA10, FAM20A, FMOD, FRMD4A, FRMPD2, GAD1, GALNT9, GJB6, GRIN2C, HK1, HLA-DOA, HOXB4, HOXD9, IFT140, IL17RD, IP6K3, IRF5, ITM2C, ITPR1, KCNAB1, KCNN4, KLK10, KRT80, LILRB1, LIPH, LOC100130238, LRP2, LRRC23, LYSMD2, MACC1, MICALCL, MINA, MIOX, MPPED2, MTSS1, MYO1G, NEK11, NRXN2, NT5C2, NTSR1, PAG1, PCDHA, PCNXL2, PDZK1IP1, PDZRN4, PER1, PIM3, PRDM11, PRR7, PTHLH, PTPRF, RBP1, RUNX2, SH2B3, SH3GL3, SLC22A9, SORBS2, SPC24, STRA6, SUPT3H, SYN2, TBX2, TFAP2B, TIMP4, TMC6, TMC8, TMEM204, TMOD2, TREM1, TRIM29, UHRF1, WSCD2, ZIC1, ZSCAN18. Preferably at least 3 probes and/or primers for genes selected from three or more of the genes of table 1 and/or table 2 are selected. Preferably at least one thyroid cancer gene is selected from ABTB2, ACOT7, ADM, ALOX5, ANKRD22, AXIN2, BHLHE40, C10orf107, C1orf21, C20orf85, CAPS, CHKA, CIITA, CIT, CLN5, COBL, COL22A1, CPLX2, DERL3, DNAH17, DNAH9, ELMO1, ELOVL5, ENO2, FAM20A, FMOD, FRMPD2, GALNT9, GJB6, GRIN2C, HK1, HLA-DOA, HOXD9, IFT140, IL17RD, IP6K3, ITM2C, ITPR1, KCNAB1, KCNN4, KRT80, LILRB1, LIPH, LOC100130238, LRP2, LRRC23, LYSMD2, MACC1, MICALCL, MINA, MPPED2, MTSS1, MYO1G, NEK11, NRXN2, NT5C2, NTSR1, PAG1, a PCDHA other than PCDHA13, PCNXL2, PDZK1IP1, PDZRN4, PER1, PIM3, PRDM11, PRR7, PTHLH, PTPRF, RUNX2, SH2B3, SH3GL3, SLC22A9, SORBS2, SPC24, SUPT3H, SYN2, TFAP2B, TIMP4, TMC6, TMC8, TMEM204, TMOD2, TREM1, TRIM29, UHRF1, WSCD2, ZSCAN18. Also preferred, the set contains at most 5000 probes or primers (or any maximum number given above).
Preferably, the primer pairs and probes are specific for a methylated upstream region of the open reading frame of the marker genes, in particular a promoter region; or specific for a) a nucleic acid defined by the chromosomal locus as identified in table 1 or table 2; b) a CpG site encompassing the nucleic acid a), or c) a nucleic acid within at most 1000 nucleotides in length distanced from said nucleic acid a). Preferably these are as further defined above.
Preferably, the set further comprises probes or primers specific for a potentially methylated region of marker genes, wherein said further probes or primers are non-specific for DNA methylation and are suitable for use as a control or normalization agent. Also, such methylation unspecific probes can be used to determine a beta difference as disclosed above. The inventive set may also comprise a computer readable memory device, such as a CD, DVD, Blu-ray disc or flash drive, with a computer program product for calculating such normalizations or, in general, for assisting in a method of the invention, including the statistical methods described above.
The set according to the invention may be provided in a kit together with a methylation specific restriction enzyme and/or a reagent for bisulfite nucleotide deamination; and/or the set may comprise probes on a microarray.
Preferably the set is provided on a solid surface, in particular a chip, whereon the primers or probes can be immobilized. Solid surfaces or chips may be of any material suitable for the immobilization of biomolecules such as the moieties, including glass, modified glass (aldehyde modified) or metal chips.
The primers or probes can also be provided as such, including lyophilized forms or being in solution, preferably with suitable buffers. The probes and primers can of course be provided in a suitable container, e.g. a tube or micro tube.
The inventive marker set, including certain disclosed subsets which can be identified with the methods disclosed herein, is suitable to distinguish between thyroid cancer, SN, FTA, FTC and PTC, in particular for diagnostic or prognostic uses.
The present invention is further explained by way of the following figures and examples, without being limited to these embodiments of the invention. The invention as described above can of course be combined with any element of these examples.
Fresh frozen thyroid nodules from 46 patients (10 PTC, 14 FTA, 11 FTC, 11 SN) were collected at the Medical University of Vienna, Department of Clinical Pathology in the years 1993-2009. Average age at surgery was 52±19 years. After surgery the thyroid tissue was immediately submerged in liquid nitrogen to preserve nucleic acids. The tissue samples were anonymized and forwarded to AIT. This study was approved by the Ethics Committee of the Medical University of Vienna.
Sample quality and sample allocation were evaluated by a qualified pathologist. All samples provided sufficient amounts of high quality DNA (purity [260/280]: 1.7-2.2) for all downstream analyses.
A section of each sample was histologically examined by a pathologist to confirm the tumor entity and quality. Approximately 100 mg of tissue was used for DNA and mRNA isolation. Genomic DNA was isolated using the AllPrep DNA/RNA Mini-Kit (Qiagen, Hilden, Germany) according to the manufacturer's protocol. DNA quantification was done on a Nanodrop 1000 based on absorbance measurements (260/280 nm).
For whole genome methylation analysis, the Infinium 450k methylation platform (Illumina, USA) was used (Quantitative cross-validation and content analysis of the 450k DNA methylation array from Illumina, Inc. BioMed Central Ltd 2012). Briefly, a total of 500 ng of genomic DNA was subjected to sodium bisulfite conversion using the EZ DNA Methylation Kit (Zymo Research, California, USA), following the manufacturer's protocol with a slight adaptation of the incubation protocol according to Illumina's recommendations. Instead of an isothermal incubation at 50° C. for 16 h, a cycling incubation was used (16 cycles; 95° C. for 30 sec; 50° C. for 60 min; storage at 4° C.). The DNA was eluted in 12 μl elution buffer.
An aliquot of the converted DNA (4 μl) of the 48 samples was assayed by Illumina's HumanMethylation450k BeadChip, following the manufacturer's protocol. The remaining 8 μl were stored at −20° C. as backup.
Briefly, 200 ng of total RNA was reverse transcribed. Amplification and labeling were performed by T7-polymerase in vitro transcription to give Cy3-labeled cRNA. The dye incorporation rate was assessed with a Nanodrop ND-1000 spectrophotometer and was consistently >9 pmol Cy3/μg RNA. Single color hybridizations were carried out using the Agilent Gene Expression Hybridisation Kit (p/n 5188-5242), following the manufacturer's instructions. Briefly, 1650 ng of cRNA was subjected to fragmentation (30 min at 60° C.) and then hybridization on 4×44K Human Whole-Genome 60-mer oligo-chips (G4112F, Agilent Technologies) in a rotary oven (10 rpm, 65° C., 17 h). Slides were disassembled and washed in solutions I and II according to the manufacturer's instructions, and dried using acetonitrile. Scanning was done on an Agilent microarray scanner (p/n G2565BA) followed by Agilent Feature Extraction Software.
Results from the BeadChips were initially extracted by Illumina's BeadStudio software with the Methylation Module. Beta scores as well as detection p-values were generated in BeadStudio.
Data of both platforms (Methylation and Gene Expression) were analyzed within the R environment. Missing values were imputed using KNN-Impute (Class Prediction by Nearest Shrunken Centroids, with Applications to DNA Microarrays: The Institute of Mathematical Statistics; 2003). The data was quantile normalized before statistical evaluation.
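Quantile normalization can be sketched as follows. This is a simplified illustration in Python rather than the R code used in the study; the function name and the toy values are hypothetical, and ties are broken by position, which may differ from the tie handling of established implementations.

```python
def quantile_normalize(columns):
    """Quantile-normalize a list of samples (one list of values per sample).

    Every sample is mapped onto a common reference distribution, which is the
    mean of the sorted samples at each rank.
    """
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all samples must contain the same number of values")

    sorted_cols = [sorted(col) for col in columns]
    reference = [sum(col[i] for col in sorted_cols) / len(columns)
                 for i in range(n)]

    normalized = []
    for col in columns:
        # Indices of the sample ordered by value (stable sort, ties by position).
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized


# Three hypothetical samples with four measurements each.
samples = [[5, 2, 3, 4], [4, 1, 4, 2], [3, 4, 6, 8]]
normalized = quantile_normalize(samples)
```

After normalization all samples share the same set of values, while the rank order of the measurements within each sample is preserved.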
For both methylation and gene expression data, differential methylation/expression analysis was performed using ANOVA models with empirical Bayes moderated variances as implemented in the limma package (Bioconductor) (Bioconductor: open software development for computational biology and bioinformatics: BioMed Central). Similarly, ROC analysis was performed to assess the diagnostic relevance of the findings.
For the selection of relevant marker genes and CpG sites from the methylation data, the following thresholds were chosen: an AUC value (from ROC analysis) >0.8, an absolute beta-difference >0.1 and a p-value <0.05 (Benjamini-Hochberg corrected) in the methylation analysis, and a p-value <0.05 in the expression analysis.
Selected markers were used to train classification models using a nearest centroid algorithm implemented in the PAMR package. In order to assess whether classification accuracies depend on the size of the gene panel used in classification, a random set of n genes from the pool of genes surviving the thresholds (AUC>0.8 AND absolute beta-difference >0.1 AND p-value <0.05 AND p-value in gene expression <0.05, see above) was drawn and classification accuracies were determined in leave-one-out cross-validation (LOOCV). This procedure was repeated 1000 times for each n.
The sample set was subjected to genome wide methylation analysis using the HumanMethylation450 BeadChip from Illumina. We selected genes according to the rules specified in example 1 with the aim of selecting marker genes and CpG sites with strong differential methylation (beta difference, i.e. the difference between the methylation specific probe and methylation non-specific probe, and p-value), predictive power (AUC) and an effect on gene expression (p-value from gene expression).
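The combined selection criteria can be expressed as a simple filter. The sketch below is illustrative only: the record type, the field names and all example values are hypothetical, and the methylation p-value is assumed to be already Benjamini-Hochberg corrected.

```python
from dataclasses import dataclass


@dataclass
class MarkerStats:
    cpg_id: str               # hypothetical identifier
    gene: str                 # hypothetical gene label
    auc: float                # AUC from ROC analysis
    beta_difference: float    # difference in methylation between groups
    adj_p_methylation: float  # corrected p-value, methylation analysis
    p_expression: float       # p-value, expression analysis


def passes_selection(m, min_auc=0.8, min_abs_beta_diff=0.1, max_p=0.05):
    """Apply all four thresholds; a marker must satisfy every one of them."""
    return (m.auc > min_auc
            and abs(m.beta_difference) > min_abs_beta_diff
            and m.adj_p_methylation < max_p
            and m.p_expression < max_p)


# Hypothetical candidates: only the first satisfies all thresholds.
candidates = [
    MarkerStats("cpg_example_1", "GENE_A", 0.91, -0.25, 0.001, 0.01),
    MarkerStats("cpg_example_2", "GENE_B", 0.85, 0.05, 0.001, 0.01),   # small effect
    MarkerStats("cpg_example_3", "GENE_C", 0.75, 0.30, 0.001, 0.01),   # low AUC
    MarkerStats("cpg_example_4", "GENE_D", 0.88, 0.20, 0.001, 0.20),   # no expression effect
]
selected = [m.cpg_id for m in candidates if passes_selection(m)]
```

The absolute value of the beta difference is used so that both hypermethylated and hypomethylated sites can pass.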
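The resampling procedure can be outlined with a short sketch. It is a simplified stand-in, not the analysis code of the study: it uses a plain nearest centroid classifier without the centroid shrinkage applied by PAMR, it is written in Python rather than R, and the function names and toy data are hypothetical.

```python
import random
from statistics import median


def nearest_centroid_predict(train_x, train_y, sample):
    """Assign sample to the class whose mean profile (centroid) is closest."""
    centroids = {}
    for label in set(train_y):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(sample, centroid))

    return min(sorted(centroids), key=lambda label: squared_distance(centroids[label]))


def loocv_error(x, y):
    """Leave-one-out cross-validation: fraction of misclassified samples."""
    errors = 0
    for i in range(len(x)):
        train_x = x[:i] + x[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if nearest_centroid_predict(train_x, train_y, x[i]) != y[i]:
            errors += 1
    return errors / len(x)


def median_error_for_panel_size(x, y, n_markers, repeats=1000, seed=0):
    """Draw n_markers random markers `repeats` times; return the median LOOCV error."""
    rng = random.Random(seed)
    rates = []
    for _ in range(repeats):
        chosen = rng.sample(range(len(x[0])), n_markers)
        subset = [[row[j] for j in chosen] for row in x]
        rates.append(loocv_error(subset, y))
    return median(rates)


# Toy methylation values (rows: samples, columns: markers), clearly separated.
toy_x = [[0.10, 0.20, 0.15, 0.25], [0.20, 0.10, 0.25, 0.15], [0.15, 0.25, 0.10, 0.20],
         [0.80, 0.90, 0.85, 0.75], [0.90, 0.80, 0.75, 0.85], [0.85, 0.75, 0.90, 0.80]]
toy_y = ["benign"] * 3 + ["malignant"] * 3
toy_median_error = median_error_for_panel_size(toy_x, toy_y, n_markers=2, repeats=50)
```

Running `median_error_for_panel_size` for increasing values of `n_markers` gives the kind of panel size versus misclassification rate relationship described in the text.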
This yielded the inventive marker sets, which contain markers of two kinds: markers which distinguish between benign and malignant thyroid nodules, and markers which distinguish between FTA and FTC. The first subset of markers consists of 126 CpG sites which map to 63 genes (many genes are represented by more than one CpG site). The second subset of markers consists of 73 CpG sites which map to 65 genes. Tables 1 and 2 of methylated genes plus their graphical representation as boxplots and ROC curves are given above in the detailed description and illustrated in the figures. 11 genes are shared between these two tables (ACOT7, C1orf21, PCNXL2, KCNAB1, ABLIM3, TREM1, COBL, WSCD2, CIT, AXIN2, SPC24); the rest are unique.
Unsupervised clustering based on these genes shows clear patterns of methylation which correlates to the histological endpoint used for analysis (
Owing to the complex nature of tumours on the one hand, and the redundancy in biological processes on the other hand, using only one gene or CpG site has a high risk. Therefore, two sets of markers in tables 1 and 2 (with 126 and 73 CpG sites, respectively) are provided, which greatly improve on single marker diagnosis. When a minimum of markers is drawn, a good classification accuracy is achieved—see
For the classification task benign vs malignant, 6 genes out of a total of 126 need to be drawn to yield a median misclassification rate of <15%, which is the minimum of what the best single genes out of the pool can achieve (PDZK1IP1 or SORBS2). Similarly, for the task of predicting FTA vs FTC, 6 genes also need to be drawn out of the pool of 73 genes to yield a misclassification rate of <20%, which is the minimum of what the best single gene out of the pool can achieve (C1orf21). Some markers of the inventive sets are also suitable for single marker diagnosis, but even in these cases, an improvement can be achieved by selecting more than one marker.
The drop in classification accuracy shown here is in stark contrast to recent work done by Rodriguez-Romero et al. (J. Clin. Endocrinol. Metab. 2013, 98:2811-2821). They measured DNA methylation in thyroid nodules using the predecessor platform from Illumina, which contained probes for 27000 CpG sites. They report 8613 CpG sites as differentially methylated at a p-value <0.05, but do not report any diagnostically relevant values (accuracies, AUC-values, etc.). Furthermore, they do not report any combination of markers to be diagnostically relevant.
The result of the study is a novel set of biomarkers combined in two classifiers for correct prediction of benign and malignant thyroid nodules as well as for the discrimination of FTCs and FTAs. The set of biomarkers suggests that there are detectable epigenetic alterations which allow the identification of the different thyroid nodule entities. In contrast to other studies we did not focus exclusively on the 5′UTR region of certain genes, but included any gene region for which an informative character was suggested by the microarray experiments, and we included gene expression data to assess whether any methylation change has an effect on gene expression or not.
This allows the use of the biomarkers in the clinical routine setting. Furthermore, the presented set of biomarkers based on DNA methylation is easier to handle and more amenable compared to biomarkers based on mRNA. Replacing or aiding cytology by an assay covering the newly defined set of biomarkers should result in fewer patients with indeterminate cases of thyroid nodules. That would also facilitate patient care by reducing unnecessary surgeries in indeterminate cases and advance patient care towards personalized medicine.
Number | Date | Country | Kind |
---|---|---|---|
14180318.9 | Aug 2014 | EP | regional |
This application is a continuation of U.S. application Ser. No. 15/502,591 filed 8 Feb. 2017, which is a national phase application under 35 U.S.C. § 371 of International Application No. PCT/EP2015/068397 filed 10 Aug. 2015, which claims priority to European Patent Application No. 14180318.9 filed 8 Aug. 2014. The entire contents of each of the above-referenced disclosures is specifically incorporated by reference herein without disclaimer.
 | Number | Date | Country |
---|---|---|---|
Parent | 15502591 | Feb 2017 | US |
Child | 16529053 | | US |