The present application relates to methods and compositions for detecting thyroid cancer in mammals, in particular in human beings. The invention provides novel biomarkers which allow effective, specific and sensitive detection of thyroid cancer, particularly to assist in the diagnosis of thyroid lesions in human subjects. The invention further relates to tools and/or kits (such as reagents, probes, primers, chips or arrays) suitable for implementing said methods, and the preparation and uses thereof.
Thyroid cancer is the most common malignancy of the endocrine system (Curado, Edwards et al. 2007). Over 60,000 new cases are diagnosed each year in the US; despite its low malignancy potential, it is estimated that 1,890 patients will die of this disease in 2014 (http://seer.cancer.gov/statfacts/html/thyro.html accessed Oct. 15, 2014).
Three different types of thyroid cancer have been defined according to their histological features: differentiated thyroid cancer (DTC), deriving from epithelial cells of the thyroid follicles; medullary thyroid cancer (MTC); and anaplastic thyroid cancer (ATC). Approximately 90% of diagnosed thyroid cancers correspond to DTC, with papillary (PTC) histology being the most frequent (75%), followed by follicular (FTC) (10%), Hürthle cell (5%), and poorly differentiated carcinomas (1-6%). Overall, MTC accounts for approximately 10% of all thyroid tumors and ATC barely 1% [2]. PTC is the most frequent type, whereas FTC is less common but more aggressive. Hürthle cell thyroid cancer (HCC) is an uncommon but distinct and generally more aggressive type of thyroid cancer, which is now classified by the World Health Organization (WHO) as a subtype of follicular cancer (Harris and Bible 2011). Most DTC patients have a very good prognosis if diagnosed at early stages, with a 91% survival rate at 20 years when the classical treatment with surgery followed by suppression of thyroid stimulating hormone (TSH) is employed.
Thyroid nodules are a frequent clinical finding during thyroid examination, with a prevalence ranging between 4 and 7% in the general population. This prevalence is even higher when the cervical area is examined by ultrasonography and can reach 50% in patients over the age of 65. The most accurate tool for evaluating thyroid nodules is fine needle aspiration (FNA), with an accuracy >95%. There are several classification systems for thyroid cytology; the most recognized and most widely used is the Bethesda classification (Cibas and Ali, 2009), composed of 6 categories for which the recommended patient management is different:
Because FNA is unable to provide a definitive diagnosis for the nodules in categories 3, 4 and 5, most patients with indeterminate cytology undergo diagnostic surgery (i.e., partial thyroidectomy) to establish a histopathological diagnosis. If malignancy is identified, the patients require a complete thyroid resection. In contrast, if the pathological exam reveals a benign nodule, the partial thyroidectomy proves to have been unnecessary (frequency: 60 to 80%). This situation of indeterminate cytology is encountered in up to 25% of the patients with a thyroid nodule undergoing FNA and shows the need for more powerful molecular tools to decrease the number of unnecessary surgeries.
Recent work has investigated the potential benefit of combining microscopic and mutation analyses (Nikiforov et al, 2011). There are several molecular tests available such as Thyroseq (Nikiforova et al, 2013) or Afirma (Alexander et al, 2012). These tests, however, have either low specificity or low sensitivity. Epigenetic and peripheral blood markers have also been studied but they have limited sensitivity and fail to detect 30% to 40% of cancers (Hu et al, 2006). WO2013/086429 provides long lists of genes presented as potentially related to thyroid cancer. Eszlinger et al (Endocrine reviews 28(3), 2007, 322) and Huang et al (PNAS 98(26), 2001, 15044) discuss gene expression profiling of thyroid tumors. A number of genes are listed in these documents but no hint towards specific combinations thereof is provided which would allow specific and effective thyroid cancer management.
There is a need for alternative methods of detecting thyroid cancer that are specific and sensitive, therefore increasing the number of definitive diagnoses and reducing unnecessary diagnostic surgeries.
The present invention provides novel biomarkers which allow very specific and sensitive diagnosis of thyroid cancer. Starting from approximately 20,000 genes and 35,000 alternative splicing events, the inventors have been able to identify target genes as highly relevant biomarkers of thyroid cancer. By combining expression markers from these genes and/or domains of these genes, it is possible to determine, with a high level of certainty, the malignant vs benign status of thyroid lesions in human samples. The invention specifically improves the ability to distinguish malignant versus benign profiles within a population of FNA samples being considered as “indeterminate” (especially in categories III and IV from the Bethesda classification) during a cytological exam. By providing a definitive diagnosis to subjects initially identified as “indeterminate”, the invention improves the clinical management of patients with thyroid nodules in order to avoid unnecessary thyroid surgeries.
An object of the invention more particularly relates to a method for aiding in the diagnosis of thyroid cancer in mammals, comprising measuring a set of target biomarkers in a biological sample from the mammal and diagnosing said thyroid cancer by comparing the measured value to a reference value, wherein the set of target biomarkers comprises an expression product of at least two genes selected among CITED1, TENM1, CCM2, SCLY and IGSF10 or at least two distinct domains of at least one expression product of a gene selected among CITED1, TENM1, CCM2, SCLY and IGSF10.
A further object of the invention relates to a method as defined above for determining whether the mammal has or is likely to have a malignant thyroid lesion.
Another object of the invention relates to a method for diagnosing thyroid cancer in mammals, comprising measuring an expression product of at least two genes selected among CITED1, TENM1, CCM2, SCLY and IGSF10, or at least two distinct domains of at least one expression product of a gene selected among CITED1, TENM1, CCM2, SCLY and IGSF10, in a biological sample from the mammal, and diagnosing said thyroid cancer by comparing the measured value to a reference value.
The invention further relates to the use of a set of target biomarkers comprising an expression product of at least two genes selected among CITED1, TENM1, CCM2, SCLY and IGSF10, or at least two distinct domains of at least one expression product of a gene selected among CITED1, TENM1, CCM2, SCLY and IGSF10, for the diagnosis in vitro or ex vivo of thyroid cancer.
In a preferred embodiment, the method comprises measuring in a biological sample from the subject an expression product of CITED1 and an expression product of TENM1. In a further preferred embodiment, the method comprises measuring in a biological sample from the subject an expression product of CITED1, TENM1, CCM2, SCLY and IGSF10 genes.
The above biomarkers may be further combined with additional biomarkers. In this regard, in a particular embodiment, the set of biomarkers further comprises an expression product of the FRMD3 gene.
In further particular embodiments of the invention, the set of biomarkers further comprises at least one protein or RNA encoded by a gene selected from Table A, preferably each gene listed in Table A.
In further particular embodiments of the invention, the set of biomarkers further comprises at least one protein or RNA encoded by a gene selected from Table B, preferably each gene listed in Table B.
In a particular embodiment of the invention, the set of biomarkers comprises one expression product of CITED1, TENM1, CCM2, SCLY, IGSF10 and FRMD3 and of each gene listed in Table A and in Table B.
In a specific mode of carrying out the invention, the method comprises:
a. providing a biological sample (comprising thyroid cells) from the subject having a thyroid condition or suspected to have a thyroid condition; and
b. measuring in said biological sample the expression of at least two genes selected among CITED1, TENM1, CCM2, SCLY and IGSF10, or at least two distinct domains of at least one expression product of a gene selected among CITED1, TENM1, CCM2, SCLY and IGSF10.
A further object of the invention resides in a kit comprising a solid support comprising at least two capture agents attached thereto, wherein each capture agent binds to an expression product of a distinct gene selected among CITED1, TENM1, CCM2, SCLY and IGSF10, or to at least two distinct domains of at least one expression product of a gene selected among CITED1, TENM1, CCM2, SCLY and IGSF10.
Another object of the invention relates to a method of treating thyroid cancer in a subject, the method comprising (i) measuring the expression product of at least 2 genes selected among CITED1, TENM1, CCM2, SCLY and IGSF10 in a sample from the subject, said measure allowing the diagnosis of thyroid cancer, and (ii) resecting the cancer when the subject is diagnosed to have thyroid cancer in step (i).
The invention may be used on any mammal, in particular any human subject. It is particularly effective to discriminate between malignant and benign thyroid lesions or nodules in a subject, and therefore to determine whether a thyroid lesion in a subject is malignant. The invention may be used on cellular samples, tissue biopsy or fine thyroid needle aspirate.
The present invention provides novel methods for diagnosing thyroid cancer based on the analysis of new biomarkers. Upon comprehensive analysis of all of the human genes in biological samples from patients with malignant or benign nodules, the inventors identified genes which represent strong diagnostic biomarkers of thyroid cancer and may be used to determine, with a high level of certainty, the malignant vs benign status of thyroid lesions in human subject samples. Expression of these gene products is strongly correlated to the disease and allows discrimination between malignant and benign nodules of the thyroid gland based on only two genes or on only two distinct domains of one of said genes.
An object of the invention more particularly relates to a method for (aiding in) the diagnosis of thyroid cancer in mammals, comprising the measurement of an expression product of at least two genes selected among CITED1, TENM1, CCM2, SCLY and IGSF10 or of two distinct domains of one of said genes, in a sample from the mammal and the diagnosis of said thyroid cancer by comparing the measured value to a reference value.
Another object of the invention relates to a method for (aiding in) the diagnosis of thyroid cancer in mammals, comprising the measurement of the expression of at least two distinct domains of at least one, preferably at least 2 genes selected among CITED1, TENM1, CCM2, SCLY and IGSF10 in a sample from the mammal and the diagnosis of said thyroid cancer by comparing the measured value to a reference value.
In a particular embodiment, the method comprises:
In a further particular embodiment, the invention is conducted after cytological exam and allows the diagnosis of patient samples remaining “indeterminate” after said cytological exam. In this regard, a particular method of the invention comprises:
Target Genes
The invention discloses the identification of genes whose expression products allow efficient diagnosis of thyroid cancer. Said genes have been selected from a set of approximately 20,000 human genes and demonstrate a greater than 97% specificity in the diagnosis of thyroid cancer, which is remarkable. These genes, in various combinations, are sufficient to produce strong diagnostic results for thyroid cancer, which is very advantageous when compared with currently proposed tests comprising more than 100 genes.
Preferred target genes for use in the invention are CITED1, TENM1, CCM2, SCLY and IGSF10. As disclosed in the examples, detecting a gene product of at least 2 of these genes, or at least two domains of a gene product of one of these genes, is sufficient to provide a reliable diagnosis of thyroid cancer.
The full-length nucleic acid sequences of these genes are publicly available and may be found in the NCBI Entrez gene database. The identifier for each of these genes is provided below as well as in Table C:
Within the context of the present invention, the term “CITED1 gene” designates preferably a human CITED1 gene, particularly a nucleic acid molecule or sequence comprising (i) a sequence of gene#4435 or a sequence complementary thereto; or (ii) a natural variant of a sequence of (i) such as a polymorphism; or (iii) a sequence having at least 90% identity, preferably at least 95, 96, 97, 98 or 99%, to a sequence of (i) or (ii); or (iv) a sequence hybridizing under stringent conditions to a sequence of (i) or (ii).
The same definition applies to the TENM1, CCM2, SCLY and IGSF10 genes by reference to gene#10178, 83605, 51540 and 285313, respectively, as well as to any other gene quoted in the present application, by reference to the corresponding reference sequence (see Tables A to C).
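As an illustration of the identity criterion used in the above definition, the following minimal sketch computes the percent identity between two sequences and checks it against the 90% threshold. It assumes that the two sequences have already been aligned to the same length (a real comparison would first run a sequence alignment tool); the function names and the 20-base example sequences are hypothetical and are not taken from any sequence of the application.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of identical positions between two pre-aligned sequences.

    Assumption: both sequences have the same length after alignment;
    gap characters ("-") never count as a match.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a == b and a != "-"
    )
    return 100.0 * matches / len(seq_a)


def meets_identity_threshold(seq_a: str, seq_b: str, threshold: float = 90.0) -> bool:
    """True when the identity is at least `threshold` percent."""
    return percent_identity(seq_a, seq_b) >= threshold


# Hypothetical 20-base sequences, for illustration only.
reference = "ATGCCGTTAGCTAGGCTTAA"
variant = "ATGCCGTTAGCTAGGCTTAT"  # differs from the reference at the last base

identity = percent_identity(reference, variant)
within_definition = meets_identity_threshold(reference, variant)
```

The threshold argument can be raised to 95, 96, 97, 98 or 99 to reflect the preferred identity levels.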
The present invention shows that the above genes or alternative isoforms of those genes are deregulated in thyroid cancer subjects so that any domain or sequence of an expression product encoded by said genes may be measured. For improved detection, however, especially when the method is based on a combined measure of the expression product of only 1 to 3 of said genes, the invention shows that particular domains or sequences (“target sequences”) may be targeted in the expression products of these genes for improved thyroid cancer diagnosis.
In this regard, in relation to CITED1 gene, the method preferably comprises measuring (e.g., determining the presence, absence or amount of) at least one sequence encoded by exon4, exon3, or exon(−2) in an expression product of the CITED1 gene. As shown in the experimental section, by designing reagents (e.g., probes or primers) that detect such preferred domains of CITED1 gene products, a more specific and sensitive detection of thyroid cancer is obtained. In a more preferred embodiment, the method comprises measuring (e.g., determining the presence, absence or amount of) a target sequence of CITED1 comprised in any of the sequences listed as SEQ ID NOs: 106-119.
With regard to TENM1, in a particular embodiment, the method comprises measuring (e.g., determining the presence, absence or amount of) at least one sequence encoded by exon1, 3, 13, 15, 17, 18, 19, 25 or 32 in an expression product of the TENM1 gene. As shown in the experimental section, by designing reagents that detect such preferred domains of TENM1 gene products, a more specific and sensitive detection of thyroid cancer is obtained. In a more preferred embodiment, the method comprises measuring (e.g., determining the presence, absence or amount of) a target sequence of TENM1 comprised in any of the sequences listed as SEQ ID NOs: 817-838.
With regard to CCM2, in a particular embodiment, the method comprises measuring a target sequence comprised in SEQ ID NO: 83.
With regard to SCLY, in a particular embodiment, the method comprises measuring a target sequence comprised in SEQ ID NO: 749.
With regard to IGSF10, in a particular embodiment, the method comprises measuring a target sequence comprised in SEQ ID NO: 335.
Moreover, in a further particular embodiment, the invention comprises measuring at least two distinct target sequences of an expression product of the above target genes. The invention indeed shows that by monitoring two or more distinct domains in e.g., CITED1 or TENM1 gene products, an efficient and reliable diagnosis of thyroid cancer is made.
In this regard, in a particular embodiment, the method comprises measuring at least one sequence encoded by exon4 of the CITED1 gene and at least one sequence encoded by exon 3 and/or exon(−2) of the CITED1 gene.
In another particular embodiment, the method comprises measuring at least one sequence encoded by exon17, 18, 19, 25, or 32 of the TENM1 gene and at least one sequence encoded by exon1, 3, 13 or 15 of the TENM1 gene.
The results shown in the experimental section demonstrate that, by detecting distinct target sequences or domains of a single target gene as defined above, a very reliable detection of thyroid cancer is obtained. For instance, by detecting 3 domains of CITED1, a sensitivity and specificity of 75% and 90%, respectively, can be achieved, and by detecting several distinct target sequences as defined above of CITED1 and TENM1, a specificity of 92.5% can be obtained, which is remarkable and suitable to design a commercial test.
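For reference, sensitivity and specificity figures such as those quoted above are derived from the counts of correctly and incorrectly classified samples. The sketch below shows the standard calculation; the sample counts are hypothetical, chosen only so that the arithmetic reproduces 75% and 90%, and are not the data of the experimental section.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Fraction of malignant samples that the test reports as malignant."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives: int, false_positives: int) -> float:
    """Fraction of benign samples that the test reports as benign."""
    return true_negatives / (true_negatives + false_positives)


# Hypothetical cohort: 40 malignant and 40 benign samples.
sens = sensitivity(true_positives=30, false_negatives=10)   # 30 / 40
spec = specificity(true_negatives=36, false_positives=4)    # 36 / 40

print(f"sensitivity = {sens:.1%}, specificity = {spec:.1%}")
```

A high specificity means that few benign nodules are reported as malignant, which is the property that limits unnecessary diagnostic surgeries.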
In a preferred embodiment, the invention comprises measuring a gene product of at least the CITED1 and TENM1 genes.
In another embodiment, the invention comprises measuring a gene product of at least the CITED1 and CCM2 genes.
In another embodiment, the invention comprises measuring a gene product of at least the CITED1 and SCLY genes.
In another embodiment, the invention comprises measuring a gene product of at least the CITED1, TENM1, CCM2, SCLY and IGSF10 genes.
Furthermore, while the above 5 target genes already provide a highly reliable test, it is possible to further improve the method by combining said genes with further marker genes identified by the inventors.
In this regard, in a particular and preferred embodiment, the method further comprises the determination of at least one gene product of the FRMD3 gene (gene#257019). In a particular embodiment, for measuring FRMD3 gene expression, the method comprises measuring a target sequence comprised in any of the sequences listed as SEQ ID NOs: 249 to 269.
In a preferred embodiment, the invention comprises measuring a gene product of at least the CITED1 and FRMD3 genes.
In another embodiment, the invention comprises measuring a gene product of at least the CITED1, TENM1 and FRMD3 genes.
In another embodiment, the invention comprises measuring a gene product of at least the CITED1, TENM1, CCM2, SCLY, IGSF10, and FRMD3 genes.
Moreover, to design more universal tests, the invention combines additional marker(s) identified by the inventors, selected from the expression products of the genes listed in Table A and/or in Table B below:
For each gene included in Table A and in Table B, a most preferred sequence (or domain) to be targeted for measuring gene expression according to the invention is provided in Table C. Accordingly, measuring BCL9 preferably comprises measuring a target sequence comprised in any one of the sequences listed as SEQ ID NOs: 52 to 54 in Table C; measuring ENTPD2 preferably comprises measuring a target sequence comprised in SEQ ID NO: 205 as listed in Table C, etc.
Specific combinations of target gene products are provided in the following Table D, which combinations can provide strong specificity and sensitivity for detection of thyroid cancer:
Methods for Measuring Gene Expression Products
Within the context of the invention, the “measure” of a gene or gene expression product designates a measure or determination of the presence, absence, amount or alteration of such gene product. More preferably, the term “measure” or “measuring” designates a determination of the presence or amount of a gene product or domain, or of the level of expression of a gene. The gene product may be an RNA (e.g., a messenger RNA) or a polypeptide (or protein).
In a first preferred embodiment, the gene product is an RNA (which may be converted during the method into a cDNA). Various techniques for detecting a nucleic acid (e.g., RNA) in a sample can be used in the present invention, such as for example northern blot, selective hybridisation, the use of supports covered with oligonucleotide probes, nucleic acid amplification such as for example RT-PCR, quantitative PCR or ligation-PCR, etc. These methods can include the use of a nucleic probe (for example an oligonucleotide) capable of detecting selectively or specifically the target nucleic acid in the sample. Amplification can be performed according to various methods known to the person skilled in the art, such as PCR, LCR, transcription mediated amplification (TMA), strand displacement amplification (SDA), NASBA, the use of allele specific oligonucleotides (ASO), allele specific amplification, RNAseq. Detection can also be made using, e.g., Southern blot, single-strand conformation analysis (SSCA), in situ hybridisation (e.g., FISH), gel migration, heteroduplex analysis, NextGen sequencing, etc.
According to a preferred embodiment, the method comprises determining the presence or absence or (relative) amount of an RNA encoded by a target gene as defined above, preferably by selective hybridisation or selective amplification.
Measure by Selective Hybridization
Selective hybridisation is typically performed using nucleic probes, preferably immobilised on a support, such as a solid or semi-solid support having at least one surface, flat or not, for immobilising nucleic probes. Such supports are, for example, a slide, bead, membrane, filter, column, plate, etc. They can be made out of any compatible material, such as in particular glass, silica, plastic, fiber, metal, polymer, etc. The nucleic probes can be any nucleic acid (DNA, RNA, PNA, etc.), preferably single-stranded, with a sequence specific to a target RNA or DNA molecule. The probes typically comprise from 5 to 400 bases, preferably from 8 to 200, more preferentially less than 100, and even more preferentially less than 75, 60, 50, 40 or even 30 bases. The probes can be synthetic oligonucleotides, produced on the basis of the sequences of the target genes, according to standard synthesis techniques. Such oligonucleotides typically comprise from 10 to 50 bases, preferably from 20 to 40, for example approximately 25 bases. In a particularly advantageous embodiment, so as to improve the detection of signal, several different oligonucleotides (or probes) are used to detect the same target molecule (transcribed from the amplification of the patient's RNA to produce either cRNA or cDNA) during hybridisation. This may include specific oligonucleotides of various regions of the same target sequence, or centred differently on a given region. Advantageously, use is made of probe sets comprising 1-3 probes, which can be overlapping or not, completely or partially, and which are specific of the same target molecule. Use can also be made of probe pairs, in which one member is paired perfectly with the target sequence, and the other presents a mismatch, thus making it possible to estimate background signal. Probes can be designed to hybridise with a region of an exon or an intron, or with an exon-exon or exon-intron junction region. 
Thus, the probes make it possible to reveal and to distinguish various alternative spliced isoforms.
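The use of probe pairs to estimate background signal, as described above, can be sketched as follows. This is only one simple way of exploiting the perfect-match and mismatch intensities (subtraction, clipping at zero, then averaging over the probe set); the text does not prescribe a particular calculation, and the function name and intensity values below are hypothetical.

```python
from statistics import mean


def background_corrected_signal(probe_pairs):
    """Average background-corrected signal for one target molecule.

    probe_pairs: list of (perfect_match_intensity, mismatch_intensity)
    tuples, one per probe pair directed to the same target.
    Assumption: the mismatch probe intensity approximates the
    non-specific background for its paired perfect-match probe.
    """
    if not probe_pairs:
        raise ValueError("at least one probe pair is required")
    # Subtract the mismatch signal; negative differences are treated as no signal.
    corrected = [max(pm - mm, 0.0) for pm, mm in probe_pairs]
    return mean(corrected)


# Hypothetical raw intensities for a probe set of three pairs.
pairs = [(1200.0, 200.0), (950.0, 150.0), (300.0, 400.0)]
signal = background_corrected_signal(pairs)
```

In this example the third pair shows more mismatch than perfect-match signal and therefore contributes zero to the average.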
In a preferred embodiment, use is made of probes whose sequence comprises all or part of a nucleic acid sequence selected from any one of SEQ ID NOs: 1-918 or of a sequence complementary thereto. In a preferred mode, use is made of nucleic acid probes having a length between 15 and 50 bases, more preferentially between 15 and 40 bases, and whose sequence is identical to a fragment of a sequence selected from any one of SEQ ID NOs: 1-918 or of a sequence complementary thereto. The probe can also be designed in the opposite orientation. Such probes constitute another object of the present application, as well as the use thereof for diagnosing thyroid cancer in a subject.
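By way of illustration, candidate probes of the above kind (fragments of 15 to 50 bases of a target sequence, or of its complementary strand) can be enumerated with a short script. The sketch below is a hypothetical helper, not a probe design method disclosed in the application: it ignores practical criteria such as melting temperature or cross-hybridisation, and the 30-base target is an invented sequence rather than one of SEQ ID NOs: 1-918.

```python
# Translation table for the DNA complement (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, read 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def candidate_probes(target: str, length: int = 25, step: int = 1,
                     complementary: bool = False) -> list:
    """List every window of `length` bases found in `target`.

    With complementary=True, each window is returned as its reverse
    complement, i.e. a fragment of the complementary sequence.
    """
    if not 15 <= length <= 50:
        raise ValueError("probe length must be between 15 and 50 bases")
    if step < 1:
        raise ValueError("step must be at least 1")
    windows = [target[i:i + length]
               for i in range(0, len(target) - length + 1, step)]
    if complementary:
        windows = [reverse_complement(w) for w in windows]
    return windows


# Hypothetical 30-base target sequence, for illustration only.
target = "ATGGCTAGCTAGGATCCGATCGATCGTACG"
sense_probes = candidate_probes(target, length=25)
complementary_probes = candidate_probes(target, length=25, complementary=True)
```

Overlapping windows obtained with a small step correspond to probes "centred differently on a given region"; a larger step yields non-overlapping probes.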
Specific examples of probes are provided in Table E. These probes have been designed to bind target sequences as disclosed in Table C. Further probes can be designed using information provided in the invention, directed specifically to each target sequence of Table C. Each probe as disclosed in Table E represents a specific object of the invention.
An object of the invention thus relates to a nucleic acid having any one of the nucleotide sequences as described in Table E, as well as to the use of these nucleic acids for diagnosing thyroid cancer.
The probes can be synthesised in advance and then deposited on the support, or synthesised directly in situ, on the support, according to methods known to the person skilled in the art. The probes can also be manufactured by genomic or molecular techniques, for example by amplification, recombination, ligation, etc.
The probes can be hybridised with nucleic acids, preferably RNAs contained in test samples, under standard, conventional conditions allowing specific hybridisation to occur. Hybridisation can be performed under standard conditions, known to and adjustable by the person skilled in the art (see Sambrook, Fritsch, Maniatis (1989) Molecular Cloning, Cold Spring Harbor Laboratory Press). Most notably, hybridisation can be performed under conditions of high, medium or low stringency, according to the level of sensitivity needed, the quantity of material available, etc. For example, suitable hybridisation conditions include a temperature between 55° C. and 63° C. for 2 hours to 18 hours. Other hybridisation conditions, adapted to high density supports, are for example a hybridisation temperature between 45° C. and 55° C. After hybridisation, various washes can be performed to eliminate the non-hybridised molecules, typically in SSC buffers including SDS, such as a buffer comprising 0.1× to 10×SSC and 0.5% to 0.01% SDS. Other wash buffers containing SSPE, MES, NaCl or EDTA can also be used.
In a typical embodiment, the nucleic acids (or arrays or supports) are pre-hybridised in hybridisation buffer (Rapid Hybrid Buffer, Amersham) typically containing 100 μg/ml of salmon sperm DNA at 65° C. for 30 min. The nucleic acids of the sample are then placed in contact with the probes (typically applied to the support or the array) at 65° C. for 2 hours to 18 hours. Preferably, the nucleic acids of the sample are marked beforehand by any known marker (biotin, radioactive, enzymatic, fluorescent, luminescent, etc.). The supports are then washed in 5×SSC, 0.1% SDS buffer at 65° C. for 30 min., then in a 0.2×SSC, 0.1% SDS buffer. The expression profile is analysed according to standard techniques, such as for example by measuring labelling on the support by means of a suitable instrument (for example InstantImager, Packard Instruments). Hybridisation conditions naturally can be adjusted by the person skilled in the art, for example by modifying hybridisation temperature and/or the salt concentration of the buffer as well as by adding auxiliary substances such as formamide or single-strand DNA.
A particular object of the invention thus resides in a method to diagnose thyroid cancer in mammals comprising contacting, under conditions allowing hybridisation between complementary sequences to occur, (i) nucleic acids from a sample from the subject and (ii) a set of probes specific of target gene expression products as defined above; and diagnosing thyroid cancer by measuring hybridization to determine expression values and comparing said values to a reference value.
In a preferred embodiment, the method uses:
wherein the term “part” advantageously indicates a region of 15 to 50 consecutive nucleotides, preferably of 15 to 30 nucleotides, even more preferably of 20-30 nucleotides.
In a more preferred embodiment, the method uses:
wherein the term “part” advantageously indicates a region as defined above.
In a particularly preferred embodiment, the method uses:
wherein the term “part” advantageously indicates a region as defined above.
In another preferred embodiment, the method uses:
In another preferred embodiment, the method uses:
The invention may utilize additional probes specific for any group or combination of target genes of the invention, as preferably listed in the above table.
Measure by Selective Amplification
Selective amplification is preferably performed using a primer or a pair of primers to amplify all or part of one of the target nucleic acids in the sample, when the target nucleic acid is present. The primer can be specific for a target sequence according to any one of SEQ ID NOs: 1 to 918, or for a region flanking the target sequences in a nucleic acid of the sample. The primer typically comprises a single-strand nucleic acid, with a length advantageously between 5 and 50 bases, preferably between 5 and 30 bases. Such a primer constitutes another object of the present application, as well as the use thereof (primarily in vitro) for diagnosing thyroid cancer in a subject. The primers can be designed to hybridise with a region of an exon or an intron, or with an exon-exon or exon-intron junction region. Thus, the primers reveal and distinguish various forms of gene splicing.
In this respect, another object of the invention resides in the use of a set of nucleotide primers comprising at least 2 pairs of primers, each pair of primers comprising sense and/or antisense nucleic acid primers complementary to and specific of a target nucleic acid selected from any one of the sequences listed as SEQ ID NOs: 1-918, or the complementary strand thereof, for in vitro or ex vivo diagnosis of thyroid cancer in a subject.
Another specific object of the invention resides in a method to diagnose thyroid cancer in mammals, comprising contacting, under conditions allowing amplification, nucleic acids from a sample from the mammal and a set of primers comprising at least 2 pairs of primers, each pair of primers comprising nucleic acid primers complementary to and specific of a target nucleic acid selected from any one of the sequences listed as SEQ ID NOs: 1-918 or the complementary strand thereof, to obtain an amplification profile, and comparing said amplification profile to a reference value to diagnose thyroid cancer.
The invention also relates to a composition or kit comprising a set of primers comprising at least 2 pairs of primers, each pair of primers comprising sense and/or antisense nucleic acid primers complementary to and specific of a target nucleic acid selected from any one of the sequences listed as SEQ ID NOs: 1-918 or the complementary strand thereof.
Measuring Polypeptide
In another embodiment, the method comprises the measurement of a polypeptide encoded by a target gene such as defined previously. Measuring or assaying a polypeptide in a sample can be performed by any known technique, most notably by means of a specific ligand, for example an antibody or an antibody fragment or derivative. Preferably, the ligand is a specific antibody of the polypeptide, or a fragment of such an antibody (for example Fab, Fab′, CDR, etc.), or a derivative of such an antibody (for example a single-chain variable-fragment antibody, scFv). The ligand is typically immobilised on a support, such as a slide, bead, column, plate, etc. The presence or quantity of target polypeptide in the sample can be detected by revealing a complex between the target and the ligand, for example by using a labelled ligand or by using a second labelled indicator ligand, etc. Immunological techniques that can be used and are well known are ELISA, IHC and RIA techniques, etc. If necessary, the quantity of polypeptide detected can be compared to a reference value, for example a median or mean value observed among patients who do or do not have thyroid cancer, or with a value measured in parallel in a control sample. Thus, it is possible to demonstrate variation in expression levels.
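The comparison to a reference value mentioned above can be sketched as follows, taking the reference as the median of the values observed in a reference group. The function names, the measurement values and the `min_ratio` cutoff are hypothetical and serve only to show the arithmetic; the text does not specify a particular cutoff or decision rule.

```python
from statistics import median


def reference_value(reference_measurements):
    """Median of the values measured in a reference group of samples."""
    if not reference_measurements:
        raise ValueError("at least one reference measurement is required")
    return median(reference_measurements)


def ratio_to_reference(measured: float, reference: float) -> float:
    """Ratio of the measured quantity to the reference value."""
    if reference <= 0:
        raise ValueError("reference value must be positive")
    return measured / reference


def is_elevated(measured: float, reference: float, min_ratio: float = 2.0) -> bool:
    """True when the measured value exceeds the reference by `min_ratio`.

    Assumption: `min_ratio` is an illustrative cutoff, not a value
    taken from the application.
    """
    return ratio_to_reference(measured, reference) >= min_ratio


# Hypothetical quantities (arbitrary units) in samples from a reference group.
reference_group = [1.0, 1.2, 0.8, 1.1, 0.9]
ref = reference_value(reference_group)

# Hypothetical quantity measured in the test sample.
test_sample = 2.5
ratio = ratio_to_reference(test_sample, ref)
elevated = is_elevated(test_sample, ref)
```

A mean can be substituted for the median, or the reference can be a value measured in parallel in a control sample, without changing the comparison step.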
Specific antibodies of target polypeptides can be produced by conventional techniques, most notably by immunization of a non-human mammal with an immunogen comprising the polypeptide (or an immunogenic fragment thereof), and recovery of the antibodies (polyclonal) or cells producing monoclonal antibodies. Production techniques for polyclonal or monoclonal antibodies, single-chain variable-fragment antibodies and human or humanised antibodies are described for example in Harlow et al., A Laboratory Manual, CSH Press, 1988; Ward et al., Nature 341 (1989) 544; Bird et al., Science 242 (1988) 423; WO 94/02602; U.S. Pat. No. 5,223,409; U.S. Pat. No. 5,877,293 and WO 93/01288. The immunogen can be produced by synthesis, or by expression, in a suitable host, of a nucleic acid target such as defined previously. Such antibodies, monoclonal or polyclonal, as well as derivatives with the same antigenic specificity, constitute further objects of the present application, as well as the use thereof to diagnose thyroid cancer.
Changes in protein expression and/or structure can also be detected by means of techniques, known by the person skilled in the art, involving mass spectroscopy, more generally grouped under the name proteome analysis.
Implementation of the Method
The method of the invention is applicable to any biological sample from the tested mammal, most notably any type of sample that includes nucleic acids or polypeptides from a thyroid cell, most preferably from a thyroid lesion or nodule. Specific examples include any tissue, organ or biological fluid that includes nucleic acids or polypeptides from a thyroid cell, most preferably from a thyroid lesion or nodule. The most preferred samples are a tissue biopsy from the thyroid, or a fine needle thyroid aspirate.
In one preferred and particularly advantageous embodiment, the sample is a sample collected by the physician for cytological analysis of the subject's thyroid lesion or nodules. Upon completion of such cytological analysis, if the status of the sample is indeterminate, then this sample may be used to perform the current test.
The sample can be obtained by any known technique, for example by drawing, by non-invasive techniques, or from sample collections or biobanks, etc. Further, the sample can be pre-treated to facilitate access to the target molecules, for example by lysis (mechanical, chemical, enzymatic, etc.), purification, centrifugation, separation, etc. The sample can also be labelled to facilitate the determination of the presence of target molecules (biotin, fluorescent, radioactive, luminescent, chemical or enzymatic labelling, etc.). The nucleic acids of the sample can in addition be separated, treated, enriched, purified, reverse transcribed, amplified, fragmented, etc. In a particular embodiment, the nucleic acids of the sample are RNAs, most notably mRNA of the sample. In a very specific embodiment, the nucleic acids are the product of amplification of RNA, most notably of mRNA, or cDNA prepared from RNA, most notably mRNA of the sample.
In a particular embodiment, the method comprises:
In a further particular embodiment, the invention is conducted after a cytological exam and allows the diagnosis of patient samples remaining “indeterminate” after said cytological exam. In this regard, a particular method of the invention comprises:
In a typical embodiment, the method measures a gene expression product and compares the resulting expression signal to a reference value in order to diagnose thyroid cancer. The reference value may be any control value, or any mean value in healthy subjects, or two reference values characteristic of a diseased and a non-diseased subject, respectively. In a preferred embodiment of the invention, the measured value is compared to a reference value of a control benign thyroid condition and to a reference value of a control malignant thyroid condition, wherein, when the measured value is closer to the reference value of the control benign thyroid condition, then the tested thyroid condition is benign, and when the measured value is closer to the reference value of the control malignant thyroid condition, then the tested thyroid condition is malignant.
The invention is applicable to any mammal, preferably human beings. The data provided in the examples show that the invention can detect the presence of malignant or benign lesions when the cytology was unable to provide a diagnostic result. Sixty samples (40 benign and 20 malignant lesions) were used to validate the relevance of the invention, which addresses an unmet clinical need and allows reduction or avoidance of unnecessary surgeries.
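By way of illustration only, the comparison of a measured value with a benign and a malignant reference value can be sketched as follows. The function name, the example values and the handling of an exact tie are assumptions made for this sketch and are not part of the described method.

```python
def classify_by_reference(measured, benign_ref, malignant_ref):
    """Return the condition whose reference value is closest to the measured value."""
    d_benign = abs(measured - benign_ref)
    d_malignant = abs(measured - malignant_ref)
    if d_benign == d_malignant:
        # Assumption: the text does not say how an exact tie is resolved.
        return "indeterminate"
    return "benign" if d_benign < d_malignant else "malignant"


# Hypothetical expression signals, for illustration only.
print(classify_by_reference(9.0, benign_ref=2.0, malignant_ref=10.0))
print(classify_by_reference(3.5, benign_ref=2.0, malignant_ref=10.0))
```

In practice the measured value and both reference values would need to be obtained on the same scale (same platform and normalisation) for the distances to be comparable.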
Another object of the present application relates to a product comprising a support on which are immobilised nucleic acids comprising a sequence complementary to and/or specific for at least 2 genes selected from CITED1, TENM1, SCLY, CCM2 and IGSF10 or the complementary strand thereof, more preferably at least 3, 4 or 5; or for at least 2 distinct domains of an expression product of at least one gene selected from CITED1, TENM1, SCLY, CCM2 and IGSF10, more preferably at least 3, 4 or 5.
A particular object of the invention relates to a product comprising a support on which are immobilised:
A particular object of the invention relates to a product comprising a support on which are immobilised:
Another particular object of the invention relates to a product comprising a support on which are immobilised:
Another particular object of the invention relates to a product comprising a support on which is immobilised:
wherein, the term “part” is as defined above.
Another particular object of the invention relates to a product comprising a support on which is immobilised:
wherein the term “part” advantageously indicates a region as defined above.
The invention also relates to a kit comprising a product such as defined previously and reagents for a hybridisation reaction.
The invention further relates to the use of a product or kit defined above for in vitro or ex vivo diagnosis of thyroid cancer in a subject.
Another object of the present application relates to a product comprising a support on which are immobilised at least two ligands that specifically bind polypeptides encoded by at least 2 genes selected from CITED1, TENM1, CCM2, SCLY, IGSF10 and FRMD3.
The support can be any solid or semi-solid support having at least one surface, flat or not (i.e., in two or three dimensions), allowing the immobilisation of nucleic acids or polypeptides. Such supports are for example a slide, bead, membrane, filter, column, plate, etc. They can be made of any compatible material, such as most notably glass, silica, plastic, fibre, metal, polymer, polystyrene, Teflon, etc. The reagents can be immobilised on the surface of the support by known techniques, or, in the case of nucleic acids, synthesised directly in situ on the support. Immobilisation techniques include passive adsorption (Inouye et al., J. Clin. Microbiol. 28 (1990) 1469) and covalent bonding. Techniques are described for example in WO 90/03382 and WO 99/46403. The reagents immobilised on the support can be placed in a predetermined order, to facilitate detection and identification of the complexes formed, and according to a variable and adaptable density.
In one embodiment, the inventive product comprises a multiplicity of synthetic oligonucleotides, of length between 5 and 100 bases, and specific for one or more genes or RNAs such as previously defined.
The products of the invention typically comprise control molecules, which are used to calibrate and/or standardise the results.
Another object of the present application relates to a kit comprising a compartment or container comprising at least one, preferably several, nucleic acids comprising a complementary and/or specific sequence of one or more genes or RNAs such as previously defined and/or one, preferably several ligands of one or more polypeptides such as previously defined. Preferably, the product comprises at least 5 distinct nucleic acids and/or ligands selected from the nucleic acids and ligands mentioned above.
Another object of the invention relates to the use of a product or kit such as defined above for the diagnosis of thyroid cancer in a mammalian subject, preferably a human subject.
It is understood that any equivalent technique can be used within the scope of the present application to determine the presence of a target molecule.
Further aspects and advantages of the present invention will appear upon consideration of the following examples, which must be regarded as illustrative and non-restrictive.
A. Material and Methods
1. Biological Samples:
Candidate signatures were identified using retrospective cohorts (DeCanThyr, Hospices Civils de Lyon and FNAC, Centre Baclesse, Caen) comprising a total of 60 subjects: 20 patients had malignant nodules and 40 patients had benign lesions. All samples were previously classified as “indeterminate” according to the Bethesda classification and were associated with the histopathological analysis results of the surgical specimen. Table 1 below provides details about the distribution of cancerous and benign thyroid lesions in each Bethesda indeterminate category.
2. Total RNA Extraction or Storage
Dedicated fine needle aspirations were collected in 350 μl of RLT buffer (RNeasy Micro kit, Qiagen) and stored at −80° C. Total RNA was extracted using the manufacturer's recommended procedure and stored at −80° C.
The acceptance criteria were the following:
3. HsGWSA2.0.2.Dx POC 1M Agilent Array
The Hs GWSA 2.0.2.Dx POC 1M Microarray on the Agilent platform is a tool created to detect and quantify gene expression and alternative expression of transcripts. To achieve this, gene transcripts are identified that represent reference and alternative expression patterns and are processed to resolve the discriminative regions in these expression patterns. The identified alternative event patterns may include: alternative exons, novel splice sites, elimination of splice sites, skipping of normally constitutive exons or some combination thereof.
Analysis was performed on publicly available sequence data from a set of human genes identified in the NCBI release 189 and using the NCBI Human genome release 37.3. These sequences were analyzed with a proprietary software system that processed two overlapping datasets: modeling the exon structure of the annotated reference forms of the genes and generating models to identify alternative events identified by these reference forms within the genes.
The reference form analysis for the genes involved identification of expression regions that represented every exon of the reference exemplar sequence data. The alternative event analysis was performed by modeling expression patterns of the identified alternative transcripts and deriving the discriminative expression regions.
After the expression regions were identified a second analysis was performed to identify a minimally redundant expression path through the gene for all selected expression features (reference form expression and included discriminative expression regions). A series of probes were designed from this path: probes that target the expressed exons of the reference forms, junctions to constitutive exons of the reference forms and probes designed against the identified discriminative regions and the junctions (explicit and implicit) indicated by the evidence of the alternative transcripts. This combination of exon and junction probes enables the platform to specifically monitor the expression of the individual components of the alternatively spliced mRNA populations.
A total of 19,391 human genes were submitted to the analysis process, and 34,056 alternative events were identified that matched the selection and scoring criteria of the analysis and selection algorithms. Probes were designed to the sense strand of the sequence as a 24-nucleotide sequence with a 35-nucleotide poly-T linker, and the microarrays were produced by Agilent Technologies.
4. RNA Amplification
15 to 50 ng of total RNA were used as template for target synthesis using the Ovation Pico WTA System v2 kit (NuGen, San Carlos, Calif.), as recommended by the manufacturer. The quality and quantity of cDNA generated were evaluated using Agilent's 2100 Bioanalyser (Agilent Technologies) and the NanoDrop 1000 Spectrophotometer (Labtech International, UK). 2 μg of cDNA molecules were labeled using the Agilent SureTag DNA labeling kit, following the manufacturer's recommended protocol. The quality of the Cy3-labelled cDNA was assessed by measuring the yield of cDNA and the specific activity, which is proportional to the degree of labeling, from the absorbance of the purified product at 260 nm and 550 nm.
5. Hybridization with GWSA Slide
10 μg of cDNA amplified and labeled with Cy3 were used for the hybridization on the HsGWSA2.0.2.Dx POC 1M Agilent array following the manufacturer's recommendation, at 55° C. for 17 hours. Non-specific hybridization signals were removed by a 2-step wash of the arrays, following the manufacturer's protocol. Hybridization signals were measured using the G5761A SureScan Dx Microarray Scanner System (Agilent Technologies, USA).
6. Data Extraction and Normalization
Intensity signal measurements were extracted with Feature Extraction 11.0.1.1 software (Agilent Technologies). The files (.Tiff) processed with Feature Extraction were quantile normalized using GeneSpring Gx software (Agilent Technologies) prior to statistical analysis. Expression values below 20 were all floored at 20. Probe sets were filtered out if the ratio of the maximum to minimum intensity across the training set was lower than or equal to 2.5 or if the difference between the maximum and minimum intensity was lower than or equal to 50. 207 043 probe sets from the 448 863 probe sets on the chips remained for further analysis.
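The flooring and filtering rule described above can be illustrated by the following minimal sketch. The thresholds (20, 2.5 and 50) are those stated in the text; the function names and the example intensities are hypothetical, and the sketch assumes the values passed in have already been normalized.

```python
FLOOR = 20.0      # expression values below 20 are set to 20
MIN_RATIO = 2.5   # max/min ratio must be strictly greater than 2.5
MIN_DIFF = 50.0   # max - min must be strictly greater than 50


def floor_values(values, floor=FLOOR):
    """Replace every value below the floor by the floor itself."""
    return [max(v, floor) for v in values]


def keep_probe_set(values, min_ratio=MIN_RATIO, min_diff=MIN_DIFF):
    """Return True if the probe set varies enough across the training set."""
    floored = floor_values(values)
    hi, lo = max(floored), min(floored)
    # A probe set is filtered out if ratio <= 2.5 OR difference <= 50,
    # so it is kept only if both conditions are strictly exceeded.
    return (hi / lo > min_ratio) and (hi - lo > min_diff)


# Hypothetical intensities for three probe sets across a few samples.
print(keep_probe_set([5.0, 30.0, 100.0]))   # floored to 20..100
print(keep_probe_set([20.0, 50.0]))          # ratio exactly 2.5
print(keep_probe_set([20.0, 60.0]))          # difference only 40
```

Because the floor is applied first, the minimum is never below 20 and the ratio is always defined.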
7. Statistical Analysis and Signature Identification
The point biserial correlation coefficient was calculated for each probe set between probe set expression and a binary indicator, the value of which was defined by the malignancy status of the patient (0 for cancer-free patients and 1 for patients with cancer) (van't Veer, et al. 2002). The probe sets with the highest absolute correlation are those most correlated with the occurrence of thyroid cancer. T-tests were used to evaluate whether point biserial correlation values were significantly different from 0.
To develop a prediction rule to classify samples from new patients with unknown status, we chose the nearest centroid prediction rule (Michiels et al, 2005). This prediction rule classifies new patients according to the correlation between the expression of their molecular classifier probe sets (or genes or events) and the average profiles in the two categories (the average Benign and Malignant profiles), the predicted category being the one with the highest correlation. The average profiles are defined as the vectors of the average expression values of the molecular classifier probe sets in benign and malignant samples in the training set. The molecular classifier probe sets were defined as those for which the p-value of the point biserial correlation coefficient with the benign or malignant binary status variable was below a particular threshold.
Leave-one-out cross-validation (LOOCV) was used to estimate the prediction accuracy of the rule determined on the training set. One sample is left out, and the remaining samples are used to build the prediction rule, which is then used to classify the left-out sample. The entire model-building process was repeated for each leave-one-out training set to provide unbiased estimates of the prediction accuracy of the prediction rule determined on the entire training set: correlation coefficients were recalculated and sorted according to their absolute value, the molecular classifier probe sets were defined, and the nearest centroid prediction rule was constructed. We predicted the outcome of the sample that was left out in the first place, on the basis of the highest correlation coefficient it had with the average benign and malignant profiles from the remaining samples. At the end of the leave-one-out cross-validation procedure, we counted in how many cases the predictions were correct and in how many cases the predictions were incorrect to estimate the misclassification rate of the prediction rule determined on the entire training set. The results obtained are listed in the table below. ROC curves were constructed to determine optimal cut-offs based on the correlation values with the cancer and benign profiles, and the area under the ROC curve (AUC) was calculated to estimate predictive accuracy.
Based on these correlation coefficients (Approach C), 3 different selection methods were used in order to reduce the size of the signature:
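As an illustration, the point biserial correlation is numerically the Pearson correlation between a continuous variable and a 0/1 indicator, and the associated t statistic has n − 2 degrees of freedom. The sketch below uses hypothetical expression values and function names; converting the t statistic into a p-value requires a Student t distribution, which is left to a statistics package and is not shown here.

```python
import math


def point_biserial(expression, status):
    """Pearson correlation between expression values and a 0/1 status indicator."""
    n = len(expression)
    mean_x = sum(expression) / n
    mean_y = sum(status) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(expression, status))
    sxx = sum((x - mean_x) ** 2 for x in expression)
    syy = sum((y - mean_y) ** 2 for y in status)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing correlation = 0 (n - 2 degrees of freedom).

    Undefined when |r| == 1 (division by zero).
    """
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Hypothetical probe set: low in cancer-free (0), high in cancer (1) samples.
expression = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
status = [0, 0, 0, 1, 1, 1]
r = point_biserial(expression, status)
print(r, t_statistic(r, len(expression)))
```

A positive coefficient indicates higher expression in the samples coded 1, a negative coefficient higher expression in the samples coded 0.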
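A minimal sketch of such a correlation-based nearest centroid rule is given below. The toy expression vectors and function names are hypothetical, and assigning a sample to the benign category when both correlations are exactly equal is an assumption of this sketch rather than something stated in the text.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equally long vectors."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def centroid(samples):
    """Average expression of each classifier probe set across the samples."""
    return [sum(col) / len(col) for col in zip(*samples)]


def predict(sample, benign_train, malignant_train):
    """Return the predicted category and the correlation difference (M - B)."""
    r_b = pearson(sample, centroid(benign_train))
    r_m = pearson(sample, centroid(malignant_train))
    # Assumption: an exact tie is assigned to "benign".
    return ("malignant" if r_m > r_b else "benign"), r_m - r_b


# Hypothetical training profiles over four classifier probe sets.
benign_train = [[1.0, 2.0, 3.0, 4.0], [2.0, 2.0, 3.0, 5.0]]
malignant_train = [[5.0, 4.0, 2.0, 1.0], [4.0, 4.0, 1.0, 1.0]]
print(predict([1.0, 3.0, 3.0, 5.0], benign_train, malignant_train)[0])
print(predict([6.0, 3.0, 2.0, 1.0], benign_train, malignant_train)[0])
```

The returned difference between the two correlations is the kind of continuous score on which a cut-off other than zero could be chosen.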
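The leave-one-out procedure can be sketched generically as follows. The key point, reflected in the comments, is that every model-building step (including probe set selection) must happen inside the function applied to each reduced training set. The one-dimensional nearest-mean classifier and the data are hypothetical stand-ins used only to make the sketch runnable; they are not the classifier of the examples.

```python
def leave_one_out(samples, labels, fit_predict):
    """Estimate the misclassification rate by leave-one-out cross-validation.

    fit_predict(train_samples, train_labels, test_sample) must rebuild the
    whole prediction rule (feature selection included) from the training
    data only, then return the predicted label of the left-out sample.
    """
    errors = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if fit_predict(train_x, train_y, samples[i]) != labels[i]:
            errors += 1
    return errors / len(samples)


def nearest_mean(train_x, train_y, x):
    """Toy stand-in classifier: assign x to the class with the closest mean."""
    means = {}
    for label in set(train_y):
        vals = [v for v, l in zip(train_x, train_y) if l == label]
        means[label] = sum(vals) / len(vals)
    return min(means, key=lambda label: abs(x - means[label]))


# Hypothetical one-dimensional scores; the last sample is hard to classify.
samples = [1.0, 2.0, 3.0, 8.0, 9.0, 10.0, 4.9]
labels = [0, 0, 0, 1, 1, 1, 1]
print(leave_one_out(samples, labels, nearest_mean))
```

Selecting features once on the full data set and cross-validating only the final rule would leak information from the left-out sample and tend to underestimate the error rate.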
Two other approaches were used to reduce the size of the classifiers:
B. Results
Expression values below 20 were all floored at 20. Probe sets (PS) were filtered out if the ratio of the maximum to minimum intensity across the training set was lower than or equal to 2.5 or if the difference between the maximum and minimum intensity was lower than or equal to 50. 207 043 probe sets out of the 448 863 probe sets per chip remained for further analysis.
The probe sets associated with the highest fold changes and the most statistically significant differences between malignant and benign nodules were identified and are listed in Table 2.
Molecular signatures have been developed using the p-value thresholds described above. Table 3 summarizes the leave-one-out cross-validation results. In this table, the error rates, sensitivities and specificities of the signatures obtained with the nearest centroid prediction rules are denoted as Error Rate (diff), Sensitivity (diff) and Specificity (diff) because they are based on the difference in the correlation with the Malignant and Benign profiles.
The accuracy of the signatures is measured by the area under the ROC (Receiver Operating Characteristic) curve. The ROC curve is created by plotting the true positive rate (sensitivity) against the false positive rate (1-specificity) at various threshold settings. The point with coordinates (0,1) (also called perfect classification) in the ROC space corresponds to a sensitivity of 100% and a specificity of 100%. A test with perfect discrimination has a ROC curve that passes through this upper left corner. Therefore the closer the ROC curve is to the upper left corner, the higher the overall accuracy of the test (Zweig & Campbell, 1993).
The areas under the ROC curve (AUC) based on the difference between the correlation values, on the correlation with the Malignant profile, or on the correlation with the Benign profile are labeled AUC (diff), AUC (M) and AUC (B), respectively.
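For illustration, a ROC curve and its area can be computed from a list of scores and true labels as sketched below. The scores are hypothetical (for example, a correlation difference per sample), the function names are assumptions of this sketch, and the area is obtained with the trapezoidal rule.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs, one per threshold.

    A sample is called positive when its score is >= the threshold.
    labels: 1 for malignant (positive), 0 for benign (negative).
    """
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC points using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical scores for three malignant (1) and three benign (0) samples.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]
print(auc(roc_points(scores, labels)))
```

With perfectly separated scores the curve passes through the point (0,1) and the area equals 1; a classifier no better than chance gives an area close to 0.5.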
Signature B, based on the combination of 918 probe sets with p-values below 1e−3, produces the lowest overall error rate (13.3%). For this signature, the leave-one-out cross validated estimations of sensitivity and specificity are 70% and 95%, respectively. The ROC curves associated to the 918PS signature are presented in
Signature B contains a plurality of probes that allows specific detection of at least one expression product of CITED1, TENM1, CCM2, SCLY, IGSF10 and FRMD3 and of each gene listed in Table A and in Table B.
Signature D, based on the combination of 109 probe sets with p-values below 1e−5, also produces the lowest overall error rate (13.3%). For this signature, the leave-one-out cross validated estimations of sensitivity and specificity are 75% and 92.5%, respectively. The ROC curves associated to the 109PS signature are presented in
Signature D contains a plurality of probes that allows specific detection of at least one expression product of CITED1, TENM1, CCM2, SCLY, IGSF10 and FRMD3 and of each gene listed in Table A.
Signature F, based on the combination of 25 probe sets with p-values below 1e−7, produces an error rate of 18.3%. For this signature, the leave-one-out cross validated estimations of sensitivity and specificity are 65% and 90%, respectively. The ROC curves associated to the 25PS signature are presented in
Each of these signatures may be used to effectively determine the malignant status of a thyroid lesion with very high specificity and a good sensitivity.
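As a consistency check, the overall error rates reported for Signatures B, D and F follow from the stated sensitivities and specificities and the cohort composition (20 malignant and 40 benign samples). The short sketch below, with a hypothetical function name, reproduces the reported 13.3%, 13.3% and 18.3%.

```python
N_MALIGNANT, N_BENIGN = 20, 40  # cohort composition stated in the examples


def overall_error_rate(sensitivity, specificity,
                       n_malignant=N_MALIGNANT, n_benign=N_BENIGN):
    """Fraction of misclassified samples given sensitivity and specificity."""
    false_negatives = n_malignant * (1.0 - sensitivity)
    false_positives = n_benign * (1.0 - specificity)
    return (false_negatives + false_positives) / (n_malignant + n_benign)


for name, sens, spec in [("B", 0.70, 0.95), ("D", 0.75, 0.925), ("F", 0.65, 0.90)]:
    print(name, round(100 * overall_error_rate(sens, spec), 1))
```

Signatures B and D thus each misclassify 8 of the 60 samples (6 + 2 and 5 + 3, respectively), and Signature F misclassifies 11 (7 + 4).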
In order to identify the smallest signature composition able to classify patients with high accuracy, 3 approaches (see Material and Methods) were selected and applied on the composition of the 25PS signature (90% specificity and 65% sensitivity). The results show that Signature F actually targets 6 distinct marker genes, namely CITED1, TENM1, SCLY, CCM2, IGSF10 and FRMD3. The composition and associated fold changes of the 25 PS are listed in Table 4. These target genes represent valuable markers which can be used to analyse thyroid cancer in patients.
Starting from the list of probe sets (PS) included in the 25PS signature, the correlation between each PS expression level was estimated using Pearson correlation coefficients and associated p-values. Based on these correlation coefficients, 3 different selection methods (Approach C methods 1, 2 and 3 as described in Material and Methods) as well as the following two approaches were used to reduce the size of the signatures:
Signatures with error rates below or equal to 20% were identified for the 3 approaches used and are listed in Table 5.
Various probe sets are designed that target particular regions of the CITED1 or TENM1 gene products. Tables 6 and 7 identify the exons targeted by these probes for TENM1 and CITED1, respectively, and the upregulations of these exons measured in the malignant samples of cytological indeterminate thyroid lesions.
Various combinations of such probes were assessed for relevance. The invention shows that when a limited number of target genes are monitored according to the invention, it is highly preferred to use two distinct probes specific for distinct regions of the expression product of said gene. In this regard, monitoring exon4 and/or exon3 and/or exon-2 of CITED1 and targeting exons 32, 25, 19, 18, 17, and/or 1, for TENM1 is particularly advantageous.
The combination of the following particular exons of CITED1 provides strong sensitivity:
For the CITED1 gene, the targeting of exon3, exon4 and exon-2 with the following probe sets increases performance:
Sensitivity 73% and Specificity 83%
Sensitivity 75% and Specificity 90%
These results show that an effective and reliable detection of thyroid cancer can be implemented by monitoring 2 or 3 distinct domains of CITED1 expression product.
As discussed below, by further combining such marker with TENM1, a further specific and sensitive detection is obtained.
In this experiment, two probe sets specific for CITED1 and two probe sets specific for TENM1 were combined. The specific probe sets used are listed below.
Sensitivity 65% and Specificity 92.5%
Further exon analysis results linked to the performances listed in Table 5 are reported in Table 8, focusing only on CITED1 and TENM1.
These results show that by detecting exons 1, 19 and/or 25 of TENM1 and exon 4 and/or (−2) of CITED1, a highly relevant diagnosis can be made.
Number | Date | Country | Kind |
---|---|---|---|
EP14198610.9 | Dec 2014 | EP | regional |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/EP2015/080101 | 12/16/2015 | WO | 00 |