Lung cancer (LC) is the leading cause of cancer-related deaths worldwide, accounting for an estimated 1.6 million deaths out of 1.8 million cases in 2012 (Globocan 2012). The incidence pattern of LC closely parallels the mortality rate because of persistently low survival rates. There are two major classes of LC, non-small cell lung cancer (NSCLC, representing 85% of all lung cancers) and small cell lung cancer (SCLC, the remaining 15%)1. Histologically, NSCLC is further divided into three major subtypes: squamous cell carcinoma, adenocarcinoma and large cell carcinoma. Adenocarcinoma is the most common form and has approximately 40% prevalence, followed by squamous cell and large cell carcinoma, which represent 25% and 10%, respectively2. Clinical manifestations of LC are diverse and patients are mostly asymptomatic at early stages. Symptoms, even when present, are non-specific and unfortunately mimic more common benign etiologies3. Traditional diagnostic strategies for LC include imaging tests, such as chest X-ray radiography (CXR) or computed tomography (CT), cytological assessment of sputum or bronchial suctioning and histopathological evaluation of biopsies taken during bronchoscopy, mediastinoscopy, open lung surgery or from metastasis resections4-6. In the majority of patients, these procedures are initiated only after the development of symptoms, and therefore at advanced stages of the disease, when the overall condition of the patient is already impaired and prognosis is poor, as shown by the low five-year patient survival of 1-5%1. Strikingly, patient survival is as high as 52% if LC is diagnosed early, demonstrating that early diagnosis of LC is pivotal to increase the probability of successful therapy.
Accordingly, there is a need for new techniques for diagnosis of specific cancers and their subtypes as well as for further and/or alternative treatment options in cancer therapy. Thus, the technical problem underlying the present invention is the provision of reliable means and methods for the detection of cancer, in particular lung cancer and its subtypes, and for the determination of treatment options.
The solution to this technical problem is provided by the embodiments as defined herein and as characterized in the claims.
The invention provides a statistical method for assessing whether a subject suffers from cancer or is prone to suffering from cancer. The invention provides an anti-cancer agent and/or radiation therapy, said agent or radiation therapy being selected on basis of the patient group determined by the statistical method provided herein.
The object of the invention is solved with the features of the independent claims. Dependent claims refer to preferred embodiments.
The invention provides a statistical method of assessing whether a subject suffers from cancer or is prone to suffering from cancer, said method comprising the step of performing at least one statistical algorithm for classification and for regression on measurement data of the subject, wherein the measurement data of the subject comprises at least one of the following: a value of GATA6 Em isoform in at least one sample taken from the subject, a value of NKX2-1 Em isoform in said at least one sample, a value of GATA6 Ad isoform in said at least one sample, a value of NKX2-1 Ad isoform in said at least one sample; and wherein at least one of the following is used as at least one classifier or a component of at least one classifier in the statistical method: GATA6 Em isoform, NKX2-1 Em isoform, GATA6 Ad isoform, NKX2-1 Ad isoform, ratio of GATA6 Em isoform/GATA6 Ad isoform, ratio of NKX2-1 Em isoform/NKX2-1 Ad isoform.
Statistical algorithms for classification and for regression on measurement data are generally known to the skilled person. Examples of statistical algorithms can be found in the following textbooks:
Preferably, these algorithms are broadly partitioned into parametric approaches that explicitly model the data by one member of a parametrized family of probability distributions (e.g., linear discriminant analysis or logit regression), and non-parametric approaches like Neural Networks or Support Vector Machines that do not rely on a distributional assumption.
According to an embodiment, said value of the GATA6 Em isoform in said at least one sample is obtained by measuring the amount of a specific transcription factor isoform in said at least one sample of said subject, wherein said specific transcription isoform is the GATA6 Em isoform comprising the nucleic acid sequence of SEQ ID No: 1 or the GATA6 Em isoform comprising a nucleic acid sequence with up to 55 additions, deletions or substitutions of SEQ ID NO: 1.
According to an embodiment, said value of the NKX2-1 Em isoform in said at least one sample is obtained by measuring the amount of a specific transcription factor isoform in said at least one sample of said subject, wherein said specific transcription isoform is the NKX2-1 Em isoform comprising the nucleic acid sequence of SEQ ID No: 2 or the NKX2-1 Em isoform comprising a nucleic acid sequence with up to 39 additions, deletions or substitutions of SEQ ID NO: 2.
According to an embodiment, said value of GATA6 Ad isoform in said at least one sample is obtained by measuring the amount of a specific transcription factor isoform in said at least one sample of said subject, wherein said specific transcription isoform is the GATA6 Ad isoform comprising the nucleic acid sequence of SEQ ID No: 5 or the GATA6 Ad isoform comprising a nucleic acid sequence with up to 55 additions, deletions or substitutions of SEQ ID NO: 5.
According to an embodiment, said value of the NKX2-1 Ad isoform in said at least one sample is obtained by measuring the amount of a specific transcription factor isoform in said at least one sample of said subject, wherein said specific transcription isoform is the NKX2-1 Ad isoform comprising the nucleic acid sequence of SEQ ID No: 6 or the NKX2-1 Ad isoform comprising a nucleic acid sequence with up to 38 additions, deletions or substitutions of SEQ ID NO: 6.
According to an embodiment, the statistical method further comprises the step of processing the measurement data, preferably normalizing, rescaling, dimension reducing, and/or noise reducing.
Preferably, the step of processing the measurement data, preferably normalizing, rescaling, dimension reducing, and/or noise reducing is performed before performing the at least one statistical algorithm for classification and for regression on measurement data of the subject.
Preferably, the normalizing of the measurement data comprises the normalizing of at least one of the following: microarray or RNA-Seq measurements.
Preferably, the normalizing of the measurement data comprises obtaining abundance estimates and/or detecting outliers and/or removing outliers.
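The normalizing and outlier-handling steps can be sketched as follows. This is a minimal illustration only: the function names, the median/MAD-based robust z-score, the cutoff of 3.5 and the example abundance values are assumptions chosen for the example and are not taken from the present disclosure.

```python
import math
import statistics


def log2_center(values):
    """Log2-transform positive abundance estimates and subtract their median."""
    logs = [math.log2(v) for v in values]
    med = statistics.median(logs)
    return [x - med for x in logs]


def flag_outliers(values, cutoff=3.5):
    """Flag values whose robust z-score (median/MAD based) exceeds the cutoff."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        # No spread around the median: nothing can be flagged robustly.
        return [False] * len(values)
    return [abs(0.6745 * (v - med) / mad) > cutoff for v in values]


def remove_outliers(values, cutoff=3.5):
    """Return the values that are not flagged as outliers."""
    flags = flag_outliers(values, cutoff)
    return [v for v, is_outlier in zip(values, flags) if not is_outlier]


# Hypothetical abundance estimates; the last value is deliberately extreme.
abundances = [8.0, 9.0, 7.5, 8.5, 8.2, 64.0]
normalized = log2_center(abundances)
outlier_flags = flag_outliers(normalized)
cleaned = remove_outliers(normalized)
```

In this sketch, outlier detection and outlier removal are kept as separate functions because the text lists them as independent options.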
Preferably, the reducing of the dimension and/or the reducing of the noise comprises transforming the measurement data into a space where discriminatory methods achieve a higher power.
Preferably, reducing the dimension and/or reducing the noise comprises at least one of the following: principal component analysis, non-linear variant principal component analysis, singular value decomposition, non-linear variant singular value decomposition, independent component analysis, non-linear independent component analysis, a kernel principal component analysis.
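As a hedged illustration of dimension reduction by principal component analysis, the following sketch computes the first principal component of two-dimensional data in closed form. The function name `pca_2d` and the example points are assumptions made for this example; a practical implementation for higher-dimensional data would typically rely on a numerical library.

```python
import math


def pca_2d(points):
    """First principal component of 2-D data.

    Returns (direction, explained_fraction, scores), where direction is a
    unit vector, explained_fraction is the share of total variance captured
    by the component, and scores are the projections of the centered points.
    """
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Sample covariance matrix [[a, b], [b, c]].
    a = sum(x * x for x, _ in centered) / (n - 1)
    c = sum(y * y for _, y in centered) / (n - 1)
    b = sum(x * y for x, y in centered) / (n - 1)

    # Closed-form leading eigenvector and eigenvalue of a symmetric 2x2 matrix.
    theta = 0.5 * math.atan2(2 * b, a - c)
    direction = (math.cos(theta), math.sin(theta))
    leading_eigenvalue = (a + c) / 2 + math.hypot((a - c) / 2, b)

    total_variance = a + c
    explained = leading_eigenvalue / total_variance if total_variance > 0 else 0.0
    scores = [x * direction[0] + y * direction[1] for x, y in centered]
    return direction, explained, scores


# Hypothetical two-feature measurements lying exactly on the line y = 2x.
example_points = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
direction, explained, scores = pca_2d(example_points)
```

Because the example points are perfectly collinear, a single component captures all of their variance, so the two features can be replaced by one score per sample.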
According to an embodiment, the statistical method further comprises the steps of cross-validation and/or bootstrapping.
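Cross-validation and bootstrapping can be sketched as follows. All names, the toy threshold classifier and the example values are assumptions introduced for illustration; the sketch assumes at least as many samples as folds and that both classes occur in every training split.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Split shuffled sample indices into k disjoint test folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_threshold(values, labels):
    """Toy classifier: midpoint between the means of class 0 and class 1."""
    mean0 = sum(v for v, y in zip(values, labels) if y == 0) / labels.count(0)
    mean1 = sum(v for v, y in zip(values, labels) if y == 1) / labels.count(1)
    return (mean0 + mean1) / 2


def predict_threshold(threshold, value):
    """Assign class 1 to values above the threshold (toy convention)."""
    return 1 if value > threshold else 0


def cross_validate(values, labels, fit, predict, k=5, seed=0):
    """Mean held-out accuracy over k folds."""
    accuracies = []
    for test_idx in k_fold_indices(len(labels), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        model = fit([values[i] for i in train_idx], [labels[i] for i in train_idx])
        correct = sum(predict(model, values[i]) == labels[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)


def bootstrap_mean_interval(values, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean of the values."""
    rng = random.Random(seed)
    means = sorted(
        sum(rng.choices(values, k=len(values))) / len(values)
        for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical one-dimensional scores for two well separated groups.
values = [-2.0, -1.5, -1.0, -1.2, -1.8, 1.0, 1.5, 2.0, 1.2, 1.8]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
cv_accuracy = cross_validate(values, labels, fit_threshold, predict_threshold, k=5)
interval = bootstrap_mean_interval(values)
```

The held-out accuracy estimates how a classifier may generalize to unseen samples, while the bootstrap interval indicates the sampling variability of a summary statistic.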
According to an embodiment, the GATA6 Em isoform of said sample is set in relation to a GATA6 Em isoform of at least one control sample and then used as a classifier in the statistical method.
Preferably, set in relation comprises at least one of the following: normalizing the value of the GATA6 Em isoform of said sample with respect to the value of the GATA6 Em isoform of the control sample, subtracting the value of the GATA6 Em isoform of at least one control sample from the GATA6 Em isoform of said sample.
Preferably, said value of the GATA6 Em isoform of said at least one control sample is obtained by measuring in said sample of said subject the amount of a specific transcription factor isoform wherein said specific transcription isoform is the GATA6 Em isoform comprising the nucleic acid sequence of SEQ ID No: 1 or the GATA6 Em isoform comprising a nucleic acid sequence with up to 55 additions, deletions or substitutions of SEQ ID NO: 1.
According to an embodiment, the NKX2-1 Em isoform in said at least one sample is set in relation to a NKX2-1 Em isoform of at least one control sample and then used as a classifier in the statistical method.
Preferably, set in relation comprises at least one of the following: normalizing the value of the NKX2-1 Em isoform of said sample with respect to the value of the NKX2-1 Em isoform of the control sample, subtracting the value of the NKX2-1 Em isoform of at least one control sample from the NKX2-1 Em isoform of said sample.
Preferably, said value of the NKX2-1 Em isoform in said at least one control sample is obtained by measuring in said at least one sample of said subject the amount of a specific transcription factor isoform wherein said specific transcription isoform is the NKX2-1 Em isoform comprising the nucleic acid sequence of SEQ ID No: 2 or the NKX2-1 Em isoform comprising a nucleic acid sequence with up to 39 additions, deletions or substitutions of SEQ ID NO: 2.
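The two ways of setting a sample value in relation to a control value (normalizing with respect to the control, or subtracting the control) can be sketched as follows. The function names, the `mode` parameter and the use of the mean when several control samples are available are assumptions made for this example.

```python
def normalize_to_control(sample_value, control_value):
    """Express the sample value relative to the control value (a ratio)."""
    if control_value <= 0:
        raise ValueError("control value must be positive")
    return sample_value / control_value


def subtract_control(sample_value, control_value):
    """Express the sample value as a difference from the control value."""
    return sample_value - control_value


def relate_to_controls(sample_value, control_values, mode="normalize"):
    """Relate a sample value to the mean of one or more control values.

    Averaging the controls is an illustrative assumption, not a requirement
    stated in the text.
    """
    reference = sum(control_values) / len(control_values)
    if mode == "normalize":
        return normalize_to_control(sample_value, reference)
    if mode == "subtract":
        return subtract_control(sample_value, reference)
    raise ValueError("mode must be 'normalize' or 'subtract'")


# Hypothetical isoform values for one sample and two control samples.
relative_value = relate_to_controls(6.0, [2.0, 4.0], mode="normalize")
difference_value = relate_to_controls(6.0, [2.0, 4.0], mode="subtract")
```

The same helper applies unchanged to the GATA6 Em isoform and to the NKX2-1 Em isoform, since both are related to their respective control values in the same manner.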
According to an embodiment, a ratio of the GATA6 Em isoform and the GATA6 Ad isoform and a ratio of the NKX2-1 Em isoform and the NKX2-1 Ad isoform are used as a classifier.
According to an embodiment, the statistical method comprises a linear classifier.
Preferably, the statistical method comprises at least one of the following: a linear classifier, preferably a support vector machine and/or a linear discriminant analysis and/or decision trees, a regression method, preferably linear, logistic or probit regression, or a penalized version of the regression, preferably a penalized version of the linear, logistic or probit regression, more preferably a Lasso and/or ridge regression, or a generalized linear model, a neural network, or a regression tree, or ensemble methods built from the above algorithms in a process, preferably boosting.
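One of the listed options, a penalized (ridge) logistic regression, can be sketched in plain Python as follows. This is a minimal gradient-descent illustration under stated assumptions: the function names, the penalty strength, the learning rate and the toy feature values are hypothetical and chosen only so that the two groups are separable; they do not represent measured data or the claimed classifier.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_ridge_logistic(features, labels, penalty=0.1, learning_rate=0.1, n_iter=2000):
    """Minimize mean log-loss + (penalty / 2) * ||weights||^2 by gradient descent."""
    n_features = len(features[0])
    n = len(labels)
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(n_iter):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(features, labels):
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        # The ridge penalty shrinks the weights; the bias is left unpenalized.
        weights = [
            w - learning_rate * (g / n + penalty * w)
            for w, g in zip(weights, grad_w)
        ]
        bias -= learning_rate * grad_b / n
    return weights, bias


def predict_probability(model, x):
    """Probability of class 1 for feature vector x."""
    weights, bias = model
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def classify(model, x, threshold=0.5):
    """Return 1 if the predicted probability reaches the threshold, else 0."""
    return 1 if predict_probability(model, x) >= threshold else 0


# Toy two-feature data (e.g. two log2 ratios); values are purely hypothetical.
features = [
    [-2.0, -1.5], [-1.0, -2.0], [-1.5, -1.0], [-2.5, -2.0],
    [2.0, 1.5], [1.0, 2.0], [1.5, 1.0], [2.5, 2.0],
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
model = fit_ridge_logistic(features, labels)
predictions = [classify(model, x) for x in features]
```

Replacing the squared penalty by an absolute-value penalty would give a Lasso-type variant, which requires a different optimization step than the plain gradient update shown here.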
Preferably, the support vector machine is a linear kernel support vector machine. Preferably, the linear kernel support vector machine is the one implemented in the following software: Evgenia Dimitriadou, Kurt Hornik, Friedrich Leisch, David Meyer and Andreas Weingessel (2010). e1071: Misc Functions of the Department of Statistics (e1071), TU Wien. R package version 1.5-24. http://CRAN.R-project.org/package=e1071.
Preferably, the SVM does not assume that the data from the sample groups are drawn from a Gaussian distribution. The SVM can be considered as the more robust choice in comparison to the linear discriminant analysis. Preferably, the support vector machine finds a separating hyperplane between data from normal and cancerous samples, which is expected to yield a good generalization performance when applied to new, unseen data. Preferably, the distance to this hyperplane is determined by the following function:
LC score = −α·log2(ratio of GATA6 Em isoform/GATA6 Ad isoform) − β·log2(ratio of NKX2-1 Em isoform/NKX2-1 Ad isoform) − γ,
wherein preferably α=0.607, β=1.431, γ=1.916.
Preferably, α=−0.607, β=−1.431, γ=−1.916
Preferably, the function comprises a prefactor (−1) such that the distance to the hyperplane is determined by the following function:
LC score = (−1)·(−α·log2(ratio of GATA6 Em isoform/GATA6 Ad isoform) − β·log2(ratio of NKX2-1 Em isoform/NKX2-1 Ad isoform) − γ),
wherein preferably α=0.607, β=1.431, γ=1.916.
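The LC score defined above can be computed as in the following sketch. The function and parameter names are assumptions made for this example, the prefactor is read as a multiplication by (−1) as the text describes, and the input values are hypothetical; no decision threshold is implied.

```python
import math

# Preferred parameter values as stated in the text.
ALPHA, BETA, GAMMA = 0.607, 1.431, 1.916


def lc_score(gata6_em, gata6_ad, nkx2_1_em, nkx2_1_ad,
             alpha=ALPHA, beta=BETA, gamma=GAMMA, prefactor=1.0):
    """LC score from the Em/Ad isoform ratios of GATA6 and NKX2-1.

    score = prefactor * (-alpha*log2(GATA6 Em/Ad) - beta*log2(NKX2-1 Em/Ad) - gamma)
    All four isoform values must be positive.
    """
    gata6_ratio = gata6_em / gata6_ad
    nkx2_1_ratio = nkx2_1_em / nkx2_1_ad
    score = (-alpha * math.log2(gata6_ratio)
             - beta * math.log2(nkx2_1_ratio)
             - gamma)
    return prefactor * score


# Hypothetical isoform values in which both Em/Ad ratios equal 2.
plain = lc_score(2.0, 1.0, 2.0, 1.0)
flipped = lc_score(2.0, 1.0, 2.0, 1.0, prefactor=-1.0)
negated_parameters = lc_score(2.0, 1.0, 2.0, 1.0,
                              alpha=-ALPHA, beta=-BETA, gamma=-GAMMA)
```

Arithmetically, using the negated parameters α=−0.607, β=−1.431, γ=−1.916 in the first form of the function gives the same value as applying the prefactor (−1) with the positive parameters, so both variants differ from the first form only in the sign of the score.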
The amount of said specific transcription factor isoform(s) can be measured on the mRNA level.
The appended example shows that the expression ratio remained stable for both control donor as well as LC EBC samples until 75 ng of RNA starting material. Decreasing the starting material below 75 ng resulted in suboptimal detection of the Em-isoform in the control and the Ad-isoform in the LC group, which led to distorted ratios. If the amount of the transcription factor isoform(s) is determined/measured in accordance with the present invention, it is preferred that the starting material (mRNA/RNA) contains/is more than about 75 ng of RNA.
According to an embodiment, the amount of said specific transcription factor isoform(s) is measured via a polymerase chain reaction-based method, an in situ hybridization-based method, or a microarray. According to an embodiment, the amount of said specific transcription factor isoform(s) is measured via a polymerase chain reaction-based method. According to an embodiment, said polymerase chain reaction-based method is a quantitative reverse transcriptase polymerase chain reaction.
According to an embodiment, the step of measuring in a sample of said subject the amount of a specific transcription factor comprises the contacting of the sample with primers, wherein said primers can be used for amplifying at least one of the specific transcription factor isoforms. According to an embodiment, said primers are selected from the group of primers having a nucleic acid sequence as set forth in SEQ ID NOs 9 to 40, particularly one or more primers/primer pairs having a nucleic acid sequence as set forth in SEQ ID NOs 9 to 24. For example, one or more of the following primers/primer pairs can be used in accordance with the present invention:
According to an embodiment, the amount of said specific transcription factor isoform(s) can be measured on the polypeptide/protein level. According to an embodiment, the amount of said specific transcription factor isoform(s) is measured by an ELISA, a gel- or blot-based method, mass spectrometry, flow cytometry or FACS.
According to an embodiment, the cancer is a lung cancer. According to an embodiment, said lung cancer is non-small cell lung cancer (NSCLC) or small cell lung cancer (SCLC).
According to an embodiment, the sample comprises tumor cells. According to an embodiment, the sample is a biopsy sample, a breath condensate sample, a blood sample, a bronchoalveolar lavage fluid sample, a mucus sample or a phlegm sample. Preferably, the sample is a breath condensate sample.
According to an embodiment, the subject is a human subject. According to an embodiment, said human subject is a subject having an increased risk for developing cancer. A human subject having an increased risk for developing cancer can, for example, be a human subject that is a current or former smoker; and/or that was/is exposed to smoke, like environmental smoke, cooking fumes, and/or indoor smoky coal emissions; and/or that was/is exposed to asbestos, some metals (e.g. nickel, arsenic and cadmium), radon, and/or ionizing radiation. A human subject having an increased risk for developing cancer can, for example, be a human subject that has shown cancer-like lesions in a preceding computed tomography scan.
According to an embodiment, the method further comprises the detection of one or more additional markers in a sample of said subject. According to an embodiment, said one or more additional markers are one or more markers for classifying cancer. According to an embodiment, said one or more additional markers are one or more markers for classifying lung cancer into subtypes of lung cancer. According to an embodiment, said one or more markers for classifying lung cancer are differentially expressed.
According to an embodiment, said one or more markers for classifying lung cancer are one or more markers for classifying non-small cell lung cancer (NSCLC) into subtypes of NSCLC. According to an embodiment, said one or more markers for classifying NSCLC are selected from the group consisting of SFTPA1, SFTPB, NAPSA, hsa-let7-d, VEGFA, VEGFB, VEGFC, VEGFD, PLAUR, TP63, KRT5, KRT6A, KRT7, hsa-miR9, HMGA1 and CDH1. Exemplary nucleic acid sequences and amino acid sequences of these markers are provided in the present application.
The specific transcription factor isoform(s) and/or the additional markers (like SFTPA1, SFTPB, NAPSA, VEGFA, VEGFB, VEGFC, VEGFD, PLAUR, TP63, KRT5, KRT6A, KRT7, HMGA1 and/or CDH1) can be measured on the protein/polypeptide or the mRNA level. Additional markers like hsa-let7-d and hsa-miR9 can be measured on the RNA level.
For example, the amount can be measured via a polymerase chain reaction-based method, an in situ hybridization-based method, or a microarray, or a quantitative reverse transcriptase polymerase chain reaction.
For example, the amount can be measured on the polypeptide/protein level, for example, by an ELISA, a gel- or blot-based method, mass spectrometry, flow cytometry or FACS.
For example, if the specific transcription factor isoform(s) and/or additional marker(s) is/are measured on the protein level, contacting and binding can be performed by taking advantage of immunoagglutination, immunoprecipitation (e.g. immunodiffusion, immunoelectrophoresis, immunofixation), western blotting techniques (e.g. (in situ) immunohistochemistry, (in situ) immunocytochemistry, affinity chromatography, enzyme immunoassays), and the like. These and other suitable methods of contacting proteins are well known in the art and are, for example, also described in Sambrook and Russell (2001, loc. cit.).
In case the specific transcription factor isoform(s) and/or additional marker(s) is a protein, quantification can be performed by taking advantage of the techniques referred to above, in particular Western blotting techniques. Generally, the skilled person is aware of methods for the quantitation of polypeptides. Amounts of purified polypeptide in solution can be determined by physical methods, e.g. photometry. Methods of quantifying a particular polypeptide in a mixture rely on specific binding, e.g. of antibodies. Specific detection and quantitation methods exploiting the specificity of antibodies comprise, for example, immunohistochemistry (in situ). Western blotting combines separation of a mixture of proteins by electrophoresis and specific detection with antibodies. Electrophoresis may be multi-dimensional, such as 2D electrophoresis. Usually, polypeptides are separated in 2D electrophoresis by their apparent molecular weight along one dimension and by their isoelectric point along the other dimension.
For example, if the specific transcription factor isoform(s) and/or additional marker(s) is/are measured on the RNA/mRNA level, contacting and binding can be performed by taking advantage of Northern blotting techniques or PCR techniques/via a polymerase chain reaction-based method, like quantitative reverse transcriptase polymerase chain reaction or in-situ PCR, an in situ hybridization-based method, or a microarray. These and other suitable methods for binding (specific) mRNA are well known in the art and are, for example, described in Sambrook and Russell (2001, loc. cit.).
If the specific transcription factor isoform(s) and/or additional marker(s) is an mRNA, determination can be performed by taking advantage of northern blotting techniques, hybridization on microarrays or DNA chips equipped with one or more probes or probe sets specific for mRNA transcripts, or the PCR techniques referred to above, like, for example, quantitative PCR techniques, such as real-time PCR. A skilled person is capable of determining the amount of the component, in particular said gene products, by taking advantage of a correlation, preferably a linear correlation, between the intensity of a Raman signal and the amount of the component to be determined.
According to an embodiment, said subtype of NSCLC is classified as adenocarcinoma, if said one or more markers for classifying NSCLC into subtypes of NSCLC are one or more of SFTPA1, SFTPB and NAPSA, and
if the level of one or more of SFTPA1, SFTPB and NAPSA is increased compared to a control. Preferably the level of SFTPA1 is the mRNA level or the protein level of SFTPA1.
According to an embodiment, said subtype of NSCLC is classified as adenocarcinoma, if said marker for classifying NSCLC into subtypes of NSCLC is hsa-let7-d, and if the level of hsa-let7-d is decreased compared to a control. Preferably the level of hsa-let7-d is the RNA level of hsa-let7-d.
According to an embodiment, said subtype of NSCLC is classified as metastatic adenocarcinoma,
if said marker for classifying NSCLC into subtypes of NSCLC is VEGFA, VEGFB, VEGFC, VEGFD and/or PLAUR, and
if the level of VEGFA, VEGFB, VEGFC, VEGFD and/or PLAUR is increased compared to a control. Preferably the level of VEGFA, VEGFB, VEGFC, VEGFD and/or PLAUR is the mRNA level or the protein level of VEGFA, VEGFB, VEGFC, VEGFD and/or PLAUR.
According to an embodiment, said subtype of NSCLC is classified as squamous cell carcinoma, if said marker for classifying NSCLC into subtypes of NSCLC is one or more of TP63, KRT5, KRT6A, KRT7 and hsa-miR9, and
if the level of one or more of TP63, KRT5, KRT6A, KRT7 and hsa-miR9 is increased compared to a control. Preferably the level of TP63, KRT5, KRT6A and KRT7 is the mRNA level or the protein level of TP63, KRT5, KRT6A and KRT7. Preferably the level of hsa-miR9 is the RNA level of hsa-miR9.
According to an embodiment, said subtype of NSCLC is classified as large cell lung carcinoma, if said marker for classifying NSCLC into subtypes of NSCLC is HMGA1, and if the level of HMGA1 is increased compared to a control. Preferably the level of HMGA1 is the mRNA level or the protein level of HMGA1.
According to an embodiment, said subtype of NSCLC is classified as large cell lung carcinoma,
if said marker for classifying NSCLC into subtypes of NSCLC is CDH1, and
if the level of CDH1 is decreased compared to a control. Preferably the level of CDH1 is the mRNA level or the protein level of CDH1.
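The expression-based subtype rules above can be summarized as a lookup table, as in the following sketch. The data structure, the function names and the plain greater-than/less-than comparison against the control are assumptions made for illustration; in practice, deciding whether a level is increased or decreased would require an appropriate statistical or fold-change criterion.

```python
# Marker -> (direction of change versus control, NSCLC subtype), per the text.
EXPRESSION_RULES = {
    "SFTPA1": ("increased", "adenocarcinoma"),
    "SFTPB": ("increased", "adenocarcinoma"),
    "NAPSA": ("increased", "adenocarcinoma"),
    "hsa-let7-d": ("decreased", "adenocarcinoma"),
    "VEGFA": ("increased", "metastatic adenocarcinoma"),
    "VEGFB": ("increased", "metastatic adenocarcinoma"),
    "VEGFC": ("increased", "metastatic adenocarcinoma"),
    "VEGFD": ("increased", "metastatic adenocarcinoma"),
    "PLAUR": ("increased", "metastatic adenocarcinoma"),
    "TP63": ("increased", "squamous cell carcinoma"),
    "KRT5": ("increased", "squamous cell carcinoma"),
    "KRT6A": ("increased", "squamous cell carcinoma"),
    "KRT7": ("increased", "squamous cell carcinoma"),
    "hsa-miR9": ("increased", "squamous cell carcinoma"),
    "HMGA1": ("increased", "large cell lung carcinoma"),
    "CDH1": ("decreased", "large cell lung carcinoma"),
}


def observed_change(sample_level, control_level):
    """Simplistic comparison of a sample level with a control level."""
    if sample_level > control_level:
        return "increased"
    if sample_level < control_level:
        return "decreased"
    return "unchanged"


def supported_subtypes(sample_levels, control_levels):
    """Map each subtype to the markers whose change matches its rule."""
    support = {}
    for marker, (expected, subtype) in EXPRESSION_RULES.items():
        if marker not in sample_levels or marker not in control_levels:
            continue  # marker was not measured
        change = observed_change(sample_levels[marker], control_levels[marker])
        if change == expected:
            support.setdefault(subtype, []).append(marker)
    return support


# Hypothetical marker levels for one sample and one control.
sample = {"SFTPA1": 5.0, "CDH1": 2.0, "TP63": 0.5}
control = {"SFTPA1": 1.0, "CDH1": 2.0, "TP63": 1.0}
calls = supported_subtypes(sample, control)
```

Returning the supporting markers per subtype, rather than a single label, keeps the sketch neutral about how conflicting evidence from several markers would be resolved.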
According to an embodiment, said one or more markers for classifying lung cancer are genomic alterations. A person skilled in the art knows how to determine genomic alterations, a mutation(s) or a polymorphism(s) in a gene by his common general knowledge and the teaching provided herein. Exemplary, non-limiting techniques for determining such genomic alteration(s), mutation(s) and/or polymorphism(s) are described below.
Genomic alterations, including mutations and polymorphisms, can be detected by DNA sequencing, including pyrosequencing and Sanger sequencing methods, PCR based methods including restriction fragment length polymorphisms, taqman probes and molecular beacons, or using DNA arrays. Genomic alterations including chromosomal changes, such as translocations or deletions can be identified by conventional cytogenetic stainings, fluorescent in situ hybridization, comparative genomic hybridization and array based comparative genomic hybridization, or PCR based analysis.
According to an embodiment, said one or more markers for classifying lung cancer are one or more markers for classifying non-small cell lung cancer (NSCLC) into subtypes of NSCLC.
According to an embodiment, said subtype of NSCLC is classified as adenocarcinoma,
if said marker for classifying NSCLC into subtypes of NSCLC is KRAS G12D or G12V G-->C/T transversion at codon for Exon 12, and
if said marker is present in the sample from the subject.
Preferably, the specific mutations of KRAS found in NSCLC are one or more of: G34T, G35A, G35T, G37T and G38T (the last two result in mutations of codon 13, which are also oncogenic).
These mutations are negative predictors of response to EGFR therapy in patients.
According to an embodiment, said subtype of NSCLC is classified as metastatic adenocarcinoma,
if said marker for classifying NSCLC into subtypes of NSCLC is KRAS G12D//TP53 mutations R172H Substitution in p53 (Li-Fraumeni syndrome), and
if said marker is present in the sample from the subject.
Preferably, metastatic adenocarcinoma is characterized/classified by a combination of KRAS and TP53 as defined above.
According to an embodiment, said subtype of NSCLC is classified as adenocarcinoma in never-smokers,
if said marker for classifying NSCLC into subtypes of NSCLC is KRAS G12D G-->G-->A (G35A) transition, and
if said marker is present in the sample from the subject.
According to an embodiment, said subtype of NSCLC is classified as adenocarcinoma or squamous cell carcinoma,
if said marker for classifying NSCLC into subtypes of NSCLC is TP53 mutations, translocations, and
if said marker is present in the sample from the subject.
Preferably, the most frequent mutations in TP53 are G:C247T:A for adenocarcinoma, G:C274T:A for squamous cell carcinoma and G:C96T:A for SCLC.
According to an embodiment, said subtype of NSCLC is classified as drug resistant adenocarcinoma (patients relapse after tyrosine kinase inhibitors),
if said marker for classifying NSCLC into subtypes of NSCLC is EGFR T790M mutation in exon 20, codon 790, and
if said marker is present in the sample from the subject.
According to an embodiment, said subtype of lung cancer is classified as small cell lung cancer (SCLC),
if said marker for classifying lung cancer into subtypes of lung cancer is/are TP53 mutations combined with mutations in RB1, and
if said marker is present in the sample from the subject.
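The genomic-alteration rules above can likewise be written as a small rule table, as sketched below. The alteration tokens (such as `KRAS_G12D`), the rule structure and the function name are hypothetical names introduced for this example. The metastatic adenocarcinoma rule is encoded as a combination of KRAS G12D and TP53 R172H, which is one possible reading of the text.

```python
# Each rule: (label, list of groups). A rule matches when every group has at
# least one of its alternative alterations present in the sample.
GENOMIC_RULES = [
    ("adenocarcinoma", [{"KRAS_G12D", "KRAS_G12V"}]),
    ("metastatic adenocarcinoma", [{"KRAS_G12D"}, {"TP53_R172H"}]),
    ("adenocarcinoma in never-smokers", [{"KRAS_G12D"}]),
    ("adenocarcinoma or squamous cell carcinoma",
     [{"TP53_mutation", "TP53_R172H"}]),
    ("drug resistant adenocarcinoma", [{"EGFR_T790M"}]),
    ("small cell lung cancer (SCLC)",
     [{"TP53_mutation", "TP53_R172H"}, {"RB1_mutation"}]),
]


def candidate_labels(alterations):
    """Return the labels of all rules satisfied by the detected alterations."""
    detected = set(alterations)
    return [
        label
        for label, groups in GENOMIC_RULES
        if all(detected & group for group in groups)
    ]


# Hypothetical detection results for three samples.
labels_egfr = candidate_labels({"EGFR_T790M"})
labels_kras = candidate_labels({"KRAS_G12V"})
labels_sclc = candidate_labels({"TP53_mutation", "RB1_mutation"})
```

Because several rules share the same alteration, the function returns every matching label and leaves the final interpretation, for example in view of smoking history, to the user.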
The above mentioned additional markers are suitable markers to classify cancer into subtypes of cancer, and in particular lung cancer into subtypes of lung cancer. This is illustrated by the references below. Accordingly, the one or more additional markers can suitably be used in accordance with the present invention for a refined analysis using the herein provided statistical method. For example, the expression of one or more of these additional markers can be determined in exhaled breath condensates from patients that are assessed to suffer from cancer or to be prone to suffering from cancer in accordance with the statistical method, in order to classify e.g. the cancer subtype (preferably the NSCLC subtype) in the patients. The terms “transition” and “transversion” are used interchangeably herein.
For example, the following one or more markers can be used to classify NSCLC into subtypes of NSCLC:
SFTPA, SFTPB and/or NAPSA: (Garber, Troyanskaya et al. 2001, Ye, Findeis-Hosey et al. 2011, Turner, Cagle et al. 2012, Whithaus, Fukuoka et al. 2012, Taguchi, Hanash et al. 2013); and/or hsa-let7-d: (Lee and Dutta 2007, Kumar, Armenteros-Monterroso et al. 2014); and/or KRAS G12D and/or G12V: (Winslow, Dayton et al. 2011); and/or
TP53 mutations and/or TP53 translocations: (Kishimoto, Murakami et al. 1992)
The term “KRAS G12D or G12V” (or more particularly the term “KRAS G12D or G12V G-->C/T transversion at codon for Exon 12”) refers to an amino acid substitution at position 12 of the amino acid sequence of KRAS. The substitution is due to a transversion in the coding sequence of KRAS. Particularly the term “KRAS G12D or G12V G-->C/T transversion at codon for Exon 12” can refer to a G(35)-->C/T transversion at position 35 of the DNA sequence of KRAS within codon 12. The DNA mutation is G-->C/T at position 35 of the coding sequence of KRAS, which changes codon 12 in the amino acid sequence of KRAS. Coding sequences of KRAS can be derived from databases like NCBI. Exemplary coding sequences of KRAS to be used herein are, for example, shown in the database under accession number GI 575403058 (Transcript variant a) or under GI 575403057 (Transcript variant b).
VEGFA, VEGFB, VEGFC, VEGFD, and/or PLAUR: (Shijubo, Uede et al. 1999, Garber, Troyanskaya et al. 2001, Su, Yang et al. 2006) (Han, Silverman et al. 2001, Stacker, Caesar et al. 2001, Li, Hu et al. 2014, Qi, Zhu et al. 2014); and/or
KRAS G12D mutations and/or TP53 mutations (such as R172H substitution in TP53 (Li-Fraumeni syndrome)): (Kishimoto, Murakami et al. 1992, Lang, Iwakuma et al. 2004)
The term “KRAS G12D//TP53 mutation(s) R172H Substitution in TP53 (Li-Fraumeni syndrome)” can refer to KRAS G12D mutation(s) and/or TP53 mutation(s) (such as R172H substitution in TP53 (Li-Fraumeni syndrome)).
The term KRAS G12D refers to an amino acid substitution at position 12 of the amino acid sequence of KRAS. The substitution is due to a transversion in the coding sequence of KRAS, like a G-->A (G35A) transition.
The term “TP53 mutation(s)” (or more particularly the term “TP53 mutation(s) R172H Substitution in TP53”) can refer to an amino acid substitution in the amino acid sequence of TP53. The substitution is due to a transition in the coding sequence of TP53. Particularly the term “TP53 mutation(s) R172H Substitution in TP53” can refer to a G to A transition at position 515 (G515A) of the sequence encoding TP53. Coding sequences of TP53 can be derived from databases like NCBI. An exemplary coding sequence of TP53 to be used herein is, for example, shown in the database under accession number GI 23491728.
KRAS G12D G-->A (G35A) transition: (Riely, Kris et al. 2008). The terms “KRAS G12D G-->G-->A (G35A) transition” and “KRAS G12D G-->A (G35A) transition” can be used interchangeably herein.
The term “KRAS G12D” or particularly the term “KRAS G12D G-->G-->A (G35A) transition”/“KRAS G12D G-->A (G35A) transition” refers to an amino acid substitution at position 12 of the amino acid sequence of KRAS. The substitution is due to a transition in the coding sequence of KRAS. The terms “KRAS G12D G-->G-->A (G35A) transition”/“KRAS G12D G-->A (G35A) transition” can refer to a KRAS G12D G-->A (G35A) transition. Particularly the term “KRAS G12D G-->G-->A (G35A) transition” refers to an amino acid substitution at position 12 of the amino acid sequence of KRAS which is due to a G-->A (G35A) transition in the coding sequence of KRAS. The amino acid change KRAS G12D results from a change at position 35 in the coding sequence of KRAS, in this case G35 to A.
Drug Resistant Adenocarcinoma (for Example Patients Relapse after Therapy with Tyrosine Kinase Inhibitors):
EGFR T790M mutation in exon 20, codon 790: (Pao, Miller et al. 2005)
The terms “EGFR T790M mutation in exon 20, codon 790” and “EGFR T790M mutation in codon 790” can be used interchangeably herein. The terms “EGFR T790M mutation in exon 20, codon 790” or “EGFR T790M mutation in codon 790” are also known as “EGFR C2369T mutation”.
The term “EGFR T790M mutation”, or particularly the term “EGFR T790M mutation in exon 20, codon 790”, refers to an amino acid substitution at position 790 of the amino acid sequence of EGFR. The amino acid substitution can be due to a transition in the coding sequence of EGFR. Particularly the terms “EGFR T790M mutation in exon 20, codon 790”/“EGFR T790M mutation in codon 790”/“EGFR C2369T mutation” can refer to a C to T transition at position 2369 (i.e. C2369T) of the sequence encoding EGFR. Coding sequences of EGFR can be derived from databases like NCBI. An exemplary coding sequence of EGFR to be used herein is, for example, shown in the database under accession number GI 41327737 (Transcript isoform a), GI 41327731 (Transcript isoform b), GI 41327733 (Transcript isoform c) or GI 41327735 (Transcript isoform d).
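The relation between the nucleotide positions and the codon numbers named above (G35A and codon 12 of KRAS, G515A and codon 172 of TP53, C2369T and codon 790 of EGFR) can be checked with the following sketch. The function names are assumptions made for this example, and positions are taken as 1-based positions in the coding sequence. The substitution classifier follows the conventional purine/pyrimidine distinction, whereas the present text uses the terms “transition” and “transversion” interchangeably.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def codon_of(cds_position):
    """Return (codon number, position within the codon), both 1-based."""
    return (cds_position - 1) // 3 + 1, (cds_position - 1) % 3 + 1


def substitution_class(ref, alt):
    """Conventional class of a single-nucleotide substitution."""
    if ref == alt:
        raise ValueError("reference and alternative base are identical")
    pair = {ref, alt}
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"


def parse_substitution(name):
    """Parse a name such as 'G35A' into ('G', 35, 'A')."""
    return name[0], int(name[1:-1]), name[-1]


def describe(name):
    """Summarize codon number, in-codon position and substitution class."""
    ref, position, alt = parse_substitution(name)
    codon, offset = codon_of(position)
    return codon, offset, substitution_class(ref, alt)


# Substitutions named in the text: KRAS G35A, TP53 G515A, EGFR C2369T.
kras_g35a = describe("G35A")
tp53_g515a = describe("G515A")
egfr_c2369t = describe("C2369T")
```

All three substitutions fall on the second base of their respective codons, which is consistent with the codon numbers 12, 172 and 790 given in the text.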
TP63, KRT5, KRT6 and/or KRT7: (Pelosi, Pasini et al. 2002, Rekhtman, Ang et al. 2011, Whithaus, Fukuoka et al. 2012); and/or
hsa-miR9: (White, Neiman et al. 2013)
TP53 mutations and/or TP53 translocations: (Kishimoto, Murakami et al. 1992)
HMGA1: (Hillion, Wood et al. 2009) and/or
For example, the following one or more markers can be used to classify lung cancer into the subtype small cell lung cancer (SCLC): TP53 mutations in combination with mutations in RB1: (Sutherland, Proost et al. 2011). Mutations in RB1 may refer to mutations in the tumor suppressor gene Retinoblastoma, RB1. The protein is a negative regulator of the cell cycle.
The invention also provides a computer program product comprising one or more computer readable media having computer executable instructions for performing the steps of one of the aforementioned methods.
The present invention relates to a method of treating a subject, said method comprising
a) selecting a subject that is assessed to suffer from cancer or is assessed to be prone to suffering from cancer according to the herein provided statistical method;
b) administering to said cancer patient an effective amount of an anti-cancer agent and/or radiation therapy.
Preferably, the gene mutations can be used to distinguish patients' response to EGFR therapy as mentioned above.
The invention also provides an anti-cancer agent and/or radiation therapy for use in the treatment of a subject, wherein the subject is assessed to suffer from cancer or is assessed to be prone to suffering from cancer according to any of the statistical methods mentioned above. Preferably, the subject/patient is a human subject/patient. In other words, the invention provides an anti-cancer agent and/or radiation therapy, said agent or radiation therapy being selected on basis of the patient group determined by the statistical method provided herein.
For example, conventional chemotherapy (like cisplatin based protocols), radiotherapy (like conventional radiotherapy or radiosurgery), and/or more modern approaches employing tyrosine kinase inhibitors (TKIs), such as gefitinib, erlotinib and/or monoclonal antibodies directed against activating mutations of the tumor (EGFR, ALK or ROS1 mutations) can be used.
If the subject is assessed to suffer from non-small cell lung cancer (NSCLC) or is assessed to be prone to suffering from non-small cell lung cancer (NSCLC) according to any of the statistical methods mentioned above, the following treatment options can be used:
The treatment options for NSCLC are, for example, based on the stage of the disease. Standard treatments include surgery, platinum-based chemotherapy, radiotherapy, combined chemoradiotherapy and/or targeted therapy. The choice of the course of treatment can depend on the stage of the disease, its spread to the surrounding tissues, patient's overall medical condition, and/or especially the patient's pulmonary reserve.
If the subtype of NSCLC (like NSCLC stage I, II or III tumors/cancers) is, for example, adenocarcinoma, squamous cell carcinoma or large cell carcinoma, the following treatment options are conceivable:
For Stage I tumors, surgery is the most consistent and successful treatment for lung cancer patients. Tumors can be removed by lobectomy, segmental, wedge or sleeve resections or pneumonectomy as found appropriate (Molina, Yang et al. 2008, Schuchert, Abbas et al. 2010, 2011, Cagle and Chirieac 2012). The five-year survival rate ranges between 40-67%, favoring T1N0 or earlier (Martini, Bains et al. 1995). For patients with potentially resectable tumors who are unfit for surgery due to an unacceptably high perioperative risk, or for patients with inoperable Stage I tumors, primary radiosurgery or conventional radiation therapy is suggested (Dosoretz, Katin et al. 1992, Gauden, Ramsay et al. 1995). Unfortunately, many patients develop locally recurrent or second primary tumors after surgical resection. To prevent this, adjuvant chemotherapy or radiation therapy following surgery is recommended depending on the stage prior to surgery (Martini, Bains et al. 1995).
Stage II cancers are routinely treated with surgical resection; however, the prognosis is worse than that of Stage I cancers and the 5-year survival rate varies from 25-55% (Martini, Burt et al. 1992). Patient survival is lower for squamous cell lung cancer. In some cases, neoadjuvant chemotherapy, i.e. preoperative chemotherapy, is proposed to be beneficial to reduce tumor size, to facilitate surgical resection and to eliminate early micrometastases (Burdett, Stewart et al. 2007). In addition, post-operative adjuvant chemotherapy, for instance with cisplatin, may significantly improve prognosis and prevent local recurrences. For inoperable tumors or patients unfit for surgery, radiation therapy is recommended (Pignon, Tribodet et al. 2008).
Stage III NSCLC includes both locally and regionally advanced disease. For resectable NSCLC, surgery to remove the complete tumor and the surrounding lymph nodes is recommended, followed by post-operative chemotherapy. Further, neoadjuvant chemotherapy to shrink the tumor and eradicate micrometastases, thus facilitating surgery, is also an approach of choice (Burdett, Stewart et al. 2007). Further, similar to Stage II, patients are shown to benefit from adjuvant chemotherapy using cisplatin. For unresectable Stage III NSCLC, radiation therapy or a concurrent or sequential combination of chemotherapy with radiation therapy is recommended (Furuse, Fukuoka et al. 1999).
If the subtype of NSCLC (like NSCLC stage IV tumors/cancers) is, for example, metastatic NSCLC (such as forms of all NSCLC classes/subtypes, like metastatic adenocarcinoma), adenocarcinoma, squamous cell carcinoma or large cell carcinoma the following treatment options are conceivable:
For patients with metastatic NSCLC (Stage IV), treatment is usually aimed at prolonging survival and at palliation of disease-related symptoms. Standard treatment options include cytotoxic chemotherapy and targeted agents. However, treatment is selected based on comorbidity, performance status, histology, and molecular genetic features of the cancer. First line cytotoxic combination chemotherapy includes a combination of platinum-based chemotherapy (cisplatin or carboplatin) and paclitaxel, gemcitabine, docetaxel, vinorelbine, irinotecan, or pemetrexed (Le Chevalier, Arriagada et al. 1992, Wozniak, Crowley et al. 1998, Mok, Wu et al. 2009). Following the initial response to chemotherapy, maintenance chemotherapy using the initial combination of drugs, or continuing single-agent chemotherapy, or using a new ‘maintenance’ agent is evaluated (Brodowicz, Krzakowski et al. 2006, Park, Kim et al. 2007, Paz-Ares, de Marinis et al. 2012). Further, based on the molecular analysis of the cancer, patients may benefit from single-agent EGFR tyrosine kinase inhibitors or EML4-ALK inhibitors, as first line treatment (if driver mutations have been encountered) or, even in absence of driver mutations, as second or third line treatment.
If the subtype of NSCLC is, for example, adenocarcinoma, the following treatment options are conceivable:
Among the currently used combinations, definite recommendations regarding drug dose, schedule or combination cannot be made. However, the exception for this is pemetrexed for lung adenocarcinoma (Scagliotti, Parikh et al. 2008). Adenocarcinoma patients, especially adenocarcinoma in never smokers/never smoker patients, benefit from using EGFR tyrosine kinase inhibitors, such as gefitinib (Mok, Wu et al. 2009).
If the subtype of NSCLC is, for example, squamous cell carcinoma, the following treatment options are conceivable:
In contrast, in patients with squamous cell histology (like patients with squamous cell carcinoma), patient response is significantly better using a combination of cisplatin and gemcitabine versus cisplatin and pemetrexed (Scagliotti, Parikh et al. 2008).
Lastly, for patients with Stage IV NSCLC, palliative radiotherapy may be used to control vocal cord paralysis, hemoptysis, obstructive symptoms or pain related to bone metastases. Surgical intervention may also be recommended for patients with bronchial obstructions.
Standard treatment for recurrent drug resistant NSCLC includes palliative radiation therapy (Sundstrom, Bremnes et al. 2004) and/or combination chemotherapy, for patients who have previously received platinum based chemotherapy. Chemotherapy combinations include Docetaxel, Pemetrexed, Erlotinib after failure of both platinum-based and docetaxel chemotherapies, Gefitinib, Crizotinib for EML4-ALK translocations, EGFR inhibitors in patients with or without EGFR mutations, EML4-ALK inhibitors in patients with EML4-ALK translocations (Hanna, Shepherd et al. 2004, Kim, Hirsh et al. 2008, Kwak, Bang et al. 2010, Shaw, Yeap et al. 2011).
If the subtype of NSCLC is, for example, large cell lung cancer/large cell carcinoma, the treatment plan depends on the stage and no definite recommendations can be made beforehand. For example, conventional therapy, like chemotherapy/radiotherapy as disclosed herein, can be contemplated.
If the subtype of lung cancer is, for example, small-cell lung cancer (SCLC), the following treatment options are conceivable:
For treatment purposes, small-cell lung cancer (SCLC) is usually staged as either limited or extensive disease. Limited stage SCLC means that the cancer is only on one side of the chest and includes the lobes and/or lymph nodes on the same side. The tumors are often confined to a small area and can be targeted by a single radiation field. On the other hand, extensive stage represents cancers that have spread to both sides of the chest and may include distant metastases to other organs.
Chemotherapy is the mainstay of treatment of SCLC. For limited stage disease, combined modality of chemotherapy and thoracic radiation therapy, called concurrent chemoradiation, is the most widely used treatment. Active drugs usually include a combination of platinum and etoposide. Based on the patient's health status, radiation therapy may not be recommended and in this case, the patients are treated with chemotherapy alone (Pignon, Arriagada et al. 1992, Warde and Payne 1992, Murray, Coy et al. 1993). Surgical resection for SCLC is limited to management of cases with very limited disease, i.e. small tumors pathologically confined to the lobe of origin. Surgery is generally followed by adjuvant chemotherapy (Osterlind, Hansen et al. 1985, Prasad, Naylor et al. 1989, Smit, Groen et al. 1994).
For patients with extensive stage disease, combination chemotherapy, including platinum and etoposide in doses that have the least toxic effects, is recommended (Okamoto, Watanabe et al. 2007). Further, radiation therapy to the site of distant metastases is also a standard treatment option for patients. This is especially preferred for metastases that are unlikely to be immediately palliated by chemotherapy, such as those in the brain and bone (Slotman, Faivre-Finn et al. 2007).
Response rates to chemotherapy are high for SCLC, up to 85-95% in limited disease and 75-80% in extensive disease. However, median survival still remains low, i.e. 14-20 months for limited disease and only 7-10 months for extensive disease. Long term survival is only seen in 5-10% of the patients (Hoffman, Mauer et al. 2000).
In accordance with the present invention the methods, in particular the statistical methods, may comprise the use of FOXA2 Em isoform and/or ID2 Em isoform.
For example, the herein provided statistical method of assessing whether a subject suffers from cancer or is prone to suffering from cancer, may (further) comprise the step of
performing at least one statistical algorithm for classification and for regression on measurement data of the subject,
wherein the measurement data of the subject comprises at least one of the following: a value of FOXA2 Em isoform in at least one sample taken from the subject, a value of ID2 Em isoform in said at least one sample, a value of FOXA2 Ad isoform in said at least one sample, a value of ID2 Ad isoform in said at least one sample; and
wherein at least one of the following is used as at least one classifier or a component of at least one classifier in the statistical method: FOXA2 Em isoform, ID2 Em isoform, FOXA2 Ad isoform, ID2 Ad isoform, ratio of FOXA2 Em isoform/FOXA2 Ad isoform, ratio of ID2 Em isoform/ID2 Ad isoform.
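The classifier components listed above can be assembled from the measurement data in a few lines. The following is a minimal sketch under stated assumptions: the key names (`FOXA2_Em`, `ID2_Ad` and so on), the function name `isoform_features` and the example values are hypothetical, and the sketch only builds the feature set; it does not implement any particular classification or regression algorithm.

```python
from typing import Dict


def isoform_features(values: Dict[str, float]) -> Dict[str, float]:
    """Return the measured isoform values plus Em/Ad ratios where both are available.

    `values` maps hypothetical keys such as "FOXA2_Em" or "ID2_Ad" to measured amounts.
    """
    features = dict(values)
    for gene in ("FOXA2", "ID2"):
        em = values.get(f"{gene}_Em")
        ad = values.get(f"{gene}_Ad")
        # A ratio is only defined when both isoforms were measured and Ad is non-zero.
        if em is not None and ad is not None and ad > 0:
            features[f"{gene}_Em_Ad_ratio"] = em / ad
    return features


# Hypothetical measurement data for one sample (arbitrary units).
sample = {"FOXA2_Em": 8.0, "FOXA2_Ad": 2.0, "ID2_Em": 3.0}
features = isoform_features(sample)
print(features)
```

Because the method requires only at least one of the listed values, the sketch tolerates missing measurements and simply omits any ratio it cannot compute.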
The term “specific transcription factor Em isoform” according to the present application may relate to FOXA2 (Uniprot-ID: Q9Y261; Gene-ID: 3170) and/or ID2 (Uniprot-ID: Q02363; Gene-ID:3398). If, for example, the amount of a specific transcription factor is measured on mRNA level, the specific transcription factor can be mRNA molecules (or transcript or splice variants). In this context, the transcription factors can be defined as
In a certain aspect, the value of the FOXA2 Em isoform in said at least one sample is obtained by measuring the amount of a specific transcription factor isoform in said at least one sample of said subject, wherein said specific transcription isoform is the FOXA2 Em isoform comprising the nucleic acid sequence of SEQ ID No: 3 or the FOXA2 Em isoform comprising nucleic acid sequence with up to 68 additions, deletions or substitutions of SEQ ID NO: 3.
In a certain aspect, the value of the ID2 Em isoform in said at least one sample is obtained by measuring the amount of a specific transcription factor isoform in said at least one sample of said subject, wherein said specific transcription isoform is the ID2 Em isoform comprising the nucleic acid sequence of SEQ ID No: 4 or the ID2 Em isoform comprising nucleic acid sequence with up to 34 additions, deletions or substitutions of SEQ ID NO: 4.
In a certain aspect, the value of the FOXA2 Ad isoform in said at least one sample is obtained by measuring the amount of a specific transcription factor isoform in said at least one sample of said subject, wherein said specific transcription isoform is the FOXA2 Ad isoform comprising the nucleic acid sequence of SEQ ID No: 7 or FOXA2 Ad isoform comprising the nucleic acid sequence with up to 74 additions, deletions or substitutions of SEQ ID NO: 7.
In a certain aspect, the value of the ID2 Ad isoform in said at least one sample is obtained by measuring the amount of a specific transcription factor isoform in said at least one sample of said subject, wherein said specific transcription isoform is the ID2 Ad isoform consisting of the nucleic acid sequence of SEQ ID No: 8 or ID2 Ad isoform consisting of nucleic acid sequence with up to 30 additions, deletions or substitutions of SEQ ID NO: 8.
In a certain aspect, the FOXA2 Em isoform of said sample is set in relation to a FOXA2 Em isoform of at least one control sample and then used as a classifier in the statistical method; and
said value of the FOXA2 Em isoform of said at least one control sample is obtained by measuring in said sample of said subject the amount of a specific transcription factor isoform, wherein said specific transcription isoform is the FOXA2 Em isoform comprising the nucleic acid sequence of SEQ ID No: 3 or the FOXA2 Em isoform comprising nucleic acid sequence with up to 68 additions, deletions or substitutions of SEQ ID NO: 3.
In a certain aspect, the FOXA2 Ad isoform of said sample is set in relation to a FOXA2 Ad isoform of at least one control sample and then used as a classifier in the statistical method; and
said value of the FOXA2 Ad isoform of said at least one control sample is obtained by measuring in said sample of said subject the amount of a specific transcription factor isoform, wherein said specific transcription isoform is the FOXA2 Ad isoform comprising the nucleic acid sequence of SEQ ID No: 7 or FOXA2 Ad isoform comprising the nucleic acid sequence with up to 74 additions, deletions or substitutions of SEQ ID NO: 7.
In a certain aspect, the ID2 Em isoform of said sample is set in relation to a ID2 Em isoform of at least one control sample and then used as a classifier in the statistical method; and
said value of the ID2 Em isoform of said at least one control sample is obtained by measuring in said sample of said subject the amount of a specific transcription factor isoform, wherein said specific transcription isoform is the ID2 Em isoform comprising the nucleic acid sequence of SEQ ID No: 4 or the ID2 Em isoform comprising nucleic acid sequence with up to 34 additions, deletions or substitutions of SEQ ID NO: 4.
In a certain aspect, the ID2 Ad isoform of said sample is set in relation to a ID2 Ad isoform of at least one control sample and then used as a classifier in the statistical method; and
said value of the ID2 Ad isoform of said at least one control sample is obtained by measuring in said sample of said subject the amount of a specific transcription factor isoform,
wherein said specific transcription isoform is the ID2 Ad isoform consisting of the nucleic acid sequence of SEQ ID No: 8 or ID2 Ad isoform consisting of nucleic acid sequence with up to 30 additions, deletions or substitutions of SEQ ID NO: 8.
In certain aspects, a ratio of the FOXA2 Em isoform and the FOXA2 Ad isoform and a ratio of the ID2 Em isoform and the ID2 Ad isoform are used as a classifier.
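How a sample value is "set in relation to" a control value is not fixed by the text above. One common choice, assumed here purely for illustration, is to divide the sample value by the mean of the control values. The function name `relative_to_control` and the numbers are hypothetical.

```python
from typing import Sequence


def relative_to_control(sample_value: float, control_values: Sequence[float]) -> float:
    """Express a sample value as a multiple of the mean control value (assumed normalization)."""
    if not control_values:
        raise ValueError("at least one control sample is required")
    control_mean = sum(control_values) / len(control_values)
    if control_mean <= 0:
        raise ValueError("control mean must be positive")
    return sample_value / control_mean


# Hypothetical FOXA2 Em isoform amounts: one subject sample, three control samples.
fold_change = relative_to_control(6.0, [1.0, 2.0, 3.0])
print(fold_change)
```

The resulting control-relative value could then be used as a classifier, or as a component of one, in the same way as the raw values and ratios.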
The present invention also contemplates obtaining the value of a transcription factor isoform in a sample, e.g. by measuring the amount of a transcription factor isoform on the protein level.
If, for example, the amount of a specific transcription factor is measured on protein level, the specific transcription factor can be protein molecules. For example, they can be defined as
In a certain aspect, the value of the FOXA2 Em isoform in said at least one sample is obtained by measuring the amount of a specific transcription factor isoform in said at least one sample of said subject, wherein said specific transcription isoform is the FOXA2 Em isoform comprising the polypeptide sequence of SEQ ID No: 52 or the FOXA2 Em isoform comprising polypeptide sequence with up to 43 additions, deletions or substitutions of SEQ ID NO: 52.
In a certain aspect, the value of the ID2 Em isoform in said at least one sample is obtained by measuring the amount of a specific transcription factor isoform in said at least one sample of said subject, wherein said specific transcription isoform is the ID2 Em isoform comprising the polypeptide sequence of SEQ ID No: 53 or the ID2 Em isoform comprising polypeptide sequence with up to 13 additions, deletions or substitutions of SEQ ID NO: 53.
In a certain aspect, the value of the FOXA2 Ad isoform in said at least one sample is obtained by measuring the amount of a specific transcription factor isoform in said at least one sample of said subject, wherein said specific transcription isoform is the FOXA2 Ad isoform comprising the polypeptide sequence of SEQ ID No: 56 or FOXA2 Ad isoform comprising the polypeptide sequence with up to 43 additions, deletions or substitutions of SEQ ID NO: 56.
In a certain aspect, the value of the ID2 Ad isoform in said at least one sample is obtained by measuring the amount of a specific transcription factor isoform in said at least one sample of said subject, wherein said specific transcription isoform is the ID2 Ad isoform consisting of the polypeptide sequence of SEQ ID No: 57 or ID2 Ad isoform consisting of polypeptide sequence with up to 13 additions, deletions or substitutions of SEQ ID NO: 57.
If, for example, the amount of a specific transcription factor is measured on protein level, the specific transcription factors can be protein molecules. For example, they can be defined as
In a certain aspect, the value of the GATA6 Em isoform in said at least one sample is obtained by measuring the amount of a specific transcription factor isoform in said at least one sample of said subject, wherein said specific transcription isoform is the GATA6 Em isoform comprising the polypeptide sequence of SEQ ID No: 50 or the GATA6 Em isoform comprising the polypeptide sequence with up to 30 additions, deletions or substitutions of SEQ ID NO: 50.
In a certain aspect, the value of the NKX2-1 Em isoform in said at least one sample is obtained by measuring the amount of a specific transcription factor isoform in said at least one sample of said subject, wherein said specific transcription isoform is the NKX2-1 Em isoform comprising the polypeptide sequence of SEQ ID No: 51 or the NKX2-1 Em isoform comprising the polypeptide sequence with up to 14 additions, deletions or substitutions of SEQ ID NO: 51.
In a certain aspect, the value of the GATA6 Ad isoform in said at least one sample is obtained by measuring the amount of a specific transcription factor isoform in said at least one sample of said subject, wherein said specific transcription isoform is the GATA6 Ad isoform comprising the polypeptide sequence of SEQ ID No: 54 or the GATA6 Ad isoform comprising the polypeptide sequence with up to 23 additions, deletions or substitutions of SEQ ID NO: 54.
In a certain aspect, the value of the NKX2-1 Ad isoform in said at least one sample is obtained by measuring the amount of a specific transcription factor isoform in said at least one sample of said subject, wherein said specific transcription isoform is the NKX2-1 Ad isoform comprising the polypeptide sequence of SEQ ID No: 55 or the NKX2-1 Ad isoform comprising the polypeptide sequence with up to 15 additions, deletions or substitutions of SEQ ID NO: 55.
Genes can contain single nucleotide polymorphisms (SNPs). The specific transcription factor Em isoform sequences of the present invention encompass (genetic) variants thereof, for example, variants having SNPs. Without departing from the gist of the present invention, all naturally occurring sequences of the respective isoform independent of the number and nature of the SNPs in said sequence can be used herein. To relate to currently known SNPs, the transcription factor Em isoforms of the present invention are defined such that they contain up to 55 (in the case of GATA6), up to 39 (in the case of NKX2-1), up to 68 (in the case of FOXA2) or up to 34 (in the case of ID2) additions, deletions or substitutions of the nucleic acid sequences defined by SEQ ID NOs: 1, 2, 3 and 4, respectively. Thus, respective Em transcripts of carriers of different nucleotides at the respective SNPs are covered by the present application.
The FOXA2 Em isoform according to the invention is the FOXA2 Em isoform comprising the nucleic acid sequence of SEQ ID No: 3 or the FOXA2 Em isoform comprising a nucleic acid sequence with up to 68; preferably up to 67, 66, 65, 64, 63, 62, 61, 60, 59, 58, 57, 56, 55, 54, 53, 52, 51, 50, 49, 48, 47, 46, 45, 44, 43, 42, 41, 40, 39, 38, 37, 36, 35, 34, 33, 32, 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21 or 20; even more preferably up to 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3 or 2; or even furthermore preferably only 1 addition(s), deletion(s) or substitution(s) of SEQ ID NO: 3. The FOXA2 Em isoform can also be defined as the FOXA2 Em isoform comprising the nucleic acid sequence of SEQ ID No: 3 or the FOXA2 Em isoform comprising the nucleic acid sequence of SEQ ID No: 3 with additions, deletions or substitutions at any of positions 168; 208; 289; 361; 368; 374; 379; 383; 404; 459; 481; 483; 494; 529; 564; 577; 584; 590; 610; 623; 641; 650; 659; 674; 773; 845; 1040; 1075; 1186; 1188; 1240; 1242; 1243; 1304; 1374; 1391; 1408; 1414; 1432; 1458; 1475; 1487; 1522; 1539; 1582; 1583; 1594; 1627; 1631; 1687; 1723; 1737; 1738; 1754; 1812; 1831; 1838; 1940; 1966; 1970; 2070; 2083; 2084; 2093; 2105; 2112; 2200 and 2388. The FOXA2 Em isoform according to the invention can also be defined as the FOXA2 Em isoform comprising the nucleic acid sequence of SEQ ID No: 3 or the FOXA2 Em isoform comprising a nucleic acid sequence with at least 93% homology to SEQ ID No: 3, preferably up to 94%, 95%, 96%, 97% or 98% homology to SEQ ID No: 3; even more preferably up to 99% homology to SEQ ID No: 3.
The ID2 Em isoform according to the invention is the ID2 Em isoform comprising the nucleic acid sequence of SEQ ID No: 4 or the ID2 Em isoform comprising a nucleic acid sequence with up to 34; preferably up to 33, 32, 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, or 10; even more preferably up to 9, 8, 7, 6, 5, 4, 3, or 2; or even furthermore preferably only 1 addition(s), deletion(s) or substitution(s) of SEQ ID NO: 4. The ID2 Em isoform can also be defined as the ID2 Em isoform comprising the nucleic acid sequence of SEQ ID No: 4 or the ID2 Em isoform comprising the nucleic acid sequence of SEQ ID No: 4 with additions, deletions or substitutions at any of positions 6; 43; 53; 55; 154; 195; 209; 224; 237; 263; 286; 360; 399; 405; 485; 501; 544; 547; 605; 662; 665; 716; 757; 871; 876; 975; 1085; 1115; 1119; 1149; 1151; 1251; 1333 and 1350. The ID2 Em isoform according to the invention can also be defined as the ID2 Em isoform comprising the nucleic acid sequence of SEQ ID No: 4 or the ID2 Em isoform comprising a nucleic acid sequence with at least 51% homology to SEQ ID No: 4, preferably up to 55%, 60%, 65%, 70%, 75%, 80%, 85% or 90% homology to SEQ ID No: 4; even more preferably up to 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% homology to SEQ ID No: 4.
Preferably, the above referred “addition(s), deletion(s) or substitution(s)” of the transcription factor isoforms are substitutions.
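Whether a candidate sequence stays within a stated number of additions, deletions or substitutions relative to a reference can be checked with an edit distance. The following is a minimal sketch under stated assumptions: it uses the standard Levenshtein distance as the count of additions, deletions and substitutions, it compares two complete sequences rather than searching for the reference inside a longer sequence, and the names `edit_distance`, `within_limit` and `MAX_EDITS` are hypothetical. The limits are the nucleic acid values stated above for the Em isoforms; the short sequences in the usage lines are toy strings, not SEQ ID NOs.

```python
# Maximum number of additions, deletions or substitutions for the Em isoform
# nucleic acid sequences, as stated in the text above.
MAX_EDITS = {"GATA6": 55, "NKX2-1": 39, "FOXA2": 68, "ID2": 34}


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions or substitutions turning `a` into `b`."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion from a
                current[j - 1] + 1,                    # insertion into a
                previous[j - 1] + (char_a != char_b),  # substitution or match
            ))
        previous = current
    return previous[-1]


def within_limit(reference: str, candidate: str, gene: str) -> bool:
    """True if candidate differs from reference by no more than the stated limit."""
    return edit_distance(reference, candidate) <= MAX_EDITS[gene]


# Toy sequences for illustration only.
print(edit_distance("ACGTACGT", "ACGTTCGT"))
print(within_limit("ACGTACGT", "ACGACGT", "ID2"))
```

This quadratic-time routine is adequate for transcripts of a few thousand nucleotides; a dedicated alignment tool would be the more usual choice in practice.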
The person skilled in the art understands that a subject which is prone to suffering from cancer is a subject which has an increased likelihood of developing cancer within the next 30 years or preferably within the next 20 or 10 years or even more preferably within the next 9, 8, 7, 6, 5, 4, 3 or 2 years or even furthermore preferably within the next year. An increased likelihood of a subject of developing cancer can be understood to mean that said subject has an increased likelihood of developing cancer within a given time period compared to the average likelihood that a subject of the same age or a subject of the same age and the same gender develops cancer.
The term “sample” according to the present invention relates to any kind of sample which can be obtained from a subject, preferably from a human subject. The sample is a biological sample. A sample according to the present invention can be for example, but is not limited to, a blood sample, a breath condensate sample, a bronchoalveolar lavage fluid sample, a mucus sample or a phlegm sample. Preferably, the sample according to the present invention is a biopsy, a blood sample or a breath condensate sample. More preferably, the sample according to the present invention is a biopsy or a breath condensate sample. Particularly preferred is (a) breath condensate sample(s).
The term “breath condensate sample” as used herein refers to an “exhaled breath condensate (sample)”. The term “exhaled breath condensate (sample)” can be abbreviated as “EBC”. Accordingly, the terms “breath condensate sample”, “exhaled breath condensate”, “exhaled breath condensate sample” and “EBC” are used interchangeably herein. The use of “breath condensate sample”, in particular “exhaled breath condensate (sample)” allows the non-invasive obtaining of samples from a subject/patient and is therefore advantageous.
The herein provided diagnostic method can lead to fast medical intervention for example by means of corresponding anti-cancer therapy, like anti-cancer medication or radiation therapy. Early stage anti-cancer therapies include, but are not limited to, radiation therapy, such as external radiation therapy, photodynamic therapy (PDT) using an endoscope and surgery (i.e. wedge resection or segmental resection for carcinoma in situ and sleeve resection or lobectomy for Stage I). In addition, chemotherapy is used alone or after surgery. The chemotherapy drugs may, inter alia, comprise compounds selected from the group consisting of Cisplatin, Carboplatin, Paclitaxel (Taxol®), Albumin-bound paclitaxel (nab-paclitaxel, Abraxane®), Docetaxel (Taxotere®), Gemcitabine (Gemzar®), Vinorelbine (Navelbine®), Irinotecan (Camptosar®, CPT-11), Etoposide (VP-16®), Vinblastine and Pemetrexed (Alimta®).
The herein provided methods are primarily useful in the assessment whether a subject suffers from cancer or is prone to suffering from cancer before the subject undergoes therapeutic intervention. In other words, the sample of the subject is obtained from the subject and analyzed prior to therapeutic intervention, like conventional chemotherapy. If the subject is assessed “positive” in accordance with the present invention, i.e. assessed to suffer from cancer or prone to suffering from cancer, the appropriate therapy/therapeutic intervention can be chosen. For example, a subject may be suspected of suffering from cancer and the present methods can be used to assess whether the subject suffers indeed from said cancer in addition or in the alternative to conventional diagnostic methods.
Following positive diagnosis with the herein provided inventive method, the diagnosis may be elucidated/further verified with low-dose helical computed tomography and/or chest X-ray, by bronchoscopy and/or histological assessment. In early stage or Grade I tumors, surgery to remove the lobe or the section of the lung that contains the tumor would be the first choice of treatment. It is feasible to supplement the surgery with chemotherapy, known as ‘adjuvant chemotherapy’, to prevent cancer relapse (Howington J A et al. (2013) CHEST Journal 143: e278S-e313S). At later stages, surgery is no longer feasible and a combination of chemotherapy and radiation is advised. Further, for metastatic lesions, chemotherapy and radiation are suggested, mainly for palliation of the symptoms.
The term “isoform” according to the present invention encompasses transcript variants (which are mRNA molecules) as well as the corresponding polypeptide variants (which are polypeptides) of a gene. Such transcription variants result, for example, from alternative splicing or from a shifted transcription initiation. Based on the different transcript variants, different polypeptides are generated. It is possible that different transcript variants have different translation initiation sites. A person skilled in the art will appreciate that the amount of an isoform can be measured by adequate techniques for the quantification of mRNA as far as the isoform relates to a transcript variant which is an mRNA. Examples of such techniques are polymerase chain reaction-based methods, in situ hybridization-based methods, microarray-based techniques and whole transcriptome shotgun sequencing. Further, a person skilled in the art will appreciate that the amount of an isoform can be measured by adequate techniques for the quantification of polypeptides as far as the isoform relates to a polypeptide. Non-limiting examples of such techniques for the quantification of polypeptides are ELISA (Enzyme-linked Immunosorbent Assay)-based, gel-based, blot-based, mass spectrometry-based, and flow cytometry-based methods.
Genes can contain single nucleotide polymorphisms (SNPs). The specific transcription factor Em isoform sequences of the present invention encompass (genetic) variants thereof, for example, variants having SNPs. Without departing from the gist of the present invention, all naturally occurring sequences of the respective isoform independent of the number and nature of the SNPs in said sequence can be used herein. To relate to currently known SNPs, the transcription factor Em isoforms of the present invention are defined such that they contain up to 55 (in the case of GATA6) or up to 39 (in the case of NKX2-1) additions, deletions or substitutions of the nucleic acid sequences defined by SEQ ID NOs: 1 and 2, respectively. Thus, respective Em transcripts of carriers of different nucleotides at the respective SNPs are covered by the present application.
The GATA6 Em isoform according to the invention is the GATA6 Em isoform comprising the nucleic acid sequence of SEQ ID No: 1 or the GATA6 Em isoform comprising a nucleic acid sequence with up to 55; preferably up to 54, 53, 52, 51, 50, 49, 48, 47, 46, 45, 44, 43, 42, 41, 40, 39, 38, 37, 36, 35, 34, 33, 32, 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21 or 20; even more preferably up to 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3 or 2; or even furthermore preferably only 1 addition(s), deletion(s) or substitution(s) of SEQ ID NO: 1. The GATA6 Em isoform can also be defined as the GATA6 Em isoform comprising the nucleic acid sequence of SEQ ID No: 1 or the GATA6 Em isoform comprising the nucleic acid sequence of SEQ ID No: 1 with additions, deletions or substitutions at any of positions 163; 293; 320; 327; 339; 430; 462; 480; 759; 1128; 1256; 1304; 1589; 1597; 1627; 1651; 1652; 1803; 1844; 1849; 1879; 1882; 1911; 1940; 1949; 1982; 2000; 2002; 2008; 2026; 2031; 2106; 2137; 2142; 2163; 2294; 2390; 2391; 2627; 2691; 3036; 3102; 3240; 3265; 3266; 3290; 3358; 3366; 3578; 3632; 3646; 3670; 3690; 3708 and 3735. The GATA6 Em isoform according to the invention can also be defined as the GATA6 Em isoform comprising the nucleic acid sequence of SEQ ID No: 1 or the GATA6 Em isoform comprising a nucleic acid sequence with at least 85% homology to SEQ ID No: 1, preferably up to 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97% or 98% homology to SEQ ID No: 1; even more preferably up to 99% homology to SEQ ID No: 1.
The NKX2-1 Em isoform according to the invention is the NKX2-1 Em isoform comprising the nucleic acid sequence of SEQ ID No: 2 or the NKX2-1 Em isoform comprising a nucleic acid sequence with up to 39; preferably up to 38, 37, 36, 35, 34, 33, 32, 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, or 10; even more preferably up to 9, 8, 7, 6, 5, 4, 3, or 2; or even furthermore preferably only 1 addition(s), deletion(s) or substitution(s) of SEQ ID NO: 2. The NKX2-1 Em isoform can also be defined as the NKX2-1 Em isoform comprising the nucleic acid sequence of SEQ ID No: 2 or the NKX2-1 Em isoform comprising the nucleic acid sequence of SEQ ID No: 2 with additions, deletions or substitutions at any of positions 269; 281; 305; 304; 420; 425; 439; 441; 450; 486; 781; 785; 825; 950; 1169; 1305; 1344; 1448; 1458; 1467; 1489; 1552; 1633; 1634; 1640; 1641; 1643; 1667; 1673; 1678; 1748; 1750; 1831; 1893; 1916; 1917; 1934; 2099 and 2319. The NKX2-1 Em isoform according to the invention can also be defined as the NKX2-1 Em isoform comprising the nucleic acid sequence of SEQ ID No: 2 or the NKX2-1 Em isoform comprising a nucleic acid sequence with at least 90% homology to SEQ ID No: 2, preferably up to 91%, 92%, 93%, 94%, 95%, 96%, 97% or 98% homology to SEQ ID No: 2; even more preferably up to 99% homology to SEQ ID No: 2.
Preferably, the above referred “addition(s), deletion(s) or substitution(s)” of the transcription factor isoforms are substitutions.
Tables 1, 2, 3, 4, 5, 6, 7 and 8 below provide information on different SNPs of the transcription factors of the present invention. The present invention relates to the respective isoforms independently from the various SNPs which may occur at the different positions of the mRNAs or polypeptides. The SNPs of tables 1, 2, 3, 4, 5, 6, 7 and 8 may occur in the isoforms of the present invention in any combination. For example, a (genetic) variant of the GATA6 Em isoform to be used herein may comprise a nucleic acid sequence of SEQ ID NO:1, whereby the “G” residue at position 293 of SEQ ID NO:1 is substituted by “A”. Further variants of the isoforms to be used herein are apparent from Tables 1 to 8 to the person skilled in the art. The respective SNP information has been retrieved using dbSNP (short genetic variations) database of the NCBI. The SNP information is based on Contig Label GRCh37.p5. A person skilled in the art will understand that also SNPs which are not mentioned in tables 1 to 8 are encompassed by the present invention.
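The variant definition above (a reference sequence carrying a substitution at a given position, and sequences within a maximum number of additions, deletions or substitutions) can be illustrated with a minimal Python sketch. The function names and the ten-nucleotide toy sequence are hypothetical and serve illustration only; the toy sequence is not SEQ ID NO: 1. The Levenshtein distance is used here as one possible way of counting additions, deletions and substitutions, which is an assumption and not a method prescribed by the present description.

```python
def apply_substitution(seq: str, position: int, ref: str, alt: str) -> str:
    """Return seq with the residue at the 1-based position changed from ref to alt."""
    idx = position - 1  # positions in the description are 1-based
    if seq[idx] != ref:
        raise ValueError(f"expected {ref!r} at position {position}, found {seq[idx]!r}")
    return seq[:idx] + alt + seq[idx + 1:]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimal number of additions, deletions or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion from a
                            curr[j - 1] + 1,      # addition to a
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def is_within_variant_limit(reference: str, candidate: str, max_changes: int) -> bool:
    """True if candidate differs from reference by at most max_changes edits."""
    return edit_distance(reference, candidate) <= max_changes


# Hypothetical toy reference (NOT a sequence of the invention).
toy_reference = "ATGGCCTTGA"
toy_variant = apply_substitution(toy_reference, position=4, ref="G", alt="A")
```

For a real transcript of several thousand nucleotides, the quadratic-time distance computation remains feasible but an alignment tool would normally be preferred.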
A control sample according to the present invention is a sample from a healthy control subject. Such a sample can be obtained for example from a subject known to be a healthy subject. It is also possible to generate a control sample according to the present invention as a mixture of samples obtained from several healthy subjects, for example from a group of 10, 20, 30, 50, 100 or even up to 1000 healthy subjects. A control sample according to the present invention can be generated for example from age-matched and/or gender-matched healthy control subjects. A control sample according to the present invention can also be generated for example in vitro to mimic a control sample obtained from one or several healthy subjects.
Control samples can, inter alia, be healthy tissues (i.e. biopsies) from diseased individuals/subjects. “Healthy tissue from diseased individuals/subjects” can refer to tissue that is pathologically classified as “normal” or “healthy” and/or that is distant or adjacent to a (suspected) tumor. For example, the “healthy tissue from diseased individuals/subjects” can be obtained e.g. by biopsy from adjacent healthy tissue of (suspected) cancer patients.
For example, the “healthy tissue” can be obtained from the subject(s) to be assessed in accordance with the present invention for suffering from cancer or being prone to suffering from cancer. In another example, the “healthy tissue” can be obtained from other diseased patients (e.g. patients that have already been diagnosed to suffer from cancer by conventional means and methods or patients that have a history of cancer); in that case, “healthy tissue” is not obtained from subject(s) to be assessed in accordance with the present invention for suffering from cancer or being prone to suffering from cancer.
Thus, also “healthy tissue from (a) diseased individual(s)” can be used as a control sample in accordance with the present invention.
Control samples can, inter alia, be EBCs from healthy individuals. The term “healthy individuals” as used herein can refer to individuals with no history of cancer, i.e. individuals that did not suffer from cancer or that do currently (i.e. at the time the control sample is obtained) not suffer from cancer. Thus, “healthy tissue/sample” (i.e. tissue (e.g. a biopsy) or another sample (e.g. EBC) obtained from a healthy individual) can be used as a control sample in accordance with the present invention.
A subject according to the present invention is preferably a human subject. The subject according to the present invention can be a human subject which has an increased likelihood of suffering from cancer. Such an increased likelihood of suffering from cancer can for example result from certain exposures to carcinogens, for example through the habit of smoking.
The “amount of said specific transcription factor isoform” according to the present invention can be a relative amount or an absolute amount. The relative amount can be determined relative to a control sample. To determine the “amount of said specific transcription factor isoform”, the absolute or relative amount of a reference gene or reference protein can be determined in the sample from the subject and in the control sample. Non-limiting examples of reference genes/proteins are TUBA1A (Uniprot-ID: Q71U36, Gene-ID: 7846), HPRT1 (Uniprot-ID: P00492, Gene-ID: 3251), ACTB (Uniprot-ID: P60709, Gene-ID: 60), HMBS (Uniprot-ID: P08397, Gene-ID: 3145), RPL13A (Uniprot-ID: Q9BSQ6, Gene-ID: 23521) and UBE2A (Uniprot-ID: P49459, Gene-ID: 7319).
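One way in which a relative amount can be derived from a reference gene and a control sample is sketched below in Python. The sketch assumes the widely used comparative Ct (2^−ΔΔCt) calculation with approximately 100% amplification efficiency; this calculation is an assumption for illustration and is not stated in the present description. The function names and Ct values are hypothetical.

```python
def delta_ct(ct_target: float, ct_reference: float) -> float:
    """Ct of the target isoform normalized to the reference gene in the same sample."""
    return ct_target - ct_reference


def relative_amount(ct_target_sample: float, ct_reference_sample: float,
                    ct_target_control: float, ct_reference_control: float) -> float:
    """Fold change of the target in the subject sample relative to the control sample.

    Assumes the 2^-ddCt model, i.e. the template doubles in every PCR cycle.
    """
    ddct = (delta_ct(ct_target_sample, ct_reference_sample)
            - delta_ct(ct_target_control, ct_reference_control))
    return 2.0 ** (-ddct)


# Hypothetical Ct values: the target crosses the threshold two cycles earlier
# in the subject sample, while the reference gene is unchanged.
fold_change = relative_amount(24.0, 18.0, 26.0, 18.0)
```

With these hypothetical values the target isoform is four-fold more abundant in the subject sample than in the control sample.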
The herein provided method can be used to stratify/assess subjects according to the tumor/cancer grade. It can be helpful to assess whether a patient is suffering from Grade I, Grade II or Grade III tumor/cancer in order to decide which therapeutic intervention is warranted.
The definition of Grade I, Grade II and Grade III tumor is based on TNM classification recommended by the American Joint Committee on Cancer (Goldstraw P. et al. (2007) J Thorac Oncol. 2(8):706-14; Beadsmoore C J and Screaton N J (2003) Eur J Radiol. 45(1):8-17; Mountain C F (1997) Chest. 111(6):1710-7.), which is incorporated herein by reference.
Herein, lung cancer is preferred, in particular non-small cell lung cancer or small cell lung cancer. Particularly preferred is non-small cell lung cancer.
It is known by the person skilled in the art that genes can contain single nucleotide polymorphisms. The specific transcription factor Em isoform sequences of the present invention encompass all naturally occurring sequences of the respective isoform independent of the number and nature of the SNPs in said sequence. To relate to currently known SNPs, the specific transcription factor Ad isoform sequences of the present invention are defined such that they contain up to 55 (in the case of GATA6), up to 38 (in the case of NKX2-1), up to 74 (in the case of FOXA2) or up to 30 (in the case of ID2) additions, deletions or substitutions of the nucleic acid sequences defined by SEQ ID NOs: 5, 6, 7 and 8, respectively, to also cover the respective Ad transcripts of carriers of different nucleotides at the respective SNPs. The SNPs of tables 2, 4, 6 and 8 may occur in the Ad isoforms of the present invention in any combination. For example, a (genetic) variant of the GATA6 Ad isoform to be used herein may comprise a nucleic acid sequence of SEQ ID NO:5, whereby the “C” residue at position 694 of SEQ ID NO:5 is substituted by “T”. Further variants of the isoforms to be used herein are apparent from Tables 1 to 8 to the person skilled in the art.
The GATA6 Ad isoform according to the invention is the GATA6 Ad isoform comprising the nucleic acid sequence of SEQ ID NO: 5 or the GATA6 Ad isoform comprising a nucleic acid sequence with up to 55; preferably up to 54, 53, 52, 51, 50, 49, 48, 47, 46, 45, 44, 43, 42, 41, 40, 39, 38, 37, 36, 35, 34, 33, 32, 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21 or 20; even more preferably up to 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3 or 2; or even furthermore preferably only 1 addition(s), deletion(s) or substitution(s) of SEQ ID NO: 5. The GATA6 Ad isoform can also be defined as the GATA6 Ad isoform comprising the nucleic acid sequence of SEQ ID NO: 5 or the GATA6 Ad isoform comprising the nucleic acid sequence of SEQ ID NO: 5 with additions, deletions or substitutions at any of positions 138; 228; 255; 262; 274; 365; 397; 415; 694; 1063; 1191; 1239; 1524; 1532; 1562; 1586; 1587; 1738; 1779; 1784; 1814; 1817; 1846; 1875; 1884; 1917; 1935; 1937; 1943; 1961; 1966; 2041; 2072; 2077; 2098; 2229; 2325; 2326; 2562; 2626; 2971; 3037; 3175; 3200; 3201; 3225; 3293; 3301; 3513; 3567; 3581; 3605; 3625; 3643 or 3670. The GATA6 Ad isoform according to the invention can also be defined as the GATA6 Ad isoform comprising the nucleic acid sequence of SEQ ID No: 5 or the GATA6 Ad isoform comprising a nucleic acid sequence with at least 85% homology to SEQ ID No: 5, preferably up to 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97% or 98% homology to SEQ ID No: 5; even more preferably up to 99% homology to SEQ ID No: 5.
The NKX2-1 Ad isoform according to the invention is the NKX2-1 Ad isoform comprising the nucleic acid sequence of SEQ ID NO: 6 or the NKX2-1 Ad isoform comprising a nucleic acid sequence with up to 38; preferably up to 37, 36, 35, 34, 33, 32, 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, or 10; even more preferably up to 9, 8, 7, 6, 5, 4, 3, or 2; or even furthermore preferably only 1 addition(s), deletion(s) or substitution(s) of SEQ ID NO: 6. The NKX2-1 Ad isoform can also be defined as the NKX2-1 Ad isoform comprising the nucleic acid sequence of SEQ ID NO: 6 or the NKX2-1 Ad isoform comprising the nucleic acid sequence of SEQ ID NO: 6 with additions, deletions or substitutions at any of positions 12; 125; 265; 270; 284; 286; 295; 331; 626; 630; 670; 795; 1014; 1150; 1189; 1293; 1303; 1312; 1334; 1397; 1478; 1479; 1478; 1485; 1486; 1488; 1512; 1518; 1523; 1593; 1595; 1676; 1738; 1761; 1762; 1779; 1944 or 2164. The NKX2-1 Ad isoform according to the invention can also be defined as the NKX2-1 Ad isoform comprising the nucleic acid sequence of SEQ ID No: 6 or the NKX2-1 Ad isoform comprising a nucleic acid sequence with at least 90% homology to SEQ ID No: 6, preferably up to 91%, 92%, 93%, 94%, 95%, 96%, 97% or 98% homology to SEQ ID No: 6; even more preferably up to 99% homology to SEQ ID No: 6.
The FOXA2 Ad isoform according to the invention is the FOXA2 Ad isoform comprising the nucleic acid sequence of SEQ ID NO: 7 or the FOXA2 Ad isoform comprising a nucleic acid sequence with up to 74; preferably up to 73, 72, 71, 70, 69, 68, 67, 66, 65, 64, 63, 62, 61, 60, 59, 58, 57, 56, 55, 54, 53, 52, 51, 50, 49, 48, 47, 46, 45, 44, 43, 42, 41, 40, 39, 38, 37, 36, 35, 34, 33, 32, 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21 or 20; even more preferably up to 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3 or 2; or even furthermore preferably only 1 addition(s), deletion(s) or substitution(s) of SEQ ID NO: 7. The FOXA2 Ad isoform can also be defined as the FOXA2 Ad isoform comprising the nucleic acid sequence of SEQ ID NO: 7 or the FOXA2 Ad isoform comprising the nucleic acid sequence of SEQ ID NO: 7 with additions, deletions or substitutions at any of positions 5; 37; 65; 68; 70; 88; 128; 195; 276; 348; 355; 361; 366; 370; 391; 446; 468; 470; 481; 516; 551; 564; 571; 577; 597; 610; 628; 637; 646; 661; 760; 832; 1027; 1062; 1173; 1175; 1227; 1229; 1230; 1291; 1361; 1378; 1395; 1401; 1419; 1445; 1462; 1474; 1509; 1526; 1569; 1570; 1581; 1614; 1618; 1674; 1710; 1724; 1725; 1741; 1799; 1818; 1825; 1927; 1953; 1957; 2057; 2070; 2071; 2080; 2092; 2099; 2187 or 2375. The FOXA2 Ad isoform according to the invention can also be defined as the FOXA2 Ad isoform comprising the nucleic acid sequence of SEQ ID No: 7 or the FOXA2 Ad isoform comprising a nucleic acid sequence with at least 93% homology to SEQ ID No: 7, preferably up to 92%, 93%, 94%, 95%, 96%, 97% or 98% homology to SEQ ID No: 7; even more preferably up to 99% homology to SEQ ID No: 7.
The ID2 Ad isoform according to the invention is the ID2 Ad isoform consisting of the nucleic acid sequence of SEQ ID NO: 8 or the ID2 Ad isoform consisting of a nucleic acid sequence with up to 30; preferably up to 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, or 10; even more preferably up to 9, 8, 7, 6, 5, 4, 3, or 2; or even furthermore preferably only 1 addition(s), deletion(s) or substitution(s) of SEQ ID NO: 8. The ID2 Ad isoform can also be defined as the ID2 Ad isoform consisting of the nucleic acid sequence of SEQ ID NO: 8 or the ID2 Ad isoform consisting of the nucleic acid sequence of SEQ ID NO: 8 with additions, deletions or substitutions at any of positions 93; 134; 148; 163; 176; 202; 225; 299; 338; 344; 424; 440; 483; 486; 544; 601; 604; 655; 696; 810; 815; 914; 1024; 1054; 1058; 1088; 1090; 1190; 1272 or 1289. The ID2 Ad isoform according to the invention can also be defined as the ID2 Ad isoform comprising the nucleic acid sequence of SEQ ID No: 8 or the ID2 Ad isoform comprising a nucleic acid sequence with at least 51% homology to SEQ ID No: 8, preferably up to 55%, 60%, 65%, 70%, 75%, 80%, 85% or 90% homology to SEQ ID No: 8; even more preferably up to 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% homology to SEQ ID No: 8.
The term “cancer patient” as used herein refers to a patient that is suspected to suffer from cancer or to be prone to suffering from cancer. The cancer to be treated in accordance with the present invention can be a solid cancer or a liquid cancer. Non-limiting examples of cancers which can be treated according to the present invention are lung cancer, ovarian cancer, colorectal cancer, kidney cancer, bone cancer, bone marrow cancer, bladder cancer, prostate cancer, esophagus cancer, salivary gland cancer, pancreas cancer, liver cancer, head and neck cancer, CNS (especially brain) cancer, cervix cancer, cartilage cancer, colon cancer, genitourinary cancer, gastrointestinal tract cancer, synovium cancer, testis cancer, thymus cancer, thyroid cancer and uterine cancer.
Preferably, the cancer patient according to the present invention is a patient suffering from lung cancer, such as non-small cell lung cancer (NSCLC) or small cell lung cancer (SCLC). Particularly preferably, the patient suffers from non-small cell lung cancer (NSCLC). Even more preferably, the cancer patient is a patient suffering from adenocarcinoma. The patient may also suffer from a squamous cell carcinoma or a large cell carcinoma. The adenocarcinoma can be a bronchoalveolar carcinoma.
The amount of the specific transcription factor isoform according to the invention can be measured for example by a polymerase chain reaction-based method, an in situ hybridization-based method, or a microarray. If the amount of the specific transcription factor isoform according to the invention is measured via a polymerase chain reaction-based method, it is preferably measured via a quantitative reverse transcriptase polymerase chain reaction.
The method of assessing whether a subject suffers from cancer or is prone to suffering from cancer according to the invention may comprise the contacting of a sample with primers, wherein said primers can be used for amplifying the respective specific transcription factor isoforms.
Primers for the polymerase chain reaction-based measurement of the amount of the specific transcription factor isoforms according to the invention may be selected from Table 9.
The diagnostic methods can be used, for example, in combination with (i.e. subsequently to, prior to or simultaneously with) other diagnostic techniques, such as CT (computed tomography) and CXR (chest radiography, colloquially called chest X-ray).
The herein provided methods for the diagnosis of a patient group and the therapy of this selected patient group are particularly useful for high risk subjects/patients or patient groups, such as those that have a hereditary history and/or are exposed to tobacco smoke, environmental smoke, cooking fumes, indoor smoky coal emissions, asbestos, some metals (e.g. nickel, arsenic and cadmium), radon (particularly amongst miners) and ionizing radiation. These subjects/patients may particularly profit from an early diagnosis and, hence, treatment of the cancer in accordance with the present invention.
A method of treating a patient according to the present invention may comprise
The present invention also provides a method of treating a patient, said method comprising
The present invention relates to a pharmaceutical composition comprising an agent for the treatment or the prevention of cancer, wherein the patient has been determined to suffer from cancer by a statistical method of the present invention and wherein the method of treatment comprises the step of determining whether or not the patient suffers from cancer. Preferably, the pharmaceutical composition according to the present invention comprises an agent for the treatment or the prevention of lung cancer, wherein the patient has been determined to suffer from lung cancer by a method of the present invention and wherein the method of treatment comprises the step of determining whether or not the patient suffers from lung cancer.
For example, the pharmaceutical composition to be used herein in the treatment of patients selected according to the statistical methods provided herein can be an inhibitor of
It is surprisingly found that the Em isoforms of the transcription factors of the present invention have an oncogenic potential (see Examples 4, 6 and 7). Further, it is shown that their reduction leads to the prevention of the development of tumors and allows treating cancer (see Example 7). Thus, the present invention relates to inhibitors of the Em isoforms of the transcription factors GATA6, NKX2-1, FOXA2 and ID2. In particular, the present invention relates to agents that allow reducing the amount of the Em isoform of the transcription factors GATA6, NKX2-1, FOXA2 and ID2. The present invention also relates to activators of the Ad isoform of the transcription factors GATA6, NKX2-1, FOXA2 and ID2. Examples of such activators are agents which activate the promoter of the Ad isoform of the respective transcription factors.
The inhibitors of
The person skilled in the art knows how to design siRNAs and shRNAs, which specifically target the specific transcription factor Em isoforms of the present invention. Examples of such specific siRNAs and shRNAs targeting the specific transcription factor Em isoforms of the present invention are depicted in Tables 10 and 11.
The amount of the specific transcription factor isoform according to the present invention can be determined on the polypeptide level.
The amount of the specific transcription factor isoforms according to the invention can be assessed on the polypeptide level using known quantitative methods for the assessment of polypeptide levels. For example, ELISA (Enzyme-linked Immunosorbent Assay)-based, gel-based, blot-based, mass spectrometry-based, or flow cytometry-based methods can be used for measuring the amount of the specific transcription factor isoforms on the polypeptide level according to the invention.
It is apparent to the person skilled in the art that the specific transcription factor isoforms of the present invention can show certain sequence variations between different subjects of the same ancestry and in particular between subjects of different ancestry. Non-limiting examples of the polymorphisms of the cancer specific isoforms of the present invention are given in Tables 12 and 13.
In certain aspects, the present invention provides a kit for use in carrying out the statistical method of the present invention. The kit of the present invention may comprise primers and further reagents necessary for a qPCR analysis. The respective primers may be selected from the list in Table 9.
While the invention has been illustrated and described in detail in the drawings and foregoing description, such illustration and description are to be considered illustrative or exemplary and not restrictive. It will be understood that changes and modifications may be made by those of ordinary skill within the scope and spirit of the following claims. In particular, the present invention covers further embodiments with any combination of features from different embodiments described above and below.
The invention also covers all further features shown in the figures individually although they may not have been described in the afore or following description. Also, single alternatives of the embodiments described in the figures and the description and single alternatives of features thereof can be disclaimed from the subject matter of the other aspect of the invention.
Furthermore, in the claims the word “comprising” does not exclude other elements or steps, and the indefinite article “a” or “an” does not exclude a plurality. A single unit may fulfill the functions of several features recited in the claims. The terms “essentially”, “about”, “approximately” and the like in connection with an attribute or a value particularly also define exactly the attribute or exactly the value, respectively. Any reference signs in the claims should not be construed as limiting the scope.
The present invention is further described by reference to the following non-limiting figures and examples. Unless otherwise indicated, established methods of recombinant gene technology were used as described, for example, in Sambrook, Russell “Molecular Cloning, A Laboratory Manual”, Cold Spring Harbor Laboratory, N.Y. (2001), which is incorporated herein by reference in its entirety.
The Figures show:
Total RNA isolation with the RNeasy Micro kit was compared using 200, 350, 500 and 1000 μl starting EBC volume. Data are represented as in B, n=4. (D) At least 75 ng of starting RNA is required for reliable diagnosis using EBC for isoform specific expression analysis. Different amounts of RNA (x-axis, ng) were used for cDNA synthesis by RT reaction and subsequently isoform specific expression analysis. The GATA6 (left) and NKX2-1 (right) Em/Ad ratio is plotted for both control (square) and lung cancer samples (triangle).
Amplification efficiency for each primer pair was calculated using serial dilutions of the cDNA template. Primer efficiency was assessed by plotting the cycle threshold values (Ct, y-axis) against the logarithm (base 10) of the fold dilution (log (Quantity), x-axis). Primer efficiency was calculated using the slope of the linear function. Data points represent mean Ct values of triplicates. (B) Dissociation curve analysis of the PCR products was performed by constantly monitoring the fluorescence with increasing temperatures from 60° C. to 95° C. Melt curves were generated by plotting the negative first derivative of the fluorescence (−d/dT (Fluorescence) 520 nm) versus temperature (degree Celsius, ° C.). (C) Specific PCR amplification was also demonstrated by agarose gel electrophoresis. PCR products after quantitative RT-PCR were analyzed by agarose gel electrophoresis. +, specific PCR reaction using EBC template; −, no RT control; M, 100 bp DNA ladder. (D) Sequencing of the PCR products of GATA6 Em and Ad demonstrates specific PCR amplification of both isoforms using EBC as template. Five clones for each primer pair (GATA6 Em and Ad) were sequenced and aligned to the reference sequence (top row, yellow highlighted). Sequence similarities are represented as dots.
Amplification efficiency for each primer pair was calculated using serial dilutions of the cDNA template. Primer efficiency was assessed by plotting the cycle threshold values (Ct, y-axis) against the logarithm (base 10) of the fold dilution (log (Quantity), x-axis). Primer efficiency was calculated using the slope of the linear function. Data points represent mean Ct values of triplicates. (B) Dissociation curve analysis of the PCR products was performed by constantly monitoring the fluorescence with increasing temperatures from 60° C. to 95° C. Melt curves were generated by plotting the negative first derivative of the fluorescence (−d/dT (Fluorescence) 520 nm) versus temperature (degree Celsius, ° C.). (C) Specific PCR amplification was also demonstrated by agarose gel electrophoresis. PCR products after quantitative RT-PCR were analyzed by agarose gel electrophoresis. +, specific PCR reaction using EBC template; −, no RT control; M, 100 bp DNA ladder. (D) Sequencing of the PCR products of NKX2-1 Em and Ad demonstrates specific PCR amplification of both isoforms using EBC as template. Five clones for each primer pair (NKX2-1 Em and Ad) were sequenced and aligned to the reference sequence (top row, yellow highlighted). Sequence similarities are represented as dots.
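The efficiency calculation described in the figure legends above (Ct plotted against the base-10 logarithm of the quantity, efficiency derived from the slope of the linear function) can be sketched in Python as follows. The legends do not give the formula; the sketch assumes the commonly used relation E = 10^(−1/slope) − 1, in which E = 1 corresponds to a doubling per cycle. The dilution series and Ct values are hypothetical.

```python
import math
from typing import Sequence, Tuple


def linear_fit(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def amplification_efficiency(slope: float) -> float:
    """Efficiency from the standard-curve slope (assumed relation, see lead-in)."""
    return 10.0 ** (-1.0 / slope) - 1.0


# Hypothetical ten-fold serial dilution of a cDNA template with mean Ct values
# of triplicates; the undiluted template is assigned a relative quantity of 1.
dilution_factors = [1, 10, 100, 1000]
log_quantity = [-math.log10(d) for d in dilution_factors]
mean_ct = [20.0, 23.32, 26.64, 29.96]

slope, intercept = linear_fit(log_quantity, mean_ct)
efficiency = amplification_efficiency(slope)
```

A slope close to −3.32, as in this hypothetical series, corresponds to an efficiency close to 1, i.e. approximately 100%.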
BACKGROUND: Identification of reliable biomarkers and development of non-invasive detection methods for lung cancer are critical to improve prognosis of the disease.
METHODS: RNA isolation was performed from human lung tissue and exhaled breath condensates from control donors and lung cancer patients. The Em/Ad expression ratio of GATA6 and NKX2-1 was determined by qRT-PCR. Statistical analysis using R was performed to determine the separating line for the two groups of samples and to evaluate the efficiency of our diagnostic method.
RESULTS: We show that two different mRNAs are expressed from both GATA6 and NKX2-1. The expression of both transcripts from the same gene is complementary and differentially regulated during both embryonic lung development and lung cancer. One transcript is expressed during early embryonic lung development (Em-isoform), while the second transcript is expressed in later stages and in the adult lung (Ad-isoform). We detected an enrichment of the Em-isoform in lung cancer tissues, suggesting that the detection of these transcripts could be a powerful tool for early lung cancer diagnosis. The Em- to Ad-expression ratio of both GATA6 and NKX2-1 in RNA from exhaled breath condensates can be used as a non-invasive, specific and sensitive diagnostic tool. An SVM classifier was used to combine the Em/Ad ratios of GATA6 and NKX2-1 of each EBC sample to create a more powerful tool for the diagnosis of lung cancer.
CONCLUSIONS: The SVM calculates a simple linear score, LC score, that could be used as a clinical score for lung cancer detection.
Exhaled breath condensate: Exhaled breath condensate (EBC) is a non-invasive method of sampling the airways, allowing biomarkers of airway inflammation and oxidative stress to be measured. It is collected by cooling the exhaled breath to −20° C., resulting in condensation of the aerosol particles.
Gene expression analysis: Determination of the level of messenger RNA (mRNA) transcribed from specific genes. Different techniques can be used for this type of analysis, such as quantitative reverse transcriptase polymerase chain reaction (qRT-PCR), Northern blot, array-based expression analysis and, more recently, RNA sequencing. In the present manuscript we focus on qRT-PCR based expression analysis that consists of total RNA isolation, RT reaction for the synthesis of cDNA and qPCR amplification using gene specific primers.
Isoform: Different versions of mRNA from the same gene that arise by either alternative splicing or differential promoter usage.
Polymerase chain reaction: A laboratory technique used to amplify DNA sequences. Short, synthetic complementary DNA sequences called primers are used to selectively amplify the specific portion of the genome. The temperature of the sample is repeatedly raised and lowered to facilitate the copying of the target DNA sequence by a DNA-replication enzyme. Theoretically, the technique doubles the amount of target DNA molecule per cycle.
TNM staging criteria: The TNM system is one of the most widely used cancer staging systems.
It is based on the size and/or extent (reach) of the primary tumor (T), the amount of spread to nearby lymph nodes (N), and the presence of metastasis (M) or secondary tumors formed by the spread of cancer cells to other parts of the body. A number is added to each letter to indicate the size and/or extent of the primary tumor and the degree of cancer spread.
10-fold cross validation: A validation method in which the model is fitted on 90 percent of the samples and then the classification of the remaining 10 percent of the samples is predicted. The procedure is repeated 10 times such that each sample acts as a test sample once. The average error rate of all 10 parts is an estimate of the method's classification error.
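The 10-fold cross validation procedure defined above can be sketched in Python as follows. The sketch is a minimal illustration, not the analysis performed in the study: the fold assignment by sample index, the toy threshold classifier and the data are all hypothetical, and in practice samples would typically be shuffled before being assigned to folds.

```python
from typing import Callable, List, Sequence, Tuple


def k_fold_splits(n_samples: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Assign sample i to fold i % k; return (train_indices, test_indices) per fold."""
    splits = []
    for fold in range(k):
        test = [i for i in range(n_samples) if i % k == fold]
        train = [i for i in range(n_samples) if i % k != fold]
        splits.append((train, test))
    return splits


def cross_validated_error(xs: Sequence[float], ys: Sequence[int],
                          fit: Callable, predict: Callable, k: int = 10) -> float:
    """Average error rate over k folds; each sample is a test sample exactly once."""
    fold_errors = []
    for train, test in k_fold_splits(len(xs), k):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        wrong = sum(predict(model, xs[i]) != ys[i] for i in test)
        fold_errors.append(wrong / len(test))
    return sum(fold_errors) / len(fold_errors)


def fit_threshold(xs: Sequence[float], ys: Sequence[int]) -> float:
    """Toy classifier: threshold halfway between the two class means."""
    class0 = [x for x, y in zip(xs, ys) if y == 0]
    class1 = [x for x, y in zip(xs, ys) if y == 1]
    return (sum(class0) / len(class0) + sum(class1) / len(class1)) / 2.0


def predict_threshold(threshold: float, x: float) -> int:
    return 1 if x > threshold else 0


# Hypothetical, well-separated one-dimensional data (20 samples, two classes).
toy_y = [i % 2 for i in range(20)]
toy_x = [(-1.0 if label == 0 else 1.0) + 0.01 * i for i, label in enumerate(toy_y)]
cv_error = cross_validated_error(toy_x, toy_y, fit_threshold, predict_threshold, k=10)
```

With 20 samples and k=10, each model is fitted on 18 samples (90 percent) and evaluated on the remaining 2 (10 percent).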
We postulated that many of the mechanisms involved in embryonic development are recapitulated during LC initiation. To this end, we analyzed two transcription factors, GATA6 (GATA Binding Factor 6) and NKX2-1 (NK2 homeobox 1, also known as Ttf-1, Thyroid transcription factor-1), that are key regulators of embryonic lung development7-10 and have been implicated in LC formation and metastasis11-16. Here we show that two different mRNAs are expressed from each of the GATA6 and NKX2-1 genes. Furthermore, the expression of both transcripts from the same gene is complementary and differentially regulated during embryonic lung development as well as in LC. One transcript is expressed in early stages of embryonic lung development (Em-isoform), whereas the second transcript is expressed in later developmental stages and in the adult lung (Ad-isoform). We detected an enrichment of the Em-isoform in LC, even at early stages, making the detection of these embryonic specific transcripts a powerful tool for cancer diagnosis. Moreover, we demonstrate that isoform specific expression analysis of GATA6 and NKX2-1 in exhaled breath condensates (EBCs) can be used as a non-invasive, specific and sensitive method for early LC diagnosis.
The patients were studied according to protocols approved by the institutional review board and ethical committee of Regional Hospital of High Specialties of Oaxaca (HRAEO) which belongs to the Ministry of Health in Mexico (HRAEO—CIC-CEI 006/13), Union Hospital Hong Kong (EC003) and Medicine Faculty of the Justus Liebig University in Giessen, Germany (AZ.111/08-eurIPFreg). All cases were reviewed by an expert panel of pulmonologists and oncologists in the different cohorts according to the current diagnostic criteria for morphological features and immunophenotypes recommended by the International Union Against Cancer (UICC, 7th edition).
LC tissue was obtained from 63 patients who had primary lung tumors in the last five years (Table 1). Control lung tissue was taken from macroscopically healthy adjacent regions of the lung of 15 patients. Control donor lung tissue was also obtained from 19 age-matched individuals, who have had no diagnosis or family history of LC.
EBCs were also collected from 48 LC patients that were currently undergoing diagnostic evaluation for LC (Table 1). EBC collection was performed prior to transbronchial biopsy. Further, control EBC was also collected from 22 age matched control individuals with no prior history of LC or any other lung diseases. All participants provided written informed consent.
In this study we used human lung adenocarcinoma cell lines (A549; CCL-185 and A427; HTB-53) and a human bronchoalveolar carcinoma cell line (H322; CRL-5806). In addition, the Mus musculus Lewis lung carcinoma cell line (LLC1; CRL-1642) was used in a mouse model of experimental metastasis17, wherein 1 million LLC1 cells in 100 μl sterile phosphate-buffered saline (PBS) were injected into the tail vein of experimental mice (n=5). Control mice (n=3) were injected with 100 μl sterile PBS.
Gene Expression Analysis by qRT-PCR
Total RNA was isolated from cell lines using the RNeasy Mini kit (Qiagen). Human lung tissue samples were obtained as formalin fixed paraffin embedded (FFPE) tissues, from which total RNA was isolated using the RecoverAll™ Total Nucleic Acid Isolation Kit for FFPE (Ambion).
Total RNA isolation from EBC was performed using 500 μl of sample with the RNeasy Micro Kit (Qiagen). Complementary DNA (cDNA) was synthesized using the High Capacity cDNA Reverse Transcription kit (Applied Biosystems) and quantitative real-time PCR reactions were performed using SYBR® Green on the StepOnePlus Real-Time PCR system (Applied Biosystems) with the primers specified in Supplementary Table 2.
Log-transformed Em/Ad ratios of GATA6 and NKX2-1 were used as independent variables to predict LC. A linear kernel support vector machine (SVM)39 was used to construct a linear classifier. SVM learning was done with the default parameters, without any adjustments. We preferred SVM to linear discriminant analysis (LDA), which might be the more obvious choice for low dimensional classification tasks, because the control and the LC samples did not show a Gaussian-like distribution, which is an underlying assumption of LDA. The SVM finds a robust separating line and the distance to this line is our decision score, which we call LC score. The LC score can be conveniently calculated as
or comprising a prefactor of (−1) for illustrative purposes of
In silico analysis of GATA6 and NKX2-1 revealed a common gene structure (
To confirm that a similar increase in the expression levels of the Em-isoforms of GATA6 and NKX2-1 occurs in LC patients, we analyzed human lung tissues from control donors and LC patients (
EBC is a promising source of biomarkers for lung diseases since the condensed droplets contain a mixture of nonvolatile biomarkers such as adenosine, prostaglandins, leukotriene, cytokines, etc. and water soluble volatile biomarkers such as nitrogen oxides18-27. We optimized different steps and parameters to establish a reliable protocol for qRT-PCR based expression analysis in EBCs (
To further validate our findings, EBC based expression analysis was directly compared with LC tissues from the same patient (
While the single GATA6 or NKX2-1 isoform ratios predicted LC fairly well (
Early lung cancer diagnosis is crucial to improve patient prognosis and reduce the extremely high case-fatality rate (95%)28. Our work demonstrated that RNA isolated from EBC can be used for qRT-PCR based isoform-specific expression analysis of GATA6 and NKX2-1 to determine the Em- to Ad-expression ratio as a non-invasive, specific and sensitive method for early LC diagnosis. We have analyzed 97 human lung tissue samples and 70 EBCs from three cohorts located in different continents and detected increased Em/Ad ratios of GATA6 and NKX2-1 in NSCLC samples independent of ethnic group, gender and NSCLC subtype. Compared to standard expression analysis, the use of isoform ratios incorporates an additional normalization step into our diagnostic method, which makes it robust and reproducible by reducing variability arising from biological and/or technical parameters.
Although the single Em/Ad ratios of GATA6 or NKX2-1 were sufficient to detect LC (
Currently, CT and CXR are used to screen such high-risk groups. CT imaging has been shown to be considerably superior to CXR in the identification of small pulmonary nodules32. However, despite the success of CT imaging for early LC diagnosis, it suffers from serious limitations, including a high detection rate of benign non-calcified nodules (>90% of participants), resulting in follow-up CT scans, biopsies and frequently unnecessary resection of the benign non-calcified nodules33. Routine implementation of EBC based molecular diagnosis may improve and complement the success of CT and CXR for early LC diagnosis, and especially help to distinguish between false and true positives.
Microarray based analysis of LC samples not only led to the identification of gene expression profiles that are associated with NSCLC subtypes34,35, but also accurately predicted the clinical outcome36,37. Although the method proposed here did not discriminate between different NSCLC subtypes, it may be superior to previous approaches of molecular and clinical LC diagnosis due to its higher sensitivity and accuracy, straightforward and fast protocol, noninvasiveness and relatively low price. Moreover, a combination of the method proposed here with the existing clinical and molecular methods of LC diagnosis will help to safely settle an LC diagnosis at an earlier, hence curable, stage of the disease. The method of LC diagnosis proposed here could be further refined to discriminate between different NSCLC subtypes by incorporating EBC based expression analysis of known markers of the different subtypes. Furthermore, it might be combined with other markers for the detection of hyper-proliferative non-cancer related diseases such as idiopathic pulmonary fibrosis (IPF) or chronic obstructive pulmonary disease (COPD). Interestingly, the current method could be extended to cancer detection in other organs utilizing the expression ratio of developmentally regulated transcript isoforms of the corresponding members of the GATA and/or NKX families of transcription factors in the respective tissue. Lastly, it could be used to monitor the response of a patient to specific treatments in order to fine-tune the therapy and improve the prognosis.
The following alternative Supplement Table 3 also shows values for the individual ratios of GATA6 and NKX2-1 and for the LC score, wherein the LC score has been calculated using a prefactor of (−1) for illustrative purposes.
EBC consists of three main components (
The specificity of the different qRT-PCR products detected in the EBCs (
The classical methods for lung cancer diagnosis were directly compared with EBC based expression analysis. Pulmonary nodules were clearly identified by CXR (Supplementary
Samples were collected in three different cohorts located in different continents (America, Asia and Europe), allowing us to investigate ethnic differences. Inclusion criteria for the present study were primary lung tumor samples including lung adenocarcinoma (Grades 1, 2, 3), lung squamous cell carcinoma (Grades 1, 2, 3), large cell carcinoma and adenosquamous carcinoma (Table 1). All tumors were graded according to the Bloom-Richardson and the TNM grading system recommended by the International Union Against Cancer (UICC, 7th edition). Secondary lung tumors and lung cancer samples older than 5 years were excluded.
In accordance with the general prevalence, the majority of the samples here represented adenocarcinoma (73.0% and 54.1% for lung cancer tissue and EBC, respectively), followed by squamous cell carcinoma (14.2% and 20.8% for lung cancer tissue and EBC, respectively) (Table 1). Correlating with the disease incidence, the majority of the patients were in the age group of 50-70 years and both male and female patients were equally represented (Supplementary Table 1). Further, the majority of the patients were in the early stage of the disease (Stage I-II) and only a very small minority (6% and 8% for tissues and EBC respectively) had a recurrent disease (Supplementary Table 1).
EBC collection was performed using the RTube (Respiratory Research) as described online (http://www.respiratoryresearch.com/products-rtube-how.htm) with some modifications. As a precaution to avoid contaminants from the mouth, donors were asked to refrain from eating, drinking (except water) and smoking up to 3 hours before EBC collection and were asked to rinse their mouth with fresh water just prior to collection. All donors used a nose clamp to avoid nasal contaminants and breathing was only through the mouthpiece. EBCs were collected for 10 min for each donor and immediately stored at −80° C. in 500 μl aliquots. All steps during the collection and processing of EBCs were performed under RNase-free conditions, which is critical to ensure the integrity and high quality of the samples.
Cell lines were cultured in medium and conditions recommended by the American Type Culture Collection (ATCC). Cells were used for the preparation of RNA (QIAGEN RNeasy plus mini kit) and protein extracts.
Five- to six-week-old C57BL6 mice were used throughout this study. Animals were housed under controlled temperature and lighting [12/12-hour light/dark cycle] and given commercial animal feed and water ad libitum. For the mouse model of experimental metastasis, an LLC1 cell suspension of 1 million cells/100 μl was prepared in sterile phosphate-buffered saline (PBS). Control mice (n=3) were injected with 100 μl PBS, whereas experimental mice (n=5) were injected with 100 μl of cell suspension, in each case into the tail vein. The development of tumors was monitored 21 days post injection. Lung tissue was harvested from each mouse separately for RNA isolation and isoform-specific expression analysis.
Mouse work was performed in compliance with the German Law for Welfare of Laboratory Animals. The permission to perform the experiments presented in this study was obtained from the Regional Council (Regierungspräsidium in Darmstadt, Germany). The numbers of the permissions are V54-19c20/15-B2/345; IVMr46-53r30.03.MPP04.12.02 and IVMr46-53r30.03.MPP06.12.01. Animals were killed for scientific purposes according to the law mentioned above, which complies with national and international regulations.
Cell line and mouse experiments were performed three times. Statistical analyses were performed using Excel Solver. Samples were analyzed at least in triplicate. The data are represented as mean ± standard error of the mean (mean±s.e.m.). For human samples, each point on the graph represents an individual sample, while the horizontal line represents the median ± standard error (median±s.e.m.). One-way analysis of variance (ANOVA) was used to test for differences between the groups and to obtain P values for significance.
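The summary statistics named above can be illustrated with a minimal sketch. It is not the Excel-based workflow used in the study; it only shows, in Python, how a mean ± s.e.m. and a one-way ANOVA F statistic are computed. The function names and the example values are hypothetical, and the conversion of F into a P value (which requires the F distribution) is deliberately omitted.

```python
import math
import statistics


def mean_sem(values):
    """Return (mean, standard error of the mean) for a list of replicates."""
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return statistics.mean(values), sem


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = statistics.mean(all_values)
    k = len(groups)          # number of groups
    n = len(all_values)      # total number of observations
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their group mean.
    ss_within = sum((v - statistics.mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical triplicate measurements for two groups (illustration only).
control = [1.0, 2.0, 3.0]
tumor = [4.0, 5.0, 6.0]

print(mean_sem(control))
print(one_way_anova_f([control, tumor]))
```

With only two groups, the F statistic equals the square of the two-sample t statistic (equal variances assumed), so the ANOVA reduces to a t-test in that case.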
Gene Expression Analysis by qRT-PCR
Total RNA was isolated from cell lines using the RNeasy Mini kit (Qiagen). Human lung tissue samples were obtained as formalin fixed paraffin embedded (FFPE) tissues and 8 sections of 10 μm thickness were used for total RNA isolation using the RecoverAll™ Total Nucleic Acid Isolation Kit for FFPE (Ambion). Total RNA isolation from EBC was performed using 500 μl of sample and the RNeasy Micro Kit (Qiagen). Complementary DNA (cDNA) was synthesized using the High Capacity cDNA Reverse Transcription kit (Applied Biosystems) and 0.5-0.7 μg (EBC) or 1 μg (cell lines, mice and human lung cancer tissue) total RNA. Quantitative real-time PCR reactions were performed using SYBR® Green on the StepOnePlus Real-Time PCR system (Applied Biosystems) with the primers specified in Supplementary Table 2. Briefly, 1× concentration of the SYBR Green master mix, 250 nM each of forward and reverse primer and 3.5 μl (EBC) or 1 μl (cell lines, mice and human lung cancer tissue) of a 6-fold diluted RT reaction were used for the gene-specific qPCR reaction. The PCR results were normalized with respect to the housekeeping gene alpha 1a Tubulin (TUBA1A).
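The normalization to TUBA1A and the resulting Em/Ad ratio can be sketched as follows. The text does not state which quantification formula was used, so this sketch assumes the common 2^(−ΔCt) approach with 100% amplification efficiency; the function names and Ct values are hypothetical.

```python
import math


def relative_expression(ct_target, ct_reference):
    """Expression of a target relative to a reference gene: 2^-(Ct_target - Ct_reference).

    Assumes perfect doubling per cycle (100% efficiency), which is an assumption
    of this sketch and not a statement from the protocol.
    """
    return 2.0 ** -(ct_target - ct_reference)


def log2_em_ad_ratio(ct_em, ct_ad, ct_reference):
    """log2 of the Em/Ad expression ratio after normalization to the reference gene."""
    em = relative_expression(ct_em, ct_reference)
    ad = relative_expression(ct_ad, ct_reference)
    return math.log2(em / ad)


# Hypothetical Ct values for one sample (illustration only).
print(log2_em_ad_ratio(ct_em=26.0, ct_ad=29.0, ct_reference=20.0))  # 3.0
print(log2_em_ad_ratio(ct_em=26.0, ct_ad=29.0, ct_reference=18.0))  # 3.0
```

Under this assumption the reference Ct cancels in the ratio, so log2(Em/Ad) equals Ct_Ad − Ct_Em. This is one way to read the statement that isoform ratios add a normalization step: sample-wide effects that shift all Ct values equally do not change the ratio.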
Further validation of the LC score classifier was performed on an independent set of samples (EBCs) consisting of 22 previously unseen samples (10 controls and 12 LC patient EBCs,
The log2-transformed Em/Ad ratios of GATA6 (x-axis) and NKX2-1 (y-axis) of controls (light grey circles) and LC patients (black circles) for the new validation set were plotted. The solid line represents the decision boundary determined by a linear support vector machine (SVM) classifier combining the Em/Ad ratios of GATA6 and NKX2-1 of each sample. Filled circle, sample classified correctly; empty circle, sample classified incorrectly. The LC score is the distance to the boundary.
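A score defined as the distance to a linear decision boundary can be sketched as below. The weights `W` and offset `B` are hypothetical placeholders and are not the coefficients of the classifier described here; the sign convention (positive meaning the LC side of the line) is likewise an assumption of this sketch.

```python
import math


def decision_score(x_gata6, x_nkx21, w, b):
    """Signed distance of the point (x_gata6, x_nkx21) to the line w[0]*x + w[1]*y + b = 0.

    The inputs are log2-transformed Em/Ad ratios. The sign tells on which side
    of the line the sample lies; the magnitude is the geometric distance.
    """
    norm = math.hypot(w[0], w[1])
    return (w[0] * x_gata6 + w[1] * x_nkx21 + b) / norm


# Hypothetical boundary parameters, chosen only to give round numbers.
W = (3.0, 4.0)
B = -5.0

score = decision_score(2.0, 1.0, W, B)
print(score)                               # 1.0
print("LC" if score > 0 else "control")    # assumed sign convention
```

Multiplying `w` and `b` by (−1) flips the sign of every score without moving the boundary, so the choice of prefactor changes only which side is reported as positive.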
Discriminatory power of the Em/Ad ratios of GATA6 (grey line), NKX2-1 (grey dashed line) and the improved LC score (black line) assessed by receiver operating characteristic (ROC) curve analysis based on both sets of EBCs together (training and validation). The orange diamond represents the “point of operation” (performance) of the SVM classifier.
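For readers unfamiliar with ROC analysis, the following minimal sketch shows how a ROC curve and its area under the curve (AUC) are obtained from a list of scores and known labels. The scores and labels are invented for illustration and are not data from this study; the function names are hypothetical, and the sketch assumes both classes are present in the input.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs.

    labels: 1 for LC, 0 for control. The threshold is swept from the highest
    score downwards; tied scores are processed together.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Invented example: three LC samples and three controls.
scores = [2.1, 1.4, 0.3, -0.2, -0.9, -1.5]
labels = [1, 1, 0, 1, 0, 0]
print(auc(roc_points(scores, labels)))  # 8/9, about 0.889
```

Each point on the curve corresponds to one possible threshold on the score; a single classifier with a fixed threshold occupies exactly one such point, which is what a marked operating point on a ROC plot denotes.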
The present invention refers to the following nucleotide and amino acid sequences:
The sequences provided herein are available in the NCBI database and can be retrieved from www.ncbi.nlm.nih.gov/sites/entrez?db=gene. These sequences also relate to annotated and modified sequences. The present invention also provides techniques and methods wherein homologous sequences and variants of the concise sequences provided herein are used. Preferably, such “variants” are genetic variants.
The following exemplary sequences relate to additional marker(s) that can be used in accordance with the present invention for classifying cancer, for example, for classifying lung cancer into subtypes of lung cancer.
The following markers are upregulated in adenocarcinoma:
Homo sapiens tumor protein p63 (TP63), transcript variant 2
Homo sapiens tumor protein p63 (TP63), transcript variant 3
Homo sapiens tumor protein p63 (TP63), transcript variant 4
Homo sapiens tumor protein p63 (TP63), transcript variant 5
Homo sapiens tumor protein p63 (TP63), transcript variant 6
Homo sapiens tumor protein p63 (TP63), transcript variant 1
Homo sapiens microRNA
Homo sapiens microRNA
Homo sapiens microRNA
The following markers are upregulated in metastatic adenocarcinoma:
The following marker is upregulated in large cell lung cancer:
Homo sapiens high mobility group AT-hook 1 (HMGA1), transcript variant 1
The following table provides more detailed information in relation to genomic alterations:
All references cited herein are fully incorporated by reference. Having now fully described the invention, it will be understood by a person skilled in the art that the invention may be practiced within a wide and equivalent range of conditions, parameters and the like, without affecting the spirit or scope of the invention or any embodiment thereof.
Number | Date | Country | Kind |
---|---|---|---|
14003697.1 | Nov 2014 | EP | regional |
14195027.9 | Nov 2014 | EP | regional |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/EP2015/075615 | 11/3/2015 | WO | 00 |