The current disclosure relates generally to molecular biology and medicine, and particularly to the field of oncology. More particularly, the disclosure relates to methods, compositions and kits involving diagnosis and treatment of metastatic cancer, including metastatic colorectal cancer.
Metastases are the leading cause of cancer-related deaths and are frequently widely disseminated, which has led to the prevailing view that metastases are always widespread. The oligometastasis hypothesis suggests that metastatic spread is a spectrum of virulence where some metastases are limited both in number and organ involvement and potentially curable with surgical resection or other loco-regional therapies1,2. This paradigm stands in stark contrast to the outcomes of patients with solid tumors and widespread metastases, which are largely fatal despite recent advances in systemic therapy. To date, the oligometastasis concept has been challenged, in large part because of the lack of supporting molecular data to identify metastases associated with restricted spread3,4.
Limited metastasis is relatively common. Data from clinical trials and single institution analyses of lung, breast, colorectal, prostate and renal cancers suggest that as many as 40-60% of patients with metastasis present with or develop limited disease5-8. Patients with limited liver metastases from colorectal cancer (CRC) have been consistently demonstrated to achieve prolonged survival after hepatic resection9,10 and provide an opportunity to investigate the molecular basis for oligometastasis. While there have been extensive investigations into the molecular subtypes of primary human cancers, little is known regarding molecular subtypes of metastasis and their relation to clinical outcomes.
There is a need for the identification of a molecular basis for differing outcomes among metastatic cancer patients, for the ability to identify molecular subtypes of metastatic cancer that are predictive of the clinical outcome, and for the identification of integrated molecular patterns in liver metastases that are associated with long-term survival. There is also a need for methods to differentially identify patients with potentially curable oligometastatic disease from those whose few metastases are a part of a large cascade of widespread disease. The identification of these subtypes of metastatic disease can help in deciding on an appropriate treatment plan for a particular patient.
The inventors have discovered a molecular basis for oligometastasis that is predictive of clinical outcome and have developed methods of diagnosis, prognosis, and treatment that use the molecular classification of metastatic tissue to identify curable metastatic cancer and otherwise guide treatment decisions. Using integrated analysis of gene and miRNA expression data in metastatic tissue samples, the inventors identified three molecular subtypes of colorectal cancer metastases. The three subtypes correlate with different clinical outcomes, and knowing the subtype of the metastasis informs treatment decisions and helps provide an accurate assessment of patient prognosis. This discovery applies to metastatic cancers beyond colorectal liver metastases: methods disclosed herein can be used to identify molecular subtypes of other metastatic cancers and to guide prognosis and treatment decisions for patients having such cancers.
Disclosed herein is a method comprising measuring expression levels of one or more genes listed in Table 10A or one or more miRNAs listed in Table 11A in a sample comprising tissue from a metastasis from a primary cancer tumor. These tables list genes and miRNAs whose expression is particularly valuable in classifying molecular subtypes of metastases. In some embodiments, expression of other genes and miRNAs is also measured. For example, any of the methods disclosed herein may involve measuring the expression of one or more genes listed in Tables 3A-C, which list genes that are differentially expressed in SNF1, SNF2, and SNF3 liver metastases from colorectal cancer primary tumors. Any of the methods disclosed herein may also include measuring expression of one or more miRNAs listed in Tables 4A-C, which list miRNAs that are differentially expressed in SNF1, SNF2, and SNF3 liver metastases from colorectal cancer primary tumors. Any of the methods disclosed herein may also include measuring expression of the genes listed in Table 7 (immune genes overexpressed in SNF2 metastases). In some embodiments, the methods disclosed herein also include determining whether one or more of the genes listed in Table 8 are mutated or whether one or more of the genomic alterations listed in Table 9 are present. In some embodiments, expression of both genes and miRNAs is measured as part of a method disclosed herein. The methods disclosed herein can be used specifically in the context of metastatic colorectal cancer. Thus, in some embodiments, the metastasis may be a liver metastasis, and the cancer may be colorectal cancer. The metastasis that is tested may also be located in parts of the body other than the liver, including the lung, peritoneum, brain, or bone.
The methods disclosed herein can also be used in the context of other metastatic cancers including, for example, liver cancer, testicular cancer, biliary cancer, ovarian cancer, urinary tract cancer, pancreatic cancer, prostate cancer, esophageal cancer, gastric cancer, head and neck cancer, cervical cancer, lung cancer, neuroendocrine cancer, kidney cancer, breast cancer, and melanoma.
In some embodiments, the expression levels of at least, at most, or exactly 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, or 113 of the genes listed in Table 10A are measured, or any range derivable therein. In some embodiments, the expression levels of at least, at most, or exactly 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, or 113 of the genes listed in Table 10A are excluded from being measured, or any range derivable therein. In some embodiments, the expression levels of at least, at most, or exactly 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, or 53 of the miRNAs listed in Table 11A are measured, or any range derivable therein. In some embodiments, the expression levels of at least, at most, or exactly 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, or 53 of the miRNAs listed in Table 11A are excluded from being measured, or any range derivable therein.
It is contemplated that expression levels of any subset of the genes or miRNAs listed in Tables 3A-C, 4A-C, 10A, and 11A may be measured or may be excluded from being measured as part of a method disclosed herein. Certain subsets of these genes and miRNAs may be chosen for their greater usefulness in making classifications and differentiating between different types of metastases. A subset of genes or miRNAs to be examined as part of an assay to identify a sample metastasis as belonging to a particular molecular subtype may be identified by an analysis such as a nearest shrunken centroid analysis, which identifies subsets of genes and/or miRNAs, or combinations of genes and miRNAs, whose expression levels best characterize each subtype. Methods disclosed herein may include performing such an analysis to identify a set of genes and/or miRNAs that can provide for accurate and sensitive subtyping of individual metastases. In some embodiments, expression of a subset of at least, at most, or exactly 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, 400, 410, 420, 430, 440, 450, 460, 470, 480, 490, 500, 510, 520, 530, 540, 550, 560, 570, 580, 590, 600, 610, 620, 630, 640, 650, 660, 670, 680, 690, 700, 710, 720, 730, 740, 750, 760, 770, 780, 790, 800, 810, 820, 830, 840, 850, 860, 870, 880, 890, 900, 910, 920, 930, 940, 950, 960, 970, 980, 990, or 1000 genes or miRNAs listed in Tables 3A-C and 4A-C, or any range derivable therein, is examined.
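The nearest shrunken centroid analysis mentioned above can be sketched as follows. This is a minimal, stdlib-only illustration of the general technique (per-feature class centroids shrunk toward the overall mean by a soft threshold, in the style of Tibshirani's prediction analysis of microarrays), not the inventors' implementation; the function names `train_nsc` and `classify`, the toy data, and the threshold `delta` are hypothetical. Features whose centroids are shrunk to the overall mean for every subtype no longer influence classification, which is how such an analysis yields a reduced gene/miRNA panel.

```python
import math
from statistics import median


def train_nsc(X, y, delta):
    """Fit a nearest-shrunken-centroid model (hypothetical helper).

    X: samples as lists of (log-scale) expression values; y: subtype labels.
    delta: shrinkage threshold; larger values drop more features.
    """
    n, p = len(X), len(X[0])
    classes = sorted(set(y))
    members = {c: [x for x, lab in zip(X, y) if lab == c] for c in classes}
    overall = [sum(x[i] for x in X) / n for i in range(p)]
    cent = {c: [sum(x[i] for x in members[c]) / len(members[c]) for i in range(p)]
            for c in classes}
    # Pooled within-class standard deviation of each feature.
    s = [math.sqrt(sum((x[i] - cent[lab][i]) ** 2 for x, lab in zip(X, y))
                   / (n - len(classes))) for i in range(p)]
    s0 = median(s)  # offset that damps features with very small variance
    shrunk, kept = {}, set()
    for c in classes:
        m = math.sqrt(1 / len(members[c]) - 1 / n)
        shrunk[c] = []
        for i in range(p):
            scale = m * (s[i] + s0)
            d = (cent[c][i] - overall[i]) / scale
            d = math.copysign(max(abs(d) - delta, 0.0), d)  # soft threshold
            if d != 0.0:
                kept.add(i)
            shrunk[c].append(overall[i] + scale * d)
    priors = {c: len(members[c]) / n for c in classes}
    return {"centroids": shrunk, "s": s, "s0": s0, "priors": priors,
            "features": sorted(kept)}


def classify(x, model):
    """Assign sample x to the subtype with the nearest shrunken centroid."""
    s, s0 = model["s"], model["s0"]

    def score(c):
        cen = model["centroids"][c]
        dist = sum((x[i] - cen[i]) ** 2 / (s[i] + s0) ** 2 for i in range(len(x)))
        return dist - 2 * math.log(model["priors"][c])

    return min(model["centroids"], key=score)
```

In this toy setting the second feature barely differs between subtypes, so with `delta=1.0` it is shrunk away and only the first feature is retained in `model["features"]`.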
In some embodiments, the expression levels of one or more genes or one or more miRNAs are within a predetermined amount of the mean expression levels of the one or more genes or miRNAs, on a gene-by-gene and miRNA-by-miRNA basis, in metastases of a cohort of patients having an oligometastatic phenotype, of a cohort of patients who are likely to be healed without the administration of systemic cancer therapy, or of a cohort of patients having a mean ten-year overall survival expectation that is at least 60%. The mean levels may be determined by measuring the expression levels of the genes and/or miRNAs in metastases of patients in the cohort and calculating a mean expression level for each gene or miRNA. In some embodiments, the patients are patients having metastatic cancer or having metastatic colorectal cancer. Classification of a metastasis may be done by comparing the measured expression levels of genes and/or miRNAs to reference expression levels of the same genes and/or miRNAs. The reference expression levels may be identified as the mean expression levels in metastases of a cohort of patients having characteristics associated with a metastatic subtype, such as a cohort having a mean ten-year overall survival expectation that is at least 60%, or other characteristics of a molecular subtype, such as the characteristics of an SNF1, SNF2, or SNF3 subtype described herein. The reference expression levels of such cohorts, and of any patient cohorts described herein, may be established by measuring the expression levels in metastases of at least, at most, or exactly 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 600, 700, 800, 900, or 1000 subjects in the cohort, or any range derivable therein.
In some embodiments, the cohort of patients comprises a representative sample of metastatic cancer patients, including metastatic colorectal cancer patients, having a certain characteristic, such as an oligometastatic phenotype, a relatively high likelihood of being successfully treated with immune checkpoint therapy, a mean ten-year survival expectation of at least 60%, or other characteristics of metastatic subtypes identified herein. If the expression levels of the genes and/or miRNAs measured in a sample metastasis are sufficiently close to the reference expression levels of a metastatic subtype, then the sample metastasis can be classified as being of that subtype. The degree of closeness in expression levels required to be classified as a match may be predetermined using a statistical analysis. In some embodiments, the predetermined amount of closeness is within one standard deviation of the mean expression level of the reference cohort. In some embodiments, the predetermined amount is within 0.1, 0.5, 1.0, 2.0, 3.0, 4.0, 5.0, 10, 15, or 20% of the reference expression level, or any range derivable therein. In some embodiments, a sample metastasis may be classified as belonging to a molecular subtype despite the expression levels of one or more genes or miRNAs deviating from a reference expression level by a substantial amount. For instance, if a substantial number of other gene or miRNA expression levels sufficiently match the reference expression, then the sample metastasis may be classified as belonging to the subtype. A computer-based classifier programmed to perform a statistical analysis may be used to determine whether expression levels of a sufficient number of genes and/or miRNAs in a sample metastasis are sufficiently close to the reference expression levels of a particular molecular subtype to classify the sample as belonging to that subtype.
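A minimal sketch of the closeness test described above, assuming log-scale expression values and using the one-standard-deviation rule as the predetermined amount. The helper names and the `min_fraction` cut-off, which lets a sample match even when some genes or miRNAs deviate substantially from the reference, are hypothetical illustrations rather than parameters taken from the disclosure.

```python
from statistics import mean, stdev


def reference_profile(cohort):
    """Per-feature mean and SD across a reference cohort (one row per patient)."""
    columns = list(zip(*cohort))
    return [mean(c) for c in columns], [stdev(c) for c in columns]


def matches_subtype(sample, ref_mean, ref_sd, n_sd=1.0, min_fraction=0.8):
    """True if enough features lie within n_sd SDs of the reference mean.

    min_fraction (hypothetical) is the share of genes/miRNAs that must match;
    the remaining features may deviate by any amount.
    """
    within = [abs(x - m) <= n_sd * s for x, m, s in zip(sample, ref_mean, ref_sd)]
    return sum(within) / len(within) >= min_fraction
```

A percentage rule (for example, within 10% of the reference level) could be substituted by replacing `n_sd * s` with a fraction of `m`.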
It is contemplated that the methods described herein may involve a comparison between expression levels measured for a sample metastasis and reference expression levels that are indicative of metastatic subtypes or any of the characteristics of metastatic subtypes described herein. Thus, in some embodiments, the measured expression level for a gene or miRNA is lower than, higher than, close to, higher by a predetermined amount than, lower by a predetermined amount than, or within a predetermined amount of the expression level of the gene or miRNA in metastases from a cohort of metastatic cancer patients having any one of the following characteristics: (i) a mean ten-year overall survival expectation of at least 60%; (ii) a relatively high or low likelihood of experiencing metastatic recurrence after hepatic resection; (iii) a relatively high or low likelihood of being successfully treated without systemic cancer treatments; (iv) a relatively low likelihood of being successfully treated with local cancer treatments; (v) a relatively high likelihood of being successfully treated with immune checkpoint therapy; (vi) a mean ten-year overall survival expectation of less than 50%, 35%, or 20%; (vii) a relatively high degree of immune cell infiltration; or other characteristics of the metastatic subtypes described herein. In some embodiments, the expression levels of one or more genes listed in Table 10A or one or more miRNAs listed in Table 11A deviate by a predetermined amount from the mean expression levels of the one or more genes or the one or more miRNAs in metastases of a cohort of metastatic colorectal cancer patients having a mean ten-year overall survival expectation that is less than 50%.
In some embodiments, the measured expression levels of one or more genes listed in Table 10B are higher by a predetermined amount than the mean expression level of the one or more genes in metastases of a cohort of metastatic colorectal cancer patients having a mean ten-year overall survival expectation that is less than 50%. In some embodiments, the measured expression levels of one or more genes listed in Table 10C are lower by a predetermined amount than the mean expression level of the one or more genes in metastases of a cohort of metastatic colorectal cancer patients having a mean ten-year overall survival expectation that is less than 50%. In some embodiments, the measured expression levels of one or more miRNAs listed in Table 11B are higher by a predetermined amount than the mean expression level of the one or more miRNAs in metastases of a cohort of metastatic colorectal cancer patients having a mean ten-year overall survival expectation that is less than 50%. In some embodiments, the measured expression levels of one or more miRNAs listed in Table 11C are lower by a predetermined amount than the mean expression level of the one or more miRNAs in metastases of a cohort of metastatic colorectal cancer patients having a mean ten-year overall survival expectation that is less than 50%. In any of the methods described herein, a cohort of patients may be a cohort of metastatic cancer patients, colorectal cancer patients, or metastatic colorectal cancer patients.
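The directional comparisons against the poor-outcome cohort (mean ten-year overall survival below 50%) can be expressed as a simple count. In this hypothetical sketch, `up` and `down` hold indices of features assumed to come from Tables 10B/11B and 10C/11C respectively, `poor_mean` holds that cohort's per-feature means, and `margin` stands in for the predetermined amount; how many hits are required for a call is left to the caller.

```python
def directional_hits(sample, poor_mean, up, down, margin):
    """Features deviating in the expected direction from the poor-outcome cohort mean.

    up: indices expected to be higher than poor_mean (e.g. Table 10B/11B entries).
    down: indices expected to be lower than poor_mean (e.g. Table 10C/11C entries).
    margin: the predetermined amount, in the same units as the expression values.
    """
    hits_up = [i for i in up if sample[i] >= poor_mean[i] + margin]
    hits_down = [i for i in down if sample[i] <= poor_mean[i] - margin]
    return hits_up, hits_down
```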
In some embodiments, the method further comprises calculating a Clinical Risk Score (“CRS”) for the patient, which is calculated using the following adverse clinical and pathological features: (1) disease-free interval between primary tumor diagnosis and development of metastasis <12 months, (2) number of liver metastases >1, (3) largest liver metastasis >5.0 cm, (4) lymph node-positive primary CRC, and (5) CEA>200 ng/mL. A patient with none of these features has a CRS of 0; a patient with one of these features has a CRS of 1; and so on up to a maximum CRS of 5.
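The Clinical Risk Score awards one point per adverse feature, so it reduces to a count. The sketch below assumes each threshold is a strict inequality exactly as written (for example, a largest metastasis of exactly 5.0 cm or a CEA of exactly 200 ng/mL does not score); the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ClinicalFeatures:
    disease_free_interval_months: float  # primary diagnosis to metastasis
    n_liver_metastases: int
    largest_metastasis_cm: float
    node_positive_primary: bool
    cea_ng_per_ml: float


def clinical_risk_score(f: ClinicalFeatures) -> int:
    """One point for each of the five adverse features (range 0 to 5)."""
    return int(sum([
        f.disease_free_interval_months < 12,
        f.n_liver_metastases > 1,
        f.largest_metastasis_cm > 5.0,
        f.node_positive_primary,
        f.cea_ng_per_ml > 200,
    ]))
```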
In some embodiments, the method further comprises administering a cancer therapy to the patient. The cancer therapy may be chosen based on the gene or miRNA expression measurements, alone or in combination with the clinical risk score calculated for the patient. In some embodiments, the cancer therapy comprises a local cancer therapy. In some embodiments, the cancer therapy excludes a systemic cancer therapy. In some embodiments, the cancer therapy excludes a local therapy. In some embodiments, the cancer therapy comprises a local cancer therapy without the administration of a systemic cancer therapy. In some embodiments, the cancer therapy comprises an immunotherapy, which may be an immune checkpoint therapy. Any of these cancer therapies may also be excluded. Combinations of these therapies may also be administered. In some embodiments, the gene or miRNA expression measurement and analysis may indicate that one or more cancer therapies would be likely to be effective or ineffective. A particular advantage of the methods disclosed herein is that they allow doctors, for the first time, to make a treatment decision based on the molecular subtype of a metastasis. The discoveries disclosed herein indicate that some metastatic subtypes, such as SNF2, are more likely to respond to a local therapy such as resection, radiation therapy, and the like, without the need for a systemic cancer therapy, whereas it was previously thought that any metastatic cancer requires a systemic therapy. The discoveries disclosed herein also allow doctors to identify metastatic cancer for which a local therapy may not be helpful and/or for which systemic therapies, such as DNA-damaging drugs, are appropriate.
Measuring the expression of genes and/or miRNAs may be done by a variety of methods. In some embodiments, the measurement comprises performing PCR using RNA obtained from a sample of metastatic tissue as a template. The method may include the use of sets of PCR primers that are complementary to sequences of genes or miRNAs listed in Tables 3A-C, 4A-C, 10A-C, or 11A-C, including any subsets thereof. Measuring expression may also comprise hybridizing nucleic acids to a microarray. The microarray may include nucleic acid sequences that correspond to or are complementary to sequences of genes or miRNAs listed in Tables 3A-C, 4A-C, 10A-C, or 11A-C, including any subsets thereof. Methods may also include the use of nucleic acid probes that correspond to or are complementary to sequences of genes or miRNAs listed in Tables 3A-C, 4A-C, 10A-C, or 11A-C. Any of the primers or probes used may be labeled or modified with fluorescent labels or other moieties that allow the primers or probes to be detected. In some embodiments, measuring expression comprises performing RNA sequencing.
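The disclosure does not specify how PCR measurements are converted into expression levels. One common option, shown here only as an assumed example, is the Livak 2^(-ΔΔCt) method, which expresses a target transcript relative to a reference transcript and to a calibrator sample and assumes near-100% amplification efficiency for both assays. The choice of reference transcript and calibrator, and the function name, are hypothetical.

```python
from statistics import mean


def relative_expression(ct_target, ct_reference, ct_target_cal, ct_reference_cal):
    """Fold change of a target transcript by the 2^(-ddCt) method (assumed).

    Each argument is a list of replicate Ct values: the target and reference
    transcripts in the metastasis sample, then the same two in a calibrator.
    Assumes ~100% amplification efficiency for both assays.
    """
    d_ct_sample = mean(ct_target) - mean(ct_reference)
    d_ct_calibrator = mean(ct_target_cal) - mean(ct_reference_cal)
    return 2 ** -(d_ct_sample - d_ct_calibrator)
```

For example, a target that crosses threshold two cycles earlier (relative to the reference transcript) in the metastasis than in the calibrator gives a fold change of 4.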
Also disclosed is a method of treating metastatic cancer in a patient, the method comprising administering to the patient a local cancer therapy without administering systemic cancer therapy or administering to the patient an immunotherapy, wherein the patient has been determined to have expression levels of one or more genes listed in Table 10A or one or more miRNAs listed in Table 11B that are within a predetermined amount of the mean expression levels in metastases of a cohort of metastatic cancer patients having a mean overall ten-year survival expectation that is at least 60%. In some embodiments, the patient has been determined to have expression levels of at least, at most, or exactly 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, or 113 genes listed in Table 10A, or any range derivable therein, and/or at least, at most, or exactly 5, 10, 20, 30, 40, 50, or 53 miRNAs listed in Table 11A, or any range derivable therein, that are within a predetermined amount of the mean expression level in metastases of a cohort of metastatic cancer patients having a mean overall ten-year survival expectation that is at least 60%. In some embodiments, the treatments are administered to a patient who has been determined to have expression levels of one or more genes and/or miRNAs that are indicative of an oligometastatic phenotype or of other characteristics of SNF2 metastases.
Also disclosed is a method of treating metastatic cancer in a patient, the method comprising administering to the patient a local cancer therapy without administering systemic cancer therapy, wherein the patient has been determined to have an mRNA and/or miRNA expression profile indicating an oligometastatic phenotype or a specific metastatic subtype that is likely to be successfully treated with local cancer therapy. In some embodiments, the mRNA expression profile is determined by determining the expression of one or more genes listed in Table 10A and the miRNA expression profile is determined by determining the expression of one or more miRNAs listed in Table 11A. In some embodiments, the expression profile is determined by determining the expression levels of at least, at most, or exactly 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, or 113 genes listed in Table 10A, or any range derivable therein, and/or at least, at most, or exactly 5, 10, 20, 30, 40, 50, or 53 miRNAs listed in Table 11A, or any range derivable therein. In some embodiments, the expression profile indicates a ten-year survival expectation of greater than 60% or less than 50, 35, or 20%, an increased likelihood of successful treatment with administration of local cancer therapies, an increased infiltration of immune cells, or other characteristics of any metastatic subtype as described herein.
Also disclosed is a method of treating cancer in a patient having a metastasis from a primary cancer tumor, the method comprising: administering to the patient an immune checkpoint therapy or administering to the patient a local cancer therapy without administering a systemic cancer therapy, wherein the patient has been identified based on the expression levels of one or more mRNA and/or miRNA species in the metastasis as belonging to a group of patients with one or more of the following characteristics: (a) a mean ten-year overall survival expectation of at least 60%; (b) a likelihood of experiencing metastatic recurrence after hepatic resection that is lower than the likelihood for patients outside of the group; and (c) a level of immune cell infiltration into the metastasis that is higher than the mean level for patients outside the group. In some embodiments, the one or more mRNA species comprise one or more transcripts of the genes listed in Table 10A. In some embodiments, the one or more miRNA species comprise one or more transcripts of the miRNAs listed in Table 11A. In some embodiments, the mRNA or miRNA species comprise at least, at most, or exactly 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, or 113 genes listed in Table 10A, or any range derivable therein, and/or at least, at most, or exactly 5, 10, 20, 30, 40, 50, or 53 miRNAs listed in Table 11A, or any range derivable therein. In some embodiments, the metastasis is a liver metastasis and the cancer is colorectal cancer.
Also disclosed is a method of diagnosing a patient having a metastasis from a primary colorectal cancer tumor, the method comprising: (a) measuring the expression levels in the metastasis of one or more genes or one or more miRNAs; and (b) identifying the patient as having an oligometastatic phenotype, as being a responder to immune checkpoint cancer therapy, or as having a ten-year survival expectation of greater than 60% if the expression level of one or more of the genes or miRNAs is within a predetermined amount of a first reference expression level or deviates from a second reference expression level by a predetermined amount. In some embodiments, the first reference expression level represents the mean expression level in metastases of a cohort of metastatic cancer patients having an oligometastatic phenotype, being responders to immune checkpoint cancer therapy, and/or having a mean ten-year survival expectation of greater than 60%. In some embodiments, the second reference expression level represents the mean expression level in metastases of a cohort of metastatic cancer patients having a mean ten-year survival expectation of less than 50%. In some embodiments, the one or more genes and/or miRNAs comprise at least, at most, or exactly 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, or 113 genes listed in Table 10A, or any range derivable therein, and/or at least, at most, or exactly 5, 10, 20, 30, 40, 50, or 53 miRNAs listed in Table 11A, or any range derivable therein.
Also disclosed is a method of diagnosing and treating a patient having a metastasis from a primary colorectal cancer tumor, the method comprising: (a) obtaining a tissue sample from the metastasis; (b) measuring the expression of one or more genes and/or miRNAs in the sample; (c) comparing the measured expression level of each gene or miRNA to a reference expression level for that gene or miRNA; (d) identifying the metastasis as an SNF1, SNF2, or SNF3-type metastasis based on the measured expression levels; and (e) administering to the patient an appropriate therapy based on the type of metastasis identified in step (d). In some embodiments, the one or more genes and/or miRNAs comprise at least, at most, or exactly 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, or 113 genes listed in Table 10A, or any range derivable therein, and/or at least, at most, or exactly 5, 10, 20, 30, 40, 50, or 53 miRNAs listed in Table 11A, or any range derivable therein. In some embodiments, the appropriate therapy for a patient with an SNF2-type metastasis comprises an immune checkpoint cancer therapy. In some embodiments, the appropriate therapy for a patient with an SNF2-type metastasis comprises a local cancer therapy unaccompanied by systemic cancer therapy. In some embodiments, the appropriate therapy for a patient with an SNF1 metastasis comprises a DNA-damaging cancer therapy. In some embodiments, the DNA-damaging cancer therapy comprises administering a PARP inhibitor. In some embodiments, the appropriate therapy for a patient with an SNF1 or SNF3 metastasis comprises a systemic cancer therapy. In some embodiments, the appropriate therapy for a patient with an SNF1 or SNF3 metastasis excludes immune checkpoint cancer therapy.
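Step (e) can be pictured as a lookup from the subtype identified in step (d) to the therapy embodiments listed in this paragraph. This hypothetical sketch encodes only those embodiments; the dictionary name, keys, and wording are illustrative.

```python
THERAPY_EMBODIMENTS = {
    "SNF1": {"consider": ["systemic therapy",
                          "DNA-damaging therapy (e.g. PARP inhibitor)"],
             "exclude": ["immune checkpoint therapy"]},
    "SNF2": {"consider": ["immune checkpoint therapy",
                          "local therapy without systemic therapy"],
             "exclude": []},
    "SNF3": {"consider": ["systemic therapy"],
             "exclude": ["immune checkpoint therapy"]},
}


def therapy_options(subtype):
    """Return the therapy embodiments recorded for an SNF subtype."""
    try:
        return THERAPY_EMBODIMENTS[subtype]
    except KeyError:
        raise ValueError(f"unknown subtype: {subtype!r}") from None
```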
Also disclosed is a method of providing a prognosis for a patient having metastatic colorectal cancer, the method comprising: (a) evaluating the expression of one or more genes and/or miRNAs in a tissue sample from a metastasis taken from the patient to identify the metastasis as an SNF1, SNF2, or SNF3-type metastasis; (b) determining the clinical risk score of the patient; and (c) determining the ten-year survival expectation of the patient as follows: (i) identifying the patient as having a ten-year survival expectation of greater than 90% if the metastasis is type SNF1 or SNF2 and the clinical risk score is 0 or 1; (ii) identifying the patient as having a ten-year survival expectation of between 40 and 50% if the metastasis is type SNF2 and the clinical risk score is 2 or greater or if the metastasis is type SNF3 and the clinical risk score is 0 or 1; and (iii) identifying the patient as having a ten-year survival expectation of less than 24% if the metastasis is type SNF1 or SNF3 and the clinical risk score is 2 or greater. In some embodiments, the genes and/or miRNAs comprise at least, at most, or exactly 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, or 113 genes listed in Table 10A, or any range derivable therein, and/or at least 5, 10, 20, 30, 40, 50, or 53 miRNAs listed in Table 11A, or any range derivable therein.
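The rules in step (c) form a small decision table over subtype and clinical risk score, with scores of 0 or 1 treated as low risk and 2 or greater as high risk. A sketch with a hypothetical function name that returns the survival band as a string and rejects inputs outside the stated ranges:

```python
def ten_year_survival_band(subtype: str, crs: int) -> str:
    """Map metastasis subtype and clinical risk score to a ten-year survival band."""
    if subtype not in ("SNF1", "SNF2", "SNF3"):
        raise ValueError(f"unknown subtype: {subtype!r}")
    if not 0 <= crs <= 5:
        raise ValueError("clinical risk score must be between 0 and 5")
    if crs <= 1:  # low clinical risk
        return ">90%" if subtype in ("SNF1", "SNF2") else "40-50%"
    # high clinical risk (score of 2 or greater)
    return "40-50%" if subtype == "SNF2" else "<24%"
```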
Also disclosed is a method comprising evaluating the expression levels of multiple mRNA and/or miRNA species in a sample comprising tissue from a liver metastasis of a patient that has metastatic colorectal cancer to identify the patient as belonging to a first group of metastatic colorectal cancer patients or a second group of metastatic colorectal cancer patients, wherein: (a) the first group has one or more of the following characteristics: (i) a mean ten-year overall survival expectation of at least 60%; (ii) a mean ten-year overall survival expectation that is higher than that for patients outside of the first group; (iii) a likelihood of experiencing metastatic recurrence after hepatic resection that is lower than the likelihood for patients outside of the first group; (iv) a likelihood of being successfully treated without systemic cancer treatments that is higher than the likelihood for patients outside of the first group; and (v) a likelihood of being successfully treated with immune checkpoint therapy that is higher than the likelihood for patients outside of the first group; and (b) the second group has one or more of the following characteristics: (i) a mean ten-year overall survival expectation of less than 50%; (ii) a mean ten-year overall survival expectation that is lower than that for patients outside of the second group; (iii) a likelihood of experiencing metastatic recurrence after hepatic resection that is higher than for patients outside of the second group; (iv) a likelihood of being successfully treated without systemic cancer treatments that is lower than the likelihood for patients outside of the second group; (v) a likelihood of being successfully treated with immune checkpoint therapy that is lower than the likelihood for patients outside of the second group; and (vi) a likelihood of being successfully treated with DNA damaging cancer therapy that is higher than the likelihood for patients outside of the second group. 
In some embodiments, the mRNA species comprise transcripts of one or more genes listed in Table 10A. In some embodiments, the miRNA species comprise one or more of the miRNAs listed in Table 11A. In some embodiments, the patient is identified as belonging to the first group of patients if the expression of one or more genes listed in Table 10A is within a predetermined amount of a reference expression level of the one or more genes. In some embodiments, the patient is identified as belonging to the first group of patients if the expression of one or more miRNAs listed in Table 11A is within a predetermined amount of a reference expression level of the one or more miRNAs. In some embodiments, identifying the patient as belonging to the first or second group comprises using a classifier that has been trained to identify an RNA expression pattern associated with the first group of patients. In some embodiments, the classifier evaluates the expression levels of at least, at most, or exactly 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, or 113 genes listed in Table 10A, or any range derivable therein. In some embodiments, the classifier evaluates the expression levels of at least, at most, or exactly 5, 10, 20, 30, 40, 50, or 53 of the miRNAs listed in Table 11A, or any range derivable therein. In some embodiments, the method further comprises administering an immune checkpoint therapy to a patient identified as belonging to the first group. In some embodiments, the method further comprises treating a patient identified as belonging to the first group with local treatment of liver metastases unaccompanied by systemic cancer treatment. In some embodiments, the method further comprises administering a DNA-damaging cancer therapy to a patient identified as belonging to the second group of patients.
Also disclosed is a method of identifying a molecular subtype of metastatic cancer, the method comprising performing genome-wide expression profiling of a plurality of metastatic tissue samples to generate expression data of mRNA and miRNA in the tissue samples and analyzing the expression data using a similarity network fusion algorithm or other integrated molecular analysis technique that identifies similarities in both mRNA and miRNA expression data among samples to identify groups of samples having expression patterns that are similar to other samples in the group and that are dissimilar from samples outside the group. In some embodiments, the method further comprises identifying genes and miRNAs that are differentially expressed in a group of samples relative to either a mean expression level across all samples or a mean expression level of samples outside the group. In some embodiments, the method further comprises identifying a subset of the differentially expressed genes and/or miRNAs whose expression levels in a single sample can be used to accurately classify the sample as belonging to a particular molecular subtype or not belonging to a particular molecular subtype.
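The following is a simplified sketch of how SNF combines an mRNA network and a miRNA network. It follows the formulas published by Wang et al. (Nature Methods, 2014):

1. A scaled exponential kernel builds one sample-similarity matrix per data type.
2. Each matrix is split into a full kernel P and a sparse K-nearest-neighbour kernel S.
3. Each view's P is repeatedly updated as S·(other views' P)·Sᵀ.

This is not the inventors' code. Reference implementations, such as the SNFtool R package, may differ in details like kernel normalization. The parameter values (K, mu, t) and the toy data are assumptions chosen for illustration.

```python
import numpy as np


def affinity(X, K=3, mu=0.5):
    """Scaled exponential similarity kernel from the SNF paper (samples are rows of X)."""
    # pairwise Euclidean distances between samples
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    # mean distance from each sample to its K nearest neighbours (column 0 is self)
    knn_mean = np.sort(D, axis=1)[:, 1:K + 1].mean(axis=1)
    eps = (knn_mean[:, None] + knn_mean[None, :] + D) / 3.0
    return np.exp(-D ** 2 / (mu * eps))


def full_kernel(W):
    """Status matrix P: self-similarity 1/2, remaining mass spread over other samples."""
    P = W.copy()
    np.fill_diagonal(P, 0.0)
    P = P / (2.0 * P.sum(axis=1, keepdims=True))
    np.fill_diagonal(P, 0.5)
    return P


def local_kernel(W, K=3):
    """Sparse kernel S keeping only each sample's K most similar samples."""
    S = np.zeros_like(W)
    for i in range(W.shape[0]):
        nbrs = np.argsort(W[i])[::-1][:K]  # self is included because W[i, i] = 1
        S[i, nbrs] = W[i, nbrs] / W[i, nbrs].sum()
    return S


def snf(views, K=3, mu=0.5, t=20):
    """Fuse per-data-type sample networks (e.g. mRNA and miRNA) into one network."""
    if len(views) < 2:
        raise ValueError("SNF needs at least two data types")
    Ws = [affinity(np.asarray(X, dtype=float), K, mu) for X in views]
    Ps = [full_kernel(W) for W in Ws]
    Ss = [local_kernel(W, K) for W in Ws]
    for _ in range(t):
        updated = []
        for v in range(len(views)):
            # average of the other views' networks, diffused through this view's local kernel
            others = sum(Ps[u] for u in range(len(views)) if u != v) / (len(views) - 1)
            Pv = Ss[v] @ others @ Ss[v].T
            updated.append((Pv + Pv.T) / 2.0)  # keep each network symmetric
        Ps = updated  # all views are updated in parallel
    return sum(Ps) / len(Ps)


# Toy data: 10 samples from two hypothetical subtypes, measured in two data types.
rng = np.random.default_rng(0)
groups = np.repeat([0, 1], 5)
mrna = rng.normal(groups[:, None] * 5.0, 0.5, size=(10, 6))
mirna = rng.normal(groups[:, None] * 3.0, 0.5, size=(10, 4))
fused = snf([mrna, mirna], K=3)
```

The fused matrix would then be partitioned into subtypes. For example, scikit-learn's `SpectralClustering(n_clusters=3, affinity="precomputed")` accepts such a matrix directly. The analysis described in this disclosure arrived at three subtypes.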
In any of the embodiments described herein, the patient may have already been diagnosed with cancer or may have already undergone tumor resection before any of the steps of the methods described herein are performed.
Any step or aspect of an embodiment described herein may be implemented in the context of any other embodiment described herein.
Other objects, features and advantages of the present invention will become apparent from the following detailed description. It should be understood, however, that the detailed description and the specific examples, while indicating specific embodiments of the invention, are given by way of illustration only, since various changes and modifications within the spirit and scope of the invention will become apparent to those skilled in the art from this detailed description.
Here, utilizing independent clinical cohorts of CRC patients who underwent resection of liver metastases, the inventors have identified integrated molecular patterns in liver metastases associated with long-term survival. The inventors' findings indicate a molecular basis for oligometastasis that is predictive of clinical outcome and complementary to established clinical risk factors associated with long-term survival following hepatic resection. Aspects of the current invention have important clinical implications for distinguishing patients with potentially curable oligometastatic disease, who may be selected for local therapy, from those whose few metastases are part of a large cascade of widespread disease. These concepts may be applicable to many histological types of cancer. Methods disclosed herein involve determining expression levels of genes and miRNAs in liver metastases to identify the molecular subtype of the metastasis. The subtype classification can be used to provide a prognosis and to guide treatment decisions. These and other aspects of the disclosed methods will be described in greater detail below.
Methods disclosed herein include measuring expression of genes and/or miRNAs. Expression can be measured by a number of processes known in the art. The process of measuring expression may begin by extracting RNA from a metastasis tissue sample. Extracted mRNA and/or miRNA can be detected by hybridization (for example, by Northern blot analysis or by DNA or RNA arrays (microarrays) after converting mRNA into labeled cDNA) and/or by amplification by means of an enzymatic chain reaction. Quantitative or semi-quantitative enzymatic amplification methods such as polymerase chain reaction (PCR), quantitative real-time RT-PCR, or semi-quantitative RT-PCR can be used. Primer pairs may be designed to span an intron so that amplification of cDNA can be distinguished from amplification of contaminating genomic DNA (gDNA). Additional primers or probes, which are preferably labeled (for example, with a fluorescent label) and which hybridize specifically across the junction between two exons, may optionally be designed for the same purpose of distinguishing cDNA amplification from gDNA contamination. If desired, such primers can be designed so that approximately the 5′ half of the primer hybridizes with one exon of interest and approximately the 3′ half hybridizes with the other exon of interest. Suitable primers can be readily designed by a person skilled in the art. Other amplification methods include ligase chain reaction (LCR), transcription-mediated amplification (TMA), strand displacement amplification (SDA), and nucleic acid sequence based amplification (NASBA). Expression levels of mRNAs and/or miRNAs may also be measured by RNA sequencing methods known in the art.
To normalize the expression values of a gene across different samples, the mRNA level of the gene of interest in samples from the subject under study can be compared with the level of a control RNA. As used herein, a “control RNA” is an RNA of a gene whose expression level does not differ among different metastatic subtypes, for example a gene that is constitutively expressed in all types of cells. A control RNA is preferably an mRNA derived from a housekeeping gene that encodes a constitutively expressed protein carrying out essential cellular functions.
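The passage above does not prescribe a particular normalization formula. When expression is measured by quantitative real-time RT-PCR, one widely used option is the comparative 2^-ΔΔCt method. In this method, the cycle threshold (Ct) of the target transcript is first normalized to the control RNA, and the result is then compared with a reference sample. The sketch below assumes roughly 100% amplification efficiency for both assays. The Ct values and the function name are hypothetical.

```python
def relative_expression(ct_target, ct_control, ref_ct_target, ref_ct_control):
    """Fold change of a target transcript versus a reference sample (2^-ddCt method).

    Each Ct is first normalized to the control (housekeeping) RNA measured in the
    same sample; assumes ~100% PCR efficiency for target and control assays.
    """
    d_ct_sample = ct_target - ct_control          # delta-Ct in the patient sample
    d_ct_reference = ref_ct_target - ref_ct_control  # delta-Ct in the reference sample
    return 2.0 ** -(d_ct_sample - d_ct_reference)


# Hypothetical Ct values: the target crosses threshold 2 cycles earlier
# (relative to the control RNA) in the sample than in the reference.
fold = relative_expression(24.0, 18.0, 26.0, 18.0)
```

A lower normalized Ct means more starting template. Here the ΔΔCt is -2, so the target is about fourfold higher in the sample than in the reference.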
Methods disclosed herein may include comparing a measured expression level to a reference expression level. The term “reference expression level” refers to a value used as a reference for the values/data obtained from samples obtained from patients. The reference level can be an absolute value, a relative value, a value which has an upper and/or lower limit, a series of values, an average value, a median, a mean value, or a value expressed by reference to a control or reference value. A reference level can be based on the value obtained from an individual sample, such as, for example, a value obtained from a sample from the subject under study but obtained at a previous point in time. The reference level can be based on a large number of samples, such as the levels obtained in a cohort of subjects having a particular characteristic. The reference level may be defined as the mean level of the patients in the cohort. For example, the reference expression level for a gene or miRNA can be based on the mean expression level of the gene or miRNA obtained from a number of patients who have SNF2 metastases. A reference level can be based on the expression levels of the markers to be compared obtained from samples from subjects who do not have a disease state or a particular phenotype. The person skilled in the art will appreciate that the particular reference expression level can vary depending on the specific method to be performed.
Some embodiments include determining that a measured expression level is higher than, lower than, increased relative to, decreased relative to, equal to, or within a predetermined amount of a reference expression level. In some embodiments, a higher, lower, increased, or decreased expression level is at least 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5, 5.5, 6, 7, 8, 9, 10, 12, 14, 16, 18, 20, 50, 100, 150, 200, 250, 500, or 1000 fold (or any derivable range therein) or at least 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, or 900% different from the reference level, or any derivable range therein. These values may represent a predetermined threshold level, and some embodiments include determining that the measured expression level is higher by a predetermined amount or lower by a predetermined amount than a reference level. In some embodiments, a level of expression may be qualified as “low” or “high,” which indicates that the patient expresses a certain gene or miRNA at a level relative to a reference level, or within a range of reference levels, determined from multiple samples meeting particular criteria. The level or range of levels in multiple control samples is an example of this. In some embodiments, that certain level or a predetermined threshold value is at, below, or above 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100 percentile, or any range derivable therein. Moreover, a threshold level may be derived from a cohort of individuals meeting particular criteria.
The number in the cohort may be, be at least, or be at most 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, 400, 410, 420, 430, 440, 441, 450, 460, 470, 480, 490, 500, 510, 520, 530, 540, 550, 560, 570, 580, 590, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, 2000 or more (or any range derivable therein). A measured expression level can be considered equal to a reference expression level if it is within a certain amount of the reference expression level, and such amount may be an amount that is predetermined. This can be the case, for example, when a classifier is used to identify the molecular subtype of a metastasis. The predetermined amount may be within 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, or 50% of the reference level, or any range derivable therein.
For any comparison of gene or miRNA expression levels to mean expression levels or reference expression levels, the comparison is made on a gene-by-gene and miRNA-by-miRNA basis. For example, if the expression levels of gene A, gene B, and miRNA X in a patient's metastasis are measured, a comparison to mean expression levels in metastases of a cohort of patients would involve: comparing the expression level of gene A in the patient's metastasis with the mean expression level of gene A in metastases of the cohort of patients, comparing the expression level of gene B in the patient's metastasis with the mean expression level of gene B in metastases of the cohort of patients, and comparing the expression level of miRNA X in the patient's metastasis with the mean expression level of miRNA X in metastases of the cohort of patients. Comparisons that involve determining whether the expression level measured in a patient's metastasis is within a predetermined amount of a mean expression level or reference expression level are similarly done on a gene-by-gene and miRNA-by-miRNA basis, as applicable.
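The gene-by-gene comparison described above can be sketched as a loop in which each marker is checked only against its own reference level. This sketch assumes expression on a linear (not log) scale. It uses a hypothetical 20% tolerance for "within a predetermined amount" and a hypothetical twofold threshold for "higher" and "lower"; both values are chosen from the ranges listed earlier. The marker names mirror the gene A / gene B / miRNA X example. The function name and the "intermediate" label are illustrative only.

```python
def compare_to_reference(measured, reference, tolerance_pct=20.0, fold=2.0):
    """Per-marker comparison of measured expression against per-marker reference levels.

    measured, reference: dicts mapping marker name -> expression (linear scale, > 0).
    Returns {marker: "within" | "higher" | "lower" | "intermediate"}.
    """
    calls = {}
    for marker, ref in reference.items():
        if marker not in measured:
            continue  # marker was not assayed in this sample
        value = measured[marker]
        if abs(value - ref) <= ref * tolerance_pct / 100.0:
            calls[marker] = "within"
        elif value >= ref * fold:
            calls[marker] = "higher"
        elif value <= ref / fold:
            calls[marker] = "lower"
        else:
            # outside the tolerance but short of the fold-change threshold
            calls[marker] = "intermediate"
    return calls


# Hypothetical cohort means (reference) and one patient's measurements.
reference = {"gene_A": 10.0, "gene_B": 20.0, "miRNA_X": 5.0}
measured = {"gene_A": 10.5, "gene_B": 45.0, "miRNA_X": 2.0}
calls = compare_to_reference(measured, reference)
```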
Methods disclosed herein can be used to identify different molecular subtypes of metastatic cancer that correlate with different clinical outcomes and different sensitivities to particular treatment regimens. The subtypes can be identified using an integrated molecular analysis technique. One such technique, described in the Examples below, is a similarity network fusion (SNF) algorithm, which integrates parallel miRNA and mRNA expression networks across a number of patient samples. The SNF analysis established three subtypes of metastatic cancer based solely on expression data, and these subtypes exhibited different clinical outcomes. Other types of integrated approaches to identifying molecular subtypes can also be used. For example, the inventors analyzed the miRNA and mRNA expression data using consensus clustering of clusters and iClusterPlus and found that, similar to SNF, these approaches identified three distinct subtypes of metastases based on expression alone, and that the distinct subtypes showed statistically significant differences in clinical outcomes of patients. These data demonstrate that the intrinsic subtypes are independent of the type of integrated molecular analysis used to identify them. Furthermore, the discovery that metastatic cancers are heterogeneous and include distinct molecular subtypes enables skilled persons to use integrated analyses of gene and miRNA expression to identify subtypes of other types of metastatic cancer, including liver cancer, testicular cancer, biliary cancer, ovarian cancer, urinary tract cancer, pancreatic cancer, prostate cancer, esophageal cancer, gastric cancer, head and neck cancer, cervical cancer, lung cancer, neuroendocrine cancer, kidney cancer, breast cancer, melanoma, and other cancers that can progress to metastatic cancer.
Methods disclosed herein may include administering a cancer therapy or determining a course of cancer treatment based on an identified metastatic subtype. Some embodiments include administering a local cancer treatment or determining that a local cancer treatment is appropriate. Local cancer treatments include those that target cancer tissue using a technique directed to a specific organ or limited area of the body. Local cancer treatments include surgery (i.e., resection), radiation therapy, cryotherapy, laser therapy, topical therapy, high intensity focused ultrasound, and photodynamic therapy. The local treatments may include stereotactic body radiotherapy (SBRT), stereotactic ablative body radiotherapy (SABR), stereotactic radiosurgery (SRS), radiofrequency ablation (RFA), percutaneous cryoablation therapy (PCT), and photodynamic therapy (PDT). The local therapies may be directed at the primary tumor and/or at one or more metastases.
Systemic cancer therapies are those that are distributed widely within the body, such as a variety of drug treatments, which may be delivered orally or intravenously. Examples of systemic therapies include chemotherapy, hormone therapy, immunotherapy, and targeted therapy (i.e., drugs that are distributed widely within the body, but have targeted effects on cancer cells). More specifically, chemotherapy includes administering drugs such as cyclophosphamide, paclitaxel, epirubicin, methotrexate, gemcitabine, albumin-bound paclitaxel, carboplatin, etoposide, doxorubicin, capecitabine, fluorouracil, vinorelbine, docetaxel, liposomal doxorubicin, eribulin, or irinotecan, including combinations thereof. Immunotherapy includes monoclonal antibodies, such as alemtuzumab, trastuzumab, ibritumomab tiuxetan, brentuximab vedotin, ado-trastuzumab emtansine, denileukin diftitox, and blinatumomab; immune checkpoint inhibitors, such as pembrolizumab, nivolumab, atezolizumab, avelumab, durvalumab, and ipilimumab; and cancer vaccines such as sipuleucel-T.
Identifying the molecular subtype of metastatic colorectal cancer can be used to determine an appropriate treatment regimen. In some embodiments, appropriate treatments for SNF1 metastases include EGFR inhibitors, PARP inhibitors, PI3K inhibitors, NOTCH inhibitors, angiogenesis inhibitors, DNA damaging agents, STING agonists, innate immune agonists, RNA vaccines, or combinations thereof. In some embodiments, appropriate treatments for SNF2 metastases include PD-1/PD-L1 immunotherapies, other immunotherapies, beta-secretase inhibitors, lipid-lowering agents, or combinations thereof. In some embodiments, appropriate treatments for SNF3 metastases include PDGF/PDGFR inhibitors, VEGF/VEGFR inhibitors, angiogenesis inhibitors, JAK1/JAK2 inhibitors, COX2 inhibitors, HDAC inhibitors, DNA demethylating agents, other epigenetic modifiers, or combinations thereof.
Methods disclosed herein can also include making treatment decisions based on an integrated risk group classification of a patient. This classification combines the molecular subtype of the metastasis with the patient's clinical risk score (CRS). Integration of SNF subtypes and CRS yielded three prognostic risk groups: (1) low-risk (22% of patients), comprising SNF1 and SNF2 subtypes with low CRS; (2) intermediate-risk (29% of patients), comprising SNF2 subtype with high CRS and SNF3 subtype with low CRS; and (3) high-risk (49% of patients), comprising SNF1 and SNF3 subtypes with high CRS. A patient's integrated risk group indicates the likelihood of benefit from local metastasis-directed therapies such as surgical resection, stereotactic body radiotherapy (SBRT), stereotactic ablative body radiotherapy (SABR), stereotactic radiosurgery (SRS), radiofrequency ablation (RFA), percutaneous cryoablation therapy (PCT), and photodynamic therapy (PDT). Low-risk patients have the highest likelihood of benefit from these therapies, high-risk patients have the lowest likelihood of benefit, and intermediate-risk patients have an intermediate likelihood of benefit.
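The integrated risk grouping above is a simple two-key lookup, sketched below. The six (subtype, CRS category) combinations are taken directly from the text. The function name is hypothetical. How a CRS value is dichotomized into "low" and "high" is not specified in this passage and is left to the caller.

```python
# (SNF subtype, CRS category) -> integrated risk group, as listed in the text.
RISK_GROUPS = {
    ("SNF1", "low"): "low",
    ("SNF2", "low"): "low",
    ("SNF2", "high"): "intermediate",
    ("SNF3", "low"): "intermediate",
    ("SNF1", "high"): "high",
    ("SNF3", "high"): "high",
}


def integrated_risk_group(snf_subtype, crs_category):
    """Combine a metastasis subtype call with a dichotomized clinical risk score."""
    key = (snf_subtype.upper(), crs_category.lower())
    if key not in RISK_GROUPS:
        raise ValueError(f"unrecognized subtype/CRS combination: {key}")
    return RISK_GROUPS[key]
```

For example, `integrated_risk_group("SNF2", "high")` returns `"intermediate"`. Under the description above, such a patient would have an intermediate likelihood of benefit from local metastasis-directed therapy.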
Conventionally, it has been thought that metastatic cancer always requires a systemic therapy. However, the identification of molecular subtypes of metastatic cancer as described herein shows that some metastatic cancers are likely to respond favorably to local therapies and may not need an additional systemic therapy. Conversely, some metastatic cancers are not likely to respond to local therapy alone, or at all, and should therefore be treated with appropriate systemic therapies.
One hundred thirty-four patients with comprehensive clinical annotations underwent hepatic resection of limited colorectal cancer liver metastases (CRCLM). The clinical characteristics of these patients are summarized in Table 1. The median patient age was 61 years (range, 29-85). Patients were diagnosed with primary adenocarcinoma of the colon (72%) or rectum (28%) and presented with either synchronous (47%) or metachronous (53%) liver metastasis. The initial number of liver metastases was one in 61%, two in 22%, and three or more in 17% of patients. Liver metastases were limited to one hepatic lobe in 91% of patients and involved two hepatic lobes in 9% of patients. The analysis focused on de novo liver metastases and excluded patients with extrahepatic disease or a history of previously resected metastasis. Patients received uniform treatment with 5-fluorouracil-based perioperative chemotherapy, curative-intent management of primary colorectal tumors, and partial hepatectomy of all visible liver metastases (Table 1). Postoperatively, all patients were surveilled with serial axial CT imaging and serum CEA levels.
At a median follow-up of 49 months, 32% of patients had no evidence of metastatic recurrence. These patients had a 10-year OS of 77%, whereas patients with clinically evident, recurrent metastases exhibited a 10-year OS of 13% (P<0.0001, log-rank test).
Gene expression analysis is an established approach for molecular subtyping of primary human cancers14,15. The International Colorectal Cancer Subtyping Consortium (CRCSC) demonstrated the existence of four biologically and clinically distinct Consensus Molecular Subtypes (CMS) of CRC based on gene expression analysis of 3,962 primary tumors16,17. However, it is unknown whether CMS subtypes also exist in CRCLM. First, the inventors validated the application of CMS classification to RNA sequencing data from 558 primary CRC tumors in The Cancer Genome Atlas (TCGA)18, verifying the expected frequencies of CMS subtypes.
Transcriptomic analyses of individual mRNA or miRNA datasets were of limited value for the molecular subtyping of colorectal liver metastases.
Each SNF subtype demonstrated distinct patterns of mRNA and miRNA expression.
Ensemble of Gene Set Enrichment Analyses (EGSEA) provided substantial insight into the biological features of SNF subtypes.
Importantly, CRCLM subtypes were also discernible at the histological level.
Fifty-nine liver metastases and matched normal liver specimens underwent next-generation genomic sequencing using OncoPlus, a clinically validated hybrid capture genomic sequencing platform comprising 1,212 commonly altered cancer genes for mutational and copy number analyses11. Mutation Significance (MutSig) analysis confirmed enrichment in CRC driver gene mutations of APC, TP53, KRAS, PIK3CA, SOX9, SMAD4, and FBXW7 in 83%, 73%, 37%, 20%, 14%, 14%, and 12% of liver metastases, respectively.
These findings were extended by characterizing the mutational and copy number landscapes of CRCLM by SNF subtype. Unique somatic mutations were identified in each SNF subtype.
Furthermore, the inventors found that the median number of mutations per sample did not differ significantly across SNF subtypes. Given that mismatch repair deficiency leading to microsatellite instability (MSI) contributes to tumor hypermutation in association with cytotoxic immune infiltration25, the inventors investigated whether MSI explained the SNF2 subtype. They identified an MSI phenotype in 3.4% of patients, which is consistent with the incidence of MSI in metastatic colorectal cancer26. However, only one SNF2 metastasis demonstrated an MSI-high phenotype, while two metastases (one SNF1 and one SNF2) exhibited an MSI-low phenotype. The SNF2 MSI-high and MSI-low metastases, but not the SNF1 MSI-low metastasis, showed significant enrichment of cytotoxic cell signature expression.
The inventors investigated whether SNF molecular subtyping could improve clinical risk stratification after hepatic resection of CRCLM by augmenting the prognostic effect of CRS. Multivariate Cox proportional hazard analysis indicated that the prognostic impact of SNF subtypes was statistically independent of, but complementary to, CRS.
Patients: Samples from 134 adults with liver metastases from primary CRC were analyzed; metastases from 121 independent patients successfully underwent molecular analysis.
Analytic Platforms: microRNA (miRNA) profiling was performed for 116 samples using Affymetrix miRNA 4.0 Arrays, and whole genome RNA sequencing was performed for 95 samples using Illumina TruSeq Stranded Total RNA Sequencing. In addition, hybrid capture genomic sequencing of liver metastases and matched normal liver specimens from 59 patients was performed using the OncoPlus panel11. All sequencing was conducted on Illumina HiSeq sequencers. Microsatellite instability (MSI) analysis was also performed on 89 samples using the Promega MSI 1.2 clinical assay according to FDA approved guidelines. Clinical data were frozen on Apr. 30, 2016, and molecular data were frozen on Jun. 26, 2016. Overall survival (OS), defined as the interval between hepatic resection and death from any cause, or censoring at the time the patient was last known to be alive, was chosen as the optimal primary endpoint. The complete list of datasets is provided in Table 2.
Statistical analysis: The statistical analysis included Fisher's exact tests for associations of categorical variables when there were two categories or Chi-square tests when there were three categories. Kaplan-Meier and Cox proportional hazard analyses were used to examine the associations of molecular features with clinical outcomes. Multiple testing corrections were performed using the Benjamini-Hochberg method. All reported P-values are two-sided. A complete description of the methods is provided hereafter.
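Two of the procedures named above are short enough to show directly:

- The Kaplan-Meier product-limit estimator of overall survival. Follow-up is measured from hepatic resection, deaths are events, and patients last known alive are censored.
- The Benjamini-Hochberg step-up adjustment of P-values.

These are minimal reimplementations for illustration only. The document does not state which software was used for these steps, and the input values below are made up.

```python
import numpy as np


def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate at each distinct event time.

    times: follow-up from hepatic resection; events: 1 = death, 0 = censored.
    Returns a list of (time, survival probability) pairs.
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    curve, surv = [], 1.0
    for t in np.unique(times[events]):
        at_risk = np.sum(times >= t)             # patients still under observation at t
        deaths = np.sum((times == t) & events)   # deaths occurring exactly at t
        surv *= 1.0 - deaths / at_risk
        curve.append((float(t), float(surv)))
    return curve


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P-values (false discovery rate control)."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # enforce monotonicity, working down from the largest P-value
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted


curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
```

The BH adjustment shown here should agree with R's `p.adjust(p, method = "BH")` for the same input.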
A retrospective clinical cohort study was conducted on patients who underwent hepatic resection of histologically confirmed metastatic colorectal adenocarcinoma at the University of Chicago Medical Center (Chicago, Ill.) and NorthShore University Health System (Evanston, Ill.) between 1994 and 2012. During this time period, approximately 60-75 patients per year underwent hepatic resection of colorectal liver metastases at the two participating institutions. All available clinical, pathologic, radiologic, and outcome data were collected for patients using medical records. Patients with unresectable or extrahepatic disease at the time of metastatic diagnosis were excluded from this study. In total, 134 consecutive patients with metastatic colorectal cancer who underwent surgical resection of limited de novo liver metastases were selected for molecular analysis. Patients were uniformly treated with perioperative chemotherapy, definitive treatment of primary colorectal cancer, and partial hepatectomy for resection of liver metastases. Detailed cohort characteristics are provided in Table 1 and Table 2. This study was approved by the Institutional Review Boards at each respective institution. Dates of recurrence, death or last follow-up were extracted from medical records and Social Security Death databases. Clinical risk scores (CRS) were calculated as previously described1.
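The text says only that clinical risk scores were "calculated as previously described." A widely used CRS for hepatic resection of colorectal liver metastases is the five-criterion score of Fong and colleagues. It awards one point for each of the following:

- a node-positive primary tumor;
- a disease-free interval of less than 12 months from the primary tumor to liver metastasis;
- more than one liver tumor;
- preoperative CEA above 200 ng/mL;
- a largest liver tumor above 5 cm.

The sketch below assumes that score. It also assumes a "high" CRS cutoff of 3 or more points; both are assumptions, not details confirmed by this document. The `Patient` fields and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    node_positive_primary: bool
    disease_free_interval_months: float  # primary diagnosis to liver metastasis (0 if synchronous)
    n_liver_tumors: int
    preop_cea_ng_ml: float
    largest_tumor_cm: float


def clinical_risk_score(p):
    """Assumed Fong-style CRS: one point per adverse clinical criterion (range 0-5)."""
    criteria = [
        p.node_positive_primary,
        p.disease_free_interval_months < 12,
        p.n_liver_tumors > 1,
        p.preop_cea_ng_ml > 200,
        p.largest_tumor_cm > 5,
    ]
    return sum(criteria)  # True counts as 1


def crs_category(score, high_cutoff=3):
    """Dichotomize the score; the cutoff of 3 is an assumption."""
    return "high" if score >= high_cutoff else "low"
```

The resulting category is the "low"/"high" CRS input that the integrated risk grouping expects.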
Formalin-fixed paraffin-embedded (FFPE) specimens were collected from archived pathologic tissue. FFPE specimens were catalogued and histologically reviewed by an expert pathologist (Dr. Nora Joseph) to ensure adequacy of the specimen and histologic quality control. Tissue blocks containing sufficient tumor tissue were subjected to 2 mm punch biopsies of both tumor and normal liver regions. For each surgical specimen, representative FFPE tissue blocks and corresponding H&E slides were analyzed to confirm the diagnosis of colorectal adenocarcinoma and identify regions containing high quantities of viable tumor cells, as well as independent regions containing normal liver parenchyma. Three cores from tumor and normal tissue regions were obtained. For each specimen, all three cores were combined to reduce intratumoral variability. This procedure was repeated for both tumor and normal biopsies for each patient.
Nucleic Acid Extraction
Punch biopsy specimens were deparaffinized and processed using the RecoverAll Total Nucleic Acid Isolation Kit (Ambion, TX) according to the manufacturer's instructions. Briefly, 200 μL of digestion buffer and 4 μL of protease were added to each sample and incubated overnight at 55° C. RNA and DNA were extracted following the RecoverAll protocol according to the manufacturer's recommendations. Nucleic acid quantification was performed using a NanoDrop 1000 Spectrophotometer and a Qubit® Fluorometer. Nucleic acid extracts were stored at −80° C. until further analysis.
RNA Sequencing
1. Library Construction: RNA integrity and quantity were evaluated using an Agilent 2100 Bioanalyzer (Agilent Technologies, CA). Reverse-stranded paired-end 75 base-pair sequencing libraries were constructed using Illumina Total RNA Stranded Kits. Ribosomal RNAs (rRNAs) were depleted by using the Ribo-Zero rRNA Removal Kit (Illumina). Libraries were sequenced on a HiSeq 2500 machine using standard reagents and protocols provided by Illumina. In total, 95 metastatic samples were successfully sequenced using this approach.
2. Read Alignment and Quantification: Unless otherwise specified, all data analyses were performed under the R programming and software environment for statistical computing and graphics version 3.3 (R Core Team, 2016). FastQ files for each sample were assessed for quality using the FastQC tool (version 0.11.2). Raw reads were aligned to the GRCh38 primary genome assembly using Spliced Transcripts Alignment to a Reference (STAR) aligner (version 2.4.2a) 1-pass algorithm2. After sorting the bam files in lexicographical order with the sambamba program3, the inventors assigned the reads to exon features annotated in GENCODE (release 22) using the FeatureCounts tool from the subread package (version 1.4.6) and summarized the read counts by genes4. The post-alignment quality control was carried out with Picard tools (version 1.117) and RSeQC package (version 2.3.1). Specifically, the inventors examined the QC data regarding the alignment summary, gene body coverage, read distribution, and ribosomal RNA depletion rate.
3. Data Normalization: The inventors used functions in the R/Bioconductor package edgeR to extract the raw counts of the reads that were mapped to the protein-coding genes5. After removing the genes with zero read counts across all samples, the inventors calculated the normalization factors to scale the raw library sizes and the log2-transformed counts per million (log-CPM) for the expression level of each gene. The log-CPM values were corrected for batch effects (sequencing lane and institution) using the removeBatchEffect function from the R/Bioconductor package limma6. In total, 18,714 genes were retained for the subsequent analyses.
4. Detection of Differentially Expressed mRNAs: To identify differentially expressed mRNAs among samples grouped by Similarity Network Fusion (SNF; see Example 7, Similarity Network Fusion) clusters, the inventors first removed non/low-expressed genes in comparison groups by requiring read counts of at least 1 in a minimum number of samples in one of the comparison groups, followed by trimmed mean of M-values (TMM) normalization using the calcNormFactors function in the edgeR package. Next, the inventors removed heteroscedasticity from the count data using the voomWithQualityWeights function from the limma package with the quantile normalization method enabled. The inventors then fit a linear model for each gene using the limma algorithm, adjusted for batch effect, and ranked the genes for differential expression using the empirical Bayes method with the trend and robust options enabled. The differentially expressed genes were identified using the Benjamini-Hochberg procedure for multiple test adjustment together with a fold-change cutoff. The adjusted P-value threshold and fold-change threshold were set at 0.05 and 2.0, respectively (Tables 3A-C).
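Two arithmetic pieces of the RNA sequencing workflow above can be illustrated outside R:

- **log-CPM transformation** (step 3). The sketch drops all-zero genes and applies the voom-style formula log2((count + 0.5) / (library size + 1) × 10^6). Library sizes are scaled by per-sample normalization factors such as TMM factors, which are assumed to be computed elsewhere. edgeR's own `cpm(..., log = TRUE)` may use a different prior count.
- **Final selection of differentially expressed genes** (step 4). The cutoffs are an adjusted P-value of 0.05 and an absolute fold change of 2.0, applied here as |log2 fold change| ≥ 1. Whether the cutoffs are strict or inclusive is an assumption.

This is an illustrative sketch, not the edgeR/limma pipeline itself. Gene names and values are hypothetical.

```python
import numpy as np


def log_cpm(counts, norm_factors=None, prior_count=0.5):
    """log2 counts-per-million for a genes x samples count matrix.

    Genes with zero reads in every sample are dropped first; effective library
    sizes are column totals, optionally scaled by per-sample normalization factors.
    Returns (log-CPM matrix for kept genes, boolean mask of kept genes).
    """
    counts = np.asarray(counts, dtype=float)
    keep = counts.sum(axis=1) > 0
    counts = counts[keep]
    lib_size = counts.sum(axis=0)
    if norm_factors is not None:
        lib_size = lib_size * np.asarray(norm_factors, dtype=float)
    return np.log2((counts + prior_count) / (lib_size + 1.0) * 1e6), keep


def select_differential(genes, log2_fc, adj_p, p_cut=0.05, fc_cut=2.0):
    """Genes with adjusted P below p_cut and absolute fold change of at least fc_cut."""
    log2_fc = np.asarray(log2_fc, dtype=float)
    adj_p = np.asarray(adj_p, dtype=float)
    passed = (adj_p < p_cut) & (np.abs(log2_fc) >= np.log2(fc_cut))
    return [g for g, ok in zip(genes, passed) if ok]


lcpm, keep = log_cpm([[0, 0], [10, 30], [90, 70]])
de_genes = select_differential(["G1", "G2", "G3", "G4"],
                               [1.5, -2.0, 0.5, 3.0],
                               [0.01, 0.03, 0.001, 0.2])
```

In this toy example, G3 is significant but falls below the twofold change, and G4 has a large fold change but fails the adjusted P-value cutoff. Only G1 and G2 pass both cutoffs.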
microRNA Expression Profiling
RNA integrity and quantity were evaluated using an Agilent 2100 Bioanalyzer (Agilent Technologies, CA). Total RNA (500 ng) was processed for biotin labeling according to the Affymetrix Flash Tag Biotin HSR RNA labeling guide (Affymetrix, CA). The biotin-labeled target was hybridized to Affymetrix miRNA 4.0 Array Chips for 16 h at 48° C. and 60 rpm in an Affymetrix 640 hybridization oven. Arrays were washed and stained in an Affymetrix Fluidics Station 450 according to the Affymetrix GeneChip expression guide. The arrays were scanned using the Affymetrix GeneChip Scanner 3000 7G. CEL intensity files were generated using GCOS software. In total, 116 metastatic samples were successfully assayed using this approach.
1. Data Pre-Processing and Normalization: The methods used in this analysis are available as part of the R/Bioconductor packages affy, oligo, limma and sva6,7. The raw Affymetrix GeneChip miRNA 4.0 Array CEL files were imported to R using the read.celfiles function from the oligo package. The inventors first performed robust normexp-by-control background correction using the nec function from the limma package with the robust option enabled8. They then normalized the log2-transformed expression data using cyclic loess normalization with the array weight method. Finally, the inventors summarized the probes into probesets using the rma function from the affy package with the normalize and background options disabled. To remove batch effects caused by array processing dates and the patient cohorts, they applied the ComBat algorithm implemented in the sva package9. Two batch factors were considered: (1) institution and (2) microarray scan date. A single sample was run in batch 6 and combined with samples from batch 5. The inventors removed non/low-expressed probesets and retained the probesets representing 778 mature human miRNAs for the subsequent analyses.
2. Detection of Differentially Expressed miRNAs
The inventors applied the limma method to identify differentially expressed miRNAs among the samples grouped by SNF clusters. They first estimated the relative quality weights for each array using the arrayWeightsSimple function, then fit a linear model for each probeset adjusted for batch effect, and finally ranked probesets for differential expression using the empirical Bayes method. The differentially expressed miRNAs were identified using the Benjamini-Hochberg procedure for multiple test adjustment together with a fold-change cutoff. The adjusted P-value threshold and fold-change threshold were set at 0.05 and 2.0, respectively (Tables 4A-C).
Consensus Clustering of Expression Data
Unsupervised consensus clustering analysis was performed separately on the mRNA and miRNA expression data sets using the R package ConsensusClusterPlus (version 1.38.0). The inventors selected the most informative features for clustering: the top 25% most variable mRNAs or miRNAs, as measured by the median absolute deviation (MAD). Normalized expression data from the preceding steps were first standardized using the data normalization function in the R package clusterSim (version 0.45-1). To run ConsensusClusterPlus, the inventors preset the options as a maximum evaluated cluster number of k=66, 80% of samples per resampling, 1,000 resamplings, Euclidean distance, and the k-means clustering algorithm. They chose complete and average linkage as the inner linkage and final linkage, respectively. The optimal number of clusters k was inferred by inspecting the consensus cumulative distribution function (CDF) plot and the proportion of ambiguously clustered pairs (PAC) plot, where the optimal k corresponds to the lowest PAC10 (k=2,
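The feature selection and the PAC criterion can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the ConsensusClusterPlus implementation: the MAD here is unscaled (R's `mad` multiplies by 1.4826, which does not change the ranking), the consensus matrix is assumed to be given, and the ambiguity interval (0.1, 0.9) is a commonly used choice that the source does not specify.

```python
import statistics
from typing import Dict, List


def mad(values: List[float]) -> float:
    """Median absolute deviation (unscaled)."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])


def top_variable_features(expr: Dict[str, List[float]],
                          fraction: float = 0.25) -> List[str]:
    """Rank features by MAD and keep the top `fraction` of them."""
    ranked = sorted(expr, key=lambda f: mad(expr[f]), reverse=True)
    n_keep = max(1, int(len(ranked) * fraction))
    return ranked[:n_keep]


def pac(consensus: List[List[float]], lower: float = 0.1,
        upper: float = 0.9) -> float:
    """Proportion of ambiguously clustered pairs: the share of off-diagonal
    consensus values lying strictly inside (lower, upper)."""
    n = len(consensus)
    pairs = [consensus[i][j] for i in range(n) for j in range(i + 1, n)]
    ambiguous = sum(1 for c in pairs if lower < c < upper)
    return ambiguous / len(pairs)
```

A lower PAC means most sample pairs are either almost always or almost never co-clustered, which is why the k with the lowest PAC is preferred.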
Consensus Molecular Subtyping (CMS) of Colorectal Liver Metastases.
Microarray expression data derived from 183 patients with colorectal liver metastasis were collected from ArrayExpress (study IDs: E-MTAB-1951, E-GEOD-62322, E-GEOD-41258, and E-GEOD-35834). Study E-MTAB-1951 contains 96 samples profiled on the Illumina HumanHT-12 v3.0 Expression BeadChip. E-GEOD-62322 and E-GEOD-41258 contain 19 and 47 samples, respectively, profiled on Affymetrix HG-U133A Arrays. E-GEOD-35834 consists of 27 samples profiled on the Affymetrix Human Exon 1.0 ST Array. The inventors also used two sets of normalized RNA sequencing data. One set includes 93 metastases from our cohort, which were reanalyzed with RSEM to estimate TPM abundances; the other contains 45 liver metastases obtained from the Memorial Sloan-Kettering Cancer Center and processed similarly to the methods described in Example 7 (RNA sequencing). For E-MTAB-1951, raw expression data were preprocessed with variance stabilizing transformation and quantile normalization using the lumi package (version 2.26.3). For the remaining microarray studies, CEL files were downloaded directly from ArrayExpress and processed with fRMA (version 1.28.0), with core annotation targets summarized by robust weighted average. Level 3 TCGA READ and COAD RNA sequencing RSEM expression data were obtained from the Sage Bionetworks Synapse repository (syn: syn2320098, syn2320092, syn2320147, and syn2320079). TPM expression data corresponding to primary tumor samples were selected, offset by 1, and log2-transformed. Multiple gene-level mappings were resolved by singular value decomposition. Datasets from both tissues were merged, and a custom ComBat correction was performed to account for batch effects between the HiSeq-RNASeqV2 and Illumina-GA platforms. All scripting and normalization methods are available for download via the CRC Subtyping Consortium's GitHub repository, including the merging protocol (https://github.com/Sage-Bionetworks/crcsc/blob/dc58542555e281c1ccb55aeb73d087e7d0bdf6bf/groups/G/dataQc/tcgaCrcRNAseq-merged.R) and miscellaneous normalization procedures (https://github.com/Sage-Bionetworks/crcsc/blob/dc58542555e281c1ccb55aeb73d087e7d0bdf6bf/groups/G/dataQcaGnorm.R).
For both microarray and RNA sequencing expression data, features were mapped to the corresponding Entrez gene IDs using annotation sets provided by Ensembl GRCh38 and Bioconductor, including hgu133a.db (version 3.2.3), huex10sttranscriptcluster.db (version 8.6.0), lumiHumanIDMapping (version 1.10.1), or org.Hs.eg.db (version 3.3.0). Where multiple annotations mapped to the same gene, either the median probeset value (microarray) or the feature with the largest coefficient of variation across samples (RNA sequencing) was retained as the expression estimate for that gene. CMS classification was performed using the single sample predictor (SSP) (https://github.com/Sage-Bionetworks/CMSclassifier).
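The two collapsing rules can be sketched as follows. This is a minimal Python illustration assuming expression is held as a mapping from feature ID to per-sample values plus a feature-to-gene lookup; the function names are hypothetical. The population standard deviation is used for the coefficient of variation, which ranks features identically to the sample standard deviation when all features share the same samples.

```python
import statistics
from typing import Dict, List


def collapse_by_median(rows: Dict[str, List[float]],
                       gene_of: Dict[str, str]) -> Dict[str, List[float]]:
    """Microarray rule: per gene, take the sample-wise median of its probesets."""
    groups: Dict[str, List[List[float]]] = {}
    for probe, values in rows.items():
        groups.setdefault(gene_of[probe], []).append(values)
    return {gene: [statistics.median(col) for col in zip(*vals)]
            for gene, vals in groups.items()}


def coefficient_of_variation(values: List[float]) -> float:
    mean = statistics.fmean(values)
    return statistics.pstdev(values) / mean if mean else 0.0


def collapse_by_max_cv(rows: Dict[str, List[float]],
                       gene_of: Dict[str, str]) -> Dict[str, List[float]]:
    """RNA-seq rule: per gene, keep the single feature with the largest CV."""
    best: Dict[str, tuple] = {}
    for feature, values in rows.items():
        gene = gene_of[feature]
        cv = coefficient_of_variation(values)
        if gene not in best or cv > best[gene][0]:
            best[gene] = (cv, values)
    return {gene: values for gene, (_, values) in best.items()}
```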
Similarity Network Fusion
The matched normalized mRNA and miRNA expression data of 93 metastases were first standardized separately using the standardNormalization function from the R package SNFtool (version 2.2). Euclidean distances between all pairs of samples were then calculated for the mRNA and miRNA data separately. An affinity matrix was computed for each data type using the affinityMatrix function, with the number of nearest neighbors K and the variance parameter of the local model, alpha. The inventors then fused the mRNA and miRNA affinity matrices by similarity network fusion over T iterations, and the fused network was used in the subsequent spectral clustering step, in which each sample was assigned to one of the SNF clusters. Three clusters were identified using default settings. To find other possible three-cluster partitions, the inventors tested 168 parameter combinations of K (10, 15, 20, 25, 30, 35, 40), alpha (0.3, 0.4, 0.5, 0.6, 0.7, 0.8), and T (20, 30, 40, 50). For each parameter setting, they applied the estimateNumberOfClustersGivenGraph function to estimate the possible number of clusters using two heuristics: (1) eigen gap and (2) rotation cost. The inventors retained the clustering results that comprised three clusters and calculated the median Silhouette index (SI) of each result. The top 8 clustering results with the highest median SIs were selected (
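The parameter sweep and its selection rule can be sketched in Python. This does not reimplement SNFtool: it assumes each parameter setting has already produced cluster labels together with a sample-by-sample distance matrix (the source does not state which matrix the silhouette index was computed on), and the function names are hypothetical.

```python
import itertools
import statistics
from typing import Dict, List, Sequence, Tuple

# Grid reported in the text: 7 x 6 x 4 = 168 settings of (K, alpha, T).
K_VALUES = [10, 15, 20, 25, 30, 35, 40]
ALPHA_VALUES = [0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
T_VALUES = [20, 30, 40, 50]
PARAMETER_GRID = list(itertools.product(K_VALUES, ALPHA_VALUES, T_VALUES))


def silhouette_values(dist: Sequence[Sequence[float]],
                      labels: Sequence[int]) -> List[float]:
    """Per-sample silhouette widths from a precomputed distance matrix."""
    n = len(labels)
    widths = []
    for i in range(n):
        own = labels[i]
        mean_dist = {}
        for c in set(labels):
            members = [j for j in range(n) if labels[j] == c and j != i]
            if members:
                mean_dist[c] = statistics.fmean([dist[i][j] for j in members])
        others = [v for c, v in mean_dist.items() if c != own]
        if own not in mean_dist or not others:
            widths.append(0.0)  # singleton cluster or only one cluster
            continue
        a, b = mean_dist[own], min(others)
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return widths


def top_results(results: Dict[Tuple[int, float, int],
                              Tuple[List[int], List[List[float]]]],
                n_top: int = 8) -> List[Tuple[int, float, int]]:
    """Keep three-cluster results and rank them by median silhouette."""
    scored = []
    for params, (labels, dist) in results.items():
        if len(set(labels)) != 3:
            continue
        scored.append((statistics.median(silhouette_values(dist, labels)), params))
    scored.sort(reverse=True)
    return [params for _, params in scored[:n_top]]
```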
Robustness of SNF Clustering on Overall Survival
Previous work has shown that the SNF algorithm for clustering is statistically robust11. The inventors examined whether the observed survival difference between SNF clusters could arise by random chance. To this end, they performed permutation analyses. In each permutation, miRNA profiles were shuffled and randomly assigned to mRNA profiles. SNF clustering was then performed de novo, and each patient was assigned to one of the three resulting groups. Differential overall survival across clusters was assessed with a log-rank test. This process was repeated 1,000 times, and the log-rank p-values were used to construct a null distribution. The inventors counted the number of instances in which p-values from the null distribution were more extreme (i.e., smaller) than the empirical p-value (
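The permutation scheme can be outlined in Python. The expensive part (re-running SNF and the log-rank test) is represented by a hypothetical callback, `run_pipeline`, so only the shuffling and the comparison against the null distribution are shown. The "+1" correction in `empirical_p` is a common convention added here as an assumption; the text itself only describes counting more extreme null p-values.

```python
import random
from typing import Callable, List, Sequence


def permutation_null(n_patients: int,
                     run_pipeline: Callable[[List[int]], float],
                     n_perm: int = 1000, seed: int = 0) -> List[float]:
    """Null distribution of log-rank p-values.

    `run_pipeline` is hypothetical: given `perm`, it pairs patient i's mRNA
    profile with patient perm[i]'s miRNA profile, reruns SNF clustering de
    novo and returns the log-rank p-value across the resulting clusters.
    """
    rng = random.Random(seed)
    null = []
    for _ in range(n_perm):
        perm = list(range(n_patients))
        rng.shuffle(perm)  # break the true mRNA-miRNA pairing
        null.append(run_pipeline(perm))
    return null


def empirical_p(observed: float, null: Sequence[float]) -> float:
    """Share of null p-values strictly smaller than the observed one,
    with a +1 correction so the estimate is never exactly zero."""
    more_extreme = sum(1 for p in null if p < observed)
    return (more_extreme + 1) / (len(null) + 1)
```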
Ensemble of Gene Set Enrichment Analyses (EGSEA).
Raw gene feature counts were mapped to Entrez IDs using the R/Bioconductor package org.Hs.eg.db (v3.4.0)5. Low- or non-expressed genes, i.e., those not reaching 1 CPM in at least as many samples as the smallest SNF group, were excluded from subsequent analysis using edgeR v3.16.5. Quality-weighted, quantile-normalized, log-transformed CPM values were calculated using limma-voom v3.30.11. Gene set enrichment was performed using the R/Bioconductor package EGSEA (v1.2.0)12 with planned contrasts of each SNF group against the average of the remaining groups. Independent EGSEA analyses were performed for gene lists provided by MSigDB (v5.2)13 (Tables 5A-C) and for a custom gene list covering numerous immunological, canonical, and metabolic pathways14 (Tables 6A-C). Intratumoral immunome profiling was performed as previously described15, and the resulting gene lists were used to calculate SNF-level and single-sample enrichment scores using EGSEA.
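A minimal Python sketch of the expression filter and the one-versus-rest contrast weights follows. It assumes CPM is computed from raw library sizes (edgeR may instead use normalized library sizes) and that every sample has a non-zero total count; function names are hypothetical.

```python
from typing import Dict, List


def cpm(counts: Dict[str, List[int]]) -> Dict[str, List[float]]:
    """Counts per million using raw column (library) totals."""
    genes = list(counts)
    n_samples = len(counts[genes[0]])
    lib_sizes = [sum(counts[g][j] for g in genes) for j in range(n_samples)]
    return {g: [counts[g][j] / lib_sizes[j] * 1e6 for j in range(n_samples)]
            for g in genes}


def filter_low_expression(counts: Dict[str, List[int]], groups: List[str],
                          min_cpm: float = 1.0) -> List[str]:
    """Keep genes with CPM >= min_cpm in at least as many samples as the
    smallest SNF group contains."""
    min_samples = min(groups.count(g) for g in set(groups))
    values = cpm(counts)
    return [g for g, row in values.items()
            if sum(1 for v in row if v >= min_cpm) >= min_samples]


def one_vs_rest_contrast(levels: List[str], target: str) -> Dict[str, float]:
    """Weights for target minus the average of the other groups,
    e.g. SNF1 - (SNF2 + SNF3)/2."""
    others = [lv for lv in levels if lv != target]
    return {lv: (1.0 if lv == target else -1.0 / len(others)) for lv in levels}
```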
SNF Class Predictor
Data Preprocessing: To build a classifier distinguishing SNF cluster 2 (C2) samples from SNF cluster 1 and 3 samples combined (C13), the normalized mRNA expression data of 93 patients were split into a training set of 20 C2 samples and 51 C13 samples and a test set of 6 C2 samples and 16 C13 samples, so that the class ratio was approximately preserved in both partitions. For the training set, the inventors first filtered out genes with near-zero variance. They then identified highly correlated genes (absolute pairwise correlation coefficient greater than 0.7) and, among these, removed the genes with the largest mean absolute correlation. They further removed potential linear dependencies in the data using the findLinearCombos function from the R package caret (version 6.0). Finally, they applied the preProcess function to center and scale the training and test data by mean and standard deviation, followed by rescaling the data to the range [−1, 1].
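The scaling step can be sketched in Python under one assumption that the text does not state explicitly: the mean, standard deviation and range are estimated on the training set only and then reused for the test set, so test values may fall slightly outside [−1, 1]. The caret `preProcess` call itself is not reproduced, and the function names are hypothetical. Genes with zero variance would cause a division by zero, but those are removed by the near-zero-variance filter beforehand.

```python
from statistics import fmean, stdev
from typing import List, Tuple

Matrix = List[List[float]]  # rows = samples, columns = genes


def fit_scaler(train: Matrix) -> List[Tuple[float, float, float, float]]:
    """Per gene: training mean and SD, then min/max of the standardized values."""
    params = []
    for col in zip(*train):
        mu, sd = fmean(col), stdev(col)
        z = [(x - mu) / sd for x in col]
        params.append((mu, sd, min(z), max(z)))
    return params


def transform(data: Matrix, params) -> Matrix:
    """Center, scale, then map the training range onto [-1, 1]."""
    out = []
    for row in data:
        new_row = []
        for x, (mu, sd, lo, hi) in zip(row, params):
            z = (x - mu) / sd
            new_row.append(2 * (z - lo) / (hi - lo) - 1)
        out.append(new_row)
    return out
```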
Model Training and Testing: The inventors applied Prediction Analysis of Microarrays (PAMR, version 1.55), a nearest shrunken centroid classification algorithm, to the training set16. Ten-fold cross-validation was performed to obtain the optimal shrinkage threshold of 2.72, at which the overall error rate was 0.056. The final classification model contains 113 genes (Tables 10A-C) and was evaluated using the held-out test data of 6 SNF cluster 2 samples and 16 SNF cluster 1 and 3 samples. Performance metrics, including accuracy, balanced accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), Cohen's kappa, Matthews correlation coefficient, and area under the curve (AUC), were calculated using the confusionMatrix function from the caret package and an in-house script (
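The confusion-matrix metrics listed above follow standard definitions, sketched here in Python with SNF cluster 2 assumed to be the positive class. This is not the caret `confusionMatrix` implementation or the in-house script; AUC is omitted because it requires continuous classifier scores rather than counts.

```python
import math
from typing import Dict


def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """Binary metrics with SNF cluster 2 treated as the positive class."""
    n = tp + fn + tn + fp
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    acc = (tp + tn) / n
    # Chance agreement for Cohen's kappa from the row and column marginals.
    p_expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": acc,
        "balanced_accuracy": (sens + spec) / 2,
        "sensitivity": sens,
        "specificity": spec,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "kappa": (acc - p_expected) / (1 - p_expected),
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }
```

Balanced accuracy, kappa and the Matthews correlation coefficient are worth reporting alongside accuracy because the test set is imbalanced (6 versus 16 samples).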
Independent Validation of Classifier: The inventors downloaded the raw expression data of 96 patients from ArrayExpress (study ID: E-MTAB-1951). They prioritized the E-MTAB-1951 samples because this is the only publicly available colorectal cancer liver metastasis dataset with clinical annotations (i.e., Clinical Risk Scores (CRS)) that allow testing for association with SNF group membership. The samples were profiled using the Illumina HumanHT-12 v3.0 Expression BeadChip. Using the R/Bioconductor package lumi (version 2.26.4)17, they transformed the expression data with the variance-stabilizing transformation (VST) algorithm, followed by between-chip normalization with the robust spline normalization (RSN) algorithm. Where multiple probes mapped to the same Ensembl gene ID, they removed those with the smallest variance across samples. The inventors re-trained the PAM classifier on all 93 samples in our cohort using the 113 genes selected in the previous analysis and applied it to the normalized E-MTAB-1951 microarray data set. For genes missing from the microarray data, the expression values were set to −1 after the data were scaled to the range [−1, 1]. The concordance between the predicted SNF cluster memberships and the CRS of the E-MTAB-1951 samples was examined using contingency analysis (
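The gene alignment and the cross-tabulation can be sketched in Python. The sketch assumes each validation sample is a mapping from gene ID to an already-scaled value; gene names and the CRS category labels in any usage are hypothetical placeholders, and the statistical test applied to the resulting table is not specified in the text.

```python
from typing import Dict, List

MISSING_VALUE = -1.0  # fill value for absent genes after rescaling to [-1, 1]


def align_to_classifier(sample: Dict[str, float],
                        classifier_genes: List[str]) -> List[float]:
    """Order values as the classifier expects, substituting MISSING_VALUE
    for genes absent from the validation platform."""
    return [sample.get(gene, MISSING_VALUE) for gene in classifier_genes]


def contingency_table(predicted: List[str],
                      crs_group: List[str]) -> Dict[str, Dict[str, int]]:
    """Cross-tabulate predicted SNF membership against CRS categories."""
    table: Dict[str, Dict[str, int]] = {}
    for p, c in zip(predicted, crs_group):
        row = table.setdefault(p, {})
        row[c] = row.get(c, 0) + 1
    return table
```

Filling with −1 places a missing gene at the bottom of the rescaled range, which is a simple but non-neutral choice for a centroid classifier.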
Hybrid Capture Next Generation Sequencing
Targeted Capture Sequencing Panel: For each specimen, DNA from 1,212 exonic regions was captured using the UCM-OncoPlus panel, based on the NimbleGen SeqCap EZ custom capture method, as previously described18. In brief, this approach uses a tiered assay system in which highly clinically relevant genes (tier 1, n=316) are sequenced approximately 3-fold deeper than the remaining (tier 2) genes. Capture libraries were generated using the Illumina TruSeq platform. Libraries were multiplexed with 6-bp indexes, up to 9 samples per lane, and sequenced on Illumina HiSeq2000 and HiSeq2500 instruments. FastQ files were generated using Illumina's bcl2fastq v1.8.4.
Sequencing Data Alignment: FastQ files were quality-trimmed using cutadapt v1.9.1 (http://cutadapt.readthedocs.io/en/stable/guide.html) with a Phred quality cutoff of Q≥30 at the 3′ end18 and a minimum read length of 19 after trimming (the minimum read size recommended for bwa-mem). The remaining reads were aligned against the hg19 reference using the bwa-mem algorithm v0.7.8 (http://bio-bwa.sourceforge.net). PCR duplicates were removed using Broad Institute Picard tools v1.128 MarkDuplicates (https://github.com/broadinstitute/picard). Bedtools v2.22.1 (http://bedtools.readthedocs.io/en/latest/) was used to assess coverage at tier 1 and tier 2 loci. Samples without a mean depth of coverage of 300× at tier 1 genes were excluded from subsequent analyses. In targeted capture sequencing, oxidative damage can be pervasive and can lead to false-positive variant calls at sites where the sequence context CCG is read as CAG19. Sample-level oxidative damage was calculated using Picard CollectOxoGMetrics. Samples with ArtQ19 scores less than 21 were removed. Overall, 59 unique metastasis-normal pairs were available for analysis.
Example alignment pipeline flow:
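The alignment steps described above can be assembled as command lines, as in this Python sketch. It builds the argument lists without running them and assumes paired-end reads; the file names, the explicit SortSam step (needed because MarkDuplicates expects coordinate-sorted input) and the paired-end layout are assumptions, not details from the source. The bedtools coverage step is omitted because its command syntax changed across bedtools versions. Picard is invoked with the legacy `KEY=value` argument style used by 1.x releases.

```python
from typing import List


def alignment_commands(sample: str, ref: str = "hg19.fa",
                       picard: str = "picard.jar") -> List[List[str]]:
    """Build, but do not run, the per-sample command lines.

    File names and the paired-end layout are hypothetical placeholders.
    """
    r1, r2 = f"{sample}_R1.fastq.gz", f"{sample}_R2.fastq.gz"
    t1, t2 = f"{sample}_R1.trim.fastq.gz", f"{sample}_R2.trim.fastq.gz"
    return [
        # Trim 3' ends below Phred 30 and drop reads shorter than 19 bp.
        ["cutadapt", "-q", "30", "-m", "19", "-o", t1, "-p", t2, r1, r2],
        # Align to the (pre-indexed) reference; SAM is written to stdout.
        ["bwa", "mem", ref, t1, t2],
        # Coordinate-sort, assuming the bwa output was saved as <sample>.sam.
        ["java", "-jar", picard, "SortSam", f"I={sample}.sam",
         f"O={sample}.sorted.bam", "SORT_ORDER=coordinate"],
        # Remove PCR duplicates rather than only flagging them.
        ["java", "-jar", picard, "MarkDuplicates", f"I={sample}.sorted.bam",
         f"O={sample}.dedup.bam", f"M={sample}.dup_metrics.txt",
         "REMOVE_DUPLICATES=true"],
        # Sample-level oxidative (8-oxoG) damage metrics.
        ["java", "-jar", picard, "CollectOxoGMetrics", f"I={sample}.dedup.bam",
         f"O={sample}.oxog_metrics.txt", f"R={ref}"],
    ]
```

Each list can be passed to `subprocess.run(cmd, check=True)`, redirecting standard output to `<sample>.sam` for the bwa step.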
Variant Calling and Filtering: Single nucleotide variants (SNVs) were called using MuTect v1.1.7 (http://archive.broadinstitute.org/cancer/cga/mutect). Insertions and deletions (indels) were called using scalpel-discovery 0.5.3 (http://scalpel.sourceforge.net/). Calls not annotated as “PASS” or “KEPT” were removed. For both SNVs and indels, only calls falling within the genomic coordinates targeted by the capture panel were retained for subsequent analyses. Targeted capture libraries have been shown to be susceptible to oxidative damage, and even samples without pervasive oxidative damage can have false-positive calls attributable to this phenomenon19. All SNV calls were therefore assigned a FoxoG score using metalfox (https://github.com/cpwardell/bin/blob/master/metalfox.py). Based on previously reported studies19, calls whose MuTect tumor_lod did not exceed −10 + (100/3) × FoxoG were removed as likely consequences of oxidative damage. All variants were annotated using snpEff v3.6c (http://snpeff.sourceforge.net/) against the hg19 reference. Only variants within coding regions or disrupting splice sites were included in analyses. Calls with a variant allele frequency (VAF) < 5%, position coverage < 30, or an ExAC allele frequency ≥ 0.01 were removed. To further improve the quality of indel calls, two additional filters were applied: (1) Dustmasker (https://www.ncbi.nlm.nih.gov/IEB/ToolBox/CPP_DOC/lxr/source/src/app/dustmasker/) was used to identify low-complexity genomic regions, and indels falling within these regions were discarded; (2) a pseudo-panel of normal samples was constructed by aggregating, across the matched normal samples, all putative indel calls that failed Scalpel filters due to ‘HighVafNormal’ or ‘HighAltCountNormal’. Indels that failed in two or more samples from unique patients were filtered out. These steps helped eliminate remaining noisy calls that had passed the previous filtering steps.
Example variant calling workflow:
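The SNV-level filters described above can be sketched in Python. The record type and function names are hypothetical; the sketch covers only the FoxoG-dependent tumor_lod cutoff and the VAF, coverage and ExAC cutoffs, not the PASS/KEPT, target-region, annotation or indel-specific filters.

```python
from dataclasses import dataclass


@dataclass
class SnvCall:
    tumor_lod: float  # MuTect tumor log-odds score
    foxog: float      # FoxoG score assigned to the call
    vaf: float        # variant allele frequency, 0-1
    depth: int        # read coverage at the position
    exac_af: float    # ExAC population allele frequency (0 if absent)


def passes_oxog(call: SnvCall) -> bool:
    """Keep calls whose tumor_lod exceeds -10 + (100/3) * FoxoG."""
    return call.tumor_lod > -10 + (100 / 3) * call.foxog


def passes_quality(call: SnvCall) -> bool:
    """Remove VAF < 5%, coverage < 30, or ExAC allele frequency >= 0.01."""
    return call.vaf >= 0.05 and call.depth >= 30 and call.exac_af < 0.01


def keep(call: SnvCall) -> bool:
    return passes_oxog(call) and passes_quality(call)
```

Because the tumor_lod cutoff rises linearly with FoxoG, a call whose supporting reads are concentrated in the artifact-prone read orientation needs much stronger evidence to survive.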
Mutation Significance (MutSig) Analysis: VCFs were annotated and converted to a MAF format using Oncotator20. MAF files for all patients were merged and assessed for significant gene-centric mutation frequency using MutSigCV version 2 with default coverage and covariate tables provided by the Broad Institute21. Mutation Assessor22 and ClinVar23 were used to predict the functional impact of protein-coding mutations.
Copy Number Variation Analysis: Copy number calling was carried out using CNVkit (v0.7.12.dev0)24. All 59 matched normal samples were used to calculate a pooled reference baseline using default parameters. Segmented log2 ratios were used to call copy number gains and losses.
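Calling gains and losses from segmented log2 ratios can be sketched in Python. The source does not state the cutoffs that were used, so the gain and loss thresholds below are illustrative assumptions only. The copy-number conversion assumes a diploid genome and ignores tumor purity.

```python
from typing import List, Tuple

# Illustrative cutoffs only; the source does not state the thresholds used.
GAIN_CUTOFF = 0.2
LOSS_CUTOFF = -0.25


def call_segment(log2_ratio: float, gain: float = GAIN_CUTOFF,
                 loss: float = LOSS_CUTOFF) -> str:
    """Label one segment as gain, loss, or neutral from its log2 ratio."""
    if log2_ratio >= gain:
        return "gain"
    if log2_ratio <= loss:
        return "loss"
    return "neutral"


def approx_copy_number(log2_ratio: float, ploidy: int = 2) -> float:
    """Absolute copy number implied by a log2 ratio, ignoring purity."""
    return ploidy * 2 ** log2_ratio


def call_segments(segments: List[Tuple[str, float]]) -> List[Tuple[str, str]]:
    """Apply call_segment to (segment name, log2 ratio) pairs."""
    return [(name, call_segment(ratio)) for name, ratio in segments]
```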
Identification of Prognostic Mutations: Multivariate Cox proportional hazards models were fit for each mutated gene, coded as a binary factor, across the 59 liver metastases with matched normal samples, using the survival v2.40-1 R package. SNF subtype and Clinical Risk Score (CRS) were included as covariates in the multivariate analyses. Ten-year overall survival was chosen as the primary endpoint of the analysis.
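Preparing the inputs for a ten-year endpoint can be sketched in Python. This assumes follow-up time is recorded in months and that events after ten years are administratively censored at the horizon; neither detail is stated in the source. The Cox model itself (fit with the R survival package, with SNF subtype and CRS as covariates) is not reproduced, and the function names are hypothetical.

```python
from typing import List, Set, Tuple

HORIZON_MONTHS = 120.0  # ten-year overall survival endpoint


def censor_at_horizon(time_months: float, died: bool,
                      horizon: float = HORIZON_MONTHS) -> Tuple[float, int]:
    """Administratively censor follow-up at the horizon: deaths after the
    horizon are treated as censored at the horizon."""
    if time_months > horizon:
        return horizon, 0
    return time_months, int(died)


def mutation_indicator(patients: List[str], mutated: Set[str]) -> List[int]:
    """Binary covariate: 1 if the patient's metastasis carries a mutation."""
    return [1 if p in mutated else 0 for p in patients]
```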
Microsatellite Instability (MSI) Analysis
H&E slides of normal and tumor specimens were reviewed by a molecular pathologist (Dr. Nora Joseph). Tumor sections with a tumor content greater than 30% were used for DNA extraction with the Pinpoint Slide DNA Isolation System (Zymo Research). DNA was subsequently purified using the Zymo-Spin I Column protocol. All samples were run on the Promega MSI 1.2 assay according to the FDA-approved protocol and result interpretation. MSI testing was performed on the 93 metastases with corresponding SNF subtypes, of which 89 samples were successfully assayed; four samples failed repeated testing.
Immunohistochemical Analysis
CRC liver metastases were fixed in formalin and embedded in paraffin. Tissue sections (5 μm) were cut from the paraffin blocks and mounted on glass slides. The slides were stained on a Leica Bond RX automatic stainer using the HTRC Bond Refine DAB protocol. After antigen retrieval treatment (epitope retrieval solution II, AR9640, Leica Biosystems) for 20 minutes, anti-human CD3 antibody (DAKO, Cat#M7254, Clone: F7.2.38, mouse IgG; 1:600) was applied to the tissue sections for a 25-minute incubation. For CD8 staining, anti-human CD8 antibody (DAKO, Cat#M7103, Clone: C8/144B, mouse IgG; 1:400) was applied. Antigen-antibody binding was detected with Bond Polymer Refine Detection (Leica Biosystems, DS9800). A coverslip was then applied to the tissue sections. For Masson's trichrome staining, tissue sections were deparaffinized using heated Bouin's solution and then stained with Weigert's iron hematoxylin and Biebrich scarlet solutions. The tissue sections were then treated with phosphotungstic-phosphomolybdic acid and immediately stained with aniline blue solution. Finally, the tissue sections were rinsed and a coverslip was applied.
Tables 3A-C: Differentially expressed genes across SNF clusters in 93 metastatic RNA sequencing samples identified by the limma-voom method. (A) Differentially expressed genes (DEGs) between SNF1 versus SNF2 and 3. (B) DEGs between SNF2 versus SNF1 and 3. (C) DEGs between SNF3 versus SNF1 and 2. Log2FC: estimate of the log2 fold change corresponding to the contrast. Adj.P.Val: Benjamini-Hochberg corrected P-value. Cutoff values for DEGs are |log2FC| ≥ 1 and adj.P.Val ≤ 0.05.
Tables 4A-C: Differentially expressed miRNAs across SNF clusters in 93 metastatic miRNA samples identified by the limma method. (A) Differentially expressed miRNAs (DEMs) between SNF1 versus SNF2 and 3. (B) DEMs between SNF2 versus SNF1 and 3. (C) DEMs between SNF3 versus SNF1 and 2. Log2FC: estimate of the log2 fold change corresponding to the contrast. Adj.P.Val: Benjamini-Hochberg corrected P-value. Cutoff values for DEMs are |log2FC| ≥ 1 and adj.P.Val ≤ 0.05.
Tables 5A-C: Ensemble of gene set enrichment analyses for Hallmark MSigDB pathway signatures. Pathway enrichment or depletion (i.e., direction) was determined for each SNF cluster against the others (e.g., SNF1 − (SNF2 + SNF3)/2). The Hallmark signature gene lists were retrieved from the Broad Institute's MSigDB. Twelve gene set enrichment algorithms (including GSVA, GAGE, and PADOG) were used for the analyses and were run independently for each set of gene lists. Results for SNF1 are set forth in Table 5A; results for SNF2 are set forth in Table 5B; and results for SNF3 are set forth in Table 5C. Raw P-values for a given pathway were combined across algorithms using Fisher's method and adjusted for multiple testing by Bonferroni's method. Log2-transformed fold changes (Log2FC) were averaged across algorithms in a similar fashion. A collective significance score proportional to the combined P-values and average Log2FC was generated and scaled from 0 to 100 to assess the degree of pathway enrichment or depletion relative to the inclusive set.
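The P-value combination described in this caption can be illustrated with a short Python sketch of Fisher's method followed by Bonferroni adjustment. It uses the closed-form upper tail of a chi-squared distribution with an even number of degrees of freedom, so no statistics library is needed. It is not the EGSEA implementation, assumes all P-values are strictly positive, and the function names are hypothetical.

```python
import math
from typing import List


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper tail of a chi-squared distribution with even df = 2k:
    exp(-x/2) * sum_{i<k} (x/2)^i / i!."""
    k = df // 2
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total


def fisher_combine(pvalues: List[float]) -> float:
    """Fisher's method: X = -2 * sum(ln p_i) follows chi-squared with 2k df."""
    statistic = -2.0 * sum(math.log(p) for p in pvalues)
    return chi2_sf_even_df(statistic, 2 * len(pvalues))


def bonferroni(pvalues: List[float]) -> List[float]:
    """Multiply by the number of tests (here, pathways) and cap at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]
```

In this setting each pathway would get one combined P-value from its twelve algorithm-level P-values, and the Bonferroni factor would be the number of pathways tested.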
Table 6: Ensemble of gene set enrichment analyses for custom colorectal cancer pathways. Pathway enrichment or depletion (i.e., direction) was determined for each SNF cluster against the others (e.g., SNF1 − (SNF2 + SNF3)/2). A compilation of pathways associated with immunology, metabolism, canonical pathways, cancer signatures, and stromal infiltration estimates was retrieved from reference 14. Twelve gene set enrichment algorithms (including GSVA, GAGE, and PADOG) were used for the analyses and were run independently for each set of gene lists. Raw P-values for a given pathway were combined across algorithms using Fisher's method and adjusted for multiple testing by Bonferroni's method. Log2-transformed fold changes (Log2FC) were averaged across algorithms in a similar fashion. A collective significance score proportional to the combined P-values and average Log2FC was generated and scaled from 0 to 100 to assess the degree of pathway enrichment or depletion relative to the inclusive set.
Table 7: Immune genes over-expressed in SNF2 metastases. Immune genes were extracted from the Hallmark signatures ‘inflammatory response’, ‘interferon alpha response’, and ‘interferon gamma response’, in addition to the custom gene sets ‘immune estimate’, ‘immune msc’, ‘immune response’, and ‘immune Th1’. Shown are differentially expressed genes in the comparison of SNF2 metastases to SNF1 and 3 metastases. Fold-change denotes ratio of SNF2 vs. SNF1+SNF3. P-value corrected for multiple comparisons using the Benjamini-Hochberg method.
Table 8: Significantly mutated genes determined by MutSigCV. All variants in coding regions that passed validation criteria were categorized and tabulated to create an overall mutation type summary for each gene. n_syn=number of synonymous mutations; n_mis=number of missense mutations; n_lof=number of loss-of-function mutations; n_splice=number of splice junction mutations; n_indels_mis=number of insertions/deletions causing missense mutations; n_indels_lof=number of insertions/deletions causing loss-of-function mutations; num_unique=number of unique instances of a point mutation seen. MutSigCV v1.2 determined the probability of base-level mutations within specific gene-level contexts given the overall mutation rate, the ratio of synonymous to non-synonymous mutation types, and other gene-level factors, including estimates of expression, replication rate, and chromatin state21. Raw P-values indicate the probability that the number of somatic mutations found in each gene is observed by chance; multiple testing was controlled by the false discovery rate (FDR, q-value).
Table 9: Genomic alterations unique to each SNF subtype. Differentially enriched mutations and gene-level copy number variations are presented. Analysis of gene-level copy number variations was performed for those genes identified by TCGA in primary colorectal cancers25. Overall, analyses were performed for genomic aberrations with at least 20% frequency in at least one SNF subtype. Statistical significance was determined using Fisher's exact tests between each SNF group versus the remaining two SNF groups.
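The per-group comparison described in this caption can be illustrated with a Python sketch of a two-sided Fisher's exact test on a 2x2 table, assuming the rows are "this SNF group" versus "the remaining two groups" and the columns are "altered" versus "not altered". It sums the hypergeometric probabilities of all tables with the same margins that are no more likely than the observed one, which is the standard two-sided definition; the function name is hypothetical.

```python
from math import comb
from typing import List


def fisher_exact_two_sided(table: List[List[int]]) -> float:
    """Two-sided Fisher's exact test for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    observed = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # Sum all tables no more likely than the observed one (small tolerance).
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= observed * (1 + 1e-7))
```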
Tables 10A-C: Table 10A lists genes whose expression is analyzed in a classification model for identifying SNF2 metastases. The difference in gene expression (“Log2FC” column) between SNF2 metastases and SNF1 and SNF3 metastases is shown, along with the function or pathway associated with each gene. Table 10B lists the genes in Table 10A that are expressed at a significantly higher level in SNF2 metastases than in SNF1 and SNF3 metastases. Table 10C lists the genes in Table 10A that are expressed at a significantly lower level in SNF2 metastases than in SNF1 and SNF3 metastases.
Tables 11A-C: Table 11A lists miRNAs whose expression is analyzed in a classification model for identifying SNF2 metastases. The difference in miRNA expression (“Log2FC” column) between SNF2 metastases and SNF1 and SNF3 metastases is shown. Table 11B lists the miRNAs in Table 11A that are expressed at a significantly higher level in SNF2 metastases than in SNF1 and SNF3 metastases. Table 11C lists the miRNAs in Table 11A that are expressed at a significantly lower level in SNF2 metastases than in SNF1 and SNF3 metastases.
This application claims the benefit of priority of U.S. Provisional Patent Application No. 62/659,936 filed Apr. 19, 2018, which is hereby incorporated by reference in its entirety.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US19/28071 | 4/18/2019 | WO | 00 |
Number | Date | Country | |
---|---|---|---|
62659936 | Apr 2018 | US |