The present invention is in the field of low grade glioma (LGG).
Gliomas are the most common primary central nervous system (CNS) malignant tumors, accounting for ~80% of all CNS malignancies.1 According to the 2007 WHO classification, gliomas were categorized into grades 1-4.2 The 2021 WHO classification3 introduced a paradigm shift in the classification of CNS tumors, combining histopathologic and genotypic features4 to reveal an “integrated” diagnosis. Factors affecting overall survival (OS) include age >40 years, astrocytic subtype, tumor maximum diameter >6 cm, tumors crossing the midline, the patient's degree of neurological impairment, Karnofsky performance score, multiple lesions, IDH mutation status, 1p/19q status, TERT mutation status, and ATRX mutation status.5-8 Moreover, lower-grade gliomas (LGGs) are highly heterogeneous at both the histopathological and molecular levels,4,9 resulting in significant variability in clinical outcomes.9,10 Therefore, to personalize care and treatment of LGG patients, accurate and robust patient stratification that is significantly associated with clinical outcomes is mandatory.
Cellular morphometric properties play key roles in cancer diagnosis and prognosis together with important molecular factors. Recently, deep neural networks (e.g., convolutional neural network [CNN]) have been successfully applied in several glioma-related studies.11-13 However, the quantitative profiling and molecular association of the cellular morphometric landscape from whole-slide images (WSIs) remain inadequately investigated due to both technical and conceptual limitations.
The present invention provides for a method for determining a Lower Grade Glioma (LGG) subtype for a subject. The present invention also provides for a device for determining an LGG subtype in a subject. The present invention also provides for a system using machine learning for determining a Lower Grade Glioma (LGG) subtype in a subject.
The present invention provides for a method for determining and treating a Lower Grade Glioma (LGG) subtype for a subject, the method comprising: (a) obtaining a tissue sample from a subject suffering from LGG, (b) determining a cellular morphometric subtype (CMS) and/or cellular morphometric biomarkers (CMBs) of the tissue sample, (c) identifying an LGG subtype of the subject as LGG subtype 1 or LGG subtype 2, and (d) treating the subject wherein (i) when the subject is LGG subtype 1, the subject is not treated with immunotherapy, and (ii) when the subject is LGG subtype 2, the subject is treated with immunotherapy.
In some embodiments, the sample comprises tumor cells, T cells, B cells, macrophages, and the like.
Cellular morphometric biomarkers (CMBs) are identified with an artificial intelligence technique. Consensus clustering is used to define CMS. Survival analysis is performed to assess the clinical impact of CMBs and CMS. A nomogram is constructed to predict 3- and 5-year overall survival (OS) of LGG patients. Tumor mutational burden (TMB) and immune cell infiltration between subtypes are analyzed using the Mann-Whitney U test. In some embodiments, the CMBs are extracted by a machine learning (ML) pipeline from whole-slide images of tissue histology, wherein the ML pipeline identifies and externally validates robust CMS of LGGs. In some embodiments, the method uses a framework (CMS-ML) for CMS discovery in LGG associated with specific molecular alterations, immune microenvironment, prognosis, and treatment response. In some embodiments, the cellular morphometric descriptors used are described in Table 1. The differentially expressed genes between Subtype 2 and Subtype 1 patients are described in Table 2.
Further detail is described in Liu et al. “Clinical significance and molecular annotation of cellular morphometric subtypes in lower-grade gliomas discovered by machine learning,” Neuro. Oncol. 25(5):68-81, 2023, including the corresponding Supplementary Material, all of which is hereby incorporated by reference.
In some embodiments, the method for determining a Lower Grade Glioma (LGG) subtype for a subject comprises: (a) segmenting nuclear regions of a sample obtained from a subject; (b) measuring the corresponding morphometric properties of the sample, such as the corresponding morphometric properties from hematoxylin and/or eosin (H&E) stained whole slide images of tissue histology of the sample; (c) applying pre-identified cellular morphometric biomarkers with a prebuilt stacked predictive sparse decomposition (SPSD) model to construct cellular-level and patient-level morphometric representations; and (d) applying a pre-built subtype model to stratify LGG subjects into subclasses.
In some embodiments, an LGG subject identified as LGG subtype 1 is treated with radiation therapy and/or chemotherapy. In some embodiments, an LGG subject identified as LGG subtype 2 is treated with radiation therapy and/or chemotherapy, and immunotherapy (anti-PD-1, anti-PD-L1, and/or anti-CTLA-4 immunotherapy).
Methods or means of administering anti-PD-1, anti-PD-L1, and/or anti-CTLA-4 immunotherapy are described by Wojtukiewicz et al., “Inhibitors of immune checkpoints—PD-1, PD-L1, CTLA-4—new opportunities for cancer patients and a new challenge for internists and general practitioners,” Cancer Metastasis Rev. 40(3):949-982 (2021); Ghouzlani et al., “Immune Checkpoint Inhibitors in Human Glioma Microenvironment,” Front. Immunol. 12: Article 679425, 2021; and Deshmukh, “CTLA-4 and PD-L1 or PD-1 Pathways: Immune Checkpoint Inhibitors and Cancer Immunotherapy,” J. Cancer Immunol. 2(1):10-12, 2020; which are all herein incorporated by reference.
In some embodiments, the device for determining a Lower Grade Glioma (LGG) subtype in a subject comprises: (a) a means to obtain nuclear segmentation from whole slide image of tissue histology; (b) representation learning from cellular morphometric properties derived from segmented nuclei; and (c) subtype identification based on patient-level morphometric context representation.
Other objects, features, and advantages of the present invention will be apparent to one of skill in the art from the following detailed description and figures.
The foregoing aspects and others will be readily appreciated by the skilled artisan from the following description of illustrative embodiments when read in conjunction with the accompanying drawings.
Before the invention is described in detail, it is to be understood that, unless otherwise indicated, this invention is not limited to particular sequences, expression vectors, enzymes, host microorganisms, or processes, as such may vary. It is also to be understood that the terminology used herein is for purposes of describing particular embodiments only, and is not intended to be limiting.
As used in the specification and the appended claims, the singular forms “a,” “an,” and “the” include plural referents unless the context clearly dictates otherwise. Thus, for example, reference to an “expression vector” includes a single expression vector as well as a plurality of expression vectors, either the same (e.g., the same operon) or different; reference to “cell” includes a single cell as well as a plurality of cells; and the like.
In this specification and in the claims that follow, reference will be made to a number of terms that shall be defined to have the following meanings:
The terms “optional” or “optionally” as used herein mean that the subsequently described feature or structure may or may not be present, or that the subsequently described event or circumstance may or may not occur, and that the description includes instances where a particular feature or structure is present and instances where the feature or structure is absent, or instances where the event or circumstance occurs and instances where it does not.
The term “about” as used herein means a value that includes 10% less and 10% more than the value referred to.
It is to be understood that, while the invention has been described in conjunction with the preferred specific embodiments thereof, the foregoing description is intended to illustrate and not limit the scope of the invention. Other aspects, advantages, and modifications within the scope of the invention will be apparent to those skilled in the art to which the invention pertains.
All patents, patent applications, and publications mentioned herein are hereby incorporated by reference in their entireties.
The invention having been described, the following examples are offered to illustrate the subject invention by way of illustration, not by way of limitation.
Lower-grade gliomas (LGG) are heterogeneous diseases by clinical, histological, and molecular criteria. We aimed to personalize the diagnosis and therapy of LGG patients by developing and validating robust cellular morphometric subtypes (CMS) and to uncover the molecular signatures underlying these subtypes.
Cellular morphometric biomarkers (CMBs) were identified with an artificial intelligence technique from the TCGA-LGG cohort. Consensus clustering was used to define CMS. Survival analysis was performed to assess the clinical impact of CMBs and CMS. A nomogram was constructed to predict 3- and 5-year overall survival (OS) of LGG patients. Tumor mutational burden (TMB) and immune cell infiltration between subtypes were analyzed using the Mann-Whitney U test. The double-blinded validation for important immunotherapy-related biomarkers was executed using immunohistochemistry (IHC).
We developed a machine learning (ML) pipeline to extract CMBs from whole-slide images of tissue histology, identifying and externally validating robust CMS of LGGs in multicenter cohorts. The subtypes independently predicted OS across all three independent cohorts. In the TCGA-LGG cohort, patients within the poor-prognosis subtype responded poorly to primary and follow-up therapies. LGGs within the poor-prognosis subtype were characterized by high mutational burden, high frequencies of copy number alterations, and high levels of tumor-infiltrating lymphocytes and immune checkpoint genes. Higher levels of PD-1/PD-L1/CTLA-4 were confirmed by IHC staining. In addition, the subtypes learned from LGG demonstrate translational impact on glioblastoma (GBM).
We developed and validated a framework (CMS-ML) for CMS discovery in LGG associated with specific molecular alterations, immune microenvironment, prognosis, and treatment response.
LGGs are highly heterogeneous at both the histopathological and molecular levels, which is reflected in significant variability in clinical outcomes. Therefore, to personalize care and treatment of LGG patients, accurate and robust patient stratification that is significantly associated with clinical outcomes is mandatory. In this study, we developed and multicentrically validated a framework (CMS-ML) for CMS discovery in LGG associated with specific molecular alterations, immune microenvironment, prognosis, and treatment response. Moreover, the subtypes learned from LGG demonstrate translational impact on glioblastoma. Our findings have potential clinical implications to facilitate precision diagnosis and personalized treatment of LGG patients. In addition, CMS-ML may provide potential clinical value across tumor types.
To capture the heterogeneous cytoarchitecture of gliomas, we developed a high-throughput and robust computational pipeline that quantifies tissue histology at the cellular level14 with applications to tumor classification15 and molecular association.16 In addition, we introduced stacked predictive sparse decomposition (SPSD)17 for mining underlying cellular morphometric properties within WSIs. Here, we applied SPSD to LGG cohorts to discover clinically relevant cellular morphometric subtypes (CMSs) and to evaluate the clinical impacts and molecular correlations of CMSs.
Data Collection
The patient data in this retrospective study, including tissue histology diagnostic slides and the clinical information, were collected from TCGA-LGG cohort (Supplementary Table 1; Supplementary Tables and Figures can be found in U.S. Provisional Patent Application Ser. No. 63/369,982, filed Aug. 1, 2022, and Liu et al. “Clinical significance and molecular annotation of cellular morphometric subtypes in lower-grade gliomas discovered by machine learning,” Neuro. Oncol. 25(5):68-81, 2023; both of which are hereby incorporated by reference), Zhongnan Hospital of Wuhan University (ZN-LGG cohort, between January 2016 and May 2019, Supplementary Table 2), the Medical Center of Stanford University (SU-LGG cohort, between January 2013 and December 2014, Supplementary Table 3), TCGA-GBM cohort (Supplementary Table 4), and Zhongnan Hospital of Wuhan University (ZN-GBM cohort, between January 2016 and May 2019, Supplementary Table 5) to form the discovery cohort and multicenter validation cohorts. The inclusion criteria were primary LGG and GBM with diagnostic slides and OS information available. This study was approved by the institutional review board (IRB) of Zhongnan Hospital of Wuhan University, Stanford University, and Lawrence Berkeley National Laboratory, with a waiver of informed consent.
Treatment Response in TCGA-LGG Cohort
The treatment response in TCGA-LGG cohort was assessed using Response Evaluation Criteria in Solid Tumors (RECIST)18 as complete remission, partial remission, progressive disease, and stable disease. Here, we categorized patient response into Response (including complete/partial remission), and non-Response (including progressive/stable disease).
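The two-level grouping described above can be sketched as a simple lookup. This is a minimal illustration only: the label strings and the function name `categorize_response` are assumptions introduced here, and the actual field values in the TCGA-LGG clinical data may be spelled differently.

```python
# Hypothetical label strings; the text names the four RECIST categories,
# but the exact spelling used in the clinical data is an assumption.
RESPONSE_GROUPS = {
    "complete remission": "Response",
    "partial remission": "Response",
    "progressive disease": "non-Response",
    "stable disease": "non-Response",
}


def categorize_response(recist_label: str) -> str:
    """Collapse a RECIST category into Response / non-Response."""
    key = recist_label.strip().lower()
    if key not in RESPONSE_GROUPS:
        raise ValueError(f"unrecognized RECIST category: {recist_label!r}")
    return RESPONSE_GROUPS[key]
```

Raising an error on an unrecognized label, rather than assigning a default group, keeps missing or misspelled entries from being silently counted as non-responders.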
Identification of Cellular Morphometric Biomarkers
We developed an unsupervised machine learning pipeline based on SPSD17 for the discovery of underlying cellular morphometric characteristics from the 15 cellular morphometric features extracted from the WSIs of TCGA-LGG cohort (Supplementary Method 1). We then identified 256 cellular morphometric biomarkers (CMBs) for cellular object representation. Specifically, we used a single network layer with 256 dictionary elements (i.e., CMBs) and sparsity constraint 30 at a fixed random sampling rate of 1000 cellular objects per WSI from TCGA-LGG cohort (Supplementary
Clinical and Biological Evaluation of CMBs
We evaluated the prognostic impact of the top 30 CMBs with the largest variations mined from the TCGA-LGG cohort with a Cox proportional hazards regression (CoxPH) model (survival package in R, Version 3.2-3), and examined the effects of high or low levels of each prognostically significant CMB on OS using Kaplan-Meier analysis (survminer package in R, Version 0.4.8) and the log-rank test (survival package in R, Version 3.2-3), where the TCGA-LGG cohort was divided into CMB-high and CMB-low groups per CMB (survminer package in R, Version 0.4.8). Meanwhile, we evaluated biological significance between these groups by assessing their relationship with factors available in the TCGA-LGG cohort using the Mann-Whitney U test.
Construction of Patient-Level Cellular Morphometric Context Representation
The patient-level representation was constructed based on pre-identified 256 CMBs as an aggregation (i.e., max-pooling) of all the cellular sparse codes extracted via pre-built SPSD model from the cellular objects belonging to the same patients following these steps consecutively: (1) delineation of cellular architecture and extraction of cellular morphometric properties from WSIs of each patient; (2) construction of cellular sparse codes for the cellular objects belonging to each patient based on pre-identified 256 CMBs and pre-built SPSD model; (3) aggregation (i.e., max-pooling) of all cellular sparse codes belonging to the same patient to form the patient-level cellular morphometric representation; and (4) selection of the top 30 CMBs with the largest variations identified in TCGA-LGG cohort as the final patient-level cellular morphometric representation.
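Steps (3) and (4) above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the sparse codes are taken as already computed by the pre-built SPSD model, "largest variations" is interpreted here as largest variance across patients, and the function names are hypothetical.

```python
from statistics import pvariance


def max_pool(cell_codes):
    """Aggregate per-cell sparse codes (one list per cell) into a single
    patient-level vector by taking the column-wise maximum."""
    if not cell_codes:
        raise ValueError("a patient needs at least one cellular object")
    return [max(column) for column in zip(*cell_codes)]


def top_variance_indices(patient_vectors, k):
    """Return the indices of the k CMBs whose pooled values vary most
    across patients (assumption: 'variation' means variance)."""
    variances = [pvariance(column) for column in zip(*patient_vectors)]
    ranked = sorted(range(len(variances)), key=lambda i: variances[i], reverse=True)
    return sorted(ranked[:k])


def select_cmbs(patient_vector, indices):
    """Keep only the selected CMB dimensions of a patient-level vector."""
    return [patient_vector[i] for i in indices]
```

In the setting described above, each per-cell code would have 256 entries and `k` would be 30; the indices would be computed once on the discovery cohort and then reused unchanged for every new patient.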
Identification and Application of CMS
The CMS was identified based on patient-level cellular morphometric context representation through consensus clustering19 (ConsensusClusterPlus R package, Version 1.50.0) with hierarchical clustering, Pearson's correlation, and 500 bootstrapping iterations; and the optimal number of subtypes was determined by the consistency of cluster assignment (consensus matrix) and the prognostic impact of subtypes. For a new patient, the subtype was assigned as follows: (1) construct the patient-level cellular morphometric context representation with the pre-built CMBs and SPSD model; (2) calculate the Pearson's distances between the new patient's representation and the mean representation of each pre-identified patient subtype; and (3) assign the new patient to its closest subtype, i.e., the one yielding the smallest Pearson's distance.
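The assignment rule for a new patient can be sketched as a nearest-centroid lookup. This sketch assumes that Pearson's distance means one minus the Pearson correlation coefficient, which the text does not state explicitly; the function names and the dictionary layout of the subtype means are hypothetical.

```python
import math


def pearson_distance(x, y):
    """Assumed definition: 1 - Pearson correlation between two vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("vectors must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return 1.0 - cov / (norm_x * norm_y)


def assign_subtype(representation, subtype_means):
    """Return the name of the subtype whose mean representation has the
    smallest Pearson distance to the new patient's representation."""
    return min(
        subtype_means,
        key=lambda name: pearson_distance(representation, subtype_means[name]),
    )
```

For example, `assign_subtype(rep, {"subtype 1": mean_1, "subtype 2": mean_2})` would return one of the two keys, where `mean_1` and `mean_2` stand for the mean representations computed on the discovery cohort.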
Clinical Evaluation and Validation of CMS
We evaluated and independently validated the clinical impact of pre-identified CMSs from TCGA-LGG cohort, ZN-LGG cohort, SU-LGG cohort, TCGA-GBM cohort, and ZN-GBM cohort, respectively. Refer to Supplementary Method 2 for details.
Differences in Gene Expression, Mutation Load, and Immune Microenvironment Between CMSs
We evaluated the differences in gene expression, mutation load, and immune microenvironment between CMSs. Refer to Supplementary Method 3 for details.
Immunohistochemistry Staining
Immunohistochemistry (IHC) staining was carried out on 4-μm sections of formalin-fixed and paraffin-embedded tissues according to standard protocols (see Supplementary Method 4 for details).
Statistical Analysis
Refer to Supplementary Method 5 for details.
Study Design and Characteristics of Patient Cohorts
We used three retrospective LGG cohorts to evaluate and independently validate the prognostic impact of CMSs; and used two retrospective GBM cohorts to evaluate the generalizability and translational impact of LGG-driven CMSs in GBM (
Identification of CMBs Using Unsupervised Representation Learning
Our pipeline14 recognized and delineated over 400 million cellular objects from the TCGA-LGG cohort; over 25 million cellular objects from the ZN-LGG cohort; over 10 million cellular objects from the SU-LGG cohort; over 400 million cellular objects from the TCGA-GBM cohort; and over 25 million cellular objects from the ZN-GBM cohort, where each cellular object was represented with 15 morphometric properties (Supplementary
Next, we trained SPSD17 model based on pre-quantified cellular objects randomly selected from TCGA-LGG cohort to discover the CMBs (Supplementary
Clinical and Biological Evaluation of CMBs
We next evaluated the association of the 30 CMBs with respect to histological meanings, prognosis, and cancer biology. Our survival analysis revealed that 20 CMBs had significant prognostic impact (false discovery rate [FDR]<0.05), where 5 of them were prognostically favorable (hazard ratio [HR]<1) and 15 prognostically unfavorable (HR>1) (
Additionally, the TCGA-LGG patient cohort was divided into two groups based on each CMB. The Kaplan-Meier curves showed significant impact (P<0.01,
Identification and Validation of CMS
Consensus cluster analysis using 30 CMBs identified three CMSs from TCGA-LGG cohort with significantly differing prognosis (log-rank P<0.0001; Supplementary
Clinical Significance of CMSs
We examined the association between CMSs and clinical and tumor characteristics in TCGA-LGG cohort. Surprisingly, there was no significant association between CMSs and any clinical/molecular prognostic factors (including age, grade, histological type, IDH mutation status, 1p/19q codeletion, MGMT promoter status, TERT promoter status, and ATRX status) (Supplementary Table 1). This finding was confirmed in both validation cohorts (Supplementary Tables 2 and 3).
In the TCGA-LGG cohort where genetic alteration burden information was available, Maftool analysis showed significantly higher TMB (P=0.003) and focal SCNA score (P=0.012) in subtype 2 patients (
Kaplan-Meier analysis showed significantly shorter OS of subtype 2 than subtype 1 patients (P=0.001,
Importantly, the double-blind deployment of the pre-built CMS model on both validation cohorts with independent survival analysis confirmed the significantly worse OS of subtype 2 patients (P=0.027 in ZN-LGG, P=0.005 in SU-LGG,
Interestingly, the direct translation of the pre-built CMS model on TCGA-GBM and ZN-GBM cohorts confirmed the clinical impact of CMS learned from LGG on GBM patients (Supplementary
Lastly, we performed pooled analysis combining all LGG and GBM patients into Pooled-LGG (595 patients) and Pooled-GBM (457 patients) cohorts, respectively. The pooled analysis confirmed (1) the significantly distinct stratification of patients (Pooled-LGG: P=0.001, Supplementary
Molecular Annotation Underlying CMSs
To gain insight into molecular differences underlying CMSs, we used available transcriptome data from TCGA-LGG and identified 316 differentially expressed genes (DEGs) between CMSs (|log2FC|>1, P<0.001, Supplementary
Association of CMSs with Tumor Immune Microenvironment
Based on the molecular annotation of DEGs between CMSs, we investigated their association with the immune microenvironments. Subtype 2 (
To explore the possibility of immune escape in subtype 2 LGG patients, we examined expression levels of immune suppression molecules CTLA-4, PD-1, the ligand of PD-1 (i.e., PD-L1), HAVCR2, LGALS9, CD86, LAG3, PDCD1LG2, CD28, CD96, CD80, and IDO1. In TCGA-LGG (
In this study, we extracted CMBs from WSIs of LGG patients through an unsupervised learning strategy and subsequently defined two CMSs. Different from classical biomarkers, the CMBs act as imaging biomarkers capturing the heterogeneity in cellular properties and their microenvironments, which could be further explored as a future direction. The robustness of CMSs was demonstrated in two independent LGG cohorts. Interestingly, although only a minority of GBMs arise through progression from LGG, the CMSs learned from LGG were shown to have prognostic value in two independent GBM cohorts, possibly related to common tumor microenvironments between LGG and GBM captured in CMSs. Although the HR of CMS was not as large as the HRs of well-known prognostic factors in gliomas (e.g., grade, IDH mutation status), the importance of CMSs lies in their independent prognostic significance after adjusting for other clinical and molecular factors; their relation to immunosuppressive tumor microenvironments; their association with treatment response; and their relation to underlying molecular and phenotypic alterations.
Different from many CNN-like systems, which mainly focus on end-to-end prediction of clinical/molecular endpoints, the emphasis of our study was on novel knowledge discovery with interpretability, robustness, and independent clinical value through multicentric validation. As a further justification, we evaluated a superior CNN-like system (i.e., SCNN [survival CNN]), specifically designed and optimized for the prediction of cancer outcomes in brain tumor.22 Interestingly, the SCNN risk score did not provide independent and significant prognostic value in both TCGA-LGG (P=0.182, Supplementary
SCNA score, closely related to the occurrence and progression of many tumors (including glioma), is related to poor prognosis.23 Meanwhile, TMB levels, closely related to degree of malignancy and poor prognosis of glioma, are often used as a biomarker for predicting the efficacy of anti-PD-1 therapy.24,25 Our study confirmed significantly higher focal SCNA scores and TMB levels in subtype 2 patients, which explains the poor prognosis and provides justification for anti-PD-1 immunotherapy for subtype 2 patients.
Our KEGG analysis suggested that DEGs were significantly enriched (FDR<0.05) in neuroactive ligand-receptor interaction, cytokine-cytokine receptor interaction, IL-17 signaling pathway, complement and coagulation cascades, and S. aureus infection, which were closely associated with the diagnosis and/or prognosis of glioma.26-30 Moreover, IL-6, at the hub of the PPI network (Supplementary
The tumor immune microenvironment plays an important role in tumor progression. In glioma, NK cells, macrophages, neutrophils, CD4+ T cells, CD8+ T cells, regulatory T cells, etc. influence disease outcome.34 Molinaro et al35 evaluated immune cell fractions and epigenetic age in glioma patients and found that IDH/1p19q/TERT-WT patients had lower lymphocyte fractions (CD4+ T, CD8+ T, NK, and B cells) and higher neutrophil fractions than people without glioma, suggesting that common host immune factors among different glioma types may affect survival. Consistent with previous studies, we showed that T cells (including CD4+ T cells, CD8+ T cells, gamma delta T cells, regulatory T cells), B cells, plasma cells, macrophages, NK cells, neutrophils, mast cells, etc. were higher in subtype 2 patients, suggesting higher immune infiltration in tumors of subtype 2 patients. Moreover, we examined expression levels of the immune inhibitory receptors CTLA-4 and PD-1 and the ligand of PD-1 (i.e., PDCD1L1), HAVCR2, LGALS9, CD86, LAG3, PDCD1LG2, CD28, CD96, CD80, and IDO1. The expression levels of these immune suppression molecules (
CTLA-4 inhibits T-cell activation by inducing antigen-presenting cells to express CD80 and CD86.6.36 Regulatory T cells can inhibit T-cell function by secreting IL-10 and TGF-β.37 Studies have reported that neutrophil infiltration in tumor tissues can promote tumor progression and metastasis, and in glioma, neutrophils can promote tumor proliferation by inducing angiogenesis.38-40 NK cells are an important component of the human immune system. However, Poli et al showed that NK cells are in a state of inactivation in glioma.41 These results indicated possible mechanisms for immune escape or immune tolerance due to the influence of immunosuppressive cell (e.g., regulatory T cell) infiltration, T-cell function inactivation, and other factors in the poor-prognosis subtype tumors, which could explain the poor prognosis of subtype 2 patients in spite of more immune cells being enriched in this subtype. Given the role of these immunosuppressive molecules in cancer immunotherapy, CMS also lays the foundation for selecting patients for targeted immunotherapy.34 Surprisingly, there was no significant association between PIK3CA/PIK3R1 mutation or CDKN2A/B copy number alteration and CMBs (Supplementary
This study has some shortcomings. First, relatively few LGG patients were included in the validation cohorts, so the conclusions of this study need to be further verified in large-scale studies. Second, the prevalence of subtype 2 varied across cohorts, potentially due to differences in patient population across hospitals. Nevertheless, our findings demonstrated the robustness and significant clinical value of CMS in all five cohorts. However, further large-scale studies are still needed to evaluate the impact of population difference on CMS before its utility in clinical practice. Third, our findings raise the possibility that subtype 2 LGG patients could benefit from anti-PD-1 immunotherapy; however, since LGG patients have not been recommended for anti-PD-1 immunotherapy based on existing clinical practice, we could not find any retrospective dataset to test this and will investigate it in our future prospective study.
In conclusion, we developed a pathology image-based LGG subtyping that seems to stratify LGG patients into two groups with different OS associated with treatment responses, copy number alterations, and TMB levels and immune tolerance. It provides a cost-effective solution with potential applicability worldwide in current clinical settings (Supplementary Table 28).
Supplementary Method 1. Cellular Morphometric Feature Estimation. The nuclear size was calculated based on the segmented nuclear region; the cellular Voronoi size was calculated based on the Voronoi region, which is the set of pixels closer to a specific segmented nuclear region than to any other; the aspect ratio, major axis, minor axis, and rotation were estimated based on the ellipse fitted to the segmented nuclear contour; the curvature-related features (e.g., bending energy, STD curvature, Abs max curvature) were estimated based on the curvature values along the segmented nuclear contour1; the intensity-based features were estimated in gray scale in the segmented nuclear region and its background (i.e., the area that is outside the nuclear region and inside the corresponding Voronoi region); and the gradient-related features were estimated using the first derivative of a Gaussian.
Supplementary Method 2. Clinical Evaluation and Validation of Patient Subtype. We evaluated and independently validated the clinical impact of the pre-identified patient subtypes in the TCGA-LGG, ZN-LGG, SU-LGG, TCGA-GBM, and ZN-GBM cohorts, respectively, where the latest clinical data of the TCGA-LGG and TCGA-GBM cohorts were downloaded from Genomic Data Commons (GDC, https://portal.gdc.cancer.gov/), and the subtype assignment of each patient in the independent validation cohorts (i.e., ZN-LGG, SU-LGG, TCGA-GBM, and ZN-GBM) was achieved through the application of the pre-built TCGA-LGG patient subtype model as described previously. The evaluation and validation were threefold: (1) Prognostic impact. The prognostic impact of patient subtype on OS was evaluated on the TCGA-LGG, ZN-LGG, SU-LGG, TCGA-GBM, and ZN-GBM cohorts with univariate and stepwise multivariate Cox proportional hazards regression (CoxPH) models (survival package in R, Version 3.2-3), and the subtype-specific survival was visualized through Kaplan-Meier curves (survminer package in R, Version 0.4.8); (2) Predictive power of survival. A nomogram, based on the multivariate CoxPH model, was developed to assist the prediction of the 3-year and 5-year survival rates of LGG patients, where the multivariate CoxPH model was constructed with selected variables (i.e., clinical factors, molecular factors, and patient subtype) based on their significant and independent prognostic impact. Specifically, during nomogram construction and validation, the patients in the TCGA-LGG cohort were randomly partitioned into a training set (60% of patients) and a testing set (40% of patients) through a stratified sampling strategy. Then, a nomogram was constructed (rms package in R, Version 6.0-1) on the training set to predict the 3-year and 5-year overall patient survival. 
The performance of the nomogram was evaluated based on the concordance index (C-index) with 1000 bootstraps on the TCGA-LGG training set and test set, followed by calibration analysis to assess the agreement between predicted and observed survival; and (3) Treatment response. The treatment response was categorized as Response (including complete remission and partial remission) or Non-response (including progressive disease and stable disease). The differences in treatment response were assessed with the Chi-square test for both primary therapy and follow-up treatment.
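The concordance index used to evaluate the nomogram can be illustrated with a simplified version of Harrell's C-index. This is a sketch for intuition only and is not the rms implementation: it ignores pairs with tied survival times, counts tied risk scores as half-concordant, and omits the bootstrap step; the function name and argument layout are assumptions.

```python
def concordance_index(times, events, risks):
    """Simplified Harrell's C-index.

    times:  observed follow-up time per patient
    events: 1 if death was observed, 0 if censored
    risks:  predicted risk score (higher means shorter expected survival)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                # Patient i died before patient j was last observed.
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ordering and 1.0 to a risk score that ranks every comparable pair correctly.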
Supplementary Method 3. Differences in Gene Expression, Mutation Load, and Immune Microenvironment between Subtypes. Differentially expressed genes (DEGs) between patient subtypes were estimated (edgeR package in R, Version 3.30.3) based on the count data of the TCGA-LGG cohort, where genes with |log2FC|>1 (FC: fold change) and P<0.001 were selected and visualized via volcano plot (EnhancedVolcano package in R, Version 1.6.0). Gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were performed2 (clusterProfiler package in R, Version 3.16.1) to examine the biological functions of DEGs. Moreover, we performed protein-protein interaction (PPI) network analysis on the DEGs using the String database (https://string-db.org/), and visualized the PPI network using the R package igraph3. The total mutation number and somatic copy number alteration (SCNA) of each TCGA-LGG sample were calculated (maftools package in R, Version 2.4.05)4 on the basis of MuSe5 preprocessed mutation data. The SCNA levels of each patient in the TCGA-LGG cohort were calculated according to previous work6. The infiltration scores of 18 immune cells and the overall immune infiltration score were estimated via the R package “ConsensusTME” (version: 0.0.1.9000)7, and the total T cell infiltration score was calculated according to the method introduced by Senbabaoglu et al.8.
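The DEG selection rule stated above (|log2FC|>1 and P<0.001) amounts to a two-condition filter on the differential expression results. The sketch below assumes the results have already been computed elsewhere and are available as a mapping from gene name to a (log2FC, P value) pair; that data layout and the function name are hypothetical.

```python
def select_degs(results, lfc_cutoff=1.0, p_cutoff=0.001):
    """Return the genes passing both thresholds, in input order.

    results: mapping of gene name -> (log2 fold change, P value)
    """
    return [
        gene
        for gene, (log2fc, p_value) in results.items()
        if abs(log2fc) > lfc_cutoff and p_value < p_cutoff
    ]
```

Both comparisons are strict, matching the inequalities as written in the text, so a gene with |log2FC| exactly equal to 1 is not selected.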
Supplementary Method 4. Immunohistochemical (IHC) Staining. IHC staining was carried out on 4-μm sections of formalin-fixed and paraffin-embedded tissues according to the standard protocol on the entire ZN-LGG cohort (70 patients in total). Briefly, sections were dewaxed and rehydrated in serial alcohol washes, and then the endogenous peroxidase activities were blocked. After the nonspecific sites were saturated with 5% normal goat serum, the sections were incubated overnight at 4° C. with anti-CD3 (Ready-to-Use, mouse mAb, #F7.2.38, Leica), anti-CD20 (Ready-to-Use, mouse mAb, #L26, Leica), anti-CD80 (Ready-to-Use, mouse mAb, #MRQ-26, Leica), anti-CD163 (1:500, rabbit mAb, #EPR1157(2), Abcam), anti-PD-1 Ab (1:50, mouse mAb, #UMAB199, ZSGB-Bio), anti-PD-L1 Ab (1:100, rabbit mAb, #13684, Cell Signaling), or anti-CTLA4 Ab (1:50, mouse mAb, #UMAB249, ZSGB-Bio), and then incubated with anti-rabbit or anti-mouse Ig secondary Ab. The sections were visualized with the biotin-peroxidase complex and were counterstained with hematoxylin. For the assessment of CD3, CD20, CD80, CD163, PD-1, and CTLA4, the stained sections were screened at low-power field (×40), and 5 hot spots were selected. The number of positive cells in these areas was counted at high-power field (HPF, ×400; 0.47 mm²). The expression of PD-L1 was scored as the percentage of tumor cells expressing PD-L1 (3, ≥50%; 2, ≥5% and <50%; 1, ≥1% and <5%; and 0, <1%), where staining in areas of necrosis was not quantified. The assessment was conducted by two experienced neuropathologists blinded to clinical information.
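For illustration, the PD-L1 scoring scale described above maps a percentage of PD-L1-expressing tumor cells to a score of 0 to 3. A minimal sketch is given below. The function name `pdl1_score` is hypothetical, and the range check on the input is an added assumption rather than part of the described protocol.

```python
def pdl1_score(percent_positive: float) -> int:
    """Map the percentage of PD-L1-positive tumor cells to a 0-3 score.

    Score 3: >=50%; score 2: >=5% and <50%;
    score 1: >=1% and <5%; score 0: <1%.
    """
    # Assumption: reject values outside the valid percentage range.
    if not 0.0 <= percent_positive <= 100.0:
        raise ValueError("percentage must lie between 0 and 100")
    if percent_positive >= 50.0:
        return 3
    if percent_positive >= 5.0:
        return 2
    if percent_positive >= 1.0:
        return 1
    return 0


# Hypothetical percentages, one per scoring band.
for pct in (0.5, 1.0, 4.9, 5.0, 49.9, 50.0):
    print(pct, pdl1_score(pct))
```

The lower bound of each band is inclusive, so a value exactly at 1%, 5%, or 50% falls into the higher score.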
Supplementary Method 5. Statistical Analysis. Survival differences between subtypes or groups were examined using the log-rank test. Differences in the treatment response of primary therapy and follow-up treatment between subtypes were examined using the Chi-square test. Differences between subtypes in the expression of the negative immune regulators CTLA4, PD-1, and PD-L1, in immune cell infiltration, and in genomic heterogeneity (tumor mutation burden, somatic copy number alteration) were analyzed with the Mann-Whitney non-parametric test. A P value (FDR corrected if applicable) less than 0.05 was considered statistically significant. All analyses were performed with R (Version 4.0.2).
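The FDR correction procedure is not named above. As one common possibility, and purely as an assumption, the Benjamini-Hochberg procedure can be sketched as follows. The function name `benjamini_hochberg` and the example P values are hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted P values in the input order.

    Assumption: this specific FDR procedure is illustrative only;
    the source states "FDR corrected" without naming a method.
    """
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw P values from four comparisons.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
significant = [p < 0.05 for p in adj]
print(adj, significant)
```

Each adjusted value is then compared against the 0.05 threshold in the same way as an uncorrected P value.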
Supplementary Discussion 1. Extended discussion on gene function, pathway classifications and clinical relevance of DEGs. Our KEGG analysis suggested that DEGs were significantly enriched (FDR<0.05) in neuroactive ligand-receptor interaction, cytokine-cytokine receptor interaction, the IL-17 signaling pathway, complement and coagulation cascades, and Staphylococcus aureus infection. The neuroactive ligand-receptor interaction pathway comprises G-protein coupled receptors, ion channels, and ligands, which function in the modulation of neural plasticity, memory processes, behavior, etc. Jagriti Pal et al. reported that a defective neuroactive ligand-receptor interaction pathway was a poor prognosticator in glioma patients9. Moreover, Xuemei Ji et al., using eQTL analysis, suggested that the neuroactive ligand-receptor interaction pathway was involved in lung cancer risk10. Cytokines are reported to be associated with host innate and adaptive inflammatory defenses, cell growth, differentiation, cell death, angiogenesis, and development and repair processes aimed at the restoration of homeostasis. Nijaguna et al. introduced an 18-cytokine signature that could be used for the diagnosis and prognosis of patients with glioma11. The IL-17 signaling pathway mainly includes six members, IL-17A, IL-17B, IL-17C, IL-17D, IL-17E, and IL-17F, which are produced by multiple cell types and are involved in pro-inflammatory immune responses12, and it was also reported to participate in the growth, progression, and prognosis of glioma13,14. Complement is an integral part of the immune system and mediates immune and inflammatory responses; the classical pathway, lectin pathway, and alternative pathway are reported to be involved in glioma15. Moreover, the PPI network indicated that a total of 72 genes with a degree no less than 5 were at the hub of the network. As shown in Supplementary
While the present invention has been described with reference to the specific embodiments thereof, it should be understood by those skilled in the art that various changes may be made and equivalents may be substituted without departing from the true spirit and scope of the invention. In addition, many modifications may be made to adapt a particular situation, material, composition of matter, process, process step or steps, to the objective, spirit and scope of the present invention. All such modifications are intended to be within the scope of the claims appended hereto.
This application claims priority to U.S. Provisional Patent Application Ser. No. 63/369,982, filed Aug. 1, 2022, which is hereby incorporated by reference.
The invention described and claimed herein was made utilizing funds supplied by the U.S. Department of Energy under Contract No. DE-AC02-05CH11231. The government has certain rights in this invention.
Number | Date | Country
---|---|---
63369982 | Aug 2022 | US