The invention relates to molecular classifiers and more particularly to classifiers for prostate cancer.
Although prostate cancer (CaP) is a leading cause of cancer death, the majority of biopsy-confirmed cases are sufficiently indolent to be safely monitored without definitive treatment [1, 2]. The most powerful biomarker of aggressive prostate cancer has been Gleason Grade, as determined by comprehensive pathologic examination of the surgically removed prostate. Low Gleason grade cancers, defined as Gleason grade 3+3=6 or WHO Grade Group (GG) 1 [3], exhibit negligible risk of metastasis or death [4, 5]. Higher-grade cancers (WHO GG2 to GG5) require definitive treatment. Unlike most cancer types, for which grading schemes prioritize nuclear morphology and mitotic counts, GG for prostate cancer focuses exclusively on glandular architecture. Both benign prostate glands and glands formed by GG1 prostate cancer cells feature a single layer of luminal epithelial cells surrounding a single lumen. All cancer cells occupy similar environments, directly contacting the lumen on apical aspects, with stroma at their base, and other cancer cells on the remaining four sides. This arrangement provides similar access to oxygen and nutrients from surrounding blood vessels. In contrast, higher grade cancers (GG2-GG5) form fused gland-like structures with multiple lumens, or make no lumens at all, reflecting far greater plasticity with respect to cell-cell interactions, differentiation, and metabolism.
The ability to grow in these different arrangements corresponds to the ability to grow as metastatic deposits outside the prostate. Thus, cancer metabolism, epithelial plasticity, and epithelial-stromal interactions are key themes in prostate cancer progression [6-9]. The molecular underpinnings of glandular architecture associated with GG provide direction for the development of diagnostic biomarkers for aggressive prostate cancer.
In the United States, Canada, and Europe, active surveillance (AS) represents a standard of care for GG1 cancers [10-13]. Patients are monitored with prostate-specific antigen (PSA) levels and a series of core biopsies and may receive imaging as an adjunct [10]. While GG based on prostatectomy is highly informative, current methods cannot accurately separate GG1 and GG2 based on needle biopsies, presenting a major dilemma. Due to sampling error in core biopsy and inter-observer variability, biopsy grading inaccurately reflects surgical GG in 36-67% of cases [14-17]. The consequence of these inaccuracies is that men are placed into the wrong risk category. Those who are eligible for AS may receive aggressive surgical interventions (radical prostatectomy) and suffer undue morbidity, due to uncertainty relating to their true risk of harboring aggressive high-grade cancer. Conversely, others fail to receive the treatment they require in time to prevent the spread of incurable metastatic disease.
Inaccurate reporting of GG at biopsy has motivated molecular approaches to improving risk stratification based on a core biopsy sampling of CaP [18]. However, existing molecular classifiers for biopsy GG fail to accurately distinguish between GG1 and GG2 [19, 20].
In an aspect, there is provided a method of predicting disease progression risk in a subject with prostate cancer, the method comprising: a) providing a sample containing RNA and DNA material from tumour cells; b) determining or measuring values for substantially all of 353 patient features comprising the mRNA and copy number aberration (CNA) features listed for PRONTO-e in Table 6, and some or all reference or control features set forth in Table 6; c) comparing said patient features to the reference or control features; and d) computing a prediction score using a classifier that takes said patient feature values as input, the classifier having been previously trained on samples from a population of early prostate cancer patients.
In an aspect, there is provided a method of predicting disease progression risk in a subject with prostate cancer, the method comprising: a) providing a sample containing RNA and DNA material from tumour cells; b) determining or measuring substantially all of 94 patient features comprising the mRNA, CNA, methylation and clinical features listed for PRONTO-m in Table 6, and some or all reference or control features set forth in Table 6; c) comparing said patient features to the reference or control features; and d) computing a prediction score using a classifier that takes said patient feature values as input, the classifier having been previously trained on samples from a population of early prostate cancer patients.
In an aspect, there is provided a computer-implemented method of predicting disease progression risk in a patient with prostate cancer, the method comprising: a) receiving, at at least one processor, data reflecting substantially all of the patient features defined in claim 1 or 7 corresponding to the PRONTO-e or PRONTO-m classifiers regarding a prostate cancer tumor, and some or all reference or control features set forth in Table 6; b) constructing, at at least one processor, a patient profile based on the patient features; c) comparing, at the at least one processor, said patient profile to the reference or control; d) computing, at the at least one processor, a prediction score using a classifier that takes said patient profile as input, the classifier having been previously trained on samples from a population of early prostate cancer patients.
In an aspect, there is provided a computer program product for use in conjunction with a general-purpose computer having a processor and a memory connected to the processor, the computer program product comprising a computer readable storage medium having a computer program mechanism encoded thereon, wherein the computer program mechanism may be loaded into the memory of the computer and cause the computer to carry out the method of any one of claims 13-15.
In an aspect, there is provided a computer readable medium having stored thereon a data structure for storing the computer program product according to claim 16.
In an aspect, there is provided a device for predicting disease progression risk in a patient with prostate cancer, the device comprising: at least one processor; and electronic memory in communication with the at least one processor, the electronic memory storing processor-executable code that, when executed at the at least one processor, causes the at least one processor to: a) receive data reflecting substantially all of the patient features defined in claim 1 or 7 corresponding to PRONTO-e or PRONTO-m classifiers regarding the prostate cancer tumor, and some or all reference or control features set forth in Table 6; b) compare said patient features to the reference or control features; and c) compute, at the at least one processor, a prediction score using a classifier that takes said patient profile as input, the classifier having been previously trained on samples from a population of early prostate cancer patients.
These and other features of the preferred embodiments of the invention will become more apparent in the following detailed description in which reference is made to the appended drawings wherein:
(A) Cases were split into training and validation cohorts. Both high-grade and low-grade samples were extracted from each resected tumor (i.e. for each case). (B) 431 genes/loci associated with GG were profiled. (C) A machine learning pipeline was used to develop GG classifiers. First, one or more data types were selected. Second, the relevant data were partitioned for five-fold cross-validation. Third (optional), features without significant univariate association with GG were discarded. Fourth, after selecting a machine learning algorithm, a classifier was trained on four partitions and tested on the 5th partition.
Each column represents a classifier. The top panel indicates the datasets used by the classifier, the machine learning algorithm used to train it, the sample weighting (i.e. envelope) scheme and the types of training samples used (see Methods). In the AUC panel, each box summarizes the mean AUCs from the 1000 repetitions of cross-validation. In the GG1 and GG2 panels, each box summarizes the mean fractions of correctly classified GG1 and GG2 cases, respectively. The mean statistics were computed as x_mean=(x_low+x_high)/2, where x_low and x_high are the statistics computed from only low- or high-grade samples, respectively. The classifiers are sorted by decreasing AUC. Abbreviations: AUC: area under the curve; BCR: biochemical recurrence; CAPRA: Cancer of the Prostate Risk Assessment; CN_MLPA: copy number, MLPA platform; CN_NS: copy number, NanoString platform; GG: grade group; MSP: methylation-specific PCR.
(A-C) Multimodal classifiers, i.e. classifiers that use different types of data, outperform single-mode classifiers in cross-validation. The TP rate (A), FP rate (B) and AUC (C) of each classifier were computed from cross-validation repeated 1000 times (boxes summarize the repetitions). In each repetition, each statistic was computed using only the high- or only the low-grade sample from each case. The mean of the high- and low-grade statistics is indicated in the ‘mean’ section. The type of input data used by a given classifier is indicated in the key in (C); CAPRA uses only clinical data. The multimodal classifiers are top-performing classifiers according to cross-validation. (D) Validation performance of the multimodal classifiers. For each case in the validation cohort, one sample was randomly selected and statistics were computed using the representative samples. This process was repeated 1000 times and each point indicates the median across repetitions (i.e. sampling-based AUC), and the lower and upper error bars indicate the first and third quartiles, respectively. (A-C) CNA refers to CNA data from MLPA since PRONTO-e and PRONTO-m only use CNA data from MLPA. (E) Agreement between predicted classes of low- and high-grade samples from the same validation case. (F) Of those cases with agreement, the percentage with a correct prediction. (E-F) Cases with GG1 are separated from cases with GG2. The total numbers of validation cases used to compute each percentage are shown above the bars. Note that the numbers vary for PRONTO-e and PRONTO-m since the classifiers have different data requirements for each sample.
For each significant molecular feature, the left plot shows the median difference in feature values for GG≥2 and GG1 cases. The difference is shown for each cohort, where the point indicates the median and the ends of the intersecting line indicate the first and third quartiles, across 1000 random selections of one representative sample per case. The right plot indicates the q-value (i.e. adjusted p) resulting from the combination of the training and validation cohort q-values, representing the significance of the univariate association between the feature and GG (see Methods). The mRNA feature analysis used 332 training and 200 validation cases, and the methylation feature analysis used 318 training and 202 validation cases. For the targeted genes, preferential expression in the epithelial or stromal compartments is indicated [54].
A suitably configured computer device, and associated communications networks, devices, software and firmware to provide a platform for enabling one or more embodiments as described herein.
The GG classifier would take a patient profile as input, where the profile is potentially comprised of features of different data types (including clinical features, not shown). The classifier is trained with one of several possible machine learning algorithms (see Methods) to predict whether the patient has a pathological GG2 or not. That is, the final classifier output would be yes or no.
(A) Validation ROC curves of the PRONTO-e and PRONTO-m classifiers on only the low-grade or only high-grade sample from each case. The prediction score is the numerical output of a classifier, and with an operating point of x, a score>=x predicts pathological GG>=2 whereas a score<x predicts pathological GG1. Curves show the true and false positive rates at different operating points. (B) The prediction score distributions of the PRONTO-e and PRONTO-m classifiers. Boxes indicate the score distributions from the classifiers applied to all samples in the validation cohort, separated by the GG of their source cases. As expected, the scores tend to be higher for samples from higher GG cases, for both classifiers. The red line indicates the chosen operating point of 0.5.
CNA refers to CNA data from MLPA since PRONTO-e and PRONTO-m only use CNA data from MLPA. Abbreviation: methyl.: methylation.
Hypothetical performance of the PRONTO-e classifier if applied to the diagnostic biopsy of 1000 patients recommended for active surveillance. Given 1000 active surveillance patients and the predicted performance of PRONTO-e, the illustration shows the hypothetical number of true and false positives, true and false negatives, and how these patient subsets would be impacted by their test results. A positive test result would trigger an early biopsy 3 or 6 months after diagnosis, which may result in upgrading and subsequent treatment. A negative test result would instead lead to a biopsy 12 months after diagnosis.
In the following description, numerous specific details are set forth to provide a thorough understanding of the invention. However, it is understood that the invention may be practiced without these specific details.
Cancer grade is the most powerful predictor of disease progression in early-stage prostate cancer (CaP). Intra-tumoral heterogeneity and inter-observer variability limit accuracy in diagnostic biopsies, and reduce clinical utility. Using pathologic examination of the prostatectomy as the gold standard, we developed and validated a robust objective biomarker of prostate cancer grade.
Radical prostatectomies were collected from low- and intermediate-risk CaP patients and assigned to either a training (n=333) or validation (n=202) cohort. To integrate intra-tumoral heterogeneity, each case was separately sampled at two locations. We profiled 342 mRNAs enriched for CaP metabolism, stromal signaling, and epithelial plasticity, complemented by 100 copy number aberrations (CNAs) and 14 DNA hypermethylation loci. Over 41,000 candidate classifiers of pathologic Grade Group (1 versus 2) were generated with the training data, subjecting clinical, pathologic and molecular variables to 12 different machine learning algorithms. We selected two classifiers, PRONTO-e and PRONTO-m, for validation by prioritizing classifiers with greater true positive (TP) rates and areas under the receiver operating characteristic curve (AUCs).
The PRONTO-e classifier comprises 353 mRNA and CNA features, while the PRONTO-m classifier comprises 94 mRNA, CNA, methylation and clinical features. The classifiers (PRONTO-e, PRONTO-m) independently validated, with respective true positive rates of 0.802 and 0.810, false positive rates of 0.403 and 0.398, and AUCs of 0.799 and 0.786.
Two multigene classifiers were developed and validated in separate cohorts, each achieving excellent performance by integrating different types of genomic data. Classifier adoption could improve current active surveillance approaches without increasing patient morbidity.
In an aspect, there is provided a method of predicting disease progression risk in a subject with prostate cancer, the method comprising: a) providing a sample containing RNA and DNA material from tumour cells; b) determining or measuring values for substantially all of 353 patient features comprising the mRNA and copy number aberration (CNA) features listed for PRONTO-e in Table 6, and some or all reference or control features set forth in Table 6; c) comparing said patient features to the reference or control features; and d) computing a prediction score using a classifier that takes said patient feature values as input, the classifier having been previously trained on samples from a population of early prostate cancer patients.
In some embodiments, substantially all of 353 patient features is all 353 patient features.
As used herein, the term “control” refers to a specific value or dataset, associated with an outcome class, that can be used to prognose or classify the values obtained from the test sample, e.g. patient features comprising the mRNA, copy number aberration (CNA) or clinical features. A person skilled in the art will appreciate that the comparison between the test sample and the control will depend on the control used.
The term “low risk” or “low likelihood” as used herein in respect of cancer refers to a statistically significant lower risk of cancer as compared to a general or control population. Correspondingly, “high risk” or “high likelihood” as used herein in respect of cancer refers to a statistically significant higher risk of cancer as compared to a general or control population.
The term “sample” as used herein refers to any fluid, cell or tissue sample from a subject that can be assayed for the DNA or RNA materials referenced herein.
In an aspect, there is provided a method of predicting disease progression risk in a subject with prostate cancer, the method comprising: a) providing a sample containing RNA and DNA material from tumour cells; b) determining or measuring substantially all of 94 patient features comprising the mRNA, CNA, methylation and clinical features listed for PRONTO-m in Table 6, and some or all reference or control features set forth in Table 6; c) comparing said patient features to the reference or control features; and d) computing a prediction score using a classifier that takes said patient feature values as input, the classifier having been previously trained on samples from a population of early prostate cancer patients.
In some embodiments, substantially all of 94 patient features is all 94 patient features.
In some embodiments, determining the prediction score comprises classifying the patient tumour into a pathological Gleason Grade Group (GG) class.
In some embodiments, the patient tumour is classified in the pathologic GG2 class if the score is ≥0.5 or the pathologic GG1 class if the score is <0.5.
In some embodiments, the method further comprises managing the patient with active surveillance if the patient is classified into the pathologic GG1 class. In some embodiments, the method further comprises treating the patient with surgery, endocrine therapy, chemotherapy, radiotherapy, hormone therapy, gene therapy, thermal therapy, or ultrasound therapy if the patient is classified into the pathologic GG2 class.
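By way of illustration only, the operating-point rule of the preceding embodiments can be sketched as follows. The function name and the class labels are hypothetical and are not part of the described classifiers; the only values taken from the description are the operating point of 0.5 and the convention that a score at or above the operating point predicts the higher-grade class.

```python
def classify_grade_group(score: float, operating_point: float = 0.5) -> str:
    """Map a classifier prediction score to a predicted pathologic class.

    Hypothetical helper: a score at or above the operating point predicts
    the higher-grade class, and a score below it predicts GG1.
    """
    return "GG>=2" if score >= operating_point else "GG1"


if __name__ == "__main__":
    for s in (0.12, 0.50, 0.87):
        print(s, classify_grade_group(s))
```

A different operating point can be passed explicitly, which shifts the balance between true positive and false positive rates.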
The present system and method may be practiced in various embodiments. A suitably configured computer device, and associated communications networks, devices, software and firmware may provide a platform for enabling one or more embodiments as described above. By way of example,
The present system and method may be practiced on virtually any manner of computer device including a desktop computer, laptop computer, tablet computer or wireless handheld. The present system and method may also be implemented as a computer-readable/useable medium that includes computer program code to enable one or more computer devices to implement each of the various process steps in a method in accordance with the present invention. In case of more than one computer device performing the entire operation, the computer devices are networked to distribute the various steps of the operation. It is understood that the terms computer-readable medium or computer useable medium comprise one or more of any type of physical embodiment of the program code. In particular, the computer-readable/useable medium can comprise program code embodied on one or more portable storage articles of manufacture (e.g. an optical disc, a magnetic disk, a tape, etc.), or on one or more data storage portions of a computing device, such as memory associated with a computer and/or a storage system.
In an aspect, there is provided a computer-implemented method of predicting disease progression risk in a patient with prostate cancer, the method comprising: a) receiving, at at least one processor, data reflecting substantially all of the patient features defined in claim 1 or 7 corresponding to the PRONTO-e or PRONTO-m classifiers regarding a prostate cancer tumor, and some or all reference or control features set forth in Table 6; b) constructing, at at least one processor, a patient profile based on the patient features; c) comparing, at the at least one processor, said patient profile to the reference or control; d) computing, at the at least one processor, a prediction score using a classifier that takes said patient profile as input, the classifier having been previously trained on samples from a population of early prostate cancer patients.
In an aspect, there is provided a computer program product for use in conjunction with a general-purpose computer having a processor and a memory connected to the processor, the computer program product comprising a computer readable storage medium having a computer program mechanism encoded thereon, wherein the computer program mechanism may be loaded into the memory of the computer and cause the computer to carry out the method of any one of claims 13-15.
In an aspect, there is provided a computer readable medium having stored thereon a data structure for storing the computer program product according to claim 16.
In an aspect, there is provided a device for predicting disease progression risk in a patient with prostate cancer, the device comprising: at least one processor; and electronic memory in communication with the at least one processor, the electronic memory storing processor-executable code that, when executed at the at least one processor, causes the at least one processor to: a) receive data reflecting substantially all of the patient features defined in claim 1 or 7 corresponding to PRONTO-e or PRONTO-m classifiers regarding the prostate cancer tumor, and some or all reference or control features set forth in Table 6; b) compare said patient features to the reference or control features; and c) compute, at the at least one processor, a prediction score using a classifier that takes said patient profile as input, the classifier having been previously trained on samples from a population of early prostate cancer patients.
The advantages of the present invention are further illustrated by the following examples. The examples and their particular details set forth herein are presented for illustration only and should not be construed as a limitation on the claims of the present invention.
Materials and Methods
Patient Samples:
To train and validate classifiers, radical prostatectomy samples were identified using local electronic medical records at Kingston General Hospital (diagnosis between 1999 and 2012), Montreal General Hospital at McGill University Health Centre (1994-2013) and London Health Sciences Centre (LHSC) (2004-2009). Initial inclusion criteria were (i) reviewed diagnosis of GG1 or GG2 on core biopsy, (ii) underwent radical prostatectomy, and (iii) treatment-naïve prior to surgery. Patients with clinical stage of T3 or higher were excluded. Cases were assigned to either the training cohort or the validation cohort.
For all cases, central pathology review of both diagnostic core biopsies and radical prostatectomies was performed by expert pathologists (FB, MM, DB, TJ). Where possible, DNA and RNA were extracted from punch cores obtained from two areas of the dominant tumor focus (
The clinicopathologic features of the training and validation cohorts are summarized in Table 1. We were 89% powered to validate two classifiers (α=0.01), assuming true positive (TP) rates of 0.8 and false positive (FP) rates ≤0.55 [24].
Selection of Candidate Features for Classifiers:
Multiple functional aspects reflecting the biology of GG were interrogated with molecular features at the transcriptomic (mRNA abundance), genomic (DNA copy number alteration, CNA) and epigenomic levels (DNA methylation) (
Centralized Molecular Profiling:
We employed four molecular diagnostics platforms, of which three are currently in clinical use for molecular diagnostics of cancer. The mRNA analysis was performed using the NanoString nCounter platform [32] with a specific code set developed for this study. CNA analysis was performed using both a multiplex ligation-dependent probe amplification (MLPA)-based assay developed specifically for this project and a custom NanoString copy number codeset [33, 34] (Ebrahimizadeh et al, submitted manuscript). Finally, epigenetic profiling was performed using methylation-specific polymerase chain reaction (MSP) [26]. All samples in both cohorts were profiled on as many platforms as possible given their RNA and DNA yields.
Development and Validation of Prognostic Classifiers:
Both training and validation data were preprocessed as described in Supplementary Methods. We created a supervised machine learning pipeline (
We validated classifiers by computing statistics as above, and also by randomly selecting one sample (high- or low-grade) per patient in the validation cohort for computing performance statistics, and repeated this process 1000 times. These sampling-based statistics better simulate clinical practice. All statistical analyses were performed using the R software framework (v3.4.3) [35], the machine learning package mlr (v2.15.0) [36] and the plotting package BoutrosLab.plotting.general (v5.9.8) [37].
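The sampling-based validation described above can be sketched in a few lines. This is an illustrative Python sketch, not the R/mlr code used in the study; the function names, the data layout (one label and a list of per-sample prediction scores per case) and the fixed random seed are assumptions made for the example. The AUC is computed as the probability that a randomly chosen positive case outscores a randomly chosen negative case, with ties counted as one half.

```python
import random
import statistics


def auc(scores, labels):
    """AUC via pairwise comparison of positive and negative scores."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sampling_based_auc(cases, n_repeats=1000, seed=0):
    """Repeatedly pick one sample per case and summarize the resulting AUCs.

    cases: list of (label, [score of each available sample]) tuples, where
    label is 1 for pathologic GG>=2 and 0 for pathologic GG1 (assumed coding).
    Returns (median, first quartile, third quartile) across repetitions.
    """
    rng = random.Random(seed)
    labels = [label for label, _ in cases]
    aucs = []
    for _ in range(n_repeats):
        # One randomly selected representative sample per case.
        picked = [rng.choice(sample_scores) for _, sample_scores in cases]
        aucs.append(auc(picked, labels))
    q1, _, q3 = statistics.quantiles(aucs, n=4)
    return statistics.median(aucs), q1, q3


if __name__ == "__main__":
    toy_cases = [(1, [0.81, 0.64]), (1, [0.55, 0.72]),
                 (0, [0.30, 0.58]), (0, [0.22, 0.41])]
    print(sampling_based_auc(toy_cases, n_repeats=200))
```

Selecting a single sample per case mirrors the situation in which only one biopsy-derived sample is scored per patient, which is why these statistics are described as better simulating clinical practice.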
Ethical Review
All research was performed according to the Tri-Council Policy Statement (TCPS2) and following ethical approval of the study protocol at each participating institute's research ethics board (Table 3).
Selection of Features
CNA Features: MLPA Assay
A multiplexed ligation-dependent probe amplification (MLPA) assay was developed to assess fourteen loci for copy number alterations (CNA; Table 6) previously associated with clinical outcome in prostate cancer (CaP; Ebrahimizadeh et al, submitted manuscript). The loci assayed include the MYC oncogene S[1-3], the PTEN S[4-7], TP53 S[2, 8, 9], CDKN1B S[10, 11], and RB1 S[12, 13] tumor suppressors, loci associated with metastasis such as GABARAPL2 S[13, 14] and PDPK1 S[15, 16], loci associated with maintenance of genomic stability such as RWDD3 S[17-20], GTF2H2 S[21-24] and WRN S[13, 25-27], and genes associated with CaP subtypes: CHD1 S[13, 28, 29], MAP3K7 S[13, 28, 30], NKX3-1 S[13] and PDZD2 S[31, 32].
CNA Features: CPC-GENE NanoString Assay
Using DNA CNA assays, the Canadian Prostate Cancer Genome Network (CPC-GENE) identified an association between percentage of genome alteration and reduced biochemical recurrence-free survival in low- to intermediate-risk CaP patients, and developed a classifier that uses CNA features to predict patient outcome S[33]. A NanoString CNA assay was designed to derive values for those features S[34], and here we used the assay to include 92 CNA features: 85 loci (including 151 genes) and seven additional genes associated with CaP in the literature (Table 6).
mRNA Features:
We generated the mRNA abundance gene panel (for the NanoString RNA assay) by combining gene lists from the following studies:
mRNA Features: CPC-GENE
CPC-GENE performed RNA abundance profiling of samples from intermediate-risk patients S[35] and univariate analysis of these data identified 20 genes associated with poor prognosis. These genes were supplemented with 30 genes identified with similar univariate analysis and predictive modeling of RNA data from Taylor et al S[36].
mRNA Features: Stem Cell Signature
The gene list was derived from “reprogramming” four androgen receptor (AR)+ CaP cell lines (LNCaP, LAPC4, CWR22rv1 and VCaP) to a stem-like phenotype S[37]. Agilent Gene Chip analyses of each cell line revealed transcripts with significant abundance changes between parental and reprogrammed cells. These transcripts were then compared across cell lines to derive a ranked list of 132 commonly changed genes associated with reprogramming. This signature identified propensity for recurrence, metastasis and CaP-specific death as described by S[37]. The top 50 genes on this list were included in the RNA panel.
mRNA Features: Epithelial-to-Mesenchymal Transition (EMT) Signature
Using the GEO2R program and the Benjamini-Hochberg method for multiple testing correction, gene expression data from the PC-3, PC-3M, ALVA-31 and RWPE-2-w99 cell lines undergoing invasive growth in 3-dimensional cultures (GEO #GSE19426) S[38] were compared to identify 1669 genes dysregulated in at least three of four cell lines. These genes were cross-referenced to the EMT-associated genes in the SABiosciences qRT-PCR array. The resulting 33 overlapping genes were used as the seed list for network building, using the String v9.1 and GeneMania algorithms S[39, 40]. From the resulting network, 37 key genes, including the common nodal points connecting the pathways, were included in the RNA panel.
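For readers unfamiliar with the Benjamini-Hochberg correction mentioned above, the standard step-up adjustment can be sketched as follows. This is a generic illustration of the procedure, not the GEO2R implementation; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Each p-value is scaled by m / rank (rank 1 = smallest p-value), and a
    running minimum from the largest rank downward keeps the adjusted
    values monotone and capped at 1.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical p-values for four genes.
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Genes whose adjusted value falls below a chosen false discovery rate threshold would be treated as dysregulated.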
mRNA Features: Stromal Influence on Epithelial Growth and Differentiation
A list of 318 genes identified as enriched in embryonic prostate stroma S[41-43] was filtered to enrich for genes also expressed in cancer-associated fibroblasts, and for association with clinical and pathological endpoints (recurrence, CaP death and Gleason score) in four publicly available datasets S[36, 44-46]. A list of 80 genes was created by prioritizing those associated with grade group (GG) and/or recurrence in multiple datasets.
mRNA Features: Tumor Cell Metabolism
Eighty-six candidate genes associated with CaP metabolism were identified through in silico gene network analysis linking signaling pathways of sterol regulatory element binding protein 1 (SREBP1), insulin-like growth factor (IGF), AR and suppressor of cytokine signaling 1 (SOCS1), using the String v9.1 and GeneMania algorithms S[47]. Expression analysis was performed for these genes by NanoString nCounter assay on discovery and validation cohorts, each comprised of 32 Gleason pattern 3 and 32 Gleason pattern 4 foci from individual tumors. Univariate analysis using the Mann-Whitney U test (p<0.05) identified 25 differentially expressed genes.
mRNA Features: Prostate Homeostasis
This research strand leveraged benign prostate homeostasis as a model for growth and differentiation by steroid hormones, and dysregulation of these pathways in CaP. Transcripts representing this body of work included FER, PTK2, FLT1, LYN, SRC, JAK1, JAK3, MARK3, STAT3, STAT5A, EDF1, WNT11, ITGAV, ITGA2, and ITGV5.
Methylation and mRNA Features: CpG Island Hypermethylation
Genes (n=14) with CpG island hypermethylation in CaP were identified from the literature and DNA methylation of these genes was assayed using methylation-specific PCR as described S[48] to derive values for these methylation features (Table 6). These genes (except UCHL1) were also added to the RNA panel, along with seven additional epigenetic modifying and regulatory genes: DNMT1, EZH2, HDAC1, HIC1, KCNK2, SRP14 and TERT.
In summary, collating genes from each of these strands resulted in a novel NanoString mRNA panel comprising 342 genes (see Table 6) with additional housekeeping genes (see Supplementary Methods). We used the NanoString assay to measure the abundance of mRNAs from each gene, to derive values for our mRNA features.
Clinical Features
The Cancer of the Prostate Risk Assessment (CAPRA) score is computed with five clinical features: 1) age at diagnosis, 2) PSA at diagnosis in ng/ml, 3) biopsy GG (i.e. clinical GG), 4) clinical T stage and 5) percentage of biopsy cores involved with cancer S[49]. The CAPRA score of a patient can be used in turn to assign a CAPRA risk group (low, intermediate, high), and our candidate prognostic classifiers optionally used this group feature. Alternatively, the first four clinical features can be used directly by the classifiers. If the age at diagnosis was unavailable, we used the age at radical prostatectomy (if available). If PSA at diagnosis was unavailable, we used pre-operative PSA (if available). Biopsy GG1 and GG2 were represented as 0 and 1, respectively, to the classifiers. The clinical T stage was simplified to two possible values, T1 and T2, represented as 0 and 1, respectively, to the classifiers.
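The encoding and fallback rules for the directly used clinical features can be sketched as follows. The function and field names are hypothetical, and the assumption that clinical T stage arrives as a string such as "T1c" or "T2a" is made only for this example; the 0/1 coding and the fallbacks (age at radical prostatectomy, pre-operative PSA) follow the description above.

```python
from typing import Dict, Optional


def first_available(primary: Optional[float],
                    fallback: Optional[float]) -> Optional[float]:
    """Use the primary value when present, otherwise the fallback (may be None)."""
    return primary if primary is not None else fallback


def encode_clinical_features(age_at_diagnosis: Optional[float],
                             age_at_prostatectomy: Optional[float],
                             psa_at_diagnosis: Optional[float],
                             preoperative_psa: Optional[float],
                             biopsy_gg: int,
                             clinical_t_stage: str) -> Dict[str, Optional[float]]:
    """Encode the four directly used clinical features for a classifier."""
    if biopsy_gg not in (1, 2):
        raise ValueError("biopsy GG must be 1 or 2")
    stage = clinical_t_stage.strip().upper()
    if stage.startswith("T1"):
        t_code = 0
    elif stage.startswith("T2"):
        t_code = 1
    else:
        raise ValueError("clinical T stage must be T1 or T2")
    return {
        "age": first_available(age_at_diagnosis, age_at_prostatectomy),
        "psa": first_available(psa_at_diagnosis, preoperative_psa),
        "biopsy_gg": biopsy_gg - 1,   # GG1 -> 0, GG2 -> 1
        "t_stage": t_code,            # T1 -> 0, T2 -> 1
    }


if __name__ == "__main__":
    print(encode_clinical_features(None, 63, 6.2, None, 2, "T1c"))
```

A value of None in the output marks a feature for which neither the primary nor the fallback measurement was available.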
Preprocessing Training and Validation Data
mRNA Abundance Data.
To select the normalization method to use, we tested 96 different methods supported by the NanoStringNorm R package (v1.1.22; S[50]), by trying different combinations of parameter values, i.e. Background={none, mean, mean.2sd, max}, CodeCount={none, sum, geo.mean}, SampleContent={none, housekeeping.sum, housekeeping.geo.mean, total.sum, top.mean}, OtherNorm={none, rank.normal}. Otherwise, we used round.values=FALSE, take.log=TRUE and default values for the remaining parameters. To assess each normalization method, we computed several metrics with the resulting normalized data. These metrics include:
Considering only methods that passed metric 1, had inter-cartridge concordance >0.9, and failed <10% of training samples, we ranked the methods by each of metrics 2-7 separately and then took the consensus ranking generated with the DECOR method (ConsRank package v2.0.1; S[51]). Based on this ranking, we selected the normalization method with Background=none, CodeCount=none, SampleContent=housekeeping.sum with a target value of 5000 (roughly estimated from the training data), and OtherNorm=none.
MLPA CNA Data.
One or two probes targeted each gene and each test sample was assayed in duplicate. For each replicate, the signal from each test probe was divided by the signal from each of the ten reference probes, resulting in a set of seven ratios. A probe was considered positive for a CNA when its 95% confidence interval for the replicate's ratios was outside of the probe's 95% confidence intervals for at least two of the three reference samples (fresh healthy female genome, normal FFPE kidney tissue, normal FFPE breast lymph node tissue) (Promega). The probe was considered positive for a test sample if it was positive for both of its replicates. If there was a discrepancy between the replicates, the probe was considered negative for a CNA. If either of the replicates did not pass quality control (Ebrahimizadeh, submitted manuscript), no CNA status was assigned to the given probe in the given test sample. If all probes for a gene were positive, the gene was considered positive for a CNA in the test sample; if there was a discrepancy, the gene was considered negative; otherwise, no CNA status was assigned. Only deletions were considered for the RWDD3, GTF2H2, CHD1, MAP3K7, NKX3-1, WRN, PTEN, CDKN1B, RB1, GABARAPL2 and TP53 genes, while only gains were considered for the MYC, PDPK1 and PDZD2 genes.
NanoString CNA Data
Data was preprocessed as previously described S[34].
Methylation Data
Cq values were computed as described previously S[48]. For a given test sample t and target gene g, we computed the methylation level as follows:
$$m_{t,g,i,j,k,l} = \left(Cq_{p,g,i} - Cq_{p,r,j}\right) - \left(Cq_{t,g,k} - Cq_{t,r,l}\right)$$
where
The normalized methylation level was then defined as:
$$m_{t,g} = \operatorname{median}_{i,j,k,l}\left(m_{t,g,i,j,k,l}\right)$$
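As a rough illustration, the normalized methylation level can be computed by forming the difference of differences for every combination of the four index values and taking the median. The sketch below assumes that i, j, k and l index the available Cq measurements of each term; the function and argument names are hypothetical.

```python
from itertools import product
from statistics import median
from typing import Sequence


def methylation_level(
    cq_p_g: Sequence[float],  # Cq values indexed by i (subscripts p, g)
    cq_p_r: Sequence[float],  # Cq values indexed by j (subscripts p, r)
    cq_t_g: Sequence[float],  # Cq values indexed by k (subscripts t, g)
    cq_t_r: Sequence[float],  # Cq values indexed by l (subscripts t, r)
) -> float:
    """Median over all (i, j, k, l) of (Cq_pgi - Cq_prj) - (Cq_tgk - Cq_trl)."""
    values = [
        (pg - pr) - (tg - tr)
        for pg, pr, tg, tr in product(cq_p_g, cq_p_r, cq_t_g, cq_t_r)
    ]
    return median(values)
```

With a single measurement per term the median reduces to the one difference of differences; with replicates it dampens the effect of any single outlying Cq value.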
A machine learning pipeline for the development of prognostic classifiers
We built a pipeline to exhaustively evaluate different methodologies for the development of a prognostic classifier. Specifically, the pipeline uses supervised machine learning methods to develop a classifier that takes a patient profile as input to predict good or poor prognosis (i.e. testing negative and positive, respectively). In our application, we binarized the GG in prostatectomy specimens (i.e. pathological GG) to define the true class of a patient: patients with only GG1 served as negative gold standards and patients with GG≥2 as positive gold standards (Supplementary
The pipeline comprises four main stages: 1) dataset, 2) partition, 3) feature reduction and 4) cross-validation (
The first stage focuses on preparing the training dataset. The training dataset includes: a patient-sample by feature matrix (i.e. each row represents a patient profile), and a set of true class values with one value for each sample in the matrix. The pipeline can take input data generated by different platforms. In our application, we have clinical/CAPRA, RNA abundance, MLPA/NanoString CNA and methylation data. For each platform, this stage reduces the dataset to samples that do not have any missing data. If multiple platforms are desired, the dataset is also reduced to samples that have data from each platform of interest. Finally, the invariant features, i.e. features that have the same value across all remaining samples, are removed from the dataset.
The second stage focuses on partitioning the training dataset for repeated cross-validation. The dataset is reduced to only low-grade samples, only high-grade samples, or a randomly selected sample per patient, according to the desired option. By default, this stage prepares for five-fold cross-validation repeated 1000 times, and thus the stage creates 1000 partitionings of the dataset into five equally-sized subsets. For each candidate partitioning, each sample is first randomly assigned to one of the five subsets. The partitioning is retained if it is balanced with respect to three traits: the true class, biochemical recurrence status (which can be related to the true class in our application), and the origin of the sample, since our training samples were obtained from different institutions (i.e. Kingston General Hospital, Montreal Hospital at McGill University Health Centre). Specifically, for each pair of subsets in the partitioning, a two-sided Fisher's exact test is used to test for an association with each trait. If any of the potential associations are significant (p<0.05), another candidate partitioning is generated until a balanced one is obtained.
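The accept-or-regenerate loop of this stage might look like the following sketch. It is not the pipeline's implementation: it assumes every trait is binary, obtains equally sized subsets by shuffling and dealing the samples, and uses a small standard-library Fisher's exact test; all function names are hypothetical.

```python
import random
from itertools import combinations
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = {x: comb(row1, x) * comb(row2, col1 - x) / total for x in range(lo, hi + 1)}
    p_obs = probs[a]
    # Sum the probabilities of all tables no more likely than the observed one.
    return min(1.0, sum(p for p in probs.values() if p <= p_obs * (1 + 1e-9)))


def is_balanced(folds, traits, alpha=0.05):
    """True if no pair of folds shows a significant association with any trait."""
    for f1, f2 in combinations(folds, 2):
        for trait in traits.values():
            a = sum(trait[s] for s in f1)  # fold 1 samples with the trait
            c = sum(trait[s] for s in f2)  # fold 2 samples with the trait
            if fisher_exact_two_sided(a, len(f1) - a, c, len(f2) - c) < alpha:
                return False
    return True


def balanced_partition(samples, traits, k=5, alpha=0.05, seed=None, max_tries=10000):
    """Draw random k-fold partitionings until a balanced one is found."""
    rng = random.Random(seed)
    for _ in range(max_tries):
        shuffled = list(samples)
        rng.shuffle(shuffled)
        folds = [shuffled[i::k] for i in range(k)]  # near-equal fold sizes
        if is_balanced(folds, traits, alpha):
            return folds
    raise RuntimeError("no balanced partitioning found")
```

Here `traits` maps a trait name to a dictionary from sample identifier to a boolean. Calling `balanced_partition` once per repetition with a different seed would yield the repeated partitionings.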
The third stage focuses on feature reduction. For x-fold cross-validation, each partitioning enables x training subsets. In this stage, invariant features, i.e. features that have the same value across all samples, are removed from each training subset. If desired, each remaining feature is then tested for a univariate association with the true class (e.g. with a two-sided Mann-Whitney U test). Features with a significant association (e.g. p<0.01 or p<0.05) are retained.
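A minimal sketch of this feature reduction step is shown below. The statistical test is passed in as a callable so that the example stays free of third-party dependencies; with SciPy installed, one option would be `lambda a, b: scipy.stats.mannwhitneyu(a, b, alternative="two-sided").pvalue`. The function name and data layout are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Sequence


def reduce_features(
    features: Dict[str, List[float]],
    labels: Sequence[int],
    p_value_fn: Callable[[List[float], List[float]], float],
    alpha: float = 0.05,
) -> List[str]:
    """Return the names of features kept for one training subset.

    features: feature name -> one value per sample
    labels:   true class per sample (0 = negative, 1 = positive)
    """
    kept = []
    for name, values in features.items():
        # Drop invariant features (same value across all samples).
        if len(set(values)) <= 1:
            continue
        neg = [v for v, y in zip(values, labels) if y == 0]
        pos = [v for v, y in zip(values, labels) if y == 1]
        # Keep features with a significant univariate association.
        if p_value_fn(neg, pos) < alpha:
            kept.append(name)
    return kept
```

Because the filter is applied to each training subset separately, the held-out fold never influences which features are retained.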
The fourth stage performs the repeated x-fold cross-validation with the desired machine learning algorithm using the mlr package v2.15.0 S[52] (
For algorithms that support sample weighting, this stage also cross-validates different weightings of the negative/positive gold standard classes: 30%/70%, 40%/60%, 50%/50%, 60%/40%, 70%/30%. Specifically, with a wn%/(100−wn)% weighting, each negative and positive sample is assigned a weight of wn/pn and (100−wn)/(1−pn), respectively, where pn is the proportion of samples in the negative gold standard class; thus, the total weight of all negative samples makes up wn% of the overall total and the total weight of all positive samples makes up (100−wn)% of the overall total. For all other machine learning algorithm parameters, default values are used.
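The weighting scheme can be checked with a few lines of Python. This sketch assumes labels coded as 0 (negative) and 1 (positive); the function name is hypothetical.

```python
from typing import List, Sequence


def class_weights(labels: Sequence[int], wn: float) -> List[float]:
    """Per-sample weights for a wn%/(100-wn)% negative/positive weighting."""
    n_neg = sum(1 for y in labels if y == 0)
    if n_neg == 0 or n_neg == len(labels):
        raise ValueError("both classes must be present")
    pn = n_neg / len(labels)          # proportion of negative samples
    w_neg = wn / pn                   # weight of each negative sample
    w_pos = (100 - wn) / (1 - pn)     # weight of each positive sample
    return [w_neg if y == 0 else w_pos for y in labels]
```

For three negative and seven positive samples with wn=70, each negative sample receives a weight of about 233.3 and each positive sample about 42.9, so the negative class carries 70% of the total weight regardless of the class imbalance.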
During cross-validation, a classifier is trained on (x−1) of the x folds with the given machine learning algorithm, dataset (prepared in earlier stages) and sample weighting. If this training fails after three attempts, the pipeline skips to training with the next (x−1) folds of data. If successful, the resulting classifier is tested on the remaining fold of data from two perspectives: i) only the low-grade sample from each case, and ii) only the high-grade sample from each case. For each perspective, the pipeline computes the area under the receiver operating characteristic curve (AUC) averaged across the x folds and, using an operating point of 0.5 (in our application, if a sample's score ≥0.5, the patient is predicted as GG1, otherwise, GG≥2), the true positive (TP), false positive (FP) and true negative (TN) rates with all patients in the x folds. Moreover, for each of these statistics, the pipeline reports the mean of the values from the two perspectives [e.g. AUCmean=(AUClow+AUChigh)/2]. Finally, the pipeline further summarizes by computing median statistics across the repetitions of cross-validation (e.g. across the 1000 partitionings).
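The rate calculations at the 0.5 operating point, and the averaging over the two perspectives, might be sketched as follows. The sketch follows the rule stated above (a score of at least 0.5 predicts GG1, the negative class); the function names and the 0/1 label coding are assumptions.

```python
from typing import Dict, Sequence


def rates_at_operating_point(
    scores: Sequence[float], labels: Sequence[int], threshold: float = 0.5
) -> Dict[str, float]:
    """TP, FP and TN rates; labels are 0 (GG1) or 1 (GG>=2)."""
    # A score >= threshold predicts GG1 (negative); otherwise GG>=2 (positive).
    predicted_positive = [s < threshold for s in scores]
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    tp = sum(1 for p, y in zip(predicted_positive, labels) if p and y == 1)
    fp = sum(1 for p, y in zip(predicted_positive, labels) if p and y == 0)
    return {"TP": tp / n_pos, "FP": fp / n_neg, "TN": (n_neg - fp) / n_neg}


def mean_of_perspectives(low: Dict[str, float], high: Dict[str, float]) -> Dict[str, float]:
    """Average each statistic over the low-grade and high-grade perspectives."""
    return {name: (low[name] + high[name]) / 2 for name in low}
```

The same averaging applies to the AUC: a dictionary entry such as `"AUC"` in each perspective would be combined in exactly the same way.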
Validation of Grade Group Classifiers PRONTO-e and PRONTO-m
We ran the pipeline to exhaustively test all possible methodologies that it supports, thereby enabling a more thorough search for the optimal methodology. Two main factors went into selecting the methodologies for validation. First, we wanted methodologies that resulted in greater AUC values from cross-validation as they suggest greater overall performance of the corresponding classifiers. Second, we favored greater TP rates (i.e. TP rate ≥0.8) as this prioritized correct classification of the GG≥2 cases, in accordance with our consultations with clinicians who prioritized earlier intervention for these cases at the expense of over-treating some GG1 cases (quantified by the FP rate). The 25 top-performing classifiers have AUCs ranging from 0.772 to 0.790 (
Each selected methodology was then used to train a classifier with the unpartitioned training cohort, restricted to patients with data for the required samples and features. As in cross-validation, we computed the mean AUC, TP and FP rates, where the mean is of the value for only low-grade samples and the value for only high-grade samples. Given known intra-tumoral heterogeneity S[53], it is unknown at diagnosis how well the grade of a biopsy sample represents the overall grade of the whole tumor. To better mimic this clinical scenario, for each patient in the validation cohort, one sample was randomly selected, statistics were computed using the representative samples, and this process was repeated 1000 times. We computed the median AUC, TP and FP rates across these repetitions (i.e. sampling-based statistics).
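A sampling-based AUC of this kind could be computed along the following lines. This is an illustrative sketch only: the data layout and function names are hypothetical, and the scores are assumed to be oriented so that larger values indicate the positive class (use 1 − score otherwise).

```python
import random
from statistics import median
from typing import Dict, List, Sequence, Tuple


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sampling_based_auc(
    patients: Dict[str, Tuple[int, List[float]]], n_rep: int = 1000, seed: int = 0
) -> float:
    """Median AUC over repetitions, drawing one random sample per patient.

    patients: patient id -> (true class, classifier scores of that patient's samples)
    """
    rng = random.Random(seed)
    aucs = []
    for _ in range(n_rep):
        picks = [(rng.choice(scores), label) for label, scores in patients.values()]
        aucs.append(auc([s for s, _ in picks], [y for _, y in picks]))
    return median(aucs)
```

The TP and FP rates can be summarized in the same way by replacing `auc` with the corresponding rate calculation inside the loop.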
Similarity Between Molecular Profiles
In this analysis, we computed the similarity between molecular profiles of samples from the same patient (i.e. the similarity between the low- and high-grade sample profiles); thus, patients with only a single sample were excluded. For all platforms, we only considered profiles that do not have missing values (for any features). For the CNA profiles, the profiles were first restricted to features from the MLPA platform since the validated classifiers only use CNA features from this platform. We defined the pairwise similarity between CNA profiles as the fraction of features where both samples have the same CNA status (i.e. altered or unaltered). For the RNA abundance and methylation profiles, we defined pairwise similarity as the concordance coefficient across the feature values.
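The two similarity measures can be illustrated as follows. The CNA similarity follows the definition given above directly. For the continuous profiles, the text does not specify which concordance coefficient was used; the sketch assumes Lin's concordance correlation coefficient, which is only one plausible reading. Function names are hypothetical.

```python
from statistics import fmean
from typing import Sequence


def cna_similarity(profile_a: Sequence[int], profile_b: Sequence[int]) -> float:
    """Fraction of features with the same CNA status in both samples."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same features")
    same = sum(1 for a, b in zip(profile_a, profile_b) if a == b)
    return same / len(profile_a)


def concordance_coefficient(x: Sequence[float], y: Sequence[float]) -> float:
    """Lin's concordance correlation coefficient (assumed choice of measure)."""
    n = len(x)
    mx, my = fmean(x), fmean(y)
    var_x = sum((v - mx) ** 2 for v in x) / n
    var_y = sum((v - my) ** 2 for v in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return 2 * cov / (var_x + var_y + (mx - my) ** 2)
```

Unlike a plain correlation, the concordance coefficient penalizes a systematic shift between the two profiles, so two samples whose feature values differ by a constant offset score below 1.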
Univariate Feature Analysis
For each platform separately, we tested each feature for a univariate association with pathological GG (i.e. GG1 versus GG≥2). Specifically, we randomly selected one sample per case and then, for each feature, used the selected samples to quantify the difference in feature values of GG≥2 versus GG1 cases, x(GG≥2)−x(GG1), and estimated the significance of the difference. For the RNA and methylation platforms, we defined x(GG1) and x(GG≥2) as the median feature values for GG1 and GG≥2 cases, respectively, and the significance was estimated using a two-sided Mann-Whitney test comparing the sets of feature values of GG1 and GG≥2 cases. For the CNA platforms, we defined x(GG1) and x(GG≥2) as the proportions of GG1 and GG≥2 cases, respectively, with an identified CNA, and the significance was estimated using a two-sided proportion test. The p values from the statistical tests were adjusted using the Benjamini-Hochberg method, across all features from the same platform (resulting in q values). The sampling procedure and subsequent computation of statistics were repeated 1000 times, allowing the computation of the median, first and third quartile values across the repetitions. This feature analysis was performed separately with the training and validation data. To estimate the significance of the univariate association of a given feature across both cohorts, we used the weighted-Z method to combine the median q value from each cohort, weighting each q value by the number of cases used to compute it S[54].
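The multiple-testing adjustment and the cross-cohort combination can be sketched with the standard library alone. The Benjamini-Hochberg step is the usual step-up procedure. The weighted-Z step below is the textbook weighted Stouffer form, with the number of cases used directly as the weight; whether the study used exactly this variant is an assumption, and the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted values (q values), in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q


def weighted_z(values: Sequence[float], weights: Sequence[float], eps: float = 1e-12) -> float:
    """Combine one-sided significance values with the weighted-Z method."""
    nd = NormalDist()
    # Clamp to (0, 1) so the inverse normal CDF stays finite.
    z = [nd.inv_cdf(1 - min(max(v, eps), 1 - eps)) for v in values]
    combined = sum(w * zi for w, zi in zip(weights, z)) / sqrt(sum(w * w for w in weights))
    return 1 - nd.cdf(combined)
```

For instance, `weighted_z([q_train, q_valid], [n_train, n_valid])` would combine the median q value of a feature from the two cohorts, giving more influence to the cohort with more cases.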
Results
Overview of Cohorts/Samples
We successfully generated 954 mRNA, 845 NanoString-CNA, 794 MLPA-CNA, and 847 methylation profiles for samples from 535 prostatectomy cases across the training and validation cohorts. We also generated CAPRA scores for 492 cases.
Development and Validation of GG Classifiers
Classifiers were trained on 333 cases from two sites, reserving 202 cases from a third site for independent validation (Table 4). Of the >41,000 GG classifiers we evaluated, 718 exhibited AUC≥0.75 with TP and TN rates ≥0.5 (i.e. ≥50% of cases in each GG class were correctly predicted). Sensitivity for GG2 was prioritized over specificity because of the clinical need for earlier intervention, resulting in our selection of two top-performing classifiers for validation, PRONTO-e and PRONTO-m (Table 5). For cases with GG>2 samples, both of these classifiers were trained using only the high-grade sample from that case. Performance statistics for the 25 top-performing classifiers (by AUC) are shown in
Despite reported intra-tumoral heterogeneity in prostate cancer [38], we observed remarkable stability in performance statistics when they were computed with one randomly selected sample per case (
The validated classifiers frequently provided consistent GG classification between paired samples from the same case: 70.8% for PRONTO-e and 73.9% for PRONTO-m, indicating a high degree of resistance to sampling error. For PRONTO-e, we observed superior agreement between two samples when both were taken from a GG2 versus GG1 case (
Molecular Features of Grade Group
We investigated which molecular features were most strongly associated with GG. By univariate analysis, the abundance of 22 transcripts and methylation at 9 loci showed significant association with GG (adjusted p<0.1, see Methods;
Multimodal Classifiers Outperform CAPRA in Cross-Validation
The CAPRA score represents the current clinical standard for prostate cancer prognosis and it is computed only with non-molecular features such as age at diagnosis and the GG of the biopsy S[49]. Importantly, both PRONTO-e and PRONTO-m classifiers outperform a CAPRA classifier in cross-validation, with greater TP rates and AUCs (
GG Classifiers and Intra-Tumoral Heterogeneity
ROC curves computed only with the low-grade or high-grade sample from each case in the validation cohort indicate differences in classifier performance depending on the grade of the sample relative to the grade of the whole tumor (
We examined the potential impact of intra-tumoral heterogeneity on the validated classifiers by comparing the input profiles (DNA, RNA) for samples from the same case. We quantified the similarity between the CNA profiles and found that the similarity values are significantly greater for the GG1 versus GG≥2 cases (Mann-Whitney test p=0.023). However, for both CNA and RNA data, the median similarity values are greater than 0.9, regardless of the GG subset (
Here we report the development of GG classifiers and the validation of the PRONTO-e and PRONTO-m classifiers in an independent patient population. These results suggest that incorporating diverse molecular (e.g. mRNA and CNA) features can add significant value (
Despite limited accuracy in biopsies for detecting GG≥2, the gold standard endpoint for AS, both OncotypeDx and ProMark report resistance to tumor heterogeneity [19, 20]. These results suggest that there are measurable underlying clonal changes that mediate CaP aggressiveness, reflect the GG of the whole tumor and are consistently present across areas of phenotypic tumor heterogeneity [46, 47]. The current work derived and independently validated two novel classifiers of GG that demonstrated resistance to tumor heterogeneity, yielding sampling-based AUCs of 0.799 (PRONTO-e) and 0.786 (PRONTO-m). Both classifiers can detect a GG2 tumor using molecular features of phenotypically low-grade tumor samples (
PRONTO-e comprises 353 features divided between mRNA abundance and DNA CNA types. The more compact PRONTO-m comprises 94 features divided between mRNA abundance, DNA CNA, and DNA methylation types, and includes pre-surgical clinical and pathologic features (age, clinical stage, PSA, and biopsy GG). Although derived from prostatectomy tissue, for which GG is most accurate, both classifiers are resistant to sampling error and therefore there is a high probability that, when used on biopsy tissue, they will better inform decisions around AS versus clinical management. Work to validate the classifiers with biopsy samples from statistically-powered cohorts is currently underway.
When performed on the same patient, OncotypeDx and Prolaris often yield conflicting recommendations [48]. Nevertheless, the tests have demonstrated the potential to reduce biopsy frequency and overtreatment [40] suggesting more accurate tests have similar, if not better, potential impact. Once the PRONTO-e and PRONTO-m performance is validated in core biopsies, these assays have the potential to dramatically improve this impact. It is relatively simple to model the application of each validated classifier to diagnostic biopsies from 1000 hypothetical men selected for AS, with the assumption that 33% of these men would be upgraded during their AS [49]. A test with performance characteristics similar to PRONTO-m would identify 53.4% of men as positive (for risk of occult GG≥2) and 46.6% as negative. Of those testing positive (534/1000 men), 267 would be TPs and benefit from early repeat biopsy and treatment. Of the 466 men testing negative, only 13.5% (63) would be false negatives. For the 26.7% of all cases with a FP result, we suggest the consequence would be an earlier first AS biopsy, not additional biopsies. The early biopsies for these patients would provide pathological reassurance of low GG disease without additional morbidity. The hypothetical results for PRONTO-e are similar (
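The arithmetic of this hypothetical cohort can be reproduced with a short script. The TP and FP rates used below are back-calculated from the counts quoted above (267 true positives among 330 upgraded men, 267 false positives among 670 men who are not upgraded); they are assumptions made so that the example matches the text, not values read from the performance tables. The function name is hypothetical.

```python
from typing import Dict


def model_as_cohort(n_men: int, upgrade_rate: float, tp_rate: float, fp_rate: float) -> Dict[str, float]:
    """Expected test outcomes for a cohort of men selected for AS."""
    upgraded = n_men * upgrade_rate        # men with occult GG>=2
    not_upgraded = n_men - upgraded
    tp = tp_rate * upgraded
    fn = upgraded - tp
    fp = fp_rate * not_upgraded
    tn = not_upgraded - fp
    return {
        "TP": tp, "FP": fp, "FN": fn, "TN": tn,
        "test_positive": tp + fp,
        "test_negative": fn + tn,
    }


# Rates back-calculated from the counts in the text (an assumption).
result = model_as_cohort(1000, 0.33, tp_rate=267 / 330, fp_rate=267 / 670)
fn_share = 100 * result["FN"] / result["test_negative"]
print(round(result["test_positive"]), round(result["FN"]), round(fn_share, 1))
```

Changing `upgrade_rate` or the two rates shows how sensitive the number of early repeat biopsies and missed upgrades is to the assumed prevalence and classifier performance.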
The current work establishes PRONTO-e and PRONTO-m as molecular biomarkers of GG that are resistant to sampling error, and therefore likely to perform well in diagnostic biopsies. Further work is needed, and ongoing, to fully validate their clinical performance. Multifocal CaPs represent a potential pitfall for any biopsy test in that biopsies may sample a secondary low-grade focus while failing to sample the higher grade “dominant” or “index” focus. This phenomenon has been estimated to explain 20-30% of cases upgraded between biopsy and prostatectomy [15, 50]. The performance of the classifiers on biopsy tissue could also be compromised by limiting nucleic acid yields from small biopsy tissue samples. This limitation should be balanced by factors expected to improve performance of the classifiers in biopsies relative to surgical samples, including higher quality nucleic acids observed in biopsy tissue [51] and opportunities to employ more sensitive and precise massively parallel sequencing technologies [52] in the clinical assay.
While several studies have related biopsy classifiers to outcomes after surgery, there is little information linking test results to outcomes for men on AS. Further validation of PRONTO-e and PRONTO-m on biopsies from men on AS is needed. Overall, these results indicate that combining transcriptomic, epigenomic, and genomic features can improve the performance of clinically relevant biomarkers for CaP tissue. This result suggests potential benefits for other biospecimen types (e.g., blood or urine) and tumor sites.
Although preferred embodiments of the invention have been described herein, it will be understood by those skilled in the art that variations may be made thereto without departing from the spirit of the invention or the scope of the appended claims. All documents disclosed herein, including those in the following reference list, are incorporated by reference.
Cancer Metab 2016; 4:22.
Perspect Med 2018; 8(8).
Gleason grading system and factoring in tertiary grades. Eur Urol 2012; 61(5):1019-24.
Prostatectomy Specimens. Working group 2: T2 substaging and prostate cancer volume. Mod Pathol 2011; 24(1):16-25.
Learning 2016; 17:1-5.
J Clin Oncol 2014; 32(15_suppl):5090-5090.
featured updates to the NCCN guidelines. J Natl Compr Canc Netw 2012; 10(9):1081-7.
aMean values represent the average computed over the values derived from low-grade and high-grade samples.
bSampling-based statistics provide a better representation of clinical practice (see Methods).
aFor 146 cases in the training cohort and 25 cases in the validation cohort, primary core biopsy material was not available for central pathology review leading to exclusion of these cases.
bA further 34 and 16 cases were excluded after pathology review (GG >2, etc).
cFor 34 and 5 cases respectively there was no/incomplete molecular data captured.
This application claims priority to U.S. Provisional Application No. 63/040,692, filed on Jun. 18, 2020, the contents of which are incorporated by reference in their entirety.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/CA2021/050837 | 6/18/2021 | WO |
Number | Date | Country | |
---|---|---|---|
63040692 | Jun 2020 | US |