The present invention refers to a marker set and its use in identifying a disease based on the determination of a PCL-like transcriptomic status in a sample. The marker set of the present invention is for example also used for selecting an active agent for use in the treatment of a disease. Further, the present invention is directed to kits comprising means for determining the PCL-like transcriptomic status in a sample as well as selecting an active agent based thereon.
Plasma cells, also called plasma B cells, are a type of white blood cell that originates in the lymphoid organs from B lymphocytes and secretes antibodies. Plasma cells may develop plasma cell dyscrasias, which constitute various plasma cell disorders ranging from benign to malignant conditions, eventually resulting in the degeneration of plasma cells.
Among plasma cell dyscrasias, multiple myeloma (MM), also known as plasma cell myeloma or simply myeloma, represents a cancer of plasma cells. The cause of MM is unknown. Risk factors include for example obesity, radiation exposure, family history, and certain chemicals. MM is generally considered incurable but treatable.
Metastatic capacity is a pivotal feature of aggressive cancers, for which tumor cell dissemination is an early requirement. Since the PCL-like classifier can identify plasma cell tumors that have a higher degree of hematogenous dissemination than would be expected based on tumor burden alone and many pathways are represented in this classifier that are part of known cancer hallmarks (Hanahan & Weinberg—Cancer Cell 2011), it may be anticipated that the PCL-like classifier will have prognostic value in other malignancies and pre-malignant conditions as well.
Plasma cell leukemia (PCL) is the most aggressive form of plasma cell dyscrasias and thus, represents a very serious and therapeutically challenging disease. Around 2% of all plasma cell dyscrasias are PCL.
PCL may present as primary plasma cell leukemia (pPCL), i.e. in patients without prior history of a plasma cell dyscrasia or as secondary plasma cell leukemia (sPCL), i.e. in patients previously diagnosed with a history of its predecessor dyscrasia such as MM.
For over a century, the level of circulating tumor cells (CTCs) has been assessed in MM to identify PCL. Even though MM is characterized by an intramedullary outgrowth of malignant plasma cells, the degree of hematogenous tumor cell dissemination is highly variable between patients. At the time of diagnosis, CTCs are routinely quantified in peripheral blood (PB) by morphology and can be detected in the majority of MM patients if flow cytometry is used. However, in only 2% of patients these levels are ≥20% or ≥2×10⁹/L, which is pathognomonic for pPCL.
Symptomatic MM patients with lower CTC levels at diagnosis are classified as newly diagnosed MM (NDMM), but these may still develop sPCL after treatment.
Clinically, pPCL is considered a high-risk disease entity within MM. pPCL patients commonly present with a large tumor burden and extensive morbidity, show poor response to standard treatment and have a dismal overall survival.
Disease aggressiveness in pPCL is considered to be reflected by the presence of significantly higher CTC levels than in NDMM. Even though this was previously hypothesized to be the result of a spill over from a large intramedullary tumor, evidence is accumulating that altered molecular features involved in cell adhesion, evasion of apoptosis, migration, bone marrow (BM) independence and RNA metabolism are associated with this phenotype.
Yet, several reports have suggested that certain NDMM patients experience an equally aggressive disease course to that of pPCL, without having CTC levels ≥20%. Such NDMM patients are diagnosed as PCL-like MM.
Still, molecular determinants remain poorly understood, with conventional prognostic risk markers in NDMM (i.e. t(4;14), t(14;16) and deletion of chromosome 17p (del17p)) only being detectable in a subset of pPCL tumors.
Thus, a problem to be solved is for example the provision of means and methods to reliably and specifically identify a disease, for example a rare disease and/or a high grade of a disease, in a sample.
The present invention provides for the first time a marker set based on analysis of the transcriptomic profile for molecularly identifying diseases, for example cancer diseases, such as pPCL.
The present invention provides a marker set which has independent prognostic value in the context of conventional risk markers. The present invention facilitates for example a high sensitivity (93%) for detecting pPCL, and also identifies PCL-like MM in 11% of NDMM patients.
Hence, the present invention provides a marker set and methods for the determination of a novel and efficient high-risk biology that is, for example, already detectable in NDMM patients, despite not being clinically leukemic. Moreover, the present invention significantly improves the accuracy in diagnostics and treatment of rare diseases as well as the prognostic performance in the context of such disease.
The present invention refers to a marker set for determining a PCL-like transcriptomic status in a sample which is indicative for a disease, wherein the marker set comprises coding or non-coding genes associated to biological pathways and/or chromosomal location. The marker set according to the present invention indicates for example a rare disease and/or a high grading of the disease.
The marker set of the present invention is for example selected from the group consisting of cell adhesion marker, immune response marker, cell metabolism marker, tumor suppression marker, post-translational protein modification marker, (post-) transcriptional regulation marker, cellular (matrix) structure marker, cell migration marker, cell death marker, cell signaling marker, protein biogenesis and transport marker, cell proliferation marker, DNA damage response marker, or a combination thereof (see e.g., Hofste op Bruinink et al., J Clin Oncol 2022; Chakraborty & Lentzsch, J Clin Oncol 2022).
The marker set is for example selected from two or more or optionally all from the group of markers consisting of SDC1, IGLV3-19, PPAPDC1B, WDR11, ALG14, PHF19, TSC22D1, FAM174A, TSPAN3, CALU, TPM1, VCAM1, IDH2, P2RY6, ASAH1, IGHV1-69, FUCA1, STRN, CYSTM1, APH1B, SLAMF7, YIPF5, APOE, SPATS2, PRKCA, PSME4, SLFN11, RMDN3, CHID1, TMEM45A, TARSL2, DCLRE1C, TCTN3, DAP, DCK, SMOC1, EMC7, LINC00582, KDELR1, APOBEC3B, CRTAP, BRSK1, MZB1, ERI3, DERL3, CENPM, GDE1, FLNA, NCF4, DNASE1L3, ITGA8, SELENOM, AL159169.2, AC092620.1, or a combination thereof.
A sample according to the present invention is for example selected from plasma cell, blood, (pre-) malignant plasma cell, bone marrow, urine, serum, cells and tissue such as tumor tissue or tumor cells, or a combination thereof. In some embodiments the sample is from an individual afflicted with multiple myeloma.
The present invention also refers to a method for determining a PCL-like transcriptomic status in a sample which is indicative for a disease comprising the steps of
The score calculated in step c) is for example the lowest score for which at least 90 to 100% of the samples in a reference have a higher score. For example, a score of step c) in the range of at least 1 to 7 is indicative for a disease corresponding to the disease of the reference of step d).
The method of the present invention further comprises for example the steps of
The tumor burden is for example determined based on the percentage of plasma cells in bone marrow, M-protein in serum and/or urine, the level of beta-2 microglobulin in serum, the level of lactate dehydrogenase in serum, by imaging, or a combination thereof.
The method optionally further comprises classifying the sample as having a high or standard SKY92 risk status, comprising determining in the sample the expression profile of each marker listed in Table 7.
The present invention further refers to a method for determining a treatment or prognosis for an individual afflicted with multiple myeloma, comprising:
In addition, the present invention is directed to a method for treating an individual afflicted with multiple myeloma, comprising:
Moreover, the present invention relates to a method for treating an individual afflicted with multiple myeloma, comprising:
In the methods for treating an individual afflicted with multiple myeloma of the present invention, an individual classified as having a PCL-like transcriptomic status and optionally a SKY92 high risk status is for example intensively monitored, and the individual is treated with quadruplet induction therapy including anti-CD38, high dose autologous stem cell transplantation therapy or a combination thereof. In these methods for example a bispecific antibody, a CAR T cell or a combination thereof is administered.
The PCL-like transcriptomic status determined by the method of the present invention indicates for example a high grading of a disease which correlates to at least one prognostic risk model. The at least one prognostic risk model is specific for the disease. For example the prognostic risk model is selected from the group consisting of R-ISS status, ISS status, FISH status, SKY92 status, UAMS70 status of NDMM, or a combination thereof.
The method of the present invention further comprises for example selecting an active agent, such as a chemotherapeutic, for treatment of a disease based on the PCL-like transcriptomic status in a sample.
The marker set or the method of the present invention indicates for example a disease selected from the group consisting of newly diagnosed multiple myeloma (NDMM), primary plasma cell leukemia (pPCL), secondary plasma cell leukemia (sPCL), progressive disease (PD), smoldering multiple myeloma (SMM), monoclonal gammopathy of undetermined significance (MGUS), plasmacytomas, Waldenström's macroglobulinemia, POEMS syndrome, breast cancer, lung cancer, malignant melanoma, lymphoma, skin cancer, bone cancer, prostate cancer, liver cancer, brain cancer, cancer of the larynx, gall bladder, pancreas, testicular, rectum, parathyroid, thyroid, adrenal, neural tissue, head and neck, colon, stomach, bronchi, kidneys, basal cell carcinoma, squamous cell carcinoma, metastatic skin carcinoma, osteosarcoma, Ewing's sarcoma, reticulum cell sarcoma, liposarcoma, myeloma, giant cell tumor, small-cell lung tumor, islet cell tumor, primary brain tumor, meningioma, acute and chronic lymphocytic and granulocytic tumors, acute and chronic myeloid leukemia, hairy-cell tumor, adenoma, hyperplasia, medullary carcinoma, intestinal ganglioneuromas, Wilms tumor, seminoma, ovarian tumor, leiomyomatous tumor, cervical dysplasia, retinoblastoma, soft tissue sarcoma, malignant carcinoid, actinic keratosis, melanoma, pancreatic cancer, colon cancer, rhabdomyosarcoma, Kaposi's sarcoma, osteogenic sarcoma, malignant hypercalcemia, renal cell tumor, polycythemia vera, myeloproliferative disease, essential thrombocytosis, lymphoma, mastocytosis, myelodysplastic syndrome, clonal hematopoiesis of indeterminate potential, monoclonal B-cell lymphocytosis, chronic myelomonocytic leukemia, myelofibrosis, adenocarcinoma, anaplastic astrocytoma, glioblastoma multiforme, epidermoid carcinoma, a disease characterized by a circulating tumor cell, such as a circulating malignant plasma cell, or a combination thereof.
Furthermore, the present invention relates to a kit for determining a PCL-like transcriptomic status which is indicative for a disease, comprising probes, primers, or a combination thereof for determining an expression profile of a marker set of the present invention in a sample, optionally means for determining the CTC level in a sample and optionally means for determining the tumor burden in a sample. Optionally, the kit further comprises an active agent for use in a method of treating the disease detected by the marker. The expression profile of a marker set according to the present invention is for example determined using a microarray, next generation sequencing or qRT-PCR.
(5A) Line chart displaying the significance of the difference in scores between NDMM (n=109) and pPCL samples (n=15) in the discovery cohort, as determined with a two-sided Wilcoxon test for each number of genes in the classifier ranging from 25 to 422. Genes were previously selected and ranked based on the significance of their association with CTC levels. The dashed line represents the number of genes with which the highest significance was reached between scores of NDMM versus pPCL samples. (5B) Line chart representing the score per sample in the discovery cohort, computed over a range of gene numbers in the classifier. Per sample and per number of genes in the classifier, a score was computed according to a leave-one-out cross-validation procedure, as described in detail in the Examples. (5C) Principal component analysis plot using the centered expression values of 54 genes identified in the previous steps as input. PC1 represents the score that was determined on all n=124 samples from the discovery cohort and projected on all n=59 samples from the validation cohort.
The present invention provides a marker set for determining a plasma cell leukemia like (PCL-like) transcriptomic status in a sample which is indicative for a disease. The present invention is further directed to a method as well as kits for determining a PCL-like transcriptomic status in a sample which is indicative for a disease.
In addition, the present invention forms the basis for use of the marker set in identifying a disease for selecting an active agent, e.g., a chemotherapeutic or an antagonist or agonist modulating, i.e., decreasing or increasing, the expression of one or more genes of the marker set of the present invention, and therapy for preventing and/or treating the disease, respectively.
In the following, the features of the present invention will be described in more detail. It should be understood that embodiments may be combined in any manner and in any number to create additional embodiments. The variously described examples and embodiments should not be construed to limit the present invention to only the explicitly described embodiments. This description should be understood to support and encompass embodiments which combine the explicitly described embodiments with any number of the disclosed features. Furthermore, any permutations and combinations of all described features in this application should be considered disclosed by the description of the present application unless the context indicates otherwise.
Throughout this specification and the claims, unless the context requires otherwise, the word “comprise”, and variations such as “comprises” and “comprising”, will be understood to imply the inclusion of a stated member, integer or step or group of members, integers or steps but not the exclusion of any other member, integer or step or group of members, integers or steps. The terms “a” and “an” and “the” and similar reference used in the context of describing the invention (especially in the context of the claims) are to be construed to cover both the singular and the plural, unless otherwise indicated herein or clearly contradicted by the context. Recitation of ranges of values herein is merely intended to serve as a shorthand method of referring individually to each separate value falling within the range. Unless otherwise indicated herein, each individual value is incorporated into the specification as if it were individually recited herein. All methods described herein can be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The use of any and all examples, or exemplary language (e.g., “such as”, “for example”), provided herein is intended merely to better illustrate the invention and does not pose a limitation on the scope of the invention otherwise claimed. No language in the specification should be construed as indicating any non-claimed element essential to the practice of the invention.
All documents cited or referenced herein (“herein cited documents”), and all documents cited or referenced in herein cited documents, together with any manufacturer's instructions, descriptions, product specifications, and product sheets for any products mentioned herein or in any document incorporated by reference herein, are hereby incorporated herein by reference, and may be employed in the practice of the invention. More specifically, all referenced documents are incorporated by reference to the same extent as if each individual document was specifically and individually indicated to be incorporated by reference.
The marker set of the present invention determines a PCL-like transcriptomic status in a sample which is indicative for a disease. Any disease which is associated to the PCL-like transcriptomic status is identified by the marker set and method of the present invention. Accordingly, such diseases are equally designated as PCL-like diseases.
The PCL-like transcriptomic status determined by the marker set or method of the present invention is indicative for rare diseases or a high grading of a disease. A high grading of a disease is considered as a high-risk disease causing high morbidity, low overall survival (OS) and low progression free survival (PFS). A high grading of a disease also comprises high-risk cancers which are further characterized to recur (come back), or spread. A rare disease also comprises severe diseases and/or a high grade of a disease. In an exemplary embodiment, the presence of a PCL-like transcriptomic status in a sample from an individual afflicted with multiple myeloma classifies said individual as having a poor prognosis.
The marker set and the method of the present invention determine a PCL-like transcriptomic status in a sample which is indicative for several diseases. Such diseases for example are selected from the group consisting of newly diagnosed multiple myeloma (NDMM), primary plasma cell leukemia (pPCL), secondary plasma cell leukemia (sPCL), progressive disease (PD), smoldering multiple myeloma (SMM), monoclonal gammopathy of undetermined significance (MGUS), plasmacytomas, Waldenström's macroglobulinemia, POEMS syndrome, breast cancer, lung cancer, malignant melanoma, lymphoma, skin cancer, bone cancer, prostate cancer, liver cancer, brain cancer, cancer of the larynx, gall bladder, pancreas, testicular, rectum, parathyroid, thyroid, adrenal, neural tissue, head and neck, colon, stomach, bronchi, kidneys, basal cell carcinoma, squamous cell carcinoma, metastatic skin carcinoma, osteosarcoma, Ewing's sarcoma, reticulum cell sarcoma, liposarcoma, myeloma, giant cell tumor, small-cell lung tumor, islet cell tumor, primary brain tumor, meningioma, acute and chronic lymphocytic and granulocytic tumors, acute and chronic myeloid leukemia, hairy-cell tumor, adenoma, hyperplasia, medullary carcinoma, intestinal ganglioneuromas, Wilms tumor, seminoma, ovarian tumor, leiomyomatous tumor, cervical dysplasia, retinoblastoma, soft tissue sarcoma, malignant carcinoid, actinic keratosis, melanoma, pancreatic cancer, colon cancer, rhabdomyosarcoma, Kaposi's sarcoma, osteogenic sarcoma, renal cell tumor, polycythemia vera, myeloproliferative disease, essential thrombocytosis, lymphoma, mastocytosis, myelodysplastic syndrome, clonal hematopoiesis of indeterminate potential, monoclonal B-cell lymphocytosis, chronic myelomonocytic leukemia, myelofibrosis, adenocarcinoma, anaplastic astrocytoma, glioblastoma multiforme, epidermoid carcinoma, a disease characterized by a circulating tumor cell, such as a circulating malignant plasma cell, or a combination thereof.
The marker set of the present invention is for example selected from coding or non-coding genes. Such genes are for example associated to biological pathways and/or a chromosomal location.
For example the marker set is selected from the group consisting of adhesion marker, immune response marker, cell metabolism marker, tumor suppression marker, post-translational protein modification marker, (post-) transcriptional regulation marker, cellular (matrix) structure marker, cell migration marker, cell death marker, cell signaling marker, protein biogenesis and transport marker, cell proliferation marker, DNA damage response marker, or a combination thereof.
For example, the marker set is selected from the group of markers as shown in Table 5 consisting of SDC1, IGLV3-19, PPAPDC1B, WDR11, ALG14, PHF19, TSC22D1, FAM174A, TSPAN3, CALU, TPM1, VCAM1, IDH2, P2RY6, ASAH1, IGHV1-69, FUCA1, STRN, CYSTM1, APH1B, SLAMF7, YIPF5, APOE, SPATS2, PRKCA, PSME4, SLFN11, RMDN3, CHID1, TMEM45A, TARSL2, DCLRE1C, TCTN3, DAP, DCK, SMOC1, EMC7, LINC00582, KDELR1, APOBEC3B, CRTAP, BRSK1, MZB1, ERI3, DERL3, CENPM, GDE1, FLNA, NCF4, DNASE1L3, ITGA8, SELENOM, AL159169.2, AC092620.1, or combinations thereof.
The marker set of the present invention comprises for example a combination of two or more markers selected from the groups as disclosed above. The marker set for example comprises a combination of three or more, four or more, five or more, six or more, seven or more, eight or more, nine or more, 10 or more, 13 or more, 15 or more, 20 or more, 25 or more, 30 or more, 35 or more, 40 or more, 45 or more, 50 or more, 55 or more, 60 or more, 65 or more, 70 or more, 75 or more, 80 or more, 85 or more, 90 or more, 95 or more, 100 or more, 125 or more, 150 or more, 175 or more, or 200 or more markers selected from the groups as disclosed above. It is clear to the skilled person that selecting two or more comprises selecting all markers.
For example, the marker set comprises all markers selected from the group consisting of SDC1, IGLV3-19, PPAPDC1B, WDR11, ALG14, PHF19, TSC22D1, FAM174A, TSPAN3, CALU, TPM1, VCAM1, IDH2, P2RY6, ASAH1, IGHV1-69, FUCA1, STRN, CYSTM1, APH1B, SLAMF7, YIPF5, APOE, SPATS2, PRKCA, PSME4, SLFN11, RMDN3, CHID1, TMEM45A, TARSL2, DCLRE1C, TCTN3, DAP, DCK, SMOC1, EMC7, LINC00582, KDELR1, APOBEC3B, CRTAP, BRSK1, MZB1, ERI3, DERL3, CENPM, GDE1, FLNA, NCF4, DNASE1L3, ITGA8, SELENOM, AL159169.2, AC092620.1.
The PCL-like transcriptomic status in a sample refers to an expression profile determined by the marker set of the present invention. The expression profile of the marker set according to the present invention is determined in a sample by measuring the individual expression levels of each marker comprised in the marker set of the present invention. It is clear to the skilled person that a marker set comprises single markers which for example represent genes.
An expression level for example refers to detectable nucleic acid molecules. The nucleic acid molecules are for example detected by probes, primers or combinations thereof. Development and identification of such probes and/or primers facilitating specific binding and detection of the nucleic acid molecules of the marker set according to the present invention is performed according to the standard methods known to a person skilled in the art.
The expression level of nucleic acid molecules is determined by any method known in the art including for example RT-PCR, quantitative PCR, Northern blotting, gene sequencing, in particular RNA sequencing, for example Next Generation Sequencing (NGS), and gene expression profiling techniques, such as multiplex chip techniques, for example microarrays.
For example the nucleic acid molecule is RNA, such as mRNA and/or pre-mRNA or DNA, such as cDNA. The level of RNA or DNA expression determined is detected directly or indirectly, for example by generating cDNA and/or by amplifying the RNA/cDNA.
General methods for RNA extraction are well known in the art and are disclosed in standard textbooks of molecular biology, including Ausubel et al. (1997) Current Protocols of Molecular Biology, John Wiley and Sons. Methods for RNA extraction from paraffin embedded tissues are disclosed, for example, in Rupp & Locker (1987) Lab Invest. 56: A67, and De Andres et al., BioTechniques 18:42-44 (1995). For example, RNA isolation can be performed using a purification kit, buffer set and protease from commercial manufacturers, such as Qiagen, according to the manufacturer's instructions (QIAGEN Inc., Valencia, Calif.). For example, total RNA from cells in culture can be isolated using Qiagen RNeasy mini-columns. Numerous RNA isolation kits are commercially available and can be used in the methods of the invention.
The expression levels of the marker set for example refer to the protein levels translated from the mRNAs of the markers comprised in the marker set of the present invention. Determining the expression levels of the marker set by protein detection may be performed by any method known in the art including ELISA, immunocytochemistry, flow cytometry, Western blotting, proteomics as well as mass spectrometry. Protein detection as used herein may include detection of full-length proteins, truncated proteins, peptides, polypeptides and combinations thereof.
General methods for protein purification are well known in the art and are disclosed in standard textbooks of molecular biology, including Ausubel et al. (1997) Current Protocols of Molecular Biology, John Wiley and Sons. For example, protein purification can be performed using a purification kit, buffer set and protease from commercial manufacturers.
The expression level is for example not an absolute value, but a “normalized” expression level. Normalization refers to adjusting levels measured on different scales to a notionally common scale, optionally prior to averaging. Normalization is particularly useful when expression is determined based on microarray data. Normalization facilitates the correction of variations for example within microarrays and across samples so that data from different chips can be simultaneously analyzed. The robust multi-array analysis (RMA) algorithm is optionally used to pre-process probe set data into gene expression levels for all samples (Irizarry R A, et al., Biostatistics (2003) and Irizarry R A, et al., Nucleic Acids Res. (2003)). In addition, Affymetrix's default preprocessing algorithm (MAS 5.0) is optionally also employed. Additional methods of normalizing expression data are described in US20060136145.
For example, the levels of expression can be normalized against housekeeping or another reference gene expression. For example, in microarray data, specific normalization methods for background correction, probe summarization into exon, transcript or gene level expression values and scaling of the within and/or between array expression values are employed depending on the array platform manufacturer. For example, in Affymetrix microarray data, the MAS5 algorithm is employed. Optionally, the MAS5 scaling step is replaced by other methods such as loess transformation, quantile normalization, variance stabilizing normalization, (robust) spline normalization, or others.
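Quantile normalization, named above as one optional replacement for the MAS5 scaling step, can be illustrated by the following minimal sketch. The function name `quantile_normalize`, the list-based data layout and the example numbers are assumptions made for illustration only; ties are broken by gene position, which dedicated implementations may handle differently.

```python
def quantile_normalize(samples):
    """Give every sample the same distribution of expression values.

    samples: list of equally long lists, one list of expression values
             per sample, with genes in the same order in every list.
    """
    n_genes = len(samples[0])
    n_samples = len(samples)
    # Sort each sample separately, then average across samples per rank.
    sorted_samples = [sorted(values) for values in samples]
    rank_means = [
        sum(values[rank] for values in sorted_samples) / n_samples
        for rank in range(n_genes)
    ]
    normalized = []
    for values in samples:
        # Gene indices ordered from lowest to highest expression.
        order = sorted(range(n_genes), key=lambda gene: values[gene])
        out = [0.0] * n_genes
        for rank, gene in enumerate(order):
            # Replace each value by the cross-sample mean of its rank.
            out[gene] = rank_means[rank]
        normalized.append(out)
    return normalized


# Illustrative numbers only (not measured data).
normalized = quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]])
```

After this step each sample contains the same set of values, while the ranking of genes within each sample is preserved.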
For example, in RNAseq, standard mapping and/or quantification software such as Salmon, Kallisto, or others is employed to obtain values reflecting the log scaled expression levels of the marker set (e.g. in terms of TPM, RPKM, FPKM, counts, etc.).
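How raw read counts relate to log scaled TPM values can be shown with a short sketch. Quantification tools typically report TPM directly, so the following is only an illustration; the function name `log2_tpm`, the use of transcript lengths in kilobases and the pseudocount of 1 are assumptions of this sketch, and the numbers are not measured data.

```python
import math


def log2_tpm(counts, lengths_kb, pseudocount=1.0):
    """Convert raw read counts to log2-scaled TPM values.

    counts:      dict gene -> read count
    lengths_kb:  dict gene -> effective transcript length in kilobases
    pseudocount: added before taking the logarithm (an assumption of this
                 sketch, so that genes with zero reads stay finite)
    """
    # Reads per kilobase: correct each count for transcript length.
    rates = {gene: counts[gene] / lengths_kb[gene] for gene in counts}
    total = sum(rates.values())
    if total == 0:
        raise ValueError("no reads assigned to any gene")
    # Scale so that the TPM values of one sample sum to one million.
    tpm = {gene: rate / total * 1e6 for gene, rate in rates.items()}
    return {gene: math.log2(value + pseudocount) for gene, value in tpm.items()}


# Illustrative numbers only (not measured data).
example = log2_tpm(
    {"SDC1": 100, "VCAM1": 300, "APOE": 0},
    {"SDC1": 1.0, "VCAM1": 3.0, "APOE": 2.0},
)
```

In this example SDC1 and VCAM1 obtain the same value, because the threefold higher count of VCAM1 is offset by its threefold longer transcript.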
For applicability to the classifier, these normalized values are optionally additionally normalized in order to be compatible with the reference transcriptome. For example, this entails single sample transformations like a non-linear transformation by e.g. robust spline normalization toward the reference expression profile, or requires parameter assessment (e.g. mean and standard deviation per gene) per batch (i.e. a collection of sample expression values obtained from samples that underwent a comparable processing in terms of sample storage and workup, reagents, processing times, etc.). These batch normalization parameters must be determined based on data comparable to the reference expression profile (i.e. a demographically and clinically homogeneous population of sufficient size e.g., suffering from NDMM), such that batch correction can be applied to any future sample (including e.g., non-NDMM) that underwent comparable processing. For example, batch specific means and standard deviations per gene are shifted and scaled respectively toward the mean and standard deviations per gene as observed in the reference transcriptome.
For example, the expression levels of the marker set in a sample are normalized to indicate an increase or decrease of the expression of the markers in the marker set. The expression profile in a sample, constituted by the individual expression levels of the two or more single markers, is for example compared to the reference expression profile of the marker set to determine whether the subject expression profile is sufficiently similar to the reference profile.
Alternatively, the expression profile of the sample is compared to more than one reference expression profile to select the reference expression profile that is most similar to the subject expression profile.
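The shifting and scaling of batch specific means and standard deviations toward the reference transcriptome may be sketched as follows. This is a minimal illustration under stated assumptions: the function names `estimate_batch_params` and `correct_sample`, the dictionary layout and the numbers are hypothetical, and the batch parameters are assumed to have been estimated on samples comparable to the reference population.

```python
from statistics import mean, stdev


def estimate_batch_params(batch):
    """Per-gene mean and standard deviation of one processing batch.

    batch: dict gene -> list of expression values, one per sample
           (at least two samples per gene are needed for stdev).
    """
    return {gene: (mean(values), stdev(values)) for gene, values in batch.items()}


def correct_sample(sample, batch_params, ref_params):
    """Shift and scale one sample toward the reference transcriptome.

    sample:       dict gene -> expression value of the sample to correct
    batch_params: dict gene -> (mean, sd) estimated on the sample's batch
    ref_params:   dict gene -> (mean, sd) observed in the reference
    """
    corrected = {}
    for gene, value in sample.items():
        b_mean, b_sd = batch_params[gene]
        r_mean, r_sd = ref_params[gene]
        if b_sd == 0:
            # No spread in the batch: only the shift can be applied.
            corrected[gene] = value - b_mean + r_mean
        else:
            corrected[gene] = (value - b_mean) / b_sd * r_sd + r_mean
    return corrected


# Illustrative numbers only (not measured data).
batch = {"SDC1": [10.0, 12.0, 14.0], "VCAM1": [3.0, 5.0, 7.0]}
reference = {"SDC1": (8.0, 1.0), "VCAM1": (4.0, 4.0)}
params = estimate_batch_params(batch)
corrected = correct_sample({"SDC1": 14.0, "VCAM1": 5.0}, params, reference)
```

Keeping parameter estimation separate from correction reflects that the parameters are determined once per batch and can then be applied to any later sample that underwent comparable processing.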
The reference expression profile is for example a predetermined expression profile. Alternatively the expression profile of a reference is determined when determining the marker set expression profile in the sample. The reference expression profile is for example the average of the expression profiles in a particular group of samples, such as a group of disease samples. For example, the reference expression profile is the average of the expression profiles in a group of rare disease samples or samples of high grading of a disease.
For example, for normalization purposes, the reference expression profile is obtained from a demographically (e.g. gender, age, race, etc.) and clinically (e.g. chromosomal aberrations, disease grade, etc.) homogeneous population of n>50, for which the mean expression and its standard deviation per gene are characteristic. For example, the reference expression profile is obtained from a demographically and clinically homogeneous population of n between 50 and 500, n between 75 and 400, n between 100 and 350, n between 150 and 300, or n between 200 and 250. For example, the reference expression profile is obtained from a demographically and clinically homogeneous population of n=154.
Any method known in the art for comparing two or more data sets to detect similarity between them may be used to compare the expression profile of the sample to the reference expression profiles.
In machine learning and statistics, classification refers to identifying to which set of categories a new observation belongs, on the basis of a training set of data containing observations (or instances) whose category membership is known. An algorithm that implements classification, especially in a concrete implementation, is known as a classifier. Many classifiers are known in the art, with linear or non-linear classifier boundaries, such as but not limited to: ClaNC, nearest mean classifier, weighted voting method, simple Bayes classifier, linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), Support Vector Machines (SVM), or the k-nearest neighbor (k-nn) classifier.
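As an illustration of one of the classifiers listed above, a nearest mean classifier can be sketched in a few lines. The function names `fit_centroids` and `predict`, the use of Euclidean distance and the two-gene example profiles are assumptions for illustration and do not represent the classifier of the present invention.

```python
import math


def fit_centroids(profiles, labels):
    """Compute the mean expression profile (centroid) per class.

    profiles: list of expression profiles (lists of equal length)
    labels:   list of class labels, one per profile
    """
    sums, counts = {}, {}
    for profile, label in zip(profiles, labels):
        if label not in sums:
            sums[label] = [0.0] * len(profile)
            counts[label] = 0
        sums[label] = [s + x for s, x in zip(sums[label], profile)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def predict(profile, centroids):
    """Assign the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(profile, centroids[label]))


# Illustrative training set with known category membership (not measured data).
training_profiles = [[1.0, 1.0], [2.0, 2.0], [8.0, 9.0], [9.0, 8.0]]
training_labels = ["NDMM", "NDMM", "pPCL", "pPCL"]
centroids = fit_centroids(training_profiles, training_labels)
predicted = predict([7.0, 7.0], centroids)
```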
The PCL-like transcriptomic status determined by the marker set of the present invention represents an expression profile of a sample. A PCL-like transcriptomic status determined by the marker set of the present invention is for example indicative for a disease if the expression profile is similar to the expression profile of a sufficiently large reference group of disease samples, based on a score that is equal or larger than the lowest scoring disease sample in this reference group, to a score equal or larger than the highest disease sample.
In one example the PCL-like transcriptomic status is an expression profile of a sample that is similar to the transcriptomic profiles of a sufficiently large reference cohort of pPCL bone marrow tumor samples, based on a score that is equal or larger than the lowest scoring pPCL sample in this reference cohort, to a score equal or larger than the highest score of 10% of the lowest scoring pPCL samples in this reference cohort.
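The threshold range described above can be made concrete with a small sketch. The function names `threshold_range` and `is_pcl_like`, the rounding down of the 10% fraction (with at least one sample retained) and the reference scores are assumptions for illustration only.

```python
def threshold_range(reference_scores, percent=10):
    """Return the lower and upper end of the score threshold range.

    lower: score of the lowest scoring reference sample
    upper: highest score among the `percent` lowest scoring reference samples
    """
    ordered = sorted(reference_scores)
    # Number of samples in the lowest-scoring fraction; rounding down and
    # keeping at least one sample is an assumption of this sketch.
    k = max(1, len(ordered) * percent // 100)
    return ordered[0], ordered[k - 1]


def is_pcl_like(score, threshold):
    """A sample is called PCL-like if its score reaches the threshold."""
    return score >= threshold


# Illustrative reference scores of 20 pPCL samples (not measured data).
ppcl_scores = [3.2, 1.1, 2.5, 4.0, 1.8, 2.9, 3.7, 2.2, 5.1, 1.4,
               2.7, 3.3, 4.4, 2.0, 3.9, 1.6, 2.4, 3.0, 4.8, 3.5]
lower, upper = threshold_range(ppcl_scores)
```

Choosing the lower end of the range favors sensitivity toward the reference cohort, whereas the upper end is more stringent.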
The score is for example calculated by computing the first principal component from the expression profile of the marker set. For example, the score is calculated by computing the first principal component of the expression profile of the marker set in the classifier's discovery data. In one example, the classifier's discovery data is obtained from a demographically (e.g. gender, age, race, etc.) and clinically (e.g. chromosomal aberrations, disease grade, etc.) homogeneous population of n=109 NDMM patients. Any means for calculating the first principal component may be used. For example, principal components are determined using the “prcomp” function in R package “stats” (version 4.0.2) (see R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, 2020).
It is within the purview of the skilled person to obtain a suitable sample for determining a PCL-like transcriptomic status by the marker set of the present invention. For example, the sample is selected from plasma cell, blood, (pre-)malignant plasma cell, bone marrow, urine, serum, cells and tissue, such as tumor tissue or tumor cells, or a combination thereof.
The present invention likewise refers to a method for determining a PCL-like transcriptomic status indicative for a disease in a sample comprising the steps of
Isolation of RNA may be performed by any suitable method known in the art and as described herein, respectively.
For example, total RNA of sufficient quality and quantity is isolated from a tumor. RNA quantification is performed, and values are normalized to obtain read-outs which are compatible with the classifier's discovery setting.
The marker set for determining the expression profile in a sample is chosen as described herein. For example, the method comprises determining the expression profile of all markers of Table 5.
The expression levels of the single markers as well as the expression profile of the marker set are determined by any means of the art, e.g., as described herein.
For example, total RNA is isolated from the sample. RNA quantity and quality are assessed. Tumor cells optionally comprise ≥80% of the cells in the sample as assessed by flow cytometry (or ≥90% morphologically) and a Bioanalyzer RNA integrity number ≥7.
Quantification of the RNA can be performed on any platform (e.g. microarray, NGS RNASeq, qRT-PCR, etc.), e.g., if a kit is used, according to the manufacturer's instructions. The first normalization step is for example performed according to the manufacturer's instructions. Quantification is for example summarized in terms of the Ensembl v74 gene model, and expressed on log2 scale (e.g. log2 intensity for microarray, log2(TPM+1) for RNASeq, or ΔCt for qRT-PCR).
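The log2(TPM+1) scale mentioned for RNASeq can be illustrated with a minimal sketch. The conversion from read counts to TPM uses the standard definition of transcripts per million and is not taken from the present description; the function names and input values are hypothetical.

```python
import numpy as np

def counts_to_tpm(counts, lengths_kb):
    """Standard TPM: length-normalized counts rescaled to sum to one million."""
    counts = np.asarray(counts, dtype=float)
    lengths_kb = np.asarray(lengths_kb, dtype=float)
    rate = counts / lengths_kb          # reads per kilobase of gene length
    return rate / rate.sum() * 1e6      # rescale so that all genes sum to 1e6

def log2_tpm(tpm):
    """Express TPM values on the log2(TPM+1) scale."""
    tpm = np.asarray(tpm, dtype=float)
    if np.any(tpm < 0):
        raise ValueError("TPM values must be non-negative")
    return np.log2(tpm + 1.0)

# Three hypothetical genes: read counts and gene lengths in kilobases.
tpm = counts_to_tpm([100, 300, 0], [1.0, 3.0, 2.0])
log_expr = log2_tpm(tpm)
```

The pseudocount of 1 keeps genes with zero expression at a log2 value of 0 instead of negative infinity.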
Depending on the platform used, additional corrections and normalization are required as described herein.
Calculating the score which is based on the first principal component of the expression profile of the marker set will be performed as described herein.
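A score based on the first principal component could be computed along the following lines. This is a minimal sketch in Python using a singular value decomposition; it is not the “prcomp” call referred to in the description. It assumes that the data are centered but not scaled, that the loadings are derived from the discovery data, and that a new sample is projected onto them. The function names and simulated data are hypothetical.

```python
import numpy as np

def fit_first_pc(discovery):
    """Return per-gene center and first principal component loadings.

    `discovery` is a samples x genes matrix of normalized expression values.
    """
    X = np.asarray(discovery, dtype=float)
    center = X.mean(axis=0)
    # Rows of vt are the principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(X - center, full_matrices=False)
    return center, vt[0]

def pc1_score(sample, center, loadings):
    """Project a sample (or samples x genes matrix) onto the first PC."""
    return (np.asarray(sample, dtype=float) - center) @ loadings

# Simulated discovery data: 109 samples, 2 genes varying along one direction.
rng = np.random.default_rng(1)
t = rng.normal(size=109)
discovery = np.outer(t, [0.6, 0.8]) + 5.0

center, loadings = fit_first_pc(discovery)
score = pc1_score([5.6, 5.8], center, loadings)
```

The sign of a principal component is arbitrary, so in practice the orientation of the loadings would have to be fixed by convention before scores are compared to a reference score.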
Further, the method includes comparing the calculated score to a reference score. The reference score is for example based on the expression profile of a reference as described herein. For example, the reference score is based on the first principal component of the expression levels of the marker set of the present invention in the reference. The reference varies and is selected depending on the score to be determined.
For example, the reference score is predetermined or determined in parallel to the determination of the score in a sample. Alternatively, the reference score is a generally established score which is indicative for a disease.
A reference is used for comparison and classification of the measurements and analysis obtained by the present invention. For example, reference refers to pPCL (e.g. when determining the reference score), or reference refers to PCL and NDMM (e.g. when calculating the principal components), or reference refers to NDMM (e.g. in case of the normalization steps).
The calculated score of the expression profile determines the PCL-like transcriptomic status of the sample indicating a disease. For example, the reference score is the lowest score for which 100% of the samples in a reference have a higher score. For example, the reference score is the lowest score for which at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, or at least 95% of the samples in a reference have a higher score. For example, the reference score is the lowest score for which at least 70 to 90%, at least 75 to 95%, at least 80 to 97%, at least 85 to 99%, or at least 90 to 100% of the samples in a reference have a higher score.
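One possible reading of the reference score definition above is sketched below: the threshold is taken as the highest reference score that is still reached or exceeded by at least the requested fraction of reference samples, so that a fraction of 100% yields the lowest reference score. This interpretation, the function names and the reference scores are illustrative assumptions.

```python
import math

def reference_threshold(reference_scores, fraction=1.0):
    """Highest reference score reached or exceeded by >= `fraction` of samples."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    ordered = sorted(reference_scores, reverse=True)
    if not ordered:
        raise ValueError("reference_scores must not be empty")
    # Number of samples that must score at or above the threshold;
    # the small offset guards against floating-point overshoot.
    k = max(1, math.ceil(fraction * len(ordered) - 1e-9))
    return ordered[k - 1]

def has_pcl_like_status(score, threshold):
    """A sample is called PCL-like if its score is equal to or larger than the threshold."""
    return score >= threshold

# Invented scores of ten reference samples.
ref_scores = [3.55, 3.9, 4.2, 4.8, 5.0, 5.5, 6.1, 6.4, 6.9, 7.3]
strict = reference_threshold(ref_scores, fraction=1.0)
relaxed = reference_threshold(ref_scores, fraction=0.9)
```

With these invented values the 100% threshold equals the lowest reference score and the 90% threshold equals the second lowest.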
For example, the score which indicates a disease corresponding to the disease of the reference is in the range of at least 1 to 7. For example, the score which indicates a disease corresponding to the disease of the reference is in the range of at least 0.1 to 15, of at least 0.3 to 15, of at least 0.5 to 15, of at least 1 to 10, of at least 1.5 to 8, of at least 2 to 7, of at least 2.5 to 5 or of at least 3 to 7, or a combination thereof. For example, the score which indicates a disease corresponding to the disease of the reference is at least 0.1, at least 0.3, at least 0.5, at least 0.7, at least 1.0, at least 1.5, at least 2.0, at least 2.5, at least 3.0, at least 3.5, at least 4.0, at least 4.5. In one example, the score which indicates a disease corresponding to a pPCL reference is at least 3.55.
The method of the present invention optionally further comprises determining a CTC level in a sample. The CTC level in a sample is for example determined by quantification according to any suitable method of the art. For example, the CTC level in a sample is quantified using flow cytometry (e.g., FACS), VDJ sequencing, morphologically, or using immunocapture technologies. For example, the CTC level is quantified as described in the following examples using flow cytometry. For example, the CTC level is quantified as described in the following examples using Next Generation Flow (NGF).
Further, the method optionally comprises determining the tumor burden. Tumor burden is for example determined based on the percentage of plasma cells in bone marrow, M-protein in serum and/or urine, the level of beta-2 microglobulin in serum, the level of lactate dehydrogenase in serum or by imaging.
The expression profile of the marker set in a sample is for example referenced to the CTC level in a sample. For example, the combination of the expression profile in a sample and the CTC level strengthens the indication of a disease by the method of the present invention.
A CTC level indicating a rare disease or a high grade of a disease in the sample is for example 0.001 to 100%, 0.01 to 100%, 0.1 to 100%, 1 to 100%, or 5 to 100%. For example, the CTC level in the sample is 0.001 to 99%, 0.01 to 98%, 0.05 to 97%, 0.1 to 96%, 0.5 to 95%, 1 to 94%, 3 to 93%, 5 to 92%, 7 to 92%, 9 to 90%, 10 to 85%, 12 to 80%, 15 to 75%, 17 to 70%, 18 to 65%, 19 to 60%, 20 to 55%, 22 to 50%, 25 to 45%, 30 to 55%, 35 to 60%, or 40 to 80%. Further, the CTC level indicating a rare disease or a high grade disease in the sample is for example ≥5%, ≥7%, ≥10%, ≥12%, ≥15%, ≥17% or ≥20%. In an example, a CTC level ≥2×10⁹/L indicates pPCL.
Determining the CTC level in a sample having a marker set expression profile that corresponds to the marker set expression profile of a reference having an increased CTC level, e.g., of 5 to 30%, facilitates discrimination between two diseases or grades of a disease. Moreover, a CTC level serves as a threshold allowing differentiation between two diseases or grades of a disease. For example, a CTC level of at least 5% or 20% is used as a threshold. In one example, a CTC level of ≥5% in a sample indicates pPCL, whereas a CTC level in a sample of <5% indicates NDMM. Alternatively, a CTC level of ≥20% in a sample indicates pPCL, whereas a CTC level in a sample of <20% indicates PCL-like MM.
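The use of a CTC level as a threshold can be written as a simple rule. The sketch below only encodes the two example thresholds given above (5% and 20%); the function name and its parameters are hypothetical, and the rule is an illustration rather than a diagnostic procedure.

```python
def label_by_ctc(ctc_percent, threshold=5.0, below_label="NDMM"):
    """Label a sample from its CTC level (percentage of leukocytes).

    Levels equal to or above the threshold are labeled pPCL; lower levels
    receive `below_label`, e.g. "NDMM" for the 5% threshold or
    "PCL-like MM" for the 20% threshold.
    """
    if not 0.0 <= ctc_percent <= 100.0:
        raise ValueError("ctc_percent must be between 0 and 100")
    return "pPCL" if ctc_percent >= threshold else below_label

# Example with the 5% threshold.
label_5 = label_by_ctc(4.9)
# Example with the alternative 20% threshold.
label_20 = label_by_ctc(12.0, threshold=20.0, below_label="PCL-like MM")
```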
Optionally, the CTC level in a sample is referenced to the tumor burden. Referencing the CTC level to the tumor burden for example further strengthens the validity of the CTC level in a sample. For example, lower CTC levels in the sample of a subject, e.g. a CTC level of <5%, are associated with a lower tumor burden in the subject. In contrast, higher CTC levels are for example associated with a higher tumor burden.
Referencing the expression profile in a sample to the CTC level referenced to the tumor burden for example further strengthens the indication of a disease by the method of the present invention.
Optionally, the expression profile is referenced to the molecular profile (i.e., mutational, copy number and cytogenetic profile) of the sample. Optionally the expression profile is referenced to the epigenetic profile (for instance the methylome) of the sample. Determining the molecular profile and the epigenetic profile is performed according to the standard methods known to a person skilled in the art. It is clear to the skilled person that the expression profile is for example referenced to one or more selected from the group consisting of CTC level, tumor burden, molecular profile, epigenetic profile, or a combination thereof.
The marker set and the method of the present invention enable detection of diseases, such as rare diseases and/or high-grade diseases, which are hardly, not, or at least not reliably detectable by any method of the state of the art. Once the disease is detected by the present invention, its severity is optionally double-checked by at least one prognostic risk model known in the art for the specific disease.
A prognostic risk model grades the disease progression and defines the state of a disease. A prognostic risk model optionally provides information about disease progression, survival, treatment response or a combination thereof in a subject.
For example, prognostic risk models for plasma cell dyscrasias like MM comprise R-ISS, ISS, FISH, SKY92, UAMS70, Durie-Salmon Staging etc.
Both the International Staging System (ISS) and the revised International Staging System (R-ISS) have been developed by the International Myeloma Working Group (IMWG), providing a prognostic risk model based on the serum β2 microglobulin (Sβ2M) value and serum albumin value in a subject. For the R-ISS two additional prognostic factors have been incorporated, which are the risk of chromosomal abnormalities (CA) as assessed by fluorescence in-situ hybridization (FISH) and the serum level of lactate dehydrogenase (LDH).
FISH is used to screen for chromosomal abnormalities and allows cytogenetic risk stratification of myeloma. Subjects are considered to have high-risk disease if FISH studies demonstrate for example one of the following chromosomal abnormalities: t (14;16), t (4;14), or loss of p53 gene locus (del (17p) or monosomy 17).
For example, the method according to the present invention further comprises determining the grade of a disease according to at least one prognostic risk model as described above. For example, the at least one prognostic risk model is selected from the group consisting of R-ISS status, ISS status, FISH status, SKY92 status, UAMS70 status or a combination thereof.
In particular, methods are provided for classifying, determining a treatment, or determining the prognosis of an individual, said method comprising determining a PCL-like transcriptomic status from a sample from said individual, as disclosed herein, and determining the SKY92 risk status from the sample. The SKY92 risk status may be determined by measuring the expression levels of the markers in Table 7 and classifying the individual as having a high or standard SKY92 risk status based on said expression levels. As exemplified in
Further, the present invention comprises the selection of a treatment of a disease in a subject in need thereof based on the PCL-like transcriptomic status. For example, the marker set or the method of the present invention is used for selecting a therapy to prevent and/or treat a disease in a subject in need thereof. For example, the marker set or the method of the present invention is used for selecting an active agent for preventing and/or treating a disease in a subject in need thereof.
For example, based on the PCL-like transcriptomic status determined according to the present invention, a cancer treatment is selected. For example, based on the PCL-like transcriptomic status determined according to the present invention, an active agent such as an “adjuvant treatment” is selected. Adjuvant treatment, as used herein, refers to the administration of one or more drugs to a patient after surgical resection of one or more cancerous tumors, where all resectable disease (i.e. cancer) has been removed from the patient, but where there remains a statistical risk of relapse. Adjuvant treatment is useful to diminish the likelihood or the severity of recurrence of the disease.
Active agents are for example selected from the group consisting of a chemotherapeutic, targeted therapy drugs, immunotherapy drugs, an antagonist modulating the expression of one or more genes of the marker set of the present invention, or a combination thereof.
For example, the active agent is selected from the group consisting of monoclonal antibodies (e.g., daratumumab (Darzalex), elotuzumab (Empliciti)), BCL-2 inhibitors (e.g., venetoclax (Venclexta), navitoclax), selinexor, PRC2 inhibitors, nucleoside analogs, dacarbazine (DTIC), temozolomide (Temodal), carboplatin (Paraplatin, Paraplatin AQ), paclitaxel (Taxol), cisplatin (Platinol AQ), vinblastine (Velbe), BRAF inhibitors (vemurafenib (Zelboraf) and dabrafenib (Tafinlar)), MEK inhibitors (cobimetinib (Cotellic) and trametinib (Mekinist)), BTK inhibitors, cytokines (e.g., interferon alfa-2b or interleukin-2), immune checkpoint inhibitors (e.g., ipilimumab (Yervoy), nivolumab (Opdivo), pembrolizumab (Keytruda)), proteasome inhibitors (e.g., bortezomib (Velcade), carfilzomib (Kyprolis), ixazomib (Ninlaro)), immunomodulators (e.g., thalidomide, lenalidomide (Revlimid), pomalidomide), CAR-T cells, bispecific antibodies, NK cell therapy, autologous stem cell transplantation, allogeneic stem cell transplantation, radiation therapy, oncolytic immunotherapy, or a combination thereof.
Individuals classified as having a SKY92 high risk status are preferably treated more aggressively (e.g., quadruplet induction therapy including anti-CD38 and high dose autologous stem cell transplantation therapy) and receive better patient monitoring than individuals having a SKY92 standard risk status. Individuals classified as having a PCL-like transcriptomic status should receive more aggressive treatment (e.g., quadruplet induction therapy including anti-CD38 and high dose autologous stem cell transplantation therapy) and better patient monitoring than individuals that do not have a PCL-like transcriptomic status. Individuals classified as having a SKY92 high risk status and a PCL-like transcriptomic status are treated more aggressively than individuals without a PCL-like transcriptomic status and with a SKY92 low risk status. Aggressive treatment comprises for example quadruplet induction therapy including anti-CD38 and high dose autologous stem cell transplantation therapy, together with better patient monitoring. Additional therapy for patients with a PCL high risk profile comprises for example experimental treatment with bispecific antibodies and CAR-T cell approaches.
Within the research field of multiple myeloma, trial designs have started to focus on high-risk disease specifically, i.e., it has become particularly relevant to perform adequate diagnostic assessments at baseline to screen patients for inclusion in these kinds of trials (e.g., the MUKnine OPTIMUM trial (NCT03188172)). High-risk status defined by the present invention is for example used as an inclusion criterion for risk-adapted trials.
Further, it is helpful to know high-risk status at baseline for example to enable clinicians to better monitor the patients during and after treatment, as these patients may suffer from highly proliferative progressive disease (PD). To allow for an earlier treatment start of aggressive PD, these patients could benefit from more intensified follow-up protocols with for instance Minimal Residual Disease (MRD) assessments with Next Generation Flow (NGF) or Next Generation Sequencing (NGS) approaches.
Suitable active agents are for example administered by any appropriate route. Suitable routes include oral, rectal, nasal, topical (including buccal and sublingual), vaginal, and parenteral (including subcutaneous, intramuscular, intravenous, intradermal, intrathecal, and epidural).
For example, the marker set or the method of the present invention is used in the field of personalized medicine for individual treatment of a subject in need thereof.
The present invention is further directed to a kit for determining a PCL-like transcriptomic status which is indicative for a disease. The kit comprises or consists of means for determining the expression profile of the marker set of the present invention in a sample. Such means facilitate specific detection and/or binding to the one or more genes comprised by the marker set of the present invention. For example, such means are required for performing qRT-PCR, gene sequencing, microarrays etc.
It is well within the purview of the skilled person to identify and develop such means facilitating specific binding to the marker set of the present invention. For example, such means comprise reagents, probes, primers, proteins, peptides, antibodies, antibody fragments, antigens etc.
In some embodiments the kits comprise primer pairs or probes specific for the marker sets described herein. In some embodiments the kits comprise primer pairs or probes for housekeeping genes. In some embodiments, the kits further comprise one or more of the following: DNA polymerase, deoxynucleoside triphosphates, buffer, and Mg²⁺. In some embodiments, the kits comprise a control nucleic acid for one or more, preferably for each, primer pair. Preferably, the control nucleic acid is cDNA and more preferably the cDNA corresponds to a sequence that spans at least one intron/exon boundary of the respective gene. Such cDNA is useful to distinguish gene expression from genomic contamination. In some embodiments, one or more primers of the primer pair are chemically modified. Such modified primers include fluorescently or radioactively labeled primers.
Optionally, the kit further comprises means for determining the CTC level in a sample. For example, the kit comprises means for performing flow cytometric measurements.
Optionally, the kit further comprises means for determining the tumor burden in a sample. For example, the kit comprises means for performing flow cytometric measurements.
Identification and development of means for determining the CTC level in a sample and means for determining tumor burden are performed according to the standard methods known to a person skilled in the art.
Optionally, the kit further comprises means for determining the grade of a disease according to a prognostic risk model, such as R-ISS status, ISS status, FISH status, SKY92 status, UAMS70 status, TP53 mutational status or a combination thereof.
Identification and development of means for determining the grade of a disease according to a prognostic risk model are performed according to the standard methods known to a person skilled in the art. Such means for example comprise probes, primers, reagents, dyes, fluorescent probes, proteins, peptides, antibodies etc.
The kit as described herein optionally further comprises an active agent for use in a method of preventing and/or treating a disease, for example a rare disease or a high grade of a disease.
Optionally, the kit of the present invention further comprises instructions for use of the kit and/or interpretation of the measurements obtained by the kit. Moreover, the kit comprises for example suitable references and reference scores, respectively.
In addition or alternatively, the kit of the present invention further comprises for example means for sample collection, sample processing, sample storage, product insert, or combinations thereof.
A subject and/or patient of the present invention is for example a mammal such as a human, cat, dog or horse, a bird or a fish.
This study consisted of two main phases:
Construction and validation of a molecular classifier for plasma cell leukemia-like (PCL-like) disease (cohort 1):
The PCL-like classifier was validated in a separate cohort consisting of NDMM and pPCL samples (validation cohort).
Assessment of the prevalence and prognostic value of a classifier for PCL-like disease (cohort 2):
Additional datasets together with the discovery and validation cohort were leveraged to assess the prevalence of PCL-like transcriptomic status in a range of CD138-enriched plasma cell samples. These included healthy plasma cells, monoclonal gammopathy of undetermined significance (MGUS), smoldering MM (SMM), NDMM, pPCL, circulating tumor cell (CTC) and cell line samples (prevalence cohort).
The association of PCL-like transcriptomic status with progression-free survival (PFS) and overall survival (OS) was assessed in univariate, meta-analysis and multivariate models in a subset of patients from the prevalence cohort, comprising a total of seven NDMM cohorts, which were independent of the discovery and validation cohort (survival cohort).
All human investigations in this study were performed after approval by medical ethical committees. All patients included in this study have provided written informed consent, in concordance with the Declaration of Helsinki.
In this cohort, patients from the Cassiopeia trial (NCT02541383) (n=171) were included, who had been enrolled in a hospital in Belgium or the Netherlands, as well as patients from the EMN12/HO129 (EudraCT 2013-005157-75) (n=51) and HO143 trials (EudraCT 2016-002600-90) (n=126), of whom baseline CTC levels had been quantified (see, Moreau P. et al., Lancet 394:29-38, 2019; Zweegman S. et al., Blood 134:695-695, 2019; Musto P. et al., Blood 134:693-693, 2019). A subset of patients with transcriptomic data of their bone marrow (BM) tumor cells was selected for either the discovery (n=124) or validation phase (n=59) of the PCL-like classifier (unpublished tumor transcriptomic profiles; deposited under accession numbers GSE164701, GSE164830 and GSE164703).
In this cohort, all patients with available tumor transcriptomics from the discovery and validation sets were included, as well as patients with unpublished tumor transcriptomic profiles from the EMN02/HO95 trial (unpublished tumor transcriptomic profiles; deposited under accession number GSE164706) and 7 previously published datasets with transcriptomic data from plasma cells.
N=22 healthy plasma cell samples, n=44 MGUS and n=12 SMM CEL files were downloaded from the Gene Expression Omnibus (GEO) (GSE5900), as well as n=328 HOVON-65/GMMG-HD4 (GSE19784), n=180 HOVON-87/NMSG-18 (GSE87900), n=247 MRC-IX (GSE15695), n=345 Total Therapy 2 (GSE24080), n=214 Total Therapy 3 (GSE24080) and n=4 MM cell line (GSE159289) CEL files. NDMM patients from the HOVON-65/GMMG-HD4, HOVON-87/NMSG-18, EMN02/HO95, MRC-IX, Total Therapy 2 and Total Therapy 3 trials were included in subsequent analyses if a baseline tumor sample had been obtained from BM and if data on patient age, progression-free survival (PFS) and overall survival (OS) were available.
For a subset of EMN02/HO95 NDMM samples (n=123), tumor transcriptomic data from both microarray and RNA Seq data were generated (unpublished tumor transcriptomic profiles; deposited under accession number GSE164847). Paired microarray and RNA Seq data were used to compare classifier scores between platforms, whereas only microarray data of these patients were used in all other analyses.
Data from patients enrolled in the HOVON-65/GMMG-HD4, HOVON-87/NMSG-18, EMN02/HO95, Cassiopeia and EMN12/HO129 trials were used for the comparison of baseline data between intramedullary (i-MM), PCL-like MM and pPCL patients, as a comparable set of baseline characteristics was available from these trial cohorts. The same cohort was used for comparison of ssGSEA scores between i-MM, PCL-like MM and pPCL tumor samples.
Transcriptomic data from CTCs were obtained from patients from the EMN12/HO129 cohort. For comparisons of scores between BM and CTC samples from pPCL patients, only pre-treatment samples were used.
MM cell lines in our dataset were represented by transcriptomic profiles from the OPM-2, EJM, MOLP-8 and JJN-3 cell lines (see Katagiri S. et al., Int J Cancer 36:241-6, 1985; Hamilton M S. et al, 1990; Matsuo Y. et al., Leuk Res 28:869-77, 2004; Jackson N. et al, Clin Exp Immunol 75:93-9, 1989).
This cohort consisted of all NDMM patients from the prevalence cohort, who had been included in the HOVON-65/GMMG-HD4 (EudraCT 2004-000944-26), HOVON-87/NMSG-18 (EudraCT 2007-004007-34), EMN02/HO95 (EudraCT 2009-017903-28), MRC-IX (ISRCTN68454111), Total Therapy 2 (NCT00083551) and Total Therapy 3 (A: NCT00081939, B: NCT00572169) studies. Please refer to the respective study publications and/or trial registers for a detailed description on patient eligibility criteria and used treatment protocols, which have been summarized in Table 3 (see Sonneveld P. et al, J Clin Oncol 30:2946-55, 2012; Zweegman S. et al, Blood 127:1109-16, 2016; Cavo M. et al, Lancet Haematol 7: e456-e468, 2020; Morgan G. J. et al, Blood 118:1231-8, 2011; Morgan G. J. et al, Haematologica 97:442-50, 2012; Barlogie B. et al, Int J Hematol 76 Suppl 1:337-9, 2002; Barlogie B. et al, Br J Haematol 138:176-85, 2007).
Only sample processing procedures for samples in the discovery, validation and EMN02/HO95 prevalence cohorts are discussed. Please refer to the original publications for additional information on specific sample processing procedures for all other cohorts (see Zhan F. et al, Blood 109:1692-700, 2007; Broyl A. et al, Blood 116:2543-53, 2010; Kuiper R. et al, Blood Adv 4:6298-6309, 2020; Dickens N. J. et al, Clin Cancer Res 16:1856-64, 2010; Zhan F. et al, Blood 108:2020-8, 2006; van Beers E. H. et al, J Mol Diagn 23:120-129, 2021).
Before treatment start, a BM aspirate sample was collected for all NDMM patients, whereas both a BM and peripheral blood (PB) sample were obtained from pPCL patients. Samples were shipped to Erasmus MC, Rotterdam, the Netherlands by overnight express courier and de-identified upon receipt. Tumor cell enrichment was generally performed within 36 hours after sampling, by means of CD138 positive cell selection with the EasySep™ Human Whole Blood and Bone Marrow CD138 Positive Selection Kit II (STEMCELL Technologies, catalog number 17887RF) on the mononuclear cell fraction. After tumor cell enrichment, aliquots of generally 1×10⁶ cells were lysed in 600 μL RLT Plus buffer (Qiagen, catalog number 1053393), snap frozen in liquid nitrogen and stored at −80° C.
Of all enriched tumor samples in this study, purity was assessed after CD138 positive cell selection. Purity assessment was performed by both flow cytometry and morphology for each sample, with morphological purity assessment alone being performed if cell numbers were limited.
For morphological purity assessment, one cytospin was generated of a single cell suspension of 33×10³ cells, followed by a May-Grünwald-Giemsa staining. Per slide, 100-200 intact cells were evaluated by a specialist in hemato-cytology. Purity assessment by flow cytometry was performed on a FACSCanto II (BD) machine. To this end, 1×10⁵ cells were stained with a staining panel including CD138-PE (Beckman Coulter, catalog number A54190), CD38-PE-Cy7 (BD, catalog number 335825), CD45-APC (BD, catalog number 555485), annexin-FITC (Tau Technologies, catalog number A700) and DAPI (Thermo Fisher Scientific, catalog number D3571). Flow cytometric sample purity was defined as the percentage of CD45−/dimCD38+/++ events within a population of DAPI− leukocytes.
Total RNA was isolated with the AllPrep DNA/RNA Mini Kit (Qiagen, catalog number 80204). RNA quantity and quality were assessed on a NanoDrop 3300 fluorometer (ThermoFisher Scientific), whereas the RNA integrity number (RIN) was measured on a Bioanalyzer 2100 machine (Agilent) with the RNA 6000 Nano Kit (Agilent, catalog number 5067-1511).
Tumor samples were selected for subsequent transcriptomic profiling if these had a tumor purity of ≥80% as assessed by flow cytometry (or ≥90% morphological purity, in case no flow cytometric purity assessment had been performed) and a RIN ≥7. Additional quality criteria that were applied for microarray samples have been published previously (see Kuiper R. et al, Blood Adv 4:6298-6309, 2020).
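The selection criteria above can be expressed as a small filter. This is a minimal sketch; the function name and argument names are hypothetical, and purity values are assumed to be given as percentages.

```python
from typing import Optional

def passes_profiling_qc(rin: float,
                        flow_purity: Optional[float] = None,
                        morph_purity: Optional[float] = None) -> bool:
    """Apply the purity and RIN criteria for transcriptomic profiling.

    Flow cytometric purity (>= 80%) takes precedence; morphological purity
    (>= 90%) is used only if no flow cytometric assessment is available.
    """
    if rin < 7:
        return False
    if flow_purity is not None:
        return flow_purity >= 80.0
    if morph_purity is not None:
        return morph_purity >= 90.0
    return False  # no purity assessment available

# Hypothetical samples.
ok_flow = passes_profiling_qc(rin=8.2, flow_purity=85.0)
ok_morph = passes_profiling_qc(rin=7.0, morph_purity=88.0)
```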
Microarray data were generated on the MMprofiler™ (SkylineDx), for which Human Genome U133 Plus 2.0 Arrays (Affymetrix) were used. Arrays were processed as described in detail previously (see van Beers E. H. et al, J Mol Diagn 23:120-129, 2021).
RNA Seq libraries were generated with the mRNA HyperPrep Kit (KAPA, catalog number 08105952001/KK8544), according to manufacturer's instructions. In short, 250 ng of total RNA was used for poly(A) selection, after which magnesium-based fragmentation was conducted. A median fragment length of 200-300 bp was aimed for, using a fragmentation time of 6 minutes at 94° C. After cDNA synthesis and A-tailing, custom adapters were ligated (Integrated DNA Technologies), followed by 11 cycles of library amplification. The quality of the generated libraries was assessed on the Bioanalyzer 2100 machine (Agilent) with the High Sensitivity DNA kit (Agilent Technologies, catalog number 5067-4626). Libraries were quantified on a 7500 Fast Real-Time PCR System (Applied Biosystems) machine using the NEBNext Library Quant Kit for Illumina (New England BioLabs, catalog number #E7630S/L).
Paired-end sequencing of libraries was performed on a NovaSeq 6000 (Illumina) machine, with a read length of 2×101 bp and an average of 55×10⁶ reads per sample (see Table 1).
Baseline CTC levels were quantified for patients enrolled in the Cassiopeia, HO143 and EMN12/HO129 trials. For all NDMM patients, CTC levels were quantified by flow cytometry. To this end, 6-10 mL of PB was drawn before treatment start and shipped to Erasmus MC, Rotterdam, the Netherlands, by overnight express courier and de-identified upon receipt. Samples were processed and analyzed according to standardized Next Generation Flow (NGF) methods (EuroFlow) (see Flores-Montero J. et al, Leukemia 31:2094-2103, 2017; Hofste op Bruinink D. et al, Haematologica 106:1496-1499, 2020).
In short, NH4Cl bulk lysis was performed ≤36 hours after sampling. Subsequently, the sample was divided over two tubes (100 μL with 10⁶ cells each) and stained according to the EuroFlow NGF protocol, using CD138, CD38, CD45, CD19, CD27 and CD56 as backbone markers, with CD81 and CD117 as additional markers for tube 1, and CyIgL in combination with CyIgK as additional markers for tube 2. Cells were measured on either a FACSCanto™ II (BD) or FACSLyric™ (BD) machine, using EuroFlow settings (see Kalina T. et al, Leukemia 26:1986-2010, 2012; Glier H. et al, J Immunol Methods 475:112680, 2019). Data analysis was performed in Infinicyt (version 2.0, Cytognos). A total of ≥5×10⁶ leukocytes was aimed to be acquired per tube. A population of ≥20 monoclonal plasma cells (mPCs) was required for CTC identification, which translated into a theoretical limit of detection (LOD) of 20/(1×10⁷) = 2×10⁻⁶ per CTC assay (see Arroz M. et al, Cytometry B Clin Cytom 90:31-9, 2016; Paiva B. et al, J Clin Oncol 38:784-792, 2020). The percentage of CTCs was defined as (the number of mPCs / the number of leukocytes) × 100.
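The arithmetic behind the limit of detection and the CTC percentage can be reproduced with a few lines. The function names are hypothetical; the default values follow the numbers stated above (at least 20 mPCs among 1×10⁷ leukocytes acquired over two tubes).

```python
def limit_of_detection(min_events=20, leukocytes_acquired=1e7):
    """Theoretical LOD as a fraction of leukocytes."""
    return min_events / leukocytes_acquired

def ctc_percentage(n_mpc, n_leukocytes):
    """Percentage of CTCs: number of mPCs / number of leukocytes x 100."""
    if n_leukocytes <= 0:
        raise ValueError("n_leukocytes must be positive")
    return 100.0 * n_mpc / n_leukocytes

lod_fraction = limit_of_detection()            # 2e-06
lod_percent = 100.0 * lod_fraction             # the same limit as a percentage
example = ctc_percentage(500, 10_000_000)      # hypothetical sample: 0.005 %
```

Expressed as a percentage, the theoretical limit of detection corresponds to 0.0002% of leukocytes.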
For all pPCL patients, CTCs were detected and quantified at baseline by routine morphological assessment of blood smears in local hematology laboratories, after which data were collected and curated by the EMN data center. A subset of NDMM and pPCL patients had their baseline CTC levels quantified by both NGF and morphological assessment. For all subsequent CTC level analyses, NGF CTC levels were used for all NDMM patients, whereas morphological CTC levels were used for all pPCL patients.
Samples in which ≥150 CTCs had been quantified by flow cytometry were used for immunophenotypic characterization. A marker was defined as positive if ≥10% of mPCs had a EuroFlow-standardized staining intensity of >10³ (arbitrary fluorescent units). Markers that were positive or negative in all samples were excluded from correlative analyses.
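The positivity rule above may be illustrated with the following sketch. The function names and the input layout (one intensity value per mPC) are assumptions made for illustration and are not taken from the Infinicyt workflow.

```python
MIN_CTCS_FOR_PHENOTYPING = 150  # samples with fewer CTCs are not characterized


def eligible_for_phenotyping(n_ctcs: int) -> bool:
    """A sample qualifies if at least 150 CTCs were quantified."""
    return n_ctcs >= MIN_CTCS_FOR_PHENOTYPING


def marker_positive(mpc_intensities, intensity_cutoff=1e3, min_fraction=0.10):
    """Marker is positive if >=10% of mPCs exceed the intensity cutoff (>10^3)."""
    if not mpc_intensities:
        raise ValueError("at least one mPC intensity is required")
    n_above = sum(1 for value in mpc_intensities if value > intensity_cutoff)
    return n_above / len(mpc_intensities) >= min_fraction
```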
Cytogenetic aberrations were assessed by interphase fluorescence in situ hybridization (FISH) on CD138-enriched, chemotherapy-naive plasma cells, according to technical quality criteria that have been established within the framework of the European Myeloma Network (EMN) (see Ross F. M. et al, Haematologica 97:1272-7, 2012). Translocations of the immunoglobulin heavy chain (IgH) were detected with probes for t(4;14) (FGFR3/WHSC1), t(8;14) (MYC), t(11;14) (CCND1) and t(14;16) (MAF), whereas copy number aberrations involving deletion of chromosome 1p32 (del1p32) (CDKN2C), 13q14 (del13q14) (RB1) and 17p13 (del17p13) (TP53), as well as gain of chromosome 1q21 (gain1q21) (CKS1B) and hyperdiploidy were detected with either interphase FISH or high-density SNP arrays.
High-risk FISH status was defined according to criteria from the International Myeloma Working Group (IMWG) and included the presence of t(4;14), t(14;16) and/or del17p13. The presence of a primary IgH translocation was defined as having either a t(4;14), t(11;14) or t(14;16). Patients who had tested positive for one primary IgH translocation were classified as negative for the other two primary IgH translocations. Hyperdiploidy was defined as having >2 gains of chromosomes 5, 9, 11, 15. Non-hyperdiploid status was defined as having no gains in ≥3 chromosomes out of chromosomes 5, 9, 11, 15. All reported prevalences were calculated based on the following formula: the number of patients with the respective cytogenetic aberrancy / the number of tested patients × 100%.
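The cytogenetic definitions above may be encoded as in the sketch below. The lesion labels and function names are hypothetical conventions chosen for illustration; only the rules themselves are taken from the text.

```python
HIGH_RISK_LESIONS = {"t(4;14)", "t(14;16)", "del17p13"}
PRIMARY_IGH_TRANSLOCATIONS = {"t(4;14)", "t(11;14)", "t(14;16)"}
HYPERDIPLOIDY_CHROMOSOMES = (5, 9, 11, 15)


def is_high_risk_fish(lesions: set) -> bool:
    """High-risk FISH: at least one of t(4;14), t(14;16), del17p13."""
    return bool(HIGH_RISK_LESIONS & lesions)


def has_primary_igh_translocation(lesions: set) -> bool:
    """Primary IgH translocation: t(4;14), t(11;14) or t(14;16)."""
    return bool(PRIMARY_IGH_TRANSLOCATIONS & lesions)


def is_hyperdiploid(gained_chromosomes: set) -> bool:
    """Hyperdiploidy: gains of more than 2 of chromosomes 5, 9, 11, 15."""
    n_gained = sum(1 for c in HYPERDIPLOIDY_CHROMOSOMES if c in gained_chromosomes)
    return n_gained > 2


def prevalence_percent(n_positive: int, n_tested: int) -> float:
    """Prevalence: patients with the aberrancy / tested patients x 100%."""
    if n_tested <= 0:
        raise ValueError("n_tested must be positive")
    return n_positive / n_tested * 100
```

Untested patients are excluded from the denominator, so the prevalence of each aberration may be based on a different number of patients.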
The detection of cytogenetic aberrations in the remaining datasets of this study has been described in detail elsewhere (see Morgan G. J. et al, Blood 118:1231-8, 2011; Barlogie B. et al, Int J Hematol 76 Suppl 1:337-9, 2002; Barlogie B. et al, Br J Haematol 138:176-85, 2007; Kuiper R. et al, Blood Adv 4:6298-6309, 2020; Neben K., et al, Blood 119:940-8, 2012).
For all datasets, the mas5 function of R package “affy” (version 1.63.0) was applied to run a background correction, scale the arrays towards a mean expression value of 500 and summarize features into Ensembl gene IDs using the Brainarray (version 18) ENSG CDF (see Gautier L. et al, Bioinformatics 20:307-15, 2004; Dai M. et al, Nucleic Acids Res 33: e175, 2005). Gene expression values were transformed into a log2 intensity scale.
Fastq files were constructed using “bcl2fastq” (version 2.20.0.422, Illumina), after which universal adapters were removed using “Trim Galore” (version 0.4.4) (https://github.com/FelixKrueger/TrimGalore). Transcript per million (TPM) counts were measured on the trimmed Fastq files using Salmon (version 1.3.0) with an adapted version of the Ensembl (release 74) reference transcriptome, which has been described in the document “MMRF_COMMpass_IA15_Methods.pdf” (MMRF Researcher Gateway, https://research.themmrf.org) (see Patro R. et al, Nat Methods 14:417-419, 2017; Hubbard T. et al, Nucleic Acids Res 30:38-41, 2002). Transcripts were summarized into gene level TPM values using R package “tximport” (version 1.14.2).
Thereafter, both sets were merged, mitochondrial genes and ribosomal proteins were excluded and TPM was recalculated accounting for all remaining transcripts, excluding IgH-related genes (see Soneson C. et al, F1000Res 4:1521, 2015). All gene expression values were subsequently transformed into a log2(TPM+1) intensity scale.
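The recalculation of TPM after gene exclusion and the subsequent log2(TPM+1) transformation may be sketched as follows. This is an illustration under the assumption that all excluded genes are removed before the remaining values are rescaled to sum to one million; the gene names in the example are placeholders and the function names are hypothetical.

```python
import math


def renormalize_tpm(tpm: dict, excluded_genes: set) -> dict:
    """Drop excluded genes and rescale the rest so they sum to 1e6 again."""
    kept = {gene: value for gene, value in tpm.items() if gene not in excluded_genes}
    total = sum(kept.values())
    if total <= 0:
        raise ValueError("no expression left after exclusion")
    return {gene: value / total * 1_000_000 for gene, value in kept.items()}


def log2_tpm_plus_one(tpm: dict) -> dict:
    """Transform TPM values into the log2(TPM+1) intensity scale."""
    return {gene: math.log2(value + 1) for gene, value in tpm.items()}


# Placeholder gene names and values, for illustration only.
raw = {"GENE_A": 250_000.0, "GENE_B": 250_000.0, "MT_GENE": 500_000.0}
rescaled = renormalize_tpm(raw, excluded_genes={"MT_GENE"})
logged = log2_tpm_plus_one({"GENE_C": 3.0, "GENE_D": 0.0})
```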
To account for nonlinear global differences between platforms, a robust spline normalization was applied towards the Cassiopeia microarray samples using the “rsn” function in R package “lumi” (version 2.42.0) (see Du P. et al, Bioinformatics 24:1547-8, 2008). In RNA Seq samples, only expressed genes (i.e. TPM>0) were taken into account. Subsequently, a 2D UMAP dimension reduction analysis for the top 30 principal components was performed, using R package “umap” (version 0.2.7.0) to identify distinct batches closely corresponding with technical variation (see McInnes L., et al, arXiv.org, 2020). To specifically account for major batch effects, gene-centric mean/variance normalization was performed towards the NDMM samples in the discovery cohort, using the batch-specific NDMM samples derived from BM. All expressed genes (i.e. log2 expression >5 in >75% of samples in the discovery cohort) were subsequently used as input for all further downstream analyses.
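One possible reading of the gene-centric mean/variance normalization and of the expression filter is sketched below for a single gene. It is assumed here that each batch is shifted and scaled so that the mean and standard deviation of its NDMM samples match those of the NDMM samples in the discovery cohort; the exact procedure is not specified in the text, and all function names are hypothetical.

```python
from statistics import mean, stdev


def mean_variance_normalize(batch_values, batch_ndmm_values, reference_ndmm_values):
    """Rescale one gene in one batch towards the reference NDMM mean and SD.

    The shift and scale are estimated on the NDMM samples of the batch only,
    then applied to every sample of that batch.
    """
    mu_batch, sd_batch = mean(batch_ndmm_values), stdev(batch_ndmm_values)
    mu_ref, sd_ref = mean(reference_ndmm_values), stdev(reference_ndmm_values)
    if sd_batch == 0:
        raise ValueError("gene has no variance in the batch NDMM samples")
    return [(v - mu_batch) / sd_batch * sd_ref + mu_ref for v in batch_values]


def is_expressed(log2_values, min_log2=5.0, min_fraction=0.75):
    """Expressed gene: log2 expression >5 in >75% of the given samples."""
    n_above = sum(1 for v in log2_values if v > min_log2)
    return n_above / len(log2_values) > min_fraction


# Hypothetical values for one gene, for illustration only.
normalized = mean_variance_normalize(
    batch_values=[1.0, 2.0, 3.0, 5.0],
    batch_ndmm_values=[1.0, 2.0, 3.0],
    reference_ndmm_values=[10.0, 12.0, 14.0],
)
```

Estimating the shift and scale on NDMM samples only keeps genuine biological differences between NDMM and other sample types from being normalized away.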
Principal components were determined using the “prcomp” function in R package “stats” (version 4.0.2) (see R Core Team REfSC, 2020). Input expression values were centered, but not scaled to unit variance.
The PCL-like classifier was trained on data from patients in the discovery cohort, who presented with a CTC level ≥LOD and who had matched tumor transcriptomics, tumor burden and CTC level data.
The training phase consisted of three steps. First, genes were identified that associated with CTC levels (percentage of CTCs), independent of tumor burden (percentage of plasma cells in the bone marrow aspirate). To this end, the linear regression model y = β₀ + β₁x₁ + β₂x₂ + ε was applied using R package “limma” (version 3.46.0) (see Ritchie M. E. et al, Nucleic Acids Res 43: e47, 2015). In this model, y represents the logit-transformed CTC level, x₁ the logit-transformed tumor burden, for which the baseline percentage of plasma cells in the BM aspirate was used, x₂ the expression of the gene of interest on log2 scale, β the regression estimates and ε the modeling error. CTC-associated genes with a false discovery rate (FDR)<0.05 were considered significant.
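The per-gene model may be illustrated with the following sketch, which fits the three coefficients by ordinary least squares. This is a simplification and not the limma procedure: moderated statistics and FDR correction are not reproduced, the function names are hypothetical, and the demo data are generated from known coefficients rather than taken from the study.

```python
import math


def logit_percent(p: float) -> float:
    """Logit of a percentage strictly between 0 and 100."""
    f = p / 100.0
    if not 0.0 < f < 1.0:
        raise ValueError("percentage must lie strictly between 0 and 100")
    return math.log(f / (1.0 - f))


def solve_linear(a, b):
    """Solve a small dense system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ctc_model(ctc_percent, burden_percent, log2_expression):
    """Return (beta0, beta1, beta2) for y = beta0 + beta1*x1 + beta2*x2."""
    y = [logit_percent(p) for p in ctc_percent]
    x1 = [logit_percent(p) for p in burden_percent]
    rows = [[1.0, a, b] for a, b in zip(x1, log2_expression)]
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    return tuple(solve_linear(xtx, xty))


# Demo data simulated from known coefficients (not study data).
true_beta = (-6.0, 0.5, 0.4)
demo_burden = [20.0, 40.0, 60.0, 80.0, 50.0]
demo_expr = [5.0, 7.0, 6.0, 9.0, 8.0]
demo_y = [true_beta[0] + true_beta[1] * logit_percent(b) + true_beta[2] * e
          for b, e in zip(demo_burden, demo_expr)]
demo_ctc = [100.0 / (1.0 + math.exp(-v)) for v in demo_y]
beta0, beta1, beta2 = fit_ctc_model(demo_ctc, demo_burden, demo_expr)
```

In this formulation, β₂ is the quantity of interest: it captures the association between gene expression and CTC level after adjusting for tumor burden.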
The second step was aimed at identifying the number of CTC-associated genes with which pPCL could be best distinguished from NDMM samples. To this end, a leave-one-out cross validation analysis was performed. In this analysis, each fold consisted of all samples in the discovery cohort minus one that was left out. Per fold, step one of the training phase was repeated, obtaining a ranking of all genes based on the significance of the association with CTC levels, independent of tumor load. Subsequently, the first principal component (PC1) was determined, for each combination of an increasing number of genes that were most significantly associated with CTC levels, ranging from 20 to 1000 genes, thereby rotating PC1 such that it positively correlated with CTC levels. Subsequently, a projection was computed for the sample that had been left out. This resulted in a specific cross-validated PC1 score for each pPCL and NDMM sample in this analysis, for each classifier size. The optimal number of genes for the PCL-like classifier was defined as the lowest possible number of genes with which the highest discriminative power was achieved to distinguish pPCL and NDMM samples, using a Wilcoxon test. Thereafter, the score was calculated by computing the first principal component from the expression values of this optimal number of genes, using all samples in the discovery cohort as input. The obtained loadings per PCL-like classifier gene were subsequently used to calculate the score for all remaining samples in cohort 2.
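The leave-one-out projection onto PC1 may be sketched as follows for a fixed gene set. This is a simplified illustration: the per-fold gene ranking of step one and the scan over classifier sizes are omitted, PC1 is obtained by power iteration instead of the “prcomp” function, and all function names are hypothetical.

```python
def first_principal_component(samples, n_iter=200):
    """Return (gene_means, loadings) of PC1; values are centered, not scaled."""
    n, p = len(samples), len(samples[0])
    means = [sum(row[j] for row in samples) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in samples]
    # Deterministic start vector; power iteration fails only in the unlikely
    # case that this vector is orthogonal to PC1.
    v = [float(j + 1) for j in range(p)]
    for _ in range(n_iter):
        scores = [sum(r[j] * v[j] for j in range(p)) for r in centered]
        w = [sum(centered[i][j] * scores[i] for i in range(n)) for j in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            raise ValueError("no variance along the start direction")
        v = [x / norm for x in w]
    return means, v


def project(sample, means, loadings):
    """Score of one sample on PC1 using stored gene means and loadings."""
    return sum((x - m) * l for x, m, l in zip(sample, means, loadings))


def orient_to(loadings, scores, reference):
    """Flip the sign of PC1 so that scores correlate positively with reference."""
    mean_s = sum(scores) / len(scores)
    mean_r = sum(reference) / len(reference)
    cov = sum((s - mean_s) * (r - mean_r) for s, r in zip(scores, reference))
    return [-l for l in loadings] if cov < 0 else list(loadings)


def leave_one_out_scores(samples, ctc_levels):
    """Cross-validated PC1 score for every sample (gene set held fixed)."""
    result = []
    for i in range(len(samples)):
        train = samples[:i] + samples[i + 1:]
        train_ctc = ctc_levels[:i] + ctc_levels[i + 1:]
        means, loadings = first_principal_component(train)
        train_scores = [project(s, means, loadings) for s in train]
        loadings = orient_to(loadings, train_scores, train_ctc)
        result.append(project(samples[i], means, loadings))
    return result


# Toy data: two genes, four samples (not study data).
toy_samples = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
toy_ctc = [1.0, 2.0, 3.0, 4.0]
toy_means, toy_loadings = first_principal_component(toy_samples)
toy_loo = leave_one_out_scores(toy_samples, toy_ctc)
```

Because the gene means and loadings are stored, the same `project` step can be reused to score samples that were not part of the training data.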
In the third step of training the PCL-like classifier, a cutoff was determined. To this end, the lowest score was selected with which all pPCL samples in the discovery cohort could be identified.
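The cutoff rule may be illustrated as follows, assuming that a sample is called PCL-like when its score is at or above the lowest pPCL score in the discovery cohort. The labels, scores and function names are hypothetical.

```python
def pcl_like_cutoff(scores, labels):
    """Lowest score that still captures every pPCL sample."""
    ppcl_scores = [s for s, label in zip(scores, labels) if label == "pPCL"]
    if not ppcl_scores:
        raise ValueError("at least one pPCL sample is required")
    return min(ppcl_scores)


def is_pcl_like(score, cutoff):
    """Assumption: scores equal to the cutoff count as PCL-like."""
    return score >= cutoff


# Hypothetical scores, for illustration only.
demo_scores = [6.1, 6.8, 7.3, 7.0, 6.4]
demo_labels = ["NDMM", "NDMM", "pPCL", "pPCL", "NDMM"]
demo_cutoff = pcl_like_cutoff(demo_scores, demo_labels)
demo_calls = [is_pcl_like(s, demo_cutoff) for s in demo_scores]
```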
The PCL-like classifier was validated in an independent validation cohort by means of two analyses:
For the first analysis, all samples in the validation cohort were used, whereas for the second analysis only matched CTC level, tumor burden and tumor transcriptomics data were used from patients with a CTC level ≥LOD.
The MMprofiler™ gene expression assay (SkylineDx) was used to determine the SKY92 high-risk classification and MM clusters in microarray samples (see Broyl A. et al, Blood 116:2543-53, 2010; Kuiper R. et al, Leukemia 26:2406-13, 2012).
SKY92 (=EMC92) scores were calculated as described in Kuiper et al 2012. Briefly, the SKY92 score is a summation of the weighted expression of 92 probe sets (see Table 7). This signature constitutes a linear model, expressed in the following formula:
SKY92 score = Σᵢ βᵢxᵢ (i = 1, …, 92)
where βᵢ represents the weight factor of gene i, and xᵢ represents the expression level of gene i in a patient. Based on their SKY92 score, patients were split into two groups: those above the threshold of 0.7774 were classified as positive (High Risk), and those below the threshold as negative (Standard Risk).
Positive beta values (i.e., weight values) indicate that increased expression of said gene over a reference value makes a positive contribution towards the SKY92 score and, as a consequence, gives a larger chance of being above the threshold. Conversely, positive beta values indicate that decreased expression of said gene relative to a reference value makes a negative contribution towards the SKY92 score.
Negative beta values indicate that decreased expression of said gene relative to a reference value makes a positive contribution towards the SKY92 score and, as a consequence, gives a larger chance of being above the threshold. Conversely, negative beta values indicate that increased expression of said gene over a reference value makes a negative contribution towards the SKY92 score.
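The weighted summation and thresholding may be sketched as follows. The probe set identifiers and weights in the example are placeholders and not the actual SKY92 values, the function names are hypothetical, and it is assumed that a score exactly equal to the threshold is treated as Standard Risk, which the text does not specify.

```python
SKY92_THRESHOLD = 0.7774


def weighted_score(weights: dict, expression: dict) -> float:
    """Sum of weight x expression over all probe sets in the signature."""
    missing = set(weights) - set(expression)
    if missing:
        raise KeyError(f"expression missing for probe sets: {sorted(missing)}")
    return sum(weight * expression[probe] for probe, weight in weights.items())


def sky92_risk(score: float, threshold: float = SKY92_THRESHOLD) -> str:
    """Scores above the threshold are High Risk, otherwise Standard Risk."""
    return "High Risk" if score > threshold else "Standard Risk"


# Placeholder probe sets and weights (NOT the real SKY92 signature).
demo_weights = {"probe_a": 0.5, "probe_b": -0.25}
demo_expression = {"probe_a": 2.0, "probe_b": 0.4}
demo_score = weighted_score(demo_weights, demo_expression)
demo_risk = sky92_risk(demo_score)
```

In the example, the negative weight of `probe_b` means that its expression lowers the score, in line with the interpretation of negative beta values given above.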
The following Table 2 shows SKY92 probe sets and weights:
MM clusters were subsequently merged into a CD1/CD2 cluster (comprising clusters CD1 and CD2) and non-IgH cluster (comprising clusters HY, PR, CTA, LB, NFKB, NP, myeloid and PRL3), resulting in four main clusters: CD1/CD2, MF, MS and non-IgH. The UAMS70 high-risk classification was calculated as described in the original publication (see Shaughnessy J. D. et al, Blood 109:2276-84, 2007).
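The merging of MM clusters into four main clusters may be expressed as a simple lookup, as in the sketch below. The cluster names are taken from the text, whereas the mapping structure and function name are illustrative choices.

```python
# CD1 and CD2 are merged; MF and MS are kept as separate main clusters.
MERGED_CLUSTER = {"CD1": "CD1/CD2", "CD2": "CD1/CD2", "MF": "MF", "MS": "MS"}

# All remaining clusters are merged into the non-IgH cluster.
for _name in ("HY", "PR", "CTA", "LB", "NFKB", "NP", "myeloid", "PRL3"):
    MERGED_CLUSTER[_name] = "non-IgH"


def merge_cluster(cluster: str) -> str:
    """Map an original MM cluster to one of the four main clusters."""
    try:
        return MERGED_CLUSTER[cluster]
    except KeyError:
        raise ValueError(f"unknown MM cluster: {cluster!r}") from None
```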
Microarray-developed gene classifiers were converted for RNA Seq datasets according to a bioinformatic pipeline that has been outlined in detail previously (see Kuiper R. et al, Blood Adv 4:6298-6309, 2020). To check the validity of this procedure, paired PCL-like, SKY92 and UAMS70 scores were generated from samples with both array and RNA Seq transcriptomic data. Scores were compared in a linear regression model, using the “lm” function in R package “stats” (version 4.0.2) (see R Core Team REfSC, 2020).
Single sample gene set enrichment analysis (ssGSEA) was performed on tumor transcriptomic data from all HOVON-65/GMMG-HD4, HOVON-87/NMSG-18, EMN02/HO95, Cassiopeia and EMN12/HO129 microarray samples in the prevalence cohort, using an in-house written R package that computationally optimized the publicly available ssGSEA GenePattern module (https://github.com/GSEA-MSigDB/ssGSEA-gpmodule) (see Barbie D. A. et al, Nature 462:108-12, 2009; Subramanian A. et al, Proc Natl Acad Sci USA 102:15545-50, 2005). Gene sets from the curated canonical pathways MSigDB Collections (c2.cp, version 7.1) were selected for subsequent analyses if these had ≥10 genes overlap with expressed genes in the discovery cohort.
Univariate and multivariate survival analyses were performed with a Cox regression model using R package “survival” (version 3.2.3), for which baseline and follow-up data from the survival cohort were used. Follow-up time was measured from start of treatment to either the occurrence of an event or last contact in case of no event. For PFS, an event was defined as either progressive disease or death from any cause. For OS, an event was defined as death from any cause. All multivariate survival analyses were stratified by trial cohort and included age ≤65 years as covariate.
Meta-analyses were performed using R package “meta” (version 4.15.1), using a random effects model (see Balduzzi S. et al, Evid Based Ment Health 22:153-160, 2019). The Mantel-Haenszel formula was used to pool study cohort data, with between study variance being estimated with the DerSimonian and Laird procedure. Test statistics and confidence intervals were adjusted with the Hartung and Knapp method.
Figures were generated in RStudio (version 1.4.1103), with R packages “ggplot2” (version 3.3.2), “ggExtra” (version 0.9), “corrplot” (version 0.84), “ggridges” (version 0.5.2), “pheatmap” (version 1.0.12), “viridis” (version 0.5.1), “meta” (version 4.15-1) and “survminer” (version 0.4.8), as well as in Adobe Illustrator (version 25.1, Adobe) (see Balduzzi S. et al, Evid Based Ment Health 22:153-160, 2019; RStudio Team R, 2016; Wickham H., Springer-Verlag New York, 2016; Attali D. et al, R package version 0.9, 2019; Wei T. et al, R package “corrplot” 2017; Wilke C. O., R package version 0.5.2, 2020; Kolde R., R package version 1.0.12, 2019; Garnier S., R package version 0.5.1. 2018; Kassambara A. et al, R package version 0.4.8. 2020).
Baseline and follow-up data of all NDMM and pPCL patients in this study were systematically collected and curated in the context of nine registered clinical trials (Table 2). Baseline and follow-up data for the HOVON-65/GMMG-HD4, HOVON-87/NMSG-18 and HO143 trials were provided by Hemato-Oncology Foundation for Adults in the Netherlands (HOVON), for the Cassiopeia trial by the Intergroupe Francophone du Myelome (IFM) and for the EMN02/HO95 and EMN12/HO129 trials by EMN. For patients enrolled in the Total Therapy 2 and 3 protocols, these data were obtained from GEO (GSE24080), whereas clinical data from MRC IX trial patients were kindly shared by Dr. Walter Gregory.
Salmon TPM count data from the EMN02/HO95 and HO143 cohorts, as well as CEL files from the EMN02/HO95, EMN12/HO129 and Cassiopeia cohorts are available on the GEO repository (https://www.ncbi.nlm.nih.gov/geo/), under accession codes GSE164847, GSE164830, GSE164706, GSE164703 and GSE164701, respectively (see Table 2).
To investigate clinical and molecular determinants of PCL-like disease, baseline patient and tumor characteristics were collected from 297 NDMM and 51 pPCL patients (cohort 1).
Baseline CTC levels (median, 31% versus 0.016%, p<0.0001) and tumor burden as reflected by BM plasmacytosis (median, 64% versus 32%, p<0.0001) were both higher in pPCL than in NDMM patients.
pPCL patients presented with significantly higher morbidity than NDMM patients, including more hypercalcemia (24% versus 6%), renal failure (25% versus 4%) and soft tissue plasmacytoma (18% versus 3%), yet a lower occurrence of bone lesions (59% versus 81%) (false discovery rate (FDR)<0.05).
To enable a more comprehensive screening of tumor cell aberrations that associate with PCL-like disease, transcriptomic profiling was performed of BM tumor cells in a subgroup of 154 NDMM and 29 pPCL patients from cohort 1.
For the identification of essential genes defining this PCL-like transcriptome, cohort 1 was divided into a discovery (n=124) and validation set (n=59), including both NDMM and pPCL patients in each set.
By using the composite information of a selection of 54/1700 genes, a score was constructed with which pPCL could be best distinguished from NDMM samples: the score.
Since the score is a reflection of PCL-like disease, it was hypothesized that this information could be leveraged to identify NDMM tumors with a similar transcriptome to pPCL tumors. To this end, a threshold for the PCL-like classifier was set by selecting the minimal score to include all pPCL tumors in the discovery cohort.
To explore the prevalence of a PCL-like transcriptome in all stages of plasma cell malignancies, PCL-like status was determined in 1650 additional plasma cell samples (cohort 2).
To further characterize PCL-like MM, additional data were collected for 885 NDMM and pPCL patients from cohort 2. First, single sample gene set enrichment analysis (ssGSEA) scores were generated for each tumor sample, including 1788 canonical pathways. A comparison of these ssGSEA scores between subgroups showed that pPCL and i-MM were highly distinct at the transcriptomic level, whereas PCL-like MM and pPCL were very similar.
Also at the clinical and cytogenetic level, PCL-like MM was more similar to pPCL than i-MM. PCL-like MM only had a lower prevalence of R-ISS stage III (26% versus 56%) and ISS stage III (38% versus 75%) than pPCL, whereas i-MM differed from pPCL with respect to the presence of 14/25 investigated baseline characteristics, including del1p32 (8% versus 27%), del17p13 (10% versus 46%) and t(11;14) (17% versus 52%) (FDR<0.05).
For 28 pPCL patients, matched tumor samples from BM and PB were available. CTCs had a higher score than matched BM tumor samples (median, 7.42 versus 7.02, p=0.0045).
To investigate whether PCL-like transcriptomic status could be used as a novel molecular high-risk factor in NDMM, its association with PFS and OS was evaluated in 1540 NDMM patients from seven different phase 2 and 3 trial cohorts.
Multivariate regression analysis was performed to test if PCL-like transcriptomic status retained its prognostic value in the context of conventional high-risk markers in NDMM, independent of age and received treatment. This showed that PCL-like transcriptomic status significantly associated with both PFS and OS in the context of R-ISS stage, ISS stage, high-risk FISH, SKY92 high-risk status and UAMS70 high-risk status (Table 7).
PCL-like MM patients with R-ISS stage III (17/579, 3%) had a median OS of 13.7 months (95% CI, 6.8-41.1) versus not reached (95% CI, 87.8-NA) for i-MM patients with R-ISS stage I (104/579, 18%). Moreover, PCL-like MM patients with SKY92 high-risk disease (97/1540, 6%) had a median OS of 23.9 months (95% CI, 18.8-30.4) versus 87.8 months (95% CI, 81.2-NA) for i-MM patients with SKY92 standard-risk disease (1131/1540, 73%).
Number | Date | Country | Kind |
---|---|---|---|
21190230.9 | Aug 2021 | EP | regional |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/NL2022/050460 | 8/5/2022 | WO |