The field of the invention is methods of omics analysis for the prediction and analysis of progression from MDS (myelodysplastic syndrome) to AML (acute myeloid leukemia).
The following description includes information that may be useful in understanding the present invention. It is not an admission that any of the information provided herein is prior art or relevant to the presently claimed invention, or that any publication specifically or implicitly referenced is prior art.
All publications and patent applications identified herein are incorporated by reference to the same extent as if each individual publication or patent application were specifically and individually indicated to be incorporated by reference. Where a definition or use of a term in an incorporated reference is inconsistent or contrary to the definition of that term provided herein, the definition of that term provided herein applies and the definition of that term in the reference does not apply.
Myelodysplastic syndrome (MDS) constitutes a group of clonal hematopoietic disorders characterized by bone marrow failure, dysplasia, and an increased likelihood of progression to acute myeloid leukemia (AML). MDS is generally classified as “primary” (or de novo) and “treatment-related” (secondary to prior cytotoxic chemotherapy) and both are thought to arise due to abnormalities in hematopoietic stem cell self-renewal and differentiation.
Many different conditions are grouped together under the “MDS” umbrella based on common clinical characteristics, thus accounting for the wide heterogeneity observed. Diagnosis of patients with this disease can be difficult at times. Similarly, the assigning of prognosis and the selection of appropriate therapy require careful application of prognostic scoring systems taking into account clinical characteristics (e.g., cytopenias, age, performance status) and cytological parameters (e.g., blast count, morphology, karyotype). Factors such as poor cytogenetics are associated with decreased survival in MDS.
Several factors have been identified that can significantly impact the prognosis and selection of therapy for MDS patients, such as cytogenetics, patient performance status, and red blood cell (RBC) transfusion dependence. Numerous studies have shown that patient performance status is inversely associated with overall or event-free survival in patients receiving intensive chemotherapy for MDS or AML, particularly in older individuals. Appropriate diagnosis and classification of MDS depends on accurate assessments of both clinical features and laboratory/pathology findings (e.g., blast count, peripheral blood counts, cytogenetics). To this end, well-prepared bone marrow smears and biopsy specimens are essential. Unfortunately, such methods require significant time and review by trained professionals, adding significant cost.
More recently, various genetic conditions have been associated with treatment sensitivity, prognosis, survival time, etc. for MDS and AML. For example, patients with del(5q) MDS who failed to achieve sustained erythroid or cytogenetic remission after treatment with lenalidomide were shown to have an increased risk for clonal evolution and AML progression (see Ann Hematol. 2010 April; 89(4):365-74). In another study, the Wilms' tumor gene WT1 was reported to be a good marker for diagnosis of disease progression of myelodysplastic syndromes (see Leukemia 1999 March; 13(3):393-9), and a combined assessment of WT1 and BAALC gene expression at diagnosis was reported to possibly improve leukemia-free survival prediction in patients with myelodysplastic syndromes (see Leuk Res. 2015 August; 39(8):866-73). Similarly, individual mutations in the TET2 gene were reported to be diagnostic markers for MDS or AML as discussed in WO2010/087702.
In still further known tests, somatic, non-silent mutational signatures were reported to predict survivability of MDS as is discussed in US 2014/0127690, and WO 2013/056184 teaches methods for testing whether a drug, compound, diet, therapy or treatment is effective or efficacious for preventing, ameliorating, slowing the progress of, stopping or slowing the metastasis of, or for causing a full or partial remission of, a cancer, or a cancer stem cell, or a leukemia cancer stem cell. However, none of the known methods allows for a robust prediction of time of progression from MDS to AML.
Therefore, there is still a need for improved prognostic tests that can predict the time of progression from MDS to AML, which helps guide physicians in the selection of appropriate treatment options for patients diagnosed with MDS.
The inventive subject matter is directed to various methods in which the time for progression of MDS to AML can be predicted based on certain omics features, especially by using differentially expressed genes and/or inferred pathway activities in a regression-based model.
In one aspect of the inventive subject matter, the inventors contemplate a method of predicting time of progression from MDS to AML that includes a step of quantifying expression of a plurality of genes of a sample containing myelodysplastic cells, wherein the plurality of genes have an above-average difference between MDS and AML with respect to at least one of mRNA expression and inferred pathway activity. In another step, the plurality of genes having the above-average difference between MDS and AML is used in a prediction model to calculate a likely time of progression from MDS to AML.
While in some embodiments, the plurality of genes have an above-average difference between MDS and AML with respect to mRNA expression, in other embodiments the plurality of genes have an above-average difference between MDS and AML with respect to inferred pathway activity. It is further contemplated that the plurality of genes are selected from the group consisting of CHD4, GPATCH2L, FAM212A, EXT2, MACF1, RTKN, ZSCAN2, RNF220, YEATS2, ERGIC1, ZNF618, MBTD1, CXXC5, and DUSP10. Viewed from a different perspective, the prediction model may be based on a plurality of differentially expressed genes in which at least 50 genes are differentially expressed as determined by t-test and an alpha of 0.05 (as for example shown in
While not limiting to the inventive subject matter, the prediction model may be built using a regression algorithm, and more preferably a lasso least-angle regression algorithm. It is further preferred that the prediction model provides predictions up to at least 120 months, and/or that the step of quantifying expression of the plurality of genes uses whole transcriptome RNAseq data. Moreover, such methods may further include a step of identifying a druggable target in the whole transcriptome RNAseq data, and optionally a step of generating or updating a report with a treatment recommendation.
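The regression step can be illustrated with a minimal sketch. The text names a lasso least-angle regression; the code below instead uses coordinate descent, a different solver for the same L1-penalized least-squares objective, because it fits in a few lines of standard-library Python. All names (`fit_lasso`, `predict_months`), the penalty value, and the toy expression values and progression times are hypothetical and serve only to show the mechanics.

```python
from statistics import fmean


def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty (the L1 proximal step)."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def fit_lasso(X, y, alpha=0.1, n_iter=200):
    """Minimize (1/(2n)) * sum(residual^2) + alpha * sum(|w_j|) by coordinate descent.

    X is a list of samples, each a list of per-gene expression values;
    y is the observed time to progression (months) for each sample.
    """
    n, p = len(X), len(X[0])
    # Center features and target so the intercept can be recovered at the end.
    col_means = [fmean(row[j] for row in X) for j in range(p)]
    y_mean = fmean(y)
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    residual = [v - y_mean for v in y]
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            col = [Xc[i][j] for i in range(n)]
            norm = sum(c * c for c in col) / n
            if norm == 0.0:
                continue  # a constant feature carries no information
            # Covariance of feature j with the residual excluding its own contribution.
            rho = sum(col[i] * (residual[i] + w[j] * col[i]) for i in range(n)) / n
            new_w = soft_threshold(rho, alpha) / norm
            delta = new_w - w[j]
            if delta != 0.0:
                residual = [residual[i] - delta * col[i] for i in range(n)]
                w[j] = new_w
    intercept = y_mean - sum(w[j] * col_means[j] for j in range(p))
    return intercept, w


def predict_months(intercept, w, sample):
    """Predicted time to progression for one sample's expression values."""
    return intercept + sum(wj * xj for wj, xj in zip(w, sample))


# Hypothetical toy data: three genes, six samples. Only the first gene tracks the outcome.
expression = [
    [1.0, 7.0, 2.0],
    [2.0, 7.0, 0.0],
    [3.0, 7.0, 1.0],
    [4.0, 7.0, 1.0],
    [5.0, 7.0, 0.0],
    [6.0, 7.0, 2.0],
]
months = [22.0, 34.0, 46.0, 58.0, 70.0, 82.0]

intercept, weights = fit_lasso(expression, months, alpha=0.1)
print(weights)
print(predict_months(intercept, weights, [3.5, 7.0, 1.0]))
```

The L1 penalty drives the weights of uninformative genes to exactly zero, which is one way a model of this kind can end up depending on only a small number of genes.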
Therefore, in yet another aspect of the inventive subject matter, the inventors also contemplate a method of generating a model for predicting time for MDS to AML transition. Preferred methods will generally include a step of quantifying expression of a plurality of genes of a sample containing MDS cells, and another step of quantifying expression of a plurality of genes of a sample containing AML cells (typically performed using whole transcriptome RNAseq data). Optionally, inferred pathway activities are then calculated for the plurality of genes of the sample containing MDS cells and the plurality of genes of the sample containing AML cells. In yet another step, a plurality of genes are identified with an above-average difference between the MDS cells and the AML cells with respect to at least one of mRNA expression and inferred pathway activity, and the plurality of genes with the above-average difference between the MDS cells and the AML cells are used to build a prediction model that calculates a likely time of progression from MDS to AML.
Most typically, the plurality of genes have an above-average difference between MDS and AML with respect to mRNA expression and/or an above-average difference between MDS and AML with respect to inferred pathway activity. As noted above, it is contemplated that the prediction model may be based on a plurality of differentially expressed genes in which at least 50 genes are differentially expressed as determined by t-test and an alpha of 0.05. For example, suitable genes with above-average difference between the MDS cells and the AML cells include CHD4, GPATCH2L, FAM212A, EXT2, MACF1, RTKN, ZSCAN2, RNF220, YEATS2, ERGIC1, ZNF618, MBTD1, CXXC5, and DUSP10. In further contemplated aspects, the prediction model is built using a regression algorithm (e.g., lasso least-angle regression algorithm).
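The selection of differentially expressed genes by t-test at an alpha of 0.05 can be sketched as follows. The text does not state which t-test variant is used; the sketch assumes a paired test, on the assumption that MDS and AML samples come from the same patients. To stay within the standard library, significance is decided by comparing the statistic against tabulated two-sided critical values rather than by computing a p-value. The gene labels, expression values, and function names are hypothetical.

```python
from math import sqrt
from statistics import fmean, stdev

# Two-sided critical values of Student's t at alpha = 0.05, keyed by degrees of freedom.
T_CRIT_05 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
             6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}


def paired_t_statistic(mds_values, aml_values):
    """Paired t statistic for AML minus MDS expression of a single gene."""
    diffs = [a - m for m, a in zip(mds_values, aml_values)]
    sd = stdev(diffs)
    if sd == 0.0:
        return None  # statistic is undefined when all differences are identical
    return fmean(diffs) / (sd / sqrt(len(diffs)))


def select_genes(expression):
    """Return genes whose paired difference is significant at alpha = 0.05."""
    selected = []
    for gene, (mds, aml) in expression.items():
        t = paired_t_statistic(mds, aml)
        df = len(mds) - 1  # assumed to be within the range covered by the table
        if t is not None and abs(t) > T_CRIT_05[df]:
            selected.append(gene)
    return selected


# Hypothetical paired expression values for five patients: (MDS, AML) per gene.
toy_expression = {
    "GENE_A": ([8.0, 7.5, 9.0, 8.2, 7.8], [5.0, 4.9, 6.2, 5.1, 5.0]),
    "GENE_B": ([6.0, 6.5, 5.8, 6.1, 6.3], [6.2, 6.1, 6.0, 5.9, 6.4]),
}

print(select_genes(toy_expression))
```

When many genes are tested at once, a correction for multiple comparisons would normally be considered; the text does not specify one, so none is applied here.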
Various objects, features, aspects and advantages of the inventive subject matter will become more apparent from the following detailed description of preferred embodiments, along with the accompanying drawing figures in which like numerals represent like components.
The inventors have now discovered that the time for progression of MDS to AML can be predicted with relatively high accuracy using a predictive algorithm that is built on differentially expressed genes and/or genes with differential pathway activity. Notably, differential expression and/or differential pathway activity of selected genes held significantly stronger predictive power than overall mutation rates, single gene mutations, and presence or type of neoepitopes generated by mutations in MDS in the progression to AML. The inventors also discovered that while the coding clonal mutational burden in MDS was relatively low, there was a pervasive significant change in overall gene expression (with the exception of CD34) as the disease moved from MDS to AML.
With respect to specific mutations in selected genes, the inventors also discovered a small subset of mutations that may be associated (causally or indirectly) with the progression of MDS to AML. Specifically, and as is shown in more detail below, most AML cells exhibited higher expression of Myc, FLT3 (which also showed higher expression in Myb), and APF2. On the other hand, there was substantial downregulation of FOXM1 as the disease progressed, as well as reduced expression of GATA1.
Thus, on the basis of these observations, various manners of predicting progression, and especially time of progression, of MDS to AML are contemplated. In most preferred aspects, prediction will not simply be predicated on the quantification of a single marker, as variability with a single marker would be unlikely to provide a graduated prediction (e.g., within a time resolution of 3 months, 2 months, or 1 month, or 2 weeks, or even 1 week). Therefore, the inventors investigated whether a multi-factorial analysis using the most differentially expressed genes and/or pathway activities could be used to produce a prediction model that can provide information on the likely time required for a patient to progress from MDS to AML. Such graduated information is especially important for the choice of an appropriate treatment. In addition, a multi-factorial predictive algorithm is also advantageous as MDS is a collection of various sub-diseases for which individual diagnostic and prognostic markers are difficult to identify.
Based on the unexpected discovery that many genes had a negative expression bias upon transition from MDS to AML, the inventors investigated whether or not there was a differential expression pattern to one or more genes. Notably, and as shown in more detail below, genes with significant differential expression between MDS and AML served as statistically meaningful features in machine learning in an analysis that correlated time to progress from MDS to AML with expression values of these genes. As a consequence, a statistical model could be defined that allowed prediction of MDS to AML progression in a quantitative manner (as opposed to simply diagnosing a state of MDS or AML). Surprisingly, and as also shown in more detail below, the resultant model was relatively simple and required only relatively low numbers of expression data of selected genes.
In a first attempt to identify a predictive marker of progression of MDS to AML, the inventors compared patient data with different times of progression and mutational burden, and particularly mutational burden of genetic sequences that encode proteins. Omics analysis was performed using whole genome sequencing of MDS and AML cells from the same patient, and incremental location guided synchronous alignment using BAMBAM, as for example described in U.S. Pat. No. 9,721,062.
When analyzing the mutational changes for all genes as a possible guide for predicting transition time of MDS to AML, the inventors noted that several genes had a significant differential mutational burden. Interestingly, some genes lost mutations in the progression of MDS to AML, while other genes gained mutations, as is exemplarily shown in Table 1. Notably, several patients had FLT3 and IDH1 mutations. Moreover, it was noted that large genes such as NBPF genes were more affected, possibly due to mutations by chance. Therefore, these mutations appear to represent passenger mutations rather than driver mutations. While significant in terms of specificity, these mutational changes were not sufficient for a quantitative predictive model. Most notably, the shutting down of a great number of genes at the AML stage would be consistent with a situation in which a blast population emerges whose cells complete two milestones: they do not differentiate and they do not apoptose. Thus, those specific genes and pathways are deemed to have significance for diagnostic and prognostic use. Examples include genes associated with viability, such as the BCL2 family, and genes associated with apoptosis, such as the caspase pathway or the pro-inflammatory cytokine cascade. Involvement of ribosomal proteins, and their dosage effect through haploinsufficiency rather than genetic mutation, has been established in MDS and has also been found in congenital anemias. Ribosomal issues link congenital and acquired anemias.
Using the same comparative whole genome analysis and further considering expression of the mutated sequences, the inventors further investigated whether or not neoepitopes in coding and expressed DNA segments could serve as a basis for a quantitative predictive model, and exemplary results are shown in
Surprisingly, however, the inventors observed upon analysis of gene expression that a substantial portion of genes were expressed to a significantly lower degree as can be seen in the graph of
To that end, the inventors investigated, on the basis of RNAseq data (and in some cases also whole genome or exome sequencing data), which of the differentially expressed genes had a significant and strong difference in expression. Moreover, the inventors also used the function of the differentially expressed genes in a pathway analysis algorithm to identify those expressed genes that produced the largest difference in inferred pathway activity. More specifically, the inventors determined the effect of the differentially expressed genes using a pathway recognition algorithm using data integration on genetic models as is described in WO 2013/062505. Of course, it should be appreciated that numerous alternative pathway analysis models are also deemed suitable, and all known pathway analysis models are contemplated herein.
More specifically, Table 2 lists the genes with the largest median paired differences of mRNA expression (AML versus MDS), while Table 3 lists the genes with the largest median paired differences of inferred pathway activity (AML versus MDS). Table 4 lists the genes with the largest median inferred pathway activity (AML normalized to paired MDS).
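The ranking by median paired difference described for Tables 2 and 3 can be sketched as below. For each gene, the AML value minus the MDS value is computed per patient, the median of those differences is taken, and genes are ordered by the magnitude of that median. The same computation applies whether the values are mRNA expression or inferred pathway activity. The gene labels, values, and function names are hypothetical.

```python
from statistics import median


def median_paired_difference(mds_values, aml_values):
    """Median over patients of (AML value - MDS value) for one gene."""
    return median(a - m for m, a in zip(mds_values, aml_values))


def rank_genes(paired_values, top_n=2):
    """Return the top_n (gene, median difference) pairs by absolute median difference."""
    scores = {
        gene: median_paired_difference(mds, aml)
        for gene, (mds, aml) in paired_values.items()
    }
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)[:top_n]


# Hypothetical paired values for three patients: (MDS, AML) per gene.
toy_values = {
    "GENE_X": ([2.0, 3.0, 2.5], [0.5, 1.0, 1.5]),
    "GENE_Y": ([1.0, 1.2, 0.9], [1.1, 1.0, 1.0]),
    "GENE_Z": ([0.0, 0.5, 0.25], [3.0, 2.5, 3.25]),
}

ranked = rank_genes(toy_values)
print(ranked)
```

Using the median rather than the mean keeps a single outlying patient from dominating the ranking, which matters when the cohort is small.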
As can be readily taken from the data and Tables 2-4 above, significant differences in gene expression and changes in inferred pathway activity were discovered. As such the changed genes could be employed in a model to differentiate between MDS and AML, and/or to predict progression time and/or likelihood of progression. Moreover, the inventors noted that selected genes with high differential expression and/or differences in inferred pathway activity were transcription factors or closely related to transcription factors and/or targets of these factors. Therefore, in at least some aspects of the inventive subject matter, the inventors contemplate use of these genes and/or targets of these factors in a diagnostic and/or predictive model for MDS/AML transition.
More specifically, in one example, 4/26 samples were held out for validation. Three normalizations were compared and ten regression algorithms were tested in a 6-fold cross-validation. As is shown in
It should be noted that any language directed to a computer should be read to include any suitable combination of computing devices, including servers, interfaces, systems, databases, agents, peers, engines, controllers, or other types of computing devices operating individually or collectively. One should appreciate the computing devices comprise a processor configured to execute software instructions stored on a tangible, non-transitory computer readable storage medium (e.g., hard drive, solid state drive, RAM, flash, ROM, etc.). The software instructions preferably configure the computing device to provide the roles, responsibilities, or other functionality as discussed below with respect to the disclosed apparatus. In especially preferred embodiments, the various servers, systems, databases, or interfaces exchange data using standardized protocols or algorithms, possibly based on HTTP, HTTPS, AES, public-private key exchanges, web service APIs, known financial transaction protocols, or other electronic information exchanging methods. Data exchanges preferably are conducted over a packet-switched network, the Internet, LAN, WAN, VPN, or other type of packet switched network.
In some embodiments, the numerical parameters should be construed in light of the number of reported significant digits and by applying ordinary rounding techniques. Notwithstanding that the numerical ranges and parameters setting forth the broad scope of some embodiments of the invention are approximations, the numerical values set forth in the specific examples are reported as precisely as practicable. The numerical values presented in some embodiments of the invention may contain certain errors necessarily resulting from the standard deviation found in their respective testing measurements. Moreover, and unless the context dictates the contrary, all ranges set forth herein should be interpreted as being inclusive of their endpoints, and open-ended ranges should be interpreted to include commercially practical values. Similarly, all lists of values should be considered as inclusive of intermediate values unless the context indicates the contrary.
It should be apparent to those skilled in the art that many more modifications besides those already described are possible without departing from the inventive concepts herein. The inventive subject matter, therefore, is not to be restricted except in the scope of the appended claims. Moreover, in interpreting both the specification and the claims, all terms should be interpreted in the broadest possible manner consistent with the context. In particular, the terms “comprises” and “comprising” should be interpreted as referring to elements, components, or steps in a non-exclusive manner, indicating that the referenced elements, components, or steps may be present, or utilized, or combined with other elements, components, or steps that are not expressly referenced. As used in the description herein and throughout the claims that follow, the meaning of “a,” “an,” and “the” includes plural reference unless the context clearly dictates otherwise. Also, as used in the description herein, the meaning of “in” includes “in” and “on” unless the context clearly dictates otherwise.
This application claims priority to U.S. provisional applications with the Ser. No. 62/413,917, filed Oct. 27, 2016, and 62/429,036, filed Dec. 1, 2016.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2017/058793 | 10/27/2017 | WO | 00 |
Number | Date | Country | |
---|---|---|---|
62413917 | Oct 2016 | US | |
62429036 | Dec 2016 | US |