The present invention provides methods and devices for prognosing chronic lymphocytic leukemia (CLL) using one or more markers, as well as methods of treating CLL using, for example, a modulator of SF3B1 activity.
Chronic lymphocytic leukemia (CLL) remains incurable and displays vast clinical heterogeneity despite a common diagnostic immunophenotype (surface expression of CD19+CD20+dimCD5+CD23+ and sIgMdim). While some patients experience an indolent disease course, approximately half have steadily progressive disease leading to significant morbidity and mortality (Zenz, Nat Rev Cancer, 2010, 10:37-50). Our ability to predict a more aggressive disease course has improved through the use of biologic markers (such as presence of somatic hypermutation of the immunoglobulin heavy chain variable region [IGHV status] and ZAP70 expression), and detection of cytogenetic abnormalities (such as deletions in chromosomes 11q, 13q, and 17p and trisomy 12) (Rassenti, N Engl J Med, 2004, 351:893-901; Dohner, N Engl J Med, 2000, 343:1910-6). Still, prediction of disease course is not highly reliable. Accordingly a need exists for the identification of biomarkers that can predict aggressive disease progression in patients with CLL.
The invention provides, inter alia, prognostic factors for chronic lymphocytic leukemia (CLL). An example of such a prognostic factor is SF3B1. According to some aspects of the invention, it has been found unexpectedly that the presence of an SF3B1 mutation in a CLL sample indicates a poor prognosis. Detection of SF3B1 mutations may dictate, in some instances, an altered treatment, including but not limited to an aggressive treatment. The invention contemplates integrating SF3B1 mutation status into predictive and prognostic algorithms that currently use other markers, given the now recognized value of SF3B1 as an independent prognostic factor. SF3B1 mutation status can be used together with other factors, such as ZAP70 expression status and mutated IGHV status, to more accurately determine disease progression and likelihood of response to treatment, among other things. Other such prognostic factors include HIST1H1E, NRAS, BCOR, RIPK1, SAMHD1, KRAS, MED12, ITPKB, and EGR2.
In one aspect, the invention provides methods of determining a treatment regimen for a subject having CLL by identifying a mutation in the SF3B1 gene in a subject sample. The presence of one or more mutations in the SF3B1 gene may indicate that the subject should receive an alternative treatment regimen (compared to a prior treatment regimen administered to the patient). In some embodiments, the presence of one or more mutations in the SF3B1 gene indicates that the subject should receive an aggressive treatment regimen (for example a treatment that is more aggressive than a prior treatment administered to the patient). In some embodiments, the presence of one or more mutations in the SF3B1 gene indicates that the subject should receive a treatment that acts through a different mechanism than a prior treatment or a modality that is different from a prior treatment.
In another aspect, the invention provides methods of determining whether a subject having CLL would derive a clinical benefit of early treatment by identifying a mutation in the SF3B1 gene in a subject sample. The presence of one or more mutations in the SF3B1 gene indicates that the subject would derive a clinical benefit of early treatment.
In a further aspect, the invention provides methods of predicting survivability of a subject having CLL by identifying a mutation in the SF3B1 gene in a subject sample. The presence of one or more mutations in the SF3B1 gene indicates the subject is less likely to survive or has a poor clinical prognosis.
Also included in the invention is a method of identifying a candidate subject for a clinical trial for a treatment protocol for CLL by identifying a mutation in the SF3B1 gene in a subject sample. The presence of one or more mutations in the SF3B1 gene indicates that the subject is a candidate for the clinical trial.
In some embodiments, the mutation is a missense mutation. In some embodiments, the mutation is an R625L, an N626H, a K700E, a G740E, a K741N or a Q903R mutation in the SF3B1 polypeptide. In some embodiments, the mutation is an E622D, an R625G, a Q659R, a K666Q, a K666E, or a G742D mutation in the SF3B1 polypeptide. It is to be understood that the invention contemplates detection of nucleic acid mutations that correspond to the various amino acid mutations recited herein. In some embodiments, the mutation in the SF3B1 gene is within exons 14-17 of the SF3B1 gene.
In some embodiments, the method further comprises detecting at least one other CLL-associated marker. In some embodiments, the at least one other CLL-associated marker is mutated IGHV status or ZAP70 expression status.
In some embodiments, the method further comprises detecting (or identifying) at least one CLL-associated chromosomal abnormality. In some embodiments, the at least one CLL-associated chromosomal abnormality is selected from the group consisting of 8p deletion, 11q deletion, 13q deletion, 17p deletion, trisomy 12, monosomy 13, and rearrangements of chromosome 14.
The invention further contemplates methods related to those recited above but wherein mutations in one or more of HIST1H1E, NRAS, BCOR, RIPK1, SAMHD1, KRAS, MED12, ITPKB, and EGR2 genes are analyzed.
Any of the foregoing methods may further comprise analyzing genomic DNA for the presence of mutations in one or more of TP53, ATM, MYD88, NOTCH1, DDX3X, ZMYM3, FBXW7, XPO1, CHD2, and POT1.
In yet another aspect the invention provides methods of treating or alleviating a symptom of CLL by administering to a subject a compound that modulates SF3B1. Such a compound may inhibit or activate SF3B1 activity or may alter SF3B1 expression. The compound may be, for example, spliceostatin, E7107, or pladienolide.
In another aspect, the invention provides a kit comprising (i) a first reagent that detects a mutation in an SF3B1 gene; (ii) optionally, a second reagent that detects at least one other CLL-associated marker; (iii) optionally, a third reagent that detects at least one CLL-associated chromosomal abnormality; and (iv) instructions for their use. The mutations in (i), (ii), and (iii) may be any of the foregoing recited mutations. The invention further provides other related kits in which the first reagent detects mutations in a risk allele selected from the group consisting of HIST1H1E, NRAS, BCOR, RIPK1, SAMHD1, KRAS, MED12, ITPKB, and EGR2. The second reagent may be a reagent that detects mutations in TP53, ATM, MYD88, NOTCH1, DDX3X, ZMYM3, FBXW7, XPO1, CHD2, or POT1. The third reagent may be a reagent that detects an 8p deletion, 11q deletion, 13q deletion, 17p deletion, trisomy 12, monosomy 13, or a rearrangement of chromosome 14. The kit may comprise one or more first reagents (specific for the same or different risk alleles), one or more second reagents (specific for the same or different risk alleles), and one or more third reagents (specific for the same or different risk alleles).
In some embodiments, the first, second and third reagents are polynucleotides that are capable of hybridizing to the genes or chromosomes of (i), (ii) and/or (iii), wherein said polynucleotides are optionally linked to a detection label. The binding pattern of these polynucleotides denotes the presence or absence of the above-noted mutations.
The invention is further premised in part on the discovery that the clonal (including subclonal) profile of a CLL has independent prognostic value. It has been found that the presence of particular mutations, referred to herein as drivers, in CLL subclones is indicative of more rapid disease progression, greater likelihood of relapse, and shorter remission times. The ability to analyze a CLL sample for the presence of subclonal populations and more importantly drivers in the subclonal populations informs the subject and the medical practitioner about the likely disease course, and thereby influences decisions relating to whether to treat a subject or to delay treatment of the subject, the nature of the treatment (e.g., relative to prior treatment), and the timing and frequency of the treatment.
Some aspects of this disclosure therefore relate to the surprising discovery that the clonal heterogeneity of CLL in a subject is prognostic of the course of the disease, and informs decisions regarding treatment. In some aspects, the disclosure provides novel, independent prognostic markers of CLL. The invention provides methods and apparatuses for detection of one or more of these independent prognostic factors. In some aspects, the presence of one or more of these independent prognostic markers in a CLL sample, and particularly in a subclonal population, alone or in combination with other CLL prognostic markers whether or not in subclonal populations, indicates the severity or aggressiveness of the disease, and informs the type, timing, and degree of treatment to be prescribed for a patient.
These independent prognostic factors include mutations in a risk allele selected from the group consisting of SF3B1, HIST1H1E, NRAS, BCOR, RIPK1, SAMHD1, KRAS, MED12, ITPKB, EGR2, DDX3X, ZMYM3, FBXW7, ATM, TP53, MYD88, NOTCH1, XPO1, CHD2, and POT1, and mutations that are selected from the group consisting of del(8p), del(13q), del(11q), del(17p), and trisomy 12. Any combination of two or more of these mutations may be used, in some methods of the invention. In some embodiments where two or more mutations are analyzed, at least one of those mutations is selected from the group consisting of HIST1H1E, NRAS, BCOR, RIPK1, SAMHD1, KRAS, MED12, ITPKB, and EGR2, and optionally also including SF3B1.
In some embodiments, the independent prognostic factors include subclonal mutations in any one of HIST1H1E, NRAS, BCOR, RIPK1, SAMHD1, KRAS, MED12, ITPKB, EGR2, DDX3X, ZMYM3, FBXW7, NOTCH1, XPO1, CHD2, POT1, del(8p), del(11q), and del(17p). Additional independent prognostic factors include subclonal mutations in SF3B1, MYD88, and TP53 and subclonal del(13q) and subclonal trisomy 12.
In another aspect, the invention provides a method comprising (a) analyzing genomic DNA in a sample obtained from a subject having or suspected of having CLL for the presence of a mutation in a risk allele, (b) determining whether the mutation is clonal or subclonal (i.e., whether the mutation is present in a clonal population of CLL cells or a subclonal population of CLL cells), and optionally (c) identifying the subject as a subject at elevated risk of having CLL with rapid disease progression if the mutation is a driver event and subclonal.
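The decision logic of steps (a) through (c) may be sketched in Python as follows. This is an illustrative sketch only, not a definitive implementation: the `Mutation` record, its `cancer_cell_fraction` field, and the 0.95 clonality cutoff are hypothetical assumptions introduced here and are not values stated in this disclosure. The list of driver events is taken from the risk alleles recited herein.

```python
from dataclasses import dataclass

# Driver events recited in this disclosure (genes and cytogenetic alterations).
DRIVER_EVENTS = frozenset({
    "SF3B1", "HIST1H1E", "NRAS", "BCOR", "RIPK1", "SAMHD1", "KRAS", "MED12",
    "ITPKB", "EGR2", "TP53", "ATM", "MYD88", "NOTCH1", "DDX3X", "ZMYM3",
    "FBXW7", "XPO1", "CHD2", "POT1",
    "del(8p)", "del(13q)", "del(11q)", "del(17p)", "trisomy 12",
})

# Hypothetical cutoff (an assumption for illustration): a mutation estimated
# to be present in at least this fraction of CLL cells is treated as clonal.
CLONAL_CUTOFF = 0.95


@dataclass(frozen=True)
class Mutation:
    event: str                   # gene symbol or cytogenetic alteration
    cancer_cell_fraction: float  # estimated fraction of CLL cells carrying it


def is_subclonal(mutation: Mutation, cutoff: float = CLONAL_CUTOFF) -> bool:
    """Step (b): call the mutation subclonal if it falls below the cutoff."""
    return mutation.cancer_cell_fraction < cutoff


def elevated_risk(mutations, cutoff: float = CLONAL_CUTOFF) -> bool:
    """Step (c): flag the subject if any detected driver event is subclonal."""
    return any(
        m.event in DRIVER_EVENTS and is_subclonal(m, cutoff) for m in mutations
    )


# Hypothetical subject: a clonal del(13q) and a subclonal SF3B1 mutation.
detected = [Mutation("del(13q)", 1.0), Mutation("SF3B1", 0.40)]
flagged = elevated_risk(detected)
```

In this sketch the subject is flagged because the SF3B1 mutation is both a driver event and below the assumed cutoff; a subject whose only driver events are clonal would not be flagged.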
In some embodiments, the risk allele is selected from SF3B1, HIST1H1E, NRAS, BCOR, RIPK1, SAMHD1, KRAS, MED12, ITPKB, EGR2, TP53, ATM, MYD88, NOTCH1, DDX3X, ZMYM3, FBXW7, XPO1, CHD2, and POT1. In some embodiments, the risk allele is selected from SF3B1, HIST1H1E, NRAS, BCOR, RIPK1, SAMHD1, KRAS, MED12, ITPKB, EGR2, TP53, MYD88, NOTCH1, DDX3X, ZMYM3, FBXW7, XPO1, CHD2, and POT1. In some embodiments, the risk allele is selected from HIST1H1E, NRAS, BCOR, RIPK1, SAMHD1, KRAS, MED12, ITPKB, EGR2, NOTCH1, DDX3X, ZMYM3, FBXW7, XPO1, CHD2, and POT1. In some embodiments, the risk allele is selected from HIST1H1E, NRAS, BCOR, RIPK1, SAMHD1, KRAS, MED12, ITPKB, and EGR2.
In some embodiments, the risk allele is selected from del(8p), del(13q), del(11q), del(17p), and trisomy 12. In some embodiments, the risk allele is selected from del(8p), del(11q), and del(17p).
In some embodiments, the method comprises analyzing genomic DNA for (a) a mutation in one or more risk alleles selected from the group consisting of SF3B1, HIST1H1E, NRAS, BCOR, RIPK1, SAMHD1, KRAS, MED12, ITPKB, EGR2, DDX3X, ZMYM3, FBXW7, ATM, TP53, MYD88, NOTCH1, XPO1, CHD2, and POT1, and/or (b) a mutation that is selected from the group consisting of del(8p), del(13q), del(11q), del(17p), and trisomy 12.
In some embodiments, the method comprises analyzing genomic DNA for (a) a mutation in one or more risk alleles selected from the group consisting of SF3B1, HIST1H1E, NRAS, BCOR, RIPK1, SAMHD1, KRAS, MED12, ITPKB, EGR2, DDX3X, ZMYM3, FBXW7, TP53, MYD88, NOTCH1, XPO1, CHD2, and POT1, and/or (b) a mutation that is selected from the group consisting of del(8p), del(13q), del(11q), del(17p), and trisomy 12.
In some embodiments, the method comprises analyzing genomic DNA for (a) a mutation in one or more risk alleles selected from the group consisting of HIST1H1E, NRAS, BCOR, RIPK1, SAMHD1, KRAS, MED12, ITPKB, EGR2, DDX3X, ZMYM3, FBXW7, NOTCH1, XPO1, CHD2, and POT1, and/or (b) a mutation that is selected from the group consisting of del(8p), del(11q), and del(17p).
In some embodiments, the method comprises analyzing genomic DNA for a mutation in one or more risk alleles selected from the group consisting of HIST1H1E, NRAS, BCOR, RIPK1, SAMHD1, KRAS, MED12, ITPKB, and EGR2.
In some embodiments, the method comprises analyzing genomic DNA for the presence of a mutation in one or more of at least 2 risk alleles chosen from the group consisting of SF3B1, HIST1H1E, NRAS, BCOR, RIPK1, SAMHD1, KRAS, MED12, ITPKB, EGR2, DDX3X, ZMYM3, FBXW7, ATM, TP53, MYD88, NOTCH1, XPO1, CHD2, POT1, del(8p), del(13q), del(11q), del(17p), and trisomy 12.
In some embodiments, the method comprises analyzing genomic DNA for the presence of a mutation in one or more of at least 2 risk alleles chosen from the group consisting of SF3B1, HIST1H1E, NRAS, BCOR, RIPK1, SAMHD1, KRAS, MED12, ITPKB, EGR2, DDX3X, ZMYM3, FBXW7, TP53, MYD88, NOTCH1, XPO1, CHD2, POT1, del(8p), del(13q), del(11q), del(17p), and trisomy 12.
In some embodiments, the method comprises analyzing genomic DNA for the presence of a mutation in one or more of at least 2 risk alleles chosen from the group consisting of HIST1H1E, NRAS, BCOR, RIPK1, SAMHD1, KRAS, MED12, ITPKB, EGR2, DDX3X, ZMYM3, FBXW7, NOTCH1, XPO1, CHD2, POT1, del(8p), del(11q), and del(17p).
In some embodiments, the method comprises analyzing genomic DNA for the presence of a mutation in one or more of at least 2 risk alleles chosen from the group consisting of HIST1H1E, NRAS, BCOR, RIPK1, SAMHD1, KRAS, MED12, ITPKB, and EGR2.
At least 2 intends and embraces at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, or at least 10. In some embodiments, the at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, or at least 9 of the risk alleles analyzed are selected from the group consisting of HIST1H1E, NRAS, BCOR, RIPK1, SAMHD1, KRAS, MED12, ITPKB, and EGR2.
In another aspect, the invention provides a method comprising (a) detecting a mutation in genomic DNA from a sample obtained from a subject having or suspected of having CLL, (b) detecting clonal and/or subclonal populations of cells carrying the mutation, and optionally (c) identifying the subject as a subject at elevated risk of having CLL with rapid disease progression if the mutation is a driver event present in a subclonal population of cells.
In another aspect, the invention provides a method comprising detecting, in genomic DNA of a sample from a subject having or suspected of having CLL, presence or absence of a mutation in a risk allele selected from the group consisting of SF3B1, HIST1H1E, NRAS, BCOR, RIPK1, SAMHD1, KRAS, MED12, ITPKB, EGR2, DDX3X, ZMYM3, FBXW7, ATM, TP53, MYD88, NOTCH1, XPO1, CHD2, and POT1 and/or a mutation that is selected from the group consisting of del(8p), del(13q), del(11q), del(17p), and trisomy 12, and determining if the mutation, if present, is in a subclonal population of the CLL sample. In some embodiments, the mutation is in a risk allele selected from the group consisting of SF3B1, HIST1H1E, NRAS, BCOR, RIPK1, SAMHD1, KRAS, MED12, ITPKB, EGR2, DDX3X, ZMYM3, FBXW7, TP53, MYD88, NOTCH1, XPO1, CHD2, and POT1. In some embodiments, the mutation is in a risk allele selected from the group consisting of HIST1H1E, NRAS, BCOR, RIPK1, SAMHD1, KRAS, MED12, ITPKB, EGR2, DDX3X, ZMYM3, FBXW7, NOTCH1, XPO1, CHD2, and POT1. In some embodiments, the mutation is in a risk allele selected from the group consisting of HIST1H1E, NRAS, BCOR, RIPK1, SAMHD1, KRAS, MED12, ITPKB, and EGR2. In some embodiments, the mutation is selected from the group consisting of del(8p), del(11q), and del(17p).
Various embodiments apply equally to the foregoing methods and these are recited now for brevity.
The methods of the invention are typically performed on a sample obtained from a subject and are in vitro methods. In some embodiments, the sample is obtained from peripheral blood, bone marrow, or lymph node tissue. In some embodiments, the genomic DNA is analyzed using whole genome sequencing (WGS), whole exome sequencing (WES), single nucleotide polymorphism (SNP) analysis, deep sequencing, targeted gene sequencing, or any combination thereof. These techniques may be used in whole or in part to detect the mutations and the subclonal nature of the mutations.
In some embodiments, the methods further comprise treating a subject identified as a subject at elevated risk of having CLL with rapid disease progression. In some embodiments, the methods further comprise delaying treatment of the subject for a specified or unspecified period of time (e.g., months or years). In some embodiments, the methods are performed before and after treatment. In some embodiments, the methods are repeated every 6 months or if there is a change in clinical status. In some embodiments, genomic DNA is analyzed for mutations in more than one risk allele.
In some embodiments, the method analyzes genomic DNA for mutations in two or more of the HIST1H1E, NRAS, BCOR, RIPK1, SAMHD1, KRAS, MED12, ITPKB, and EGR2 genes, including three or more, four or more, five or more, six or more, seven or more, eight or more, or all nine of the genes.
Any of the foregoing subclonal driver methods may be combined with detection of mutations in other genes (or gene loci or chromosomal regions) regardless of whether these latter mutations are clonal or subclonal. For example, the methods may comprise detection of mutations in one or more of TP53, ATM, MYD88, SF3B1, NOTCH1, DDX3X, ZMYM3, FBXW7, XPO1, CHD2, POT1, del(8p), del(13q), del(11q), del(17p), and trisomy 12, without determining the clonal or subclonal nature of such mutations.
In another aspect, the invention provides a kit comprising reagents for detecting (1) mutations in one or more risk alleles selected from the group consisting of SF3B1, HIST1H1E, NRAS, BCOR, RIPK1, SAMHD1, KRAS, MED12, ITPKB, EGR2, DDX3X, ZMYM3, FBXW7, XPO1, CHD2, POT1, TP53, MYD88, NOTCH1, and ATM, and/or (2) mutations selected from the group consisting of del(8p), del(13q), del(11q), del(17p), and trisomy 12, in a sample obtained from a patient.
In another aspect, the invention provides a kit comprising reagents for detecting (1) mutations in one or more risk alleles selected from the group consisting of SF3B1, HIST1H1E, NRAS, BCOR, RIPK1, SAMHD1, KRAS, MED12, ITPKB, EGR2, DDX3X, ZMYM3, FBXW7, XPO1, CHD2, POT1, TP53, MYD88, and NOTCH1, and/or (2) mutations selected from the group consisting of del(8p), del(13q), del(11q), del(17p), and trisomy 12, in a sample obtained from a patient.
In another aspect, the invention provides a kit comprising reagents for detecting (1) mutations in one or more risk alleles selected from the group consisting of HIST1H1E, NRAS, BCOR, RIPK1, SAMHD1, KRAS, MED12, ITPKB, EGR2, DDX3X, ZMYM3, FBXW7, XPO1, CHD2, POT1, and NOTCH1, and/or (2) mutations selected from the group consisting of del(8p), del(11q), and del(17p), in a sample obtained from a patient.
The kit may comprise reagents for detecting only mutations in (1) or only mutations in (2), or any combination thereof. In some embodiments, the kit comprises reagents for detecting mutations in at least one, two, three, four, five, six, seven, eight, or nine risk alleles selected from the group consisting of HIST1H1E, NRAS, BCOR, RIPK1, SAMHD1, KRAS, MED12, ITPKB, and EGR2. In some embodiments, the kit is used to determine whether the mutation is a subclonal mutation. In some embodiments, the kit comprises instructions for determining whether the mutation is a subclonal mutation. In some embodiments, the subclonal mutation is in at least one, two, three, four, five, six, seven, eight, nine or ten risk alleles selected from the group consisting of SF3B1, HIST1H1E, NRAS, BCOR, RIPK1, SAMHD1, KRAS, MED12, ITPKB, TP53, MYD88, NOTCH1, DDX3X, ZMYM3, FBXW7, XPO1, CHD2, POT1, and EGR2. In some embodiments, the kit comprises instructions for the prognosis of the patient based on presence or absence of subclonal mutations, wherein the presence of a subclonal mutation indicates the patient has an elevated risk of rapid CLL disease progression. The kits are therefore useful in determining prognosis of a patient with CLL.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention pertains. Although methods and materials similar or equivalent to those described herein can be used in the practice of the present invention, suitable methods and materials are described below. All publications, patent applications, patents, and other references mentioned herein are expressly incorporated by reference in their entirety. In cases of conflict, the present specification, including definitions, will control. In addition, the materials, methods, and examples described herein are illustrative only and are not intended to be limiting.
Other features and advantages of the invention will be apparent from and encompassed by the following detailed description and claims.
The invention is based, in part, upon the surprising discovery that patients with chronic lymphocytic leukemia (CLL) who harbor mutations in the SF3B1 gene and certain other genes demonstrate a significantly shorter time to first therapy, signifying a more aggressive disease course. This is particularly the case if such mutations are subclonal. Furthermore, a Cox multivariable regression model for clinical factors contributing to an earlier time to first therapy in a series of 91 CLL samples revealed that SF3B1 mutation was predictive of shorter time to requiring treatment, independent of other established predictive markers such as IGHV mutation, presence of del(17p) or ATM mutation. Accordingly, mutations in the SF3B1 and certain other genes are prognostic markers of disease aggressiveness in CLL patients.
Ninety-one CLL samples, consisting of 88 exomes and 3 genomes, representing the broad clinical spectrum of CLL were analyzed. Nine driver genes in six distinct pathways involved in pathogenesis of this disease were identified. These driver genes included TP53, ATM, MYD88, SF3B1, NOTCH1, DDX3X, ZMYM3, and FBXW7. Moreover, novel associations with prognostic markers that shed light on the biology underlying this clinically heterogeneous disease were discovered.
These data led to several general conclusions. First, similar to other hematologic malignancies (Ley, Nature 2008; 456:66-72), the somatic mutation rate is lower in CLL than in most solid tumors (Fabbri, J Exp Med, 2011; Puente, Nature, 2011). Second, the rate of non-synonymous mutation was not strongly affected by therapy. Third, in addition to expected mutations in cell cycle and DNA repair pathways, genetic alterations were found in Notch signaling, inflammatory pathways and RNA splicing and processing. Fourth, driver mutations showed striking associations with standard prognostic markers in CLL, suggesting that particular combinations of genetic alterations may cooperate to drive malignancy.
A surprise was the finding that a core spliceosome component, SF3B1, is mutated in about 15% of CLL patients. Further analysis demonstrated that CLL samples with SF3B1 mutations displayed enhanced intron retention within two specific transcripts previously shown to be affected by compounds that disrupt SF3b spliceosome function (Kotake, Nat Chem Biol, 2007, 3:570-5; Kaida, Nat Chem Biol, 2007, 3:576-83). Studies of these compounds have suggested that rather than inducing a global change in splicing, SF3b inhibitors alter the splicing of a narrow spectrum of transcripts derived from genes involved in cancer-related processes, including cell-cycle control (p27, CCA2, STK6, MDM2) (Kaida, Nat Chem Biol, 2007, 3:576-83; Corrionero, Genes Dev 2011, 25:445-59; Fan, ACS Chem Biol, 2011), angiogenesis, and apoptosis (Massiello, FASEB J, 2006, 20:1680-2). These results suggest that SF3B1 mutations induce mistakes in splicing of these and other specific transcripts that affect CLL pathogenesis. Since mutations in SF3B1 are highly enriched in patients with del(11q), SF3B1 mutations may synergize with loss of ATM, a possibility further supported by the observation of 2 patients with point mutations in both ATM and SF3B1 without del(11q).
The invention is further premised, in part, on the discovery of additional novel CLL drivers. These drivers include mutations in risk alleles HIST1H1E, NRAS, BCOR, RIPK1, SAMHD1, KRAS, MED12, ITPKB, and EGR2.
The invention is further based, in part, on the discovery of the significance and impact on disease progression of subclonal mutations in CLL, and particularly of subclonal driver mutations such as subclonal SF3B1 mutations. As shown in the Examples, presence of a subclonal driver mutation (or event) was predictive of the clinical course of CLL from first diagnosis and then following therapy. In both instances, patients with subclonal driver mutations (otherwise referred to herein as subclonal drivers for brevity) had a poorer clinical course as compared to patients without subclonal drivers. This discovery indicates that CLL disease course and treatment regimens can be informed by an analysis of subclonal mutation not only at the time of first presentation but also throughout disease progression, including before and after treatment or simply at staged intervals even in the absence of treatment. Significantly, the data show and the invention contemplates that the impact of certain mutations will vary depending on whether the mutation is present in a clonal population of the CLL or a subclonal population. Certain mutations, when present in subclonal populations, were found to be better predictors of clinical course and outcome than if they were present in clonal populations. Prior to these findings, the effect of any given mutation, when present subclonally, on disease progression was not recognized. Thus, the invention allows subclonal mutation profiles in a subject to be determined, thereby resulting in a more targeted, personalized therapy.
The invention contemplates that subclonal analysis can inform disease management and treatment including decisions such as whether to treat a subject (e.g., if a subclonal driver mutation is found), or whether to delay treatment and monitor the subject instead (e.g., if no subclonal driver mutation is found), when to treat a subject, how to treat a subject, and when to monitor a subject post-treatment for expected relapse. Prior to this disclosure, the impact of the frequency, identity and evolution of subclonal genetic alterations on clinical course was unknown.
Subclonal mutations in one or more of SF3B1, HIST1H1E, NRAS, BCOR, RIPK1, SAMHD1, KRAS, MED12, ITPKB, EGR2, TP53, ATM, MYD88, NOTCH1, DDX3X, ZMYM3, FBXW7, XPO1, CHD2, POT1, del(8p), del(13q), del(11q), del(17p), and trisomy 12 are of interest in some embodiments. Analysis of a genomic DNA sample for the presence (or absence) of mutation in any one, any two, any three, any four, any five, any six, any seven, any eight, any nine, any ten, any eleven, or more of these genes is contemplated by the invention, in any combination.
As described in greater detail in the Examples, analysis of 160 matched CLL and germline DNA samples (including 82 of the 91 samples described above) was performed. These patients represented the broad spectrum of CLL clinical heterogeneity, and included patients with both low- and high-risk features based on established prognostic risk factors (ZAP70 expression, the degree of somatic hypermutation in the variable region of the immunoglobulin heavy chain (IGHV) gene, and presence of specific cytogenetic abnormalities). Somatic single nucleotide variations (sSNVs) present in as few as 10% of cancer cells were detected, and in total, 2,444 nonsynonymous and 837 synonymous mutations in protein-coding sequences were identified, corresponding to a mean (±SD) somatic mutation rate of 0.6±0.28 per megabase (range, 0.03 to 2.3), and an average of 15.3 nonsynonymous mutations per patient (range, 2 to 53).
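The summary statistics above follow from simple arithmetic, sketched below in Python for illustration. The mutation and sample counts are those reported above; the function names and the covered-territory figure in the per-megabase example (18 mutations over 30,000,000 sequenced bases) are hypothetical assumptions chosen only to show the calculation, not data from this disclosure.

```python
# Counts reported for the 160-sample cohort.
NONSYNONYMOUS = 2444
SYNONYMOUS = 837
SAMPLES = 160


def mean_per_sample(count: int, n_samples: int) -> float:
    """Average number of mutations per sample."""
    return count / n_samples


def mutation_rate_per_megabase(n_mutations: int, covered_bases: int) -> float:
    """Somatic mutations per million sequenced (covered) bases in one sample."""
    return n_mutations / (covered_bases / 1_000_000)


# All protein-coding mutations identified across the cohort.
total_coding = NONSYNONYMOUS + SYNONYMOUS

# 2,444 / 160 = 15.275, consistent with the reported average of 15.3.
mean_nonsynonymous = mean_per_sample(NONSYNONYMOUS, SAMPLES)

# Hypothetical single sample: 18 mutations over 30 Mb of covered sequence.
example_rate = mutation_rate_per_megabase(18, 30_000_000)
```

Note that the reported cohort-wide rate is a mean of per-sample rates, so it cannot be recovered from the cohort totals alone without each sample's covered territory.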
Expansion of the sample cohort provided the sensitivity to detect 20 putative CLL cancer genes (q<0.1). These included 8 of the 9 genes identified in the 91 CLL sample cohort described above (TP53, ATM, MYD88, SF3B1, NOTCH1, DDX3X, ZMYM3, FBXW7). The 12 newly identified genes were mutated at lower frequencies, and hence were not detected in the subset of the 91 sequenced samples. Three of the 12 additional candidate driver genes were recently identified (XPO1, CHD2, and POT1) (Fabbri et al., J Exp Med. 208, 1389-1401 (2011); Puente et al., Nature. 475, 101-105. (2011)). The 9 remaining genes, NRAS, KRAS, BCOR, EGR2, MED12, RIPK1, SAMHD1, ITPKB, and HIST1H1E, represent additional novel candidate CLL drivers. Together, the 20 candidate CLL driver genes appear to fall into 7 core signaling pathways. Two new pathways were implicated by the analysis: B cell receptor signaling and chromatin modification.
Because recurrent chromosomal abnormalities have defined roles in CLL biology (Dohner et al., N Engl J Med. 343, 1910-1916 (2000); Klein et al., Cancer Cell. 17, 28-40 (2010)), loci that were significantly amplified or deleted were sought by analyzing somatic copy-number alterations (sCNAs). Analysis of 111 matched tumor and normal samples identified deletions in chromosome 8p, 13q, 11q, and 17p and trisomy of chromosome 12 as significantly recurrent events. Thus, based on sSNV and sCNA analysis, 20 mutated genes and 5 cytogenetic alterations were identified as CLL driver events.
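For illustration, the driver events enumerated above can be organized as simple Python sets, which makes the tally of 20 genes and 5 cytogenetic alterations explicit. The grouping follows the text; the variable names, category labels, and the `classify_driver` helper are hypothetical and serve only as a sketch.

```python
# Genes also identified in the earlier 91-sample cohort.
PREVIOUSLY_IDENTIFIED = {
    "TP53", "ATM", "MYD88", "SF3B1", "NOTCH1", "DDX3X", "ZMYM3", "FBXW7",
}
# Candidate drivers reported by others (Fabbri et al.; Puente et al.).
RECENTLY_REPORTED = {"XPO1", "CHD2", "POT1"}
# Additional novel candidate CLL drivers.
NOVEL_CANDIDATES = {
    "NRAS", "KRAS", "BCOR", "EGR2", "MED12", "RIPK1", "SAMHD1", "ITPKB",
    "HIST1H1E",
}
# Significantly recurrent somatic copy-number alterations.
CYTOGENETIC_ALTERATIONS = {
    "del(8p)", "del(13q)", "del(11q)", "del(17p)", "trisomy 12",
}

DRIVER_GENES = PREVIOUSLY_IDENTIFIED | RECENTLY_REPORTED | NOVEL_CANDIDATES
ALL_DRIVER_EVENTS = DRIVER_GENES | CYTOGENETIC_ALTERATIONS


def classify_driver(event: str):
    """Return a descriptive category for a driver event, or None if unlisted."""
    if event in PREVIOUSLY_IDENTIFIED:
        return "previously identified driver gene"
    if event in RECENTLY_REPORTED:
        return "recently reported driver gene"
    if event in NOVEL_CANDIDATES:
        return "novel candidate driver gene"
    if event in CYTOGENETIC_ALTERATIONS:
        return "cytogenetic alteration"
    return None


gene_count = len(DRIVER_GENES)        # 8 + 3 + 9 genes
event_count = len(ALL_DRIVER_EVENTS)  # genes plus cytogenetic alterations
```

Because the three gene groups do not overlap, their union contains 20 genes, and adding the 5 cytogenetic alterations gives 25 driver events.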
Methods described herein were also used to determine whether the CLL driver events were clonal or subclonal. Overall, 1,543 clonal mutations (54% of all detected mutations, average of 10.3±5.5 mutations per sample) were identified, and a total of 1,266 subclonal sSNVs were detected in 146 of 149 samples (46%; average of 8.5±5.8 subclonal mutations per sample). Further analysis revealed that age and mutated IGHV status are associated with an increased number of clonal somatic mutations, subclonal mutations are increased with treatment, and the presence of subclonal driver mutations adversely impacts clinical outcome.
While generally considered incurable, CLL progresses slowly in most cases. Many people with CLL lead normal and active lives for many years—in some cases for decades. Because of its slow onset, early-stage CLL is, in general, not treated since it is believed that early CLL intervention does not improve survival time or quality of life. Instead, the condition is monitored over time to detect any change in the disease pattern.
Traditionally, the decision to start CLL treatment is taken when the patient's clinical symptoms or blood counts indicate that the disease has progressed to a point where it may affect the patient's quality of life.
Clinical “staging systems” such as the Rai 4-stage system and the Binet classification can help to determine when and how to treat the patient (Dohner, N Engl J Med, 2000, 343:1910-6).
Determining when to start treatment and by what means is often difficult; studies have shown there is no survival advantage to treating the disease too early. The invention provided herein is useful in determining whether and when to start treatment.
Accordingly, the invention provides methods of determining the aggressiveness of the disease course in subjects having or suspected of having CLL by identifying one or more mutations in the group consisting of SF3B1, NRAS, KRAS, BCOR, EGR2, MED12, RIPK1, SAMHD1, ITPKB, and HIST1H1E in a subject. Mutations in such genes are considered to be drivers (referred to interchangeably as CLL drivers), intending that they play a central role in the survival and continued growth of CLL cells in a subject. In some aspects, the disclosure provides methods for determining the aggressiveness of the disease course in subjects having or suspected of having CLL by determining whether a CLL driver is clonal or subclonal.
These methods are also useful for monitoring subjects undergoing treatments and therapies for CLL and for selecting therapies and treatments that would be efficacious in subjects having CLL, wherein selection and use of such treatments and therapies slow the progression of the cancer. More specifically, the invention provides methods of determining whether a patient with CLL will derive a clinical benefit from early treatment. Also included in the invention are methods of treating CLL by administering a compound that modulates the expression or activity of SF3B1, including compounds that activate or inhibit expression or activity of SF3B1.
“Accuracy” refers to the degree of conformity of a measured or calculated quantity (a test reported value) to its actual (or true) value. Clinical accuracy relates to the proportion of true outcomes (true positives (TP) or true negatives (TN)) versus misclassified outcomes (false positives (FP) or false negatives (FN)), and may be stated as a sensitivity, specificity, positive predictive value (PPV) or negative predictive value (NPV), or as a likelihood or odds ratio, among other measures.
“Biomarker” in the context of the present invention encompasses, without limitation, proteins, nucleic acids, and metabolites, together with their polymorphisms, mutations, variants, modifications, subunits, fragments, protein-ligand complexes, and degradation products, elements, related metabolites, and other analytes or sample-derived measures. Biomarkers can also include mutated proteins or mutated nucleic acids. Biomarkers also encompass non-blood borne factors or non-analyte physiological markers of health status, such as “clinical parameters” defined herein, as well as “traditional laboratory risk factors”, also defined herein. Biomarkers also include any calculated indices created mathematically or combinations of any one or more of the foregoing measurements, including temporal trends and differences. Where available, and unless otherwise described herein, biomarkers which are gene products are identified based on the official letter abbreviation or gene symbol assigned by the international Human Genome Organization Naming Committee (HGNC) and listed at the date of this filing at the US National Center for Biotechnology Information (NCBI) web site.
A “CLL driver” is any mutation, chromosomal abnormality, or altered gene expression, that contributes to the etiology, progression, severity, aggressiveness, or prognosis of CLL. In some aspects, a CLL driver is a mutation that provides a selectable fitness advantage to a CLL cell and facilitates its clonal expansion in the population. CLL driver may be used interchangeably with CLL driver event and CLL driver mutation. CLL driver mutations occur in genes, genetic loci, or chromosomal regions which may be referred to herein interchangeably as CLL risk alleles, CLL alleles, CLL risk genes, CLL genes, CLL-associated genes and the like.
The disclosure also refers to CLL-associated markers. Such markers may be those known in the art including for example ZAP70 expression status and IGHV mutation status. Such markers may also include those newly discovered and described herein. Accordingly, CLL-associated markers include CLL drivers, including subclonal CLL drivers, of the invention. Some CLL-associated markers have prognostic value and may be referred to as CLL prognostic markers. Some prognostic markers are referred to as independent prognostic markers, intending that they can be used individually to assess prognosis of a patient.
A “clinical indicator” is any physiological datum used alone or in conjunction with other data in evaluating the physiological condition of a collection of cells or of an organism. This term includes pre-clinical indicators.
“Clinical parameters” encompasses all non-sample or non-analyte biomarkers of subject health status or other characteristics, such as, without limitation, age (Age), ethnicity (RACE), gender (Sex), or family history (FamHX).
“FN” is false negative, which for a disease state test means classifying a disease subject incorrectly as non-disease or normal.
“FP” is false positive, which for a disease state test means classifying a normal subject incorrectly as having disease.
A “formula,” “algorithm,” or “model” is any mathematical equation, algorithmic, analytical or programmed process, or statistical technique that takes one or more continuous or categorical inputs (herein called “parameters”) and calculates an output value, sometimes referred to as an “index” or “index value.” Non-limiting examples of “formulas” include sums, ratios, and regression operators, such as coefficients or exponents, biomarker value transformations and normalizations (including, without limitation, those normalization schemes based on clinical parameters, such as gender, age, or ethnicity), rules and guidelines, statistical classification models, and neural networks trained on historical populations. Of particular use in combining biomarkers are linear and non-linear equations and statistical classification analyses to determine the relationship between biomarkers detected in a subject sample and the subject's responsiveness to chemotherapy. In panel and combination construction, of particular interest are structural and syntactic statistical classification algorithms, and methods of risk index construction, utilizing pattern recognition features, including established techniques such as cross-correlation, Principal Components Analysis (PCA), factor rotation, Logistic Regression (LogReg), Linear Discriminant Analysis (LDA), Eigengene Linear Discriminant Analysis (ELDA), Support Vector Machines (SVM), Random Forest (RF), Recursive Partitioning Tree (RPART), as well as other related decision tree classification techniques, Shrunken Centroids (SC), StepAIC, Kth-Nearest Neighbor, Boosting, Decision Trees, Neural Networks, Bayesian Networks, and Hidden Markov Models, among others. Other techniques may be used in survival and time to event hazard analysis, including Cox, Weibull, Kaplan-Meier and Greenwood models well known to those of skill in the art.
Many of these techniques are useful in combination with a biomarker selection technique, such as forward selection, backwards selection, or stepwise selection, complete enumeration of all potential panels of a given size, or genetic algorithms, or they may themselves include biomarker selection methodologies in their own technique. These may be coupled with information criteria, such as Akaike's Information Criterion (AIC) or Bayes Information Criterion (BIC), in order to quantify the tradeoff between additional biomarkers and model improvement, and to aid in minimizing overfit. The resulting predictive models may be validated in other studies, or cross-validated in the study they were originally trained in, using such techniques as Bootstrap, Leave-One-Out (LOO) and 10-Fold cross-validation (10-Fold CV). At various steps, false discovery rates may be estimated by value permutation according to techniques known in the art.
A “health economic utility function” is a formula that is derived from a combination of the expected probability of a range of clinical outcomes in an idealized applicable patient population, both before and after the introduction of a diagnostic or therapeutic intervention into the standard of care. It encompasses estimates of the accuracy, effectiveness and performance characteristics of such intervention, and a cost and/or value measurement (a utility) associated with each outcome, which may be derived from actual health system costs of care (services, supplies, devices and drugs, etc.) and/or as an estimated acceptable value per quality adjusted life year (QALY) resulting in each outcome. The sum, across all predicted outcomes, of the product of the predicted population size for an outcome multiplied by the respective outcome's expected utility is the total health economic utility of a given standard of care.
The difference between (i) the total health economic utility calculated for the standard of care with the intervention versus (ii) the total health economic utility for the standard of care without the intervention results in an overall measure of the health economic cost or value of the intervention. This may itself be divided amongst the entire patient group being analyzed (or solely amongst the intervention group) to arrive at a cost per unit intervention, and to guide such decisions as market positioning, pricing, and assumptions of health system acceptance. Such health economic utility functions are commonly used to compare the cost-effectiveness of the intervention, but may also be transformed to estimate the acceptable value per QALY the health care system is willing to pay, or the acceptable cost-effective clinical performance characteristics required of a new intervention.
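The health economic utility calculation described above can be illustrated with a minimal Python sketch. The `Outcome` record, the function names, and the idea of expressing utility as a single number per patient are assumptions made for illustration only; they are not part of the disclosure.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Outcome:
    """One predicted outcome under a given standard of care (hypothetical record)."""
    name: str          # e.g. "TP", "FP", "TN", "FN"
    population: float  # predicted number of patients with this outcome
    utility: float     # expected utility per patient (value minus cost)


def total_utility(outcomes: Iterable[Outcome]) -> float:
    """Sum over outcomes of predicted population size times expected utility."""
    return sum(o.population * o.utility for o in outcomes)


def intervention_value(with_intervention: Iterable[Outcome],
                       without_intervention: Iterable[Outcome]) -> float:
    """Total utility with the intervention minus total utility without it."""
    return total_utility(with_intervention) - total_utility(without_intervention)


def value_per_patient(with_intervention: Iterable[Outcome],
                      without_intervention: Iterable[Outcome],
                      group_size: float) -> float:
    """Divide the overall value across the patient group being analyzed."""
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    return intervention_value(with_intervention, without_intervention) / group_size
```

The group size passed to `value_per_patient` may be the entire patient group or only the intervention group, depending on which per-unit figure is wanted.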
For diagnostic (or prognostic) interventions of the invention, as each outcome (which in a disease classifying diagnostic test may be a TP, FP, TN, or FN) bears a different cost, a health economic utility function may preferentially favor sensitivity over specificity, or PPV over NPV based on the clinical situation and individual outcome costs and value, and thus provides another measure of health economic performance and value which may be different from more direct clinical or analytical performance measures. These different measurements and relative trade-offs generally will converge only in the case of a perfect test, with zero error rate (a.k.a., zero predicted subject outcome misclassifications or FP and FN), which all performance measures will favor over imperfection, but to differing degrees.
“Measuring” or “measurement,” or alternatively “detecting” or “detection,” means assessing the presence, absence, quantity or amount (which can be an effective amount) of either a given substance within a clinical or subject-derived sample, including the derivation of qualitative or quantitative concentration levels of such substances, or otherwise evaluating the values or categorization of a subject's non-analyte clinical parameters. It is to be understood, as will be described in greater detail herein, that the analyzing and detecting steps of the invention are typically carried out using sequencing techniques including but not limited to nucleic acid arrays. Accordingly, analysis or detection, as referred to in the invention, generally depends upon the use of a device or a machine that transforms a nucleic acid into a visible rendering of its nucleic acid sequence in whole or in part. Such rendering may take the form of a computer read-out or output. In order for nucleic acid mutations to be detected, as provided herein, such nucleic acids must be extracted from their natural source and manipulated by devices or machines.
“Mutation” encompasses any change in a DNA, RNA, or protein sequence from the wild type sequence or some other reference, including without limitation point mutations, transitions, insertions, transversions, translocations, deletions, inversions, duplications, recombinations, or combinations thereof. A “clonal mutation” is a mutation present in the majority of CLL cells in a CLL tumor or CLL sample. In some preferred embodiments, “clonal mutation” is a mutation likely present in more than 0.95 (95%) of the cancer cells of a CLL sample, i.e. the cancer cell fraction of the mutation (CCF)>0.95. In other words, there is a probability of greater than 50% that the mutation is present in more than 95% of the cancer cells. A “subclonal mutation” is a mutation present in a single cell or a minority of cells in a CLL tumor or CLL sample. In some preferred aspects, a “subclonal mutation” is a mutation that is unlikely to be present in more than 0.95 (95%) of the cancer cells of a CLL sample (i.e., there is a probability of greater than 50% that the mutation is present in less than 95% of the cancer cells). As will be appreciated, a “clonal mutation” exists in the vast majority of cancer cells, while a “subclonal mutation” exists in only a fraction of the cancer cells.
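As an illustration of the preferred clonal/subclonal definition, the following Python sketch classifies a mutation from a discrete probability distribution over its cancer cell fraction (CCF). The input format (a list of `(ccf, probability)` pairs), the function names, and the treatment of a probability of exactly 0.5 as subclonal are assumptions for illustration; how the CCF distribution is estimated is outside the scope of this sketch.

```python
from typing import Iterable, Tuple


def prob_ccf_above(posterior: Iterable[Tuple[float, float]],
                   threshold: float = 0.95) -> float:
    """Probability mass assigned to CCF values above `threshold`.

    `posterior` is a hypothetical discrete distribution given as
    (ccf, probability) pairs; it is renormalized so the weights sum to 1.
    """
    pairs = list(posterior)
    total = sum(p for _, p in pairs)
    if total <= 0:
        raise ValueError("posterior must carry positive probability mass")
    return sum(p for ccf, p in pairs if ccf > threshold) / total


def classify_clonality(posterior: Iterable[Tuple[float, float]],
                       ccf_threshold: float = 0.95,
                       prob_cutoff: float = 0.5) -> str:
    """Return "clonal" if P(CCF > 0.95) > 0.5, otherwise "subclonal"."""
    p = prob_ccf_above(posterior, ccf_threshold)
    return "clonal" if p > prob_cutoff else "subclonal"
```

For example, a distribution placing 80% of its mass on CCF values above 0.95 would be classified as clonal, whereas one placing only 30% there would be classified as subclonal.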
“Negative predictive value” or “NPV” is calculated by TN/(TN+FN) or the true negative fraction of all negative test results. It also is inherently impacted by the prevalence of the disease and pre-test probability of the population intended to be tested. See, e.g., O'Marcaigh A S, Jacobson R M, “Estimating The Predictive Value Of A Diagnostic Test, How To Prevent Misleading Or Confusing Results,” Clin. Ped. 1993, 32(8): 485-491, which discusses specificity, sensitivity, and positive and negative predictive values of a test, e.g., a clinical diagnostic test. Often, for binary disease state classification approaches using a continuous diagnostic test measurement, the sensitivity and specificity are summarized by Receiver Operating Characteristics (ROC) curves according to Pepe et al., “Limitations of the Odds Ratio in Gauging the Performance of a Diagnostic, Prognostic, or Screening Marker,” Am. J. Epidemiol 2004, 159 (9): 882-890, and summarized by the Area Under the Curve (AUC) or c-statistic, an indicator that allows representation of the sensitivity and specificity of a test, assay, or method over the entire range of test (or assay) cut points with just a single value. See also, e.g., Shultz, “Clinical Interpretation Of Laboratory Procedures,” chapter 14 in Tietz, Fundamentals of Clinical Chemistry, Burtis and Ashwood (eds.), 4th edition 1996, W.B. Saunders Company, pages 192-199; and Zweig et al., “ROC Curve Analysis: An Example Showing The Relationships Among Serum Lipid And Apolipoprotein Concentrations In Identifying Subjects With Coronary Artery Disease,” Clin. Chem., 1992, 38(8): 1425-1428. An alternative approach using likelihood functions, odds ratios, information theory, predictive values, calibration (including goodness-of-fit), and reclassification measurements is summarized according to Cook, “Use and Misuse of the Receiver Operating Characteristic Curve in Risk Prediction,” Circulation 2007, 115: 928-935.
Finally, hazard ratios and absolute and relative risk ratios within subject cohorts defined by a test are a further measurement of clinical accuracy and utility. Multiple methods are frequently used to define abnormal or disease values, including reference limits, discrimination limits, and risk thresholds.
“Analytical accuracy” refers to the reproducibility and predictability of the measurement process itself, and may be summarized in such measurements as coefficients of variation, and tests of concordance and calibration of the same samples or controls with different times, users, equipment and/or reagents. These and other considerations in evaluating new biomarkers are also summarized in Vasan, 2006.
“Performance” is a term that relates to the overall usefulness and quality of a diagnostic or prognostic test, including, among others, clinical and analytical accuracy, other analytical and process characteristics, such as use characteristics (e.g., stability, ease of use), health economic value, and relative costs of components of the test. Any of these factors may be the source of superior performance and thus usefulness of the test, and may be measured by appropriate “performance metrics,” such as AUC, time to result, shelf life, etc. as relevant.
“Positive predictive value” or “PPV” is calculated by TP/(TP+FP) or the true positive fraction of all positive test results. It is inherently impacted by the prevalence of the disease and pre-test probability of the population intended to be tested.
“Risk” in the context of the present invention, relates to the probability that an event will occur over a specific time period, as in the responsiveness to treatment, cancer recurrence or survival and can mean a subject's “absolute” risk or “relative” risk. Absolute risk can be measured with reference to either actual observation post-measurement for the relevant time cohort, or with reference to index values developed from statistically valid historical cohorts that have been followed for the relevant time period. Relative risk refers to the ratio of absolute risks of a subject compared either to the absolute risks of low risk cohorts or an average population risk, which can vary by how clinical risk factors are assessed. Odds ratios, the proportion of positive events to negative events for a given test result, are also commonly used (odds are according to the formula p/(1−p) where p is the probability of event and (1−p) is the probability of no event).
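The odds formula p/(1−p) and the relative risk ratio can be sketched in Python as follows. The function names and input validation are illustrative assumptions only.

```python
def odds(p: float) -> float:
    """Odds of an event with probability p, computed as p / (1 - p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    return p / (1.0 - p)


def relative_risk(risk_subject: float, risk_reference: float) -> float:
    """Ratio of a subject's absolute risk to a reference absolute risk.

    The reference may be a low-risk cohort or an average population risk.
    """
    if risk_reference <= 0.0:
        raise ValueError("risk_reference must be positive")
    return risk_subject / risk_reference
```

For instance, an event probability of 0.2 corresponds to odds of 0.25 (one event for every four non-events), and an absolute risk of 0.3 against a reference risk of 0.1 gives a relative risk of about 3.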
“Elevated risk” relates to an increased probability that an event will occur compared to another population. In the context of the present disclosure, “a subject at elevated risk of having CLL with rapid disease progression” refers to a CLL subject having an increased probability of rapid disease progression due to the presence of one or more mutations, including subclonal mutations, in a CLL risk allele, as compared to a CLL subject not having such mutation(s).
“Risk evaluation” or “evaluation of risk” in the context of the present invention encompasses making a prediction of the probability, odds, or likelihood that an event or disease state may occur, or the rate of occurrence of the event or conversion from one disease state to another. Risk evaluation can also comprise prediction of future clinical parameters, traditional laboratory risk factor values, or other indices of cancer, either in absolute or relative terms in reference to a previously measured population. The methods of the present invention may be used to make continuous or categorical measurements of the responsiveness to treatment, thus diagnosing and defining the risk spectrum of a category of subjects defined as being responders or non-responders. In the categorical scenario, the invention can be used to discriminate between normal and other subject cohorts at higher risk for responding. Such differing use may require different biomarker combinations and individualized panels, mathematical algorithms, and/or cut-off points, but be subject to the same aforementioned measurements of accuracy and performance for the respective intended use.
A “sample” in the context of the present invention is a biological sample isolated from a subject and can include, by way of example and not limitation, tissue biopsies, lymph node tissue, whole blood, serum, plasma, blood cells, endothelial cells, lymphatic fluid, ascites fluid, interstitial fluid (also known as “extracellular fluid” and encompasses the fluid found in spaces between cells, including, inter alia, gingival crevicular fluid), bone marrow, cerebrospinal fluid (CSF), saliva, mucus, sputum, sweat, urine, or any other secretion, excretion, or other bodily fluids. A “sample” may include a single cell or multiple cells or fragments of cells. The sample may also be a tissue sample. The sample may be or contain a circulating endothelial cell or a circulating tumor cell. The sample may include a primary tumor cell, a primary tumor, a recurrent tumor cell, or a metastatic tumor cell.
“CLL sample” refers to a sample taken from a subject having or suspected of having CLL, wherein the sample is believed to contain CLL cells if such cells are present in the subject. The CLL sample preferably contains white blood cells from the subject.
“Sensitivity” is calculated by TP/(TP+FN) or the true positive fraction of disease subjects.
“Specificity”, as it relates to some aspects of the invention, is calculated by TN/(TN+FP) or the true negative fraction of non-disease or normal subjects.
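The four clinical accuracy measures defined herein (sensitivity, specificity, PPV, and NPV) can be computed together from the counts of TP, FP, TN, and FN. The following Python sketch is illustrative only; the function name and the choice to return NaN when a denominator is zero are assumptions.

```python
from typing import Dict


def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute sensitivity, specificity, PPV and NPV from outcome counts."""
    if min(tp, fp, tn, fn) < 0:
        raise ValueError("counts must be non-negative")

    def ratio(numerator: int, denominator: int) -> float:
        # Undefined ratios (empty denominator) are reported as NaN.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # TP / (TP + FN)
        "specificity": ratio(tn, tn + fp),  # TN / (TN + FP)
        "ppv": ratio(tp, tp + fp),          # TP / (TP + FP)
        "npv": ratio(tn, tn + fn),          # TN / (TN + FN)
    }
```

Because PPV and NPV depend on how many diseased and non-diseased subjects are in the tested group, the same test applied to populations with different disease prevalence will generally yield different predictive values even when sensitivity and specificity are unchanged.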
By “statistically significant”, it is meant that the alteration is greater than what might be expected to happen by chance alone (which could be a “false positive”). Statistical significance can be determined by any method known in the art. Commonly used measures of significance include the p-value, which presents the probability of obtaining a result at least as extreme as a given data point, assuming the data point was the result of chance alone. A result is considered highly significant at a p-value of 0.05 or less. Preferably, the p-value is 0.04, 0.03, 0.02, 0.01, 0.005, 0.001 or less.
A “subject” in the context of the present invention is preferably a mammal. The mammal can be a human, non-human primate, mouse, rat, dog, cat, horse, or cow, but is not limited to these examples. Mammals other than humans can be advantageously used as subjects that represent animal models of cancer. A subject can be male or female. In some aspects, a subject is a mammal having or suspected of having CLL. Human subjects may be referred to herein as patients.
“TN” is true negative, which for a disease state test means classifying a non-disease or normal subject correctly.
“TP” is true positive, which for a disease state test means correctly classifying a disease subject.
“Traditional laboratory risk factors” correspond to biomarkers isolated or derived from subject samples and which are currently evaluated in the clinical laboratory and used in traditional global risk assessment algorithms. Traditional laboratory risk factors for tumor recurrence include, for example, proliferative index and tumor infiltrating lymphocytes. Other traditional laboratory risk factors for tumor recurrence are known to those skilled in the art.
The methods disclosed herein are used with subjects undergoing treatment and/or therapies for CLL, subjects who are at risk for developing a recurrence of CLL, and subjects who have been diagnosed with CLL. The methods of the present invention are to be used to monitor or select a treatment regimen for a subject who has CLL, and to evaluate the predicted survivability and/or survival time of a CLL-diagnosed subject.
Aggressiveness of the disease course of CLL is determined by detecting a mutation in one or more of the driver genes provided herein, such as for example the SF3B1 gene, in a test sample (e.g., a subject-derived sample). Optionally, the mutation in the SF3B1 gene occurs at nucleotides that provide coding sequence for the amino acid region between amino acids 550 and 1050 of a SF3B1 polypeptide. The mutation associated with an aggressive disease course includes for example one or more somatic mutations in the SF3B1 gene leading to an amino acid substitution at one or more of positions 622, 625, 626, 659, 666, 700, 740, 741, 742 and 903 of the SF3B1 polypeptide. Specifically, these mutations result in: a glutamic acid to aspartic acid at position 622 (E622D); an arginine to leucine or arginine to glycine at position 625 (R625L, R625G); an asparagine to histidine at position 626 (N626H); a glutamine to arginine at position 659 (Q659R); a lysine to glutamine or lysine to glutamic acid at position 666 (K666Q, K666E); a lysine to glutamic acid at position 700 (K700E); a glycine to glutamic acid at position 740 (G740E); a lysine to asparagine at position 741 (K741N); a glycine to aspartic acid at position 742 (G742D); and/or a glutamine to arginine at position 903 (Q903R). These mutations associated with aggressiveness of disease course are referred to herein as the CLL/SF3B1 mutations. In analyzing 160 CLL samples, the K700E SF3B1 mutation was identified in nine samples, the G742D mutation in four samples, and each of the following mutations was identified in one CLL sample: E622D, R625G, R625L, Q659R, K666E, G740E, K741N, and Q903R. See Table 1.1 for further details regarding the specific mutations identified in the cohort of 160 CLL samples. The presence of a CLL/SF3B1 mutation indicates a more aggressive disease course. Other mutations in the SF3B1 gene are also contemplated by the invention.
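By way of illustration only, a detected SF3B1 amino acid substitution can be checked against the CLL/SF3B1 mutations listed above and against the region between amino acids 550 and 1050. In the Python sketch below, the one-letter substitution notation (e.g., "K700E") is taken from the text, while the function names, the returned dictionary, and the inclusive treatment of the region boundaries are assumptions.

```python
import re
from typing import Dict, Tuple

# CLL/SF3B1 substitutions as listed in the text.
CLL_SF3B1_MUTATIONS = frozenset({
    "E622D", "R625L", "R625G", "N626H", "Q659R", "K666Q",
    "K666E", "K700E", "G740E", "K741N", "G742D", "Q903R",
})

# Region of the SF3B1 polypeptide named in the text (assumed inclusive here).
REGION_START, REGION_END = 550, 1050

_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(change: str) -> Tuple[str, int, str]:
    """Split a substitution such as "K700E" into (reference, position, variant)."""
    match = _SUBSTITUTION.match(change.strip().upper())
    if match is None:
        raise ValueError(f"not a single amino acid substitution: {change!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def classify_sf3b1_change(change: str) -> Dict[str, bool]:
    """Report whether a substitution is a listed CLL/SF3B1 mutation and
    whether it falls within amino acids 550 to 1050."""
    ref, pos, alt = parse_substitution(change)
    return {
        "listed": f"{ref}{pos}{alt}" in CLL_SF3B1_MUTATIONS,
        "in_region": REGION_START <= pos <= REGION_END,
    }
```

A substitution that falls within the region but is not among the listed mutations (for example, a different variant residue at position 700) is reported as unlisted, consistent with the statement that other SF3B1 mutations are also contemplated.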
In some aspects, aggressiveness of the CLL disease course, or identifying a subject as a subject at elevated risk of having CLL with rapid disease progression, is determined by detecting a mutation in a test sample (e.g., a subject-derived sample) in one or more genes selected from the group consisting of SF3B1, HIST1H1E, NRAS, BCOR, RIPK1, SAMHD1, KRAS, MED12, ITPKB, EGR2, DDX3X, ZMYM3, FBXW7, ATM, TP53, MYD88, NOTCH1, XPO1, CHD2, and POT1, whether alone or in some combination with each other or with other mutations. In some important embodiments of the invention these driver events are subclonal.
In some embodiments, the mutation in HIST1H1E is DV72del, R79H, A167V, P196S, and/or K202E. In some embodiments, the mutation in NRAS is Q61R, and/or Q61K. In some embodiments, the mutation in BCOR is a frame shift mutation at V132, T200, and/or P463, and/or a nonsense mutation at E1382. In some embodiments, the mutation in RIPK1 is A448V, K599R, R603S, and/or a nonsense mutation at Q375. In some embodiments, the mutation in SAMHD1 is M254I, R339S, I386S, and/or a frame shift mutation at R290. In some aspects, the mutation in KRAS is G13D, and/or Q61H. In some embodiments, the mutation in MED12 is E33K, G44S, and/or A59P. In some embodiments, the mutation in ITPKB is a frame shift mutation at E207, and/or E584, and/or the mutation T626S. In some embodiments, the mutation in EGR2 is H384N. In some embodiments, the mutation in DDX3X is a nonsense mutation at S24, and/or a splicing mutation at K342, and/or a frame shift mutation at S410. In some embodiments, the mutation in ZMYM3 is Y1113del, F1302S, and/or a frame shift mutation at S53, and/or a nonsense mutation at Q399. In some embodiments, the mutation in FBXW7 is F280L, R465H, R505C, and/or G597E. In some embodiments, the mutation in ATM is L120R, H2038R, E2164Q, Y2437S, Q2522H, Y2954C, A3006T, and/or a frame shift mutation at K468, L546, and/or L2135, and/or a splicing mutation at C1726, and/or a nonsense mutation at Y2817. In some embodiments, the mutation in TP53 occurs in the DNA binding domain (DBD) of TP53. In some embodiments the mutation in TP53 is L111R, N131del, R175H, H193P, I195T, H214R, I232F, C238S, C242F, R248Q, I255F, G266V, R267Q, R273C, R273H, C275Y, D281N, and/or a splicing mutation at G187. In some embodiments, the mutation in MYD88 occurs in the Toll/Interleukin-1 receptor (TIR) domain of MYD88. In some embodiments, the mutation in MYD88 is M219T, and/or L252P. In some embodiments, the mutation in NOTCH1 occurs in the proline/glutamic acid/serine/threonine (PEST) domain of NOTCH1.
In some embodiments, the mutation in NOTCH1 is a nonsense mutation at Q2409, and/or a frame shift mutation at P2514. In some embodiments, the mutation in XPO1 is E571K, E571A, and/or D624G. In some embodiments, the mutation in CHD2 is T645M, K702R, R836P, and/or a nonsense mutation at R1072, and/or a splicing mutation at I1427 and/or I1471. In some embodiments, the mutation in POT1 is Y36H, D77G, R137C, and/or a nonsense mutation at Y73 and/or W194. These mutations associated with aggressiveness of disease course are referred to herein as CLL mutations and/or CLL drivers. In some embodiments, the presence of a CLL mutation indicates a more aggressive disease course, or identifies a subject as a subject at elevated risk of having CLL with rapid disease progression.
In some aspects, methods are provided for determining the aggressiveness of the disease course, or identifying a subject as a subject at elevated risk of having CLL with rapid disease progression, by detecting in a test sample (e.g., a subject-derived sample) one or more chromosomal abnormalities including deletions in chromosome 8p, 13q, 11q, and 17p, and trisomy of chromosome 12, whether alone or in some combination with each other or with other mutations. In some important embodiments of the invention these driver events are subclonal. These chromosomal abnormalities are also referred to herein as CLL mutations and/or CLL drivers, and are associated with aggressiveness of disease course. In some embodiments, the presence of a CLL mutation such as a chromosomal abnormality indicates a more aggressive disease course, or identifies a subject as a subject at elevated risk of having CLL with rapid disease progression.
In some aspects, the disclosure provides methods for determining the aggressiveness of the disease course, or identifying a subject as a subject at elevated risk of having CLL with rapid disease progression, in subjects having or suspected of having CLL by determining whether a mutation or a chromosomal abnormality in a CLL driver is clonal or subclonal. In some embodiments, the detection of a subclonal CLL mutation or chromosomal abnormality indicates a more aggressive disease course, or identifies a subject as a subject at elevated risk of having CLL with rapid disease progression. In some embodiments, individual or combined subclonal CLL mutations are independent prognostic markers of CLL, and are used to determine a treatment regimen. For example, as shown in
In some aspects, the detection of a subclonal CLL driver mutation in a subject-derived sample identifies the subject as a subject requiring immediate treatment. In some aspects, the presence of a subclonal CLL mutation in a subject-derived sample identifies the subject as a subject requiring aggressive treatment. In some aspects, the detection of a CLL mutation, including a subclonal CLL mutation, in a subject-derived sample identifies the subject as a subject requiring alternative therapy. By an alternative therapy it is meant that the subject should be treated with a different or altered dose of a medicament, different combinations of medicaments, medicaments that work through varied mechanisms (including a mechanism that is different from that of a previous treatment), or the timing of treatment should be adjusted depending on the identification of a CLL mutation, including subclonal CLL mutations, and/or other clinical indicators. In some examples, alternative therapies are to be considered for subjects identified as having a CLL mutation, including subclonal CLL mutations, wherein the subject had previously been treated for CLL.
In some aspects, methods are provided for determining the aggressiveness of the disease course, or identifying a subject as a subject at elevated risk of having cancer with rapid disease progression, by detecting mutations, and particularly subclonal mutations, in one or more (including two or more) risk alleles selected from the group consisting of SF3B1, HIST1H1E, NRAS, BCOR, RIPK1, SAMHD1, KRAS, MED12, ITPKB, EGR2, DDX3X, ZMYM3, FBXW7, TP53, MYD88, NOTCH1, XPO1, CHD2, POT1, del(8p), del(13q), del(11q), del(17p), and trisomy 12. The presence of mutations, and particularly subclonal mutations, in two or more risk alleles indicates a more aggressive disease course. The presence of two or more subclonal driver mutations indicates a more aggressive disease course, or identifies a subject as a subject at elevated risk of having CLL with rapid disease progression.
In some aspects, methods are provided for determining the aggressiveness of the disease course, or identifying a subject as a subject at elevated risk of having cancer with rapid disease progression, by (i) detecting a mutation in one or more (including two or more) risk alleles selected from the group consisting of SF3B1, HIST1H1E, NRAS, BCOR, RIPK1, SAMHD1, KRAS, MED12, ITPKB, EGR2, DDX3X, ZMYM3, and FBXW7; and (ii) detecting a mutation in one or more of the CLL drivers TP53, MYD88, NOTCH1, XPO1, CHD2, POT1, del(8p), del(13q), del(11q), del(17p), or trisomy 12. In some aspects, the method further comprises determining whether the mutations in the risk alleles in (i) and (ii) are clonal or subclonal. In some aspects, the presence of two or more subclonal driver mutations indicates a more aggressive disease course, or identifies a subject as a subject at elevated risk of having CLL with rapid disease progression.
In some aspects, methods are provided for determining the aggressiveness of the disease course, or identifying a subject as a subject at elevated risk of having cancer with rapid disease progression, by detecting a mutation in a CLL sample in one or more risk alleles selected from the group consisting of SF3B1, HIST1H1E, NRAS, BCOR, RIPK1, SAMHD1, KRAS, MED12, ITPKB, EGR2, DDX3X, ZMYM3, FBXW7, ATM, TP53, MYD88, NOTCH1, XPO1, CHD2, POT1, del(8p), del(13q), del(11q), del(17p), and trisomy 12, wherein mutations are detected in at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9 or at least 10 risk alleles selected from the group consisting of HIST1H1E, NRAS, BCOR, RIPK1, SAMHD1, KRAS, MED12, ITPKB, and EGR2, and optionally SF3B1. In some aspects the method further comprises determining whether the mutation is clonal or subclonal, and identifying the subject as a subject at elevated risk of having CLL with rapid disease progression if the mutation is a driver event and subclonal.
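The counting and flagging logic described above can be illustrated with a minimal sketch. The names `RISK_ALLELES` and `flag_elevated_risk`, the input format (a mapping from allele name to "clonal" or "subclonal"), and the default `min_alleles` value are assumptions made for illustration only. As a simplification, the sketch counts detected mutations across the full list of risk alleles rather than the narrower group recited for the allele count.

```python
# Illustrative sketch only; names and input format are hypothetical.
RISK_ALLELES = {
    "SF3B1", "HIST1H1E", "NRAS", "BCOR", "RIPK1", "SAMHD1", "KRAS", "MED12",
    "ITPKB", "EGR2", "DDX3X", "ZMYM3", "FBXW7", "ATM", "TP53", "MYD88",
    "NOTCH1", "XPO1", "CHD2", "POT1", "del(8p)", "del(13q)", "del(11q)",
    "del(17p)", "trisomy 12",
}


def flag_elevated_risk(calls, min_alleles=2):
    """Summarize mutation calls for one CLL sample.

    calls: dict mapping an allele name to "clonal" or "subclonal" for each
    detected mutation. Alleles outside RISK_ALLELES are ignored.
    """
    detected = {a: c for a, c in calls.items() if a in RISK_ALLELES}
    subclonal = sorted(a for a, c in detected.items() if c == "subclonal")
    return {
        "n_detected": len(detected),
        # Simplification: counts over the full risk allele list.
        "meets_allele_count": len(detected) >= min_alleles,
        "subclonal_drivers": subclonal,
        # Elevated risk when at least one driver mutation is subclonal.
        "elevated_risk": bool(subclonal),
    }


if __name__ == "__main__":
    print(flag_elevated_risk({"SF3B1": "subclonal", "TP53": "clonal"}))
```

In practice the clonal or subclonal label for each mutation would come from an upstream analysis of sequencing data; here it is simply supplied by the caller.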
The cell is for example a cancer cell. In all preferred embodiments, the cancer is leukemia such as chronic lymphocytic leukemia (CLL).
By a more aggressive disease course it is meant that the subject having CLL will need treatment earlier than a CLL subject who does not have the mutation. The methods of the present invention are useful to treat, alleviate the symptoms of, monitor the progression of or delay the onset of cancer.
Preferably, the methods of the present invention are used to identify and/or diagnose subjects who are asymptomatic for a cancer recurrence. “Asymptomatic” means not exhibiting the traditional symptoms.
The methods of the present invention are also useful to identify and/or diagnose subjects already at higher risk of developing CLL.
Identification of one or more mutations in the SF3B1 gene and other CLL drivers identified herein allows for the determination of whether a subject will derive a benefit from a particular course of treatment, e.g. choice of treatment (i.e., more aggressive) or timing of treatment (e.g., earlier treatment). In this method, a biological sample is provided from a subject before undergoing treatment. Alternately, the sample is provided after a subject has undergone treatment. By “derive a benefit” it is meant that the subject will respond to the course of treatment. By responding it is meant that the treatment decreases the size or prevalence of a cancer in a subject. When treatment is applied prophylactically, “responding” means that the treatment retards or prevents a cancer recurrence from forming or retards, prevents, or alleviates a symptom. Assessments of cancers are made using standard clinical protocols.
The invention also provides a method of treating CLL by administering to the subject a compound that modulates (e.g., inhibits or activates) the expression or activity of SF3B1; patients harboring mutated SF3B1 may be more sensitive to such a compound. The methods are useful to alleviate the symptoms of cancer. Any cancer containing a SF3B1 mutation described herein is amenable to treatment by the methods of the invention. In some aspects the subject is suffering from CLL.
Treatment is efficacious if it leads to clinical benefit, such as a decrease in size, prevalence, or metastatic potential of the tumor in the subject. When treatment is applied prophylactically, “efficacious” means that the treatment retards or prevents tumors from forming or prevents or alleviates a clinical symptom of the tumor. Efficaciousness is determined in association with any known method for diagnosing or treating the particular tumor type.
In some aspects, methods of treating a subject are provided. In some examples, a method of treatment comprises administering to a subject a therapy (including a therapeutic agent (or medicament), radiation, or other procedures such as transplantation), wherein the subject is identified as having an unfavorable CLL prognosis based upon the detection of one or more CLL mutations, including subclonal mutations.
Treatments or therapeutic agents contemplated by the present disclosure include but are not limited to immunotherapy, chemotherapy, bone marrow and stem cell transplantation, and others known in the art. In some examples, a subject-derived sample wherein a CLL mutation, including a subclonal CLL mutation, is detected, identifies the subject as requiring chemotherapy, wherein one or more of the following non-limiting chemotherapy regimens is administered to the subject: FC (fludarabine with cyclophosphamide), FR (fludarabine with rituximab), FCR (fludarabine, cyclophosphamide, and rituximab), and CHOP (cyclophosphamide, doxorubicin, vincristine and prednisolone). In some examples, combination chemotherapy regimens are administered to a subject identified according to the methods described herein, in both newly-diagnosed and relapsed CLL. In some aspects, combinations of fludarabine with alkylating agents (cyclophosphamide) produce higher response rates and a longer progression-free survival than single agents. Alkylating agents include bendamustine and cyclophosphamide.
In some examples, a subject-derived sample wherein a CLL mutation, including a subclonal CLL mutation, is detected, identifies the subject as requiring immunotherapy, wherein one or more of the following non-limiting immunotherapeutic agents is administered: alemtuzumab (Campath, MabCampath or Campath-1H), rituximab (Rituxan, MabThera) and ofatumumab (Arzerra, HuMax-CD20).
In some examples, a subject-derived sample harboring a CLL mutation, including a subclonal CLL mutation, identifies the subject as requiring bone marrow and/or stem cell transplantation. In some examples, a subject is identified according to the methods provided herein and is indicated as requiring more aggressive therapies, including lenalidomide, flavopiridol, and bone marrow and/or stem cell transplantation.
In some aspects, an aggressive treatment may comprise administering any therapeutic agent described herein or known in the art, either alone or in combination, and will depend upon individual patient characteristics and clinical indicators, as well as the identification of prognostic markers as herein described.
Other therapies contemplated include compounds that decrease expression or activity of SF3B1. A decrease in SF3B1 expression or activity can be defined by a reduction of a biological function of SF3B1. A reduction of a biological function of SF3B1 includes a decrease in splicing of a gene or a set of genes. Altered splicing of genes can be measured by detecting a certain gene or subset of genes that are known to be spliced by the SF3b spliceosome complex, or SF3B1 in particular, by methods known in the art and described herein. For example, the genes are RIOK3 or BRD2. SF3B1 expression or activity is measured by methods known in the art.
SF3B1 modulators, including inhibitors, are known in the art or are identified using methods described herein. The SF3B1 inhibitor is, for example, spliceostatin, E7107 or pladienolide. SF3B1 inhibitors alter splicing activity, for example, reduce, decrease or inhibit splicing. The invention further contemplates splice variants generated from mutated SF3B1 as a therapeutic target. For example, the impact of these splice variants may be reduced by targeting through inhibitory nucleic acid technologies such as siRNA and antisense.
The present invention can also be used to screen patient or subject populations in any number of settings. For example, a health maintenance organization, public health entity or school health program can screen a group of subjects to identify those requiring interventions, as described above, or for the collection of epidemiological data. Insurance companies (e.g., health, life or disability) may screen applicants in the process of determining coverage or pricing, or existing clients for possible intervention. Data collected in such population screens, particularly when tied to any clinical progression to conditions like cancer, will be of value in the operations of, for example, health maintenance organizations, public health programs and insurance companies. Such data arrays or collections can be stored in machine-readable media and used in any number of health-related data management systems to provide improved healthcare services, cost effective healthcare, improved insurance operation, etc. See, for example, U.S. Patent Application No. 2002/0038227; U.S. Patent Application No. US 2004/0122296; U.S. Patent Application No. US 2004/0122297; and U.S. Pat. No. 5,018,067. Such systems can access the data directly from internal data storage or remotely from one or more data storage sites as further detailed herein.
Each program can be implemented in a high level procedural or object oriented programming language to communicate with a computer system. However, the programs can be implemented in assembly or machine language, if desired. The language can be a compiled or interpreted language. Each such computer program can be stored on a storage media or device (e.g., ROM or magnetic diskette or others as defined elsewhere in this disclosure) readable by a general or special purpose programmable computer, for configuring and operating the computer when the storage media or device is read by the computer to perform the procedures described herein. The health-related data management system of the invention may also be considered to be implemented as a computer-readable storage medium, configured with a computer program, where the storage medium so configured causes a computer to operate in a specific and predefined manner to perform various functions described herein.
Differences in the genetic makeup of subjects can result in differences in their relative abilities to metabolize various drugs, which may modulate the symptoms or risk factors of cancer or metastatic events. Subjects that have cancer, or are at risk for developing cancer or a metastatic event, can vary in age, ethnicity, and other parameters. Accordingly, detection of the CLL/SF3B1 and/or other CLL driver mutations disclosed herein, both alone and together in combination with known prognostic markers for CLL, allows for a pre-determined level of predictability of the aggressiveness of the disease course and may impact responsiveness to therapy.
The performance and thus absolute and relative clinical usefulness of the invention may be assessed in multiple ways as noted above. Amongst the various assessments of performance, the invention is intended to provide accuracy in clinical diagnosis and prognosis. The accuracy of a diagnostic, predictive, or prognostic test, assay, or method concerns the ability of the test, assay, or method to distinguish between subjects responsive to chemotherapeutic treatment and those that are not, based on whether the subjects have one or more of the CLL/SF3B1 and/or other CLL driver mutations disclosed herein.
In the categorical diagnosis of a disease state, changing the cut point or threshold value of a test (or assay) usually changes the sensitivity and specificity, but in a qualitatively inverse relationship. Therefore, in assessing the accuracy and usefulness of a proposed medical test, assay, or method for assessing a subject's condition, one should always take both sensitivity and specificity into account and be mindful of what the cut point is at which the sensitivity and specificity are being reported because sensitivity and specificity may vary significantly over the range of cut points. Use of statistics such as AUC, encompassing all potential cut point values, is preferred for most categorical risk measures using the invention, while for continuous risk measures, statistics of goodness-of-fit and calibration to observed results or other gold standards, are preferred.
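The inverse relationship between sensitivity and specificity as the cut point moves can be seen in a small sketch. The function name `sens_spec`, the convention that a score at or above the cut point is called positive, and the toy scores and labels are all assumptions made for illustration and are not data from this disclosure.

```python
def sens_spec(scores, labels, cut):
    """Return (sensitivity, specificity) at one cut point.

    A sample is called positive when score >= cut (an assumed convention).
    labels: 1 if the condition is truly present, 0 otherwise.
    """
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= cut and y == 1)
    fn = sum(1 for s, y in pairs if s < cut and y == 1)
    tn = sum(1 for s, y in pairs if s < cut and y == 0)
    fp = sum(1 for s, y in pairs if s >= cut and y == 0)
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Hypothetical test values and true condition status.
    scores = [0.1, 0.3, 0.4, 0.6, 0.7, 0.9]
    labels = [0, 0, 1, 0, 1, 1]
    for cut in (0.2, 0.5, 0.8):
        sens, spec = sens_spec(scores, labels, cut)
        print(f"cut={cut}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

On this toy data, raising the cut point from 0.2 to 0.8 lowers sensitivity while raising specificity, which is why a reported pair is only meaningful together with its cut point.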
Using such statistics, an “acceptable degree of diagnostic accuracy”, is herein defined as a test or assay in which the AUC (area under the ROC curve for the test or assay) is at least 0.60, desirably at least 0.65, more desirably at least 0.70, preferably at least 0.75, more preferably at least 0.80, and most preferably at least 0.85. By a “very high degree of diagnostic accuracy”, it is meant a test or assay in which the AUC (area under the ROC curve for the test or assay) is at least 0.80, desirably at least 0.85, more desirably at least 0.875, preferably at least 0.90, more preferably at least 0.925, and most preferably at least 0.95.
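As a rough illustration of how these thresholds might be applied, the sketch below computes an empirical AUC as the fraction of (positive, negative) pairs that the test ranks correctly, counting ties as one half, and then maps the result onto the minimum AUC values stated above (0.60 and 0.80). The function names `auc` and `accuracy_grade`, the grade labels, and the toy data are assumptions for illustration.

```python
def auc(scores, labels):
    """Empirical area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive scores higher
    than a randomly chosen negative, with ties counted as 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy_grade(auc_value):
    """Map an AUC onto the minimum thresholds given in the text."""
    if auc_value >= 0.80:
        return "very high"
    if auc_value >= 0.60:
        return "acceptable"
    return "below acceptable"


if __name__ == "__main__":
    # Hypothetical test values and true condition status.
    scores = [0.1, 0.3, 0.4, 0.6, 0.7, 0.9]
    labels = [0, 0, 1, 0, 1, 1]
    value = auc(scores, labels)
    print(round(value, 3), accuracy_grade(value))
```

The pairwise form is quadratic in sample size and is chosen here for clarity only; a rank-based computation gives the same value for large cohorts.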
The predictive value of any test depends on the sensitivity and specificity of the test, and on the prevalence of the condition in the population being tested. This notion, based on Bayes' theorem, provides that the greater the likelihood that the condition being screened for is present in an individual or in the population (pre-test probability), the greater the validity of a positive test and the greater the likelihood that the result is a true positive. Thus, the problem with using a test in any population where there is a low likelihood of the condition being present is that a positive result has limited value (i.e., more likely to be a false positive). Similarly, in populations at very high risk, a negative test result is more likely to be a false negative.
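The dependence of predictive value on prevalence follows directly from Bayes' theorem and can be sketched as below. The function name `predictive_values` and the example sensitivity, specificity, and prevalence figures are assumptions chosen only to illustrate the effect.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied to a population.

    Each term is the joint probability of a test result and a true state,
    so PPV = P(condition | positive) and NPV = P(no condition | negative).
    """
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    fp = (1 - specificity) * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


if __name__ == "__main__":
    # Same hypothetical test (90% sensitive, 90% specific), three populations.
    for prevalence in (0.01, 0.5, 0.9):
        ppv, npv = predictive_values(0.9, 0.9, prevalence)
        print(f"prevalence={prevalence}: PPV={ppv:.3f}, NPV={npv:.3f}")
```

With these assumed figures, the positive predictive value is about 0.083 at 1% prevalence and 0.9 at 50% prevalence, while the negative predictive value falls to 0.5 at 90% prevalence, matching the two cautions stated above.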
As a result, ROC and AUC can be misleading as to the clinical utility of a test in low disease prevalence tested populations (defined as those with less than 1% rate of occurrences (incidence) per annum, or less than 10% cumulative prevalence over a specified time horizon). Alternatively, absolute risk and relative risk ratios as defined elsewhere in this disclosure can be employed to determine the degree of clinical utility. Populations of subjects to be tested can also be categorized into quartiles by the test's measurement values, where the top quartile (25% of the population) comprises the group of subjects with the highest relative risk for therapeutic unresponsiveness, and the bottom quartile comprises the group of subjects having the lowest relative risk for therapeutic unresponsiveness. Generally, values derived from tests or assays having over 2.5 times the relative risk from top to bottom quartile in a low prevalence population are considered to have a “high degree of diagnostic accuracy,” and those with five to seven times the relative risk for each quartile are considered to have a “very high degree of diagnostic accuracy.” Nonetheless, values derived from tests or assays having only 1.2 to 2.5 times the relative risk for each quartile remain clinically useful and are widely used as risk factors for a disease; such is the case with total cholesterol and for many inflammatory biomarkers with respect to their prediction of future events. Often such lower diagnostic accuracy tests must be combined with additional parameters in order to derive meaningful clinical thresholds for therapeutic intervention, as is done with the aforementioned global risk assessment indices.
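The quartile comparison described above can be sketched as follows. The function name `top_to_bottom_relative_risk`, the use of equal-sized quartiles obtained by integer division, and the toy cohort are assumptions for illustration; an event here stands for therapeutic unresponsiveness.

```python
def top_to_bottom_relative_risk(values, events):
    """Relative risk of an event in the top vs. bottom quartile of a test.

    values: test measurement per subject.
    events: 1 if the subject had the event (e.g., unresponsiveness), else 0.
    """
    ranked = sorted(zip(values, events))  # ascending by test value
    q = len(ranked) // 4                  # subjects per quartile (assumed)
    if q == 0:
        raise ValueError("need at least four subjects")
    bottom, top = ranked[:q], ranked[-q:]
    rate_bottom = sum(e for _, e in bottom) / q
    rate_top = sum(e for _, e in top) / q
    if rate_bottom == 0:
        return float("inf")               # no events in the bottom quartile
    return rate_top / rate_bottom


if __name__ == "__main__":
    # Hypothetical cohort of eight subjects.
    values = [1, 2, 3, 4, 5, 6, 7, 8]
    events = [0, 1, 0, 0, 0, 1, 1, 1]
    print(top_to_bottom_relative_risk(values, events))
```

The resulting ratio can then be compared against the 2.5-fold and five- to seven-fold ranges given above; with real cohorts a confidence interval around the ratio would normally accompany it.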
A health economic utility function is yet another means of measuring the performance and clinical value of a given test, consisting of weighting the potential categorical test outcomes based on actual measures of clinical and economic value for each. Health economic performance is closely related to accuracy, as a health economic utility function specifically assigns an economic value for the benefits of correct classification and the costs of misclassification of tested subjects. As a performance measure, it is not unusual to require a test to achieve a level of performance which results in an increase in health economic value per test (prior to testing costs) in excess of the target price of the test.
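A health economic utility function of this kind can be sketched as an expected value over the four categorical test outcomes. The function name `value_per_test`, the outcome keys, and every monetary figure below are hypothetical placeholders rather than values from this disclosure, and the comparison against a baseline without testing is omitted for brevity.

```python
def value_per_test(sensitivity, specificity, prevalence, outcome_value):
    """Expected health economic value per test, before testing costs.

    outcome_value: dict with keys "TP", "FN", "TN", "FP" giving the value
    (benefit positive, cost negative) assigned to each classification.
    """
    probability = {
        "TP": sensitivity * prevalence,
        "FN": (1 - sensitivity) * prevalence,
        "TN": specificity * (1 - prevalence),
        "FP": (1 - specificity) * (1 - prevalence),
    }
    return sum(probability[k] * outcome_value[k] for k in probability)


if __name__ == "__main__":
    # Hypothetical values per outcome, in arbitrary currency units.
    values = {"TP": 1000.0, "FN": -2000.0, "TN": 0.0, "FP": -500.0}
    expected = value_per_test(0.8, 0.9, 0.5, values)
    target_price = 150.0  # hypothetical target price of the test
    print(round(expected, 2), expected > target_price)
```

Under the performance requirement described above, a test would be considered adequate when the expected value per test exceeds its target price.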
In general, alternative methods of determining diagnostic accuracy are commonly used for continuous measures, when a disease category or risk category has not yet been clearly defined by the relevant medical societies and practice of medicine, where thresholds for therapeutic use are not yet established, or where there is no existing gold standard for diagnosis of the pre-disease. For continuous measures of risk, measures of diagnostic accuracy for a calculated index are typically based on curve fit and calibration between the predicted continuous value and the actual observed values (or a historical index calculated value) and utilize measures such as R squared, Hosmer-Lemeshow P-value statistics and confidence intervals. It is not unusual for predicted values using such algorithms to be reported including a confidence interval (usually 90% or 95% CI) based on a historical observed cohort's predictions, as in the test for risk of future breast cancer recurrence commercialized by Genomic Health, Inc. (Redwood City, Calif.).
Detection of the SF3B1 mutations and/or other CLL driver mutations can be determined at the protein or nucleic acid level using any method known in the art. Preferred SF3B1 mutations and/or CLL driver mutations of the invention are missense mutations, for example, R625L, N626H, K700E, K741N, G740E, E622D, R625G, Q659R, K666Q, K666E, G742D, or Q903R in SF3B1. Suitable sources of the nucleic acids encoding SF3B1 include, for example, the human genomic SF3B1 nucleic acid, available as GenBank Accession No: NG_032903.1, the SF3B1 mRNA nucleic acid available as GenBank Accession Nos: NM_001005526.1 and NM_012433.2, and the human SF3B1 protein, available as GenBank Accession Nos: NP_036565.2 and NP_001005526.1.
Suitable sources of the nucleic acids and proteins for the following CLL drivers may be found in Table 1.2: NRAS, KRAS, BCOR, EGR2, MED12, RIPK1, SAMHD1, ITPKB, HIST1H1E, ATM, TP53, MYD88, NOTCH1, DDX3X, ZMYM3, FBXW7, XPO1, CHD2, and POT1.
SF3B1 mutation-specific reagents and/or CLL driver mutation-specific reagents useful in the practice of the disclosed methods include nucleic acids (polynucleotides) and amino acid based reagents such as proteins (e.g., antibodies or antibody fragments) and peptides.
SF3B1 mutation-specific reagents and/or CLL driver mutation-specific reagents useful in the practice of the disclosed methods include, among others, mutant polypeptide specific antibodies and AQUA peptides (heavy-isotope labeled peptides) corresponding to, and suitable for detection and quantification of, mutant polypeptide expression in a biological sample. A mutant polypeptide-specific reagent is any reagent, biological or chemical, capable of specifically binding to, detecting and/or quantifying the presence/level of expressed mutant polypeptide in a biological sample, while not binding to or detecting wild type. The term includes, but is not limited to, the preferred antibody and AQUA peptide reagents discussed below, and equivalent reagents are within the scope of the present invention. The mutation-specific reagents specifically recognize SF3B1 with missense mutations, for example, a SF3B1 polypeptide with mutations at R625L, N626H, K700E, K741N, G740E, E622D, R625G, Q659R, K666Q, K666E, G742D or Q903R. In some aspects, the mutation-specific reagents specifically recognize CLL driver mutations, including but not limited to mutations in HIST1H1E, NRAS, BCOR, RIPK1, SAMHD1, KRAS, MED12, ITPKB, EGR2, DDX3X, ZMYM3, FBXW7, ATM, TP53, MYD88, NOTCH1, XPO1, CHD2, POT1, del(8p), del(13q), del(11q), del(17p), and trisomy 12.
Reagents suitable for use in practice of the methods of the invention include a mutant polypeptide-specific antibody. A mutant-specific antibody of the invention is an isolated antibody or antibodies that specifically bind(s) a mutant polypeptide of the invention, but does not substantially bind either wild type or mutants with mutations at other positions.
Mutant-specific reagents provided by the invention also include nucleic acid probes and primers suitable for detection of a mutant polynucleotide. These probes are used in assays such as fluorescence in-situ hybridization (FISH) or polymerase chain reaction (PCR) amplification. These mutant-specific reagents specifically recognize or detect nucleic acids encoding a mutant SF3B1 polypeptide, wherein the mutations are at R625L, N626H, K700E, K741N, G740E, E622D, R625G, Q659R, K666Q, K666E, G742D or Q903R. In some aspects, the mutation-specific reagents specifically recognize other CLL driver mutations, including but not limited to mutations in HIST1H1E, NRAS, BCOR, RIPK1, SAMHD1, KRAS, MED12, ITPKB, EGR2, DDX3X, ZMYM3, FBXW7, ATM, TP53, MYD88, NOTCH1, XPO1, CHD2, POT1, del(8p), del(13q), del(11q), del(17p), and trisomy 12.
Mutant polypeptide-specific reagents useful in practicing the methods of the invention may also be mRNA, oligonucleotide or DNA probes that can directly hybridize to, and detect, mutant or truncated polypeptide expression transcripts in a biological sample. Briefly, and by way of example, formalin-fixed, paraffin-embedded patient samples may be probed with a fluorescein-labeled RNA probe followed by washes with formamide, SSC and PBS and analysis with a fluorescent microscope.
Polynucleotides encoding the mutant polypeptide may also be used for diagnostic/prognostic purposes. The polynucleotides that may be used include oligonucleotide sequences, antisense RNA and DNA molecules. The polynucleotides may be used to detect and quantitate gene expression in biopsied tissues, for example the expression of the SF3B1 gene and/or other CLL genes. For example, the diagnostic assay may be used to distinguish between absence, presence, and increased or excess expression of nucleic acids encoding the mutant polypeptide, and to monitor regulation of mutant polypeptide levels during therapeutic intervention.
In one preferred embodiment, hybridization with PCR probes which are capable of detecting polynucleotide sequences, including genomic sequences, encoding mutant polypeptide or truncated active polypeptide, or closely related molecules, may be used to identify nucleic acid sequences which encode mutant polypeptide. The construction and use of such probes is described above. The specificity of the probe, whether it is made from a highly specific region, e.g., 10 unique nucleotides in the mutant junction, or a less specific region, e.g., the 3′ coding region, and the stringency of the hybridization or amplification (maximal, high, intermediate, or low) will determine whether the probe identifies only naturally occurring sequences encoding mutant SF3B1 and/or other CLL mutant polypeptides, alleles, or related sequences.
Probes may also be used for the detection of related sequences, and should preferably contain at least 50% of the nucleotides from any of the mutant polypeptide encoding sequences. The hybridization probes of the subject invention may be DNA or RNA and derived from the nucleotide sequence and encompassing the mutation, or from genomic sequence including promoter, enhancer elements, and introns of the naturally occurring polypeptides but comprising the mutation.
A mutant polynucleotide may be used in Southern or Northern analysis, dot blot, or other membrane-based technologies; in PCR technologies; or in dip stick, pin, ELISA or chip assays utilizing fluids or tissues from patient biopsies to detect altered polypeptide expression. Such qualitative or quantitative methods are well known in the art. Mutant polynucleotides may be labeled by standard methods, and added to a fluid or tissue sample from a patient under conditions suitable for the formation of hybridization complexes. After a suitable incubation period, the sample is washed and the signal is quantitated and compared with a standard value. If the amount of signal in the biopsied or extracted sample is significantly altered from that of a comparable control sample, the nucleotide sequences have hybridized with nucleotide sequences in the sample, and the presence of altered levels of nucleotide sequences encoding mutant polypeptide in the sample indicates the presence of the associated disease. Such assays may also be used to evaluate the efficacy of a particular therapeutic treatment regimen in animal studies, in clinical trials, or in monitoring the treatment of an individual patient.
In order to provide a basis for the diagnosis of disease characterized by expression of mutant polypeptide, a normal or standard profile for expression is established. This may be accomplished by combining body fluids or cell extracts taken from normal subjects, either animal or human, with a sequence, or a fragment thereof, which encodes mutant polypeptide, under conditions suitable for hybridization or amplification. Standard hybridization may be quantified by comparing the values obtained from normal subjects with those from an experiment where a known amount of a substantially purified polynucleotide is used. Standard values obtained from normal samples may be compared with values obtained from samples from patients who are symptomatic for disease. Deviation between standard and subject values is used to establish the presence of disease.
Once disease is established and a treatment protocol is initiated, hybridization assays may be repeated on a regular basis to evaluate whether the level of expression in the patient begins to approximate that which is observed in the normal patient. The results obtained from successive assays may be used to show the efficacy of treatment over a period ranging from several days to months.
Additional diagnostic uses for mutant polynucleotides of the invention may involve the use of polymerase chain reaction (PCR), a preferred assay format that is standard to those of skill in the art. See, e.g., MOLECULAR CLONING, A LABORATORY MANUAL, 2nd edition, Sambrook, J., Fritsch, E. F. and Maniatis, T., eds., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (1989). PCR oligomers may be chemically synthesized, generated enzymatically, or produced from a recombinant source. Oligomers will preferably consist of two nucleotide sequences, one with sense orientation (5′ to 3′) and another with antisense (3′ to 5′), employed under optimized conditions for identification of a specific gene or condition. The same two oligomers, nested sets of oligomers, or even a degenerate pool of oligomers may be employed under less stringent conditions for detection and/or quantitation of closely related DNA or RNA sequences.
In certain preferred embodiments, sequencing technologies, including but not limited to whole genome sequencing (WGS), whole exome sequencing (WES), deep sequencing, and targeted gene sequencing, are used to detect, measure, or analyze a sample for the presence of a CLL mutation.
WGS (also known as full genome sequencing, complete genome sequencing, or entire genome sequencing), is a process that determines the complete DNA sequence of a subject. In some aspects, WGS, as embodied in the methods of Ng and Kirkness, Methods Mol. Biol.; 628:215-26 (2010), may be employed with the methods of the present disclosure to detect CLL mutations in a sample.
WES (also known as exome sequencing, or targeted exome capture), is an efficient strategy to selectively sequence the coding regions of the genome of a subject as a cheaper but still effective alternative to WGS. As exemplified by the methods of Gnirke et al., Nature Biotechnology 27, 182-189 (2009), WES of tumors and their patient-matched normal samples is an affordable, rapid and comprehensive technology for detecting somatic coding mutations. In some aspects, WES may be employed with the methods of the present disclosure to detect CLL mutations in a sample.
Deep sequencing methods provide for greater coverage (depth) in targeted sequencing approaches. “Deep sequencing,” “deep coverage,” or “depth” refers to having a high amount of coverage for every nucleotide being sequenced. The high coverage allows not only the detection of nucleotide changes, but also the degree of heterogeneity at every single base in a genetic sample. Moreover, deep sequencing is able to simultaneously detect small indels and large deletions, map exact breakpoints, calculate deletion heterogeneity, and monitor copy number changes. In some aspects, deep sequencing strategies, as provided by Myllykangas and Ji, Biotechnol Genet Eng Rev. 27:135-58 (2010), may be employed with the methods of the present disclosure to detect CLL mutations in a sample.
In preferred embodiments, sequencing technologies, including but not limited to whole genome sequencing (WGS), whole exome sequencing (WES), deep sequencing, and targeted gene sequencing, as described herein, are used to determine whether a CLL mutation in a sample is clonal or subclonal. In some examples, WES of tumors and their patient-matched normal samples combined with analytical tools provides for analysis of subclonal mutations because: (i) the high sequencing depth obtained by WES (typically ~100-150×) enables reliable detection of a sufficient number of subclonal mutations required for defining subclones and tracking them over time; (ii) coding mutations likely encompass many of the important driver events that provide fitness advantage for specific clones; and finally, (iii) the relatively low cost of whole-exome sequencing permits studies of large cohorts, which is key for understanding the relative fitness and temporal order of driver mutations and for assessing the impact of clonal heterogeneity on disease outcome. WES thus allows for identification of CLL subclones and the mutations that they harbor by integrative analysis of coding mutations and somatic copy number alterations, which enable estimation of the cancer cell fraction (CCF). WES analysis further provides for the study of mutation frequencies, observation of clonal evolution, and linking of subclonal mutations to clinical outcome.
In some examples, the sequencing data generated using sequencing technologies is processed using analytical tools including but not limited to the Picard data processing pipeline (DePristo et al., Nat. Genet. 43, 491-498 (2011)), the Firehose pipeline available at The Broad Institute, Inc. website, MutSig available at The Broad Institute, Inc. website, HAPSEG (Carter et al., Available from Nature Precedings), GISTIC2.0 algorithm (Mermel et al., Genome Biol. 12(4):R41 (2011)), and ABSOLUTE available at The Broad Institute, Inc. website. Such analytical tools allow for, in some examples, the identification of sSNVs, sCNAs, indels, and other structural chromosomal rearrangements, and provide for the determination of sample purity, ploidy, and absolute somatic copy numbers. In some examples, the use of analytical tools with sequencing data obtained from a CLL sample allows for the determination of the cancer cell fraction (CCF) harboring a mutation, thus identifying whether a mutation is clonal or subclonal.
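To make the relationship between allele fraction, purity, copy number, and CCF concrete, the sketch below uses a simplified point estimate. It is not the method implemented by ABSOLUTE or any other named tool. It assumes that the expected variant allele fraction equals purity × multiplicity × CCF divided by the average number of copies of the locus per cell in the sample, with two copies in normal cells. The function names, the default copy number and multiplicity, and the 0.95 clonal threshold are hypothetical choices for illustration.

```python
def cancer_cell_fraction(alt_reads, total_reads, purity,
                         tumor_copy_number=2, multiplicity=1):
    """Point estimate of the cancer cell fraction (CCF) for one mutation.

    alt_reads / total_reads: reads supporting the mutation / all reads.
    purity: fraction of cells in the sample that are cancer cells.
    tumor_copy_number: total copies of the locus in cancer cells.
    multiplicity: mutated copies per cancer cell carrying the mutation.
    Assumes normal cells carry two copies of the locus.
    """
    vaf = alt_reads / total_reads
    avg_copies = purity * tumor_copy_number + 2 * (1 - purity)
    return min(1.0, vaf * avg_copies / (purity * multiplicity))


def clonality(ccf, clonal_threshold=0.95):
    """Label a mutation; the threshold is an assumed, adjustable cutoff."""
    return "clonal" if ccf >= clonal_threshold else "subclonal"


if __name__ == "__main__":
    # Hypothetical read counts at ~100x depth in a sample of 80% purity.
    for alt in (20, 40):
        ccf = cancer_cell_fraction(alt, 100, purity=0.8)
        print(alt, round(ccf, 2), clonality(ccf))
```

A point estimate of this kind ignores sampling noise in the read counts; a fuller analysis would carry a probability distribution over CCF values before assigning a clonal or subclonal label.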
Methods which may also be used to quantitate the expression of mutant polynucleotide include radiolabeling or biotinylating nucleotides, coamplification of a control nucleic acid, and standard curves onto which the experimental results are interpolated (Melby et al., J. Immunol. Methods, 159:235-244 (1993); Duplaa et al. Anal. Biochem. 229-236 (1993)). The speed of quantitation of multiple samples may be accelerated by running the assay in an ELISA format where the oligomer of interest is presented in various dilutions and a spectrophotometric or colorimetric response gives rapid quantitation.
Other suitable methods for nucleic acid detection, such as minor groove-binding conjugated oligonucleotide probes (see, e.g. U.S. Pat. No. 6,951,930, “Hybridization-Triggered Fluorescent Detection of Nucleic Acids”) are known to those of skill in the art. Also provided by the invention is a kit for the detection of the mutation in a biological sample, the kit comprising an isolated mutant-specific reagent of the invention and one or more secondary reagents. Suitable secondary reagents for employment in a kit are familiar to those of skill in the art, and include, by way of example, buffers, detectable secondary antibodies or probes, activating agents, and the like.
In some aspects, a kit is provided for the detection of a mutation in a biological sample, the kit comprising isolated mutant-specific reagents for the detection of a mutation in one or more CLL drivers in the group consisting of SF3B1, NRAS, KRAS, BCOR, EGR2, MED12, RIPK1, SAMHD1, ITPKB, HIST1H1E, ATM, TP53, MYD88, NOTCH1, DDX3X, ZMYM3, FBXW7, XPO1, CHD2, POT1, del(8p), del(13q), del(11q), del(17p), and trisomy 12. In some aspects, the kit further comprises reagents for evaluating the degree of somatic hypermutation in the IGHV gene; and reagents for evaluating the expression status of ZAP70.
In some aspects, a kit is provided for the detection of a mutation in a biological sample, the kit comprising mutant-specific reagents comprising mutant-specific antibodies that specifically bind a mutant polypeptide encoded by a CLL gene, but do not substantially bind either wild type or mutants with mutations at other positions. Such antibodies are used in assays such as immunohistochemistry (IHC), ELISA, and flow cytometry assays such as fluorescence activated cell sorting (FACS).
In some aspects, a kit is provided for the detection of a mutation in a biological sample, the kit comprising mutant-specific reagents comprising nucleic acid probes and primers suitable for detection of a CLL mutation. These probes are used in assays such as fluorescence in-situ hybridization (FISH) or polymerase chain reaction (PCR) amplification. These mutant-specific reagents specifically recognize or detect nucleic acids of a CLL driver in a biological sample.
In some aspects, a kit is provided for the detection of a mutation in a biological sample, the kit comprising mutant-specific reagents comprising mRNA, oligonucleotide or DNA probes that can directly hybridize to, and detect, mutant or truncated expression transcripts of a CLL driver, or directly hybridize to and detect chromosomal abnormalities in a biological sample.
In some aspects, a kit is provided for the detection of a mutation in a biological sample, the kit comprising a single nucleotide polymorphism (SNP) array that detects one or more mutations in a CLL gene.
In some aspects, a kit is provided for the detection of a mutation in a biological sample, the kit comprising mutant-specific reagents for the detection of one or more mutations in one or more CLL drivers using sequencing methods such as whole genome sequencing (WGS), whole exome sequencing, deep sequencing, targeted sequencing of cancer genes, or any combination thereof, as described herein.
In preferred embodiments, any kit described herein further comprises instructions for use.
The methods of the invention may be carried out in a variety of different assay formats known to those of skill in the art.
Other clinical indicators that are useful for diagnosing, prognosing, or evaluating a subject with CLL for determining treatment regimens or predicting survival are known in the art. These other clinical indicators are referred to herein as “CLL biomarkers” or CLL-associated markers and include, for example, but are not limited to mutations in CLL-associated genes, increased expression of CLL-associated genes, chromosomal rearrangements, and micro-RNAs. These other clinical indicators can also be used in methods of the present invention in combination with identifying a SF3B1 and/or CLL driver mutation.
Other biomarkers associated with CLL that may be used in the methods described herein include, for example, mutated IGHV, increased expression of ZAP70, increased levels of β2-microglobulin, increased levels of enzyme sTK, increased CD38 expression, and increased levels of Ang-2. Other genes that are known in the art to be indicative or prognostic of CLL initiation, progression or response to treatment can also be used in the present invention. Polynucleotides encoding these biomarkers or the polypeptides of the CLL biomarkers disclosed herein can be detected, or their levels can be determined, by methods known in the art and described herein. For example, the mutational status of IGHV can be assessed by various DNA sequencing methods known in the art, such as Sanger sequencing. In other embodiments, CD38 and ZAP70 expression levels can be assessed by flow cytometry.
Other CLL biomarkers can include various chromosomal abnormalities, such as 11q deletion, 17p deletion, Trisomy 12, 13q deletion, monosomy 13, and rearrangements of chromosome 14. Other chromosomal rearrangements, amplifications, deletions, or other abnormalities can also be used in the methods described herein. Particularly of interest are chromosomal abnormalities, rearrangements, or deletions that affect p53 or ATM function, wherein p53 and/or ATM function is decreased or inhibited. Methods for identifying chromosomal status are well known in the art. For example, fluorescence in-situ hybridization (FISH) can be utilized to detect chromosomal abnormalities.
Additional clinical indicators for CLL include lymphocyte doubling time, which can be calculated by determining the number of months it takes for the absolute lymphocyte count to double in number. Another clinical indicator for CLL includes atypical circulating lymphocytes in the blood, wherein the lymphocytes show abnormal nuclei (such as cleaved or lobated), irregular nuclear contours, or enlarged size.
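As an illustration only, the lymphocyte doubling time described above can be estimated from two absolute lymphocyte counts taken a known number of months apart. The sketch below assumes exponential growth between the two measurements; that growth model, the function name, and the parameter names are assumptions made for this example and are not taken from the methods described herein.

```python
import math


def lymphocyte_doubling_time(count_start, count_end, months_elapsed):
    """Estimate doubling time (in months) from two absolute lymphocyte counts.

    Assumes exponential growth between the two measurements, which is an
    assumption of this sketch rather than a requirement of the text.
    """
    if count_start <= 0 or count_end <= 0 or months_elapsed <= 0:
        raise ValueError("counts and elapsed time must be positive")
    if count_end <= count_start:
        # The count is flat or falling, so it never doubles.
        return math.inf
    return months_elapsed * math.log(2) / math.log(count_end / count_start)
```

Under this model, a count that quadruples over 12 months corresponds to a doubling time of about 6 months.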
The invention includes administering to a subject compositions comprising an SF3B1 modulator such as an inhibitor.
SF3B1 modulators such as inhibitors alter splicing activity, for example, reduce, decrease, increase, activate or inhibit the biological function of SF3B1, such as splicing. SF3B1 inhibitors can be readily identified by an ordinarily skilled artisan by assaying for altered SF3B1 activity, i.e., splicing.
Altered splicing of genes can be measured by detecting a certain gene or subset of genes that are known to be spliced by the SF3b spliceosome complex, or SF3B1 in particular, by methods known in the art and described herein. For example, the genes are RIOK3 or BRD2.
Other therapeutic regimens are contemplated by the invention as described above.
An effective amount of a therapeutic compound is preferably from about 0.1 mg/kg to about 150 mg/kg. Effective doses vary, as recognized by those skilled in the art, depending on route of administration, excipient usage, and coadministration with other therapeutic treatments including use of other anti-proliferative agents or therapeutic agents for treating, preventing or alleviating a symptom of a cancer. A therapeutic regimen is carried out by identifying a mammal, e.g., a human patient suffering from a cancer that has a SF3B1 mutation using standard methods.
The pharmaceutical compound is administered to such an individual using methods known in the art. Preferably, the compound is administered orally, rectally, nasally, topically or parenterally, e.g., subcutaneously, intraperitoneally, intramuscularly, and intravenously. The modulators (such as inhibitors) are optionally formulated as a component of a cocktail of therapeutic drugs to treat cancers. Examples of formulations suitable for parenteral administration include aqueous solutions of the active agent in an isotonic saline solution, a 5% glucose solution, or another standard pharmaceutically acceptable excipient. Standard solubilizing agents such as PVP or cyclodextrins are also utilized as pharmaceutical excipients for delivery of the therapeutic compounds.
The therapeutic compounds described herein are formulated into compositions for other routes of administration utilizing conventional methods. For example, the therapeutic compounds are formulated in a capsule or a tablet for oral administration. Capsules may contain any standard pharmaceutically acceptable materials such as gelatin or cellulose. Tablets may be formulated in accordance with conventional procedures by compressing mixtures of a therapeutic compound with a solid carrier and a lubricant. Examples of solid carriers include starch and sugar bentonite. The compound is administered in the form of a hard shell tablet or a capsule containing a binder, e.g., lactose or mannitol, conventional filler, and a tableting agent. Other formulations include an ointment, suppository, paste, spray, patch, cream, gel, resorbable sponge, or foam. Such formulations are produced using methods well known in the art.
Therapeutic compounds are effective upon direct contact of the compound with the affected tissue. Accordingly, the compound is administered topically. Alternatively, the therapeutic compounds are administered systemically. For example, the compounds are administered by inhalation. The compounds are delivered in the form of an aerosol spray from a pressurized container or dispenser which contains a suitable propellant, e.g., a gas such as carbon dioxide, or a nebulizer.
Additionally, compounds are administered by implanting (either directly into an organ or subcutaneously) a solid or resorbable matrix which slowly releases the compound into adjacent and surrounding tissues of the subject.
Heparinized blood samples and skin biopsies were obtained from normal donors and patients enrolled on clinical research protocols that were approved by the Human Subjects Protection Committee at the Dana-Farber Cancer Institute (DFCI). In some cases, 2 ml of saliva was collected from study participants as a source of normal epithelial cell DNA. Peripheral blood mononuclear cells (PBMC) from normal donors and patients were isolated by Ficoll/Hypaque density gradient centrifugation. CD19+ B cells from normal volunteers were isolated by immunomagnetic selection (Miltenyi Biotec, Auburn Calif.). Mononuclear cells were used fresh or cryopreserved with FBS 10% DMSO and stored in vapor-phase liquid nitrogen until the time of analysis. Primary skin fibroblast lines were generated from five mm diameter punch biopsies of skin that were provided to the Cell Culture Core lab of the Harvard Skin Disease Research Center, as previously described (Zhang, Clin Cancer Res 2010; 16:2729-39). Second or third passage cultures were used for genomic DNA isolation.
Prognostic Factor Analysis.
Immunoglobulin heavy-chain variable (IGHV) homology (high risk unmutated was defined as greater than or equal to 98% homology to the closest germline match) and ZAP-70 expression (high risk positive defined as >20%) were determined as previously described (Rassenti, N Engl J Med, 2004, 351:893-901). Cytogenetics were evaluated by FISH for the most common CLL abnormalities (del(13q), trisomy 12, del(11q), del(17p), rearrangements of chromosome 14; all probes from Vysis, Des Plaines, Ill.) at the Brigham and Women's Hospital Cytogenetics Laboratory, Boston Mass. (Dohner, N Engl J Med, 2000, 343:1910-6). Samples were scored positive for a chromosomal aberration based on consensus cytogenetic scoring (Smoley, Cancer Genet Cytogenet, 2010, 203:141-8). Percent tumor cells harboring common CLL cytogenetic abnormalities, detected by FISH cytogenetics, are tabulated per sample in Table 9.
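By way of a minimal sketch, the two cutoffs stated above (IGHV homology of at least 98% for "unmutated", and ZAP-70 expression above 20% for "positive") can be written as simple threshold rules. The function names and return labels are hypothetical and chosen only for illustration.

```python
def classify_ighv(percent_homology):
    """Return the IGHV status for a percent homology to the closest germline match.

    Uses the cutoff stated in the text: >= 98% homology is "unmutated" (high risk).
    """
    return "unmutated" if percent_homology >= 98.0 else "mutated"


def classify_zap70(percent_positive_cells):
    """Return the ZAP-70 status; > 20% is "positive" (high risk) per the text."""
    return "positive" if percent_positive_cells > 20.0 else "negative"
```

Note that the IGHV cutoff is inclusive while the ZAP-70 cutoff is strict, following the wording above.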
Whole-Genome and -Exome DNA Sequencing.
Informed consent on DFCI IRB-approved protocols for whole genome sequencing of patients' samples was obtained prior to the initiation of sequencing studies. Genomic DNA was isolated from patient CD19+CD5+ tumor cells and autologous skin fibroblasts (Wizard kit; Promega, Madison Wis.) per manufacturer's instructions. Alternatively, germline genomic DNA was extracted from autologous epithelial cells, obtained from saliva samples (DNA Genotek, Kanata, Ontario, Canada) or from autologous blood granulocytes, isolated following Ficoll/Hypaque density gradient centrifugation.
Whole genome shotgun (WG) and whole exome (WE) capture libraries were constructed as previously described (Chapman, Nature, 2011, 471:467-72; Gnirke, Nat Biotechnol, 2009, 27:182-9; Berger, Nature, 2011, 470:214-20). For 51 (56%) of the 91 CLL samples included in the analysis, sequencing was performed on capture libraries generated from whole genome amplified (WGA) samples. For those samples, 100 ng inputs of samples were whole genome amplified with the Qiagen REPLI-g Midi Kit (Valencia, Calif.). No significant differences in mutation rate were observed between data originating from WGA and non-WGA samples (see Table 3). WGS libraries were sequenced on an average of 39 lanes of an Illumina GA-II sequencer, using 101 bp paired-end reads, with the aim of reaching 30× genomic coverage of distinct molecules per sample (Chapman, Nature, 2011, 471:467-72; Berger, Nature, 2011, 470:214-20). Exome sequencing libraries were sequenced on three lanes of the same instrument, using 76 bp paired-end reads.
Sequencing data subsequently was processed using the “Picard” pipeline, developed at the Broad Institute's Sequencing Platform (Fennell T, unpublished; Cambridge, Mass.), which includes base-quality recalibration (DePristo, Nat Genet. 2011, 43:491-8), alignment to the NCBI Human Reference Genome Build hg18 using MAQ (Li, Genome Res 2008, 18:1851-8), and aggregation of lane- and library-level data.
Identification of Somatic Tumor Mutations and Calculation of Significance.
From the sequencing data, tumor-specific gene alterations were identified using a set of tools contained within the “Firehose” pipeline (Chapman, Nature, 2011, 471:467-72; Berger, Nature, 2011, 470:214-20), developed at the Broad Institute. Somatic single nucleotide variations (SSNVs) were detected using muTect, while somatic small insertions and deletions were detected using the algorithm Indelocator. The algorithm MutSig (Lawrence, in preparation; Ding, Nature 2008, 455:1069-75; Network, Nature 2008, 455:1061-8; Getz, Science 2007, 317:1500) was applied to sequencing data from the 3 genomes and 88 exomes. Briefly, MutSig tabulates the number of mutations and the number of adequately covered bases for each gene (i.e. bases with >=14 tumor and >=8 normal reads). The counts are broken down by mutation context category (i.e. CpG transitions, other C:G transitions, any transversion, A:T transitions). For each gene, the probability of seeing the observed constellation of mutations or a more extreme one, given the background mutation rates calculated across the dataset, was calculated (see Table 3 for background mutation rate). This is done by convolving a set of binomial distributions as described previously, which results in a p and q value (Getz, Science 2007, 317:1500). The 4 samples for which normal germline DNA was derived from blood granulocytes had a significantly lower detection of somatic mutations, suggesting contamination with tumor DNA. Reanalysis excluding these 4 samples had little effect on mutation rate (increased by only 5%: 0.71 mutations/Mb to 0.75 mutations/Mb) and yielded the same results of significantly mutated genes (q<0.1). All mutations in genes that were significantly mutated or within pathways related to these significantly mutated genes were confirmed by manual inspection of the sequencing data (Robinson, Nat Biotechnol 2011; 29:24-6).
Furthermore, these mutations were also validated using an independent platform (Sequenom mass spectrometry-based genotyping). There was no significant difference in non-synonymous mutation rate between IGHV-mutated and unmutated patients (despite 82% power to detect differences of 0.6 standard deviations; one-sided 0.05 level test) or between different clinical stages. The ability to detect mutations of low allele fraction depends on several factors, including the purity and ploidy of the sample, and the copy number at the locus in question. Graphical representation of the distribution of allelic fraction among the total number of 2348 mutations detected is depicted in
Statistical Analysis of Mutation Rate in Association with Clinical Variables.
Clinical data were available from 91 CLL samples comprising the genome/exome sequenced discovery set, and from 101 CLL samples used for extension and validation. The association between patient characteristics and clinical variables such as time to first treatment (TTFT) and mutation rate or presence or absence of driver mutations was tested. P-values were calculated using the Wilcoxon rank sum test for quantitatively measured variables across two groups, the Fisher Exact test for categorical variables, the Kruskal-Wallis test for quantitatively measured variables across three groups and for ordered categorical data, and the log rank test for comparing Kaplan-Meier estimated censored time to event variables. Time to first therapy was defined as the elapsed time between initial diagnosis and first treatment for CLL. Patients who remained untreated for their disease at the most recent follow-up were censored at that time. All statistical tests were performed using SAS software version 9.2 and R version 2.8.0.
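The Kaplan-Meier estimate of time to first therapy with right-censoring, as described above, can be sketched as follows. This is a bare product-limit estimator written for illustration only; the analyses described herein were performed in SAS and R, and the function name and input layout here are assumptions of this example.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the treatment-free fraction over time.

    times:  elapsed time from diagnosis to first treatment, or to last
            follow-up for patients who remained untreated.
    events: 1 if the patient was treated at that time, 0 if censored.
    Returns a list of (time, estimated fraction still untreated) at each
    time where at least one treatment event occurred.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_at_this_time = 0
        # Group all patients sharing the same time point.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_at_this_time += 1
            i += 1
        if n_events > 0:
            survival *= 1.0 - n_events / at_risk
            curve.append((t, survival))
        # Treated and censored patients both leave the risk set.
        at_risk -= n_at_this_time
    return curve
```

Censored patients contribute to the number at risk up to their last follow-up but never produce a step in the curve, which matches the censoring rule stated above.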
Univariate analysis was performed using Cox proportional hazards regression for the 19 variables potentially predictive of TTFT, including IGHV (mutated vs. unmutated vs. unknown), ZAP-70 (negative vs. positive vs. unknown), Rai stage at sampling (0/1 vs. 2/3/4 vs. unknown), age (≧55 yrs. vs. <55 yrs.), sex, presence of del(17p), del(11q), trisomy 12, homozygous del(13q), heterozygous del(13q), and presence of mutations in ATM, NOTCH1, SF3B1, TP53, DDX3X, ZMYM3, FBXW7, and MYD88. A stepwise Cox proportional hazards regression model of TTFT was performed for the 91 discovery samples, using the 19 variables listed above. The same final model was obtained with a forward selection procedure. Step-up models using the −2 log likelihood statistic to assess goodness of fit using the appropriate degrees of freedom were also explored. Cox modeling results are reported as hazard ratios along with the 95% confidence intervals.
Detection of Altered RNA Splicing.
Total RNA was extracted from normal B and CLL-B cells (TRIZOL; Invitrogen, Carlsbad Calif.). 2 μg total RNA from each sample was treated with DNase I (2 units/sample; New England BioLabs, Ipswich Mass.) at 37° C. for 20 minutes to remove contaminating genomic DNA, followed by heat-inactivation of DNase I at 75° C. for 15 minutes, and then used as template to synthesize cDNA by reverse transcription (SuperScript® III First-Strand kit; Invitrogen, Carlsbad Calif.). We designed, in parallel, quantitative Taqman assay primers to detect spliced transcripts across consecutive exons, and unspliced transcripts in which one primer was localized within the retained intron. Details of primer design for the splicing assays for RIOK3 and BRD2 are noted in Table 11. All assays were run in triplicate using the 7500 Fast System (Applied Biosystems, Carlsbad Calif.), and all values were normalized to GAPDH gene expression. Relative splicing activity was measured by calculating the ratio of unspliced to spliced forms of each target gene. For some experiments, splicing was measured following treatment of 293 cells or normal B cells or CLL cells with the SF3b-complex targeting drug E7107 at 1 μM (gift of Robin Reed, HMS).
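The ratio of unspliced to spliced transcript described above can be sketched from quantitative PCR cycle-threshold (Ct) values. The sketch assumes a delta-Ct quantitation with 100% amplification efficiency (abundance proportional to 2 to the power of minus Ct); the text does not state which quantitation method was used, so this model and all names below are assumptions for illustration.

```python
def relative_abundance(ct_target, ct_reference):
    """Abundance of a target relative to a reference gene (delta-Ct method).

    Assumes perfect doubling per cycle, an assumption of this sketch.
    """
    return 2.0 ** -(ct_target - ct_reference)


def unspliced_to_spliced_ratio(ct_unspliced, ct_spliced, ct_gapdh):
    """Ratio of unspliced to spliced transcript, each normalized to GAPDH."""
    unspliced = relative_abundance(ct_unspliced, ct_gapdh)
    spliced = relative_abundance(ct_spliced, ct_gapdh)
    return unspliced / spliced
```

Because both forms are normalized to the same GAPDH value, the reference cancels in the ratio; it is kept explicit here only to mirror the normalization step described above. A higher ratio indicates more intron retention.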
DNA derived from CD19+CD5+ leukemia cells was sequenced and matched germline DNA derived from autologous skin fibroblasts, saliva-derived epithelial cells or blood granulocytes. Samples were taken from patients displaying a broad range of clinical characteristics, including the high-risk deletions of chromosomes 11q and 17p, and both unmutated and mutated IGHV (
1838 non-synonymous and 539 synonymous mutations were detected in protein-coding sequences, corresponding to an average somatic mutation rate of 0.72/Mb (SD=0.36, range 0.075-2.14), and an average of 20 non-synonymous mutations per individual (range 2-76) (Table 1; Table 2). This rate is similar to that previously reported for CLL and other hematologic malignancies (Fabbri, J Exp Med, 2011; Puente, Nature, 2011; Chapman, Nature, 2011, 471:467-72; Mardis, N Engl J Med 2009, 361:1058-66; Ley, Nature 2008; 456:66-72). There was no significant difference in non-synonymous mutation rate between IGHV-mutated and -unmutated tumors or between different clinical stages of disease (Table 3). Prior exposure to chemotherapy (30 of 91 samples) was not associated with increased non-synonymous mutation rate (p=0.14,
To identify genes whose mutations were associated with CLL tumorigenesis (‘driver’ mutations), all 91 leukemia/normal pairs were examined using the MutSig algorithm for genes that were mutated significantly more than the background rate given their sequence composition. Eight such genes were identified, with q<0.1 after correction for multiple hypothesis testing: TP53, SF3B1, MYD88, ATM, FBXW7, NOTCH1, ZMYM3, and DDX3X (
Four of the significantly mutated genes, TP53, ATM, MYD88 and NOTCH1, have been described previously in CLL (Puente, Nature, 2011; Austen, Blood, 2005, 106:3175-82; Zenz, J Clin Oncol, 2010, 28:4473-9; Trbusek, J Clin Oncol 2011; 29:2703-8). 15 TP53 mutations in 14 of 91 CLL samples (15%; q≦6.3×10−8), mostly localized to the DNA binding domain that is critical for its tumor suppressor activity (Zenz, J Clin Oncol, 2010, 28:4473-9) (
Four of the significantly mutated genes (SF3B1, FBXW7, DDX3X, ZMYM3) have not been reported in CLL. Strikingly, the second most frequently mutated gene within our cohort was splicing factor 3b, subunit 1 (SF3B1), with missense mutations in 14 of 91 CLL samples (15%) (
The four remaining significantly mutated genes are novel to CLL and appear to have functions that interact with the 5 frequently mutated genes cited above (
The three most recurrent mutations, SF3B1-K700E, MYD88-L265P, and NOTCH1-P2514fs, were validated on 101 independent paired CLL-germline DNA samples, in which comparable detection frequencies were observed between the discovery and extension cohorts (p=0.20, 0.58, and 0.38, respectively) (Table 6).
The nine significantly mutated genes fall into five core signaling pathways, in which the genes play well-established roles: DNA damage repair and cell-cycle control (TP53 and ATM), Notch signaling (FBXW7 and NOTCH1 (O'Neil J Exp Med, 2007, 204:1813-24)), inflammatory pathways (MYD88 and DDX3X) and RNA splicing/processing (SF3B1, DDX3X) (
To examine the association between driver mutations and particular clinical features, CLL-associated cytogenetic aberrations and IGHV mutation status in samples harboring mutations in the 9 significantly mutated genes were assessed. Samples were ordered based on FISH cytogenetics, utilizing an established model of hierarchical risk (Dohner, N Engl J Med, 2000, 343:1910-6) (i.e. del(13q), most favorable prognosis when present alone; trisomy 12; and del(11q) and del(17p), both associated with aggressive chemotherapy-refractory disease) (
The distinct prognostic implications of these cytogenetic abnormalities have suggested that they may reflect distinct pathogenesis. These data demonstrate associations of different driver mutations with different key FISH abnormalities, providing support for this hypothesis. Consistent with prior literature (Zenz, J Clin Oncol, 2010, 28:4473-9), most TP53 mutations (11 of 17) were present in samples also harboring del(17p) (p<0.001), resulting in homozygous p53 inactivation. Mutations in ATM—which lies in the minimally deleted region of chromosome 11q—were marginally associated with del(11q) (4 of 22 del(11q) samples, (p=0.09)). Strikingly, mutations in SF3B1 were associated with del(11q) (8 of 22 (36%) del(11q) samples; p=0.004). Of the six CLL samples with mutated SF3B1 and without del(11q), two also harbored a heterozygous mutation in ATM. These findings strongly suggest an interaction between del(11q) and SF3B1 mutation in the pathogenesis of this clinical subgroup of CLL.
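To make the association test concrete, the counts given above for SF3B1 and del(11q) can be arranged as a 2×2 table and evaluated with a two-sided Fisher exact test. The table is derived from figures in the text (8 of 22 del(11q) samples with mutated SF3B1, 6 SF3B1-mutated samples without del(11q), 91 samples in total); the implementation below is a plain hypergeometric sketch written for illustration and is not the software used for the reported analysis.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables (with the same
    margins) that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_ways = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / total_ways

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)  # small tolerance for float ties
    )


# Rows: del(11q) present / absent. Columns: SF3B1 mutated / wild type.
p_value = fisher_exact_two_sided(8, 14, 6, 63)
```

With these counts the sketch returns a p-value of about 0.004, in line with the value reported above.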
Furthermore, the NOTCH1 and FBXW7 mutations were associated with trisomy 12 (p=0.009, and 0.05, respectively). As in previous reports (Fabbri, J Exp Med, 2011; Puente, Nature, 2011), NOTCH1 mutations consistently associated with unmutated IGHV status. The data described herein show that the NOTCH1 and FBXW7 mutations were present in independent samples, suggesting they may similarly lead to aberrant Notch signaling in this clinical subgroup.
All MYD88 mutations were present in samples harboring heterozygous del(13q) (p=0.009). As in recent reports (Fabbri, J Exp Med, 2011; Puente, Nature, 2011), the data demonstrate that MYD88 mutation was always associated with mutated IGHV status (p=0.001), which suggests a post-germinal center origin. These results indicate that, as in DLBCL, where MYD88 is frequently mutated (Ngo, Nature 2011, 470:115-9), constitutive activation of the NF-κB/TLR pathway may have a larger impact in the germinal center context.
Mutations in NOTCH1 and MYD88 were respectively associated with unmutated and mutated IGHV status across the 192 CLL samples in the discovery and extension sets. Mutation SF3B1-K700E was associated with unmutated IGHV, p=0.048, but was also distributed in IGHV-mutated samples, suggesting that it is an independent risk factor (
Because SF3B1 encodes a splicing factor that lies at the catalytic core of the spliceosome, functional evidence of alterations in splicing associated with SF3B1 mutation was examined. Kotake et al. previously used intron retention in the endogenous genes BRD2 and RIOK3 to assay function of the SF3b complex (Kotake, Nat Chem Biol, 2007, 3:570-5). The SF3B1 inhibitor E7107, which targets the spliceosome complex, inhibits splicing of BRD2 and RIOK3 in both normal and CLL-B cells (
Experimental Procedures.
149 patients with CLL provided tumor and normal DNA for sequencing and copy number assessment in this study. Tumor and normal DNA from 11 additional patients were also analyzed by DNA sequencing alone (a total of 160 CLL samples). 82 CLL samples were previously reported (Quesada et al., 2012; Wang et al., 2011), and the raw BAM files for these samples were re-processed and re-analyzed together with the new data, to ensure the consistency of the results as well as enable the detection of smaller subclones made possible with a newer version of the mutation caller [MuTect]. Written informed consent was obtained prior to sample collection according to the Declaration of Helsinki. DNA was extracted from blood- or marrow-derived lymphocytes (tumor) and autologous epithelial cells (saliva), fibroblasts or granulocytes (normal).
Libraries for whole-exome sequencing (WES) were constructed and sequenced on either an Illumina HiSeq 2000 or Illumina GA-IIX using 76 bp paired-end reads, and data were processed, as detailed elsewhere (Berger et al., 2011; Chapman et al., 2011; Fisher et al., 2011). As previously described (Chapman et al., 2011), output from Illumina software was processed by the Picard data processing pipeline to yield BAM files containing well calibrated, aligned reads (DePristo et al., 2011). BAM files were processed by the Firehose pipeline, which performs QC and identifies somatic single nucleotide variations (sSNVs), indels, and other structural chromosomal rearrangements. Recurrent sSNV and indels in 160 CLLs were identified using MutSig2.0 (Lohr et al., 2012). For 111 of 149 matched CLL-normal DNA samples, copy number profiles were obtained using the Genome-wide Human SNP Array 6.0 (Affymetrix), according to the manufacturer's protocol (Genetic Analysis Platform, Broad Institute, Cambridge Mass.), with allele-specific analysis [HAPSEG (Carter, 2011)]. Significant recurrent somatic copy number alterations (sCNAs) were identified using the GISTIC2.0 algorithm (Mermel et al., 2011). Regions with germline copy number variants were excluded from the analysis. For CLL samples with no available SNP arrays (38 of 149 CLLs), sCNAs were estimated directly from the WES data, based on the ratio of CLL sample read-depth to the average read-depth observed in normal samples for that region. We applied the algorithm ABSOLUTE (Carter et al., 2012), to estimate sample purity, ploidy, and absolute somatic copy numbers. These were used to infer the cancer cell fraction (CCF) of point mutations from the WES data. Following the framework previously described (Carter et al., 2012), we computed the posterior probability distribution over CCF c as follows. Consider a somatic mutation observed in a of N sequencing reads on a locus of absolute somatic copy-number q in a sample of purity α. 
The expected allele-fraction f of a mutation present in one copy in a fraction c of cancer cells is calculated by f(c)=αc/(2(1−α)+αq), with c∈[0.01,1]. Then P(c)∝Binom(a|N,f(c)), assuming a uniform prior on c. The distribution over CCF was then obtained by calculating these values over a regular grid of 100 c values and normalizing. Mutations were thereafter classified as clonal based on the posterior probability that the CCF exceeded 0.95, and subclonal otherwise. Validation of allelic fraction was performed by using deep sequencing with indexed libraries recovered on a Fluidigm chip. Resulting normalized libraries were loaded on a MiSeq instrument (Illumina) and sequenced using paired-end 150 bp sequencing reads to an average coverage depth of 4200×.
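The grid computation of the CCF posterior described above can be sketched as follows, using a for the number of mutant reads, N for total reads, α for purity and q for the absolute somatic copy number at the locus. This is an illustrative reading of the stated formulas, not the ABSOLUTE implementation. The text does not state the cutoff applied to the posterior probability that the CCF exceeds 0.95, so the default of 0.5 used in `is_clonal` is a hypothetical value, as are all function and parameter names.

```python
from math import exp, lgamma, log


def ccf_posterior(alt_reads, total_reads, purity, copy_number, grid_size=100):
    """Posterior over cancer cell fraction c on a regular grid in [0.01, 1].

    Uses f(c) = purity*c / (2*(1 - purity) + purity*copy_number) and a
    binomial likelihood with a uniform prior on c.
    Returns (grid, posterior), with the posterior normalized to sum to 1.
    """
    step = (1.0 - 0.01) / (grid_size - 1)
    grid = [0.01 + i * step for i in range(grid_size)]
    log_coeff = (lgamma(total_reads + 1) - lgamma(alt_reads + 1)
                 - lgamma(total_reads - alt_reads + 1))
    log_like = []
    for c in grid:
        f = purity * c / (2.0 * (1.0 - purity) + purity * copy_number)
        f = min(max(f, 1e-12), 1.0 - 1e-12)  # keep the logarithms finite
        log_like.append(log_coeff + alt_reads * log(f)
                        + (total_reads - alt_reads) * log(1.0 - f))
    peak = max(log_like)
    weights = [exp(v - peak) for v in log_like]  # subtract peak for stability
    total = sum(weights)
    return grid, [w / total for w in weights]


def is_clonal(grid, posterior, ccf_threshold=0.95, prob_cutoff=0.5):
    """Call a mutation clonal if P(CCF > ccf_threshold) exceeds prob_cutoff.

    prob_cutoff is a hypothetical default; the text does not specify it.
    """
    prob = sum(p for c, p in zip(grid, posterior) if c > ccf_threshold)
    return prob > prob_cutoff
```

For example, 10 mutant reads out of 100 in a pure sample at copy number 2 gives a posterior that peaks near a CCF of 0.2, which this sketch would call subclonal.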
Associations between mutation rates and clinical features were assessed by the Wilcoxon rank-sum test, Fisher exact test, or the Kruskal-Wallis test, as appropriate. Time-to-event data were estimated by the method of Kaplan and Meier, and differences between groups were assessed using the log-rank test. Unadjusted and adjusted Cox modeling was performed to assess the impact of the presence of a subclonal driver on clinical outcome measures alone and in the presence of clinical features known to impact outcome, such as IGHV status, cytogenetics, and mutation identity. A chi-square test with 1 degree of freedom and the −2 Log-likelihood statistic were used to test the prognostic independence of subclonal status in Cox modeling.
Human Samples.
Heparinized blood, skin biopsies and saliva were obtained from patients enrolled on clinical research protocols at the Dana-Farber Harvard Cancer Center (DFHCC) approved by the DFHCC Human Subjects Protection Committee. The diagnosis of CLL according to WHO criteria was confirmed in all cases by flow cytometry, or by lymph node or bone marrow biopsy. Peripheral blood mononuclear cells (PBMC) from normal donors and patients were isolated by Ficoll/Hypaque density gradient centrifugation. Mononuclear cells were used fresh or cryopreserved with FBS 10% DMSO and stored in vapour-phase liquid nitrogen until the time of analysis. Primary skin fibroblast lines were generated from skin punch biopsies as previously described (Wang et al., 2011). The patients included in the cohort represent the broad clinical spectrum of CLL (data not shown).
Established CLL Prognostic Factor Analysis.
Immunoglobulin heavy-chain variable (IGHV) homology (“unmutated” was defined as greater than or equal to 98% homology to the closest germline match) and ZAP-70 expression (high risk defined as >20% positive) were determined (Rassenti et al., 2008). Cytogenetics were evaluated by FISH for the most common CLL abnormalities (del(13q), trisomy 12, del(11q), del(17p), rearrangements of chromosome 14) (all probes from Vysis, Des Plaines, Ill., performed at the Brigham and Women's Hospital Cytogenetics Laboratory, Boston Mass.). Samples were scored positive for a chromosomal aberration based on consensus cytogenetic scoring (Smoley et al., 2010).
DNA Quality Control.
We used standard Broad Institute protocols as recently described (Berger et al., 2011; Chapman et al., 2011). Tumor and normal DNA concentration were measured using PicoGreen® dsDNA Quantitation Reagent (Invitrogen, Carlsbad, Calif.). A minimum DNA concentration of 60 ng/μl was required for sequencing. In select cases where concentration was <60 ng/μl, ethanol precipitation and re-suspension was performed. Gel electrophoresis confirmed that the large majority of DNA was high molecular weight. All Illumina sequencing libraries were created with the native DNA. The identities of all tumor and normal DNA samples (native and WGA product) were confirmed by mass spectrometric fingerprint genotyping of 24 common SNPs (Sequenom, San Diego, Calif.).
Whole-Exome DNA Sequencing.
Informed consent on DFCI IRB-approved protocols for whole exome sequencing of patients' samples was obtained prior to the initiation of sequencing studies. DNA was extracted from blood or marrow-derived lymphocytes (tumor) and saliva, fibroblasts or granulocytes (normal), as previously described (Wang et al., 2011). Libraries for whole exome (WE) sequencing were constructed and sequenced on either an Illumina HiSeq 2000 or Illumina GA-IIX using 76 bp paired-end reads. Details of whole exome library construction have been described elsewhere (Fisher et al., 2011). Standard quality control metrics, including error rates, percentage passing filter reads, and total Gb produced, were used to characterize process performance before downstream analysis. Average exome coverage depth was 132×/146× for tumor/germline. The Illumina pipeline generates data files (BAM files) that contain the reads together with quality parameters. Of the 160 CLL samples reported in the current manuscript, 82 were included in a previous study (Wang et al., 2011). 340 CLL and germline samples were sequenced overall. These include 160 CLL and matched germline DNA samples as well as timepoint 2 samples for 17 of 160 CLLs, and an additional sample pair and germline for a longitudinal sample pair not included in the 160 cohort (CLL020).
Identification of Somatic Mutations.
Output from Illumina software was processed by the “Picard” data processing pipeline to yield BAM files containing aligned reads (via MAQ, to the NCBI Human Reference Genome Build hg18) with well-calibrated quality scores (Chapman et al., 2011; DePristo et al., 2011). For 51 of the 160 CLL samples included in the analysis, sequencing was performed on capture libraries generated from whole genome amplified (WGA) samples. For those samples, 100 ng inputs of samples were whole genome amplified with the Qiagen REPLI-g Midi Kit (Valencia, Calif.). From the sequencing data, somatic alterations were identified using a set of tools within the “Firehose” pipeline, developed at The Broad Institute, Inc. and available at its website. The details of our sequencing data processing have been described elsewhere (Berger et al., 2011; Chapman et al., 2011). Somatic single nucleotide variations (sSNVs) were detected using MuTect; somatic small insertions and deletions (indels) were detected using Indelocator. All mutations identified in longitudinal samples were confirmed by manual inspection of the sequencing data (Robinson et al., 2011). An estimated contamination threshold of 5% was used for all samples based on the highest contamination values seen in a formal contamination analysis done with ContEst based on matched SNP arrays (Cibulskis et al., 2011). Ig loci mutations were not included in this analysis. Somatic mutations detected in the 160 CLL samples were compiled (data not shown). WES data is deposited in dbGaP (phs000435.v1.p1).
Significance Analysis for Recurrently Mutated Genes.
The prioritization of somatic mutations in terms of conferring selective advantage was done with the statistical method MutSig2.0 (Lohr et al., 2012). In short, the algorithm takes an aggregated list of mutations and tries to detect genes that are affected more than expected by chance, as those likely reflect positive selection (i.e., driver events). There are two main components to MutSig2.0:
The first component attempts to model the background mutation rate for each gene, while taking into account various different factors. Namely, it takes into account the fact that the background mutation rate may vary depending on the base context and base change of the mutation, as well as the fact that the background rate of a gene can also vary across different patients. Given these factors and the background model, it uses convolutions of binomial distributions to calculate a P value, which represents the probability that we obtain the observed configuration of mutations, or a more significant one.
The second component of the algorithm focuses on the positional configuration of mutations and their sequence conservation (Lohr et al., 2012). For each gene, the algorithm permutes the mutations while preserving their tri-nucleotide context, and for each permutation calculates two metrics: one that measures the degree of clustering into hotspots along the coding length of the gene, and one that measures the average conservation of mutations in the gene. These two null models are then combined into a joint distribution, which is used to calculate a P value that reflects the probability of obtaining by chance the observed degree of mutational clustering and conservation, or a more significant outcome.
The two P values produced by the two components are then combined using the Fisher combined probability test (Fisher, 1932), which yields a final P value that is used to sort the genes by degree of mutational significance. This P value is subsequently corrected for multiple hypothesis testing using the Benjamini-Hochberg procedure.
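The combination and correction steps can be illustrated with a minimal sketch. It is not the MutSig2.0 implementation. The gene names and P values are hypothetical, and the two component P values are assumed to be independent and strictly greater than zero.

```python
import math

def fisher_combine(p_values):
    """Combine independent P values with Fisher's method.

    X = -2 * sum(ln p) is chi-square distributed with 2k degrees of freedom
    under the null; for even degrees of freedom the survival function has
    the closed form exp(-X/2) * sum_{j<k} (X/2)^j / j!.
    """
    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)  # X / 2
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half_x / j
        total += term
    return min(1.0, math.exp(-half_x) * total)

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg q values in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so that q values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values

# Hypothetical per-gene P values:
# (background-rate component, clustering/conservation component).
genes = {"GENE_A": (1e-6, 1e-3), "GENE_B": (0.2, 0.5), "GENE_C": (0.01, 0.04)}
combined = {g: fisher_combine(ps) for g, ps in genes.items()}
names = sorted(combined, key=combined.get)
q = dict(zip(names, benjamini_hochberg([combined[g] for g in names])))
significant = [g for g in names if q[g] < 0.1]
```

Genes are sorted by the combined P value, and those with q below the chosen cutoff (0.1 in this sketch) are retained as candidates.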
Genome-Wide Copy Number Analysis.
Genome-wide copy number profiles of 111 CLL samples and their patient-matched germline DNA were obtained using the Genome-wide Human SNP Array 6.0 (Affymetrix), according to the manufacturer's protocol (Genetic Analysis Platform, The Broad Institute, Inc. Cambridge, Mass.). SNP array data were deposited in dbGaP (phs000435.v1.p1). Allele-specific analysis also allowed for the identification of copy neutral LOH events as well as quantification of the homologous copy-ratios (HSCSs) [HAPSEG (Carter, 2011)]. Significant recurrent chromosomal abnormalities were identified using the GISTIC2.0 algorithm (Mermel et al., 2011; v87). Regions with germline copy number variants were excluded from the analysis.
For CLL samples with no available SNP arrays (38/160), sCNAs were estimated directly from the WES data, based on the ratio of CLL sample read-depth to the average read-depth observed in normal samples for that region. 11/160 samples were excluded from this analysis due to inability to obtain copy number information from the WES data. See
Validation Deep Sequencing.
Validation targeted resequencing of 256 selected sSNVs was performed using microfluidic PCR. Target-specific primers with Fluidigm-compatible tails were designed to flank sites of interest and produce amplicons of 200+/−20 bp. Molecular barcoded, Illumina-compatible oligonucleotides containing sequences complementary to the primer tails were added to the Fluidigm Access Array chip (San Francisco, Calif.) in the same well as the genomic DNA samples (20-50 ng of input) such that all amplicons for a given genomic sample shared the same index, and PCR was performed according to the manufacturer's recommendations. Indexed libraries were recovered for each sample in a single collection well on the Fluidigm chip, quantified using PicoGreen and then normalized for uniformity across libraries. Resulting normalized libraries were loaded on a MiSeq instrument (Illumina) and sequenced using paired-end 150 bp sequencing reads. 95.2% of called sSNVs were detected in the validation experiment (data not shown). For 91.8% of the mutations, the allelic fraction estimates were concordant (with the discordant events enriched in sites of lower WES coverage).
RNA Sequencing (dUTP Library Construction).
5 μg of total RNA was poly-A selected using oligo-dT beads to extract the desired mRNA. The purified mRNA was treated with DNase and cleaned up using SPRI (Solid Phase Reversible Immobilization) beads according to the manufacturer's protocol. Selected poly-A RNA was then fragmented into ~450 bp fragments in an acetate buffer at high heat. Fragmented RNA was cleaned with SPRI and primed with random hexamers before first strand cDNA synthesis. The first strand was reverse transcribed off the RNA template in the presence of Actinomycin D to prevent hairpinning and purified using SPRI beads. The RNA in the RNA-DNA complex was then digested using RNase H. The second strand was next synthesized with a dNTP mixture in which dTTPs had been replaced with dUTPs.
After another SPRI bead purification, the resultant cDNA was processed using Illumina library construction according to the manufacturer's protocol (end repair, phosphorylation, adenylation, and adaptor ligation with indexed adaptors). SPRI-based size selection was performed to remove adapter dimers present in the newly constructed cDNA library. Libraries were then treated with Uracil-Specific Excision Reagent (USER) to nick the second strand at every incorporated Uracil (dUTP). Subsequently, libraries were enriched with 8 cycles of PCR using the entire volume of sample as template. After enrichment, the library was quantified using PicoGreen, and the fragment size was measured using the Agilent Bioanalyzer according to the manufacturer's protocol. Samples were pooled and sequenced using either 76 or 101 bp paired-end reads.
RNASeq Data Analysis.
RNAseq reads were aligned to the hg18 genome using the TopHat suite. Each somatic base substitution detected by WES was compared to reads at the same location in RNAseq. Based on the number of alternate and reference reads, a power calculation was performed using the beta-binomial distribution (power threshold used was greater than 80%). Mutation calls were deemed validated if two or more alternate allele reads were observed in RNA-Seq at the site, as long as RNAseq was powered to detect an event at the specified location.
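One way such a power-gated validation rule could be computed is sketched below. The exact parameterization of the beta-binomial is not given in the text. This sketch assumes that the allelic fraction follows a Beta(alt + 1, ref + 1) distribution derived from the WES read counts, and that expression is not allele-biased. All function names are illustrative.

```python
import math

def log_beta(a, b):
    """Natural log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def betabinom_pmf(k, n, a, b):
    """Beta-binomial probability of k successes in n trials."""
    log_choose = (math.lgamma(n + 1) - math.lgamma(k + 1)
                  - math.lgamma(n - k + 1))
    return math.exp(log_choose + log_beta(k + a, n - k + b) - log_beta(a, b))

def detection_power(wes_alt, wes_ref, rna_depth, min_alt_reads=2):
    """Probability of seeing at least min_alt_reads alternate reads in RNA-seq.

    Assumption: allelic fraction ~ Beta(wes_alt + 1, wes_ref + 1).
    """
    a, b = wes_alt + 1, wes_ref + 1
    miss = sum(betabinom_pmf(k, rna_depth, a, b)
               for k in range(min(min_alt_reads, rna_depth + 1)))
    return 1.0 - miss

def validation_status(wes_alt, wes_ref, rna_alt, rna_depth,
                      power_threshold=0.8):
    """Classify a WES call using RNA-seq reads at the same site."""
    if detection_power(wes_alt, wes_ref, rna_depth) <= power_threshold:
        return "unpowered"
    return "validated" if rna_alt >= 2 else "not validated"
```

Sites with insufficient RNA-seq depth are reported as unpowered and are not counted as validation failures.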
FACS Validation of Ploidy Estimates with ABSOLUTE.
Consistent with published studies of CLL (Brown et al., 2012; Edelmann et al., 2012), ABSOLUTE measured all CLL samples to be near diploid (data not shown; median ~2, range 1.95-2.1). We confirmed the measurements using a standard assay for measuring DNA content. For this analysis, peripheral blood mononuclear cells from normal volunteers and CLL patients and cell lines were first stained with anti-CD5 FITC and anti-CD19 PE antibodies in a PBS buffer containing 1% BSA for 30 minutes on ice. After extensive washes, the cells were then stained with a PBS buffer containing 1% BSA, 0.03% saponin (Sigma) and 250 μg/ml 7-AAD (Invitrogen) for 1 hour on ice, followed by analysis on a Beckman Coulter FC500 machine (
Estimation of Mutation Cancer Cell Fraction Using ABSOLUTE.
We used the ABSOLUTE algorithm to calculate the purity, ploidy, and absolute DNA copy-numbers of each sample (Carter et al., 2012). Modifications were made to the algorithm, which are implemented in version 1.05 of the software, available for download at The Broad Institute, Inc. website. Specifically, we added the ability to determine sample purity from sSNVs alone, in samples where no sCNAs are present (the ploidy of such samples is 2N). In addition, estimates of sample purity and absolute copy-numbers are used to compute distributions over cancer cell fraction (CCF) values of each sSNV, as described (Experimental Procedures), and for sCNAs (described below). The current implementation of ABSOLUTE does not automatically correct for sCNA subclonality when computing CCF distributions of sSNVs (this is an area of ongoing development). Fortunately, the few sCNAs that occurred in our CLL samples were predominantly clonal. Manual corrections were made for CLL driver sSNVs occurring at sites of subclonal sCNAs (5 TP53 sSNVs and 1 ATM sSNV), based on the sample purity, allelic fraction and the copy ratio of the matching sCNA.
Each sSNV was classified as clonal or subclonal based on the probability that the CCF exceeded 0.95. A probability threshold of 0.5 was used throughout the manuscript. However, as the histogram in
One of the recurrent CLL cancer genes, NOTCH1, had 15 mutations, 14 of which were the identical canonical 2 base-pair deletions. Unlike sSNVs, the observed allelic fractions of indel events were not modeled as binomial sampling of reference and alternate sequence reads according to their true concentration in the sample (Carter et al., 2012). This was due to biases affecting the alignment of the short sequencing reads, which generally favor reference over alternate alleles. To measure the magnitude of this effect, we examined the allelic fraction (AF) of 514 germline 2 bp deletions called in 4 normal germline WES samples. We observed that the distribution (data not shown) of allelic fractions for heterozygous events peaked at 0.41, as opposed to the expected mode of 0.5, with nearly all AFs between 0.3 and 0.6. Therefore, the bias factor towards reference peaks at 0.82 but may range from 0.6 to 1 (unlikely to be greater than 1). CCF distributions for the 14 somatic indels in NOTCH1 were calculated using bias factors of 1.0 (no bias), 0.82 (bias point-estimate), and 0.6 (worst case observed). Reassuringly, the classification of NOTCH1 indels as clonal or subclonal was highly robust and was essentially the same using the three values; only a single case (CLL155) was ambiguous and was classified as subclonal using 1.0 and 0.82, and clonal using 0.6. Taking a conservative approach, in which a mutation is not classified as subclonal unless there is clear evidence for it, we decided to call this event clonal for downstream analysis.
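The CCF calculation and the clonal/subclonal call described above can be sketched as follows. This is an illustration under stated assumptions, not the ABSOLUTE code. It assumes a uniform prior on a 100-point CCF grid and one mutated copy per mutated cell. It also assumes an expected allelic fraction of purity × CCF / (2 × (1 − purity) + purity × local copy number), with the indel bias factor applied as a simple multiplier. The read counts in the example are hypothetical.

```python
import math

def ccf_posterior(alt_reads, total_reads, purity, local_copy_number=2,
                  bias_factor=1.0, n_grid=100):
    """Posterior over cancer cell fraction on a regular grid (uniform prior)."""
    grid = [(i + 1) / n_grid for i in range(n_grid)]
    denom = 2 * (1 - purity) + purity * local_copy_number
    log_like = []
    for c in grid:
        # Expected allelic fraction for a mutation present in fraction c
        # of cancer cells (assumed multiplicity of one copy per cell).
        f = bias_factor * purity * c / denom
        f = min(max(f, 1e-12), 1 - 1e-12)
        log_like.append(alt_reads * math.log(f)
                        + (total_reads - alt_reads) * math.log(1 - f))
    peak = max(log_like)
    weights = [math.exp(v - peak) for v in log_like]
    total = sum(weights)
    return grid, [w / total for w in weights]

def is_clonal(grid, posterior, ccf_cutoff=0.95, prob_threshold=0.5):
    """Clonal if Pr(CCF > ccf_cutoff) exceeds prob_threshold."""
    prob = sum(p for c, p in zip(grid, posterior) if c > ccf_cutoff)
    return prob > prob_threshold

# Hypothetical indel: 30 alternate reads of 100, sample purity 0.9,
# evaluated at the three bias factors discussed in the text.
calls = {}
for bias in (1.0, 0.82, 0.6):
    grid, post = ccf_posterior(30, 100, 0.9, bias_factor=bias)
    calls[bias] = is_clonal(grid, post)
```

Repeating the call across the three bias factors shows whether a classification is sensitive to the assumed alignment bias.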
Estimation of CCF values for subclonal sCNAs is implemented (ABSOLUTEv1.05) in a manner analogous to the procedure for sSNVs (Experimental Procedures), although the transformation is more complex, due to the need for assumptions of the subclonal structure and the error model of microarray based copy-number data. Segmental sCNAs are defined as subclonal based on the mixture model used in ABSOLUTE (Carter et al., 2012). Let the functions h(x) and h′(x) denote a variance stabilizing transformation and its derivative, respectively. For SNP microarray data, these are defined as:
The values $\sigma_\epsilon$ and $\sigma_\eta$ denote additive and multiplicative noise scales, respectively, for the microarray hybridization being analyzed; these are estimated by HAPSEG (Carter et al., 2011). The calibrated probe-level microarray data become approximately normal under this transformation, which is used by HAPSEG to estimate the segmental allelic copy-ratios $r_i$ and the posterior standard deviation of their mean (under the transformation), $\sigma_i$ (Carter, 2011). An additional parameter $\sigma_H$ is estimated by ABSOLUTE (Carter et al., 2012), which represents additional sample-level variance corresponding to regional biases not captured in the probe-level model. For a subclonal segment $i$, let $q_c$ denote the absolute copy number in the unaffected cells, and $q_s$ denote the absolute copy number in the altered cells. Both of these values are unknown, but we used a simplifying assumption that the difference between $q_c$ and $q_s$ is one copy, with $q_c$ being closer to the modal copy-number. Therefore, for subclonal deletions (copy ratios below the ratio of modal copy number), $q_s$ was set to the nearest copy number below the measured value, and $q_c = q_s + 1$. For subclonal gains (ratios above the modal number), $q_s$ was set to the nearest copy number above the measured value, and $q_c = q_s - 1$. Because the CLL genomes analyzed here were universally near diploid, this was nearly equivalent to assuming that subclonal deletions had $q_s = 0$ in the affected cells and gains $q_s = 2$, with $q_c = 1$ in both cases (in allelic units). However, we note that these assumptions would not be strictly correct in genomes after doubling, or in cases of high-level amplification. In these cases, calculation of posterior CCF distributions will require integration over $q_s$ and $q_c$, averaging over the set of plausible subclonal genomic configurations.
Let $r_c$ and $r_s$ be the theoretical copy ratio values corresponding to $q_c$ and $q_s$ (accounting for sample purity, ploidy, and the modeled attenuation rate of the microarray (Carter et al., 2011; Carter et al., 2012)). Let $d = r_s - r_c$; then, for CCF $c$, let $r_x(c) = d\,c + r_c$. Then $P(c) \propto \mathcal{N}\left(h(r_x(c)) \mid h(r_i),\ (\sigma_i + \sigma_H)^2\right)\, h'(r_x(c))$. The distribution over CCF is obtained by calculating these values over a regular grid of 100 $c$ values and normalizing. We note that, when copy numbers are estimated directly from sequencing data, the calculation is simpler, as there is no attenuation effect and $h(x) = x$. These calculations were used to generate the 95% confidence intervals on the CCF of subclonal driver sCNAs shown in
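For the sequencing-based case, where h(x) = x and the derivative term is therefore 1, the grid calculation can be sketched as follows. The copy-ratio and noise values in the usage note are hypothetical, and the interval is taken as the central 95% of the normalized grid. This is an illustrative assumption and not necessarily how ABSOLUTE reports intervals.

```python
import math

def scna_ccf_posterior(r_i, r_c, r_s, sigma_i, sigma_h, n_grid=100):
    """Posterior over CCF for a subclonal sCNA, assuming h(x) = x.

    r_i: measured segment copy ratio; r_c, r_s: theoretical copy ratios of
    unaffected and altered cells; variance is (sigma_i + sigma_h) ** 2.
    """
    d = r_s - r_c
    sd = sigma_i + sigma_h
    grid = [(k + 1) / n_grid for k in range(n_grid)]
    # Normal density of the measured ratio around r_x(c) = d * c + r_c.
    dens = [math.exp(-0.5 * ((d * c + r_c - r_i) / sd) ** 2) for c in grid]
    total = sum(dens)
    return grid, [v / total for v in dens]

def central_interval(grid, posterior, mass=0.95):
    """Central interval of the given mass from the grid posterior."""
    tail = (1 - mass) / 2
    cum, lo, hi, lo_set = 0.0, grid[0], grid[-1], False
    for c, p in zip(grid, posterior):
        cum += p
        if not lo_set and cum >= tail:
            lo, lo_set = c, True
        if cum >= 1 - tail:
            hi = c
            break
    return lo, hi
```

For example, `scna_ccf_posterior(0.8, 1.0, 0.5, 0.02, 0.01)` describes a hypothetical subclonal deletion whose measured ratio lies 40% of the way from the unaffected ratio to the fully deleted ratio, so the posterior concentrates near a CCF of 0.4.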
Cancer Gene Census List and Conservation Annotations.
Conservation of a specific mutated site was adapted from the UCSC conservation score track. A scale of 0-100 was linearly converted from the −6 to 6 scale used in the phastCons track (Siepel et al., 2005). To confirm that driver mutations are more likely to occur in conserved sites, we quantified the conservation of hotspots in the COSMIC database (Forbes et al., 2008) and compared it to that of non-COSMIC coding locations. We matched conservation information for 5085 sites that had greater than 3 exact hits reported in mutations deposited in the COSMIC database, and compared it to the conservation found for a non-overlapping set of 5085 randomly sampled coding sites. The conservation was higher in the COSMIC sites than in the non-COSMIC coding sites (mean conservation 82.39 and 62.15, respectively, p<1e-50). We noted that the distribution of events was not uniform, and nearly one half of COSMIC hotspots had a conservation measure greater than 95 (49.65%, compared to 15.5% in the non-COSMIC set, p<1e-50). For our calculations, we used a cutoff of >95 to designate conserved sites likely to contain a higher proportion of cancer drivers. We complemented the analysis for putative driver event enrichment by matching the altered genes to the Cancer Gene Census (Futreal et al., 2004).
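The linear rescaling and the conserved-site cutoff amount to the following short sketch. Clipping of out-of-range scores is an added assumption, and the function names are illustrative.

```python
def rescale_conservation(raw_score, low=-6.0, high=6.0):
    """Map a raw conservation score on [low, high] linearly onto 0-100."""
    clipped = min(max(raw_score, low), high)  # assumption: clip outliers
    return 100.0 * (clipped - low) / (high - low)

def is_conserved(raw_score, cutoff=95.0):
    """Apply the >95 cutoff on the rescaled 0-100 conservation measure."""
    return rescale_conservation(raw_score) > cutoff
```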
Clustering Analysis of sSNVs in 18 CLL Sample Pairs.
In order to better resolve the true cancer cell fraction (CCF) of sSNVs detected in longitudinal samples, we employed a previously described Bayesian clustering procedure (Escobar and West, 1995). This approach exploits the assumption that the observed subclonal sSNV CCF values were sampled from a smaller number of subclonal cell populations (subclones). All remaining uncertainty (including the exact number of clusters) was integrated out using a mixture of Dirichlet processes, which was fit using a Gibbs sampling approach, building on a previously described framework (Escobar and West, 1995).
The inputs to this procedure are the posterior CCF distributions for each sSNV being considered. We note that the CCF distributions for sCNAs could be added into the model, however we did not attempt this in the present study. CCF distributions are represented as 100-bin histograms over the unit interval; the two-dimensional CCF distributions used for the 2D clustering of longitudinal samples were obtained as the outer product of the matched histogram pairs for each mutation, resulting in 10,000-bin histograms (
At each iteration of the Gibbs sampler, each mutation is assigned to a unique cluster and the posterior CCF distribution of each cluster is computed using Bayes' rule, as opposed to drawing a sample from the posterior (a uniform prior on CCF from 0.01 to 1 is used). When considering the probability of a mutation to join an existing cluster, the likelihood calculation of the mutation arising from the cluster is integrated over the uncertainty in the cluster CCF. This allows for rapid convergence of the Gibbs sampler to its stationary distribution, which was typically obtained in fewer than 100 iterations for the analysis presented in this study. We ran the Gibbs sampler for 1,000 iterations, of which the first 500 were discarded before summarization. Because of the small number of clonal mutations in some WES samples, we make an additional modification to the standard Dirichlet process model by adding a fixed clonal cluster that persists even if no mutation is assigned to it. This reflects our prior knowledge that clonal mutations must exist, even if they are the minority of detected mutations. For the samples analyzed here, this modification had very little effect. A key aspect of implementing the Dirichlet process model on WES datasets is reparameterization of prior distributions on the number of subclones k as priors on the concentration parameter α of the Dirichlet process model. Importantly, this must take into account the number of mutations N input to the model, as the effect of α on k is strongly dependent on N (Escobar and West, 1995). We accomplish this by constructing a map from a regular grid over α to expected values of k, given N, using the fact that:
(Antoniak, 1974), where the cN(k) factors correspond to the unsigned Stirling numbers of the first kind. With this map in hand, we perform an optimization procedure to find parameters a and b of a prior Gamma distribution over α resulting in the minimal Kullback-Leibler divergence with the specified prior over k (the divergence was computed numerically on the histograms). Once the prior over α has been represented as a Gamma distribution, learning about α (and therefore k) from the data can be directly incorporated into the Gibbs sampling procedure, resulting in a continuous mixture of Dirichlet processes (Escobar and West, 1995). This allows consistent parameterization of prior knowledge (or lack thereof) on the number of subclonal populations in the face of vastly different numbers of input mutations, which is necessary for making consistent inferences across differing datasets (e.g. WES vs. WGS). We note that taking uncertainty about α into account is necessary for inferences on the number of subclonal populations to be strictly valid, since implementations with fixed values of α result in an implicit prior over k that depends upon N (this is especially important for smaller values of N). For the application presented in this study (
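The map from α to the expected number of clusters k can be sketched as below. The sketch assumes the standard Antoniak result P(k | α, N) = |s(N, k)| α^k Γ(α) / Γ(α + N), where |s(N, k)| is an unsigned Stirling number of the first kind. This may differ from the notation of the equation referenced above by a factor convention. The grid of α values and the number of mutations are hypothetical, and the Gamma-prior fit by Kullback-Leibler minimization is not shown.

```python
import math

def unsigned_stirling_first_kind(n):
    """Row n of the unsigned Stirling numbers of the first kind, k = 0..n."""
    row = [1]  # |s(0, 0)| = 1
    for m in range(1, n + 1):
        new = [0] * (m + 1)
        for k in range(1, m + 1):
            # |s(m, k)| = (m - 1) * |s(m - 1, k)| + |s(m - 1, k - 1)|
            prev_k = row[k] if k < m else 0
            new[k] = (m - 1) * prev_k + row[k - 1]
        row = new
    return row

def cluster_count_pmf(alpha, n, stirling_row):
    """P(k | alpha, N) for k = 1..N under a Dirichlet process."""
    log_norm = math.lgamma(alpha) - math.lgamma(alpha + n)
    return [math.exp(math.log(stirling_row[k]) + k * math.log(alpha) + log_norm)
            for k in range(1, n + 1)]

def expected_clusters(alpha, n, stirling_row):
    """Expected number of clusters given alpha and N mutations."""
    pmf = cluster_count_pmf(alpha, n, stirling_row)
    return sum(k * p for k, p in zip(range(1, n + 1), pmf))

# Hypothetical setting: 30 input mutations, regular grid over alpha.
n_mutations = 30
row = unsigned_stirling_first_kind(n_mutations)
alpha_grid = [0.05 * i for i in range(1, 101)]
alpha_to_k = {a: expected_clusters(a, n_mutations, row) for a in alpha_grid}
```

Because the expected k grows with N for a fixed α, the map has to be rebuilt for each value of N before a prior on k is translated into a prior on α.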
Upon termination of the Gibbs sampler, we summarized the posterior probability over the CCF of each sSNV by averaging the posterior cluster distribution for all clusters to which the sSNV was assigned during sampling. This allowed shrinkage of the CCF probability distributions (as shown in
Gene Expression Profiling.
Total RNA was isolated from viably frozen PBMCs or B cells from CLL patients that were followed longitudinally (Midi kit; Qiagen, Valencia Calif.), and hybridized to the U133Plus 2.0 array (Affymetrix, Santa Clara, Calif.) at the DFCI Microarray Core Facility. All expression profiles were processed using RMA, implemented by the PreprocessDataset module in GenePattern available at The Broad Institute, Inc. website (Irizarry et al., 2003; Reich et al., 2006). Probes were collapsed to unique genes by selecting the probe with the maximal average expression for each gene. Batch effects were further removed using the ComBat module in GenePattern (Johnson et al., 2007; Reich et al., 2006). Visualizations in GENE-E, available at The Broad Institute, Inc. website, were based on logarithmic transformation (log2) of the data and centering each gene (zero mean). These data can be accessed at the NCBI website with accession number GSE37168.
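The probe-collapsing and centering steps can be illustrated with a small sketch. It does not reproduce the GenePattern modules. The data layout (dictionaries keyed by probe and gene identifiers) is an assumption made for illustration, and RMA normalization and ComBat batch correction are not shown.

```python
import math

def collapse_probes(expression, probe_to_gene):
    """Keep, for each gene, the probe with the highest mean expression.

    expression: {probe_id: [value per sample]}
    probe_to_gene: {probe_id: gene_symbol}
    """
    best = {}
    for probe, values in expression.items():
        gene = probe_to_gene[probe]
        mean = sum(values) / len(values)
        if gene not in best or mean > best[gene][0]:
            best[gene] = (mean, values)
    return {gene: values for gene, (_, values) in best.items()}

def log2_center(gene_expression):
    """Log2-transform each gene and subtract its mean across samples."""
    centered = {}
    for gene, values in gene_expression.items():
        logged = [math.log2(v) for v in values]
        mean = sum(logged) / len(logged)
        centered[gene] = [v - mean for v in logged]
    return centered
```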
RNA Pyrosequencing for Mutation Confirmation.
Quantitative targeted sequencing to detect somatic mutations within cDNA was performed as previously described (Armistead et al., 2008). In brief, biotinylated amplicons were generated by PCR of the regions of transcript surrounding the mutation of interest. Immobilized biotinylated single-stranded DNA fragments were isolated per the manufacturer's protocol, and sequencing was undertaken using an automated pyrosequencing instrument (PSQ96; Qiagen, Valencia Calif.), followed by quantitative analysis using Pyrosequencing software (Qiagen).
Statistical Methods.
Statistical analysis was performed with MATLAB (MathWorks, Natick, Mass.), R version 2.11.1 and SAS version 9.2 (SAS Institute, Cary, N.C.). Categorical variables were compared using the Fisher Exact test, and continuous variables were compared using the Student's t-test, Wilcoxon rank sum test, or Kruskal Wallis test as appropriate; the association between two continuous variables was assessed by the Pearson correlation coefficient. The time from the date of sample to first therapy or death (failure-free survival from sample time or FFS_Sample) was calculated as the time from sample to the time of the first treatment after the sample or death and was censored at the date of last contact. FFS_Rx (failure-free survival from first treatment after sampling) was defined as the time to the 2nd treatment or death from the 1st treatment following sampling, was calculated only for those patients who had a 1st treatment after the sample and was censored at the date of last contact for those who had only one treatment after the sample. Time to event data were estimated by the method of Kaplan and Meier, and differences between groups were assessed using the log-rank test. Unadjusted and adjusted Cox modeling was performed to assess the impact of the presence of a subclonal driver and a driver irrespective of the CCF on FFS_Sample and FFS_Rx. A chi-square test with 1 degree of freedom and the −2 Log-likelihood statistic was used to test the prognostic independence of subclonal status in Cox modeling using a full model and one without subclonal status included. We also formally tested for nonproportionality of the hazards in
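Two of the calculations named above, the Kaplan-Meier estimate and the 1-degree-of-freedom likelihood ratio test on the −2 log-likelihood of nested models, can be sketched in a few lines. The analyses themselves were run in R and SAS. This sketch is only illustrative, and it takes the −2 log-likelihood values of the fitted Cox models as given rather than fitting them.

```python
import math

def kaplan_meier(times, events):
    """Return [(event_time, survival_probability)] from right-censored data.

    events[i] is True when failure was observed and False when censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival, curve, i = 1.0, [], 0
    while i < len(data):
        t = data[i][0]
        failures = removed = 0
        # Group all subjects sharing the same time point.
        while i < len(data) and data[i][0] == t:
            failures += 1 if data[i][1] else 0
            removed += 1
            i += 1
        if failures:
            survival *= 1 - failures / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve

def likelihood_ratio_p(neg2_loglik_reduced, neg2_loglik_full):
    """P value of a chi-square test with 1 degree of freedom.

    The statistic is the drop in -2 log-likelihood when the extra
    covariate (e.g. subclonal status) is added to the model.
    """
    stat = max(0.0, neg2_loglik_reduced - neg2_loglik_full)
    return math.erfc(math.sqrt(stat / 2))
```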
Large-Scale WES Analysis of CLL Expands the Compendium of CLL Drivers and Pathways.
We performed whole-exome sequencing (WES) (Gnirke et al., 2009) of 160 matched CLL and germline DNA samples (including 82 of the 91 samples previously reported (Wang et al., 2011)). These patients represented the broad spectrum of CLL clinical heterogeneity, and included patients with both low- and high-risk features based on established prognostic risk factors (ZAP70 expression, the degree of somatic hypermutation in the variable region of the immunoglobulin heavy chain (IGHV) gene, and presence of specific cytogenetic abnormalities) (data not shown). We applied MuTect (a highly sensitive and specific mutation-calling algorithm) to the WES data to detect somatic single nucleotide variations (sSNVs) present in as few as 10% of cancer cells. Average sequencing depth of WES across samples was ˜130×. In total, we detected 2,444 nonsynonymous and 837 synonymous mutations in protein-coding sequences, corresponding to a mean (±SD) somatic mutation rate of 0.6±0.28 per megabase (range, 0.03 to 2.3), and an average of 15.3 nonsynonymous mutations per patient (range, 2 to 53) (data not shown).
Expansion of our sample cohort provided us with the sensitivity to detect 20 putative CLL cancer genes (q<0.1), which was accomplished through recurrence analysis using the MutSig2.0 algorithm (Lohr et al., 2012) which detects genes enriched with mutations beyond the background mutation rate (
Together, the 20 candidate CLL driver genes appeared to fall into seven core signaling pathways. These include all five pathways that we previously reported to play a role in CLL (DNA repair and cell-cycle control, Notch signaling, inflammatory pathways, Wnt signaling, RNA splicing and processing). Two new pathways were implicated by our analysis: B cell receptor signaling and chromatin modification (
Because recurrent chromosomal abnormalities have defined roles in CLL biology (Döhner et al., 2000; Klein et al., 2010), we further searched for loci that were significantly amplified or deleted by analyzing somatic copy-number alterations (sCNAs). We applied GISTIC2.0 (Mermel et al., 2011) to 111 matched tumor and normal samples which were analyzed by SNP6.0 arrays (Brown et al., 2012). Through this analysis, we identified deletions in chromosome 8p, 13q, 11q, and 17p and trisomy of chromosome 12 as significantly recurrent events (
Inference of Genetic Evolution with Whole-Exome Sequencing Data.
In order to study clonal evolution in CLL, we performed integrative analysis of sCNAs and sSNVs using a recently reported algorithm ABSOLUTE (Carter et al., 2012), which jointly estimated the purity of the sample (fraction of cancer nuclei) and the average ploidy of the cancer cells. All samples were estimated to have near-diploid DNA content; these estimates were confirmed by FACS analysis of 7 CLL samples (
Overall, we identified 1,543 clonal mutations (54% of all detected mutations, average of 10.3±5.5 mutations per sample, data not shown). These mutations were likely acquired either before or during the most recent complete selective sweep. This set therefore includes both neutral somatic mutations that preceded transformation and the driver and passenger event(s) present in each complete clonal sweep. A total of 1,266 subclonal sSNVs were detected in 146 of 149 samples called by ABSOLUTE (46%; average of 8.5±5.8 subclonal mutations per sample). These subclonal sSNVs exist in only a fraction of leukemic cells, and hence occurred after the emergence of the “most-recent common ancestor”, and by definition, also after disease initiation. The mutational spectra were similar in clonal and subclonal sSNVs (
Age and Mutated IGHV Status are Associated with an Increased Number of Clonal Somatic Mutations.
The presence of subclones in nearly all CLL samples enabled us to analyze several aspects of leukemia progression. We first addressed how clonal and subclonal mutations relate to the salient clinical characteristics of CLL. CLL is generally a disease of the elderly with established prognostic factors, such as the IGHV mutation (Döhner, 2005) and ZAP70 expression. Patients with a high number of IGHV mutations (mutated IGHV) tend to have better prognosis than those with a low number (unmutated IGHV) (Damle et al., 1999; Lin et al., 2009). This marker may reflect the molecular differences between leukemias originating from B cells that have or have not yet, respectively, undergone the process of somatic hypermutation that occurs as part of normal B cell development. We examined the association of these factors, as well as patient age at diagnosis, with the prevalence of clonal and subclonal mutations. We found that age and mutated IGHV status were associated with greater numbers of clonal (but not subclonal) mutations (age, P<0.001; mutated vs unmutated IGHV, P=0.05;
Cancer therapy has been theorized to be an evolutionary bottleneck, in which a massive reduction in malignant cell numbers results in reduced genetic variation in the cell population (Gerlinger and Swanton, 2010). The overall diversity in CLL may be diminished after therapeutic bottlenecks as well. Because most of the genetic heterogeneity within a cancer is present at very low frequencies (Gerstung et al., 2012)—below the level of detection afforded by the ˜130× sequence coverage we generated—we were unable to directly assess reduction in overall genetic variation.
However, in the range of larger subclones that were observable by our methods, (>10% of malignant cells), we witnessed increased diversity after therapy (
Inferring the Order of Genetic Changes Underlying CLL.
While general aspects of temporal evolution could not be completely resolved in single timepoint WES samples, the order of driver mutation acquisition could be partially inferred from the aggregate frequencies at which they are found to be clonal or subclonal. We considered the 149 samples as a series of “snapshots” taken along a temporal axis. Clonal status in all or most mutations affecting a specific gene or chromosomal lesion would indicate that this alteration was acquired at or prior to the most recent selective sweep before sampling and hence could be defined as a stereotypically early event. Conversely, predominantly subclonal status in a specific genetic alteration implies a likely later event that is tolerated and selected for only in the presence of an additional mutation.
This strategy was used to infer temporal ordering of the recurrent sSNVs and sCNAs (
Direct Observation of Clonal Evolution by Longitudinal Data Analysis of Chemotherapy-Treated CLL.
To directly assess the evolution of somatic mutations in a subset of patients, we compared CCF for each alteration across two clinical timepoints in 18 of the 149 samples (median years between timepoints was 3.5; range 3.1-4.5). Six patients (‘untreated’) did not receive treatment throughout the time of study. The remaining 12 patients (‘treated’) received chemotherapy (primarily fludarabine and/or rituxan-based) in the interval between samples (data not shown). The two patient groups were not significantly different in terms of elapsed time between first and second sample (median 3.7 years for the 6 untreated patients compared to 3.5 years for the 12 treated patients, P=0.62; exact Wilcoxon rank-sum test), nor did they differ in time from diagnosis to first sample (P=0.29).
Analysis of the 18 sets of data revealed that 11% of mutations increased (34 sSNVs, 15 sCNAs), 2% decreased (6 sSNVs, 2 sCNAs) and 87% did not change their CCF over time (q<0.1 for significant change in CCF, data not shown). As shown by our single timepoint analysis, we observed a shift of subclonal driver mutations (e.g., del(11q), SF3B1 and TP53) towards clonality over time. Changes in the genetic composition of CLL cells with clonal evolution were associated with network level changes in gene expression related to emergence of specific subclonal populations (e.g. changes in signatures associated with SF3B1 or NRAS mutation,
Clustering analysis of CCF distributions of individual genetic events over the two timepoints revealed clear clonal evolution in 11 of 18 CLL sample pairs. We observed clonal evolution in 10 of 12 sample pairs that had undergone intervening treatment between timepoints 1 and 2 (
Presence of Subclonal Drivers Adversely Impacts Clinical Outcome.
We observed that treatment-associated clonal evolution led to the replacement of the incumbent clone by a fitter pre-existing subclone (
Within the 12 of 18 longitudinally analyzed samples that received intervening treatment, we observed that the 10 samples with clonal evolution exhibited shortened FFS_Rx (log-rank test; P=0.015,
We tested this hypothesis in the set of 149 patient samples, of which subclonal driver mutations were detected in 46% (
Regression models adjusting for multiple CLL prognostic factors (IGHV status, prior therapy and high risk cytogenetics) supported the presence of a subclonal driver as an independent risk factor for earlier retreatment (adjusted hazard ratio (HR) of 3.61 (CI 1.42-9.18), Cox P=0.007; unadjusted HR, 3.20 (CI 1.35-7.60);
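The adjusted hazard ratio, its confidence interval, and the Cox P value quoted above can be checked for internal consistency. The sketch below assumes that the interval is a 95% Wald interval, symmetric on the log scale, and that the P value comes from a Wald test. The source does not state either point, and the function name is illustrative.

```python
from math import erfc, log, sqrt


def wald_p_from_hr(hr, ci_low, ci_high, z_crit=1.959964):
    """Recover an approximate two-sided Wald p-value from a hazard ratio and
    its confidence limits, assuming the interval is symmetric in log(HR)."""
    standard_error = (log(ci_high) - log(ci_low)) / (2 * z_crit)
    z = log(hr) / standard_error
    # Two-sided tail area of the standard normal distribution.
    return erfc(abs(z) / sqrt(2))


# Values quoted in the text; the 95% level is an assumption.
p_adjusted = wald_p_from_hr(3.61, 1.42, 9.18)
p_unadjusted = wald_p_from_hr(3.20, 1.35, 7.60)
print(f"adjusted:   P = {p_adjusted:.4f}")
print(f"unadjusted: P = {p_unadjusted:.4f}")
```

Under these assumptions the value recovered for the adjusted model rounds to the reported P=0.007, so the three quoted numbers agree with one another. The unadjusted figure is a back-calculation only, since no P value for that model is given in the text.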
The analysis of clonal heterogeneity in CLL provides a glimpse into the past, present and future of a patient's disease. While inter-tumoral (Quesada et al., 2012; Wang et al., 2011) and intra-tumoral (Schuh et al., 2012; Stilgenbauer et al., 2007) genetic heterogeneity had been previously demonstrated in CLL, our use of novel WES-based algorithms enabled a more comprehensive study of clonal evolution in CLL and its impact on clinical outcome. Through the cross-sectional analysis of 149 samples, we derived the number and genetic composition of clonal and subclonal mutations and thus uncovered footprints of the past history of CLL, such as the accumulation of passenger mutations related to age and aberrant somatic hypermutation preceding transformation. Furthermore, we inferred a temporal order of genetic events implicated in CLL. Finally, our combined longitudinal and cross-sectional analyses revealed that knowledge of subclonal mutations can anticipate the genetic composition of the future relapsing leukemia and the rapidity with which it will occur.
We proposed the existence of distinct periods in CLL progression, with unique selection pressures acting at each period. In the first period prior to transformation, passenger events accumulate in the cell that will eventually be the founder of the leukemia (in proportion to the age of the patient;
An important question addressed here is how treatment affects clonal evolution in CLL. In the 18 patients monitored at 2 timepoints, we observed two general patterns—clonal equilibrium in which the relative sizes of each subclone were maintained and clonal evolution in which some subclones emerge as dominant (
CLL is an incurable disease with a prolonged course of remissions and relapses. It has long been recognized that relapsed disease responds increasingly less well to therapy over time. We now show an association between increased clinical aggressiveness and genetic evolution, which has therapeutic implications. We found that the presence of pre-treatment subclonal driver mutations anticipated the dominant genetic composition of the relapsing tumor. Such information may eventually guide the selection of therapies to prevent the expansion of highly fit subclones. In addition, the potential hastening of the evolutionary process with treatment provides a mechanistic justification for the empirical practice of ‘watch and wait’ as the CLL treatment paradigm (CLL Trialists Collaborative Group, 1999). The detection of driver mutations in subclones (a testimony to an active evolutionary process) may thus provide a new prognostic approach in CLL, which can now be rigorously tested in larger clinical trials.
In conclusion, we demonstrate the ability to study tumor heterogeneity and clonal evolution with standard WES (coverage depth of ~130×). These innovations will allow characterization of the subclonal mutation spectrum in large, publicly available datasets (Masica and Karchin, 2011). The implementation described here may also be readily adopted for clinical applications. Even more importantly, our studies underscore the importance of evolutionary development as the engine driving cancer relapse. This new knowledge challenges us to develop novel therapeutic paradigms that target not only specific drivers (i.e., ‘targeted therapy’) but also the evolutionary landscape (Nowak and Sigmund, 2004) of these drivers.
While several embodiments of the present invention have been described and illustrated herein, those of ordinary skill in the art will readily envision a variety of other means and/or structures for performing the functions and/or obtaining the results and/or one or more of the advantages described herein, and each of such variations and/or modifications is deemed to be within the scope of the present invention. More generally, those skilled in the art will readily appreciate that all parameters, dimensions, materials, and configurations described herein are meant to be exemplary and that the actual parameters, dimensions, materials, and/or configurations will depend upon the specific application or applications for which the teachings of the present invention is/are used. Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, many equivalents to the specific embodiments of the invention described herein. It is, therefore, to be understood that the foregoing embodiments are presented by way of example only and that, within the scope of the appended claims and equivalents thereto, the invention may be practiced otherwise than as specifically described and claimed. The present invention is directed to each individual feature, system, article, material, kit, and/or method described herein. In addition, any combination of two or more such features, systems, articles, materials, kits, and/or methods, if such features, systems, articles, materials, kits, and/or methods are not mutually inconsistent, is included within the scope of the present invention.
All definitions, as defined and used herein, should be understood to control over dictionary definitions, definitions in documents incorporated by reference, and/or ordinary meanings of the defined terms.
The indefinite articles “a” and “an,” as used herein in the specification and in the claims, unless clearly indicated to the contrary, should be understood to mean “at least one.”
The phrase “and/or,” as used herein in the specification and in the claims, should be understood to mean “either or both” of the elements so conjoined, i.e., elements that are conjunctively present in some cases and disjunctively present in other cases. Multiple elements listed with “and/or” should be construed in the same fashion, i.e., “one or more” of the elements so conjoined. Other elements may optionally be present other than the elements specifically identified by the “and/or” clause, whether related or unrelated to those elements specifically identified. Thus, as a non-limiting example, a reference to “A and/or B”, when used in conjunction with open-ended language such as “comprising” can refer, in one embodiment, to A only (optionally including elements other than B); in another embodiment, to B only (optionally including elements other than A); in yet another embodiment, to both A and B (optionally including other elements); etc.
As used herein in the specification and in the claims, “or” should be understood to have the same meaning as “and/or” as defined above. For example, when separating items in a list, “or” or “and/or” shall be interpreted as being inclusive, i.e., the inclusion of at least one, but also including more than one, of a number or list of elements, and, optionally, additional unlisted items. Only terms clearly indicated to the contrary, such as “only one of” or “exactly one of,” or, when used in the claims, “consisting of,” will refer to the inclusion of exactly one element of a number or list of elements. In general, the term “or” as used herein shall only be interpreted as indicating exclusive alternatives (i.e. “one or the other but not both”) when preceded by terms of exclusivity, such as “either,” “one of,” “only one of,” or “exactly one of.” “Consisting essentially of,” when used in the claims, shall have its ordinary meaning as used in the field of patent law.
As used herein in the specification and in the claims, the phrase “at least one,” in reference to a list of one or more elements, should be understood to mean at least one element selected from any one or more of the elements in the list of elements, but not necessarily including at least one of each and every element specifically listed within the list of elements and not excluding any combinations of elements in the list of elements. This definition also allows that elements may optionally be present other than the elements specifically identified within the list of elements to which the phrase “at least one” refers, whether related or unrelated to those elements specifically identified. Thus, as a non-limiting example, “at least one of A and B” (or, equivalently, “at least one of A or B,” or, equivalently “at least one of A and/or B”) can refer, in one embodiment, to at least one, optionally including more than one, A, with no B present (and optionally including elements other than B); in another embodiment, to at least one, optionally including more than one, B, with no A present (and optionally including elements other than A); in yet another embodiment, to at least one, optionally including more than one, A, and at least one, optionally including more than one, B (and optionally including other elements); etc.
It should also be understood that, unless clearly indicated to the contrary, in any methods claimed herein that include more than one step or act, the order of the steps or acts of the method is not necessarily limited to the order in which the steps or acts of the method are recited.
In the claims, as well as in the specification above, all transitional phrases such as “comprising,” “including,” “carrying,” “having,” “containing,” “involving,” “holding,” “composed of,” and the like are to be understood to be open-ended, i.e., to mean including but not limited to. Only the transitional phrases “consisting of” and “consisting essentially of” shall be closed or semi-closed transitional phrases, respectively, as set forth in the United States Patent Office Manual of Patent Examining Procedures, Section 2111.03.
This application claims the benefit under 35 U.S.C. §119(e) of U.S. Provisional Application No. 61/567,941, filed Dec. 7, 2011, the entire contents of which are incorporated by reference herein.
This invention was made with U.S. Government support under grant number 1RO1HL103532-01 from the NHLBI and grant number 1RO1CA155010-01A1 from the NCI. Accordingly, the U.S. Government has certain rights in this invention.
Filing Document | Filing Date | Country | Kind | 371c Date
---|---|---|---|---
PCT/US2012/068633 | 12/7/2012 | WO | 00 | 6/4/2014

Number | Date | Country
---|---|---
61567941 | Dec 2011 | US