The present application relates to the field of cancer, particularly to colorectal cancer (CRC). A panel of biomarkers is presented herein that can be used to cluster CRC samples into distinct genetic subtypes. It further relates to the use of the clustering method on patients treated with an anti-VEGF therapy and the identification of anti-VEGF responsive genetic subtypes.
Colorectal cancer (CRC) is the third most commonly diagnosed cancer in both men and women and an important contributor to cancer mortality and morbidity. CRC develops through an ordered series of events beginning with the transformation of normal colonic epithelium to an adenomatous intermediate and then ultimately adenocarcinoma, the so-called “adenoma-carcinoma sequence” (Pino and Chung, 2010). It is now generally accepted that multiple genetic events are required for tumor progression and that the temporal acquisition of these genetic changes matters. Recent genome-wide sequencing efforts have calculated as many as 80 mutated genes per colorectal tumor, but a smaller group of mutations (<15) were considered to be the true “drivers” of tumorigenesis (Wood et al 2007; Leary et al 2008). Genomic instability is recognized as an essential cellular feature that accompanies the acquisition of these mutations. In colorectal cancer, at least 3 distinct pathways of genomic instability have been described: the chromosomal instability (CIN), microsatellite instability (MSI), and CpG island methylator phenotype (CIMP) pathways. The CIN pathway underlies the majority of all colorectal cancers. CIN is observed in 65%-70% of sporadic colorectal cancers; the term refers to an accelerated rate of gains or losses of whole or large portions of chromosomes that results in karyotypic variability from cell to cell (Lengauer et al 1998). The consequence of CIN is an imbalance in chromosome number (aneuploidy), sub-chromosomal genomic amplifications, and a high frequency of loss of heterozygosity (LOH).
Despite the rich history of investigations and the identification of numerous genetic changes that are causative for CRC development, CRC is still a frequently lethal disease with heterogeneous outcomes and heterogeneous drug responses. To move to personalized medicine and thus to more effective treatment strategies, it would be advantageous to identify clinically relevant and molecularly homogeneous subtypes of CRC tumors. However, subclassification per se, even when built on what are believed to be relevant features of cancer cells (such as expression of cancer pathway components or driver gene mutations), may still not be predictive of differential drug responses. This can be due to the drugs themselves, with promiscuous mechanisms of action that may not track well with single pathway descriptors, or to our inability to properly define pathway engagement or cross-talk using static ‘omics’ data. Recently a consensus gene expression-based subtyping classification system for CRC was identified by the CRC Subtyping Consortium (Guinney et al 2015). The published gene expression-based subtyping classification makes use of six independent classification systems to categorize CRC samples into one of the four consensus molecular subtypes (CMS). However, this classification can only sort 87% of the CRC samples. The remaining 13% of the samples do not fall within one of the four CMS groups and should be considered separately as indeterminate subtypes of as yet unknown biological and clinical behavior. Moreover, none of the currently available gene expression-based CRC sub-classification methods is predictive of one or more differential drug responses. To solve this problem we developed a new DNA sub-classification method based on copy number alterations (CNA) of specific DNA regions. The use of CNAs to classify cancer has been shown previously for e.g. non-small cell lung cancer (Li et al 2014), melanoma (WO2010/051319) and colorectal cancer (WO2010/051318).
However, with the method described in this application we are not only using distinct DNA regions but were also able to classify 100% of all tested metastatic CRC (mCRC) tumor samples into one of three different subgroups. Moreover, the subgroups defined by this new DNA-based classification method are related to the patients' response to Avastin therapy. Avastin or bevacizumab is a frequently used anti-VEGF antibody for treating cancer (Ferrara et al 2004).
Using copy number aberrations of specific DNA regions in a mCRC sample and subsequent unsupervised clustering we were able to classify mCRC tumors into 3 different subgroups. These subgroups are related to the patients' response to chemotherapy and outcome. Patients whose tumors are classified in clusters 2 and 3 show additional benefit from Avastin treatment when compared to patients from the same clusters who received chemotherapy only. Hypermutator phenotypes, such as tumors with POLE or POLD1 mutations or microsatellite instable tumors, show no additional benefit from Avastin treatment. Copy number instability of specifically selected DNA regions is thus a biomarker for Avastin response. Tumors with a high proportion of the genome affected by CNAs have a significantly better response when treated with Avastin compared to copy number stable tumors.
It is an object of the invention to provide a colorectal cancer biomarker panel comprising at least 5 genomic DNA regions or fragments thereof selected from Table 1.
Another aspect of the invention is the use of said biomarker panel to determine the copy number alteration status of a colorectal cancer sample. The biomarker panel can also be used to determine the copy number instability of a colorectal cancer sample. According to particular embodiments, the biomarker panel comprising at least 5 genomic DNA regions or fragments thereof listed in Table 1, is used to cluster colorectal cancer samples in 3 distinct genetic subtypes wherein said subtypes are characterized by the copy number alteration specifications depicted in Tables 2, 3 and 4. According to particular embodiments, the said biomarker panel of the invention is used to predict the responsiveness of a colorectal cancer patient to anti-VEGF therapy.
According to another aspect a method is provided for determining the genetic subtype of a colorectal cancer sample, comprising determining the copy number alteration status of a colorectal cancer sample of a colorectal cancer patient using a biomarker panel, comprising at least 5 genomic DNA regions or fragments thereof selected from Table 1; and classifying said colorectal cancer sample in one of 3 distinct genetic subtypes wherein said subtypes are characterized by the copy number alteration specifications depicted in Tables 2, 3 and 4. According to particular embodiments, said method for determining the genetic subtype of a colorectal cancer sample can also be used to identify a patient responsive to anti-VEGF therapy, wherein classification of said patient in genetic subtypes 2 or 3 respectively depicted in Table 3 or 4 is indicative for said patient to be responsive to anti-VEGF therapy.
According to another aspect, a method is provided for the identification of a patient responsive to anti-VEGF therapy comprising determining the copy number instability of a CRC sample of a CRC patient using the biomarker panel comprising at least 5 genomic DNA regions or fragments thereof selected from Table 1, wherein a copy number instability of 15% or more is indicative for said patient to be responsive to anti-VEGF therapy.
The present invention will be described with respect to particular embodiments and with reference to certain drawings but the invention is not limited thereto but only by the claims. Any reference signs in the claims shall not be construed as limiting the scope. The drawings described are only schematic and are non-limiting. In the drawings, the size of some of the elements may be exaggerated and not drawn to scale for illustrative purposes. Where the term “comprising” is used in the present description and claims, it does not exclude other elements or steps. Where an indefinite or definite article is used when referring to a singular noun e.g. “a” or “an”, “the”, this includes a plural of that noun unless something else is specifically stated. Furthermore, the terms first, second, third and the like in the description and in the claims are used for distinguishing between similar elements and not necessarily for describing a sequential or chronological order. It is to be understood that the terms so used are interchangeable under appropriate circumstances and that the embodiments of the invention described herein are capable of operation in other sequences than described or illustrated herein. The following terms or definitions are provided solely to aid in the understanding of the invention. Unless specifically defined herein, all terms used herein have the same meaning as they would to one skilled in the art of the present invention. Practitioners are particularly directed to Sambrook et al., Molecular Cloning: A Laboratory Manual, 4th ed., Cold Spring Harbor Press, Plainview, N.Y. (2012); and Ausubel et al., Current Protocols in Molecular Biology (Supplement 100), John Wiley & Sons, New York (2012), for definitions and terms of the art. The definitions provided herein should not be construed to have a scope less than understood by a person of ordinary skill in the art.
In a first aspect, the invention relates to a colorectal cancer biomarker panel for determining the copy number alteration status of a colorectal cancer sample, comprising at least 5 genomic DNA regions or fragments thereof selected from Table 1. It also relates to a colorectal cancer biomarker panel for determining the copy number instability of a CRC sample comprising at least 5 genomic DNA regions or fragments thereof selected from Table 1. In a particular embodiment, said biomarker panel comprises at least 5 genomic DNA regions or fragments thereof selected from Table 5. In a more particular embodiment, said biomarker panel comprises at least 5 genomic DNA regions or fragments thereof selected from Table 6. In an even more particular embodiment, said biomarker panel comprises at least 5 genomic DNA regions or fragments thereof selected from Table 10.
The term “colorectal cancer biomarker panel” as used herein, and from hereon also referred to as “biomarker panel”, means a limited list of genomic DNA regions which can be used to determine the copy number alteration status or the copy number instability of a CRC sample. Importantly, although all genomic DNA regions listed in Table 1 are valuable and can be used as markers to evaluate copy number instability of CRC samples, this does not imply that all regions are needed to classify a CRC sample as copy number stable or unstable. Depending on the classification method used and the level of accuracy the practitioner aims for, subselections of the listed genomic DNA regions can be used. In this application, Applicant teaches that a selection of 5 from the 180 genomic DNA regions listed in Table 1 can be enough to cluster CRC samples and thus to determine with high accuracy whether a CRC sample is copy number stable or instable. Accordingly, a CRC biomarker panel comprising “at least 5 genomic DNA regions” is envisaged in the embodiments described above. In alternative embodiments, the colorectal cancer biomarker panel of the application comprises at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12 or at least 20 genomic DNA regions or fragments thereof selected from Table 1 or from Table 5 or from Table 6 or from Table 10. In other alternative embodiments, a colorectal cancer biomarker panel is provided comprising at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12 or at least 20 genomic DNA regions or fragments thereof selected from the genomic DNA regions listed in Table 6 and Table 7 or from the genomic DNA regions listed in Table 10 and Table 11.
The term “genomic DNA region” as used herein means a DNA sequence that is part of the genome of a cell or organism, as distinguished from extrachromosomal DNA, such as plasmids. The genomic DNA regions which are used within the scope of this invention are listed in Table 1. For every genomic DNA region within the scope of this application, Table 1 and Table 5 show the “wide peak limits” and the “peak limits”. The “wide peak limits” indicate the full sequence which contains the most comprehensive information. However, the invention also relates to smaller fragments of the listed genomic DNA regions. The “peak limits” for example indicate a subregion within the “wide peak limits” which contains the most condensed information to determine the copy number alteration status or copy number instability. Even smaller fragments within the “peak limits” can be used to cluster CRC samples according to the methods of the invention or can be used to determine the genetic subtype of CRC samples according to the methods of the invention or can be used to determine copy number instability of CRC samples according to the methods of the invention. Thus also smaller fragments which are part of the listed genomic DNA regions of Table 1 or Table 5 and which for example can be amplified by PCR or detected with probes related to DNA detection techniques (e.g. Southern blot) fall within the scope of the invention. Hence, the “biomarker panel” has to be read as comprising at least 5 genomic DNA regions listed in Table 1 or Table 5 or fragments thereof. Said fragments are thus smaller DNA fragments that are part of said genomic DNA regions. 
However, the said fragments of the at least 5 genomic DNA regions still need to have predictive power to determine the CNA status of a CRC sample, to determine the CIN of a CRC sample, to be useful to cluster CRC samples into the 3 genetic subtypes of the invention, to predict the responsiveness of a CRC patient to anti-VEGF therapy, to determine the genetic subtypes of the invention and/or to identify patients responsive to anti-VEGF therapy.
The terms “copy number alterations” and “copy number aberrations”, both abbreviated as CNAs and used interchangeably in this application, refer to changes in copy number of specific DNA regions whereby the changes have arisen in somatic tissue, for example, only in a tumor. These changes can be amplifications or deletions. The “copy number alteration status” is thus the level or number of changes in copy number of a predefined list of DNA regions. For example, tumors can be categorized into CNA-high tumors and CNA-low tumors.
The relative number of regions affected by CNAs can be seen as a measure for copy number instability. Using different thresholds to define tumors as copy number unstable and stratifying the patients accordingly, we were able to observe beneficial responses to Avastin treatment for tumor instabilities ranging from 10% to 40% of regions affected by CNAs. We performed this analysis on 6 different subsets: (1) using only the 102 focal regions, (2) using the top 50 ranked regions from the random forest classification model built with the 102 focal regions, (3) using the tier 1 and tier 2 regions from the recursive partitioning applied on the 102 focal regions, (4) using all 180 genomic regions, (5) using the tier 1 and tier 2 regions from the recursive partitioning applied on all 180 regions and (6) using the top 50 ranked regions from the random forest classification model built with the 180 genomic regions.
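The threshold-based stratification described above can be illustrated with a minimal Python sketch. It assumes that each panel region has already been given a discrete call (+1 for amplification, -1 for deletion, 0 for no change); the sample names, region identifiers and calls are hypothetical and serve only to show how the fraction of affected regions is compared against thresholds between 10% and 40%.

```python
def fraction_regions_altered(region_calls):
    """Return the fraction of panel regions that carry a CNA.

    region_calls maps a (hypothetical) region identifier to a call:
    +1 for amplification, -1 for deletion, 0 for no change.
    """
    if not region_calls:
        raise ValueError("the panel must contain at least one region")
    altered = sum(1 for call in region_calls.values() if call != 0)
    return altered / len(region_calls)


def stratify(samples, threshold):
    """Split samples into copy number unstable and stable groups.

    A sample is called unstable when its fraction of affected regions
    is at or above the threshold (the inclusive comparison is an assumption).
    """
    unstable, stable = [], []
    for name, calls in samples.items():
        if fraction_regions_altered(calls) >= threshold:
            unstable.append(name)
        else:
            stable.append(name)
    return unstable, stable


# Hypothetical calls for three samples on a 10-region panel.
samples = {
    "S1": {f"region_{i}": 0 for i in range(10)},
    "S2": {f"region_{i}": (1 if i < 2 else 0) for i in range(10)},
    "S3": {f"region_{i}": (-1 if i < 5 else 0) for i in range(10)},
}

# Sweep the instability threshold over the range mentioned in the text.
for pct in (10, 20, 30, 40):
    unstable, stable = stratify(samples, pct / 100)
    print(f"{pct}%: unstable={unstable} stable={stable}")
```

In this sketch every region counts equally; a practitioner could equally weight regions by length, which is a separate design choice.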
In this application, CNA-high tumors are defined as tumors in which preferably 10% or more, more preferably 15% or more of the DNA region consisting of the biomarker panel used for the analysis (i.e. the genomic regions selected from Table 1 or Table 5 or fragments thereof that were used to determine the CNAs) is affected by CNAs, more preferably in which 20% or more of the DNA region consisting of the biomarker panel used for the analysis (i.e. the genomic regions selected from Table 1 or Table 5 or fragments thereof that were used to determine the CNAs) is affected by CNAs, and most preferably in which 26% or more of the DNA region consisting of the biomarker panel used for the analysis (i.e. the genomic regions selected from Table 1 or Table 5 or fragments thereof that were used to determine the CNAs) is affected by CNAs. For example, if 6 of the 180 genomic regions listed in Table 1 or Table 5 or fragments thereof are used to classify a CRC sample into one of the three genetic subtypes, then “x % or more of the DNA sequence consisting of the biomarker panel used for the analysis” means x % or more of the DNA sequence consisting of the 6 used genomic regions or fragments thereof. Similarly, if 10 of the 180 genomic regions listed in Table 1 or Table 5 or fragments thereof are used to classify a CRC sample into one of the three genetic subtypes, then “x % or more of the DNA sequence consisting of the biomarker panel used for the analysis” means x % or more of the DNA sequence consisting of the 10 used genomic regions or fragments thereof. A CNA-high tumor is thus copy number instable and therefore also referred to as “copy number instability high tumor” or CIN-high tumor. CNA-low tumors as used herein are tumors in which less than 15% of the DNA sequence consisting of the biomarker panel used for the analysis (i.e. the genomic regions selected from Table 1 or Table 5 or fragments thereof that were used to determine the CNAs) is affected by CNAs. 
Thus, if 6 of the 180 genomic regions listed in Table 1 or Table 5 or fragments thereof are used to classify a CRC sample into one of the three genetic subtypes, then less than 15% of the DNA sequence consisting of the 6 used genomic regions or fragments thereof is affected by CNAs when the tumor is categorized as CNA-low. In alternative embodiments, CNA-low tumors as used herein are tumors in which less than 10% of the DNA sequence consisting of the biomarker panel used for the analysis (i.e. the genomic regions selected from Table 1 or Table 5 or fragments thereof that were used to determine the CNAs) is affected by CNAs. A CNA-low tumor thus has a low copy number instability and is therefore also referred to as “copy number instability low tumor” or CIN-low tumor. Copy number alterations or copy number aberrations are not the same as copy number variations (CNVs). CNVs originate from changes in copy number in germline cells (and are thus present in all cells of the organism).
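As an illustration of the CNA-high and CNA-low definitions, the following Python sketch computes the percentage of the panel sequence affected by CNAs and applies a cutoff. It assumes that, for each region in the panel actually used, the region length and the number of altered base pairs are known; the six region sizes are hypothetical and are not taken from Table 1 or Table 5.

```python
def fraction_sequence_altered(regions):
    """Fraction of the panel's DNA sequence affected by CNAs.

    regions is a list of (length_bp, altered_bp) tuples, one per region
    of the biomarker panel used for the analysis.
    """
    total = sum(length for length, _ in regions)
    if total <= 0:
        raise ValueError("the panel must cover at least one base pair")
    altered = sum(altered_bp for _, altered_bp in regions)
    return altered / total


def cna_category(regions, high_cutoff=0.15):
    """Label a tumor CNA-high at or above the cutoff, CNA-low below it.

    The default of 15% follows the text; 0.10 corresponds to the
    alternative embodiment.
    """
    if fraction_sequence_altered(regions) >= high_cutoff:
        return "CNA-high"
    return "CNA-low"


# Hypothetical 6-region panel: 1,000 of 10,000 bp are affected (10%).
panel = [(1000, 0), (2000, 500), (1500, 0), (500, 500), (3000, 0), (2000, 0)]

print(fraction_sequence_altered(panel))
print(cna_category(panel))                    # 15% cutoff
print(cna_category(panel, high_cutoff=0.10))  # 10% cutoff
```

The same tumor can therefore fall on either side of the boundary depending on which of the disclosed cutoffs is applied.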
The term “colorectal cancer” as used herein is meant to include malignant neoplasms of colon (C18 in ICD-10), malignant neoplasms of rectosigmoid junction (C19 in ICD-10), malignant neoplasms of rectum (C20 in ICD-10) and malignant neoplasms of anus and anal canal (C21 in ICD-10). A “colorectal cancer sample” refers to a biological sample of a “colorectal cancer patient”. A “colorectal cancer patient” refers to a living subject diagnosed with colorectal cancer or suspected to have colorectal cancer. Where colorectal cancer has been diagnosed in a living subject, a CRC sample comprises at least one colorectal cancer cell.
The 180 genomic DNA regions or fragments thereof which are listed in Table 1 or the 102 genomic DNA regions or fragments thereof which are listed in Table 5 are DNA regions which can be used to evaluate the copy number alteration status of a CRC sample, and thus whether a colorectal cancer sample is copy number stable or instable, or which can be used to cluster CRC samples (explained below). Although all 180 genomic DNA regions or fragments thereof are informative and valuable, some have a larger impact on the outcome of the analysis. In the first instance, the impact of deletions or amplifications of specific genomic DNA regions on the evaluation of the copy number instability or of the copy number alteration status of a CRC sample depends on the classification method used. In the current application, Applicant has confirmed the relevance of all 180 genomic DNA regions listed in Table 1 and of all 102 genomic DNA regions listed in Table 5 with three different methods, i.e. using regression trees, using the random forest classification and using the K-nearest neighbour classification (see Example 7). Another reason for the genomic DNA region-dependent impact is that some mutations affecting the copy number of specific genomic DNA regions occur early in colorectal tumor development, while other mutations occur at a later stage. However, the observation that copy number alterations of some genomic DNA regions have more or less impact does not mean that selecting genomic DNA regions with less impact is not useful to cluster CRC samples. All the genomic DNA regions listed in Table 1 remain valuable. The consequence of selecting DNA regions with less impact might be that more of these regions will have to be used in the analysis to achieve the same level of accuracy.
Depending on the analysis method, different subselections can be made from the 180 genomic DNA regions or fragments thereof listed in Table 1 or from the 102 genomic DNA regions or fragments thereof listed in Table 5.
Therefore, in a particular embodiment of the first aspect, said colorectal cancer biomarker panel comprises at least 5 genomic DNA regions or fragments thereof selected from Tables 6, 7, 8 and/or 9, wherein at least 2 genomic DNA regions or fragments thereof are selected from Table 6. In another particular embodiment, said colorectal cancer biomarker panel comprises at least 5 genomic DNA regions or fragments thereof selected from Tables 6, 7, 8 and/or 9, wherein at least 3, at least 4 or at least 5 genomic DNA regions or fragments thereof are selected from Table 6. In another particular embodiment, said colorectal cancer biomarker panel comprises at least 6 genomic DNA regions or fragments thereof selected from Tables 6, 7, 8 and/or 9, wherein at least 2, at least 3, at least 4, at least 5 or at least 6 genomic DNA regions or fragments thereof are selected from Table 6.
In another particular embodiment of the first aspect, said colorectal cancer biomarker panel comprises at least 5 genomic DNA regions or fragments thereof selected from Tables 6, 7 and/or 8, wherein at least 2 genomic DNA regions or fragments thereof are selected from Table 6. In another particular embodiment, said colorectal cancer biomarker panel comprises at least 5 genomic DNA regions or fragments thereof selected from Tables 6, 7 and/or 8, wherein at least 3, at least 4 or at least 5 genomic DNA regions or fragments thereof are selected from Table 6. In another particular embodiment, said colorectal cancer biomarker panel comprises at least 6 genomic DNA regions or fragments thereof selected from Tables 6, 7 and/or 8, wherein at least 2, at least 3, at least 4, at least 5 or at least 6 genomic DNA regions or fragments thereof are selected from Table 6.
In another particular embodiment of the first aspect, said colorectal cancer biomarker panel comprises at least 5 genomic DNA regions or fragments thereof selected from Tables 6 and/or 7, wherein at least 2 genomic DNA regions or fragments thereof are selected from Table 6. In another particular embodiment, said colorectal cancer biomarker panel comprises at least 5 genomic DNA regions or fragments thereof selected from Tables 6 and/or 7, wherein at least 3, at least 4 or at least 5 genomic DNA regions or fragments thereof are selected from Table 6. In another particular embodiment, said colorectal cancer biomarker panel comprises at least 6 genomic DNA regions or fragments thereof selected from Tables 6 and/or 7, wherein at least 2, at least 3, at least 4, at least 5 or at least 6 genomic DNA regions or fragments thereof are selected from Table 6.
In another particular embodiment of the first aspect, said colorectal cancer biomarker panel comprises at least 5 genomic DNA regions or fragments thereof selected from Tables 6 and/or 7. In another particular embodiment, said colorectal cancer biomarker panel comprises at least 6, at least 7, at least 8, at least 9, at least 10, at least 11 or at least 12 genomic DNA regions or fragments thereof selected from Tables 6 and/or 7.
In yet another particular embodiment of the first aspect, said colorectal cancer biomarker panel comprises at least 5 genomic DNA regions or fragments thereof selected from Tables 10, 11 and/or 12, wherein at least 2 genomic DNA regions or fragments thereof are selected from Table 10. In another particular embodiment, said colorectal cancer biomarker panel comprises at least 5 genomic DNA regions or fragments thereof selected from Tables 10, 11 and/or 12, wherein at least 3, at least 4 or at least 5 genomic DNA regions or fragments thereof are selected from Table 10. In another particular embodiment, said colorectal cancer biomarker panel comprises at least 6 genomic DNA regions or fragments thereof selected from Tables 10, 11 and/or 12, wherein at least 2, at least 3, at least 4, at least 5 or at least 6 genomic DNA regions or fragments thereof are selected from Table 10.
In another particular embodiment of the first aspect, said colorectal cancer biomarker panel comprises at least 5 genomic DNA regions or fragments thereof selected from Tables 10 and/or 11, wherein at least 2 genomic DNA regions or fragments thereof are selected from Table 10. In another particular embodiment, said colorectal cancer biomarker panel comprises at least 5 genomic DNA regions or fragments thereof selected from Tables 10 and/or 11, wherein at least 3, at least 4 or at least 5 genomic DNA regions or fragments thereof are selected from Table 10. In another particular embodiment, said colorectal cancer biomarker panel comprises at least 6 genomic DNA regions or fragments thereof selected from Tables 10 and/or 11, wherein at least 2, at least 3, at least 4, at least 5 or at least 6 genomic DNA regions or fragments thereof are selected from Table 10.
In another particular embodiment of the first aspect, said colorectal cancer biomarker panel comprises at least 5 genomic DNA regions or fragments thereof selected from Tables 10 and/or 11. In another particular embodiment, said colorectal cancer biomarker panel comprises at least 6, at least 7, at least 8, at least 9, at least 10, at least 11 or at least 12 genomic DNA regions or fragments thereof selected from Tables 10 and/or 11.
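The composition requirements of the preceding embodiments (a minimum number of regions drawn from a set of tables, of which a minimum number come from one specific table) can be expressed as a simple check. The Python sketch below is only illustrative: the region identifiers and their table assignments are hypothetical, and because “comprises” is open-ended, regions outside the permitted tables are ignored rather than treated as disqualifying.

```python
def panel_satisfies(panel, table_of, allowed_tables, core_table,
                    min_total=5, min_core=2):
    """Check a panel against a composition requirement.

    panel          -- iterable of region identifiers (hypothetical names)
    table_of       -- dict mapping a region identifier to its table number
    allowed_tables -- set of table numbers the regions may be selected from
    core_table     -- table that must contribute at least min_core regions
    """
    regions = set(panel)  # duplicate entries count once
    eligible = {r for r in regions if table_of.get(r) in allowed_tables}
    core = {r for r in eligible if table_of[r] == core_table}
    return len(eligible) >= min_total and len(core) >= min_core


# Hypothetical assignment of six regions to Tables 6 to 9.
table_of = {"r1": 6, "r2": 6, "r3": 7, "r4": 8, "r5": 9, "r6": 7}

panel_a = ["r1", "r2", "r3", "r4", "r5"]  # two regions from Table 6
panel_b = ["r1", "r3", "r4", "r5", "r6"]  # only one region from Table 6

print(panel_satisfies(panel_a, table_of, {6, 7, 8, 9}, core_table=6))
print(panel_satisfies(panel_b, table_of, {6, 7, 8, 9}, core_table=6))
print(panel_satisfies(panel_a, table_of, {6, 7}, core_table=6))
```

The same function covers the Table 10, 11 and 12 embodiments by passing the corresponding table numbers.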
Using the random forest classification method, a contribution value could be determined for every genomic DNA region listed in Table 1 or Table 5. The contribution value illustrates the importance of a CNA for correct classification of a sample. For each tree, the prediction error rate on the out-of-bag portion of the data is recorded. Then the same is done after permuting each predictor variable. The differences between the two are then averaged over all trees, and normalized by the standard deviation of the differences.
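The calculation of the contribution value can be sketched in a few lines of Python. The sketch assumes that the per-tree out-of-bag error rates before and after permuting one predictor are already available; the error values are hypothetical, and the use of the sample standard deviation as well as the handling of a zero standard deviation are assumptions made for illustration.

```python
from statistics import mean, stdev


def contribution_value(oob_error, oob_error_permuted):
    """Permutation-based contribution value for one genomic DNA region.

    oob_error          -- per-tree out-of-bag error rate on the original data
    oob_error_permuted -- per-tree out-of-bag error rate after permuting
                          the predictor of interest
    """
    if len(oob_error) != len(oob_error_permuted):
        raise ValueError("both lists must contain one value per tree")
    if len(oob_error) < 2:
        raise ValueError("at least two trees are needed")
    # Increase in error caused by permuting the predictor, per tree.
    diffs = [perm - base for base, perm in zip(oob_error, oob_error_permuted)]
    average = mean(diffs)
    spread = stdev(diffs)  # sample standard deviation (assumption)
    # If every tree shows the same difference, skip the division (assumption).
    return average if spread == 0 else average / spread


# Hypothetical error rates for a forest of four trees.
baseline = [0.10, 0.12, 0.08, 0.10]
permuted = [0.20, 0.18, 0.16, 0.22]

print(round(contribution_value(baseline, permuted), 2))
```

A region whose permutation barely changes the out-of-bag error receives a value near zero, whereas a region that is important for correct classification receives a larger value.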
Therefore, in a particular embodiment of the first aspect, said colorectal cancer biomarker panel comprises at least 5 genomic DNA regions selected from Table 13, wherein said at least 5 genomic DNA regions have a contribution value of at least 1, at least 2, at least 3, at least 4 or at least 5 as listed in Table 13. In another particular embodiment, said colorectal cancer biomarker panel comprises at least 6 genomic DNA regions selected from Table 13, wherein said at least 6 genomic DNA regions have a contribution value of at least 1, at least 2, at least 3, at least 4 or at least 5 as listed in Table 13.
In another particular embodiment, said colorectal cancer biomarker panel comprises at least 5 genomic DNA regions selected from Table 13, wherein at least 2, at least 3, at least 4 or at least 5 genomic DNA regions from said at least 5 genomic DNA regions have a contribution value between 1 and 6 or between 2 and 6 or between 3 and 6 or between 4 and 6 or between 1 and 5 or between 2 and 5 or between 3 and 5 or between 2 and 4 as listed in Table 13.
In another particular embodiment, said colorectal cancer biomarker panel comprises at least 6 genomic DNA regions selected from Table 13, wherein at least 2, at least 3, at least 4, at least 5 or at least 6 genomic DNA regions from said at least 6 genomic DNA regions have a contribution value between 1 and 6 or between 2 and 6 or between 3 and 6 or between 4 and 6 or between 1 and 5 or between 2 and 5 or between 3 and 5 or between 2 and 4 as listed in Table 13.
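Selecting regions by contribution value, as in the embodiments above, amounts to filtering a lookup table and counting how many panel members qualify. The following Python sketch assumes a mapping from region identifier to contribution value; the identifiers and values are hypothetical and are not taken from Table 13, and treating “between” as inclusive of both bounds is an assumption.

```python
def regions_in_range(contribution, low, high=None):
    """Return the regions whose contribution value is at least `low`
    and, if `high` is given, at most `high` (bounds inclusive)."""
    return sorted(
        region for region, value in contribution.items()
        if value >= low and (high is None or value <= high)
    )


def panel_meets_contribution(panel, contribution, min_qualifying,
                             low, high=None):
    """Check that at least `min_qualifying` panel regions fall in the range."""
    qualifying = set(panel) & set(regions_in_range(contribution, low, high))
    return len(qualifying) >= min_qualifying


# Hypothetical contribution values for six regions.
contribution = {"r1": 5.8, "r2": 4.1, "r3": 3.3,
                "r4": 2.6, "r5": 1.2, "r6": 0.4}
panel = ["r1", "r2", "r3", "r5", "r6"]

print(regions_in_range(contribution, 2))
# At least 3 of the 5 panel regions with a value between 3 and 6:
print(panel_meets_contribution(panel, contribution, 3, low=3, high=6))
# All 5 panel regions with a value of at least 1:
print(panel_meets_contribution(panel, contribution, 5, low=1))
```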
In yet another particular embodiment, said colorectal cancer biomarker panel comprises at least 5 genomic DNA regions selected from Table 14, wherein said at least 5 genomic DNA regions have a contribution value of at least 1, at least 2, at least 3, at least 4, at least 7, at least 8 or at least 9 as listed in Table 14. In another particular embodiment, said colorectal cancer biomarker panel comprises at least 6 genomic DNA regions selected from Table 14, wherein said at least 6 genomic DNA regions have a contribution value of at least 1, at least 2, at least 3, at least 4, at least 7, at least 8 or at least 9 as listed in Table 14.
In another particular embodiment, said colorectal cancer biomarker panel comprises at least 5 genomic DNA regions selected from Table 14, wherein at least 2, at least 3, at least 4 or at least 5 genomic DNA regions from said at least 5 genomic DNA regions have a contribution value between 2 and 10 or between 3 and 10 or between 4 and 10 or between 7 and 10 or between 2 and 8 or between 3 and 8 or between 4 and 8 as listed in Table 14.
In another particular embodiment, said colorectal cancer biomarker panel comprises at least 6 genomic DNA regions selected from Table 14, wherein at least 2, at least 3, at least 4, at least 5 or at least 6 genomic DNA regions from said at least 6 genomic DNA regions have a contribution value between 2 and 10 or between 3 and 10 or between 4 and 10 or between 7 and 10 or between 2 and 8 or between 3 and 8 or between 4 and 8 as listed in Table 14.
In yet other embodiments, said colorectal cancer biomarker panel comprises at least 20, at least 30, at least 40, at least 50, at least 60, at least 70, at least 80, at least 90, at least 100, at least 120, at least 140 or at least 160 genomic DNA regions selected from Table 1. In a most particular embodiment, said colorectal cancer biomarker panel consists of the genomic DNA regions depicted in Table 1.
In yet other embodiments, said colorectal cancer biomarker panel comprises at least 20, at least 30, at least 40, at least 50, at least 60, at least 70, at least 80, at least 90 or at least 100 genomic DNA regions selected from Table 5.
In most particular embodiments, said colorectal cancer biomarker panel consists of the genomic DNA regions depicted in Table 10, in Table 10 and Table 11, in Table 6 or in Table 6 and Table 7. In another most particular embodiment, said colorectal cancer biomarker panel consists of the genomic DNA regions depicted in Table 5 or in Table 1.
The colorectal cancer biomarker panels disclosed above in the first aspect of this application are from here on referred to as “one of the colorectal cancer biomarker panels disclosed in one of the embodiments of the first aspect of the application” or as “one of the colorectal cancer biomarker panels of the application”.
In a second aspect, the invention relates to the use of a biomarker panel comprising at least 5 or at least 6 genomic DNA regions or fragments thereof selected from Table 1 to determine the copy number alteration status of a colorectal cancer sample. The invention also relates to the use of said biomarker panel to predict copy number instability of a colorectal cancer sample of a CRC patient. In particular embodiments, said Table 1 is Table 5, Table 6 or Table 10. In other particular embodiments, the use of a biomarker panel is provided for determining the copy number alteration status of a CRC sample or the copy number instability of a CRC sample, wherein said biomarker panel is one of the colorectal cancer biomarker panels disclosed in one of the embodiments of the first aspect of the application described above.
“Copy number instability” as used herein is defined as the gain and/or loss of copies of a specific set of genomic DNA regions. In a more particular embodiment, the invention relates to the use of one of the colorectal cancer biomarker panels of the application to classify a colorectal cancer sample of a CRC patient in a copy number instability (CIN) high or copy number instability low group. A copy number instability-high sample is defined as a sample in which 10%, preferably 15% or more of the DNA region consisting of the biomarker panel used for the analysis (e.g. the genomic regions selected from Table 1 or fragments thereof that were used to determine the CNAs) is affected by copy number alterations (CNAs), more preferably 20% or more of the DNA region consisting of the biomarker panel used for the analysis (e.g. the genomic regions selected from Table 1 or fragments thereof that were used to determine the CNAs) is affected by CNAs, and most preferably 26% or more of the DNA region consisting of the biomarker panel used for the analysis (e.g. the genomic regions selected from Table 1 or fragments thereof that were used to determine the CNAs) is affected by CNAs. This is especially important since our data surprisingly revealed that patients' samples with a high copy number instability are responsive to anti-VEGF therapy. To investigate whether the lack of response of patients from cluster 1 to Avastin therapy was caused by tumors that show microsatellite instability (MSI), we stratified patients from cluster 1 into MSI patients (a) and microsatellite stable (MSS) patients (b) and determined the relation with response to Avastin therapy. No difference between the samples was observed, indicating that copy number stable samples that are MSS show no improved response to Avastin therapy and that the effect is not solely dependent on MSI.
In an alternative embodiment, a CIN-low sample is defined as a sample in which less than 10% of the DNA region consisting of the biomarker panel used for the analysis (e.g. the genomic regions selected from Table 1 or fragments thereof that were used to determine the CNAs) is affected by CNAs.
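By way of illustration only, the CIN-high/CIN-low definition above can be sketched as a small calculation. The sketch assumes that "affected" is measured as the share of the panel's total base-pair length covered by altered regions; the region lengths, state labels and function names are hypothetical and are not taken from Table 1.

```python
from typing import List, Tuple

# One panel region: (length in base pairs, copy-number state).
# The state labels "gain", "loss" and "neutral" are illustrative assumptions.
Region = Tuple[int, str]


def cna_fraction(panel: List[Region]) -> float:
    """Fraction of the panel's total DNA length that carries a CNA."""
    total = sum(length for length, _ in panel)
    if total == 0:
        raise ValueError("panel has no DNA length")
    altered = sum(length for length, state in panel if state != "neutral")
    return altered / total


def cin_group(panel: List[Region], high_cutoff: float = 0.10) -> str:
    """Label a sample CIN-high when the affected fraction reaches the cutoff."""
    return "CIN-high" if cna_fraction(panel) >= high_cutoff else "CIN-low"


# Invented example: 1.4 Mb of a 4.0 Mb panel is altered.
example_panel = [
    (400_000, "gain"),
    (600_000, "neutral"),
    (1_000_000, "loss"),
    (2_000_000, "neutral"),
]
fraction = cna_fraction(example_panel)
group_at_10 = cin_group(example_panel)        # default 10% cutoff
group_at_26 = cin_group(example_panel, 0.26)  # most preferred 26% cutoff
```

The cutoff is a parameter so that the same function covers the 10%, 15%, 20% and 26% embodiments.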
In a third aspect, the invention relates to the use of a biomarker panel comprising at least 5 or at least 6 genomic DNA regions or fragments thereof selected from Table 1 to cluster colorectal cancer samples in distinct genetic subtypes, wherein said subtypes are constructed using a dataset of multiple CRC samples. In a particular embodiment, said Table 1 is Table 5, Table 6 or Table 10. In another particular embodiment, the use of a biomarker panel to cluster colorectal cancer samples in distinct genetic subtypes is provided, wherein said subtypes are constructed using a dataset of multiple CRC samples and wherein said biomarker panel is one of the colorectal cancer biomarker panels disclosed in one of the embodiments of the first aspect of the application described above. More particularly, said biomarker panel comprises at least 20, at least 30, at least 40, at least 50, at least 60, at least 70, at least 80, at least 90 or at least 100 genomic regions or fragments thereof selected from Table 1 or Table 5. Even more particularly, said biomarker panel comprises at least 120, at least 130, at least 140, at least 150, at least 160 or at least 170 genomic regions or fragments thereof selected from Table 1. Most particularly, said biomarker panel consists of the genomic regions or fragments thereof selected from Table 1.
Datasets of multiple CRC samples are freely available and are accessible to the person skilled in the art. In a preferred embodiment, said dataset consists of at least 100 CRC samples, more preferably at least 200 CRC samples, more preferably at least 300 CRC samples, most preferably at least 400 CRC samples.
In a particular embodiment, the use of a biomarker panel to cluster colorectal cancer samples in distinct genetic subtypes is provided, wherein said subtypes are constructed using a dataset of multiple CRC samples and wherein said biomarker panel is one of the colorectal cancer biomarker panels disclosed in one of the embodiments of the first aspect of the application described above, and wherein said subtypes are constructed using unsupervised hierarchical clustering. In a more particular embodiment, said subtypes are characterized by the copy number alteration specifications depicted in Tables 2, 3 and 4.
Using the means and methods of the current application, 3 distinct genetic subtypes were determined; however, depending on the clustering method and the preferences of the practitioner, more or fewer genetic subtypes can be constructed using the biomarker panel of the invention. In another particular embodiment, the use of a biomarker panel to cluster colorectal cancer samples in 3 distinct genetic subtypes is provided, wherein said subtypes are characterized by the copy number alteration specifications depicted in Tables 2, 3 and 4, wherein said biomarker panel is one of the colorectal cancer biomarker panels disclosed in one of the embodiments of the first aspect of the application described above.
The term “genetic subtype” as used herein means a category of CRC samples having common genetic characteristics, more precisely having a common copy number alteration status. “Distinct” means different, separate or diverse. In the application “genetic subtype” refers to a specific cluster. Clusters 1, 2 and 3 are thus respectively the same as genetic subtypes 1, 2 and 3. The term “copy number alteration specifications” as used herein means the conditions with which a sample must comply to fall into one of the genetic subtypes described in this application.
In another embodiment, the use of a biomarker panel is provided to predict the responsiveness of a colorectal cancer patient to anti-VEGF therapy, wherein said biomarker panel is one of the colorectal cancer biomarker panels disclosed in one of the embodiments of the first aspect of the application described above. The invention thus also relates to the use of the biomarker panel comprising at least 5 or at least 6 genomic DNA regions or fragments thereof selected from Table 1 or Table 5 to predict the responsiveness of a colorectal cancer patient to anti-VEGF therapy. In a more particular embodiment, said anti-VEGF therapy is bevacizumab therapy.
This is equivalent to saying that the biomarker panels described in the first aspect of the application, or more particularly the biomarker panel comprising at least 5 genomic DNA regions or fragments thereof selected from Table 1 or Table 5, are provided for use in diagnosis of a CRC patient with responsiveness to anti-VEGF therapy, where in a particular embodiment said anti-VEGF therapy is bevacizumab therapy.
“Responsiveness” is defined in this application as the reaction or response of a CRC patient to an anti-VEGF treatment, more precisely to bevacizumab therapy. The response is positive or the patient is responsive to anti-VEGF therapy if the treatment clinically improves the situation of the patient. The term “anti-VEGF therapy” as used herein refers to an anti-angiogenic therapy, i.e. a therapy, for example a medicament, that inhibits angiogenesis or the growth of new blood vessels. VEGF stands for vascular endothelial growth factor and is a signal protein produced by cells that stimulates vasculogenesis and angiogenesis. Bevacizumab or Avastin is a frequently used anti-VEGF antibody for treating cancer.
Bevacizumab and Avastin are used interchangeably in this application. “Bevacizumab therapy” thus refers to the treatment of a patient that comprises bevacizumab administration. Bevacizumab can be administered as monotherapy or as combination therapy. Typically, monotherapy is used to describe the use of a single medication, while combination therapy or polytherapy uses more than one medication. A pharmacological therapy (i.e. a therapy that consists of one or more medicaments against a single disease) can also be combined with non-pharmacological therapies such as radiation therapy and surgery.
In a fourth aspect, a method is provided to determine the genetic subtype of a colorectal cancer sample from a colorectal cancer patient, said method comprises:
In particular embodiments, said clustering of step a) is done using a CRC biomarker panel comprising at least 5 genomic DNA regions or fragments thereof selected from Table 1 or Table 5 while said classification of step b) is done using the same said CRC biomarker panel.
The invention also provides a method for determining the genetic subtype of a colorectal cancer sample, comprising determining the copy number alteration status of a colorectal cancer sample of a colorectal cancer patient using one of the colorectal cancer biomarker panels described in the first aspect of the application (e.g. comprising at least 5 genomic DNA regions or fragments thereof selected from Table 1 or Table 5); classifying said colorectal cancer sample in one of 3 distinct genetic subtypes wherein said subtypes are characterized by the copy number alteration specifications depicted in Tables 2, 3 and 4; to determine the genetic subtype of said colorectal cancer sample.
“Classifying” means arranging a sample in a specific category (e.g. genetic subtype) according to shared qualities or characteristics with the other subjects of the specific category.
In a fifth aspect, the invention provides a method for the identification of a patient responsive to anti-VEGF therapy comprising determining the copy number alteration status of a colorectal cancer sample of a colorectal cancer patient using one of the colorectal cancer biomarker panels disclosed in one of the embodiments of the first aspect of this application (e.g. comprising at least 5 genomic DNA regions or fragments thereof selected from Table 1 or Table 5); classifying said colorectal cancer sample in one of 3 distinct genetic subtypes wherein said subtypes are characterized by the copy number alteration specifications depicted in Tables 2, 3 and 4 to determine the genetic subtype of said colorectal cancer sample, wherein classification of said patient in genetic subtypes 2 or 3 respectively depicted in Table 3 or 4 is indicative for said patient to be responsive to anti-VEGF therapy. In a more particular embodiment, the anti-VEGF therapy is bevacizumab therapy.
In a sixth aspect, the invention provides a method for the identification of a patient responsive to anti-VEGF therapy comprising:
In a particular embodiment, the invention provides a method for the identification of a patient responsive to anti-VEGF therapy comprising:
In a more particular embodiment, the anti-VEGF therapy is bevacizumab therapy.
The term “indicative” as used herein means that a patient is predicted to be responsive to a therapy.
In another embodiment the invention provides a method for the identification of a patient responsive to anti-VEGF therapy, said method comprising determining the copy number instability of the genome of a CRC sample of a CRC patient, wherein a high copy number instability is indicative for said patient to be responsive to anti-VEGF therapy. The invention also provides methods for the identification of a patient responsive to anti-VEGF therapy, said methods comprising determining the copy number instability of a CRC sample of a CRC patient using one of the CRC biomarker panels disclosed in the first aspect of the application, wherein a high copy number instability is indicative for said CRC patient to be responsive to anti-VEGF therapy. The invention also provides methods for the identification of a patient responsive to anti-VEGF therapy, said methods comprising determining the copy number instability of a CRC sample of a CRC patient using a biomarker panel comprising at least 5 or at least 6 genomic DNA regions or fragments thereof selected from Table 1 or Table 5 or Table 6 or Table 10 wherein a high copy number instability is indicative for said CRC patient to be responsive to anti-VEGF therapy. In a more particular embodiment, the anti-VEGF therapy is bevacizumab therapy.
A high copy number instability means 10% or more, 15% or more, more preferably 20% or more, most preferably 26% or more of the DNA sequence consisting of the genomic regions or fragments thereof used for the analysis (e.g. those selected from Table 1 or Table 5) is affected by copy number alterations. The invention thus also provides methods for the identification of a patient responsive to anti-VEGF therapy comprising determining the copy number instability of a CRC sample of a CRC patient using one of the biomarker panels disclosed in the first aspect of the application or using a biomarker panel comprising at least 5 genomic DNA regions or fragments thereof selected from Table 1 or Table 5 or Table 6 or Table 10, wherein a copy number instability of 15% or more is indicative for said CRC patient to be responsive to anti-VEGF therapy or more particularly to bevacizumab therapy. The invention also provides methods for the identification of a patient responsive to anti-VEGF therapy comprising determining the copy number instability of the genome of a CRC sample of a CRC patient using one of the biomarker panels disclosed in the first aspect of the application or using a biomarker panel comprising at least 5 genomic DNA regions or fragments thereof selected from Table 1 or Table 5 or Table 6 or Table 10, wherein a copy number instability of 20% or more is indicative for said patient to be responsive to anti-VEGF therapy or more particularly to bevacizumab therapy. 
The invention also provides methods for the identification of a patient responsive to anti-VEGF therapy comprising determining the copy number instability of a CRC sample of a CRC patient using one of the biomarker panels disclosed in the first aspect of the application or using a biomarker panel comprising at least 5 genomic DNA regions or fragments thereof selected from Table 1 or Table 5 or Table 6 or Table 10, wherein a copy number instability of 26% or more is indicative for said CRC patient to be responsive to anti-VEGF therapy or more particularly to bevacizumab therapy.
In another embodiment, the invention provides a method for treating colorectal cancer in a subject in need thereof, comprising:
In a more particular embodiment, the anti-VEGF therapy is bevacizumab therapy.
In another embodiment, the invention provides a method of treating colorectal cancer in a subject in need thereof, comprising:
In a more particular embodiment, the anti-VEGF therapy is bevacizumab therapy.
“Respectively depicted” as used in the application means that the specifications of subtype 2 are shown in Table 3 and that of subtype 3 are presented in Table 4.
In another embodiment the invention provides a method of treating colorectal cancer in a subject in need thereof, comprising:
In a more particular embodiment, the anti-VEGF therapy is bevacizumab therapy.
In another embodiment the invention provides a method of treating colorectal cancer in a subject in need thereof, comprising:
In a more particular embodiment, the anti-VEGF therapy is bevacizumab therapy.
In another embodiment the invention provides a method of treating colorectal cancer in a subject in need thereof, comprising:
In a more particular embodiment, the anti-VEGF therapy is bevacizumab therapy.
In another embodiment the invention provides a method of treating colorectal cancer in a subject in need thereof, comprising:
In a more particular embodiment, the anti-VEGF therapy is bevacizumab therapy.
In another embodiment the invention provides a method of treating colorectal cancer in a subject in need thereof, comprising:
In a more particular embodiment, the anti-VEGF therapy is bevacizumab therapy.
In another embodiment the invention provides a method of treating colorectal cancer in a subject in need thereof, comprising:
In a more particular embodiment, the anti-VEGF therapy is bevacizumab therapy.
In another embodiment the invention provides a kit to determine the copy number alteration status in a colorectal cancer sample, comprising primers or probes for detection of at least 5 genomic DNA regions or fragments thereof selected from Table 1, Table 5, Table 6 or Table 10. In another embodiment the invention provides a kit to determine the copy number instability of a colorectal cancer sample, comprising primers or probes for detection of at least 5 genomic DNA regions selected from Table 1, Table 5, Table 6 or Table 10.
It is to be understood that although particular embodiments, specific configurations as well as materials and/or molecules, have been discussed herein for cells and methods according to the present invention, various changes or modifications in form and detail may be made without departing from the scope and spirit of this invention. The following examples are provided to better illustrate particular embodiments, and they should not be considered as limiting the application. The application is limited only by the claims.
We collected tumor biopsies from 278 metastatic colorectal cancer (mCRC) patients. After confirming the histopathology and assessing tumor content as described in the methods, we successfully performed whole-exome sequencing (WES) on paired biopsies from 194 patients (coverage 57.3±41.6×). Low-coverage whole-genome sequencing was performed on 238 patients and copy number profiles were generated as described in the methods section. Subsequently, we manually assessed each profile to check whether tumor content was sufficiently high, resulting in 176 profiles that were used for further downstream analysis. In total 157 patients were treated with combination-bevacizumab (bvz) therapy, 2 with monotherapy bvz and 15 with chemotherapy backbone only. For the survival analyses, we excluded the 2 patients with monotherapy and furthermore excluded 16 patients that received combination-bvz therapy in 2nd, 3rd or 4th line (n=13, 2 and 1 respectively) for which no information about first-line therapy was available. Another 2 patients were excluded since no clinical information about the therapy was available. Furthermore we performed whole-exome sequencing on 128 out of the 156 patients. Additionally we downloaded publicly available copy number data of a cohort of 205 patients from the CAIRO trial that were treated with Irinotecan-Capecitabine (CAPIRI) or capecitabine (CAP) only (Agilent oligonucleotide hybridization arrays; GSE36864) (Haan et al 2014) and a cohort of 499 patients from the TCGA network (http://gdac.broadinstitute.org/). Furthermore, a second cohort of 106 combination bevacizumab-treated metastatic colorectal tumors and accompanying normal tissue from the MOMA clinical trial was provided by The University of Pisa in Italy (NCT02271464). For this replication cohort, the same procedure was followed, resulting in 78 patients selected for further analysis. All patients were treated with combination-bvz therapy.
GISTIC analysis was performed on all 880 tumors to identify the most frequent and overrepresented somatic copy number aberrations (SCNAs) in the tumors (hereafter referred to as recurrent SCNAs). This analysis revealed the presence of 43 recurrent focal amplifications and 59 recurrent focal deletions as well as whole-arm aberrations in every chromosome (Table 1).
Characterisation of the clusters on the somatic mutation and copy number level revealed that patients in clusters 2 and 3 had substantially more chromosomal breakpoints and a higher proportion of the genome was affected by copy number aberrations, while cluster 1 showed almost no CNAs or breakpoints.
Next, we performed Kaplan-Meier analysis to determine the relationship between the different clusters and the patients' progression-free survival (PFS) and overall survival (OS). Univariate Cox regression revealed that cluster 3 correlated with slightly worse overall survival compared to cluster 1 (P=0.3×10−2; HR=1.44, CI95=1.03-1.99). However, multivariate analysis using a Cox regression with age and TNM-staging as numerical factors and gender as categorical factor showed that not the cluster but primary tumor, regional lymph nodes and distant metastases staging are the main contributors to worse prognosis (P=2.53×10−4, HR=1.51, CI95=1.21-1.89; P=2.25×10−5, HR=1.35, CI95=1.17-1.55 and P=7.63×10−10, HR=2.36, CI95=1.80-3.11 respectively).
In a next step, we selected all the stage IV tumors, repeated the hierarchical clustering and performed the same characterization on clinical and genomic level. As for the full set of colorectal samples, the subset of mCRC tumors showed an enrichment for TP53 mutations (P=5.2×10−4) and a substantially increased number of CNAs and breakpoints in clusters 2 and 3, while cluster 1 showed almost no CNAs and was enriched for MSI tumors (P=2.1×10−15) and hypermutator phenotypes (P=1×10−5) as well as BRAF (P=2.6×10−3) and PIK3CA mutations (P=4.2×10−3).
However, when assessing progression-free and overall survival it became apparent that, compared to cluster 1, clusters 2 and 3 showed significantly better PFS (P=2.63×10−5, HR=0.44, CI95=0.30-0.65 and P=1.85×10−4, HR=0.50, CI95=0.34-0.72 for cluster 2 and 3 respectively) and OS (P=3.85×10−4, HR=0.48, CI95=0.32-0.72 and P=9.73×10−3, HR=0.60, CI95=0.41-0.88 for cluster 2 and 3 respectively). Multivariate analysis correcting for relevant covariates confirmed that clusters 2 and 3 were significantly correlated with improved PFS (P=1.22×10−4, HR=0.45, CI95=0.30-0.68 and P=7.47×10−4, HR=0.52, CI95=0.35-0.76 for cluster 2 and 3 respectively) and OS (P=1.73×10−3, HR=0.51, CI95=0.33-0.78 and P=1.92×10−2, HR=0.62, CI95=0.41-0.92 for cluster 2 and 3 respectively), while primary tumor (P=2.35×10−2, HR=1.27, CI95=1.03-1.57 and P=2.82×10−3, HR=1.42, CI95=1.13-1.78 for clusters 2 and 3) and regional lymph node staging contributed negatively (P=2.50×10−1, HR=1.08, CI95=0.95-1.24 and P=1.09×10−2, HR=1.21, CI95=1.04-1.39 for clusters 2 and 3).
We stratified patients in two groups, namely those that received combination-Avastin therapy (n=141) and a control group receiving combination chemotherapy only (n=220). For each cluster we used the Kaplan-Meier method with a log-rank test to evaluate the correlation with progression-free survival and overall survival. For progression-free survival, patients from clusters 2 and 3 showed a significant benefit when comparing patients treated with Avastin to the control group (P=3.23×10−3, HR=0.58, CI95=0.41-0.83 and P=2.01×10−6, HR=0.495, CI95=0.37-0.66 for cluster 2 and 3 respectively). No difference was noted for the patients in cluster 1. Similar results were obtained when combining the patients from clusters 2 and 3 in one group (P=1.36×10−7, HR=0.54, CI95=0.43-0.68).
Since patients from clusters 2 and 3 showed a good response to Avastin and these clusters were characterized by a higher CIN, we hypothesized that tumors with CIN would have a better response to Avastin. We therefore divided the patients in two groups based on the proportion of the regions that are affected by CNAs (i.e. CNA-high tumors, which have more affected regions than the first quartile limit, and CNA-low tumors, with less than or equal to the first quartile limit) (Guinney et al 2015 Nature Medicine 21:1350-1356). When comparing CNA-high with CNA-low tumors within the group of patients that were treated with Avastin, CNA-high tumors showed a significantly better progression-free survival (P=1.74×10−3; HR=0.513; CI95=0.30-0.88). Multivariate analysis using a Cox regression with age and TNM-staging as numerical factors and gender as categorical factor revealed that this effect was also observed independent of clinical factors (P=3.4×10−3; HR=0.484; CI95=0.30-0.79). In contrast, this correlation was not observed when comparing CNA-high and CNA-low tumors in the control group.
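The first-quartile split described above can be illustrated with a minimal sketch. The quartile convention (Python's "inclusive" method), the sample identifiers and the proportions are assumptions made for the example only; the source does not state which quartile definition was used.

```python
import statistics
from typing import Dict


def split_by_first_quartile(affected: Dict[str, float]) -> Dict[str, str]:
    """Label each tumor CNA-high if its proportion of affected regions
    exceeds the cohort's first quartile, otherwise CNA-low."""
    values = list(affected.values())
    # quantiles(..., n=4) returns the three quartile cut points; [0] is Q1.
    q1 = statistics.quantiles(values, n=4, method="inclusive")[0]
    return {
        sample: ("CNA-high" if value > q1 else "CNA-low")
        for sample, value in affected.items()
    }


# Invented proportions of panel regions affected by CNAs, per tumor.
proportions = {"s1": 0.0, "s2": 0.02, "s3": 0.10, "s4": 0.25, "s5": 0.40}
labels = split_by_first_quartile(proportions)
```

With this convention a tumor lying exactly on the first quartile limit falls in the CNA-low group, matching the "less than or equal to" wording above.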
To determine whether our clustering technique could also be performed using only focal CNAs we repeated the clustering using only focal CNAs as input (Table 5). This resulted in the identification of 3 very similar clusters. The characteristics of the three clusters when performing clustering on all CRC samples (n=883) are nearly identical to those of the clusters created with all 180 CNAs.
In order to be able to classify a single sample to one of the three clusters and evaluate how many genomic regions are needed to classify a given CRC sample into one of the three predefined clusters we used 3 different techniques: recursive partitioning, random forest classification and k-nearest neighbour classification (Breiman et al 1984 Classification and regression trees, Wadsworth 368; Breiman 2001 Random Forest, Machine Learning 45:5-32; Venables and Ripley 2002 Modern Applied Statistics with S, Springer).
Recursive partitioning revealed that as few as 5 regions can be used to classify samples into one of the three defined clusters with an accuracy as high as 90.7%.
In a second approach, we used the random forest classification algorithm to build a classification model using the predefined clusters from the hierarchical clustering as gold standard. In a first step, we performed a 10-fold cross-validation on the original dataset to determine the accuracy of the model. To this end, we divided the 442 mCRC samples used for the original clustering 10 times at random, each time into a training set (90% of the samples) and a validation set (10% of the samples), in such a manner that each sample is represented exactly once across the 10 validation sets. Next, a random forest classifier was generated from 500 balanced bootstraps of the training data. When we applied this classifier to the validation data, it demonstrated robust performance with high overall accuracy (94.1% and 91.5% for all 180 regions and only the 102 focal regions respectively) with a >90% balanced accuracy across all 3 clusters (Table 17). In a last step, we built a final model using the complete dataset as input and subsequently also calculated the contribution of each region to the final model. These contributions are represented in Table 13. Similarly, for the analysis using the 102 regions affected by focal CNAs only, the contribution of each region to this model is represented in Table 14.
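The partitioning step of the 10-fold cross-validation can be sketched as follows. This is an illustrative sketch of the splitting only; the random forest training on balanced bootstraps is not shown, and the function name, the seed and the use of integer sample identifiers are assumptions.

```python
import random
from typing import List, Sequence, Tuple


def ten_fold_splits(
    sample_ids: Sequence[int], k: int = 10, seed: int = 0
) -> List[Tuple[List[int], List[int]]]:
    """Return k (training, validation) pairs in which every sample
    appears in exactly one validation set."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)          # random assignment to folds
    folds = [ids[i::k] for i in range(k)]     # k near-equal, disjoint folds
    splits = []
    for i, validation in enumerate(folds):
        training = [s for j, fold in enumerate(folds) if j != i for s in fold]
        splits.append((training, validation))
    return splits


# 442 samples, as in the mCRC dataset described above.
splits = ten_fold_splits(range(442))
fold_sizes = [len(validation) for _, validation in splits]
```

Because 442 is not divisible by 10, the validation sets contain 44 or 45 samples each, so the 90%/10% proportions hold only approximately.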
It is of interest to note that the tier 1 and tier 2 regions from the recursive partitioning are also the highest-ranking regions when compared with the random forest contributions (Table 15 and Table 16).
In a third approach, we used the k-nearest neighbors algorithm to build a classification model. As with the random forest classification, we used a 10-fold cross-validation to determine the model accuracy, which was 86.6% and 87.3% for all 180 regions and only the 102 focal regions respectively, with a >88% balanced accuracy across all 3 clusters. In a next step we applied these models to an independent dataset.
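For illustration, a k-nearest neighbors assignment of a single sample to one of the predefined clusters might look like the sketch below. The value of k, the squared Euclidean distance, the encoding of copy number states as -1/0/+1 and the training vectors are all assumptions made for the example; none of them is specified in the source.

```python
from collections import Counter
from typing import List, Sequence, Tuple

# A labelled sample: (copy-number state per region, cluster label).
Labelled = Tuple[Sequence[int], int]


def knn_predict(train: List[Labelled], query: Sequence[int], k: int = 3) -> int:
    """Return the majority cluster among the k training samples closest
    to the query (squared Euclidean distance over the region states)."""
    ranked = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Invented training data: 4 regions, states -1 (loss), 0 (neutral), +1 (gain).
training = [
    ([0, 0, 0, 0], 1), ([0, 0, 1, 0], 1), ([0, -1, 0, 0], 1),
    ([1, 1, -1, 0], 2), ([1, 1, -1, -1], 2), ([1, 0, -1, -1], 2),
    ([-1, 1, 1, 1], 3), ([-1, 1, 1, 0], 3), ([-1, 0, 1, 1], 3),
]
stable_sample = knn_predict(training, [0, 0, 0, 0])
altered_sample = knn_predict(training, [1, 1, 0, -1])
```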
We applied both the k-nearest neighbors and random forest models to the additional dataset of 78 mCRC samples. Using both the k-nearest neighbors model and the random forest classification we were able to classify the samples in 3 different clusters with very similar characteristics as the predefined clusters that arose from the hierarchical clustering. Further survival analysis again showed that patients from cluster 1 have worse PFS compared to patients from clusters 2 and 3, both in the analysis with all 180 regions and when using only the 102 regions affected by focal CNAs.
As in example 5, we divided the patients in two groups based on the proportion of the regions that are affected by CNAs (i.e. CNA-high tumors, which have more affected regions than the first quartile limit, and CNA-low tumors, with less than or equal to the first quartile limit) and performed this for 6 different subsets: (1) using only the 102 focal regions, (2) using the top 50 ranked regions from the random forest classification model built with the 102 focal regions, (3) using the tier 1 and tier 2 regions from the recursive partitioning applied to the 102 focal regions, (4) using all 180 genomic regions, (5) using the tier 1 and tier 2 regions from the recursive partitioning applied to all 180 regions and (6) using the top 50 ranked regions from the random forest classification model built with all 180 regions.
As in the analysis in example 5, we stratified patients into CNA-high and CNA-low groups (with the thresholds set to 30% and 25% for the top 50 ranking regions from the random forest classifiers and the tier 1 and 2 regions for both only focal regions and all 180 regions respectively). For each of the different subsets of genomic regions, CNA-high tumors showed an increased progression-free survival.
Materials and Methods
Sample Collection
Tumor tissue of 278 CRC patients receiving combination bevacizumab treatment or chemotherapeutic agents alone was identified and provided from the tissue bio-banks of the Royal College of Surgeons in Ireland (RCSI) Beaumont Hospital (n=29), The University of Heidelberg (UHEI) in Germany (n=107) and the VU university medical centre (VUMC) in The Netherlands (n=142). A second cohort of 106 combination bevacizumab-treated tumors and accompanying normal tissue from the MOMA clinical trial was provided by The University of Pisa in Italy (NCT02271464). Informed consent was obtained from each patient, following the ethical approval of the local ethical committee. After tissue collection, samples were reviewed by qualified pathologists to reconfirm cancer diagnosis and delineate adjacent normal tissue. Only tumor blocks with (1) at least 30% tumor cell content, as judged by a routine hematoxylin and eosin (H&E) staining, (2) sufficient tissue volume in order to allow successful DNA isolation and (3) clinical data available were considered for further processing and analysis. Additionally we downloaded publicly available copy number data of a cohort of 205 patients from the CAIRO trial that were treated with Irinotecan-Capecitabine (CAPIRI) or capecitabine (CAP) only (Agilent oligonucleotide hybridization arrays; GSE36864) (Haan et al 2014).
DNA Isolation
After pathological examination, 1-10 FFPE slides (5-10 μm) were used for DNA extraction. Regions with high tumor content as well as regions containing only normal cells as indicated by the pathologist were macro-dissected from individual slides. Subsequently the FFPE tissue sections were deparaffinised using a series of xylene and ethanol washes. The sections were then subjected to purification and homogenization (by gentle shaking at 400 rpm during incubation in buffer ATL and Proteinase K at 56° C.) to remove fixatives and aid lysis. After deparaffinisation and tissue digestion, DNA was further extracted using the QIAamp DNA FFPE Tissue kit (QIAgen) following the manufacturer's instructions. The resulting DNA was quantified using the Picogreen Assay (Life Technologies) following the manufacturer's instructions. This assay allows accurate determination of the concentration of double-stranded DNA needed for further sequencing library preparation. Only samples with a yield of more than 0.5 μg of dsDNA and a concentration >7.5 ng/μl were selected for further library preparation.
Low-Coverage Whole Genome Sequencing
Shot-gun whole genome libraries were prepared using the KAPA library preparation kit (KAPA Biosystems). Since the DNA was extracted from FFPE tissue blocks, whole genome DNA libraries from matched normal and tumor tissue samples were created according to the manufacturer's instructions with some modifications to the protocol. Before end repair, a 4 hour incubation step at 65° C. was added to remove as many reversible crosslinks as possible, after which excessive single-stranded DNA was removed using Mung-Bean nuclease. The concentration of double-stranded DNA was reassessed using picogreen and the concentration of adapters used in the ligation step of the library construction was adjusted according to the amount of DNA present. For the library enrichment, 5 to 15 cycles of PCR with intermediate assessment steps were used to ensure low adapter dimer content and high library yield. After quantification with qPCR, the resulting libraries were sequenced on a HiSeq2500 (Illumina) at low coverage (±0.1×). Raw sequencing reads were mapped to the human reference genome (NCBI37/hg19) using Burrows-Wheeler Aligner (BWA v0.5.8a) (Li and Durbin 2010). Picard (v1.43) was used to remove PCR duplicates. CNAs were identified by binning the reads in 30 Kb windows, correcting for genomic waves using the PennCNV software package (Wang et al 2007) and transforming the resulting number of reads per 30 Kb window into log R-values. The ASCAT algorithm version 2.0.1 (Van Loo et al 2010) was used to segment the raw data and estimate tumor percentages and overall ploidy. Subsequently, GISTIC v2.0 (Mermel et al 2011) was used to identify the most frequent and overrepresented chromosomal aberrations in tumors. A region was considered deleted if the log R value was <−0.1 and amplified when the log R was >0.1. A cut-off q-value of 0.25 was used to select significantly overrepresented CNAs.
CNAs spanning >70% of a chromosomal arm were defined as whole-arm CNAs, while CNAs spanning <70% of a chromosomal arm were considered focal CNAs. Significantly amplified or deleted regions were assigned as homozygous deletion, loss, diploid, gain or amplification for each sample based on the log R signal and GISTIC output threshold values (t<−1.3; −1.3<t<−0.1; −0.1<t<0.1; 0.1<t<0.9; t>0.9 respectively).
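The threshold scheme above can be expressed as two small functions. This is a sketch only: the text does not state how values falling exactly on a boundary (for example t = 0.1, or a CNA spanning exactly 70% of an arm) are handled, so the tie-breaking below is an assumption, as are the function names.

```python
def copy_number_state(t: float) -> str:
    """Map a threshold value t to one of the five states named in the text.

    Assumption: values exactly on a boundary fall into the less extreme state.
    """
    if t < -1.3:
        return "homozygous deletion"
    if t < -0.1:
        return "loss"
    if t <= 0.1:
        return "diploid"
    if t <= 0.9:
        return "gain"
    return "amplification"


def cna_scope(cna_length: int, arm_length: int) -> str:
    """Whole-arm if the CNA spans more than 70% of the arm, otherwise focal.

    Assumption: a CNA spanning exactly 70% is treated as focal.
    """
    return "whole-arm" if cna_length / arm_length > 0.70 else "focal"


states = [copy_number_state(t) for t in (-2.0, -0.5, 0.0, 0.5, 1.2)]
scopes = [cna_scope(80_000_000, 100_000_000), cna_scope(3_000_000, 100_000_000)]
```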
Whole-Exome Sequencing
After confirmation of successful library construction, whole exome enrichment was performed using the SeqCapV3 exome enrichment kit (Roche) following the manufacturer's instructions. The resulting whole-exome libraries were then sequenced on a HiSeq2500 using a V3 flowcell, generating 2×100 bp paired-end reads. Raw sequencing reads were mapped to the human reference genome (NCBI37/hg19) using Burrows-Wheeler Aligner (BWA v0.5.8a) (Li and Durbin 2010), and aligned reads were processed and sorted with SAMtools (v0.1.19) (Li et al 2009). Duplicate reads were removed using Picard tools. Base recalibration, local realignment around insertions and deletions, and single nucleotide variant calling were performed using the Genome Analysis Toolkit (GATK) (McKenna et al 2010). Insertions and deletions were called using Dindel (Albers et al 2011). Somatic mutations were selected by subtracting the variants and indels detected in the matched germline DNA from those found in the tumor DNA. Low quality mutations were removed based on mapping quality and coverage. ANNOVAR (Wang et al 2010) was used to annotate the remaining mutations, and exonic non-synonymous mutations and frame-shift insertions or deletions were selected. Common variants (MAF>1%) were filtered out using the following databases as described previously (Zhao et al 2014): (1) dbSNP version 132, (2) 1000 Genomes Project, (3) Axiom Genotype Data Set, (4) Complete Genomics diversity panel (46 HapMap individuals).
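The germline subtraction and common-variant filter described above amount to a set difference followed by a frequency cut-off. The following sketch assumes variants are identified by chromosome, position, reference allele and alternate allele, and that population frequencies are available as a lookup table; both are illustrative assumptions, and the quality filtering and annotation steps are not shown.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


def somatic_candidates(tumor, germline, population_maf, max_maf=0.01):
    """Keep tumor variants absent from the matched germline and not common.

    population_maf is a hypothetical lookup of minor allele frequencies;
    variants missing from it are treated as having a frequency of 0.
    Variants with MAF > 1% are filtered out, as in the text.
    """
    germline_set = set(germline)
    return [v for v in tumor
            if v not in germline_set
            and population_maf.get(v, 0.0) <= max_maf]


# Toy example with made-up coordinates.
v1 = Variant("chr12", 1000, "C", "T")   # tumor only, rare
v2 = Variant("chr17", 2000, "G", "A")   # also present in germline
v3 = Variant("chr5", 3000, "A", "G")    # tumor only, but common (MAF 5%)

somatic = somatic_candidates([v1, v2, v3], [v2], {v3: 0.05})
```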
Statistical Analysis
Consensus clustering with unsupervised hierarchical Ward clustering was performed on all samples using the packages ‘ConsensusClusterPlus’ and ‘hclust’ in R, with the recurrent CNAs identified by the GISTIC analysis of all mCRC samples as input, a subsampling size of 80% and 50 repetitions. Multivariate survival analysis between the different clusters was performed using Cox regression, with TNM staging and age as numerical factors and gender and cluster as categorical factors. For each cluster, and to compare CNA-high with CNA-low patients, survival of patients receiving combination bevacizumab therapy was compared with that of patients treated with chemotherapy in a univariate analysis using the Kaplan-Meier method and evaluated with a log-rank test. Recursive partitioning was performed using the R-package ‘rpart’ with the ‘class’ method. To determine the different tiers, we used all 180 regions to build a first, optimal regression tree. Next, in a stepwise manner, we removed one of the regions used in the tree and generated a second tree; we then reinserted that region, removed another region and generated a third tree, and so on. By performing this on 4 different levels (each time removing and replacing one of the used regions), we selected the most important regions based on their recurrent selection by the recursive partitioning. The CNAs selected once the first analysis had completed 4 levels were assigned to tier 1 and removed from the list of 180 regions, and the process was repeated to generate tier 2, tier 3 and tier 4. A similar approach was used for the 102 focal regions to determine tier 1, tier 2 and tier 3 regions. Random forest classification was performed using the R-package ‘rf’. K-nearest neighbors classification was performed using the package ‘knn’.
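For illustration, the principle of consensus clustering with 80% subsampling and 50 repetitions can be sketched in plain Python. This is not the R implementation referred to above: the naive Ward clustering, the function names and the toy CNA profiles are assumptions made for the example, and the consensus value of a sample pair is taken as the fraction of co-sampled repetitions in which both samples fall into the same cluster.

```python
import random
from itertools import combinations


def ward_clusters(points, k):
    """Naive agglomerative clustering with Ward's criterion; returns k index lists."""
    clusters = [([i], list(p)) for i, p in enumerate(points)]  # (members, centroid)
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            (ma, ca), (mb, cb) = clusters[a], clusters[b]
            d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
            # Increase in within-cluster sum of squares if a and b are merged.
            cost = len(ma) * len(mb) / (len(ma) + len(mb)) * d2
            if best is None or cost < best[0]:
                best = (cost, a, b)
        _, a, b = best
        (ma, ca), (mb, cb) = clusters[a], clusters[b]
        n = len(ma) + len(mb)
        centroid = [(len(ma) * x + len(mb) * y) / n for x, y in zip(ca, cb)]
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)]
        clusters.append((ma + mb, centroid))
    return [members for members, _ in clusters]


def consensus_matrix(points, k, reps=50, fraction=0.8, seed=0):
    """Fraction of co-sampled repetitions in which two samples co-cluster."""
    rng = random.Random(seed)
    n = len(points)
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    for _ in range(reps):
        idx = rng.sample(range(n), max(k, round(fraction * n)))
        for i, j in combinations(idx, 2):
            sampled[i][j] += 1
            sampled[j][i] += 1
        for members in ward_clusters([points[i] for i in idx], k):
            for a, b in combinations(members, 2):
                i, j = idx[a], idx[b]
                together[i][j] += 1
                together[j][i] += 1
    return [[together[i][j] / sampled[i][j] if sampled[i][j] else 0.0
             for j in range(n)] for i in range(n)]


# Toy CNA profiles (three regions per sample): two clearly separated groups.
profiles = [
    (1.0, 1.0, 0.0), (1.0, 0.9, 0.0), (0.9, 1.0, 0.0), (1.0, 1.0, 0.1), (1.1, 1.0, 0.0),
    (-1.0, 0.0, 1.0), (-1.0, 0.1, 1.0), (-0.9, 0.0, 1.0), (-1.0, 0.0, 0.9), (-1.1, 0.0, 1.0),
]
cm = consensus_matrix(profiles, k=2)
```

Consensus values close to 1 or 0 for most sample pairs indicate a stable partition; intermediate values flag samples whose cluster assignment depends on which subset was drawn.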
For both the random forest and k-nearest neighbors classifiers, we performed a 10-fold cross-validation on the original dataset to determine the accuracy of the model. To this end, we divided the 442 mCRC samples used for the original clustering 10 times at random, each time into a training set (90% of the samples) and a validation set (10% of the samples), in such a manner that each sample appears exactly once across the 10 validation sets. Next, a random forest classifier was generated using the training data. We then applied this classifier to the validation data to determine the model's accuracy.
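The cross-validation scheme described above can be sketched as follows. The splitting logic follows the text (10 folds, each sample in exactly one validation set); the majority-class model used in the demonstration is a hypothetical stand-in for the random forest or k-nearest neighbors classifier, and all function names are illustrative.

```python
import random


def ten_fold_splits(n_samples, n_folds=10, seed=0):
    """Yield (training, validation) index lists; each sample is validated once."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    for f in range(n_folds):
        validation = folds[f]
        training = [i for g in range(n_folds) if g != f for i in folds[g]]
        yield training, validation


def cross_validated_accuracy(features, labels, fit, predict, n_folds=10, seed=0):
    """Fit on each training set, score on the held-out fold, pool the results."""
    correct = 0
    for train, valid in ten_fold_splits(len(labels), n_folds, seed):
        model = fit([features[i] for i in train], [labels[i] for i in train])
        correct += sum(predict(model, features[i]) == labels[i] for i in valid)
    return correct / len(labels)


# Split of 442 samples: validation folds of 44 or 45 samples (about 10% each).
splits = list(ten_fold_splits(442))


# Hypothetical stand-in classifier: always predicts the majority training label.
def fit_majority(xs, ys):
    return max(set(ys), key=ys.count)


def predict_majority(model, x):
    return model


toy_labels = ["A"] * 15 + ["B"] * 5
baseline = cross_validated_accuracy(list(range(20)), toy_labels,
                                    fit_majority, predict_majority)
```

Any classifier exposing a fit and a predict step can be passed in place of the stand-in, so the same splits can be reused to compare models on identical validation sets.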
Number | Date | Country | Kind |
---|---|---|---|
1606923.9 | Apr 2016 | GB | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/EP2017/059559 | 4/21/2017 | WO | 00 |