IDENTIFICATION AND USE OF CIRCULATING TUMOR MARKERS

BACKGROUND OF THE INVENTION

Analysis of cancer-derived cell-free DNA (cfDNA) has the potential to revolutionize detection and monitoring of cancer. Noninvasive access to malignant DNA is particularly attractive for solid tumors, which cannot be repeatedly sampled without invasive procedures. In non-small cell lung cancer (NSCLC), PCR-based assays have been used previously to detect recurrent point mutations in genes such as KRAS or EGFR in plasma DNA (Taniguchi et al. (2011) Clin. Cancer Res. 17:7808-7815; Gautschi et al. (2007) Cancer Lett. 254:265-273; Kuang et al. (2009) Clin. Cancer Res. 15:2630-2636; Rosell et al. (2009) N. Engl. J. Med. 361:958-967), but the majority of patients lack mutations in these genes. Other studies have proposed identifying patient-specific chromosomal rearrangements in tumors via whole genome sequencing (WGS), followed by breakpoint qPCR from cfDNA (Leary et al. (2010) Sci. Transl. Med. 2:20ra14; McBride et al. (2010) Genes Chrom. Cancer 49:1062-1069). While sensitive, such methods require optimization of molecular assays for each patient, limiting their widespread clinical application. More recently, several groups have reported amplicon-based deep sequencing methods to detect cfDNA mutations in up to 6 recurrently mutated genes (Forshew et al. (2012) Sci. Transl. Med. 4:136ra168; Narayan et al. (2012) Cancer Res. 72:3492-3498; Kinde et al. (2011) Proc. Natl Acad. Sci. USA 108:9530-9535). While powerful, these approaches are limited by the number of mutations that can be interrogated (Rachlin et al. (2005) BMC Genomics 6:102) and the inability to detect genomic fusions.

PCT International Patent Publication No. 2011/103236 describes methods for identifying personalized tumor markers in a cancer patient using “mate-paired” libraries. The methods are limited to monitoring somatic chromosomal rearrangements, however, and must be personalized for each patient, thus limiting their applicability and increasing their cost.

U.S. Patent Application Publication No. 2010/0041048 A1 describes the quantitation of tumor-specific cell-free DNA in colorectal cancer patients using the “BEAMing” technique (Beads, Emulsion, Amplification, and Magnetics). While this technique provides high sensitivity and specificity, this method is for single mutations and thus any given assay can only be applied to a subset of patients and/or requires patient-specific optimization. U.S. Patent Application Publication No. 2012/0183967 A1 describes additional methods to identify and quantify genetic variations, including the analysis of minor variants in a DNA population, using the “BEAMing” technique.

U.S. Patent Application Publication No. 2012/0214678 A1 describes methods and compositions for detecting fetal nucleic acids and determining the fraction of cell-free fetal nucleic acid circulating in a maternal sample. While sensitive, these methods analyze polymorphisms occurring between maternal and fetal nucleic acids rather than polymorphisms that result from somatic mutations in tumor cells. In addition, methods that detect fetal nucleic acids in maternal circulation require much less sensitivity than methods that detect tumor nucleic acids in cancer patient circulation, because fetal nucleic acids are much more abundant than tumor nucleic acids.

U.S. Patent Application Publication Nos. 2012/0237928 A1 and 2013/0034546 describe methods for determining copy number variations of a sequence of interest in a test sample comprising a mixture of nucleic acids. While potentially applicable to the analysis of cancer, these methods are directed to measuring major structural changes in nucleic acids, such as translocations, deletions, and amplifications, rather than single nucleotide variations.

U.S. Patent Application Publication No. 2012/0264121 A1 describes methods for estimating a genomic fraction, for example, a fetal fraction, from polymorphisms such as small base variations or insertions-deletions. These methods do not, however, make use of optimized libraries of polymorphisms, such as, for example, libraries containing recurrently-mutated genomic regions.

U.S. Patent Application Publication No. 2013/0024127 A1 describes computer-implemented methods for calculating a percent contribution of cell-free nucleic acids from a major source and a minor source in a mixed sample. The methods do not, however, provide any advantages in identifying or making use of optimized libraries of polymorphisms in the analysis.

PCT International Publication No. WO 2010/141955 A2 describes methods of detecting cancer by analyzing panels of genes from a patient-obtained sample and determining the mutational status of the genes in the panel. The methods rely on a relatively small number of known cancer genes, however, and they do not provide any ranking of the genes according to effectiveness in detection of relevant mutations. In addition, the methods were unable to detect the presence of mutations in the majority of serum samples from actual cancer patients.

There is thus a need for new and improved methods to detect and monitor tumor-related nucleic acids in cancer patients.

SUMMARY OF THE INVENTION

The present invention addresses these and other problems by providing novel methods and systems relating to the characterization, diagnosis, and monitoring of cancer. In particular, according to one aspect, the invention provides methods for creating a library of recurrently mutated genomic regions comprising:

identifying a plurality of genomic regions from a group of genomic regions that are recurrently mutated in a specific cancer;

wherein the library comprises the plurality of genomic regions;

the plurality of genomic regions comprises at least 10 different genomic regions; and

at least one mutation within the plurality of genomic regions is present in at least 60% of all subjects with the specific cancer.

In specific embodiments of these methods, the plurality of genomic regions comprises at least 25, at least 50, at least 100, at least 150, at least 200, or at least 500 different genomic regions.

In other specific method embodiments, at least two mutations within the plurality of genomic regions or at least three mutations within the plurality of genomic regions are present in at least 60% of all subjects with the specific cancer.

In still other specific method embodiments, at least one mutation within the plurality of genomic regions is present in at least 60%, 70%, 80%, 90%, 95%, 98%, 99%, or 99.9% of all subjects with the specific cancer.

In some embodiments, the identifying step comprises for each genomic region in the plurality of genomic regions, ranking the genomic region to maximize the number of all subjects with the specific cancer having at least one mutation within the genomic region.

In other embodiments, the identifying step comprises for each genomic region in the plurality of genomic regions, ranking the genomic region to maximize the ratio between the number of all subjects with the specific cancer having at least one mutation within the genomic region and the length of the genomic region.

In some embodiments, the library comprises a plurality of genomic regions encoding a plurality of driver sequences, more specifically known driver sequences or driver sequences that are recurrently mutated in the specific cancer.

In some embodiments, the library comprises a plurality of genomic regions that are recurrently rearranged in the specific cancer.

In preferred embodiments, the specific cancer is a carcinoma, and in more preferred embodiments, the carcinoma is an adenocarcinoma, a non-small cell lung cancer, or a squamous cell carcinoma.

In specific embodiments, the cumulative length of the plurality of genomic regions is at most 30 Mb, 20 Mb, 10 Mb, 5 Mb, 2 Mb, 1 Mb, 500 kb, 200 kb, 100 kb, 50 kb, 20 kb, or 10 kb.

In another aspect, the invention provides methods for analyzing a cancer-specific genetic alteration in a subject comprising the steps of:

obtaining a tumor nucleic acid sample and a genomic nucleic acid sample from a subject with a specific cancer;

sequencing a plurality of target regions in the tumor nucleic acid sample and in the genomic nucleic acid sample to obtain a plurality of tumor nucleic acid sequences and a plurality of genomic nucleic acid sequences; and

comparing the plurality of tumor nucleic acid sequences to the plurality of genomic nucleic acid sequences to identify a patient-specific genetic alteration in the tumor nucleic acid sample;

wherein the plurality of target regions are selected from a plurality of genomic regions that are recurrently mutated in the specific cancer;

the plurality of genomic regions comprises at least 10 different genomic regions; and

at least one mutation within the plurality of genomic regions is present in at least 60% of all subjects with the specific cancer.

In specific embodiments of this aspect of the invention, the plurality of genomic regions comprises at least 25, at least 50, at least 100, at least 150, at least 200, or at least 500 different genomic regions.

In other specific embodiments, at least two mutations within the plurality of genomic regions or at least three mutations within the plurality of genomic regions are present in at least 60% of all subjects with the specific cancer.

In still other specific embodiments, at least one mutation within the plurality of genomic regions is present in at least 60%, 70%, 80%, 90%, 95%, 98%, 99%, or 99.9% of all subjects with the specific cancer.

In some embodiments, each genomic region in the plurality of genomic regions is identified by ranking the genomic region to maximize the number of all subjects with the specific cancer having at least one mutation within the genomic region.

In other embodiments, each genomic region in the plurality of genomic regions is identified by ranking the genomic region to maximize the ratio between the number of all subjects with the specific cancer having at least one mutation within the genomic region and the length of the genomic region.

In some embodiments, the plurality of genomic regions comprises genomic regions encoding a plurality of driver sequences, more specifically known driver sequences or driver sequences that are recurrently mutated in the specific cancer.

In some embodiments, the plurality of genomic regions comprises genomic regions that are recurrently rearranged in the specific cancer.

In preferred embodiments, the specific cancer is a carcinoma, and in more preferred embodiments, the carcinoma is an adenocarcinoma, a non-small cell lung cancer, or a squamous cell carcinoma.

In specific embodiments, the cumulative length of the plurality of genomic regions is at most 30 Mb, 20 Mb, 10 Mb, 5 Mb, 2 Mb, 1 Mb, 500 kb, 200 kb, 100 kb, 50 kb, 20 kb, or 10 kb.

In some embodiments, the methods further comprise the steps of:

obtaining a cell-free nucleic acid sample from the subject; and

identifying the patient-specific genetic alteration in the cell-free nucleic acid sample.

In specific embodiments, the step of identifying the patient-specific genetic alteration in the cell-free nucleic acid sample comprises sequencing a genomic region comprising the patient-specific genetic alteration in the cell-free sample.

In other specific embodiments, the step of obtaining a tumor nucleic acid sample and a genomic nucleic acid sample comprises the step of enriching the plurality of target regions in the tumor nucleic acid sample and the genomic nucleic acid sample, and in more specific embodiments, the enriching step comprises use of a custom library of biotinylated DNA.

In still other specific embodiments, the step of obtaining a cell-free nucleic acid sample comprises the step of enriching the plurality of target regions in the cell-free nucleic acid sample, and in still more specific embodiments, the enriching step comprises use of a custom library of biotinylated DNA.

In some embodiments, the methods further comprise the step of quantifying the cancer-specific genetic alteration in the cell-free sample.

In yet another aspect, the invention provides methods for screening a cancer-specific genetic alteration in a subject comprising the steps of:

obtaining a cell-free nucleic acid sample from a subject;

sequencing a plurality of target regions in the cell-free sample to obtain a plurality of cell-free nucleic acid sequences; and

identifying a cancer-specific genetic alteration in the cell-free sample;

wherein the plurality of target regions are selected from a plurality of genomic regions that are recurrently mutated in the specific cancer;

the plurality of genomic regions comprises at least 10 different genomic regions; and

at least one mutation within the plurality of genomic regions is present in at least 60% of all subjects with the specific cancer.

In specific embodiments, the plurality of genomic regions comprises at least 25, at least 50, at least 100, at least 150, at least 200, or at least 500 different genomic regions.

In particular embodiments, each genomic region in the plurality of genomic regions is identified by ranking the genomic region to maximize the number of all subjects with the specific cancer having at least one mutation within the genomic region.

In other particular embodiments, each genomic region in the plurality of genomic regions is identified by ranking the genomic region to maximize the ratio between the number of all subjects with the specific cancer having at least one mutation within the genomic region and the length of the genomic region.

In still other particular embodiments, the plurality of genomic regions comprises genomic regions encoding a plurality of driver sequences, and, more particularly, the driver sequences are known driver sequences or are recurrently mutated in the specific cancer.

In yet still other particular embodiments, the plurality of genomic regions comprises genomic regions that are recurrently rearranged in the specific cancer.

In some embodiments, the specific cancer is a carcinoma, including, for example, an adenocarcinoma, a non-small cell lung cancer, or a squamous cell carcinoma.

In specific embodiments, the cumulative length of the plurality of genomic regions is at most 30 Mb, 20 Mb, 10 Mb, 5 Mb, 2 Mb, 1 Mb, 500 kb, 200 kb, 100 kb, 50 kb, 20 kb, or 10 kb.

In other specific embodiments, the step of obtaining a cell-free nucleic acid sample comprises the step of enriching the plurality of target regions in the cell-free nucleic acid sample, and, in some embodiments, the enriching step comprises use of a custom library of biotinylated DNA.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1. Development of CAncer Personalized Profiling by Deep Sequencing (CAPP-Seq). (a) Schematic depicting design of CAPP-Seq selectors and their application for assessing circulating tumor DNA. (b) Multi-phase design of the NSCLC CAPP-Seq selector. (c) Analysis of the number of SNVs per lung adenocarcinoma covered by the NSCLC CAPP-Seq selector in the TCGA WES cohort (Training; N=229) and an independent lung adenocarcinoma WES data set (Validation; N=183) (Imielinski et al. (2012) Cell 150:1107-1120). (d) Number of SNVs per patient identified by the NSCLC CAPP-Seq selector in WES data from three adenocarcinomas from TCGA, colon (COAD), rectal (READ), and endometrioid (UCEC) cancers. (e-f) Quality parameters from a representative CAPP-Seq analysis of plasma cfDNA, including length distribution of sequenced cfDNA fragments (e), and depth of sequencing coverage across all genomic regions in the selector (f). (g) Variation in sequencing depth across cfDNA samples from 4 patients.

FIG. 2. CAPP-Seq computational pipeline. Major steps of the bioinformatics pipeline for mutation discovery and quantitation in plasma are schematically illustrated.

FIG. 3. Statistical enrichment of recurrently mutated NSCLC exons captures known drivers.

FIG. 4. Development of the FACTERA algorithm. Major steps used by FACTERA (see Detailed Methods) to precisely identify genomic breakpoints from aligned paired-end sequencing data are anecdotally illustrated using two hypothetical genes, w and v. (a) Improperly paired, or “discordant,” reads (indicated in yellow) are used to locate genes involved in a potential fusion (in this case, w and v). (b) Because truncated (i.e., soft-clipped) reads may indicate a fusion breakpoint, any such reads within genomic regions delineated by w and v are also further analyzed. (c) Consider soft-clipped reads, R1 and R2, whose non-clipped segments map to w and v, respectively. If R1 and R2 derive from a fragment encompassing a true fusion between w and v, then the mapped portion of R1 should match the soft-clipped portion of R2, and vice versa. This is assessed by FACTERA using fast k-mer indexing and comparison. (d) Four possible orientations of R1 and R2 are depicted. However, only Cases 1a and 2a can generate valid fusions (see Detailed Methods). Thus, prior to k-mer comparison (panel c), the reverse complement of R1 is taken for Cases 1b and 2b, respectively, converting them into Cases 1a and 2a. (e) In some cases, short sequences immediately flanking the breakpoint are identical, preventing unambiguous determination of the breakpoint. Let iterators i and j denote the first matching sequence positions between R1 and R2. To reconcile sequence overlap, FACTERA arbitrarily adjusts the breakpoint in R2 (i.e., bp2) to match R1 (i.e., bp1) using the sequence offset determined by differences in distance between bp2 and i, and bp1 and j. Two cases are illustrated, corresponding to sequence orientations described in (d).
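The reciprocal comparison described for panels (c) and (d) can be sketched as follows. This is a minimal illustration only and is not the FACTERA implementation: the function names, the use of plain Python sets in place of a k-mer index, and the default k of 10 are all assumptions made for the example. Each read is represented by its mapped segment and its soft-clipped segment, already placed in the orientation of Cases 1a and 2a.

```python
def kmers(seq, k):
    """Return the set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shares_kmer(a, b, k):
    """True if sequences a and b have at least one k-mer in common."""
    return bool(kmers(a, k) & kmers(b, k))


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (used for Cases 1b and 2b)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def supports_fusion(r1_mapped, r1_clipped, r2_mapped, r2_clipped, k=10):
    """Hypothetical check: the mapped part of R1 should match the clipped
    part of R2, and the mapped part of R2 should match the clipped part
    of R1, if both reads span the same breakpoint."""
    return (shares_kmer(r1_mapped, r2_clipped, k)
            and shares_kmer(r2_mapped, r1_clipped, k))


# Toy sequences standing in for the w side and v side of a fusion fragment.
w_side = "ACGTTGCAAGGCTTAACCGGTT"
v_side = "TTGACCATGGCAATCGGATCCA"

# R1 maps to w and is clipped where v begins; R2 maps to v and is
# clipped where w ends.
r1 = (w_side[-18:], v_side[:12])
r2 = (v_side[:18], w_side[-12:])

is_supported = supports_fusion(r1[0], r1[1], r2[0], r2[1])
```

A read reported on the opposite strand would first be passed through `reverse_complement` so that both reads are compared in the same orientation.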

FIG. 5. Application of FACTERA to NSCLC cell lines NCI-H3122 and HCC78, and Sanger-validation of breakpoints. (a) Pile-up of a subset of soft-clipped reads mapping to the EML4-ALK fusion identified in NCI-H3122 along with the corresponding Sanger chromatogram. (b) Same as (a), but for the SLC34A2-ROS1 translocation identified in HCC78.

FIG. 6. Improvements in CAPP-Seq performance with optimized library preparation procedures.

FIG. 7. Optimizing allele recovery from low input cfDNA during Illumina library preparation.

FIG. 8. CAPP-Seq performance with various amounts of input cfDNA.

FIG. 9. Analysis of CAPP-Seq background, allele detection threshold, and linearity. (a) Analysis of background rate for 6 NSCLC patient plasma samples and a healthy individual (Detailed Methods). (b) Analysis of biological background in (a) focusing on 107 recurrent somatic mutations from a previously reported SNaPshot panel (Su et al. (2011) J. Mol. Diagn. 13:74-84). Mutations found in a given patient's tumor were excluded. The mean frequency for each patient (horizontal red line) was within confidence limits of the mean background limit of 0.007% (horizontal blue line; panel a). A single outlier mutation (TP53 R175H) is indicated by an orange diamond. (c) Individual mutations from (b) ranked by most to least recurrent, according to median frequency across the 7 samples. (d) Dilution series analysis of expected versus observed frequencies of mutant alleles using CAPP-Seq. Dilution series were generated by spiking fragmented HCC78 DNA into control cfDNA. (e) Analysis of the effect of the number of SNVs considered on the estimates of fractional abundance (95% confidence intervals shown in gray). (f) Analysis of the effect of the number of SNVs considered on the mean correlation coefficient between expected and observed cancer fractions (blue dashed line) using data from panel (d). 95% confidence intervals are shown for (a)-(c). Statistical variation for (d) is shown as s.e.m.

FIG. 10. Empirical spiking analysis of CAPP-Seq using two NSCLC cell lines. (a) Expected and observed (by CAPP-Seq) fractions of NCI-H3122 DNA spiked into control HCC78 DNA are linear for all fractions tested (0.1%, 1%, and 10%; R²=1). (b) Using data from (a), analysis of the effect of the number of SNVs considered on the estimates of fractional abundance (95% confidence intervals shown in gray). (c) Analysis of the effect of the number of SNVs considered on the mean correlation coefficient and coefficient of variation between expected and observed cancer fractions (dashed lines) using data from panel (a). (d) Expected and observed fractions of the EML4-ALK fusion present in NCI-H3122 are linear (R²=0.995) over all spiking concentrations tested (see FIG. 5(a) for breakpoint verification). The observed EML4-ALK fractions were normalized based on the relative abundance of the fusion in 100% H3122 DNA (see Detailed Methods). Moreover, a single heterozygous insertion (indel) discovered within the selector space of NCI-H3122 (chr7: 107416855, +T) was concordant with defined concentrations (shown are observed fractions adjusted for zygosity).

FIG. 11. Application of CAPP-Seq for noninvasive detection and monitoring of circulating tumor DNA. (a) Characteristics of 11 patients included in this study (Table 3). P-values reflect a two-sided paired t-test for patients with reporter SNVs detected at both time points; other p-values were determined as described in Methods. ND, mutant DNA was not detected above background. Dashes, plasma sample not available. Smoking history, ≧20 pack years (heavy), >0 pack years (light). (b-d) Disease monitoring using CAPP-Seq. Mutant allele frequencies (left y-axis) and absolute concentrations (right y-axis) are shown. The lower limit of detection (defined in FIG. 9(a)-(b)) is indicated by the dashed lines. (b) Pre- and post-surgery circulating tumor DNA levels quantified by CAPP-Seq in a Stage IB and a Stage IIIA NSCLC patient. Complete resections were achieved in both cases. (c) Disease burden changes in response to chemotherapy in a Stage IV NSCLC patient with three rearrangement breakpoints identified by CAPP-Seq. Tumor volume based on CT measurements and CAPP-Seq mutant allele frequencies are shown. Tu, tumor; Ef, pleural effusion. (d) Detection and monitoring of a subclonal EGFR T790M resistance mutation in a patient with Stage IV NSCLC. The fractional abundance of the dominant clone and T790M-containing clone are shown in the primary tumor (left) and plasma samples (right). (e) Predicted transcripts of three fusion genes detected in case P9. (f) Statistically significant co-occurrence of ROS1 fusions and U2AF1 S34F mutations in NSCLC (P=0.0019; two-sided Fisher's exact test). (g) Exploratory analysis of the potential application of CAPP-Seq for cancer screening. Pre-treatment plasma samples from panel (a) and a plasma sample from a healthy individual were examined for the presence of mutant allele outliers without knowledge of the primary tumor mutations (see Detailed Methods). Error bars represent s.e.m.

FIG. 12. Base-pair resolution breakpoint mapping for all patients and cell lines enumerated by FACTERA. Gene fusions involving ALK (a) and ROS1 (b) are graphically depicted. Schematics in the top panels indicate the exact genomic positions (HG19 NCBI Build 37.1/GRCh37) of the breakpoints in ALK, ROS1, EML4, KIF5B, SLC34A2, CD74, MKX, and FYN. Bottom panels depict exons flanking the predicted gene fusions with notation indicating the 5′ fusion partner gene and last fused exon followed by the 3′ fusion partner gene and first fused exon. For example, in S13del37; R34, exons 1-13 of SLC34A2 (excluding the 3′ 37 nucleotides of exon 13) are fused to exons 34-43 of ROS1. Exons in FYN are from its 5′UTR and precede the first coding exon. The green dotted line in the predicted FYN-ROS1 fusion indicates the first in-frame methionine in ROS1 exon 33, which preserves an open reading frame encoding the ROS1 kinase domain. All rearrangements were independently confirmed by PCR and/or FISH.

FIG. 13. Presence of fusions is inversely related to the number of SNVs detected by CAPP-Seq. For each patient listed in FIG. 11(a), the number of identified SNVs is plotted against the presence or absence of detected genomic fusions. The shading of the symbols is identical to FIG. 11(a), and indicates smoking history. Statistical significance was determined using a two-sided Wilcoxon rank sum test, and error bars indicate s.e.m.

FIG. 14. Different types of reporters are similarly useful for disease monitoring. Three SNVs and an ALK translocation identified in patient 6 are concordant at each time point, showing a comparable drop in fractional abundance after treatment with the ALK kinase inhibitor Crizotinib. Due to small differences in measured allele frequencies at each time point, linear regression was used to fit all allele frequencies to their adjusted mutant cfDNA concentrations (R²=0.93). Thus, the scale on the right y-axis is interpolated. To accurately quantify disease burden, translocation and SNV frequencies were adjusted based on differences in zygosity and sequencing depth in the tumor sample (see Detailed Methods).

FIG. 15. Flow cytometry analysis of P9 pleural effusion. Flow cytometry of cryopreserved cells from a pleural effusion revealed that only 0.22% of cells stained positive for the epithelial marker EpCAM and negative for the lineage markers CD31 (endothelial cells) and CD45 (immune cells). FACS was used to enrich tumor cells, and analysis of tumor-enriched genomic DNA identified 3 fusions (FIG. 11(e)), whereas the low purity of the unsorted tumor specimen hampered de novo fusion discovery using FACTERA (Detailed Methods).

FIG. 16. Analysis of RNA-Seq data from lung adenocarcinoma patients in TCGA identifies 2 candidate cases with ROS1 rearrangements. (a) ROS1 fusions are known to result in over-expression of the C-terminal kinase domain, and breakpoints typically occur downstream of exon 31 (Bergethon et al. (2012) J. Clin. Oncol. 30:863-870; Rikova et al. (2007) Cell 131:1190-1203; Takeuchi et al. (2012) Nat. Med. 18:378-381). Exon-level RPKM values for ROS1 are plotted for 163 LUAD patients. Two patients (TCGA-05-4426 and TCGA-64-1680) have expression patterns suggestive of ROS1 fusions. (b,c) Pileups of RNA-Seq reads in these two patients illustrate an abundance of reads mapping to regions surrounding ROS1 exon boundaries. Colored reads indicate discordant pairs, consistent with ROS1 fusions. Such pairs map to SLC34A2 for patient TCGA-05-4426 (b) and CD74 for patient TCGA-64-1680 (c). A single soft-clipped RNA-Seq read supports a ROS1-CD74 fusion event in TCGA-64-1680.

FIG. 17. Non-invasive cancer screening with CAPP-Seq, related to FIG. 11(g). (a) Steps to identify candidate SNVs in plasma cfDNA demonstrated using a patient sample with NSCLC (P6, see Table 3). Following stepwise filtration, outlier detection is applied (Detailed Methods). (b) Same as (a), but using a plasma cfDNA sample from a patient who had their tumor surgically removed. No SNVs are identified, as expected. (c) Three additional representative samples applying retrospective screening to patients analyzed in this study. P2 and P5 samples have confirmed tumor-derived SNVs, while P9 is cancer positive but lacks tumor-derived SNVs. Red points, confirmed tumor-derived SNVs; Green points, background noise.

DETAILED DESCRIPTION OF THE INVENTION

Tumors continually shed DNA into the circulation, where it is readily accessible (Stroun et al. (1987) Eur. J. Cancer Clin. Oncol. 23:707-712). Provided herein are methods, termed CAncer Personalized Profiling by Deep Sequencing (CAPP-Seq), for the ultrasensitive detection of circulating tumor DNA. Also provided are methods for creating libraries of recurrently mutated genomic regions used in the CAPP-Seq methods. CAPP-Seq targets hundreds of recurrently mutated genomic regions and simultaneously detects point mutations, insertions/deletions, and rearrangements. CAPP-Seq for non-small cell lung cancer has been demonstrated herein with a design that identified mutations in >95% of tumors. CAPP-Seq accurately quantified circulating tumor DNA from early and advanced stage tumors and identified mutant alleles down to 0.025% with a detection limit of <0.01%. Tumor-derived DNA levels paralleled clinical responses to diverse therapies, and CAPP-Seq identified actionable mutations in plasma. Moreover, CAPP-Seq identified significant co-occurrence of ROS1 translocations with U2AF1 splicing factor mutations. Finally, the utility of CAPP-Seq for cancer screening is also described. CAPP-Seq can be routinely applied to noninvasively detect and monitor tumors, thus facilitating personalized cancer therapy.

Methods for Creating Libraries

According to one aspect of the invention, methods for creating a library of recurrently mutated genomic regions are provided. The methods comprise the step of identifying a plurality of genomic regions from a group of genomic regions that are recurrently mutated in a specific cancer, wherein the library comprises the plurality of genomic regions, the plurality of genomic regions comprises at least 10 different genomic regions, and at least one mutation within the plurality of genomic regions is present in at least 60% of all subjects with the specific cancer.

It should be understood that the term “library” represents a compilation or collection of individual components. Thus, a library of recurrently mutated genomic regions is a compilation or collection of recurrently mutated genomic regions. The libraries of the instant disclosure are useful because they include a large number of potentially mutated genomic regions within a minimal length of genomic sequence. Use of these libraries to identify genetic alterations in specific patient samples is particularly advantageous because the libraries do not need to be optimized on a patient-by-patient basis.

The libraries created according to the instant methods comprise genomic regions that are recurrently mutated in a specific cancer. The identification of these recurrent mutations benefits greatly from the availability of databases such as, for example, The Cancer Genome Atlas (TCGA) and its subsets (http://cancergenome.nih.gov/). Such databases serve as the starting point for identifying the recurrently mutated genomic regions of the instant libraries. The databases also provide a sample of mutations occurring within a given percentage of subjects with a specific cancer.

The libraries created according to the instant methods comprise a plurality of genomic regions, wherein the plurality of genomic regions comprises at least 10 different genomic regions. In some embodiments, the plurality of genomic regions comprises at least 25, at least 50, at least 100, at least 150, at least 200, at least 500, or even more different genomic regions.

It should be understood that the inclusion of larger numbers of genomic regions generally increases the likelihood that a unique mutation will be identified to distinguish tumor nucleic acid in a subject from the subject's genomic nucleic acid. Including too many genomic regions in the library is not without a cost, however, since the number of genomic regions is directly related to the length of nucleic acids that must be sequenced in the analysis. At the extreme, the entire genome of a tumor sample and a genomic sample could be sequenced, and the resulting sequences could be compared to note any differences. Such a brute force approach is not possible, however, with the vanishingly small quantities of tumor nucleic acid present in a cell-free sample.
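As a rough illustration of why cumulative length matters, mean sequencing depth falls in inverse proportion to the length of the targeted sequence when the sequencing output is held fixed. The sketch below is not part of the disclosed methods. The function name, read count, read length, and on-target fraction are hypothetical values chosen only to make the arithmetic concrete.

```python
def mean_depth(total_reads, read_length_bp, on_target_fraction, target_length_bp):
    """Approximate mean depth: on-target bases divided by target length."""
    if target_length_bp <= 0:
        raise ValueError("target_length_bp must be positive")
    on_target_bases = total_reads * read_length_bp * on_target_fraction
    return on_target_bases / target_length_bp


# Hypothetical run: 20 million reads of 100 bp, half of them on target.
run = dict(total_reads=20_000_000, read_length_bp=100, on_target_fraction=0.5)

depth_100kb = mean_depth(target_length_bp=100_000, **run)    # 100 kb of targets
depth_1mb = mean_depth(target_length_bp=1_000_000, **run)    # 1 Mb of targets
depth_30mb = mean_depth(target_length_bp=30_000_000, **run)  # 30 Mb of targets
```

Under these assumed values the same run yields roughly 10,000-fold depth over 100 kb, 1,000-fold over 1 Mb, and about 33-fold over 30 Mb, which is why a compact library leaves more reads available for each rare tumor-derived molecule.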

The libraries of the instant disclosure address this problem by identifying genomic regions that are recurrently mutated in a particular cancer, and then ranking those regions to maximize the likelihood that the region will include a distinguishing genetic alteration in a particular tumor. The library of recurrently mutated genomic regions, or “selectors”, can be used across an entire population for a given cancer, and does not need to be optimized for each subject.

The term “mutation”, as used herein, refers to a genetic alteration in the genome of an organism, specifically to a change in the nucleotide sequence of the organism. Examples of mutations include point mutations, where a single nucleotide is changed in the genome, and larger-scale changes in the genome, such as rearrangements, insertions, deletions, and amplifications. A recurrent mutation is a mutation that has been identified in more than one individual.

The terms “patient” and “subject” are used interchangeably. These are typically individuals that suffer from the cancer of interest. While the individuals are typically human individuals, the methods and systems of the instant disclosure could also be applied to other species, in particular, to other animal species, for example, livestock animals and pets.

The libraries of recurrently mutated genomic regions disclosed herein are created for a given type of cancer using one or more of the following design phases:

Phase 1: Identify known “driver” genes, i.e., genes that are known to be mutated frequently in the particular cancer.

Phase 2: Maximize patient coverage by selecting genomic regions that contain recurrent mutations in multiple subjects with the particular cancer and ranking those selections to maximize the number of patients identified by mutations in those regions.

Phases 3 and 4: Further ranking of genomic regions containing recurrent mutations by maximizing the “recurrence index”.

Phase 5: Add genomic regions from genes predicted to harbor “driver” mutations in the particular cancer.

Phase 6: Add genomic regions covering fusions and their flanking regions.

It should be understood, however, that the above-described phases of selector design are independent of one another and may be applied separately or in a different order within the methods of library creation and still achieve the desired result.
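The patient-coverage ranking of Phase 2 can be illustrated with a minimal sketch. The greedy heuristic below is an assumption chosen for illustration, since the text above does not fix a particular algorithm; the function name, the input format, and the region and patient identifiers are all hypothetical.

```python
def rank_regions_by_coverage(region_to_patients):
    """Greedily order regions so each pick adds the most not-yet-covered patients.

    region_to_patients: dict mapping a region name to the set of patient IDs
    that carry at least one mutation in that region (hypothetical format).
    Returns a list of (region, patients_gained, patients_covered) tuples.
    """
    remaining = dict(region_to_patients)
    covered = set()
    ranking = []
    while remaining:
        # Pick the region that gains the most new patients; iterating over
        # sorted names makes tie-breaking deterministic.
        best = max(sorted(remaining), key=lambda r: len(remaining[r] - covered))
        gained = remaining.pop(best) - covered
        if not gained:
            break  # no remaining region adds a new patient
        covered |= gained
        ranking.append((best, len(gained), len(covered)))
    return ranking


# Toy input with hypothetical regions and patients.
example = {
    "exon_A": {"p1", "p2", "p3"},
    "exon_B": {"p3", "p4"},
    "exon_C": {"p1", "p2"},
}
ranking = rank_regions_by_coverage(example)
```

In this toy input, exon_C is never selected because every patient it covers is already covered by exon_A, which mirrors the "patients gained" bookkeeping used when a library is assembled region by region.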

Application of the above approaches for recurrently mutated genomic regions in non-small cell lung cancer results in the library shown in Table 1. All genomic regions included in the selector, along with their corresponding HUGO gene symbols and genomic coordinates, as well as patient statistics for NSCLC and a variety of other cancers, are shown, organized by selector design phase. The percentage of coverage of NSCLC patients as the Table 1 library was developed is shown in FIG. 1(b). Also shown in the bottom panel of this figure is the cumulative length of genomic regions (in kb) as the library is created according to the above phasing. The three curves in the top panel show percentage coverage of patients with at least one distinguishing mutation between tumor and genomic sequences (≧1 SNVs), at least two distinguishing mutations between tumor and genomic sequences (≧2 SNVs), and at least three distinguishing mutations between tumor and genomic sequences (≧3 SNVs). As is apparent from these graphs, the library created according to the instant methods identifies genomic regions that are highly likely to include identifiable mutations in tumor sequences. This library includes a relatively small total number of genomic regions and thus a relatively short cumulative length of genomic regions and yet provides a high overall coverage of likely mutations in a population. The library does not, therefore, need to be optimized on a patient-by-patient basis. The relatively short cumulative length of genomic regions also means that the analysis of cancer-derived cell-free DNA using these libraries is highly sensitive and allows the sequencing of this DNA to a great depth.

Accordingly, the libraries of recurrently mutated genomic regions created using the instant methods comprise a plurality of genomic regions that are recurrently mutated in a specific cancer, and the plurality of genomic regions comprises at least 10 different genomic regions. In some embodiments, the plurality of genomic regions comprises at least 25 different genomic regions. In some embodiments, the plurality of genomic regions comprises at least 50 different genomic regions. In some embodiments, the plurality of genomic regions comprises at least 100 different genomic regions. In some embodiments, the plurality of genomic regions comprises at least 150 different genomic regions. In some embodiments, the plurality of genomic regions comprises at least 200 different genomic regions. In some embodiments, the plurality of genomic regions comprises at least 500 different genomic regions or even more.

In some embodiments, the plurality of genomic regions comprises at most 5000 different genomic regions. In some embodiments, the plurality of genomic regions comprises at most 2000 different genomic regions. In some embodiments, the plurality of genomic regions comprises at most 1000 different genomic regions. In some embodiments, the plurality of genomic regions comprises at most 500 different genomic regions. In some embodiments, the plurality of genomic regions comprises at most 200 different genomic regions. In some embodiments, the plurality of genomic regions comprises at most 150 different genomic regions. In some embodiments, the plurality of genomic regions comprises at most 100 different genomic regions. In some embodiments, the plurality of genomic regions comprises at most 50 different genomic regions or even fewer.

Importantly, the libraries of recurrently mutated genomic regions created according to the instant methods enable the identification of patient- and tumor-specific mutations within the genomic regions in a high percentage of subjects. Specifically, in these libraries, at least one mutation within the plurality of genomic regions is present in at least 60% of all subjects with the specific cancer. In some embodiments, at least two mutations within the plurality of genomic regions are present in at least 60% of all subjects with the specific cancer. In specific embodiments, at least three mutations, or even more, within the plurality of genomic regions are present in at least 60% of all subjects with the specific cancer.

In some embodiments, in the libraries of recurrently mutated genomic regions created according to these methods, at least one mutation within the plurality of genomic regions is present in at least 60%, 70%, 80%, 90%, 95%, 98%, 99%, 99.9% or even higher percentages of all subjects with the specific cancer.

In specific embodiments, at least two mutations within the plurality of genomic regions are present in at least 60%, 70%, 80%, 90%, 95%, 98%, 99%, 99.9% or even higher percentages of all subjects with the specific cancer.

In more specific embodiments, at least three mutations, or even more, within the plurality of genomic regions are present in at least 60%, 70%, 80%, 90%, 95%, 98%, 99%, 99.9% or even higher percentages of all subjects with the specific cancer.
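The coverage thresholds described above (at least one, two, or three mutations per subject) can be computed from a mutation list as in the following sketch. The function name and input format are assumptions; the two regions use coordinates listed in Table 1 for a KRAS exon and a TP53 exon, while the patients and mutation positions are hypothetical.

```python
from collections import Counter


def fraction_with_at_least(mutations, regions, n_patients, k):
    """Fraction of patients with at least k mutations inside the regions.

    mutations: iterable of (patient_id, chrom, pos) tuples
    regions:   iterable of (chrom, start, end) tuples, coordinates inclusive
    """
    regions = list(regions)
    per_patient = Counter()
    for patient, chrom, pos in mutations:
        # Count the mutation if it falls inside any selector region.
        if any(c == chrom and s <= pos <= e for c, s, e in regions):
            per_patient[patient] += 1
    hits = sum(1 for count in per_patient.values() if count >= k)
    return hits / n_patients


# Regions taken from Table 1 (KRAS and TP53 exons).
selector = [("chr12", 25398207, 25398318), ("chr17", 7577018, 7577155)]
# Hypothetical cohort of four patients; p4 has no mutations at all.
calls = [
    ("p1", "chr12", 25398284),
    ("p1", "chr17", 7577120),
    ("p2", "chr12", 25398285),
    ("p3", "chr1", 100),  # outside the selector
]
coverage = {k: fraction_with_at_least(calls, selector, 4, k) for k in (1, 2, 3)}
```

A linear scan over regions is used here for clarity; a library with hundreds of regions would more likely use a sorted interval lookup.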

As previously noted, the cumulative length of genomic regions in the libraries of recurrently mutated genomic regions created according to the instant methods is relatively short, thus minimizing sequencing costs associated with the analytical methods relying on these libraries and maximizing their sensitivity. In some embodiments, the cumulative length of genomic regions is at most 30 megabases (Mb). In some embodiments, the cumulative length of genomic regions is at most 20 Mb, 10 Mb, 5 Mb, 2 Mb, or 1 Mb. In some embodiments, the cumulative length of genomic regions is at most 500 kilobases (kb), 200 kb, 100 kb, 50 kb, 20 kb, 10 kb, or even less.

In some embodiments, the library of recurrently mutated genomic regions created according to the instant methods comprises the genomic regions displayed in Table 1, or a subset of those genomic regions.

The instant methods include the step of identifying a plurality of genomic regions from a group of genomic regions that are recurrently mutated in a specific cancer. As noted elsewhere, the libraries are particularly useful in methods for analyzing cancer-specific gene alterations in solid tumors, because those alterations can be detected in cell-free nucleic acids present in blood samples. Accordingly, the libraries created according to these methods include genomic regions that are recurrently mutated in a solid tumor. In some embodiments, the solid tumor is a carcinoma. In specific embodiments, the carcinoma is an adenocarcinoma, a non-small cell lung cancer, or a squamous cell carcinoma. The methods are also applicable to genomic regions that are recurrently mutated in other cancers, however. Specifically, the other cancer may be, for example, a sarcoma, a leukemia, a lymphoma, or a myeloma.

Systems

The methods for creating a library of recurrently mutated genomic regions, as disclosed herein, are typically implemented by a programmed computer system. Therefore, according to another aspect, the instant disclosure provides computer systems for creating a library of recurrently mutated genomic regions. Such systems comprise at least one processor and a non-transitory computer-readable medium storing computer-executable instructions that, when executed by the at least one processor, cause the computer system to carry out the above-described methods for creating a library.

Methods for Analyzing Genetic Alterations

The libraries created according to the above-described methods are useful in the analysis of genetic alterations, particularly in comparing tumor and genomic sequences in a patient with cancer. As shown in FIG. 2, a tissue biopsy sample from the patient may be used to discover mutations in the tumor by sequencing the genomic regions of the selector library in tumor and genomic nucleic acid samples and comparing the results. Because the selector libraries are designed to identify mutations in tumors from a large percentage of all patients, it is not necessary to optimize the library for each patient.

Accordingly, in this aspect of the invention, methods are provided for analyzing a cancer-specific genetic alteration in a subject comprising the steps of:

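obtaining a tumor nucleic acid sample and a genomic nucleic acid sample from a subject with a specific cancer;

sequencing a plurality of target regions in the tumor nucleic acid sample and in the genomic nucleic acid sample to obtain a plurality of tumor nucleic acid sequences and a plurality of genomic nucleic acid sequences; and

comparing the plurality of tumor nucleic acid sequences to the plurality of genomic nucleic acid sequences to identify a patient-specific genetic alteration in the tumor nucleic acid sample.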

In these methods, the plurality of target regions are selected from a plurality of genomic regions that are recurrently mutated in the specific cancer; the plurality of genomic regions comprises at least 10 different genomic regions; and at least one mutation within the plurality of genomic regions is present in at least 60% of all subjects with the specific cancer. More specifically, the plurality of target regions may correspond to the plurality of genomic regions found in the libraries of recurrently mutated genomic regions created using the above-described methods. In other words, in various embodiments, the number of different genomic regions in the plurality of genomic regions, the number of mutations within the plurality of genomic regions that are present in a specific percentage of all subjects with the specific cancer, the percentage of all subjects with the specific cancer with at least one mutation within the plurality of genomic regions, the specific composition of the plurality of genomic regions, the types of cancer, and the cumulative length of the plurality of genomic regions have the values disclosed above for the methods of creating a library.

In some embodiments, the plurality of target regions used in the methods for analyzing a cancer-specific genetic alteration in a subject corresponds to the library of recurrently mutated genomic regions displayed in Table 1, or a subset of those genomic regions.
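The comparison of tumor and genomic sequences can be sketched, under simplifying assumptions, as a set difference over variant calls. The representation of a variant as a (chromosome, position, reference allele, alternate allele) tuple and the function name are assumptions made for illustration, and the example variants are hypothetical.

```python
def patient_specific_alterations(tumor_variants, genomic_variants):
    """Return variants seen in the tumor sample but not in the genomic sample.

    Each variant is a (chrom, pos, ref, alt) tuple. Variants shared with the
    genomic (germline) sample are treated as inherited and are removed.
    """
    return sorted(set(tumor_variants) - set(genomic_variants))


# Hypothetical variant calls restricted to selector regions.
tumor = [
    ("chr12", 25398284, "C", "A"),
    ("chr17", 7577120, "C", "T"),
    ("chr1", 1000, "G", "A"),
]
genomic = [("chr1", 1000, "G", "A")]
somatic = patient_specific_alterations(tumor, genomic)
```

A practical pipeline would also need to account for sequencing error and coverage differences between the two samples, which this sketch deliberately ignores.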

It should be understood that the step of obtaining a tumor nucleic acid sample and a genomic nucleic acid sample from a subject with a specific cancer may occur in a single step or in separate steps. For example, it may be possible to obtain a single tissue sample from a patient, for example from a biopsy sample, that includes both tumor nucleic acids and genomic nucleic acids. It is also within the scope of this step to obtain the tumor nucleic acid sample and the genomic nucleic acid sample from the subject in separate samples, in separate tissues, or even at separate times.

The step of obtaining a tumor nucleic acid sample and a genomic nucleic acid sample from a subject with a specific cancer may also include the process of extracting a biological fluid or tissue sample from the subject with the specific cancer. These particular steps are well understood by those of ordinary skill in the medical arts, particularly by those working in the medical laboratory arts.

The step of obtaining a tumor nucleic acid sample and a genomic nucleic acid sample from a subject with a specific cancer may additionally include procedures to improve the yield or recovery of the nucleic acids in the sample. For example, the step may include laboratory procedures to separate the nucleic acids from other cellular components and contaminants that may be present in the biological fluid or tissue sample. As noted, such steps may improve the yield and/or may facilitate the sequencing reactions.

It should also be understood that the step of obtaining a tumor nucleic acid sample and a genomic nucleic acid sample from a subject with a specific cancer may be performed by a commercial laboratory that does not even have direct contact with the subject. For example, the commercial laboratory may obtain the nucleic acid samples from a hospital or other clinical facility where, for example, a biopsy or other procedure is performed to obtain tissue from a subject. The commercial laboratory may thus carry out all the steps of the instantly-disclosed methods at the request of, or under the instructions of, the facility where the subject is being treated or diagnosed.

Methods for Screening

The methods of the instant invention may also be applied to the detection of cancer in a patient, where there is no prior knowledge of the presence of a tumor in the patient. Accordingly, in this aspect of the invention are provided methods for screening a cancer-specific genetic alteration in a subject comprising the steps of:

obtaining a cell-free nucleic acid sample from a subject;

sequencing a plurality of target regions in the cell-free sample to obtain a plurality of cell-free nucleic acid sequences; and

identifying a cancer-specific genetic alteration in the cell-free sample.

In these methods, the plurality of target regions are selected from a plurality of genomic regions that are recurrently mutated in the specific cancer. In some embodiments, the plurality of genomic regions comprises at least 10 different genomic regions, and at least one mutation within the plurality of genomic regions is present in at least 60% of all subjects with the specific cancer. More specifically, the plurality of target regions may correspond to the plurality of genomic regions found in the libraries of recurrently mutated genomic regions created using the above-described methods. In other words, in various embodiments, the number of different genomic regions in the plurality of genomic regions, the number of mutations within the plurality of genomic regions that are present in a specific percentage of all subjects with the specific cancer, the percentage of all subjects with the specific cancer with at least one mutation within the plurality of genomic regions, the specific composition of the plurality of genomic regions, the types of cancer, and the cumulative length of the plurality of genomic regions have the values disclosed above for the methods of creating a library.

In some embodiments, the plurality of target regions used in the methods for screening a cancer-specific genetic alteration in a subject corresponds to the library of recurrently mutated genomic regions displayed in Table 1, or a subset of those genomic regions.

It will be readily apparent to one of ordinary skill in the relevant arts that other suitable modifications and adaptations to the methods and applications described herein may be made without departing from the scope of the invention or any embodiment thereof. Having now described the present invention in detail, the same will be more clearly understood by reference to the following Examples, which are included herewith for purposes of illustration only and are not intended to be limiting of the invention.

Examples
Noninvasive and Ultrasensitive Quantitation of Circulating Tumor DNA by Hybrid Capture and Deep Sequencing

To overcome the limitations of prior methods, an ultrasensitive and specific strategy for analysis of cancer-derived cfDNA, CAncer Personalized Profiling by Deep Sequencing (CAPP-Seq), has been developed. CAPP-Seq can simultaneously detect single nucleotide variants (SNVs), insertions/deletions (indels), and rearrangements without the need for patient-specific optimization. CAPP-Seq employs an adaptable “selector” to enrich recurrently mutated regions in the cancer of interest using a custom library of biotinylated DNA oligonucleotides (Ng et al. (2010) Nat. Genetics 42:30-35). To use CAPP-Seq for monitoring circulating tumor DNA, this selector is typically applied first to matched tumor and normal genomic DNA to identify a patient's cancer-specific genetic aberrations and then directly to cfDNA in order to quantify these mutations (FIG. 1a and FIG. 2).
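One simple way to picture the quantification step is as the pooled fraction of cfDNA reads that support the patient's previously identified mutations. The sketch below is an illustration under that assumption and is not presented as the CAPP-Seq pipeline itself; the function name and the read counts are hypothetical.

```python
def pooled_mutant_fraction(observations):
    """Pooled fraction of reads supporting known tumor mutations in cfDNA.

    observations: iterable of (mutant_reads, total_depth) pairs, one pair per
    patient-specific mutation identified in the tumor sample.
    """
    observations = list(observations)
    total_mutant = sum(m for m, _ in observations)
    total_depth = sum(d for _, d in observations)
    if total_depth == 0:
        return 0.0  # no coverage at any reporter position
    return total_mutant / total_depth


# Hypothetical counts at two reporter mutations sequenced to 10,000x depth.
fraction = pooled_mutant_fraction([(12, 10000), (8, 10000)])
```

Pooling across several mutations is one reason a selector that captures more than one mutation per patient is attractive, since more reporter positions contribute reads to the estimate.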

The design of an NSCLC CAPP-Seq selector is shown in FIG. 1(b). Phase 1: Genomic regions harboring known/suspected driver mutations in NSCLC. Phases 2-4: Addition of exons containing recurrent SNVs using WES data from lung adenocarcinomas and squamous cell carcinomas from TCGA (N=407). Regions were selected iteratively to maximize the number of mutations per tumor while minimizing selector size. Recurrence index (RI) = total unique patients with mutations covered per kb of exon. Phases 5-6: Exons of predicted NSCLC drivers (Ding et al. (2008) Nature 455:1069-1075; Youn & Simon (2011) Bioinformatics 27:175-181) and introns/exons harboring breakpoints in rearrangements involving ALK, ROS1, and RET were added. Bottom: increase of selector length during each design phase. FIG. 1(c) shows an analysis of the number of SNVs per lung adenocarcinoma covered by the NSCLC CAPP-Seq selector in the TCGA WES cohort (Training; N=229) and an independent lung adenocarcinoma WES data set (Validation; N=183) (Imielinski et al. (2012) Cell 150:1107-1120). Results are compared to selectors randomly sampled from the exome (P<1.0×10⁻⁶ for the difference between random selectors and the NSCLC CAPP-Seq selector). FIG. 1(d) shows the number of SNVs per patient identified by the NSCLC CAPP-Seq selector in WES data from three adenocarcinomas from TCGA: colon (COAD), rectal (READ), and endometrioid (UCEC) cancers. FIGS. 1(e) and (f) show quality parameters from a representative CAPP-Seq analysis of plasma cfDNA, including the length distribution of sequenced cfDNA fragments (FIG. 1(e)) and the depth of sequencing coverage across all genomic regions in the selector (FIG. 1(f)). FIG. 1(g) illustrates the variation in sequencing depth across cfDNA samples from 4 patients. The envelope above and below the solid line represents the standard error of the mean (s.e.m.). FIG. 2 illustrates the CAPP-Seq computational pipeline. See Detailed Methods section for details.

For the initial implementation of CAPP-Seq we focused on NSCLC, although our approach is generalizable to any cancer for which a comprehensive list of recurrent mutations has been identified. We employed a multi-phase approach to design an NSCLC-specific selector, aiming to identify genomic regions recurrently mutated in this disease (FIG. 1b, Table 1, and Methods). We began by including exons covering recurrent mutations in potential driver genes from the Catalogue of Somatic Mutations in Cancer (COSMIC) database (Forbes et al. (2010) Nucleic Acids Res. 38:D652-657) as well as other sources (Ding et al. (2008) Nature 455:1069-1075; Youn & Simon (2011) Bioinformatics 27:175-181) (e.g. KRAS, EGFR, TP53). Next, using whole exome sequencing (WES) data from 407 NSCLC patients profiled by The Cancer Genome Atlas (TCGA), an iterative algorithm was applied to maximize the number of mutations per patient while minimizing selector size. The approach relied on a recurrence index that identified known driver mutations as well as uncharacterized genes that are frequently mutated and are therefore likely to be involved in NSCLC pathogenesis (FIG. 3 and Table 1).
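The recurrence index, defined as the number of unique patients with mutations per kb of exon, can be worked through as a short sketch. The function name is hypothetical, and the exon length is assumed to be computed from inclusive start and end coordinates; the two inputs are a KRAS exon and a TP53 exon as listed in Table 1.

```python
def recurrence_index(n_patients, start, end):
    """Unique patients with mutations per kilobase of exon.

    Assumes inclusive coordinates, so length = end - start + 1.
    """
    length_bp = end - start + 1
    return 1000.0 * n_patients / length_bp


# KRAS exon from Table 1: 56 patients, chr12:25398207-25398318 (112 bp).
ri_kras = round(recurrence_index(56, 25398207, 25398318), 1)
# TP53 exon from Table 1: 68 patients, chr17:7578370-7578554 (185 bp).
ri_tp53 = round(recurrence_index(68, 7578370, 7578554), 1)
```

These values agree with the RI column of Table 1 for the same two exons (500.0 and 367.6), which is why short exons mutated in many patients rank highest.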

TABLE 1

Recurrently mutated genomic regions in NSCLC.

Column groups: Selector design (Design phase; Regions covered; Genes covered; Selector length (bp)); Genomic region (Gene; Chr; Start (bp); End (bp); Region length (bp)); Coverage, unique LUAD & SCC patients, n = 407 (Patients covered; Patients gained; No. patients per exon; RI).

Design phase | Regions covered | Genes covered | Selector length (bp) | Gene | Chr | Start (bp) | End (bp) | Region length (bp) | Patients covered | Patients gained | No. patients per exon | RI
Known drivers | 1 | 1 | 130 | AKT1 | chr14 | 105246424 | 105246553 | 130 | 1 | 1 | 1 | 7.7
Known drivers | 2 | 2 | 250 | BRAF | chr7 | 140453074 | 140453192 | 120 | 9 | 8 | 8 | 66.7
Known drivers | 3 | 2 | 369 | BRAF | chr7 | 140481375 | 140481493 | 119 | 16 | 7 | 7 | 58.8
Known drivers | 4 | 3 | 677 | CDKN2A | chr9 | 21970900 | 21971207 | 308 | 46 | 30 | 30 | 97.4
Known drivers | 5 | 3 | 1029 | CDKN2A | chr9 | 21974475 | 21974826 | 352 | 53 | 7 | 7 | 19.9
Known drivers | 6 | 4 | 1258 | CTNNB1 | chr3 | 41266016 | 41266244 | 229 | 57 | 4 | 6 | 26.2
Known drivers | 7 | 5 | 1382 | EGFR | chr7 | 55241613 | 55241736 | 124 | 58 | 1 | 3 | 24.2
Known drivers | 8 | 5 | 1482 | EGFR | chr7 | 55242414 | 55242513 | 100 | 65 | 7 | 8 | 80.0
Known drivers | 9 | 5 | 1669 | EGFR | chr7 | 55248985 | 55249171 | 187 | 69 | 4 | 5 | 26.7
Known drivers | 10 | 5 | 1826 | EGFR | chr7 | 55259411 | 55259567 | 157 | 81 | 12 | 14 | 89.2
Known drivers | 11 | 6 | 1926 | ERBB2 | chr17 | 37880164 | 37880263 | 100 | 81 | 0 | 0 | 0.0
Known drivers | 12 | 6 | 2113 | ERBB2 | chr17 | 37880978 | 37881164 | 187 | 85 | 4 | 4 | 21.4
Known drivers | 13 | 7 | 2293 | HRAS | chr11 | 533765 | 533944 | 180 | 87 | 2 | 3 | 16.7
Known drivers | 14 | 7 | 2405 | HRAS | chr11 | 534211 | 534322 | 112 | 90 | 3 | 3 | 26.8
Known drivers | 15 | 8 | 2583 | KEAP1 | chr19 | 10599867 | 10600044 | 178 | 93 | 3 | 3 | 16.9
Known drivers | 16 | 8 | 2790 | KEAP1 | chr19 | 10600323 | 10600529 | 207 | 108 | 15 | 15 | 72.5
Known drivers | 17 | 8 | 3477 | KEAP1 | chr19 | 10602252 | 10602938 | 687 | 128 | 20 | 25 | 36.4
Known drivers | 18 | 8 | 4117 | KEAP1 | chr19 | 10610070 | 10610709 | 640 | 141 | 13 | 18 | 28.1
Known drivers | 19 | 8 | 4285 | KEAP1 | chr19 | 10597327 | 10597494 | 168 | 143 | 2 | 2 | 11.9
Known drivers | 20 | 9 | 4465 | KRAS | chr12 | 25380167 | 25380346 | 180 | 147 | 4 | 4 | 22.2
Known drivers | 21 | 9 | 4577 | KRAS | chr12 | 25398207 | 25398318 | 112 | 191 | 44 | 56 | 500.0
Known drivers | 22 | 10 | 4789 | MEK1 | chr15 | 66727364 | 66727575 | 212 | 191 | 0 | 0 | 0.0
Known drivers | 23 | 11 | 4931 | MET | chr7 | 116411902 | 116412043 | 142 | 193 | 2 | 2 | 14.1
Known drivers | 24 | 12 | 5199 | NFE2L2 | chr2 | 178098732 | 178098998 | 268 | 212 | 19 | 31 | 115.7
Known drivers | 25 | 13 | 5417 | NOTCH1 | chr9 | 139396723 | 139396940 | 218 | 212 | 0 | 1 | 4.6
Known drivers | 26 | 13 | 5850 | NOTCH1 | chr9 | 139399124 | 139399556 | 433 | 212 | 0 | 0 | 0.0
Known drivers | 27 | 13 | 7339 | NOTCH1 | chr9 | 139390522 | 139392010 | 1489 | 214 | 2 | 3 | 2.0
Known drivers | 28 | 13 | 7489 | NOTCH1 | chr9 | 139397633 | 139397782 | 150 | 214 | 0 | 0 | 0.0
Known drivers | 29 | 14 | 7669 | NRAS | chr1 | 115256420 | 115256599 | 180 | 217 | 3 | 5 | 27.8
Known drivers | 30 | 14 | 7781 | NRAS | chr1 | 115258670 | 115258781 | 112 | 217 | 0 | 0 | 0.0
Known drivers | 31 | 15 | 7907 | PIK3CA | chr3 | 178935997 | 178936122 | 126 | 225 | 8 | 19 | 150.8
Known drivers | 32 | 15 | 8179 | PIK3CA | chr3 | 178951881 | 178952152 | 272 | 228 | 3 | 4 | 14.7
Known drivers | 33 | 16 | 8259 | PTEN | chr10 | 89624226 | 89624305 | 80 | 229 | 1 | 1 | 12.5
Known drivers | 34 | 16 | 8345 | PTEN | chr10 | 89653781 | 89653866 | 86 | 229 | 0 | 0 | 0.0
Known drivers | 35 | 16 | 8391 | PTEN | chr10 | 89685269 | 89685314 | 46 | 231 | 2 | 3 | 65.2
Known drivers | 36 | 16 | 8436 | PTEN | chr10 | 89690802 | 89690846 | 45 | 231 | 0 | 0 | 0.0
Known drivers | 37 | 16 | 8676 | PTEN | chr10 | 89692769 | 89693008 | 240 | 234 | 3 | 5 | 20.8
Known drivers | 38 | 16 | 8819 | PTEN | chr10 | 89711874 | 89712016 | 143 | 235 | 1 | 3 | 21.0
Known drivers | 39 | 16 | 8987 | PTEN | chr10 | 89717609 | 89717776 | 168 | 238 | 3 | 6 | 35.7
Known drivers | 40 | 16 | 9213 | PTEN | chr10 | 89720650 | 89720875 | 226 | 239 | 1 | 3 | 13.3
Known drivers | 41 | 17 | 9504 | STK11 | chr19 | 1206912 | 1207202 | 291 | 240 | 1 | 4 | 13.7
Known drivers | 42 | 17 | 9589 | STK11 | chr19 | 1218415 | 1218498 | 85 | 241 | 1 | 2 | 23.5
Known drivers | 43 | 17 | 9680 | STK11 | chr19 | 1219322 | 1219412 | 91 | 242 | 1 | 1 | 11.0
Known drivers | 44 | 17 | 9814 | STK11 | chr19 | 1220371 | 1220504 | 134 | 242 | 0 | 4 | 29.9
Known drivers | 45 | 17 | 9952 | STK11 | chr19 | 1220579 | 1220716 | 138 | 242 | 0 | 4 | 29.0
Known drivers | 46 | 17 | 10081 | STK11 | chr19 | 1221211 | 1221339 | 129 | 242 | 0 | 4 | 31.0
Known drivers | 47 | 17 | 10140 | STK11 | chr19 | 1221947 | 1222005 | 59 | 242 | 0 | 0 | 0.0
Known drivers | 48 | 17 | 10329 | STK11 | chr19 | 1222983 | 1223171 | 189 | 242 | 0 | 0 | 0.0
Known drivers | 49 | 17 | 10524 | STK11 | chr19 | 1226452 | 1226646 | 195 | 242 | 0 | 0 | 0.0
Known drivers | 50 | 18 | 10662 | TP53 | chr17 | 7577018 | 7577155 | 138 | 264 | 22 | 56 | 405.8
Known drivers | 51 | 18 | 10773 | TP53 | chr17 | 7577498 | 7577608 | 111 | 286 | 22 | 50 | 450.5
Known drivers | 52 | 18 | 10887 | TP53 | chr17 | 7578176 | 7578286 | 114 | 300 | 14 | 39 | 342.1
Known drivers | 53 | 18 | 11167 | TP53 | chr17 | 7579311 | 7579590 | 280 | 312 | 12 | 31 | 110.7
Known drivers | 54 | 18 | 11352 | TP53 | chr17 | 7578370 | 7578554 | 185 | 340 | 28 | 68 | 367.6
Max coverage | 55 | 19 | 11472 | REG1B | chr2 | 79313937 | 79314056 | 120 | 341 | 1 | 10 | 83.3
Max coverage | 56 | 20 | 11527 | TPTE | chr21 | 10970008 | 10970062 | 55 | 343 | 2 | 4 | 72.7
Max coverage | 57 | 21 | 11641 | CSMD3 | chr8 | 113246593 | 113246706 | 114 | 345 | 2 | 8 | 70.2
Max coverage | 58 | 21 | 11749 | TP53 | chr17 | 7573926 | 7574033 | 108 | 348 | 3 | 9 | 83.3
Max coverage | 59 | 22 | 11861 | FAM135B | chr8 | 139151228 | 139151339 | 112 | 350 | 2 | 8 | 71.4
Max coverage | 60 | 23 | 11950 | U2AF1 | chr21 | 44524424 | 44524512 | 89 | 351 | 1 | 5 | 56.2
Max coverage | 61 | 24 | 12084 | THSD7A | chr7 | 11501637 | 11501770 | 134 | 352 | 1 | 9 | 67.2
Max coverage | 62 | 25 | 12257 | MLL3 | chr7 | 151962122 | 151962294 | 173 | 353 | 1 | 11 | 63.6
Max coverage | 63 | 26 | 12339 | EYA4 | chr6 | 133849862 | 133849943 | 82 | 354 | 1 | 5 | 61.0
Max coverage | 64 | 27 | 12505 | HCN1 | chr5 | 45267190 | 45267355 | 166 | 355 | 1 | 9 | 54.2
Max coverage | 65 | 28 | 12590 | AKR1B10 | chr7 | 134222945 | 134223029 | 85 | 357 | 2 | 5 | 58.8
Max coverage | 66 | 29 | 12692 | SLC6A5 | chr11 | 20668379 | 20668480 | 102 | 358 | 1 | 5 | 49.0
Max coverage | 67 | 30 | 12801 | DPP10 | chr2 | 116525872 | 116525980 | 109 | 360 | 2 | 6 | 55.0
Max coverage | 68 | 31 | 12894 | SCN7A | chr2 | 167327124 | 167327216 | 93 | 361 | 1 | 4 | 43.0
Max coverage | 69 | 32 | 12988 | SNTG1 | chr8 | 51621445 | 51621538 | 94 | 362 | 1 | 5 | 53.2
Max coverage | 70 | 33 | 13093 | VPS13A | chr9 | 79946925 | 79947029 | 105 | 363 | 1 | 5 | 47.6
Max coverage | 71 | 34 | 13240 | IL1RAPL1 | chrX | 29938065 | 29938211 | 147 | 364 | 1 | 7 | 47.6
Max coverage | 72 | 35 | 13408 | CTNNA2 | chr2 | 80085138 | 80085305 | 168 | 365 | 1 | 8 | 47.6
Max coverage | 73 | 35 | 13598 | CSMD3 | chr8 | 113323206 | 113323395 | 190 | 366 | 1 | 9 | 47.4
Max coverage | 74 | 36 | 13705 | FAM5C | chr1 | 190203501 | 190203607 | 107 | 367 | 1 | 5 | 46.7
Max coverage | 75 | 37 | 13813 | CACNA1E | chr1 | 181708282 | 181708389 | 108 | 368 | 1 | 4 | 37.0
Max coverage | 76 | 38 | 14528 | KRTAP5-5 | chr11 | 1651070 | 1651784 | 715 | 371 | 3 | 31 | 43.4
Max coverage | 77 | 39 | 14650 | PDE1C | chr7 | 31864480 | 31864601 | 122 | 372 | 1 | 5 | 41.0
Max coverage | 78 | 40 | 14772 | RYR2 | chr1 | 237808626 | 237808747 | 122 | 373 | 1 | 5 | 41.0
Max coverage | 79 | 41 | 14896 | NRXN1 | chr2 | 50733632 | 50733755 | 124 | 374 | 1 | 5 | 40.3
Max coverage | 80 | 42 | 15021 | COL19A1 | chr6 | 70637800 | 70637924 | 125 | 375 | 1 | 5 | 40.0
Max coverage | 81 | 42 | 15349 | CSMD3 | chr8 | 113697634 | 113697961 | 328 | 376 | 1 | 13 | 39.6
Max coverage | 82 | 43 | 15551 | LRP1B | chr2 | 141665445 | 141665646 | 202 | 377 | 1 | 7 | 34.7
Max coverage | 83 | 44 | 15709 | GKN2 | chr2 | 69173435 | 69173592 | 158 | 378 | 1 | 6 | 38.0
Max coverage | 84 | 45 | 16031 | CD5L | chr1 | 157805624 | 157805945 | 322 | 379 | 1 | 12 | 37.3
Max coverage | 85 | 46 | 16250 | SPTA1 | chr1 | 158627266 | 158627484 | 219 | 380 | 1 | 8 | 36.5
Max coverage | 86 | 47 | 16392 | DHX9 | chr1 | 182812428 | 182812569 | 142 | 381 | 1 | 5 | 35.2
Max coverage | 87 | 48 | 16535 | ADAMTS20 | chr12 | 43858393 | 43858535 | 143 | 382 | 1 | 5 | 35.0
Max coverage | 88 | 49 | 16707 | NLRP4 | chr19 | 56382192 | 56382363 | 172 | 382 | 0 | 6 | 34.9
Max coverage | 89 | 50 | 17199 | CDH18 | chr5 | 19473334 | 19473825 | 492 | 384 | 2 | 17 | 34.6
Max coverage | 90 | 51 | 17344 | MYH2 | chr17 | 10450791 | 10450935 | 145 | 386 | 2 | 5 | 34.5
RI ≧ 30 | 91 | 52 | 18281 | OR5L2 | chr11 | 55594694 | 55595630 | 937 | 386 | 0 | 30 | 32.0
RI ≧ 30 | 92 | 53 | 19317 | OR4A15 | chr11 | 55135359 | 55136394 | 1036 | 386 | 0 | 32 | 30.9
RI ≧ 30 | 93 | 54 | 20245 | OR6F1 | chr1 | 247875130 | 247876057 | 928 | 386 | 0 | 26 | 28.0
RI ≧ 30 | 94 | 55 | 21176 | OR4C6 | chr11 | 55432642 | 55433572 | 931 | 387 | 1 | 27 | 29.0
RI ≧ 30 | 95 | 56 | 22224 | OR2T4 | chr1 | 248524882 | 248525929 | 1048 | 387 | 0 | 33 | 31.5
RI ≧ 30 | 96 | 56 | 23342 | FAM5C | chr1 | 190067147 | 190068264 | 1118 | 387 | 0 | 35 | 31.3
RI ≧ 30 | 97 | 57 | 23598 | PSG2 | chr19 | 43575851 | 43576106 | 256 | 387 | 0 | 9 | 35.2
RI ≧ 30 | 98 | 58 | 23797 | ITM2A | chrX | 78618438 | 78618636 | 199 | 387 | 0 | 6 | 30.2
RI ≧ 30 | 99 | 59 | 24062 | TNN | chr1 | 175092535 | 175092799 | 265 | 387 | 0 | 12 | 45.3
RI ≧ 30 | 100 | 60 | 24206 | GATA3 | chr10 | 8105958 | 8106101 | 144 | 387 | 0 | 3 | 20.8
RI ≧ 30 | 101 | 60 | 24369 | HCN1 | chr5 | 45461947 | 45462109 | 163 | 387 | 0 | 5 | 30.7
RI ≧ 30 | 102 | 61 | 24503 | OCA2 | chr15 | 28211835 | 28211968 | 134 | 387 | 0 | 6 | 44.8
RI ≧ 30 | 103 | 61 | 24686 | CTNNA2 | chr2 | 80816428 | 80816610 | 183 | 387 | 0 | 5 | 27.3
RI ≧ 30 | 104 | 62 | 24863 | CNTN5 | chr11 | 99715818 | 99715994 | 177 | 387 | 0 | 5 | 33.9
RI ≧ 30 | 105 | 63 | 25755 | POM121L12 | chr7 | 53103364 | 53104255 | 892 | 387 | 0 | 28 | 31.4
RI ≧ 30 | 106 | 64 | 25945 | LRRC7 | chr1 | 70225887 | 70226076 | 190 | 387 | 0 | 5 | 26.3
RI ≧ 30 | 107 | 65 | 26165 | CNTNAP5 | chr2 | 125530375 | 125530594 | 220 | 387 | 0 | 8 | 36.4
RI ≧ 30 | 108 | 66 | 26313 | SLC4A10 | chr2 | 162751188 | 162751335 | 148 | 387 | 0 | 5 | 33.8
RI ≧ 30 | 109 | 67 | 26412 | SETD2 | chr3 | 47142947 | 47143045 | 99 | 387 | 0 | 3 | 30.3
RI ≧ 30 | 110 | 68 | 26744 | GFRAL | chr6 | 55216050 | 55216381 | 332 | 387 | 0 | 10 | 30.1
RI ≧ 30 | 111 | 69 | 26837 | SORCS3 | chr10 | 106927015 | 106927107 | 93 | 388 | 1 | 3 | 32.3
RI ≧ 30 | 112 | 70 | 27359 | POTEG | chr14 | 19553416 | 19553937 | 522 | 388 | 0 | 17 | 32.6
RI ≧ 30 | 113 | 71 | 27489 | F9 | chrX | 138630521 | 138630650 | 130 | 389 | 1 | 4 | 30.8
RI ≧ 30 | 114 | 72 | 27583 | SLC26A3 | chr7 | 107416896 | 107416989 | 94 | 389 | 0 | 2 | 21.3
RI ≧ 30 | 115 | 73 | 27753 | UNC5D | chr8 | 35606044 | 35606213 | 170 | 389 | 0 | 5 | 29.4
RI ≧ 30 | 116 | 74 | 27860 | PDE4DIP | chr1 | 144882775 | 144882881 | 107 | 389 | 0 | 4 | 37.4
RI ≧ 30 | 117 | 75 | 27943 | MRPL1 | chr4 | 78870950 | 78871032 | 83 | 389 | 0 | 4 | 48.2
RI ≧ 30 | 118 | 76 | 28013 | COL25A1 | chr4 | 109784474 | 109784543 | 70 | 389 | 0 | 3 | 42.9
RI ≧ 30 | 119 | 76 | 28161 | SPTA1 | chr1 | 158650372 | 158650519 | 148 | 389 | 0 | 5 | 33.8
RI ≧ 30 | 120 | 77 | 28309 | TNR | chr1 | 175331798 | 175331945 | 148 | 389 | 0 | 5 | 33.8
RI ≧ 30 | 121 | 78 | 28491 | GALNT13 | chr2 | 155157921 | 155158102 | 182 | 389 | 0 | 6 | 33.0
RI ≧ 30 | 122 | 79 | 28618 | EIF3E | chr8 | 109241298 | 109241424 | 127 | 389 | 0 | 5 | 39.4
RI ≧ 30 | 123 | 80 | 28691 | SLC5A1 | chr22 | 32445929 | 32446001 | 73 | 389 | 0 | 4 | 54.8
RI ≧ 30 | 124 | 81 | 28757 | COASY | chr17 | 40717000 | 40717065 | 66 | 389 | 0 | 3 | 45.5
RI ≧ 30 | 125 | 82 | 28930 | TBX15 | chr1 | 119467268 | 119467440 | 173 | 389 | 0 | 7 | 40.5
RI ≧ 30 | 126 | 83 | 29099 | PYHIN1 | chr1 | 158908869 | 158909037 | 169 | 389 | 0 | 6 | 35.5
RI ≧ 30 | 127 | 84 | 29164 | PSG5 | chr19 | 43690493 | 43690557 | 65 | 389 | 0 | 3 | 46.2
RI ≧ 30 | 128 | 85 | 29262 | BTRC | chr10 | 103290993 | 103291090 | 98 | 389 | 0 | 2 | 20.4
RI ≧ 30 | 129 | 86 | 29394 | MDGA2 | chr14 | 47324226 | 47324357 | 132 | 389 | 0 | 4 | 30.3
RI ≧ 30 | 130 | 87 | 29454 | GUCY1A3 | chr4 | 156629387 | 156629446 | 60 | 389 | 0 | 2 | 33.3
RI ≧ 30 | 131 | 88 | 29570 | HGF | chr7 | 81386504 | 81386619 | 116 | 389 | 0 | 4 | 34.5
RI ≧ 30 | 132 | 89 | 29656 | TIMD4 | chr5 | 156346467 | 156346552 | 86 | 389 | 0 | 3 | 34.9
RI ≧ 30 | 133 | 90 | 29844 | AK5 | chr1 | 77752625 | 77752812 | 188 | 389 | 0 | 6 | 31.9
RI ≧ 30 | 134 | 91 | 30077 | ODZ3 | chr4 | 183245173 | 183245405 | 233 | 389 | 0 | 7 | 30.0
RI ≧ 30 | 135 | 92 | 30177 | COL5A2 | chr2 | 189927897 | 189927996 | 100 | 389 | 0 | 3 | 30.0
RI ≧ 30 | 136 | 93 | 30299 | NTM | chr11 | 132180005 | 132180126 | 122 | 389 | 0 | 4 | 32.8
RI ≧ 30 | 137 | 94 | 30426 | LTBP1 | chr2 | 33500031 | 33500157 | 127 | 389 | 0 | 5 | 39.4
RI ≧ 30 | 138 | 95 | 30587 | PRSS1 | chr7 | 142458405 | 142458565 | 161 | 389 | 0 | 5 | 31.1
RI ≧ 30 | 139 | 95 | 30794 | CDKN2A | chr9 | 21971001 | 21971207 | 207 | 389 | 0 | 26 | 125.6
RI ≧ 30 | 140 | 96 | 30922 | CNGB3 | chr8 | 87738758 | 87738885 | 128 | 389 | 0 | 4 | 31.3
RI ≧ 30 | 141 | 97 | 31049 | SI | chr3 | 164777689 | 164777815 | 127 | 389 | 0 | 4 | 31.5
RI ≧ 30 | 142 | 97 | 31135 | SI | chr3 | 164767578 | 164767663 | 86 | 389 | 0 | 4 | 46.5
RI ≧ 30 | 143 | 98 | 31320 | TMEM132D | chr12 | 129822176 | 129822362 | 185 | 389 | 0 | 6 | 32.4
RI ≧ 30 | 144 | 99 | 31429 | ASTN1 | chr1 | 176998769 | 176998877 | 109 | 389 | 0 | 3 | 27.5
RI ≧ 30 | 145 | 100 | 31571 | SAGE1 | chrX | 134987410 | 134987551 | 142 | 389 | 0 | 6 | 42.3
RI ≧ 30 | 146 | 100 | 31709 | THSD7A | chr7 | 11464322 | 11464459 | 138 | 389 | 0 | 5 | 36.2
RI ≧ 30 | 147 | 101 | 31907 | ADAMTS12 | chr5 | 33683963 | 33684160 | 198 | 389 | 0 | 6 | 30.3
RI ≧ 30 | 148 | 101 | 32090 | NRXN1 | chr2 | 50463926 | 50464108 | 183 | 389 | 0 | 8 | 43.7
RI ≧ 30 | 149 | 101 | 32294 | CSMD3 | chr8 | 113562899 | 113563102 | 204 | 389 | 0 | 7 | 34.3
RI ≧ 30 | 150 | 101 | 32414 | CSMD3 | chr8 | 113364644 | 113364763 | 120 | 389 | 0 | 5 | 41.7
RI ≧ 30 | 151 | 102 | 32504 | EPB41L4B | chr9 | 112018415 | 112018504 | 90 | 389 | 0 | 2 | 22.2
RI ≧ 30 | 152 | 103 | 32667 | POLR3B | chr12 | 106820974 | 106821136 | 163 | 389 | 0 | 4 | 24.5
RI ≧ 30 | 153 | 104 | 32873 | ATP10B | chr5 | 160097469 | 160097674 | 206 | 389 | 0 | 7 | 34.0
RI ≧ 30 | 154 | 105 | 33001 | CSMD1 | chr8 | 3165216 | 3165343 | 128 | 389 | 0 | 4 | 31.3
RI ≧ 30 | 155 | 106 | 33164 | FBN2 | chr5 | 127648325 | 127648487 | 163 | 389 | 0 | 5 | 30.7
RI ≧ 30 | 156 | 107 | 33252 | EXOC5 | chr14 | 57684699 | 57684786 | 88 | 389 | 0 | 2 | 22.7
RI ≧ 30 | 157 | 108 | 33315 | ANKRD30A | chr10 | 37440987 | 37441049 | 63 | 389 | 0 | 3 | 47.6
RI ≧ 30 | 158 | 109 | 33414 | TRIML1 | chr4 | 189065189 | 189065287 | 99 | 389 | 0 | 4 | 40.4
RI ≧ 30 | 159 | 109 | 33538 | SPTA1 | chr1 | 158631076 | 158631199 | 124 | 389 | 0 | 4 | 32.3
RI ≧ 30 | 160 | 110 | 33699 | POLDIP2 | chr17 | 26684313 | 26684473 | 161 | 389 | 0 | 5 | 31.1
RI ≧ 30 | 161 | 111 | 33863 | KLHL1 | chr13 | 70314525 | 70314688 | 164 | 389 | 0 | 5 | 30.5
RI ≧ 20 | 162 | 112 | 34454 | TRIM58 | chr1 | 248039201 | 248039791 | 591 | 389 | 0 | 14 | 23.7
RI ≧ 20 | 163 | 113 | 34563 | GRIA3 | chrX | 122537262 | 122537370 | 109 | 389 | 0 | 3 | 27.5
RI ≧ 20 | 164 | 114 | 34777 | CNOT4 | chr7 | 135048605 | 135048818 | 214 | 389 | 0 | 5 | 23.4
RI ≧ 20 | 165 | 115 | 34947 | NAV3 | chr12 | 78582388 | 78582557 | 170 | 389 | 0 | 4 | 23.5
RI ≧ 20 | 166 | 115 | 35975 | NAV3 | chr12 | 78400198 | 78401225 | 1028 | 389 | 0 | 22 | 21.4
RI ≧ 20 | 167 | 116 | 36354 | TRPC5 | chrX | 111195270 | 111195648 | 379 | 389 | 0 | 8 | 21.1
RI ≧ 20 | 168 | 117 | 36480 | LRRC2 | chr3 | 46592956 | 46593081 | 126 | 389 | 0 | 3 | 23.8
RI ≧ 20 | 169 | 118 | 36726 | ADAMTS16 | chr5 | 5239793 | 5240038 | 246 | 389 | 0 | 6 | 24.4
RI ≧ 20 | 170 | 119 | 36869 | ACER2 | chr9 | 19424697 | 19424839 | 143 | 389 | 0 | 3 | 21.0
RI ≧ 20 | 171 | 120 | 37103 | AMOT | chrX | 112024113 | 112024346 | 234 | 389 | 0 | 5 | 21.4
RI ≧ 20 | 172 | 121 | 37215 | OBP2A | chr9 | 138439716 | 138439827 | 112 | 389 | 0 | 3 | 26.8
Predicted drivers | 173 | 122 | 38109 | INHBA | chr7 | 41729247 | 41730140 | 894 | 389 | 0 | 17 | 19.0
Predicted drivers | 174 | 122 | 38498 | INHBA | chr7 | 41739584 | 41739972 | 389 | 389 | 0 | 3 | 7.7
Predicted drivers | 175 | 123 | 38605 | EPHA5 | chr4 | 66189831 | 66189937 | 107 | 389 | 0 | 3 | 28.0
Predicted drivers | 176 | 123 | 38762 | EPHA5 | chr4 | 66197690 | 66197846 | 157 | 389 | 0 | 2 | 12.7
Predicted drivers | 177 | 123 | 38957 | EPHA5 | chr4 | 66201649 | 66201843 | 195 | 389 | 0 | 2 | 10.3
Predicted drivers | 178 | 123 | 39108 | EPHA5 | chr4 | 66213771 | 66213921 | 151 | 389 | 0 | 3 | 19.9
Predicted drivers | 179 | 123 | 39319 | EPHA5 | chr4 | 66217106 | 66217316 | 211 | 389 | 0 | 4 | 19.0
Predicted drivers | 180 | 123 | 39420 | EPHA5 | chr4 | 66218740 | 66218840 | 101 | 389 | 0 | 2 | 19.8
Predicted drivers | 181 | 123 | 39607 | EPHA5 | chr4 | 66230734 | 66230920 | 187 | 389 | 0 | 3 | 16.0
Predicted drivers | 182 | 123 | 39734 | EPHA5 | chr4 | 66231649 | 66231775 | 127 | 389 | 0 | 3 | 23.6
Predicted drivers | 183 | 123 | 39835 | EPHA5 | chr4 | 66233058 | 66233158 | 101 | 389 | 0 | 2 | 19.8
Predicted drivers | 184 | 123 | 39936 | EPHA5 | chr4 | 66242698 | 66242798 | 101 | 389 | 0 | 0 | 0.0
Predicted drivers | 185 | 123 | 40040 | EPHA5 | chr4 | 66270091 | 66270194 | 104 | 389 | 0 | 2 | 19.2
Predicted drivers | 186 | 123 | 40201 | EPHA5 | chr4 | 66280001 | 66280161 | 161 | 389 | 0 | 1 | 6.2
Predicted drivers | 187 | 123 | 40327 | EPHA5 | chr4 | 66286158 | 66286283 | 126 | 389 | 0 | 0 | 0.0
Predicted drivers | 188 | 123 | 40664 | EPHA5 | chr4 | 66356094 | 66356430 | 337 | 389 | 0 | 5 | 14.8
Predicted drivers | 189 | 123 | 40821 | EPHA5 | chr4 | 66361105 | 66361261 | 157 | 389 | 0 | 1 | 6.4

Predicted drivers
190
123
41486
EPHA5
chr4
66467358
66468022
665
389
0
6
9.0

Predicted drivers
191
123
41588
EPHA5
chr4
66509062
66509163
102
389
0
0
0.0

Predicted drivers
192
123
41770
EPHA5
chr4
66535279
66535460
182
389
0
1
5.5

Predicted drivers
193
124
41871
EPHA3
chr3
89156892
89156992
101
389
0
0
0.0

Predicted drivers
194
124
41973
EPHA3
chr3
89176340
89176441
102
389
0
2
19.6

Predicted drivers
195
124
42635
EPHA3
chr3
89259009
89259670
662
389
0
6
9.1

Predicted drivers
196
124
42792
EPHA3
chr3
89390065
89390221
157
389
0
4
25.5

Predicted drivers
197
124
43129
EPHA3
chr3
89390904
89391240
337
389
0
3
8.9

Predicted drivers
198
124
43255
EPHA3
chr3
89444986
89445111
126
389
0
2
15.9

Predicted drivers
199
124
43445
EPHA3
chr3
89448467
89448656
190
389
0
1
5.3

Predicted drivers
200
124
43549
EPHA3
chr3
89456418
89456521
104
389
0
0
0.0

Predicted drivers
201
124
43651
EPHA3
chr3
89457198
89457299
102
389
0
0
0.0

Predicted drivers
202
124
43778
EPHA3
chr3
89462290
89462416
127
389
0
3
23.6

Predicted drivers
203
124
43965
EPHA3
chr3
89468354
89468540
187
389
0
1
5.3

Predicted drivers
204
124
44066
EPHA3
chr3
89478236
89478336
101
389
0
0
0.0

Predicted drivers
205
124
44277
EPHA3
chr3
89480299
89480509
211
389
0
4
19.0

Predicted drivers
206
124
44428
EPHA3
chr3
89498374
89498524
151
389
0
1
6.6

Predicted drivers
207
124
44623
EPHA3
chr3
89499326
89499520
195
389
0
2
10.3

Predicted drivers
208
124
44780
EPHA3
chr3
89521613
89521769
157
389
0
3
19.1

Predicted drivers
209
124
44887
EPHA3
chr3
89528546
89528652
107
389
0
1
9.3

Predicted drivers
210
125
44989
PTPRD
chr9
8317857
8317958
102
389
0
2
19.6

Predicted drivers
211
125
45126
PTPRD
chr9
8319830
8319966
137
389
0
0
0.0

Predicted drivers
212
125
45282
PTPRD
chr9
8331581
8331736
156
389
0
1
6.4

Predicted drivers
213
125
45409
PTPRD
chr9
8338921
8339047
127
389
0
2
15.7

Predicted drivers
214
125
45537
PTPRD
chr9
8340342
8340469
128
389
0
1
7.8

Predicted drivers
215
125
45717
PTPRD
chr9
8341089
8341268
180
389
0
0
0.0

Predicted drivers
216
125
46004
PTPRD
chr9
8341692
8341978
287
389
0
2
7.0

Predicted drivers
217
125
46160
PTPRD
chr9
8375935
8376090
156
389
0
1
6.4

Predicted drivers
218
125
46281
PTPRD
chr9
8376606
8376726
121
389
0
1
8.3

Predicted drivers
219
125
46458
PTPRD
chr9
8389231
8389407
177
389
0
0
0.0

Predicted drivers
220
125
46583
PTPRD
chr9
8404536
8404660
125
389
0
0
0.0

Predicted drivers
221
125
46684
PTPRD
chr9
8436590
8436690
101
389
0
1
9.9

Predicted drivers
222
125
46785
PTPRD
chr9
8437168
8437268
101
389
0
0
0.0

Predicted drivers
223
125
46899
PTPRD
chr9
8449724
8449837
114
389
0
3
26.3

Predicted drivers
224
125
47001
PTPRD
chr9
8454536
8454637
102
389
0
0
0.0

Predicted drivers
225
125
47163
PTPRD
chr9
8460410
8460571
162
389
0
3
18.5

Predicted drivers
226
125
47374
PTPRD
chr9
8465465
8465675
211
389
0
6
28.4

Predicted drivers
227
125
47476
PTPRD
chr9
8470989
8471090
102
389
0
1
9.8

Predicted drivers
228
125
47737
PTPRD
chr9
8484118
8484378
261
389
0
5
19.2

Predicted drivers
229
125
47839
PTPRD
chr9
8485226
8485327
102
389
0
0
0.0

Predicted drivers
230
125
48428
PTPRD
chr9
8485761
8486349
589
389
0
4
6.8

Predicted drivers
231
125
48547
PTPRD
chr9
8492861
8492979
119
389
0
1
8.4

Predicted drivers
232
125
48649
PTPRD
chr9
8497204
8497305
102
389
0
1
9.8

Predicted drivers
233
125
48844
PTPRD
chr9
8499646
8499840
195
389
0
2
10.3

Predicted drivers
234
125
49151
PTPRD
chr9
8500753
8501059
307
389
0
3
9.8

Predicted drivers
235
125
49297
PTPRD
chr9
8504260
8504405
146
389
0
1
6.8

Predicted drivers
236
125
49432
PTPRD
chr9
8507300
8507434
135
389
0
1
7.4

Predicted drivers
237
125
50015
PTPRD
chr9
8517847
8518429
583
389
0
9
15.4

Predicted drivers
238
125
50286
PTPRD
chr9
8521276
8521546
271
389
0
5
18.5

Predicted drivers
239
125
50387
PTPRD
chr9
8523468
8523568
101
389
0
1
9.9

Predicted drivers
240
125
50499
PTPRD
chr9
8524924
8525035
112
389
0
1
8.9

Predicted drivers
241
125
50600
PTPRD
chr9
8526585
8526685
101
389
0
0
0.0

Predicted drivers
242
125
50702
PTPRD
chr9
8527298
8527399
102
389
0
2
19.6

Predicted drivers
243
125
50892
PTPRD
chr9
8528590
8528779
190
389
0
4
21.1

Predicted drivers
244
125
51035
PTPRD
chr9
8633316
8633458
143
389
0
2
13.6

Predicted drivers
245
125
51182
PTPRD
chr9
8636698
8636844
147
389
0
2
13.6

Predicted drivers
246
125
51283
PTPRD
chr9
8733761
8733861
101
389
0
0
0.0

Predicted drivers
247
126
51507
KDR
chr4
55946107
55946330
224
389
0
1
4.5

Predicted drivers
248
126
51608
KDR
chr4
55948115
55948215
101
389
0
0
0.0

Predicted drivers
249
126
51709
KDR
chr4
55948702
55948802
101
389
0
2
19.8

Predicted drivers
250
126
51862
KDR
chr4
55953773
55953925
153
389
0
3
19.6

Predicted drivers
251
126
51969
KDR
chr4
55955034
55955140
107
389
0
2
18.7

Predicted drivers
252
126
52070
KDR
chr4
55955540
55955640
101
389
0
0
0.0

Predicted drivers
253
126
52183
KDR
chr4
55955857
55955969
113
389
0
1
8.8

Predicted drivers
254
126
52307
KDR
chr4
55956122
55956245
124
389
0
0
0.0

Predicted drivers
255
126
52408
KDR
chr4
55958782
55958882
101
389
0
2
19.8

Predicted drivers
256
126
52563
KDR
chr4
55960968
55961122
155
389
0
2
12.9

Predicted drivers
257
126
52665
KDR
chr4
55961737
55961838
102
389
0
2
19.6

Predicted drivers
258
126
52780
KDR
chr4
55962395
55962509
115
389
0
1
8.7

Predicted drivers
259
126
52886
KDR
chr4
55963828
55963933
106
389
0
3
28.3

Predicted drivers
260
126
53023
KDR
chr4
55964303
55964439
137
389
0
0
0.0

Predicted drivers
261
126
53131
KDR
chr4
55964863
55964970
108
389
0
2
18.5

Predicted drivers
262
126
53264
KDR
chr4
55968063
55968195
133
389
0
1
7.5

Predicted drivers
263
126
53412
KDR
chr4
55968528
55968675
148
389
0
2
13.5

Predicted drivers
264
126
53755
KDR
chr4
55970809
55971151
343
389
0
5
14.6

Predicted drivers
265
126
53865
KDR
chr4
55971998
55972107
110
389
0
2
18.2

Predicted drivers
266
126
53990
KDR
chr4
55972853
55972977
125
389
0
1
8.0

Predicted drivers
267
126
54148
KDR
chr4
55973903
55974060
158
389
0
2
12.7

Predicted drivers
268
126
54313
KDR
chr4
55976569
55976733
165
389
0
2
12.1

Predicted drivers
269
126
54429
KDR
chr4
55976820
55976935
116
389
0
1
8.6

Predicted drivers
270
126
54608
KDR
chr4
55979470
55979648
179
389
0
2
11.2

Predicted drivers
271
126
54749
KDR
chr4
55980292
55980432
141
389
0
0
0.0

Predicted drivers
272
126
54919
KDR
chr4
55981040
55981209
170
389
0
1
5.9

Predicted drivers
273
126
55051
KDR
chr4
55981447
55981578
132
389
0
4
30.3

Predicted drivers
274
126
55249
KDR
chr4
55984770
55984967
198
389
0
0
0.0

Predicted drivers
275
126
55350
KDR
chr4
55987260
55987360
101
389
0
1
9.9

Predicted drivers
276
126
55452
KDR
chr4
55991376
55991477
102
389
0
0
0.0

Predicted drivers
277
127
55639
NTRK3
chr15
88420165
88420351
187
389
0
0
0.0

Predicted drivers
278
127
55799
NTRK3
chr15
88423500
88423659
160
389
0
1
6.3

Predicted drivers
279
127
55900
NTRK3
chr15
88428895
88428995
101
389
0
0
0.0

Predicted drivers
280
127
56145
NTRK3
chr15
88472421
88472665
245
389
0
1
4.1

Predicted drivers
281
127
56319
NTRK3
chr15
88476242
88476415
174
389
0
4
23.0

Predicted drivers
282
127
56451
NTRK3
chr15
88483853
88483984
132
389
0
1
7.6

Predicted drivers
283
127
56571
NTRK3
chr15
88522575
88522694
120
389
0
0
0.0

Predicted drivers
284
127
56707
NTRK3
chr15
88524456
88524591
136
389
0
0
0.0

Predicted drivers
285
127
56897
NTRK3
chr15
88576087
88576276
190
389
0
2
10.5

Predicted drivers
286
127
57001
NTRK3
chr15
88669501
88669604
104
389
0
3
28.8

Predicted drivers
287
127
57103
NTRK3
chr15
88670374
88670475
102
389
0
0
0.0

Predicted drivers
288
127
57204
NTRK3
chr15
88671903
88672003
101
389
0
0
0.0

Predicted drivers
289
127
57502
NTRK3
chr15
88678331
88678628
298
389
0
7
23.5

Predicted drivers
290
127
57645
NTRK3
chr15
88679129
88679271
143
389
0
1
7.0

Predicted drivers
291
127
57789
NTRK3
chr15
88679697
88679840
144
389
0
2
13.9

Predicted drivers
292
127
57948
NTRK3
chr15
88680634
88680792
159
389
0
0
0.0

Predicted drivers
293
127
58050
NTRK3
chr15
88690549
88690650
102
389
0
0
0.0

Predicted drivers
294
127
58151
NTRK3
chr15
88726634
88726734
101
389
0
1
9.9

Predicted drivers
295
127
58253
NTRK3
chr15
88727442
88727543
102
389
0
1
9.8

Predicted drivers
296
128
58391
RB1
chr13
48878048
48878185
138
389
0
0
0.0

Predicted drivers
297
128
58519
RB1
chr13
48881415
48881542
128
389
0
3
23.4

Predicted drivers
298
128
58636
RB1
chr13
48916734
48916850
117
389
0
1
8.5

Predicted drivers
299
128
58757
RB1
chr13
48919215
48919335
121
389
0
1
8.3

Predicted drivers
300
128
58859
RB1
chr13
48921929
48922030
102
389
0
0
0.0

Predicted drivers
301
128
58960
RB1
chr13
48923075
48923175
101
389
0
0
0.0

Predicted drivers
302
128
59072
RB1
chr13
48934152
48934263
112
389
0
2
17.9

Predicted drivers
303
128
59216
RB1
chr13
48936950
48937093
144
389
0
0
0.0

Predicted drivers
304
128
59317
RB1
chr13
48939018
48939118
101
389
0
0
0.0

Predicted drivers
305
128
59428
RB1
chr13
48941629
48941739
111
389
0
3
27.0

Predicted drivers
306
128
59529
RB1
chr13
48942651
48942751
101
389
0
0
0.0

Predicted drivers
307
128
59630
RB1
chr13
48947534
48947634
101
389
0
2
19.8

Predicted drivers
308
128
59748
RB1
chr13
48951053
48951170
118
389
0
0
0.0

Predicted drivers
309
128
59850
RB1
chr13
48953707
48953808
102
389
0
2
19.6

Predicted drivers
310
128
59951
RB1
chr13
48954154
48954254
101
389
0
0
0.0

Predicted drivers
311
128
60053
RB1
chr13
48954288
48954389
102
389
0
1
9.8

Predicted drivers
312
128
60251
RB1
chr13
48955382
48955579
198
389
0
0
0.0

Predicted drivers
313
128
60371
RB1
chr13
49027128
49027247
120
389
0
0
0.0

Predicted drivers
314
128
60518
RB1
chr13
49030339
49030485
147
389
0
3
20.4

Predicted drivers
315
128
60665
RB1
chr13
49033823
49033969
147
389
0
1
6.8

Predicted drivers
316
128
60771
RB1
chr13
49037866
49037971
106
389
0
0
0.0

Predicted drivers
317
128
60886
RB1
chr13
49039133
49039247
115
389
0
1
8.7

Predicted drivers
318
128
61051
RB1
chr13
49039340
49039504
165
389
0
2
12.1

Predicted drivers
319
128
61153
RB1
chr13
49047460
49047561
102
389
0
0
0.0

Predicted drivers
320
128
61297
RB1
chr13
49050836
49050979
144
389
0
0
0.0

Predicted drivers
321
128
61398
RB1
chr13
49051465
49051565
101
389
0
0
0.0

Predicted drivers
322
128
61499
RB1
chr13
49054120
49054220
101
389
0
0
0.0

Predicted drivers
323
129
61946
ERBB4
chr2
212248339
212248785
447
389
0
3
6.7

Predicted drivers
324
129
62245
ERBB4
chr2
212251577
212251875
299
389
0
3
10.0

Predicted drivers
325
129
62346
ERBB4
chr2
212252643
212252743
101
389
0
0
0.0

Predicted drivers
326
129
62518
ERBB4
chr2
212285165
212285336
172
389
0
2
11.6

Predicted drivers
327
129
62619
ERBB4
chr2
212286730
212286830
101
389
0
1
9.9

Predicted drivers
328
129
62767
ERBB4
chr2
212288879
212289026
148
389
0
1
6.8

Predicted drivers
329
129
62868
ERBB4
chr2
212293120
212293220
101
389
0
0
0.0

Predicted drivers
330
129
63025
ERBB4
chr2
212295669
212295825
157
389
0
2
12.7

Predicted drivers
331
129
63212
ERBB4
chr2
212426627
212426813
187
389
0
1
5.3

Predicted drivers
332
129
63312
ERBB4
chr2
212483901
212484000
100
389
0
0
0.0

Predicted drivers
333
129
63436
ERBB4
chr2
212488646
212488769
124
389
0
0
0.0

Predicted drivers
334
129
63570
ERBB4
chr2
212495186
212495319
134
389
0
0
0.0

Predicted drivers
335
129
63672
ERBB4
chr2
212522465
212522566
102
389
0
2
19.6

Predicted drivers
336
129
63828
ERBB4
chr2
212530047
212530202
156
389
0
1
6.4

Predicted drivers
337
129
63929
ERBB4
chr2
212537885
212537985
101
389
0
1
9.9

Predicted drivers
338
129
64063
ERBB4
chr2
212543776
212543909
134
389
0
1
7.5

Predicted drivers
339
129
64264
ERBB4
chr2
212566691
212566891
201
389
0
2
10.0

Predicted drivers
340
129
64366
ERBB4
chr2
212568823
212568924
102
389
0
0
0.0

Predicted drivers
341
129
64467
ERBB4
chr2
212570029
212570129
101
389
0
1
9.9

Predicted drivers
342
129
64595
ERBB4
chr2
212576774
212576901
128
389
0
1
7.8

Predicted drivers
343
129
64710
ERBB4
chr2
212578259
212578373
115
389
0
1
8.7

Predicted drivers
344
129
64853
ERBB4
chr2
212587117
212587259
143
389
0
0
0.0

Predicted drivers
345
129
64973
ERBB4
chr2
212589800
212589919
120
389
0
2
16.7

Predicted drivers
346
129
65074
ERBB4
chr2
212615346
212615446
101
389
0
0
0.0

Predicted drivers
347
129
65210
ERBB4
chr2
212652749
212652884
136
389
0
1
7.4

Predicted drivers
348
129
65398
ERBB4
chr2
212812154
212812341
188
390
1
4
21.3

Predicted drivers
349
129
65551
ERBB4
chr2
212989476
212989628
153
390
0
2
13.1

Predicted drivers
350
129
65652
ERBB4
chr2
213403163
213403263
101
390
0
0
0.0

Predicted drivers
351
130
65754
NTRK1
chr1
156785575
156785676
102
390
0
0
0.0

Predicted drivers
352
130
65868
NTRK1
chr1
156811872
156811985
114
390
0
0
0.0

Predicted drivers
353
130
66081
NTRK1
chr1
156830726
156830938
213
390
0
0
0.0

Predicted drivers
354
130
66183
NTRK1
chr1
156834132
156834233
102
390
0
1
9.8

Predicted drivers
355
130
66284
NTRK1
chr1
156834505
156834605
101
390
0
0
0.0

Predicted drivers
356
130
66386
NTRK1
chr1
156836685
156836786
102
390
0
0
0.0

Predicted drivers
357
130
66533
NTRK1
chr1
156837895
156838041
147
390
0
1
6.8

Predicted drivers
358
130
66677
NTRK1
chr1
156838296
156838439
144
390
0
0
0.0

Predicted drivers
359
130
66811
NTRK1
chr1
156841414
156841547
134
390
0
0
0.0

Predicted drivers
360
130
67139
NTRK1
chr1
156843424
156843751
328
390
0
1
3.0

Predicted drivers
361
130
67240
NTRK1
chr1
156844133
156844233
101
390
0
0
0.0

Predicted drivers
362
130
67341
NTRK1
chr1
156844340
156844440
101
390
0
0
0.0

Predicted drivers
363
130
67445
NTRK1
chr1
156844697
156844800
104
390
0
0
0.0

Predicted drivers
364
130
67593
NTRK1
chr1
156845311
156845458
148
390
0
2
13.5

Predicted drivers
365
130
67725
NTRK1
chr1
156845871
156846002
132
390
0
3
22.7

Predicted drivers
366
130
67899
NTRK1
chr1
156846191
156846364
174
390
0
2
11.5

Predicted drivers
367
130
68141
NTRK1
chr1
156848913
156849154
242
390
0
4
16.5

Predicted drivers
368
130
68301
NTRK1
chr1
156849790
156849949
160
390
0
0
0.0

Predicted drivers
369
130
68488
NTRK1
chr1
156851248
156851434
187
390
0
0
0.0

Predicted drivers
370
131
68589
NF1
chr17
29422307
29422407
101
390
0
0
0.0

Predicted drivers
371
131
68734
NF1
chr17
29483000
29483144
145
390
0
0
0.0

Predicted drivers
372
131
68835
NF1
chr17
29486019
29486119
101
390
0
1
9.9

Predicted drivers
373
131
69027
NF1
chr17
29490203
29490394
192
390
0
1
5.2

Predicted drivers
374
131
69135
NF1
chr17
29496908
29497015
108
390
0
1
9.3

Predicted drivers
375
131
69236
NF1
chr17
29508423
29508523
101
390
0
0
0.0

Predicted drivers
376
131
69337
NF1
chr17
29508715
29508815
101
390
0
0
0.0

Predicted drivers
377
131
69496
NF1
chr17
29509525
29509683
159
390
0
1
6.3

Predicted drivers
378
131
69671
NF1
chr17
29527439
29527613
175
390
0
3
17.1

Predicted drivers
379
131
69795
NF1
chr17
29528054
29528177
124
390
0
0
0.0

Predicted drivers
380
131
69897
NF1
chr17
29528415
29528516
102
390
0
0
0.0

Predicted drivers
381
131
70030
NF1
chr17
29533257
29533389
133
390
0
0
0.0

Predicted drivers
382
131
70166
NF1
chr17
29541468
29541603
136
390
0
1
7.4

Predicted drivers
383
131
70281
NF1
chr17
29546022
29546136
115
390
0
1
8.7

Predicted drivers
384
131
70423
NF1
chr17
29548867
29549008
142
390
0
1
7.0

Predicted drivers
385
131
70548
NF1
chr17
29550461
29550585
125
390
0
0
0.0

Predicted drivers
386
131
70705
NF1
chr17
29552112
29552268
157
390
0
0
0.0

Predicted drivers
387
131
70956
NF1
chr17
29553452
29553702
251
390
0
1
4.0

Predicted drivers
388
131
71057
NF1
chr17
29554222
29554322
101
390
0
0
0.0

Predicted drivers
389
131
71158
NF1
chr17
29554532
29554632
101
390
0
1
9.9

Predicted drivers
390
131
71600
NF1
chr17
29556042
29556483
442
390
0
2
4.5

Predicted drivers
391
131
71741
NF1
chr17
29556852
29556992
141
390
0
1
7.1

Predicted drivers
392
131
71865
NF1
chr17
29557277
29557400
124
390
0
1
8.1

Predicted drivers
393
131
71966
NF1
chr17
29557851
29557951
101
390
0
0
0.0

Predicted drivers
394
131
72084
NF1
chr17
29559090
29559207
118
390
0
0
0.0

Predicted drivers
395
131
72267
NF1
chr17
29559717
29559899
183
390
0
2
10.9

Predicted drivers
396
131
72480
NF1
chr17
29560019
29560231
213
390
0
1
4.7

Predicted drivers
397
131
72643
NF1
chr17
29562628
29562790
163
390
0
2
12.3

Predicted drivers
398
131
72748
NF1
chr17
29562935
29563039
105
390
0
0
0.0

Predicted drivers
399
131
72885
NF1
chr17
29576001
29576137
137
390
0
0
0.0

Predicted drivers
400
131
72987
NF1
chr17
29579936
29580037
102
390
0
0
0.0

Predicted drivers
401
131
73147
NF1
chr17
29585361
29585520
160
390
0
0
0.0

Predicted drivers
402
131
73248
NF1
chr17
29586048
29586148
101
390
0
1
9.9

Predicted drivers
403
131
73396
NF1
chr17
29587386
29587533
148
390
0
2
13.5

Predicted drivers
404
131
73544
NF1
chr17
29588728
29588875
148
390
0
0
0.0

Predicted drivers
405
131
73656
NF1
chr17
29592246
29592357
112
390
0
0
0.0

Predicted drivers
406
131
74090
NF1
chr17
29652837
29653270
434
390
0
2
4.6

Predicted drivers
407
131
74432
NF1
chr17
29654516
29654857
342
390
0
3
8.8

Predicted drivers
408
131
74636
NF1
chr17
29657313
29657516
204
390
0
2
9.8

Predicted drivers
409
131
74831
NF1
chr17
29661855
29662049
195
390
0
3
15.4

Predicted drivers
410
131
74973
NF1
chr17
29663350
29663491
142
390
0
2
14.1

Predicted drivers
411
131
75254
NF1
chr17
29663652
29663932
281
390
0
0
0.0

Predicted drivers
412
131
75470
NF1
chr17
29664385
29664600
216
390
0
1
4.6

Predicted drivers
413
131
75571
NF1
chr17
29664817
29664917
101
390
0
1
9.9

Predicted drivers
414
131
75687
NF1
chr17
29665042
29665157
116
390
0
0
0.0

Predicted drivers
415
131
75790
NF1
chr17
29665721
29665823
103
390
0
2
19.4

Predicted drivers
416
131
75932
NF1
chr17
29667522
29667663
142
390
0
1
7.0

Predicted drivers
417
131
76060
NF1
chr17
29670026
29670153
128
390
0
2
15.6

Predicted drivers
418
131
76193
NF1
chr17
29676137
29676269
133
390
0
2
15.0

Predicted drivers
419
131
76330
NF1
chr17
29677200
29677336
137
390
0
0
0.0

Predicted drivers
420
131
76489
NF1
chr17
29679274
29679432
159
390
0
2
12.6

Predicted drivers
421
131
76613
NF1
chr17
29683477
29683600
124
390
0
0
0.0

Predicted drivers
422
131
76745
NF1
chr17
29683977
29684108
132
390
0
1
7.6

Predicted drivers
423
131
76847
NF1
chr17
29684286
29684387
102
390
0
1
9.8

Predicted drivers
424
131
76991
NF1
chr17
29685497
29685640
144
390
0
1
6.9

Predicted drivers
425
131
77093
NF1
chr17
29685959
29686060
102
390
0
0
0.0

Predicted drivers
426
131
77311
NF1
chr17
29687504
29687721
218
390
0
0
0.0

Predicted drivers
427
131
77455
NF1
chr17
29701030
29701173
144
390
0
1
6.9

Predicted drivers
428
132
77621
APC
chr5
112043414
112043579
166
390
0
0
0.0

Predicted drivers
429
132
77757
APC
chr5
112090587
112090722
136
390
0
0
0.0

Predicted drivers
430
132
77859
APC
chr5
112102014
112102115
102
390
0
1
9.8

Predicted drivers
431
132
78062
APC
chr5
112102885
112103087
203
390
0
2
9.9

Predicted drivers
432
132
78172
APC
chr5
112111325
112111434
110
390
0
1
9.1

Predicted drivers
433
132
78287
APC
chr5
112116486
112116600
115
390
0
0
0.0

Predicted drivers
434
132
78388
APC
chr5
112128134
112128234
101
390
0
0
0.0

Predicted drivers
435
132
78494
APC
chr5
112136975
112137080
106
390
0
0
0.0

Predicted drivers
436
132
78594
APC
chr5
112151191
112151290
100
390
0
0
0.0

Predicted drivers
437
132
78974
APC
chr5
112154662
112155041
380
390
0
1
2.6

Predicted drivers
438
132
79075
APC
chr5
112157590
112157690
101
390
0
0
0.0

Predicted drivers
439
132
79216
APC
chr5
112162804
112162944
141
390
0
0
0.0

Predicted drivers
440
132
79317
APC
chr5
112163614
112163714
101
390
0
0
0.0

Predicted drivers
441
132
79435
APC
chr5
112164552
112164669
118
390
0
2
16.9

Predicted drivers
442
132
79651
APC
chr5
112170647
112170862
216
390
0
0
0.0

Predicted drivers
443
132
86226
APC
chr5
112173249
112179823
6575
391
1
23
3.5

Predicted drivers
444
133
86327
ATM
chr11
108098337
108098437
101
391
0
0
0.0

Predicted drivers
445
133
86441
ATM
chr11
108098502
108098615
114
391
0
1
8.8

Predicted drivers
446
133
86588
ATM
chr11
108099904
108100050
147
391
0
0
0.0

Predicted drivers
447
133
86754
ATM
chr11
108106396
108106561
166
391
0
0
0.0

Predicted drivers
448
133
86921
ATM
chr11
108114679
108114845
167
391
0
0
0.0

Predicted drivers
449
133
87161
ATM
chr11
108115514
108115753
240
391
0
1
4.2

Predicted drivers
450
133
87326
ATM
chr11
108117690
108117854
165
391
0
0
0.0

Predicted drivers
451
133
87497
ATM
chr11
108119659
108119829
171
391
0
1
5.8

Predicted drivers
452
133
87870
ATM
chr11
108121427
108121799
373
391
0
0
0.0

Predicted drivers
453
133
88066
ATM
chr11
108122563
108122758
196
391
0
0
0.0

Predicted drivers
454
133
88167
ATM
chr11
108123541
108123641
101
391
0
1
9.9

Predicted drivers
455
133
88394
ATM
chr11
108124540
108124766
227
391
0
0
0.0

Predicted drivers
456
133
88521
ATM
chr11
108126941
108127067
127
391
0
1
7.9

Predicted drivers
457
133
88648
ATM
chr11
108128207
108128333
127
391
0
0
0.0

Predicted drivers
458
133
88749
ATM
chr11
108129707
108129807
101
391
0
0
0.0

Predicted drivers
459
133
88922
ATM
chr11
108137897
108138069
173
391
0
1
5.8

Predicted drivers
460
133
89123
ATM
chr11
108139136
108139336
201
391
0
0
0.0

Predicted drivers
461
133
89225
ATM
chr11
108141781
108141882
102
391
0
0
0.0

Predicted drivers
462
133
89382
ATM
chr11
108141977
108142133
157
391
0
0
0.0

Predicted drivers
463
133
89483
ATM
chr11
108143246
108143346
101
391
0
0
0.0

Predicted drivers
464
133
89615
ATM
chr11
108143448
108143579
132
391
0
1
7.6

Predicted drivers
465
133
89734
ATM
chr11
108150217
108150335
119
391
0
0
0.0

Predicted drivers
466
133
89909
ATM
chr11
108151721
108151895
175
391
0
0
0.0

Predicted drivers
467
133
90080
ATM
chr11
108153436
108153606
171
391
0
2
11.7

Predicted drivers
468
133
90328
ATM
chr11
108154953
108155200
248
391
0
1
4.0

Predicted drivers
469
133
90445
ATM
chr11
108158326
108158442
117
391
0
0
0.0

Predicted drivers
470
133
90573
ATM
chr11
108159703
108159830
128
391
0
1
7.8

Predicted drivers
471
133
90774
ATM
chr11
108160328
108160528
201
391
0
1
5.0

Predicted drivers
472
133
90950
ATM
chr11
108163345
108163520
176
391
0
0
0.0

Predicted drivers
473
133
91116
ATM
chr11
108164039
108164204
166
391
0
0
0.0

Predicted drivers
474
133
91250
ATM
chr11
108165653
108165786
134
391
0
0
0.0

Predicted drivers
475
133
91351
ATM
chr11
108168011
108168111
101
391
0
1
9.9

Predicted drivers
476
133
91524
ATM
chr11
108170440
108170612
173
391
0
1
5.8

Predicted drivers
477
133
91667
ATM
chr11
108172374
108172516
143
391
0
0
0.0

Predicted drivers
478
133
91845
ATM
chr11
108173579
108173756
178
391
0
0
0.0

Predicted drivers
479
133
92024
ATM
chr11
108175401
108175579
179
391
0
2
11.2

Predicted drivers
480
133
92125
ATM
chr11
108178617
108178717
101
391
0
0
0.0

Predicted drivers
481
133
92282
ATM
chr11
108180886
108181042
157
391
0
0
0.0

Predicted drivers
482
133
92383
ATM
chr11
108183131
108183231
101
391
0
1
9.9

Predicted drivers
483
133
92485
ATM
chr11
108186543
108186644
102
391
0
0
0.0

Predicted drivers
484
133
92589
ATM
chr11
108186737
108186840
104
391
0
1
9.6

Predicted drivers
485
133
92739
ATM
chr11
108188099
108188248
150
391
0
0
0.0

Predicted drivers
486
133
92845
ATM
chr11
108190680
108190785
106
391
0
0
0.0

Predicted drivers
487
133
92966
ATM
chr11
108192027
108192147
121
391
0
0
0.0

Predicted drivers
488
133
93202
ATM
chr11
108196036
108196271
236
391
0
1
4.2

Predicted drivers
489
133
93371
ATM
chr11
108196784
108196952
169
391
0
0
0.0

Predicted drivers
490
133
93486
ATM
chr11
108198371
108198485
115
391
0
0
0.0

Predicted drivers
491
133
93705
ATM
chr11
108199747
108199965
219
391
0
1
4.6

Predicted drivers
492
133
93914
ATM
chr11
108200940
108201148
209
391
0
0
0.0

Predicted drivers
493
133
94029
ATM
chr11
108202170
108202284
115
391
0
0
0.0

Predicted drivers
494
133
94189
ATM
chr11
108202605
108202764
160
391
0
0
0.0

Predicted drivers
495
133
94329
ATM
chr11
108203488
108203627
140
391
0
0
0.0

Predicted drivers
496
133
94431
ATM
chr11
108204603
108204704
102
391
0
1
9.8

Predicted drivers
497
133
94573
ATM
chr11
108205695
108205836
142
391
0
3
21.1

Predicted drivers
498
133
94691
ATM
chr11
108206571
108206688
118
391
0
1
8.5

Predicted drivers
499
133
94842
ATM
chr11
108213948
108214098
151
391
0
0
0.0

Predicted drivers
500
133
95009
ATM
chr11
108216469
108216635
167
391
0
0
0.0

Predicted drivers
501
133
95111
ATM
chr11
108217998
108218099
102
391
0
1
9.8

Predicted drivers
502
133
95227
ATM
chr11
108224492
108224607
116
391
0
1
8.6

Predicted drivers
503
133
95328
ATM
chr11
108225519
108225619
101
391
0
0
0.0

Predicted drivers
504
133
95466
ATM
chr11
108235808
108235945
138
391
0
1
7.2

Predicted drivers
505
133
95651
ATM
chr11
108236051
108236235
185
391
0
2
10.8

Predicted drivers
506
134
95753
FGFR4
chr5
176516598
176516699
102
391
0
0
0.0

Predicted drivers
507
134
96018
FGFR4
chr5
176517390
176517654
265
391
0
1
3.8

Predicted drivers
508
134
96120
FGFR4
chr5
176517735
176517836
102
391
0
1
9.8

Predicted drivers
509
134
96288
FGFR4
chr5
176517938
176518105
168
391
0
0
0.0

Predicted drivers
510
134
96413
FGFR4
chr5
176518685
176518809
125
391
0
0
0.0

Predicted drivers
511
134
96605
FGFR4
chr5
176519321
176519512
192
391
0
0
0.0

Predicted drivers
512
134
96745
FGFR4
chr5
176519646
176519785
140
391
0
0
0.0

Predicted drivers
513
134
97160
FGFR4
chr5
176520138
176520552
415
391
0
2
4.8

Predicted drivers
514
134
97283
FGFR4
chr5
176520654
176520776
123
391
0
0
0.0

Predicted drivers
515
134
97395
FGFR4
chr5
176522330
176522441
112
391
0
1
8.9

Predicted drivers
516
134
97587
FGFR4
chr5
176522533
176522724
192
391
0
0
0.0

Predicted drivers
517
134
97711
FGFR4
chr5
176523057
176523180
124
391
0
0
0.0

Predicted drivers
518
134
97813
FGFR4
chr5
176523272
176523373
102
391
0
0
0.0

Predicted drivers
519
134
97952
FGFR4
chr5
176523604
176523742
139
391
0
0
0.0

Predicted drivers
520
134
98059
FGFR4
chr5
176524292
176524398
107
391
0
0
0.0

Predicted drivers
521
134
98210
FGFR4
chr5
176524527
176524677
151
391
0
0
0.0

Add fusions
522
135
100435
ALK
chr2
29446207
29448431
2225
—
—
—
—

Add fusions
523
136
117908
ROS1
chr6
117641031
117658503
17473
—
—
—
—

Add fusions
524
137
123433
RET
chr10
43606655
43612179
5525
—
—
—
—

Add fusions
525
138
123876
PDGFRA
chr4
55140698
55141140
443
—
—
—
—

Add fusions
526
139
125384
FGFR1
chr8
38275746
38277253
1508
—
—
—
—
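
The tabulated values above appear to be internally consistent in three ways: each region length equals end minus start plus one (inclusive coordinates), each cumulative total equals the previous total plus the current length, and each RI value matches the count in the preceding column expressed per kilobase of region length. The following is a minimal sketch of such a consistency check, assuming those three relations; the class name `ExonRow`, its field names, and the tolerance are hypothetical and are not part of the original disclosure.

```python
from dataclasses import dataclass


@dataclass
class ExonRow:
    """One table row; field names are illustrative, not taken from the source."""
    gene: str
    start: int
    end: int
    length: int
    count: int        # value in the column immediately preceding RI
    ri: float
    cumulative: int   # running total of region length (bp)


def expected_length(start: int, end: int) -> int:
    # Inclusive coordinates: both endpoints are counted.
    return end - start + 1


def expected_ri(count: int, length: int) -> float:
    # Assumed relation: count per kilobase, rounded to one decimal place.
    return round(1000.0 * count / length, 1)


def check_rows(rows, tol=0.051):
    """Return (gene, problem) tuples for rows that break an assumed relation."""
    problems = []
    previous = None
    for row in rows:
        if expected_length(row.start, row.end) != row.length:
            problems.append((row.gene, "length"))
        if abs(expected_ri(row.count, row.length) - row.ri) > tol:
            problems.append((row.gene, "RI"))
        if previous is not None and previous.cumulative + row.length != row.cumulative:
            problems.append((row.gene, "cumulative"))
        previous = row
    return problems


# Three consecutive rows copied from the table above.
rows = [
    ExonRow("COL5A2", 189927897, 189927996, 100, 3, 30.0, 30177),
    ExonRow("NTM", 132180005, 132180126, 122, 4, 32.8, 30299),
    ExonRow("LTBP1", 33500031, 33500157, 127, 5, 39.4, 30426),
]
print(check_rows(rows))  # prints [] because all three rows are consistent
```

A check of this kind is useful mainly for catching transcription errors in a long coordinate listing, since a single mistyped digit in a start, end, length, or cumulative value will violate at least one of the assumed relations.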

Coverage (unique LUAD & SCC patients; n = 407)
Coverage (all LUAD & SCC samples; n = 419)

Design phase
No. patients w/1 SNV
% patients ≧1 SNV
% patients ≧2 SNVs
% patients ≧3 SNVs
Samples covered
Samples gained
No. samples per exon
RI
No. samples w/1 SNV
% samples ≧1 SNV
% samples ≧2 SNVs
% samples ≧3 SNVs

Known drivers
1
0.25
0.00
0.00
1
1
1
7.7
1
0.24
0.00
0.00

Known drivers
9
2.21
0.00
0.00
11
10
10
83.3
11
2.63
0.00
0.00

Known drivers
16
3.93
0.00
0.00
18
7
7
58.8
18
4.30
0.00
0.00

Known drivers
46
11.30
0.00
0.00
48
30
30
97.4
48
11.46
0.00
0.00

Known drivers
53
13.02
0.00
0.00
55
7
7
19.9
55
13.13
0.00
0.00

Known drivers
55
14.00
0.49
0.00
59
4
6
26.2
57
14.08
0.48
0.00

Known drivers
54
14.25
0.98
0.00
60
1
3
24.2
56
14.32
0.95
0.00

Known drivers
60
15.97
1.23
0.00
67
7
8
80.0
62
15.99
1.19
0.00

Known drivers
64
16.95
1.23
0.25
71
4
5
26.7
66
16.95
1.19
0.24

Known drivers
74
19.90
1.72
0.25
84
13
15
95.5
77
20.05
1.67
0.24

Known drivers
74
19.90
1.72
0.25
84
0
0
0.0
77
20.05
1.67
0.24

Known drivers
78
20.88
1.72
0.25
88
4
4
21.4
81
21.00
1.67
0.24

Known drivers
79
21.38
1.97
0.25
90
2
3
16.7
82
21.48
1.91
0.24

Known drivers
82
22.11
1.97
0.25
93
3
3
26.8
85
22.20
1.91
0.24

Known drivers
85
22.85
1.97
0.25
96
3
3
16.3
88
22.91
1.91
0.24

Known drivers
100
26.54
1.97
0.25
111
15
15
72.5
103
26.49
1.91
0.24

Known drivers
117
31.45
2.70
0.74
131
20
25
36.4
120
31.26
2.63
0.72

Known drivers
126
34.64
3.69
0.98
145
14
19
29.7
130
34.61
3.58
0.95

Known drivers
128
35.14
3.69
0.98
147
2
2
11.9
132
35.08
3.58
0.95

Known drivers
132
36.12
3.69
0.98
151
4
4
22.2
136
36.04
3.58
0.95

Known drivers
164
46.93
6.63
0.98
196
45
57
508.9
169
46.78
6.44
0.95

Known drivers
164
46.93
6.63
0.98
196
0
0
0.0
169
46.78
6.44
0.95

Known drivers
166
47.42
6.63
0.98
198
2
2
14.1
171
47.26
6.44
0.95

Known drivers
174
52.09
9.34
0.98
217
19
31
115.7
179
51.79
9.07
0.95

Known drivers
173
52.09
9.58
0.98
217
0
1
4.6
178
51.79
9.31
0.95

Known drivers
173
52.09
9.58
0.98
217
0
0
0.0
178
51.79
9.31
0.95

Known drivers
174
52.58
9.83
0.98
219
2
3
2.0
179
52.27
9.55
0.95

Known drivers
174
52.58
9.83
0.98
219
0
0
0.0
179
52.27
9.55
0.95

Known drivers
175
53.32
10.32
0.98
222
3
5
27.8
180
52.98
10.02
0.95

Known drivers
175
53.32
10.32
0.98
222
0
0
0.0
180
52.98
10.02
0.95

Known drivers
174
55.28
12.53
1.47
230
8
19
150.8
179
54.89
12.17
1.43

Known drivers
176
56.02
12.78
1.47
233
3
4
14.7
181
55.61
12.41
1.43

Known drivers
177
56.27
12.78
1.47
234
1
1
12.5
182
55.85
12.41
1.43

Known drivers
177
56.27
12.78
1.47
234
0
0
0.0
182
55.85
12.41
1.43

Known drivers
178
56.76
13.02
1.47
236
2
3
65.2
183
56.32
12.65
1.43

Known drivers
178
56.76
13.02
1.47
236
0
0
0.0
183
56.32
12.65
1.43

Known drivers
179
57.49
13.51
1.47
239
3
5
20.8
184
57.04
13.13
1.43

Known drivers
179
57.74
13.76
1.72
240
1
3
21.0
184
57.28
13.37
1.67

Known drivers
179
58.48
14.50
1.72
243
3
6
35.7
184
58.00
14.08
1.67

Known drivers
179
58.72
14.74
1.97
244
1
3
13.3
184
58.23
14.32
1.91

Known drivers
179
58.97
14.99
2.46
245
1
4
13.7
184
58.47
14.56
2.39

Known drivers
179
59.21
15.23
2.46
246
1
2
23.5
184
58.71
14.80
2.39

Known drivers
180
59.46
15.23
2.46
247
1
1
11.0
185
58.95
14.80
2.39

Known drivers
177
59.46
15.97
2.70
247
0
4
29.9
182
58.95
15.51
2.63

Known drivers
174
59.46
16.71
2.95
247
0
4
29.0
179
58.95
16.23
2.86

Known drivers
171
59.46
17.44
3.19
247
0
4
31.0
176
58.95
16.95
3.10

Known drivers
171
59.46
17.44
3.19
247
0
0
0.0
176
58.95
16.95
3.10

Known drivers
171
59.46
17.44
3.19
247
0
0
0.0
176
58.95
16.95
3.10

Known drivers
171
59.46
17.44
3.19
247
0
0
0.0
176
58.95
16.95
3.10

Known drivers
168
64.86
23.59
5.16
269
22
58
420.3
171
64.20
23.39
5.01

Known drivers
167
70.27
29.24
6.14
292
23
51
459.5
171
69.69
28.88
5.97

Known drivers
164
73.71
33.42
8.11
306
14
39
342.1
168
73.03
32.94
7.88

Known drivers
164
76.66
36.36
9.58
319
13
32
114.3
169
76.13
35.80
9.31

Known drivers
167
83.54
42.51
12.04
347
28
69
373.0
171
82.62
42.00
11.69

Max coverage
163
83.78
43.73
12.78
349
2
11
91.7
168
83.29
43.20
12.41

Max coverage
165
84.28
43.73
13.02
352
3
5
90.9
171
84.01
43.20
12.65

Max coverage
164
84.77
44.47
13.76
354
2
10
87.7
169
84.49
44.15
13.60

Max coverage
164
85.50
45.21
14.50
357
3
9
83.3
169
85.20
44.87
14.32

Max coverage
162
86.00
46.19
14.99
360
3
9
80.4
168
85.92
45.82
14.80

Max coverage
163
86.24
46.19
15.72
362
2
6
67.4
170
86.40
45.82
15.51

Max coverage
161
86.49
46.93
16.46
363
1
9
67.2
168
86.63
46.54
16.23

Max coverage
160
86.73
47.42
17.69
364
1
11
63.6
167
86.37
47.02
17.42

Max coverage
161
86.98
47.42
18.43
365
1
5
61.0
168
87.11
47.02
18.14

Max coverage
161
87.22
47.67
19.16
366
1
10
60.2
168
87.35
47.26
18.85

Max coverage
163
87.71
47.67
19.66
368
2
5
58.8
170
87.83
47.26
19.33

Max coverage
163
87.96
47.91
20.15
369
1
6
58.8
170
88.07
47.49
20.05

Max coverage
164
88.45
48.16
20.39
371
2
6
55.0
171
88.54
47.73
20.29

Max coverage
164
88.70
48.40
20.64
372
1
5
53.8
170
88.78
48.21
20.53

Max coverage
163
88.94
48.89
20.64
373
1
5
53.2
169
89.02
48.69
20.53

Max coverage
162
89.19
49.39
20.88
374
1
5
47.6
168
89.26
49.16
20.76

Max coverage
161
89.43
49.88
21.87
375
1
7
47.6
167
89.50
49.64
21.72

Max coverage
161
89.68
50.12
22.85
376
1
8
47.6
167
89.74
49.88
22.67

Max coverage
160
89.93
50.61
23.83
377
1
9
47.4
166
89.98
50.36
23.63

Max coverage
159
90.17
51.11
24.32
378
1
5
46.7
165
90.21
50.84
24.11

Max coverage
158
90.42
51.60
24.57
379
1
5
46.3
163
90.45
51.55
24.34

Max coverage
152
91.15
53.81
26.78
382
3
32
44.8
157
91.17
53.70
26.73

Max coverage
153
91.40
53.81
27.03
383
1
5
41.0
158
91.41
53.70
28.97

Max coverage
153
91.65
54.05
27.03
384
1
5
41.0
158
91.85
53.94
26.97

Max coverage
152
91.89
54.55
27.52
385
1
5
40.3
157
91.89
54.42
27.45

Max coverage
152
92.14
54.79
28.01
386
1
5
40.0
157
92.12
54.65
27.92

Max coverage
151
92.38
55.28
28.99
387
1
13
39.6
156
92.36
55.13
28.88

Max coverage
150
92.63
55.77
29.48
388
1
8
39.6
155
92.60
55.61
29.59

Max coverage
149
92.87
56.27
29.98
389
1
6
38.0
154
92.84
56.09
30.07

Max coverage
147
93.12
57.00
30.96
390
1
12
37.3
152
93.08
56.80
31.03

Max coverage
144
93.37
57.99
30.96
391
1
8
36.5
149
93.32
57.76
31.03

Max coverage
143
93.61
58.48
31.20
392
1
5
35.2
148
93.56
58.23
31.26

Max coverage
144
93.86
58.48
31.20
393
1
5
35.0
149
93.79
58.23
31.26

Max coverage
143
93.86
58.72
31.94
394
1
6
34.9
150
94.03
58.23
31.98

Max coverage
140
94.35
59.95
32.68
396
2
17
34.6
147
94.51
59.43
32.70

Max coverage
142
94.84
59.95
32.92
398
2
5
34.5
149
94.99
59.43
32.94

RI ≧ 30
134
94.84
61.92
35.63
398
0
30
32.0
141
94.99
61.34
35.56

RI ≧ 30
126
94.84
63.88
37.59
398
0
34
32.8
133
94.99
63.25
37.71

RI ≧ 30
121
94.84
65.11
38.33
398
0
28
30.2
127
94.99
64.68
38.42

RI ≧ 30
117
95.09
66.34
39.80
399
1
28
30.1
123
95.23
65.87
39.86

RI ≧ 30
113
95.09
67.32
42.01
399
0
33
31.5
119
95.23
66.83
42.00

RI ≧ 30
109
95.09
68.30
43.24
399
0
36
32.2
115
95.23
67.78
43.20

RI ≧ 30
105
95.09
69.29
43.24
399
0
9
35.2
111
95.23
68.74
43.20

RI ≧ 30
102
95.09
70.02
43.49
399
0
6
30.2
108
95.23
69.45
43.44

RI ≧ 30
99
95.09
70.76
43.73
399
0
12
45.3
105
95.23
70.17
43.68

RI ≧ 30
97
95.09
71.25
43.73
399
0
5
34.7
102
95.23
70.88
43.68

RI ≧ 30
94
95.09
71.99
44.23
399
0
5
30.7
99
95.23
71.80
44.15

RI ≧ 30
91
95.09
72.73
44.23
399
0
7
52.2
96
95.23
72.32
44.15

RI ≧ 30
88
95.09
73.46
44.23
399
0
6
32.8
93
95.23
73.03
44.15

RI ≧ 30
85
95.09
74.20
44.23
399
0
6
33.9
90
95.23
73.75
44.15

RI ≧ 30
82
95.09
74.94
45.21
399
0
29
32.5
87
95.23
74.46
45.11

RI ≧ 30
80
95.09
75.43
45.45
399
0
6
31.6
84
95.23
75.18
45.35

RI ≧ 30
77
95.09
76.17
45.70
399
0
8
36.4
81
95.23
75.89
45.58

RI ≧ 30
75
95.09
76.66
45.70
399
0
5
33.8
79
95.23
76.37
45.58

RI ≧ 30
73
95.09
77.15
45.95
399
0
3
30.3
77
95.23
76.85
45.82

RI ≧ 30
71
95.09
77.64
45.95
399
0
11
33.1
75
95.23
77.33
45.82

RI ≧ 30
70
95.33
78.13
45.95
400
1
3
32.3
74
95.47
77.80
45.82

RI ≧ 30
68
95.33
78.62
47.17
400
0
17
32.6
72
95.47
78.28
47.02

RI ≧ 30
67
95.58
79.12
47.17
401
1
4
30.8
71
95.70
78.76
47.02

RI ≧ 30
67
95.58
79.12
47.42
401
0
3
31.9
69
95.70
79.24
47.02

RI ≧ 30
65
95.58
79.61
47.42
401
0
6
35.3
67
95.70
79.71
47.02

RI ≧ 30
63
95.58
80.10
47.42
401
0
4
37.4
65
95.70
80.19
47.02

RI ≧ 30
61
95.58
80.59
47.42
401
0
4
48.2
63
95.70
80.67
47.02

RI ≧ 30
59
95.58
81.08
47.42
401
0
3
42.9
61
95.70
81.15
47.02

RI ≧ 30
57
95.58
81.57
47.42
401
0
5
33.8
59
95.70
81.62
47.02

RI ≧ 30
56
95.58
81.82
47.42
401
0
7
47.3
57
95.70
82.10
47.26

RI ≧ 30
54
95.58
82.31
47.42
401
0
6
33.0
55
95.70
82.58
47.26

RI ≧ 30
52
95.58
82.80
47.67
401
0
5
39.4
53
95.70
83.05
47.49

RI ≧ 30
51
95.58
83.05
47.67
401
0
4
54.8
52
95.70
83.29
47.49

RI ≧ 30
51
95.58
83.05
48.16
401
0
3
45.5
51
95.70
83.53
47.73

RI ≧ 30
50
95.58
83.29
48.65
401
0
7
40.5
50
95.70
83.77
48.21

RI ≧ 30
49
95.58
83.54
48.89
401
0
6
35.5
49
95.70
84.01
48.45

RI ≧ 30
48
95.58
83.78
48.89
401
0
3
46.2
46
95.70
84.25
48.45

RI ≧ 30
47
95.58
84.03
48.89
401
0
3
30.6
47
95.70
84.49
48.45

RI ≧ 30
46
95.58
84.28
48.89
401
0
4
30.3
46
95.70
84.73
48.45

RI ≧ 30
45
95.58
84.52
48.89
401
0
3
50.0
45
95.70
84.96
48.45

RI ≧ 30
44
95.58
84.77
49.14
401
0
4
34.5
44
95.70
85.20
48.69

RI ≧ 30
43
95.58
85.01
49.14
401
0
3
34.9
43
95.70
85.44
48.69

RI ≧ 30
42
95.58
85.26
49.63
401
0
6
31.9
42
95.70
85.68
49.16

RI ≧ 30
41
95.58
85.50
50.61
401
0
7
30.0
41
95.70
85.92
50.12

RI ≧ 30
40
95.58
85.75
50.86
401
0
3
30.0
40
95.70
86.16
50.36

RI ≧ 30
39
95.58
86.00
50.86
401
0
4
32.8
39
95.70
86.40
50.36

RI ≧ 30
38
95.58
86.24
51.11
401
0
5
39.4
38
95.70
86.63
50.60

RI ≧ 30
37
95.58
86.49
51.35
401
0
5
31.1
37
95.70
86.87
50.84

RI ≧ 30
36
95.58
86.73
51.60
401
0
26
125.6
36
95.70
87.11
51.07

RI ≧ 30
35
95.58
86.98
51.60
401
0
4
31.3
35
95.70
87.35
51.07

RI ≧ 30
34
95.58
87.22
51.84
401
0
4
31.5
34
95.70
87.59
51.31

RI ≧ 30
33
95.58
87.47
52.09
401
0
4
46.5
33
95.70
87.83
51.55

RI ≧ 30
32
95.58
87.71
52.09
401
0
6
32.4
32
95.70
88.07
51.55

RI ≧ 30
31
95.58
87.96
52.09
401
0
4
36.7
31
95.70
88.31
51.55

RI ≧ 30
30
95.58
88.21
52.33
401
0
6
42.3
30
95.70
88.54
51.79

RI ≧ 30
29
95.58
88.45
52.33
401
0
5
36.2
29
95.70
88.76
51.79

RI ≧ 30
28
95.58
88.70
52.58
401
0
6
30.3
28
95.70
89.02
52.03

RI ≧ 30
27
95.58
88.94
52.83
401
0
8
43.7
27
95.70
89.26
52.27

RI ≧ 30
26
95.58
89.19
52.83
401
0
7
34.3
26
95.70
89.50
52.27

RI ≧ 30
25
95.58
89.43
53.07
401
0
5
41.7
25
95.70
89.74
52.51

RI ≧ 30
24
95.58
89.68
53.07
401
0
3
33.3
24
95.70
89.96
52.51

RI ≧ 30
23
95.58
89.93
53.56
401
0
5
30.7
23
95.70
90.21
53.22

RI ≧ 30
22
95.58
90.17
53.56
401
0
7
34.0
22
95.70
90.45
53.22

RI ≧ 30
21
95.58
90.42
53.81
401
0
4
31.3
21
95.70
90.69
53.46

RI ≧ 30
20
95.58
90.66
53.81
401
0
5
30.7
20
95.70
90.93
53.46

RI ≧ 30
19
95.58
90.91
53.81
401
0
3
34.1
19
95.70
91.17
53.46

RI ≧ 30
18
95.58
91.15
54.05
401
0
3
47.6
18
95.70
91.41
53.70

RI ≧ 30
17
95.58
91.40
54.30
401
0
4
40.4
17
95.70
91.65
53.94

RI ≧ 30
16
95.58
91.65
54.55
401
0
4
32.3
16
95.70
91.89
54.18

RI ≧ 30
15
95.58
91.89
54.55
401
0
5
31.1
15
95.70
92.12
54.18

RI ≧ 30
14
95.58
92.14
54.55
401
0
6
36.6
14
95.70
92.36
54.18

RI ≧ 20
12
95.58
92.63
55.53
401
0
14
23.7
12
95.70
92.84
55.13

RI ≧ 20
11
95.58
92.87
55.53
401
0
3
27.5
11
95.70
93.08
55.13

RI ≧ 20
10
95.58
93.12
55.77
401
0
5
23.4
10
95.70
93.32
55.37

RI ≧ 20
9
95.58
93.37
56.27
401
0
4
23.5
9
95.70
93.56
55.85

RI ≧ 20
8
95.58
93.61
57.00
401
0
22
21.4
8
95.70
93.79
56.56

RI ≧ 20
7
95.58
93.86
57.49
401
0
8
21.1
7
95.70
94.03
57.04

RI ≧ 20
6
95.58
94.10
57.74
401
0
3
23.8
6
95.70
94.27
57.28

RI ≧ 20
5
95.58
94.35
57.99
401
0
6
24.4
5
95.70
94.51
57.52

RI ≧ 20
4
95.58
94.59
57.99
401
0
4
28.0
4
95.70
94.75
57.52

RI ≧ 20
3
95.58
94.84
58.23
401
0
6
25.6
3
95.70
94.99
57.76

RI ≧ 20
2
95.58
95.09
58.23
401
0
3
26.8
2
95.70
95.23
57.76

Predicted drivers
2
95.58
95.09
58.97
401
0
17
19.0
2
95.70
95.23
56.47

Predicted drivers
2
95.58
95.09
59.46
401
0
3
7.7
2
95.70
95.23
58.95

Predicted drivers
2
95.58
95.09
59.46
401
0
3
28.0
2
95.70
95.23
58.95

Predicted drivers
2
95.58
95.09
59.46
401
0
2
12.7
2
95.70
95.23
58.95

Predicted drivers
2
95.58
95.09
59.71
401
0
2
10.3
2
95.70
95.23
59.19

Predicted drivers
2
95.58
95.09
59.71
401
0
3
19.9
2
95.70
95.23
59.19

Predicted drivers
2
95.58
95.09
59.95
401
0
4
19.0
2
95.70
95.23
59.43

Predicted drivers
2
95.58
95.09
60.44
401
0
2
19.8
2
95.70
95.23
59.90

Predicted drivers
2
95.58
95.09
60.44
401
0
4
21.4
2
95.70
95.23
59.90

Predicted drivers
2
95.58
95.09
60.93
401
0
3
23.6
2
95.70
95.23
60.38

Predicted drivers
2
95.58
95.09
60.93
401
0
2
19.8
2
95.70
95.23
60.38

Predicted drivers
2
95.58
95.09
60.93
401
0
0
0.0
2
95.70
95.23
60.38

Predicted drivers
2
95.58
95.09
60.93
401
0
2
19.2
2
95.70
95.23
60.38

Predicted drivers
2
95.58
95.09
60.93
401
0
1
6.2
2
95.70
95.23
60.38

Predicted drivers
2
95.58
95.09
60.93
401
0
0
0.0
2
95.70
95.23
60.38

Predicted drivers
2
95.58
95.09
60.93
401
0
5
14.8
2
95.70
95.23
60.38

Predicted drivers
2
95.58
95.09
60.93
401
0
1
6.4
2
95.70
95.23
60.38

Predicted drivers
2
95.58
95.09
60.93
401
0
6
9.0
2
95.70
95.23
60.38

Predicted drivers
2
95.58
95.09
60.93
401
0
0
0.0
2
95.70
95.23
60.38

Predicted drivers
2
95.58
95.09
60.93
401
0
1
5.5
2
95.70
95.23
60.38

Predicted drivers
2
95.58
95.09
60.93
401
0
0
0.0
2
95.70
95.23
60.38

Predicted drivers
2
95.58
95.09
60.93
401
0
2
19.6
2
95.70
95.23
60.38

Predicted drivers
2
95.58
95.09
60.93
401
0
6
9.1
2
95.70
95.23
60.38

Predicted drivers
2
95.58
95.09
61.18
401
0
4
25.5
2
95.70
95.23
60.62

Predicted drivers
2
95.58
95.09
61.43
401
0
3
8.9
2
95.70
95.23
60.86

Predicted drivers
2
95.58
95.09
61.67
401
0
2
15.9
2
95.70
95.23
61.10

Predicted drivers
2
95.58
95.09
61.92
401
0
1
5.3
2
95.70
95.23
61.34

Predicted drivers
2
95.58
95.09
61.92
401
0
0
0.0
2
95.70
95.23
61.34

Predicted drivers
2
95.58
95.09
61.92
401
0
0
0.0
2
95.70
95.23
61.34

Predicted drivers
2
95.58
95.09
61.92
401
0
3
23.6
2
95.70
95.23
61.34

Predicted drivers
2
95.58
95.09
61.92
401
0
1
5.3
2
95.70
95.23
61.34

Predicted drivers
2
95.58
95.09
61.92
401
0
0
0.0
2
95.70
95.23
61.34

Predicted drivers
2
95.58
95.09
61.92
401
0
5
23.7
2
95.70
95.23
61.34

Predicted drivers
2
95.58
95.09
61.92
401
0
1
6.6
2
95.70
95.23
61.34

Predicted drivers
2
95.58
95.09
62.16
401
0
2
10.3
2
95.70
95.23
61.58

Predicted drivers
2
95.58
95.09
62.65
401
0
3
19.1
2
95.70
95.23
62.05

Predicted drivers
2
95.58
95.09
62.65
401
0
1
9.3
2
95.70
95.23
62.05

Predicted drivers
2
95.58
95.09
62.65
401
0
2
19.6
2
95.70
95.23
62.05

Predicted drivers
2
95.58
95.09
62.65
401
0
0
0.0
2
95.70
95.23
62.05

Predicted drivers
2
95.58
95.09
62.65
401
0
1
6.4
2
95.70
95.23
62.05

Predicted drivers
2
95.58
95.09
62.65
401
0
2
15.7
2
95.70
95.23
62.05

Predicted drivers
2
95.58
95.09
62.65
401
0
1
7.8
2
95.70
95.23
62.05

Predicted drivers
2
95.58
95.09
62.65
401
0
0
0.0
2
95.70
95.23
62.05

Predicted drivers
2
95.58
95.09
62.65
401
0
2
7.0
2
95.70
95.23
62.05

Predicted drivers
2
95.58
95.09
62.65
401
0
1
6.4
2
95.70
95.23
62.05

Predicted drivers
2
95.58
95.09
62.65
401
0
1
8.3
2
95.70
95.23
62.05

Predicted drivers
2
95.58
95.09
62.65
401
0
0
0.0
2
95.70
95.23
62.05

Predicted drivers
2
95.58
95.09
62.65
401
0
0
0.0
2
95.70
95.23
62.05

Predicted drivers
2
95.58
95.09
62.65
401
0
1
9.9
2
95.70
95.23
62.05

Predicted drivers
2
95.58
95.09
62.65
401
0
0
0.0
2
95.70
95.23
62.05

Predicted drivers
2
95.58
95.09
62.90
401
0
3
26.3
2
95.70
95.23
62.29

Predicted drivers
2
95.58
95.09
62.90
401
0
0
0.0
2
95.70
95.23
62.29

Predicted drivers
2
95.58
95.09
62.90
401
0
4
24.7
2
95.70
95.23
62.29

Predicted drivers
2
95.58
95.09
62.90
401
0
7
33.2
2
95.70
95.23
62.29

Predicted drivers
2
95.58
95.09
62.90
401
0
1
9.8
2
95.70
95.23
62.29

Predicted drivers
2
95.58
95.09
62.90
401
0
5
19.2
2
95.70
95.23
62.29

Predicted drivers
2
95.58
95.09
62.90
401
0
0
0.0
2
95.70
95.23
62.29

Predicted drivers
2
95.58
95.09
63.14
401
0
5
8.5
2
95.70
95.23
62.77

Predicted drivers
2
95.58
95.09
63.14
401
0
1
8.4
2
95.70
95.23
62.77

Predicted drivers
2
95.58
95.09
63.14
401
0
1
9.8
2
95.70
95.23
62.77

Predicted drivers
2
95.58
95.09
63.14
401
0
2
10.3
2
95.70
95.23
62.77

Predicted drivers
2
95.58
95.09
63.14
401
0
3
9.8
2
95.70
95.23
62.77

Predicted drivers
2
95.58
95.09
63.14
401
0
1
6.8
2
95.70
95.23
62.77

Predicted drivers
2
95.58
95.09
63.14
401
0
1
7.4
2
95.70
95.23
62.77

Predicted drivers
2
95.58
95.09
63.88
401
0
9
15.4
2
95.70
95.23
63.48

Predicted drivers
2
95.58
95.09
64.13
401
0
5
18.5
2
95.70
95.23
63.72

Predicted drivers
2
95.58
95.09
64.37
401
0
1
9.9
2
95.70
95.23
63.96

Predicted drivers
2
95.58
95.09
64.37
401
0
1
8.9
2
95.70
95.23
63.96

Predicted drivers
2
95.58
95.09
64.37
401
0
0
0.0
2
95.70
95.23
63.96

Predicted drivers
2
95.58
95.09
64.37
401
0
2
19.6
2
95.70
95.23
63.96

Predicted drivers
2
95.58
95.09
64.62
401
0
4
21.1
2
95.70
95.23
64.20

Predicted drivers
2
95.58
95.09
64.86
401
0
3
21.0
2
95.70
95.23
64.44

Predicted drivers
2
95.58
95.09
64.86
401
0
2
13.6
2
95.70
95.23
64.44

Predicted drivers
2
95.58
95.09
64.86
401
0
0
0.0
2
95.70
95.23
64.44

Predicted drivers
2
95.58
95.09
64.86
401
0
1
4.5
2
95.70
95.23
64.44

Predicted drivers
2
95.58
95.09
64.86
401
0
0
0.0
2
95.70
95.23
64.44

Predicted drivers
2
95.58
95.09
64.86
401
0
2
19.8
2
95.70
95.23
64.44

Predicted drivers
2
95.58
95.09
64.86
401
0
3
19.6
2
95.70
95.23
64.44

Predicted drivers
2
95.58
95.09
64.86
401
0
2
18.7
2
95.70
95.23
64.44

Predicted drivers
2
95.58
95.09
64.86
401
0
0
0.0
2
95.70
95.23
64.44

Predicted drivers
2
95.58
95.09
64.86
401
0
1
8.8
2
95.70
95.23
64.44

Predicted drivers
2
95.58
95.09
64.86
401
0
0
0.0
2
95.70
95.23
64.44

Predicted drivers
2
95.58
95.09
64.86
401
0
2
19.8
2
95.70
95.23
64.44

Predicted drivers
2
95.58
95.09
64.86
401
0
2
12.9
2
95.70
95.23
64.44

Predicted drivers
2
95.58
95.09
64.86
401
0
3
29.4
2
95.70
95.23
64.44

Predicted drivers
2
95.58
95.09
65.11
401
0
1
8.7
2
95.70
95.23
64.68

Predicted drivers
2
95.58
95.09
65.11
401
0
3
28.3
2
95.70
95.23
64.68

Predicted drivers
2
95.58
95.09
65.11
401
0
0
0.0
2
95.70
95.23
64.68

Predicted drivers
2
95.58
95.09
65.36
401
0
2
18.5
2
95.70
95.23
64.92

Predicted drivers
2
95.58
95.09
65.36
401
0
1
7.5
2
95.70
95.23
64.92

Predicted drivers
2
95.58
95.09
65.36
401
0
2
13.5
2
95.70
95.23
64.92

Predicted drivers
2
95.58
95.09
65.36
401
0
5
14.6
2
95.70
95.23
64.92

Predicted drivers
2
95.58
95.09
66.36
401
0
2
18.2
2
95.70
95.23
64.92

Predicted drivers
2
95.58
95.09
65.36
401
0
1
8.0
2
95.70
95.23
64.92

Predicted drivers
2
95.58
95.09
65.36
401
0
2
12.7
2
95.70
95.23
64.92

Predicted drivers
2
95.58
95.09
65.36
401
0
2
12.1
2
95.70
95.23
64.92

Predicted drivers
2
95.58
95.09
65.36
401
0
1
8.6
2
95.70
95.23
64.92

Predicted drivers
2
95.58
95.09
65.36
401
0
2
11.2
2
95.70
95.23
64.92

Predicted drivers
2
95.58
95.09
65.36
401
0
0
0.0
2
95.70
95.23
64.92

Predicted drivers
2
95.58
95.09
65.36
401
0
1
5.9
2
95.70
95.23
64.92

Predicted drivers
2
95.58
95.09
65.36
401
0
4
30.3
2
95.70
95.23
64.92

Predicted drivers
2
95.58
95.09
65.36
401
0
0
0.0
2
95.70
95.23
64.92

Predicted drivers
2
95.58
95.09
65.36
401
0
1
9.9
2
95.70
95.23
64.92

Predicted drivers
2
95.58
95.09
65.36
401
0
0
0.0
2
95.70
95.23
64.92

Predicted drivers
2
95.58
95.09
65.36
401
0
0
0.0
2
95.70
95.23
64.92

Predicted drivers
2
95.58
95.09
65.60
401
0
1
6.3
2
95.70
95.23
65.16

Predicted drivers
2
95.58
95.09
65.60
401
0
0
0.0
2
95.70
95.23
65.16

Predicted drivers
2
95.58
95.09
65.60
401
0
2
8.2
2
95.70
95.23
65.16

Predicted drivers
2
95.58
95.09
65.60
401
0
4
23.0
2
95.70
95.23
65.16

Predicted drivers
2
95.58
95.09
65.60
401
0
1
7.6
2
95.70
95.23
65.16

Predicted drivers
2
95.58
95.09
65.60
401
0
0
0.0
2
95.70
95.23
65.16

Predicted drivers
2
95.58
95.09
65.60
401
0
0
0.0
2
95.70
95.23
65.16

Predicted drivers
2
95.58
95.09
65.60
401
0
2
10.5
2
95.70
95.23
65.16

Predicted drivers
2
95.58
95.09
66.09
401
0
3
28.8
2
95.70
95.23
65.63

Predicted drivers
2
95.58
95.09
66.09
401
0
0
0.0
2
95.70
95.23
65.63

Predicted drivers
2
95.58
95.09
66.09
401
0
0
0.0
2
95.70
95.23
65.63

Predicted drivers
2
95.58
95.09
66.09
401
0
8
26.8
2
95.70
95.23
65.63

Predicted drivers
2
95.58
95.09
66.34
401
0
1
7.0
2
95.70
95.23
65.87

Predicted drivers
2
95.58
95.09
66.34
401
0
2
13.9
2
95.70
95.23
65.87

Predicted drivers
2
95.58
95.09
66.34
401
0
0
0.0
2
95.70
95.23
65.87

Predicted drivers
2
95.58
95.09
66.34
401
0
0
0.0
2
95.70
95.23
65.87

Predicted drivers
2
95.58
95.09
66.58
401
0
1
9.9
2
95.70
95.23
66.11

Predicted drivers
2
95.58
95.09
66.83
401
0
1
9.8
2
95.70
95.23
66.35

Predicted drivers
2
95.58
95.09
66.83
401
0
0
0.0
2
95.70
95.23
66.35

Predicted drivers
2
95.58
95.09
67.57
401
0
3
23.4
2
95.70
95.23
67.06

Predicted drivers
2
95.58
95.09
67.57
401
0
1
8.5
2
95.70
95.23
67.06

Predicted drivers
2
95.58
95.09
67.57
401
0
1
8.3
2
95.70
95.23
67.06

Predicted drivers
2
95.58
95.09
67.57
401
0
0
0.0
2
95.70
95.23
67.06

Predicted drivers
2
95.58
95.09
67.57
401
0
0
0.0
2
95.70
95.23
67.06

Predicted drivers
2
95.58
95.09
67.57
401
0
2
17.9
2
95.70
95.23
67.06

Predicted drivers
2
95.58
95.09
67.57
401
0
0
0.0
2
95.70
95.23
67.06

Predicted drivers
2
95.58
95.09
67.57
401
0
0
0.0
2
95.70
95.23
67.06

Predicted drivers
2
95.58
95.09
68.06
401
0
3
27.0
2
95.70
95.23
67.54

Predicted drivers
2
95.58
95.09
68.06
401
0
0
0.0
2
95.70
95.23
67.54

Predicted drivers
2
95.58
95.09
68.06
401
0
2
19.8
2
95.70
95.23
67.54

Predicted drivers
2
95.58
95.09
68.06
401
0
0
0.0
2
95.70
95.23
67.54

Predicted drivers
2
95.58
95.09
68.06
401
0
2
19.6
2
95.70
95.23
67.54

Predicted drivers
2
95.58
95.09
68.06
401
0
0
0.0
2
95.70
95.23
67.54

Predicted drivers
2
95.58
95.09
68.06
401
0
1
9.8
2
95.70
95.23
67.54

Predicted drivers
2
95.58
95.09
68.06
401
0
0
0.0
2
95.70
95.23
67.54

Predicted drivers
2
95.58
95.09
68.06
401
0
0
0.0
2
95.70
95.23
67.54

Predicted drivers
2
95.58
95.09
68.30
401
0
3
20.4
2
95.70
95.23
67.78

Predicted drivers
2
95.58
95.09
68.30
401
0
1
6.8
2
95.70
95.23
67.78

Predicted drivers
2
95.58
95.09
68.30
401
0
0
0.0
2
95.70
95.23
67.78

Predicted drivers
2
95.58
95.09
68.30
401
0
1
8.7
2
95.70
95.23
67.78

Predicted drivers
2
95.58
95.09
68.30
401
0
2
12.1
2
95.70
95.23
67.78

Predicted drivers
2
95.58
95.09
68.30
401
0
0
0.0
2
95.70
95.23
67.78

Predicted drivers
2
95.58
95.09
68.30
401
0
0
0.0
2
95.70
95.23
67.78

Predicted drivers
2
95.58
95.09
68.30
401
0
0
0.0
2
95.70
95.23
67.78

Predicted drivers
2
95.58
95.09
68.30
401
0
0
0.0
2
95.70
95.23
67.78

Predicted drivers
2
95.58
95.09
68.30
401
0
3
6.7
2
95.70
95.23
67.78

Predicted drivers
2
95.58
95.09
68.30
401
0
3
10.0
2
95.70
95.23
67.78

Predicted drivers
2
95.58
95.09
68.30
401
0
0
0.0
2
95.70
95.23
67.78

Predicted drivers
2
95.58
95.09
68.30
401
0
2
11.6
2
95.70
95.23
67.78

Predicted drivers
2
95.58
95.09
68.30
401
0
1
9.9
2
95.70
95.23
67.78

Predicted drivers
2
95.58
95.09
68.30
401
0
1
6.8
2
95.70
95.23
67.78

Predicted drivers
2
95.58
95.09
68.30
401
0
0
0.0
2
95.70
95.23
67.78

Predicted drivers
2
95.58
95.09
68.30
401
0
2
12.7
2
95.70
95.23
67.78

Predicted drivers
2
95.58
95.09
68.30
401
0
1
5.3
2
95.70
95.23
67.78

Predicted drivers
2
95.58
95.09
68.30
401
0
0
0.0
2
95.70
95.23
67.78

Predicted drivers
2
95.58
95.09
68.30
401
0
0
0.0
2
95.70
95.23
67.78

Predicted drivers
2
95.58
95.09
68.30
401
0
0
0.0
2
95.70
95.23
67.78

Predicted drivers
2
95.58
95.09
68.30
401
0
2
19.6
2
95.70
95.23
67.78

Predicted drivers
2
95.58
95.09
68.30
401
0
1
6.4
2
95.70
95.23
67.78

Predicted drivers
2
95.58
95.09
68.30
401
0
1
9.9
2
95.70
95.23
67.78

Predicted drivers
2
95.58
95.09
68.30
401
0
1
7.5
2
95.70
95.23
67.78

Predicted drivers
2
95.58
95.09
68.30
401
0
2
10.0
2
95.70
95.23
67.78

Predicted drivers
2
95.58
95.09
68.30
401
0
0
0.0
2
95.70
95.23
67.78

Predicted drivers
2
95.58
95.09
68.30
401
0
1
9.9
2
95.70
95.23
67.78

Predicted drivers
2
95.58
95.09
68.30
401
0
1
7.8
2
95.70
95.23
67.78

Predicted drivers
2
95.58
95.09
68.30
401
0
2
17.4
2
95.70
95.23
67.78

Predicted drivers
2
95.58
95.09
68.30
401
0
0
0.0
2
95.70
95.23
67.78

Predicted drivers
2
95.58
95.09
68.30
401
0
2
16.7
2
95.70
95.23
67.78

Predicted drivers
2
95.58
95.09
68.30
401
0
0
0.0
2
95.70
95.23
67.78

Predicted drivers
2
95.58
95.09
68.55
401
0
1
7.4
2
95.70
95.23
68.02

Predicted drivers
3
95.82
95.09
68.55
402
1
4
21.3
3
95.94
95.23
68.02

Predicted drivers
3
95.82
95.09
68.55
402
0
2
13.1
3
95.94
95.23
68.02

Predicted drivers
3
95.82
95.09
68.55
402
0
0
0.0
3
95.94
95.23
68.02

Predicted drivers
3
95.82
95.09
68.55
402
0
0
0.0
3
95.94
95.23
68.02

Predicted drivers
3
95.82
95.09
68.55
402
0
0
0.0
3
95.94
95.23
68.02

Predicted drivers
3
95.82
95.09
68.55
402
0
0
0.0
3
95.94
95.23
68.02

Predicted drivers
3
95.82
95.09
68.55
402
0
1
9.8
3
95.94
95.23
68.02

Predicted drivers
3
95.82
95.09
68.55
402
0
0
0.0
3
95.94
95.23
68.02

Predicted drivers
3
95.82
95.09
68.55
402
0
0
0.0
3
95.94
95.23
68.02

Predicted drivers
3
95.82
95.09
68.55
402
0
1
6.8
3
95.94
95.23
68.02

Predicted drivers
3
95.82
95.09
68.55
402
0
0
0.0
3
95.94
95.23
68.02

Predicted drivers
3
95.82
95.09
68.55
402
0
0
0.0
3
95.94
95.23
68.02

Predicted drivers
3
95.82
95.09
68.55
402
0
1
3.0
3
95.94
95.23
68.02

Predicted drivers
3
95.82
95.09
68.55
402
0
0
0.0
3
95.94
95.23
68.02

Predicted drivers
3
95.82
95.09
68.55
402
0
0
0.0
3
95.94
95.23
68.02

Predicted drivers
3
95.82
95.09
68.55
402
0
0
0.0
3
95.94
95.23
68.02

Predicted drivers
3
95.82
95.09
68.55
402
0
2
13.5
3
95.94
95.23
68.02

Predicted drivers
3
95.82
95.09
68.55
402
0
3
22.7
3
95.94
95.23
68.02

Predicted drivers
3
95.82
95.09
68.55
402
0
2
11.5
3
95.94
95.23
68.02

Predicted drivers
3
95.82
95.09
68.55
402
0
4
16.5
3
95.94
95.23
68.02

Predicted drivers
3
95.82
95.09
68.55
402
0
0
0.0
3
95.94
95.23
68.02

Predicted drivers
3
95.82
95.09
68.55
402
0
0
0.0
3
95.94
95.23
68.02

Predicted drivers
3
95.82
95.09
68.55
402
0
0
0.0
3
95.94
95.23
68.02

Predicted drivers
3
95.82
95.09
68.55
402
0
0
0.0
3
95.94
95.23
68.02

Predicted drivers
3
95.82
95.09
68.80
402
0
1
9.9
3
95.94
95.23
68.26

Predicted drivers
3
95.82
95.09
68.80
402
0
1
5.2
3
95.94
95.23
68.26

Predicted drivers
3
95.82
95.09
68.80
402
0
1
9.3
3
95.94
95.23
68.26

Predicted drivers
3
95.82
95.09
68.80
402
0
0
0.0
3
95.94
95.23
68.26

Predicted drivers
3
95.82
95.09
68.80
402
0
0
0.0
3
95.94
95.23
68.26

Predicted drivers
3
95.82
95.09
69.04
402
0
1
6.3
3
95.94
95.23
68.50

Predicted drivers
3
95.82
95.09
69.29
402
0
3
17.1
3
95.94
95.23
68.74

Predicted drivers
3
95.82
95.09
69.29
402
0
0
0.0
3
95.94
95.23
68.74

Predicted drivers
3
95.82
95.09
69.29
402
0
0
0.0
3
95.94
95.23
68.74

Predicted drivers
3
95.82
95.09
69.29
402
0
0
0.0
3
95.94
95.23
68.74

Predicted drivers
3
95.82
95.09
69.29
402
0
1
7.4
3
95.94
95.23
68.74

Predicted drivers
3
95.82
95.09
69.29
402
0
1
8.7
3
95.94
95.23
68.74

Predicted drivers
3
95.82
95.09
69.29
402
0
1
7.0
3
95.94
95.23
68.74

Predicted drivers
3
95.82
95.09
69.29
402
0
0
0.0
3
95.94
95.23
68.74

Predicted drivers
3
95.82
95.09
69.29
402
0
0
0.0
3
95.94
95.23
68.74

Predicted drivers
3
95.82
95.09
69.29
402
0
1
4.0
3
95.94
95.23
68.74

Predicted drivers
3
95.82
95.09
69.29
402
0
0
0.0
3
95.94
95.23
68.74

Predicted drivers
3
95.82
95.09
69.29
402
0
1
9.9
3
95.94
95.23
68.74

Predicted drivers
3
95.82
95.09
69.29
402
0
2
4.5
3
95.94
95.23
68.74

Predicted drivers
3
95.82
95.09
69.29
402
0
1
7.1
3
95.94
95.23
68.74

Predicted drivers
3
95.82
95.09
69.29
402
0
1
8.1
3
95.94
95.23
68.74

Predicted drivers
3
95.82
95.09
69.29
402
0
0
0.0
3
95.94
95.23
68.74

Predicted drivers
3
95.82
95.09
69.29
402
0
0
0.0
3
95.94
95.23
68.74

Predicted drivers
3
95.82
95.09
69.29
402
0
2
10.9
3
95.94
95.23
68.74

Predicted drivers
3
95.82
95.09
69.29
402
0
1
4.7
3
95.94
95.23
68.74

Predicted drivers
3
95.82
95.09
69.29
402
0
2
12.3
3
95.94
95.23
68.74

Predicted drivers
3
95.82
95.09
69.29
402
0
0
0.0
3
95.94
95.23
68.74

Predicted drivers
3
95.82
95.09
69.29
402
0
0
0.0
3
95.94
95.23
68.74

Predicted drivers
3
95.82
95.09
69.29
402
0
0
0.0
3
95.94
95.23
68.74

Predicted drivers
3
95.82
95.09
69.29
402
0
0
0.0
3
95.94
95.23
68.74

Predicted drivers
3
95.82
95.09
69.53
402
0
1
9.9
3
95.94
95.23
68.97

Predicted drivers
3
95.82
95.09
69.78
402
0
2
13.5
3
95.94
95.23
69.21

Predicted drivers
3
95.82
95.09
69.78
402
0
0
0.0
3
95.94
95.23
69.21

Predicted drivers
3
95.82
95.09
69.78
402
0
0
0.0
3
95.94
95.23
69.21

Predicted drivers
3
95.82
95.09
69.78
402
0
2
4.6
3
95.94
95.23
69.21

Predicted drivers
3
95.82
95.09
69.78
402
0
3
8.8
3
95.94
95.23
69.21

Predicted drivers
3
95.82
95.09
69.78
402
0
3
14.7
3
95.94
95.23
69.21

Predicted drivers
3
95.82
95.09
70.02
402
0
3
15.4
3
95.94
95.23
69.45

Predicted drivers
3
95.82
95.09
70.27
402
0
2
14.1
3
95.94
95.23
69.69

Predicted drivers
3
95.82
95.09
70.27
402
0
0
0.0
3
95.94
95.23
69.69

Predicted drivers
3
95.82
95.09
70.52
402
0
1
4.6
3
95.94
95.23
69.93

Predicted drivers
3
95.82
95.09
70.52
402
0
1
9.9
3
95.94
95.23
69.93

Predicted drivers
3
95.82
95.09
70.52
402
0
0
0.0
3
95.94
95.23
69.93

Predicted drivers
3
95.82
95.09
70.76
402
0
2
19.4
3
95.94
95.23
70.17

Predicted drivers
3
95.82
95.09
71.01
402
0
1
7.0
3
95.94
95.23
70.41

Predicted drivers
3
95.82
95.09
71.01
402
0
2
15.6
3
95.94
95.23
70.41

Predicted drivers
3
95.82
95.09
71.01
402
0
2
15.0
3
95.94
95.23
70.41

Predicted drivers
3
95.82
95.09
71.01
402
0
0
0.0
3
95.94
95.23
70.41

Predicted drivers
3
95.82
95.09
71.25
402
0
2
12.6
3
95.94
95.23
70.64

Predicted drivers
3
95.82
95.09
71.25
402
0
0
0.0
3
95.94
95.23
70.64

Predicted drivers
3
95.82
95.09
71.25
402
0
1
7.6
3
95.94
95.23
70.64

Predicted drivers
3
95.82
95.09
71.50
402
0
1
9.8
3
95.94
95.23
70.88

Predicted drivers
3
95.82
95.09
71.50
402
0
1
6.9
3
95.94
95.23
70.88

Predicted drivers
3
95.82
95.09
71.50
402
0
0
0.0
3
95.94
95.23
70.88

Predicted drivers
3
95.82
95.09
71.50
402
0
0
0.0
3
95.94
95.23
70.88

Predicted drivers
3
95.82
95.09
71.50
402
0
1
6.9
3
95.94
95.23
70.88

Predicted drivers
3
95.82
95.09
71.50
402
0
0
0.0
3
95.94
95.23
70.88

Predicted drivers
3
95.82
95.09
71.50
402
0
0
0.0
3
95.94
95.23
70.88

Predicted drivers
3
95.82
95.09
71.50
402
0
1
9.8
3
95.94
95.23
70.88

Predicted drivers
3
95.82
95.09
71.74
402
0
2
9.9
3
95.94
95.23
71.12

Predicted drivers
3
95.82
95.09
71.74
402
0
1
9.1
3
95.94
95.23
71.12

Predicted drivers
3
95.82
95.09
71.74
402
0
0
0.0
3
95.94
95.23
71.12

Predicted drivers
3
95.82
95.09
71.74
402
0
0
0.0
3
95.94
95.23
71.12

Predicted drivers
3
95.82
95.09
71.74
402
0
0
0.0
3
95.94
95.23
71.12

Predicted drivers
3
95.82
95.09
71.74
402
0
0
0.0
3
95.94
95.23
71.12

Predicted drivers
3
95.82
95.09
71.74
402
0
1
2.6
3
95.94
95.23
71.12

Predicted drivers
3
95.82
95.09
71.74
402
0
0
0.0
3
95.94
95.23
71.12

Predicted drivers
3
95.82
95.09
71.74
402
0
0
0.0
3
95.94
95.23
71.12

Predicted drivers
3
95.82
95.09
71.74
402
0
0
0.0
3
95.94
95.23
71.12

Predicted drivers
3
95.82
95.09
71.74
402
0
2
16.9
3
95.94
95.23
71.12

Predicted drivers
3
95.82
95.09
71.74
402
0
0
0.0
3
95.94
95.23
71.12

Predicted drivers
4
96.07
95.09
72.97
403
1
23
3.5
4
96.18
95.23
72.32

Predicted drivers
4
96.07
95.09
72.97
403
0
0
0.0
4
96.18
95.23
72.32

Predicted drivers
4
96.07
95.09
72.97
403
0
1
8.8
4
96.18
95.23
72.32

Predicted drivers
4
96.07
95.09
72.97
403
0
0
0.0
4
96.18
95.23
72.32

Predicted drivers
4
96.07
95.09
72.97
403
0
0
0.0
4
96.18
95.23
72.32

Predicted drivers
4
96.07
95.09
72.97
403
0
0
0.0
4
96.18
95.23
72.32

Predicted drivers
4
96.07
95.09
72.97
403
0
1
4.2
4
96.18
95.23
72.32

Predicted drivers
4
96.07
95.09
72.97
403
0
0
0.0
4
96.18
95.23
72.32

Predicted drivers
4
96.07
95.09
72.97
403
0
1
5.8
4
96.18
95.23
72.32

Predicted drivers
4
96.07
95.09
72.97
403
0
0
0.0
4
96.18
95.23
72.32

Predicted drivers
4
96.07
95.09
72.97
403
0
0
0.0
4
96.18
95.23
72.32

Predicted drivers
4
96.07
95.09
73.22
403
0
1
9.9
4
96.18
95.23
72.55

Predicted drivers
4
96.07
95.09
73.22
403
0
0
0.0
4
96.18
95.23
72.55

Predicted drivers
4
96.07
95.09
73.22
403
0
1
7.9
4
96.18
95.23
72.55

Predicted drivers
4
96.07
95.09
73.22
403
0
0
0.0
4
96.18
95.23
72.55

Predicted drivers
4
96.07
95.09
73.22
403
0
0
0.0
4
96.18
95.23
72.55

Predicted drivers
4
96.07
95.09
73.22
403
0
1
5.8
4
96.18
95.23
72.55

Predicted drivers
4
96.07
95.09
73.22
403
0
0
0.0
4
96.18
95.23
72.55

Predicted drivers
4
96.07
95.09
73.22
403
0
0
0.0
4
96.18
95.23
72.55

Predicted drivers
4
96.07
95.09
73.22
403
0
0
0.0
4
96.18
95.23
72.55

Predicted drivers
4
96.07
95.09
73.22
403
0
0
0.0
4
96.18
95.23
72.55

Predicted drivers
4
96.07
95.09
73.22
403
0
1
7.6
4
96.18
95.23
72.55

Predicted drivers
4
96.07
95.09
73.22
403
0
0
0.0
4
96.18
95.23
72.55

Predicted drivers
4
96.07
95.09
73.22
403
0
0
0.0
4
96.18
95.23
72.55

Predicted drivers
4
96.07
95.09
73.22
403
0
2
11.7
4
96.18
95.23
72.55

Predicted drivers
4
96.07
95.09
73.22
403
0
1
4.0
4
96.18
95.23
72.55

Predicted drivers
4
96.07
95.09
73.22
403
0
0
0.0
4
96.18
95.23
72.55

Predicted drivers
4
96.07
95.09
73.22
403
0
1
7.8
4
96.18
95.23
72.55

Predicted drivers
4
96.07
95.09
73.22
403
0
1
5.0
4
96.18
95.23
72.55

Predicted drivers
4
96.07
95.09
73.22
403
0
0
0.0
4
96.18
95.23
72.55

Predicted drivers
4
96.07
95.09
73.22
403
0
0
0.0
4
96.18
95.23
72.55

Predicted drivers
4
96.07
95.09
73.22
403
0
0
0.0
4
96.18
95.23
72.55

Predicted drivers
4
96.07
95.09
73.22
403
0
1
9.9
4
96.18
95.23
72.55

Predicted drivers
4
96.07
95.09
73.22
403
0
1
5.8
4
96.18
95.23
72.55

Predicted drivers
4
96.07
95.09
73.22
403
0
0
0.0
4
96.18
95.23
72.55

Predicted drivers
4
96.07
95.09
73.22
403
0
0
0.0
4
96.18
95.23
72.55

Predicted drivers
4
96.07
95.09
73.22
403
0
2
11.2
4
96.18
95.23
72.55

Predicted drivers
4
96.07
95.09
73.22
403
0
0
0.0
4
96.18
95.23
72.55

Predicted drivers
4
96.07
95.09
73.22
403
0
0
0.0
4
96.18
95.23
72.55

Predicted drivers
4
96.07
95.09
73.22
403
0
1
9.9
4
96.18
95.23
72.55

Predicted drivers
4
96.07
95.09
73.22
403
0
0
0.0
4
96.18
95.23
72.55

Predicted drivers
4
96.07
95.09
73.22
403
0
1
9.6
4
96.18
95.23
72.55

Predicted drivers
4
96.07
95.09
73.22
403
0
0
0.0
4
96.18
95.23
72.55

Predicted drivers
4
96.07
95.09
73.22
403
0
0
0.0
4
96.18
95.23
72.55

Predicted drivers
4
96.07
95.09
73.22
403
0
0
0.0
4
96.18
95.23
72.55

Predicted drivers
4
96.07
95.09
73.22
403
0
1
4.2
4
96.18
95.23
72.55

Predicted drivers
4
96.07
95.09
73.22
403
0
0
0.0
4
96.18
95.23
72.55

Predicted drivers
4
96.07
95.09
73.22
403
0
0
0.0
4
96.18
95.23
72.55

Predicted drivers
4
96.07
95.09
73.22
403
0
1
4.6
4
96.18
95.23
72.55

Predicted drivers
4
96.07
95.09
73.22
403
0
0
0.0
4
96.18
95.23
72.55

Predicted drivers
4
96.07
95.09
73.22
403
0
0
0.0
4
96.18
95.23
72.55

Predicted drivers
4
96.07
95.09
73.22
403
0
0
0.0
4
96.18
95.23
72.55

Predicted drivers
4
96.07
95.09
73.22
403
0
0
0.0
4
96.18
95.23
72.55

Predicted drivers
4
96.07
95.09
73.22
403
0
1
9.8
4
96.18
95.23
72.55

Predicted drivers
4
96.07
95.09
73.46
403
0
3
21.1
4
96.18
95.23
72.79

Predicted drivers
4
96.07
95.09
73.46
403
0
1
8.5
4
96.18
95.23
72.79

Predicted drivers
4
96.07
95.09
73.46
403
0
0
0.0
4
96.18
95.23
72.79

Predicted drivers
4
96.07
95.09
73.46
403
0
0
0.0
4
96.18
95.23
72.79

Predicted drivers
4
96.07
95.09
73.46
403
0
1
9.8
4
96.18
95.23
72.79

Predicted drivers
4
96.07
95.09
73.71
403
0
1
8.6
4
96.18
95.23
73.03

Predicted drivers
4
96.07
95.09
73.71
403
0
0
0.0
4
96.18
95.23
73.03

Predicted drivers
4
96.07
95.09
73.96
403
0
1
7.2
4
96.18
95.23
73.27

Predicted drivers
4
96.07
95.09
74.45
403
0
2
10.8
4
96.18
95.23
73.75

Predicted drivers
4
96.07
95.09
74.45
403
0
0
0.0
4
96.18
95.23
73.75

Predicted drivers
4
96.07
95.09
74.45
403
0
1
3.8
4
96.18
95.23
73.75

Predicted drivers
4
96.07
95.09
74.45
403
0
1
9.8
4
96.18
95.23
73.75

Predicted drivers
4
96.07
95.09
74.45
403
0
0
0.0
4
96.18
95.23
73.75

Predicted drivers
4
96.07
95.09
74.45
403
0
0
0.0
4
96.18
95.23
73.75

Predicted drivers
4
96.07
95.09
74.45
403
0
0
0.0
4
96.18
95.23
73.75

Predicted drivers
4
96.07
95.09
74.45
403
0
0
0.0
4
96.18
95.23
73.75

Predicted drivers
4
96.07
95.09
74.45
403
0
2
4.6
4
96.18
95.23
73.75

Predicted drivers
4
96.07
95.09
74.45
403
0
0
0.0
4
96.18
95.23
73.75

Predicted drivers
4
96.07
95.09
74.45
403
0
1
8.9
4
96.18
95.23
73.75

Predicted drivers
4
96.07
95.09
74.45
403
0
0
0.0
4
96.18
95.23
73.75

Predicted drivers
4
96.07
95.09
74.45
403
0
0
0.0
4
96.18
95.23
73.75

Predicted drivers
4
96.07
95.09
74.45
403
0
0
0.0
4
96.18
95.23
73.75

Predicted drivers
4
96.07
95.09
74.45
403
0
0
0.0
4
96.18
95.23
73.75

Predicted drivers
4
96.07
95.09
74.45
403
0
0
0.0
4
96.18
95.23
73.75

Predicted drivers
4
96.07
95.09
74.45
403
0
0
0.0
4
96.18
95.23
73.75

Add fusions
—
—
—
—
—
—
—
—
—
—
—
—

Add fusions
—
—
—
—
—
—
—
—
—
—
—
—

Add fusions
—
—
—
—
—
—
—
—
—
—
—
—

Add fusions
—
—
—
—
—
—
—
—
—
—
—
—

Add fusions
—
—
—
—
—
—
—
—
—
—
—
—

FIG. 3 illustrates how the statistical enrichment of recurrently mutated NSCLC exons captures known drivers. Two metrics were employed to prioritize exons with recurrent mutations for inclusion in the CAPP-Seq NSCLC selector. The first, termed Recurrence Index (RI), is defined as the number of unique patients (i.e., tumors) with somatic mutations per kilobase of a given exon. The second is based on the minimum number of unique patients (i.e., tumors) with mutations in a given kb of exon. Exons containing at least one non-silent SNV genotyped by TCGA (n=47,769) in a combined cohort of 407 lung adenocarcinoma (LUAD) and squamous cell carcinoma (SCC) patients were analyzed. As shown in FIG. 3(a), known/suspected NSCLC drivers are highly enriched at RI≧30 (inset), comprising 1.8% (n=861) of analyzed exons. As shown in FIG. 3(b), known/suspected NSCLC drivers are highly enriched at ≧3 patients with mutations per exon (inset), encompassing 16% of analyzed exons.
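
By way of illustration only, the Recurrence Index described above may be sketched as follows. The function names, the input layout, and the choice to apply both thresholds (RI≧30 and ≧3 patients) together are assumptions made for this example; they are not taken from the selector design procedure itself.

```python
from collections import defaultdict


def recurrence_index(mutations, exon_lengths_bp):
    """Return {exon_id: (RI, n_patients)}.

    mutations: iterable of (patient_id, exon_id) pairs, one per somatic mutation.
    exon_lengths_bp: dict mapping exon_id to exon length in base pairs.
    """
    patients = defaultdict(set)
    for patient_id, exon_id in mutations:
        # A patient with several mutations in one exon is counted once.
        patients[exon_id].add(patient_id)

    scores = {}
    for exon_id, ids in patients.items():
        n_patients = len(ids)
        # RI = unique patients per kilobase of exon.
        ri = n_patients * 1000.0 / exon_lengths_bp[exon_id]
        scores[exon_id] = (ri, n_patients)
    return scores


def select_exons(scores, min_ri=30.0, min_patients=3):
    """Hypothetical filter: keep exons passing both thresholds."""
    return sorted(
        exon_id
        for exon_id, (ri, n_patients) in scores.items()
        if ri >= min_ri and n_patients >= min_patients
    )
```

For example, an exon of 100 bp mutated in 3 distinct patients has RI = 30 and would pass both thresholds, whereas a 2,000 bp exon mutated in a single patient has RI = 0.5 and would not.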

Approximately 8% of NSCLCs contain clinically actionable rearrangements involving the receptor tyrosine kinases, ALK, ROS1 and RET (Bergethon et al. (2012) J. Clin. Oncol. 30:863-870; Kwak et al. (2010) N. Engl. J. Med. 363:1693-1703; Pao & Hutchinson (2012) Nat. Med. 18:349-351). To utilize the personalized nature and low false detection rate of structural rearrangements (Leary et al. (2010) Sci. Transl. Med. 2:20ra14; McBride et al. (2010) Genes Chrom. Cancer 49:1062-1069), introns and exons spanning recurrent fusion breakpoints in these genes were included in the final design phase (FIG. 1b). To detect fusions in tumor and plasma DNA, a breakpoint-mapping algorithm called FACTERA was developed (FIG. 4). Application of FACTERA to next generation sequencing (NGS) data from 2 NSCLC cell lines known to harbor fusions with previously uncharacterized breakpoints (Koivunen et al. (2008) Clin. Cancer Res. 14:4275-4283; Rikova et al. (2007) Cell 131:1190-1203) readily identified the breakpoints in both cases (FIG. 5).

Collectively, the NSCLC CAPP-Seq selector design targets 521 exons and 13 introns from 139 recurrently mutated genes, in total covering ~125 kb (FIG. 1b). Within this small target (0.004% of the human genome), CAPP-Seq identified a median of 4 point mutations per patient and covered 96% of patients with lung adenocarcinoma or squamous cell carcinoma. To validate the number of mutations covered per tumor, we examined the selector region in WES data from an independent cohort of 183 lung adenocarcinoma patients (Imielinski et al. (2012) Cell 150:1107-1120). The selector covered 88% of patients with a median of 4 SNVs per patient, thus validating our selector design algorithm (P<1.0×10⁻⁶; FIG. 1c). When compared to randomly sampling the exome, regions targeted by CAPP-Seq captured ~4-fold as many mutations per patient (at the median, FIG. 1c). Due to similarities in key oncogenic machinery across cancers (Hanahan & Weinberg (2011) Cell 144:646-674), we hypothesized that our NSCLC selector would perform favorably on other carcinomas. Indeed, when applied to TCGA WES data, the selector successfully captured 99% of colon, 98% of rectal, and 97% of endometrioid uterine carcinomas, with a median of 12, 7, and 3 mutations per patient, respectively (FIG. 1d). This demonstrates the value of targeting hundreds of recurrently mutated genomic regions and suggests that a CAPP-Seq selector could be designed to simultaneously cover mutations for a wide variety of human malignancies.
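
The coverage figures quoted above (fraction of patients covered, median mutations per patient, and fraction of the genome targeted) may be computed along the following lines. This is a minimal sketch under stated assumptions: the data layout and function names are hypothetical, the median is taken over all patients including those with no covered mutation, and a haploid genome size of 3.1 Gb is assumed.

```python
from statistics import median


def selector_coverage(patient_mutations, selector):
    """Return (fraction of patients covered, median covered mutations per patient).

    patient_mutations: {patient_id: [(chrom, pos), ...]}
    selector: list of (chrom, start, end) intervals, 1-based and inclusive.
    """
    counts = []
    for muts in patient_mutations.values():
        hits = sum(
            1
            for chrom, pos in muts
            if any(c == chrom and s <= pos <= e for c, s, e in selector)
        )
        counts.append(hits)
    covered = sum(1 for n in counts if n > 0)
    return covered / len(counts), median(counts)


def percent_of_genome(selector_bp, genome_bp=3.1e9):
    """Selector size as a percentage of the genome (genome size is an assumption)."""
    return 100.0 * selector_bp / genome_bp
```

With a selector of about 125 kb, `percent_of_genome(125_000)` evaluates to roughly 0.004, in agreement with the percentage given in the text.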

Using this CAPP-Seq selector, we profiled a total of 52 samples including NSCLC cell lines, primary tumor specimens, peripheral blood leukocytes (PBLs), and cfDNA isolated from plasma of patients with NSCLC before and after various cancer therapies (Table 2). To assess and optimize the performance of CAPP-Seq, we first applied it to cfDNA purified from healthy control plasma. Approximately 60% of reads mapped within the selector target region (Table 2). Sequenced cfDNA fragments had a median length of 169 bp (FIG. 1e), closely corresponding to the length of DNA contained within a chromatosome (Fan et al. (2008) Proc. Natl Acad. Sci. USA 105:16266-16271). To optimize library preparation from small quantities of cfDNA we explored a variety of modifications to the ligation and post-ligation amplification steps including temperature, incubation time, enzyme source, and “with-bead” clean-up. The optimized protocol increased recovery efficiency by >300% and decreased bias for libraries constructed from as little as 4 ng of cfDNA (FIGS. 6-8). Consequently, fluctuations in sequencing depth were minimal (FIG. 1f-g) and unlikely to impact performance.

TABLE 2

Profile of samples using NSCLC CAPP-Seq selector

| Sample | DNA mass used for library (ng) | Library mass used for capture (ng) | Total reads mapped | Fraction of properly paired reads | Read on-target rate | Median depth | Median fragment length (bp) |
| H3122 0.1% into HCC78 | 128 | 111 | 99.0% | 96.8% | 69.5% | 8688 | 173 |
| H3122 1% into HCC78 | 128 | 111 | 98.9% | 96.7% | 69.8% | 8657 | 171 |
| H3122 10% into HCC78 | 128 | 111 | 98.9% | 96.5% | 69.8% | 6890 | 170 |
| H3122 100% | 128 | 111 | 99.0% | 96.8% | 68.6% | 6739 | 174 |
| HCC78 100% | 128 | 111 | 99.0% | 96.9% | 69.7% | 7602 | 172 |
| cfDNA 100% 6 cycles | 32 | 83.3 | 97.5% | 86.7% | 60.3% | 8280 | 168 |
| HCC78 10% into cfDNA 4 cycles | 128 | 83.3 | 97.5% | 83.3% | 59.3% | 2682 | 170 |
| HCC78 10% into cfDNA 8 cycles SigmaWGA | 624 | 83.3 | 79.5% | 72.0% | 50.4% | 15 | 158 |
| HCC78 10% into cfDNA 6 cycles | 32 | 83.3 | 97.7% | 87.2% | 60.4% | 8261 | 169 |
| HCC78 10% into cfDNA 8 cycles NEBNextOvernightBead | 32 | 83.3 | 96.9% | 91.8% | 61.1% | 6258 | 166 |
| HCC78 10% into cfDNA 8 cycles OrigNEBNext 15 minLig | 32 | 83.3 | 98.0% | 93.1% | 60.9% | 9862 | 167 |
| HCC78 10% into cfDNA 4 ng 9 cycles | 4 | 83.3 | 97.6% | 87.6% | 60.5% | 11630 | 169 |
| P11 PBL | 500 | 83.3 | 96.7% | 93.8% | 59.0% | 6970 | 169 |
| P11 Tumor | 500 | 83.3 | 93.4% | 88.3% | 61.3% | 7700 | 156 |
| P6 PBL | 500 | 83.3 | 96.7% | 92.6% | 67.2% | 3848 | 152 |
| P6 Tumor | 1000 | 83.3 | 87.0% | 81.8% | 64.7% | 2445 | 158 |
| P8 PBL | 500 | 83.3 | 96.9% | 93.0% | 65.8% | 4021 | 154 |
| P8 Tumor | 500 | 83.3 | 91.7% | 85.4% | 63.6% | 5331 | 151 |
| P10 PBL | 400 | 83.3 | 96.9% | 93.6% | 65.3% | 4572 | 161 |
| P10 Tumor | 500 | 83.3 | 94.0% | 89.6% | 65.1% | 5335 | 157 |
| P7 PBL | 500 | 83.3 | 97.1% | 93.5% | 67.1% | 3552 | 155 |
| P7 Tumor | 500 | 83.3 | 94.1% | 89.3% | 64.0% | 4793 | 162 |
| HCC78 0.025% into cfDNA | 32 | 83.3 | 98.2% | 87.0% | 46.3% | 3913 | 169 |
| HCC78 0.05% into cfDNA | 32 | 83.3 | 98.1% | 86.1% | 44.7% | 6549 | 169 |
| HCC78 0.1% into cfDNA | 32 | 83.3 | 98.4% | 88.1% | 44.9% | 6897 | 169 |
| HCC78 0.5% into cfDNA | 32 | 83.3 | 98.8% | 89.8% | 46.2% | 8096 | 169 |
| HCC78 1% into cfDNA | 32 | 83.3 | 98.5% | 89.8% | 46.5% | 7779 | 171 |
| P6-1 cfDNA | 17 | 83.3 | 98.6% | 91.3% | 46.4% | 11172 | 166 |
| P6-2 cfDNA | 20 | 83.3 | 98.5% | 92.0% | 46.6% | 8455 | 166 |
| P9 PBL | 500 | 83.3 | 97.0% | 94.4% | 59.2% | 5441 | 172 |
| P9 Tumor | 69 | 83.3 | 99.2% | 97.3% | 55.3% | 7312 | 239 |
| P3 PBL | 500 | 83.3 | 99.3% | 97.8% | 57.0% | 8838 | 235 |
| P3 Tumor | 500 | 83.3 | 99.3% | 98.0% | 66.0% | 9562 | 204 |
| P2 PBL | 500 | 83.3 | 99.2% | 97.5% | 57.7% | 7680 | 235 |
| P2 Tumor | 500 | 83.3 | 99.0% | 97.1% | 62.3% | 7247 | 204 |
| P4 PBL | 500 | 83.3 | 99.1% | 96.5% | 56.5% | 7331 | 227 |
| P4 Tumor | 200 | 83.3 | 97.5% | 94.1% | 60.0% | 3968 | 189 |
| P1 PBL | 500 | 83.3 | 99.3% | 97.1% | 57.1% | 7336 | 220 |
| P1 Tumor | 500 | 83.3 | 94.6% | 90.1% | 60.9% | 976 | 192 |
| P5 PBL | 500 | 83.3 | 99.2% | 97.2% | 58.7% | 8155 | 219 |
| P5 Tumor | 100 | 83.3 | 98.8% | 97.0% | 63.5% | 6930 | 187 |
| P9-1 cfDNA | 12 | 83.3 | 99.1% | 84.2% | 65.6% | 6839 | 172 |
| P9-2 cfDNA | 17 | 83.3 | 98.4% | 83.9% | 65.2% | 6043 | 169 |
| P9-3 cfDNA | 16 | 83.3 | 99.4% | 88.7% | 67.6% | 8141 | 167 |
| P3-1 cfDNA | 15 | 83.3 | 99.2% | 86.0% | 63.5% | 7057 | 170 |
| P3-2 cfDNA | 16 | 83.3 | 99.3% | 86.5% | 63.5% | 10089 | 171 |
| P2-1 cfDNA | 13 | 83.3 | 99.4% | 86.9% | 67.3% | 6876 | 172 |
| P2-2 cfDNA | 16 | 83.3 | 99.5% | 96.4% | 63.6% | 5248 | 185 |
| P1-1 cfDNA | 13 | 83.3 | 99.0% | 85.0% | 64.6% | 5079 | 171 |
| P1-2 cfDNA | 7 | 83.3 | 99.4% | 84.7% | 64.1% | 6487 | 172 |
| P5-1 cfDNA | 9 | 83.3 | 99.3% | 87.8% | 66.6% | 7604 | 169 |
| P5-2 cfDNA | 15 | 83.3 | 99.4% | 88.0% | 67.5% | 10451 | 170 |

FIG. 6 illustrates the improvements in CAPP-Seq performance achieved with optimized library preparation procedures. Using 32 ng of input cfDNA from plasma, standard versus “with bead” (Fisher et al. (2011) Genome Biol. 12:R1) library preparation methods were compared, as well as two commercially available DNA polymerases (Phusion and KAPA HiFi). Template pre-amplification by Whole Genome Amplification (WGA) using Degenerate Oligonucleotide PCR (DOP) was also evaluated. Indices considered for these comparisons included (a) length of the captured cfDNA fragments sequenced, (b) depth and uniformity of sequencing coverage across all genomic regions in the selector, and (c) sequence mapping and capture statistics, including uniqueness. Collectively, these comparisons identified KAPA HiFi polymerase and a “with bead” protocol as having the most robust and uniform performance.

FIG. 7 illustrates the optimization of allele recovery from low input cfDNA during Illumina library preparation. Bars reflect the relative yield of CAPP-Seq libraries constructed from 4 ng cfDNA, calculated by averaging quantitative PCR measurements of 4 pre-selected reporters within CAPP-Seq with pre-defined amplification efficiencies. (a) Sixteen hour ligation at 16° C. increases ligation efficiency and reporter recovery. (b) Adapter ligation volume did not have a significant effect on ligation efficiency and reporter recovery. (c) Performing enzymatic reactions “with-bead” to minimize tube transfer steps increases reporter recovery. (d) Increasing adapter concentration during ligation increases ligation efficiency and reporter recovery. Reporter recovery is also higher when using KAPA HiFi DNA polymerase compared to Phusion DNA polymerase (e) and when using the KAPA Library Preparation Kit with the modifications in a-d compared to the NuGEN SP Ovation Ultralow Library System with automation on a Mondrian SP Workstation (f). Relative reporter abundance was determined by qPCR using the 2^−ΔCt method. All values are mean±s.d. N.S., not significant. Based on these results, it was estimated that combining the methodological modifications in a and c-e improves yield in NGS libraries by 3.3-fold.
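
As a non-limiting illustration of the 2^−ΔCt calculation referred to above, relative reporter abundance may be computed as sketched below. The function names and the `efficiency` parameter are assumptions introduced for this example; an efficiency of 1.0 corresponds to perfect doubling per cycle, and the reporter-specific efficiencies actually used are not stated here.

```python
def relative_abundance(ct_sample, ct_reference, efficiency=1.0):
    """Relative abundance of a reporter versus a reference condition.

    With efficiency=1.0 this is exactly 2 ** -(ct_sample - ct_reference).
    """
    delta_ct = ct_sample - ct_reference
    return (1.0 + efficiency) ** (-delta_ct)


def mean_relative_yield(ct_samples, ct_references):
    """Average the relative abundance over several reporters (e.g. four)."""
    ratios = [relative_abundance(s, r) for s, r in zip(ct_samples, ct_references)]
    return sum(ratios) / len(ratios)
```

A reporter that crosses threshold one cycle earlier than the reference condition therefore has a relative abundance of 2.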

FIG. 8 illustrates the performance of CAPP-Seq with various amounts of input cfDNA. (a) Length of the captured cfDNA fragments sequenced. (b) Depth of sequencing coverage across all genomic regions in the selector. (c) Sequence mapping and capture statistics. As expected, more input cfDNA mass correlates with more unique fragments sequenced.

The detection limit of CAPP-Seq is affected by the absolute number of available cfDNA molecules in a given volume of peripheral blood, as well as PCR and sequencing errors (i.e. “technical” background). The latter primarily affects substitutions/SNVs as opposed to other CAPP-Seq reporters (i.e., indels (Minoche et al. (2011) Genome Biol. 12:R112) and rearrangements). Separately, mutant cfDNA could be present in the absence of cancer due to contributions from pre-neoplastic cells from diverse tissues (i.e., “biological” background). The combined background from these sources was measured by assessing the error rate at each nucleotide position across the selector in plasma cfDNA from 6 patients and a healthy individual, excluding tumor-derived mutations. Mean and median background rates of ˜0.007% and ˜0% (not detected, N.D.) were found, respectively (FIG. 9 (a)). Next, we hypothesized that if significant biological background is present, it should be highest for recurrently mutated positions in cancer driver genes. We therefore analyzed mutation rates of 107 recurrent cancer-associated SNVs (Su et al. (2011) J. Mol. Diagn. 13:74-84) in the same 7 plasma samples, again excluding those SNVs found in corresponding tumors. Though the median fractional abundance was comparable (˜0%, N.D.), the mean was marginally higher at 0.012% (FIG. 9 (b)). However, only one cancer-associated mutation (TP53 R175H) was detectable in plasma at levels significantly above global background (P<0.01). Since this allele was detected at a median frequency of ˜0.3% across all samples (FIG. 9(c)), we hypothesize that it reflects true biological background and thus excluded it as a potential CAPP-Seq reporter. Collectively, this analysis suggests that biological background is not a significant factor for disease monitoring at the current detection limits of CAPP-Seq.

Next, the allele frequency detection limit and linearity of CAPP-Seq were benchmarked by spiking defined concentrations of fragmented genomic DNA from a NSCLC cell line into cfDNA from a healthy individual (FIG. 9(d)) or into genomic DNA from a second NSCLC line (FIG. 10(a)). CAPP-Seq accurately detected variants at fractional abundances between 0.025% and 10% with high linearity (R²≧0.994). Analyses of the influence of the number of SNV reporters on error metrics showed only marginal improvements above a threshold of 4 reporters per tumor (FIGS. 9(e)-(f), 10 (b)-(c)), equivalent to the median number of SNVs per NSCLC identified by the NSCLC selector. Finally, whether fusion breakpoints and indels could also serve as linear reporters was tested. It was found that the fractional abundance of these mutations correlated highly with expected concentrations (R²≧0.995; FIG. 10(d)).
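
The linearity statistic reported above may be computed, for example, as the coefficient of determination of a simple linear fit of observed against expected fractional abundance. The sketch below is illustrative only: the function name is hypothetical, and whether the original analysis was carried out on raw or log-transformed values is not stated, so raw values are assumed.

```python
def r_squared(expected, observed):
    """R² of a least-squares line through (expected, observed) pairs.

    For simple linear regression this equals the squared Pearson correlation.
    """
    n = len(expected)
    mean_x = sum(expected) / n
    mean_y = sum(observed) / n
    sxx = sum((x - mean_x) ** 2 for x in expected)
    syy = sum((y - mean_y) ** 2 for y in observed)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(expected, observed))
    return (sxy * sxy) / (sxx * syy)
```

Here `expected` would hold the spiked-in percentages (e.g. 0.025, 0.05, 0.1, 0.5, and 1) and `observed` the fractional abundances measured by sequencing.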

Having designed, optimized, and benchmarked CAPP-Seq, it was applied to the discovery of somatic mutations in tumors collected from a diverse group of NSCLC patients (n=11; FIG. 11(a) and Table 3). To test the breakpoint enumeration capability of CAPP-Seq, 6 patients with clinically confirmed fusions were included. These translocations served as positive controls, along with SNVs in other tumors previously identified by clinical assays (N=9; Table 3). Tumor samples included formalin fixed surgical or biopsy specimens and pleural fluid. At a mean sequencing depth of ˜6,000× in tumor and paired germline samples, CAPP-Seq confirmed all previously identified SNVs and fusions (3 and 8, respectively) and discovered many additional somatic variants (FIG. 11(a) and Table 3). Moreover, CAPP-Seq characterized breakpoints and partner genes at base pair resolution for each of the 8 rearrangements (FIG. 12). Tumors containing fusions were almost exclusively from never smokers and, as expected (Govindan et al. (2012) Cell 150:1121-1134), contained fewer SNVs than those lacking fusions (FIG. 13). Excluding patients with fusions (<10% of the TCGA design cohort), CAPP-Seq identified a median of 4 SNVs per patient as we had predicted (FIG. 1(b)-(c)).

TABLE 3

Characteristics of patients used for noninvasive detection and monitoring of circulating tumor DNA by CAPP-Seq.

| Case | Age | Sex | Histology | Grade and Other Histological Features | TNM Stage | Stage Group | Smoker | Pack-Years | Tumor Source | Germline Source | SNVs by Clinical Assays | Fusions Detected by FISH |
| P1 | 66 | M | Adenocarcinoma | Papillary type | T2aN0M0 | IB | Yes | 20 | FFPE cores | Frozen PBL | | |
| P2 | 61 | M | Large Cell | NOS | T3N1M0 | IIIA | Yes | 80 | FFPE cores | Frozen PBL | | |
| P3 | 67 | F | Adenocarcinoma | Acinar type | T1bN3M0 | IIIB | Yes | 15 | FFPE cores | Frozen PBL | | |
| P4 | 47 | F | Adenocarcinoma | Micropapillary and papillary type | T2aN2M1b | IV | Yes | 45 | FFPE cores | Frozen PBL | KRAS G13D | |
| P5 | 49 | F | Adenocarcinoma | Well differentiated | T1bN0M1a | IV | No | 0 | FFPE cores | Frozen PBL | EGFR L858R; EGFR T790M | |
| P6 | 54 | M | Adenocarcinoma | NOS | T3N2M1b | IV | No | 0 | Fresh | Frozen PBL | | ALK |
| P7 | 50 | M | Adenocarcinoma | Poorly differentiated | T1aN2M1b | IV | Yes | 4 | FFPE cores | Frozen PBL | | ALK |
| P8 | 48 | F | Adenocarcinoma | Mucinous type | T4N0M1b | IV | No | 0 | FFPE cores | Frozen PBL | | ALK |
| P9 | 49 | M | Adenocarcinoma | Not otherwise specified (NOS) | T4N3M1a | IV | No | 0 | Fresh | Frozen PBL | | ALK |
| P10 | 35 | F | Adenocarcinoma | NOS | T4N0M0 | IIIA | No | 0 | FFPE cores | Frozen PBL | | ROS1 |
| P11 | 38 | F | Adenocarcinoma | Well-to-moderately differentiated | T3N2M0 | IIIA | No | 0 | FFPE cores | Frozen PBL | | ROS1 |

Related to FIGS. 11(a) and 14. Regarding smoking history, ≧20 pack years was considered heavy and >0 pack years was considered light.

To explore the potential clinical utility of CAPP-Seq for disease monitoring and minimal residual disease detection, we next applied CAPP-Seq to serial plasma samples collected from a subset of these same 11 patients (N=6), all of whom had pre- and post-treatment samples available (FIG. 11; Table 4). Starting from ˜15 ng of plasma cfDNA (˜3 mL of peripheral blood) and sequenced to a mean depth of nearly 8,000× (Table 3), CAPP-Seq detected cancer-derived cfDNA in both early and advanced stage patients (Table 4). Among patients with SNV or indel reporters, all showed a significant reduction in cancer cfDNA burden following treatment, consistent with radiographic response assessment by computed tomography (CT) (FIG. 11(a)). These included two patients—one with stage IB adenocarcinoma (P1) and another with stage IIIA large cell carcinoma (P2)—who underwent surgery with complete tumor resection (FIG. 11(b)). Post-treatment cancer-derived cfDNA was undetectable in the Stage I patient but was above background for the Stage IIIA patient suggesting that residual cancer cells remained after surgery even though a complete resection was thought to have been achieved. In a third case (P6), CAPP-Seq detected 3 SNVs and a KIF5B-ALK fusion, and both mutation types reported similar fractional abundances of mutant cfDNA (FIG. 14). Next, we analyzed a patient with 3 fusions and no detectable SNVs/indels (P9), but from whom 3 serial cfDNA samples were collected. Abundance of fusion product in the plasma was highly correlated with tumor burden and correctly indicated initial response to therapy followed by relapse (R²=0.97; FIG. 11(c)). Finally, in a fifth patient (P5), CAPP-Seq identified a sub-clonal population harboring the T790M EGFR gatekeeper mutation (Kobayashi et al. (2005) N. Engl. J. Med. 352:786-792) (FIG. 11(d)). 
The ratio between clones was identical in the tumor and pre-treatment plasma cfDNA but changed after treatment with cytotoxic chemotherapy followed by a third-generation EGFR inhibitor (FIG. 11(d), inset), suggesting that CAPP-Seq can detect clinically relevant subclones and monitor clonal dynamics during therapy. Taken together, these data demonstrate the potential utility of CAPP-Seq as a noninvasive clinical assay for measuring tumor burden in early and advanced stage NSCLC and for monitoring tumor-derived cfDNA during therapy.

TABLE 4

Monitoring of cfDNA in patients using CAPP-Seq.

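| Case | Mutant allele | Ref. allele | Chr | Position | Time point 1 mutant allele depth | Time point 1 total depth | Time point 1 mutant allele % | Time point 1 final % | Time point 2 mutant allele depth | Time point 2 total depth | Time point 2 mutant allele % | Time point 2 final % | Time point 3 mutant allele depth | Time point 3 total depth | Time point 3 mutant allele % | Time point 3 final % |
| P1 | A | G | chr1 | 156785560 | 0 | 4572 | 0.000 | 0.000 | 3 | 6202 | 0.048 | 0.048 | — | — | — | — |
| P1 | T | G | chr1 | 157806043 | 0 | 1838 | 0.000 | 0.000 | 0 | 2266 | 0.000 | 0.000 | — | — | — | — |
| P1 | G | C | chr1 | 248525206 | 0 | 2828 | 0.000 | 0.000 | 0 | 4529 | 0.000 | 0.000 | — | — | — | — |
| P1 | C | T | chr2 | 33500291 | 1 | 943 | 0.106 | 0.106 | 0 | 943 | 0.000 | 0.000 | — | — | — | — |
| P1 | A | C | chr4 | 55946307 | 0 | 6856 | 0.000 | 0.000 | 0 | 8817 | 0.000 | 0.000 | — | — | — | — |
| P1 | G | A | chr4 | 55963949 | 0 | 5742 | 0.000 | 0.000 | 0 | 7335 | 0.000 | 0.000 | — | — | — | — |
| P1 | A | C | chr4 | 55968672 | 0 | 5856 | 0.000 | 0.000 | 0 | 7431 | 0.000 | 0.000 | — | — | — | — |
| P1 | C | T | chr6 | 117642146 | 0 | 5266 | 0.000 | 0.000 | 4 | 6849 | 0.058 | 0.058 | — | — | — | — |
| P1 | T | G | chr9 | 8376700 | 3 | 5535 | 0.054 | 0.054 | 0 | 7322 | 0.000 | 0.000 | — | — | — | — |
| P1 | T | C | chr9 | 8733625 | 1 | 827 | 0.121 | 0.121 | 0 | 1398 | 0.000 | 0.000 | — | — | — | — |
| P1 | T | G | chr10 | 43611663 | 0 | 3722 | 0.000 | 0.000 | 0 | 4565 | 0.000 | 0.000 | — | — | — | — |
| P1 | T | G | chr15 | 88522525 | 1 | 4919 | 0.020 | 0.020 | 4 | 6736 | 0.059 | 0.059 | — | — | — | — |
| P1 | +G | C | chr17 | 7578474 | 0 | 1762 | 0.000 | 0.000 | 0 | 2373 | 0.000 | 0.000 | — | — | — | — |
| P1 | −A | G | chr17 | 29552244 | 1 | 4484 | 0.022 | 0.022 | 0 | 6485 | 0.000 | 0.000 | — | — | — | — |
| P1 | +T | C | chr17 | 29553484 | 0 | 3657 | 0.000 | 0.000 | 0 | 4713 | 0.000 | 0.000 | — | — | — | — |
| P1 | −T | C | chr17 | 29592185 | 3 | 3694 | 0.081 | 0.081 | 0 | 3247 | 0.000 | 0.000 | — | — | — | — |
| P2 | A | C | chr2 | 50463926 | 49 | 6724 | 0.729 | 1.457 | 0 | 4981 | 0.000 | 0.000 | — | — | — | — |
| P2* | G | A | chr3 | 89457148 | 40 | 4838 | 0.827 | 0.827 | 0 | 4311 | 0.000 | 0.000 | — | — | — | — |
| P2 | T | G | chr3 | 89468286 | 5 | 4667 | 0.107 | 0.214 | 2 | 3625 | 0.055 | 0.110 | — | — | — | — |
| P2 | T | A | chr3 | 89480240 | 15 | 5073 | 0.296 | 0.591 | 0 | 4321 | 0.000 | 0.000 | — | — | — | — |
| P2 | T | A | chr4 | 66189669 | 4 | 950 | 0.421 | 0.842 | 5 | 1436 | 0.348 | 0.696 | — | — | — | — |
| P2* | T | G | chr4 | 66242868 | 16 | 2107 | 0.759 | 0.759 | 0 | 1655 | 0.000 | 0.000 | — | — | — | — |
| P2* | A | C | chr5 | 176522747 | 46 | 2220 | 2.072 | 2.072 | 0 | 1377 | 0.000 | 0.000 | — | — | — | — |
| P2 | C | T | chr6 | 117648229 | 70 | 7819 | 0.895 | 1.791 | 0 | 5985 | 0.000 | 0.000 | — | — | — | — |
| P2 | A | C | chr12 | 78400637 | 35 | 7907 | 0.443 | 0.885 | 1 | 6326 | 0.016 | 0.032 | — | — | — | — |
| P2 | T | G | chr12 | 78400910 | 106 | 8211 | 1.291 | 2.582 | 1 | 6289 | 0.016 | 0.032 | — | — | — | — |
| P2* | T | C | chr17 | 7577551 | 112 | 5629 | 1.990 | 1.990 | 2 | 3814 | 0.052 | 0.052 | — | — | — | — |
| P2 | T | G | chr19 | 1207247 | 15 | 1124 | 1.335 | 2.669 | 0 | 747 | 0.000 | 0.000 | — | — | — | — |
| P2 | +A | C | chr2 | 79314100 | 16 | 3280 | 0.488 | 0.98 | 0 | 2390 | 0.000 | 0.000 | — | — | — | — |
| P3 | A | C | chr17 | 7578253 | 6 | 6345 | 0.095 | 0.095 | 0 | 8583 | 0.000 | 0.000 | — | — | — | — |
| P5 | T | C | chr7 | 55249071 | 42 | 4736 | 0.887 | 0.887 | 10 | 5597 | 0.179 | 0.179 | — | — | — | — |
| P5 | G | T | chr7 | 55259515 | 503 | 11349 | 4.432 | 4.432 | 58 | 12222 | 0.475 | 0.475 | — | — | — | — |
| P5 | A | G | chr11 | 55135338 | 86 | 4063 | 2.117 | 2.117 | 10 | 4798 | 0.208 | 0.208 | — | — | — | — |
| P5 | T | C | chr17 | 7577097 | 227 | 7429 | 3.056 | 3.056 | 36 | 9723 | 0.370 | 0.370 | — | — | — | — |
| P6 | A | G | chr12 | 78400791 | 84 | 13970 | 0.601 | 1.203 | 28 | 10128 | 0.276 | 0.553 | — | — | — | — |
| P6 | T | G | chr12 | 129822187 | 78 | 8680 | 0.899 | 1.797 | 9 | 6604 | 0.136 | 0.273 | — | — | — | — |
| P6* | A | G | chr17 | 7576275 | 140 | 9376 | 1.493 | 1.493 | 22 | 7897 | 0.279 | 0.279 | — | — | — | — |
| P6 | KIF5B-ALK | — | chr10/chr2 | — | 28 | 15006 | 0.187 | 3.116 | 2 | 9989 | 0.020 | 0.334 | — | — | — | — |
| P9 | EML4-ALK | — | chr2/chr2 | — | 0 | 10688 | 0.000 | 0.000 | 0 | 13647 | 0.000 | 0.000 | 0 | 13521 | 0.000 | 0.000 |
| P9 | FYN-ROS1 | — | chr6/chr6 | — | 0 | 9261 | 0.000 | 0.000 | 0 | 6826 | 0.000 | 0.000 | 2 | 10693 | 0.019 | 0.019 |
| P9 | ROS1-MKX | — | chr6/chr10 | — | 10 | 8029 | 0.125 | 0.125 | 1 | 6485 | 0.015 | 0.015 | 13 | 9943 | 0.131 | 0.131 |

Reporters marked with an asterisk (*), bolded in the original table, indicate potential homozygous alleles (see Table 3 and Detailed Methods).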

Note that mutant cfDNA percentages for P5 were calculated from the 3 SNVs representing the dominant clone (see FIGS. 11 (a) and 11 (d)); EGFR T790M (chr7: 55249071 C−>T) was not included.

Final allelic percentages reflect any adjustments made based on estimated zygosity (using inferred homozygous reporters) and/or sequencing coverage. See Detailed Methods for details.
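
The relationship between the depth columns and the percentage columns of Table 4 may be illustrated as follows. This is a sketch under stated assumptions: the function names are hypothetical, and the doubling applied to heterozygous reporters is inferred from rows of Table 4 in which the final percentage is twice the raw percentage (e.g. 0.729% and 1.457%); the coverage-based adjustment applied to fusion reporters is not modelled.

```python
def mutant_allele_percent(mutant_depth, total_depth):
    """Raw mutant allele percentage at one reporter position."""
    if total_depth == 0:
        return 0.0
    return 100.0 * mutant_depth / total_depth


def final_allele_percent(mutant_depth, total_depth, heterozygous=False):
    """Zygosity-adjusted percentage.

    Assumption: a heterozygous reporter is carried on one of two alleles,
    so its raw percentage is doubled to estimate the tumor-derived fraction.
    """
    pct = mutant_allele_percent(mutant_depth, total_depth)
    return 2.0 * pct if heterozygous else pct
```

For the first P2 reporter at time point 1 (49 mutant reads of 6724 total), the raw value is about 0.729% and the doubled value about 1.457%, matching the tabulated entries.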

In addition to its potential clinical utility, CAPP-Seq analysis promises to yield novel biological insights. For example, in one patient's tumor (P9), we identified both a classic EML4-ALK fusion and two previously unreported fusions involving ROS1: FYN-ROS1 and ROS1-MKX (FIG. 11(e), FIG. 15). While the potential function of these novel ROS1 fusions is unknown, to the best of our knowledge this is the first observation of ROS1 and ALK fusions in the same NSCLC patient. All fusions were confirmed by qPCR amplification of genomic DNA, and were independently recovered in plasma samples (Table 4). Separately, among cases with a ROS1 rearrangement, we found an unexpected enrichment for S34F missense mutations in U2AF1, the 35 kD subunit of the U2 spliceosomal complex auxiliary factor. This SNV was initially described as a recurrent heterozygous mutation in myelodysplastic syndrome (Graubert et al. (2012) Nat. Genet. 44:53-57; Yoshida et al. (2011) Nature 478:64-69). While U2AF1 mutations (Imielinski et al. (2012) Cell 150:1107-1120) and ROS1 translocations (Bergethon et al. (2012) J. Clin. Oncol. 30:863-870) were recently reported to occur individually in ~3% and ~1.7% of lung adenocarcinomas, respectively, combining the samples we profiled with publicly available data (Detailed Methods), we observed a significant enrichment for U2AF1 S34F mutations in tumors harboring ROS1 fusions (in 3 of 6; P=0.0019; FIG. 11(f), FIG. 16 and Detailed Methods).

Finally, we explored whether CAPP-Seq analysis of cfDNA could potentially be used for cancer screening. As proof-of-principle, we blinded ourselves to the mutations present in each patient's tumor and developed a statistical method to test for the presence of cancer DNA in each pre-treatment plasma sample in our cohort (FIG. 17). This method identified mutant DNA in all plasma samples containing tumor-derived mutant alleles above fractional abundances of 0.5%. Mutant DNA below this level could not be detected by our algorithm, but no mutations were falsely called, indicating the high specificity of this approach (FIG. 11(g) and Detailed Methods). Since ˜95% of nodules identified in patients at high risk for NSCLC by low-dose CT are false positives (Aberle et al. (2011) N. Engl. J. Med. 365:395-409), CAPP-Seq could potentially serve as a complementary noninvasive screening test. However, methodological improvements to further lower the detection threshold will be required to detect early stage tumors.

In conclusion, we have developed a flexible method for ultrasensitive and specific assessment of circulating tumor DNA. CAPP-Seq overcomes limitations of previously proposed methods for cfDNA analysis by simultaneously measuring multiple types of mutations without patient-specific optimization and by covering mutations in the majority of patients. Moreover, due to multiplexing, CAPP-Seq is highly economical, and per sample costs for plasma cfDNA are expected to drop further as NGS costs continue to fall. Our method has the potential to accelerate the personalized detection, therapy, and monitoring of cancer patients. We anticipate that CAPP-Seq will prove valuable in a variety of clinical settings, including the assessment of cancer DNA in alternative biological fluids and specimens with low cancer cell content.

Methods
Patient Selection

Between April 2010 and June 2012, patients undergoing treatment for newly diagnosed or recurrent NSCLC were enrolled in a study approved by the Stanford University Institutional Review Board. Enrolled patients had not received blood transfusions within 3 months of blood collection. Patient characteristics are in Table 3.

Sample Collection and Processing

Peripheral blood from consented patients was collected in EDTA Vacutainer tubes (BD). Blood samples were processed within 3 hours of collection. Plasma was separated by centrifugation at 2,500×g for 10 min, transferred to microcentrifuge tubes, and centrifuged at 16,000×g for 10 min to remove cell debris. The cell pellet from the initial spin was used for isolation of germline genomic DNA from PBLs (peripheral blood leukocytes) with the DNeasy Blood & Tissue Kit (Qiagen). Matched tumor DNA was isolated from FFPE specimens or from the cell pellet of pleural effusions. Genomic DNA was quantified by Quant-iT PicoGreen dsDNA Assay Kit (Invitrogen).

Cell-Free DNA Purification and Quantification

Cell-free DNA (cfDNA) was isolated from 1-5 mL plasma with the QIAamp Circulating Nucleic Acid Kit (Qiagen). Absolute quantification of purified cfDNA was performed by quantitative PCR (qPCR) using an 81 bp amplicon on chromosome 1 (Fan et al. (2008) Proc. Natl Acad. Sci. USA 105:16266-16271) and a dilution series of intact male human genomic DNA (Promega) as a standard curve. Power SYBR Green was used for qPCR on a HT7900 Real Time PCR machine (Applied Biosystems). Standard PCR thermal cycling parameters were used.
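
By way of example, absolute quantification against a dilution series may be carried out by fitting Ct against log10 of input mass and inverting the fit for unknown samples. The sketch below is illustrative only; the function names and the standard-curve values used in any worked example are hypothetical and are not taken from the study.

```python
import math


def fit_standard_curve(quantities_ng, ct_values):
    """Least-squares fit of Ct = slope * log10(quantity) + intercept."""
    xs = [math.log10(q) for q in quantities_ng]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def quantity_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate input mass (ng) for a sample."""
    return 10.0 ** ((ct - intercept) / slope)
```

A slope near −3.32 cycles per tenfold dilution corresponds to approximately 100% amplification efficiency.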

Illumina NGS Library Construction

Indexed Illumina NGS libraries were prepared from cfDNA and shorn tumor, germline, and cell line genomic DNA. For patient cfDNA, 7-32 ng DNA was used for library construction without additional shearing or fragmentation. For tumor, germline, and cell line genomic DNA, 69-1000 ng DNA was sheared prior to library construction with a Covaris S2 instrument using the recommended settings for 200 bp fragments. See Table 2 for details.

The NGS libraries were constructed using the KAPA Library Preparation Kit (Kapa Biosystems) employing a DNA Polymerase possessing strong 3′-5′ exonuclease (or proofreading) activity and displaying the lowest published error rate (i.e. highest fidelity) of all commercially available B-family DNA polymerases (Quail et al. (2012) Nat. Methods 9:10-11; Oyola et al. (2012) BMC Genomics 13:1). The manufacturer's protocol was modified to incorporate with-bead enzymatic and cleanup steps (Fisher et al. (2011) Genome Biol. 12:R1). Briefly, following the end repair reaction, Agencourt AMPure XP beads (Beckman-Coulter) were added to bind and wash the DNA fragments. The DNA was then eluted directly into 50 μL 1× A-tailing buffer containing the A-tailing enzyme. Following the A-tailing reaction, the DNA fragments were forced to bind to the same AMPure XP beads by adding 90 μL (1.8×) of PEG buffer (20% PEG-8000 in 2.5M NaCl). After washing, the DNA was eluted into 50 μL 1× ligation buffer with ligase and 100-fold molar excess of indexed Illumina TruSeq adapters. Ligation was performed for 16 hours at 16° C. Single-step size selection was performed by adding 40 μL (0.8×) of PEG buffer to enrich for ligated DNA fragments. The ligated fragments were then amplified using 500 nM Illumina backbone oligonucleotides and a variable number of PCR cycles (between 4 and 9) depending on input DNA mass. In order to minimize bias and maximize recovery of GC-rich templates, all PCR reactions were carried out in a BioRad DNA Engine Thermal Cycler with a ramp rate of 2.2° C./sec or an Eppendorf Vapo Protect Mastercycler with the Safe ramp rate setting.

Library purity and concentration were assessed by spectrophotometer (NanoDrop 2000) and qPCR (KAPA Biosystems), respectively. Fragment length was determined on a 2100 Bioanalyzer using the DNA 1000 Kit (Agilent).

Design of Library for Hybrid Selection

Custom hybrid selection was performed with the SeqCap EZ Choice Library, v2.0 (Roche NimbleGen). The custom SeqCap library was designed through the NimbleDesign portal (v1.2.R1) using genome build HG19 NCBI Build 37.1/GRCh37 and with Maximum Close Matches set to 1. Input genomic regions were selected according to the most frequently mutated genes and exons in NSCLC. These regions were identified from the COSMIC database, TCGA, and other published sources as described in the Detailed Materials. Final selector coordinates are provided in Table 1.

Hybrid Selection and High Throughput Sequencing

NimbleGen SeqCap EZ Choice was used according to the manufacturer's protocol with modifications. Between 9 and 12 indexed Illumina libraries were included in a single capture reaction. Prior to hybrid selection, the libraries were quantified with a NanoDrop 2000 spectrophotometer, and 83-111 ng of each library was added (1 μg total DNA per capture reaction). Following hybrid selection, the captured DNA fragments were amplified with 12-to-14 cycles of PCR using 1× KAPA HiFi Hot Start Ready Mix and 2 μM Illumina backbone oligonucleotides in 4-to-6 separate 50 μL reactions. The reactions were then pooled and processed with the QIAquick PCR Purification Kit (Qiagen). Multiplexed libraries were sequenced using 2×100 bp paired-end runs on an Illumina HiSeq 2000.
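
The per-library input range quoted above follows from dividing the 1 μg capture input equally among the pooled libraries, as the short sketch below illustrates. The function name is hypothetical and equal-mass pooling is assumed.

```python
def mass_per_library_ng(n_libraries, total_capture_ng=1000.0):
    """Mass of each indexed library when pooling equal amounts into one capture."""
    if n_libraries <= 0:
        raise ValueError("n_libraries must be positive")
    return total_capture_ng / n_libraries
```

Pooling 12 libraries gives about 83.3 ng each and pooling 9 gives about 111 ng each, which matches the library masses listed in Table 2.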

Mapping and Quality Control of NGS Data

Paired-end reads were mapped to the hg19 reference genome with BWA 0.6.2 (default parameters) (Li & Durbin (2009) Bioinformatics 25:1754-1760), and sorted/indexed with SAMtools (Li et al. (2009) Bioinformatics 25:2078-2079). QC was assessed using a custom Perl script to collect a variety of statistics, including mapping characteristics, read quality, and selector on-target rate (i.e., number of unique reads that intersect the selector space divided by all aligned reads), generated respectively by SAMtools flagstat, FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/), and BEDTools coverageBed (Quinlan & Hall (2010) Bioinformatics 26:841-842). Importantly, we used a custom version of coverageBed modified to count each read at most once. Plots of fragment length distribution and sequence depth/coverage were automatically generated for visual QC assessment. To mitigate the impact of sequencing errors, analyses not involving fusions were restricted to properly paired reads, and high-quality bases with a Phred quality score of at least 30 (≦0.1% probability of a sequencing error) were further analyzed.
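
Two of the quantities used in this quality control step can be written out explicitly: the error probability implied by a Phred score, and the selector on-target rate. The following is a minimal sketch with hypothetical function names; it does not reproduce the custom Perl script or the modified coverageBed referred to above.

```python
def phred_to_error_probability(q):
    """Probability that a base call with Phred quality q is wrong."""
    return 10.0 ** (-q / 10.0)


def on_target_rate(unique_on_target_reads, aligned_reads):
    """Unique reads intersecting the selector divided by all aligned reads."""
    return unique_on_target_reads / aligned_reads


def high_quality_bases(bases, qualities, min_q=30):
    """Keep only base calls whose Phred quality is at least min_q."""
    return [b for b, q in zip(bases, qualities) if q >= min_q]
```

A Phred score of 30 corresponds to an error probability of 0.001, i.e. the 0.1% bound stated in the text.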

Analysis of Detection Thresholds by CAPP-Seq

Two dilution series were performed to assess the linearity and accuracy of CAPP-Seq for quantitating tumor-derived cfDNA. In one experiment, shorn genomic DNA from a NSCLC cell line (HCC78) was spiked into cfDNA from a healthy individual, while in a second experiment, shorn genomic DNA from one NSCLC cell line (NCI-H3122) was spiked into shorn genomic DNA from a second NSCLC line (HCC78). A total of 32 ng DNA was used for library construction. Following mapping and quality control, homozygous reporters were identified as alleles unique to each sample with at least 20× sequencing depth at an allelic fraction >80%. Fourteen such reporters were identified between HCC78 genomic DNA and plasma cfDNA (FIG. 9 (d), (e)), whereas 24 reporters were found between NCI-H3122 and HCC78 genomic DNA (FIG. 10).
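The reporter criteria stated above (at least 20× depth, allelic fraction >80%, allele unique to the spiked sample) might be sketched as follows. The input layout (a dictionary of per-site allele counts) and the reading of "unique" as zero supporting reads in the other sample are assumptions made for illustration.

```python
def homozygous_reporters(spike, background, min_depth=20, min_af=0.80):
    """Return (site, allele) pairs that qualify as homozygous reporters.

    spike, background: dict mapping a site, e.g. ("chr1", 100), to a dict of
    allele -> read count. Assumption: "unique" means the allele has no
    supporting reads at that site in the background sample.
    """
    reporters = []
    for site, counts in spike.items():
        depth = sum(counts.values())
        if depth < min_depth:
            continue
        for allele, n in counts.items():
            absent = background.get(site, {}).get(allele, 0) == 0
            if n / depth > min_af and absent:
                reporters.append((site, allele))
    return sorted(reporters)
```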

CAPP-Seq Bioinformatics Pipeline

Details of bioinformatics methods are supplied in the Detailed Methods, and a graphical schematic is provided in FIG. 2. Briefly, for detection of SNVs and indels, we employed VarScan 2 (Koboldt et al. (2012) Genome Res. 22:568-576) with strict post-processing filters to improve variant call confidence, and for fusion identification and breakpoint characterization we used a novel algorithm, termed FACTERA (Detailed Methods). To quantify tumor burden in plasma cfDNA, allele frequencies of reporter SNVs/indels were assessed using the output of SAMtools mpileup (Li et al. (2009) Bioinformatics 25:2078-2079), and fusions, if detected, were enumerated with FACTERA.

Statistical Analysis

The NSCLC selector was validated in silico using an independent cohort of lung adenocarcinomas (Imielinski et al. (2012) Cell 150:1107-1120) (FIG. 1(c)). To assess statistical significance, we analyzed the same cohort using 10,000 random selectors sampled from the exome, each with an identical size distribution to the CAPP-Seq NSCLC selector. The performance of random selectors had a Gaussian distribution, and p-values were calculated accordingly. Note that all identified somatic lesions were considered in this analysis.
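Because the random-selector performance was approximately Gaussian, a p-value can be obtained from a z-score. The sketch below is one way to do this with the Python standard library; the one-sided (upper-tail) choice and the function name are assumptions, since the text does not specify them.

```python
import math
import statistics


def gaussian_upper_tail_p(observed, null_values):
    """P(X >= observed) under a normal fit to the null (random selector) values."""
    mu = statistics.mean(null_values)
    sd = statistics.stdev(null_values)  # sample standard deviation
    z = (observed - mu) / sd
    # Survival function of the standard normal, written with erfc.
    return 0.5 * math.erfc(z / math.sqrt(2))
```

Here `null_values` would hold one performance value per random selector (10,000 in the analysis above) and `observed` the value for the CAPP-Seq NSCLC selector.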

We used Monte Carlo sampling to estimate the distribution of background alleles across the NSCLC selector (FIG. 9 (a), (c); Detailed Methods). For each plasma sample, background alleles were defined as alleles remaining after exclusion of germline and/or somatic variant calls made by VarScan 2 (Koboldt et al. (2012) Genome Res. 22:568-576) (somatic p-value=0.01; otherwise, default parameters), and with a Phred quality score ≧30. To evaluate the impact of reporter number on tumor burden estimates, we also performed Monte Carlo sampling (1,000×), varying the number of reporters available {1, 2, . . . , max n} in two spiking experiments (FIG. 9 (d)-(f); FIG. 10 (b)-(d)).

To assess the significance of tumor burden estimates in plasma cfDNA, we compared patient-specific SNV frequencies against the null distribution of background SNVs across the selector. Briefly, patient-specific background was quantified using the method described for FIG. 9 (a) (Detailed Methods), but using the number of SNVs identified in the patient's tumor. For patients with at least 1 SNV, but no other reporter types, tumor-derived cfDNA was considered not detectable if mean SNV fractions fell below the 95th percentile of background alleles (i.e., P≧0.05) (FIG. 11 (a)). (Due to the ultra-low false detection rate for indels (Minoche et al. (2011) Genome Biol. 12:R112) and fusion breakpoints, these mutation types were considered detected when present with >0 read support.) For patients with detectable disease in only 1 time point, the corresponding empirical p-value is shown in FIG. 11 (a). To assess normality, we analyzed the patient with the most reporter alleles (i.e., P2; FIG. 11 (a)), and found that fractional abundance measurements fit a normal distribution (D'Agostino and Pearson omnibus normality test). Thus, for patients with detectable tumor-derived cfDNA in two time points and with at least 3 cfDNA SNVs/indels, the change in tumor burden was statistically assessed using a two-sided paired t-test. For P9, who lacked reporter SNVs/indels, statistical significance was estimated by correlation of CAPP-Seq measurements with known tumor volume (as measured by CT scans).
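The detection rule for SNV-only patients can be written as an empirical p-value against the simulated background. This is a minimal sketch with hypothetical names; `null_means` stands for the Monte Carlo background means obtained with the patient's own number of SNVs.

```python
def empirical_p_value(observed_mean_af, null_means):
    """Fraction of background draws at or above the observed mean SNV fraction."""
    n_ge = sum(1 for x in null_means if x >= observed_mean_af)
    return n_ge / len(null_means)


def tumor_cfdna_detected(observed_mean_af, null_means, alpha=0.05):
    """Detected only if the empirical p-value is below alpha (P < 0.05)."""
    return empirical_p_value(observed_mean_af, null_means) < alpha
```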

Additional details on cell lines, tumor cell sorting, optimizations of library preparation, mutation/translocation validation, CAPP-Seq design and analytical pipelines including FACTERA translocation detection tool, and additional statistical methods are presented in the Detailed Methods.

Detailed Methods
A. Molecular Biology Methods
A1. Cell Lines

The lung adenocarcinoma cell lines NCI-H3122 and HCC78 were obtained from ATCC and DSMZ, respectively, and grown in RPMI 1640 with L-glutamine (Gibco) supplemented with 10% fetal bovine serum (Gembio) and 1% penicillin/streptomycin cocktail. Cells were maintained in mid-log-phase growth in a 37° C. incubator with 5% CO₂. Genomic DNA was purified from freshly harvested cells with the DNeasy Blood & Tissue Kit (Qiagen).

A2. Pleural Fluid Processing, Flow Cytometry, and Cell Sorting

Cells from pleural fluid from patients P9 and P6 were harvested by centrifugation at 300×g for 5 min at 4° C. and washed in FACS staining buffer (HBSS+2% heat-inactivated calf serum [HICS]). Red blood cells were lysed with ACK Lysing Buffer (Invitrogen), and clumps were removed by passing through a 100 μm nylon filter. Filtered cells were spun down and resuspended in staining buffer. While on ice, the cell suspension was blocked for 20 min with 10 μg/mL rat IgG and then stained for 20 min with APC-conjugated mouse anti-human EpCAM (BioLegend, clone 9C4), PerCP-Cy5.5-conjugated mouse anti-human CD45 (eBioscience, clone 2D1), and PerCP-eFluor710-conjugated mouse anti-human CD31 (eBioscience, clone WM59). After staining, cells were washed and resuspended with staining buffer containing 1 μg/mL DAPI, analyzed, and sorted with a FACSAria II cell sorter (BD Biosciences). Cell doublets and DAPI-positive cells were excluded from analysis and sorting. CD31⁻CD45⁻EpCAM⁺ cells were sorted into staining buffer, spun down, and flash frozen in liquid nitrogen. DNA was isolated with the QIAamp DNA Micro Kit (Qiagen).

A3. Optimization of NGS Library Preparation from Low Input cfDNA

Any method for detecting mutant cfDNA relies on its ability to interrogate each cfDNA molecule in the circulation in order to maximize sensitivity. For this reason, we used the QIAamp Circulating Nucleic Acid kit (Qiagen) with carrier RNA as per the manufacturer's protocol to isolate cfDNA. We also took specific steps to improve the Illumina library preparation workflow.

Protocols for Illumina library construction were compared in a step-wise manner with the goal of (1) optimizing adapter ligation efficiency, (2) reducing the necessary number of PCR cycles following adapter ligation, (3) preserving the naturally occurring size distribution of cfDNA fragments, and (4) minimizing variability in depth of sequencing coverage across all captured genomic regions. Initial optimization was done with NEBNext DNA Library Prep Reagent Set for Illumina (New England BioLabs), which includes reagents for end-repair of the cfDNA fragments, A-tailing, adapter ligation, and amplification of ligated fragments with Phusion High-Fidelity PCR Master Mix. Input was 4 ng cfDNA (obtained from plasma of the same healthy volunteer) for all conditions. Relative allelic abundance in the constructed libraries was assessed by qPCR of 4 genomic loci (Roche NimbleGen: NSC-0237, NSC-0247, NSC-0268, and NSC-0272) and compared by the 2^−ΔCt method.
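For reference, the 2^−ΔCt comparison reduces to a single expression. The sketch below assumes ideal (100%) amplification efficiency and uses a hypothetical function name; which library condition serves as the reference is a choice made by the analyst.

```python
def relative_abundance(ct_test, ct_reference):
    """Relative abundance of a locus by the 2^-dCt method.

    dCt = Ct(test library) - Ct(reference library); each cycle of
    difference corresponds to a two-fold difference in template.
    """
    delta_ct = ct_test - ct_reference
    return 2 ** (-delta_ct)
```

A library that crosses threshold one cycle earlier than the reference therefore carries twice as much of that locus.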

Ligations were performed at 20° C. for 15 min (as per the manufacturer's protocol), at 16° C. for 16 hours, or with temperature cycling for 16 hours as previously described (Lund et al. (1996) Nucl. Acids Res. 24:800-801). Ligation volumes were varied from the standard (50 μL) down to 10 μL while maintaining a constant concentration of DNA ligase, cfDNA fragments, and Illumina adapters. Subsequent optimizations incorporated ligation at 16° C. for 16 hours in 50 μL reaction volumes.

Next, we compared standard SPRI bead processing procedures, in which new AMPure XP beads are added after each enzymatic reaction and DNA is eluted from the beads for the next reaction, to with-bead protocol modifications as previously described (Fisher, S. et al. (2011) Genome Biol. 12:R1). We compared 2 concentrations of Illumina adapters in the ligation reaction: 12 nM (10-fold molar excess to cfDNA fragments) and 120 nM (100-fold molar excess).

Using the optimized library preparation procedures, we next compared the NEBNext DNA Library Prep Reagent Set (with Phusion DNA Polymerase) to the KAPA Library Preparation Kit (with KAPA HiFi DNA Polymerase). The KAPA Library Preparation Kit with our modifications was also compared to the NuGEN SP Ovation Ultralow Library System with automation on Mondrian SP Workstation.

A4. Evaluation of Library Preparation Modifications on CAPP-Seq Performance

We performed CAPP-Seq on 32 ng cfDNA using standard library preparation procedures with the NEBNext kit, or with optimized procedures using either the NEBNext kit or the KAPA Library Preparation Kit. In parallel we performed CAPP-Seq on 4 ng and 128 ng cfDNA using the KAPA kit with our optimized procedures. Indexed libraries were constructed, and hybrid selection was performed in multiplex. The post-capture multiplexed libraries were amplified with Illumina backbone primers for 14 cycles of PCR and then sequenced on a paired-end 100 bp lane of an Illumina HiSeq 2000.

We also evaluated CAPP-Seq on ultralow input following whole genome amplification (WGA). For WGA we chose not to use multiple displacement amplification with Φ29 DNA polymerase given the small size of cfDNA fragments in plasma (FIG. 1(e)), and due to concern for chimera formation, which would confound analysis of recurrent gene fusions in NSCLC by CAPP-Seq. Instead we used SeqPlex DNA Amplification Kit (Sigma-Aldrich), which employs degenerate oligonucleotide primer PCR. We used the upper limit of input into this kit (1 ng) and performed whole genome amplification according to the manufacturer's protocol. Briefly, 1 ng cfDNA was amplified with real-time monitoring with SYBR Green I (Sigma-Aldrich) on a HT7900 Real Time PCR machine (Applied Biosystems). The amplification was terminated after 17 cycles yielding 2.8 μg DNA. The primer removal step yielded ˜600 ng DNA, and this entire amount was used for library preparation using the NEBNext kit with optimized procedures as described above.

A5. Validation of Variants Detected by CAPP-Seq

All structural rearrangements and a subset of tumoral SNVs detected by CAPP-Seq were independently confirmed by qPCR and/or Sanger sequencing of amplified fragments. For HCC78, a 120 bp fragment containing the SLC34A2-ROS1 breakpoint was amplified from genomic DNA using the primers: 5′-AGACGGGAGAAAATAGCACC-3′ and 5′-ACCAAGGGTTGCAGAAATCC-3′. A 141 bp fragment containing exon 2 of U2AF1 was amplified using the primers: 5′-CATGTGTTTGATATCTTCCCAGC-3′ and 5′-CTGGCTAAACGTCGGTTTATTG-3′. For NCI-H3122, a 143 bp fragment containing the EML4-ALK breakpoint was amplified using the primers: 5′-GAGATGGAGTTTCACTCTTGTTGC-3′ and 5′-GAACCTTTCCATCATACTTAGAAATAC-3′. 5 ng genomic DNA was used as template with 250 nM oligos and 1× Phusion PCR Master Mix (NEB) in 50 μL reactions. Products were resolved on 2.5% agarose gel and bands of the expected size were excised. The amplified DNA fragments were purified using the QIAquick Gel Extraction Kit (Qiagen) and submitted for Sanger sequencing (Elim Biopharm). For P9, genomic DNA breakpoints were confirmed by qPCR using the primers: 5′-TCCATGGAAGCCAGAAC-3′ and 5′-ATGCTAAGATGTGTCTGTCA-3′ for EML4-ALK; 5′-CCTTAACACAGATGGCTCTTGATGC-3′ and 5′-TCCTCTTTCCACCTTGGCTTTCC-3′ for ROS1-MKX; and 5′-GGTTCAGAACTACCAATAACAAG-3′ and 5′-ACCTGATGTGTGACCTGATTGATG-3′ for FYN-ROS1. For qPCR, 10 ng of pre-amplified genomic DNA was used as template with 250 nM oligos and 1× Power SYBR Green Master Mix in 10 μL reactions performed in triplicate on a HT7900 Real Time PCR machine (Applied Biosystems). Standard PCR thermal cycling parameters were used. Amplification of amplicons spanning all 3 breakpoints detected in P9 was confirmed in tumor genomic DNA as well as plasma cfDNA, and PBL genomic DNA was used as a negative control.
Separately, at least 88% of SNVs and indels detected were bona fide somatic mutations in tumors, as 38 of 46 of them were independently observed above 0.025% allele frequency in plasma cfDNA and/or were independently confirmed by SNaPshot clinical assays.

B. Bioinformatics and Statistical Methods
B1. Analysis of CAPP-Seq Background

The CAPP-Seq background rate was estimated by Monte Carlo sampling of allelic frequencies across the NSCLC selector (FIG. 9 (a)). Plasma cfDNA samples were pre-filtered to remove all variant calls and dominant alleles. Specifically, for each patient, we excluded germline, loss of heterozygosity (LOH), and/or somatic variant calls made by VarScan 2 (Koboldt et al. (2012) Genome Res. 22:568-576) (somatic p-value=0.01; otherwise, default parameters). We sampled 4 random background alleles across this subset of the selector (equal to the median number of SNVs per NSCLC patient detected by CAPP-Seq) and calculated their mean allelic frequency, only considering bases discordant with the prevailing genotype of the plasma sample at those 4 positions. This process was iterated 10,000 times, and mean, median, and 75th percentile statistics were collected. The entire procedure was then repeated for 5 total simulations, shown in FIG. 2a.
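The sampling procedure can be sketched as below. This is an illustrative reimplementation under stated assumptions, not the original analysis code: `allele_freqs` is a hypothetical pre-filtered list of background allele frequencies, sampling is without replacement, and the 75th percentile uses a simple nearest-rank rule.

```python
import random
import statistics


def simulate_background(allele_freqs, n_reporters=4, n_iter=10_000, seed=0):
    """Monte Carlo estimate of the mean background allele frequency.

    Each iteration draws n_reporters background alleles at random and
    records their mean frequency; summary statistics are returned.
    """
    rng = random.Random(seed)
    means = sorted(
        statistics.fmean(rng.sample(allele_freqs, n_reporters))
        for _ in range(n_iter)
    )
    return {
        "mean": statistics.fmean(means),
        "median": statistics.median(means),
        "p75": means[int(0.75 * (n_iter - 1))],  # nearest-rank percentile
    }
```

Repeating the call with five different seeds would mirror the five total simulations described above.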

We likewise applied Monte Carlo simulation to estimate the probability of finding a background allele in plasma cfDNA at a given fractional abundance (FIG. 9 (c)). For consistency with the ranking of alleles in FIG. 9 (c), we populated a vector containing the mean background allele frequency for each genomic position across 7 plasma cfDNA samples, each filtered to remove dominant alleles as described above. Alleles were randomly sampled from this vector 10,000 times to identify the allele frequency with an empirical p-value of 0.01.

B2. ROS1 and U2AF1 Co-Association Analysis
B2.1 Assembly of ROS1 and U2AF1 Mutant NSCLC

We included only cases in which both ROS1 fusion status and U2AF1 S34 mutation status were known. There were 163 such cases from TCGA (genotyped for U2AF1 by whole exome sequencing and for ROS1 fusions by RNA-Seq as detailed below), 23 cases from Imielinski et al. (2012) Cell 150:1107-1120, 17 cases from Govindan et al. (2012) Cell 150:1121-1134, and 13 cases from the present study (11 patients and 2 NSCLC cell lines). U2AF1 S34F mutations were detected in 11 cases (5 from TCGA, 3 from Imielinski et al., 1 from Govindan et al., and 2 from the present study), and ROS1 fusions were detected in 6 cases (2 from TCGA, described below, and 4 from the present study). Significance testing was performed using Fisher's exact test, and a two-tailed P-value is reported.
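A two-tailed Fisher's exact test on a 2×2 table can be computed directly from the hypergeometric distribution, as in the following sketch. The margins (216 cases in total, 6 with ROS1 fusions, 11 with U2AF1 S34F) follow from the counts above; the number of cases carrying both lesions is not restated in this passage, so `n_both` is left as an input rather than assumed.

```python
from math import comb


def table_from_margins(n_total, n_ros1, n_u2af1, n_both):
    """Build the 2x2 table (both, ROS1 only, U2AF1 only, neither)."""
    return (n_both, n_ros1 - n_both, n_u2af1 - n_both,
            n_total - n_ros1 - n_u2af1 + n_both)


def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed p-value: sum of all tables no more probable than observed."""
    r1, r2, c1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    total = 0.0
    for x in range(lo, hi + 1):
        p = prob(x)
        if p <= p_obs * (1 + 1e-9):  # tolerance for floating-point ties
            total += p
    return total
```

For example, `fisher_exact_two_tailed(*table_from_margins(216, 6, 11, n_both))` gives the p-value for any given overlap.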

B2.2. Analysis of Whole Transcriptome Sequencing Data from TCGA for ROS1 Fusions

We identified two TCGA lung adenocarcinoma patients, TCGA-05-4426 and TCGA-64-1680, harboring candidate ROS1 fusions (FIG. 16 (a)). Importantly, the latter patient also has the U2AF1 S34F missense mutation reported in this study and in prior literature (see above). To further analyze both patients' putative rearrangements, whole transcriptome RNA-Seq data (.bam files) were obtained using the UCSC GeneTorrent system (https://cghub.ucsc.edu/downloads.html) and realigned to hg19 using BWA 0.6.2 with default parameters (Li & Durbin (2009) Bioinformatics 25:1754-1760). Importantly, mapped RNA-Seq reads extended significantly past coding regions, allowing for improved assessment of fusion events (FIG. 16 (b), (c)). From a manual inspection of associated RPKM expression data across ROS1 exons (FIG. 16 (a)), we suspected that breakpoint sites for these fusions may lie directly upstream of ROS1 exons 32 and 35, respectively. Using the Integrative Genomics Viewer (IGV) (Robinson et al. (2011) Nat. Biotechnol. 29:24-26), we found improperly paired (or discordant) reads near these exons that link ROS1 to its well-described partners, SLC34A2 and CD74, respectively (FIG. 16 (b), (c)). Indeed, by applying FACTERA's templated fusion discovery (detailed below) to patient TCGA-64-1680, we recovered a single read near ROS1 exon 35 that also maps to CD74 (FIG. 16 (c)). Collectively, these data strongly support the existence of expressed ROS1 fusions in these two TCGA patients.

B3. CAPP-Seq Selector Design

Most human cancers are relatively heterogeneous for somatic mutations in individual genes. Specifically, in most human tumors, recurrent somatic alterations of single genes account for a minority of patients, and only a minority of tumor types can be defined using a small number of recurrent mutations (<5-10) at predefined positions. Therefore, the design of the selector is vital to the CAPP-Seq method because (1) it dictates which mutations can be detected with high probability for a patient with a given cancer, and (2) the selector size (in kb) directly impacts the cost and depth of sequence coverage. For example, the hybrid selection libraries available in current whole exome capture kits range from 51-71 Mb, providing ˜40-60 fold maximum theoretical enrichment versus whole genome sequencing. The degree of potential enrichment is inversely proportional to the selector size such that for a ˜100 kb selector, >10,000 fold enrichment should be achievable.
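The enrichment figures above follow from a simple ratio of genome size to target size. The sketch below assumes a haploid human genome of roughly 3.0 × 10^9 bp, a value not stated in the text.

```python
HUMAN_GENOME_BP = 3.0e9  # assumption: approximate haploid genome size


def max_theoretical_enrichment(target_bp, genome_bp=HUMAN_GENOME_BP):
    """Upper bound on fold enrichment relative to whole genome sequencing."""
    return genome_bp / target_bp
```

With this assumption, 51 Mb and 71 Mb exome libraries give about 59-fold and 42-fold enrichment, respectively, while a 100 kb selector gives about 30,000-fold, consistent with the ranges quoted above.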

We employed a six-phase design strategy to identify and prioritize genomic regions for the CAPP-Seq NSCLC selector as detailed below. Three phases were used to incorporate known and suspected NSCLC driver genes, as well as genomic regions known to participate in clinically actionable fusions (phases 1, 5, 6), while another three phases employed an algorithmic approach to maximize both the number of patients covered and SNVs per patient (phases 2-4). The latter relied upon a metric that we termed “Recurrence Index” (RI), defined as the number of NSCLC patients with SNVs that occur within a given kilobase of exonic sequence (i.e., No. of patients with mutations/exon length in kb). RI thus serves to measure patient-level recurrence frequency at the exon level, while simultaneously normalizing for gene/exon size. As a source of somatic mutation data uniformly genotyped across a large cohort of patients, in phases 2-4, we analyzed non-silent SNVs identified in TCGA whole exome sequencing data from 178 patients in the Lung Squamous Cell Carcinoma dataset (SCC) (Hammerman et al. (2012) Nature 489:519-525) and from 229 patients in the Lung Adenocarcinoma (LUAD) datasets (TCGA query date was Mar. 13, 2012). Thresholds for each metric (i.e. RI and patients per exon) were selected to statistically enrich for known/suspected drivers in SCC and LUAD data (FIG. 9). RefSeq exon coordinates (hg19) were obtained via the UCSC Table Browser (query date was Apr. 11, 2012).
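The Recurrence Index defined above is a per-kilobase patient count. A minimal sketch (the function name is illustrative):

```python
def recurrence_index(n_patients_with_snv, exon_length_bp):
    """RI = number of patients with SNVs in the exon / exon length in kb."""
    if exon_length_bp <= 0:
        raise ValueError("exon length must be positive")
    return n_patients_with_snv * 1000 / exon_length_bp
```

For instance, an exon of 200 bp mutated in 6 patients has an RI of 30, the threshold used in phase 3.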

The following algorithm was used to design the CAPP-Seq selector (parenthetical descriptions match design phases noted in FIG. 1 (b)).

Phase 1 (Known Drivers)

Initial seed genes were chosen based on their frequency of mutation in NSCLCs.

Analysis of COSMIC (v57) (Forbes et al. (2010) Nucl. Acids Res. 38:D652-657) identified known driver genes that are recurrently mutated in ≧9% of NSCLC (denominator ≧500 cases). Specific exons from these genes were selected based on the pattern of SNVs previously identified in NSCLC. The seed list also included single exons from genes with recurrent mutations that occurred at low frequency but had strong evidence for being driver mutations, such as BRAF exon 15, which harbors V600E mutations in <2% of NSCLC (Ding et al. (2008) Nature 455:1069-1075; Youn & Simon (2011) Bioinformatics 27:175-181; Okuda et al. (2008) Cancer Sci. 99:2280-2285; Su et al. (2011) J. Mol. Diagn. 13:74-84; Tsao et al. (2007) J. Clin. Oncol. 25:5240-5247; Chaft et al. (2012) Mol. Cancer Ther. 11:485-491; Paik et al. (2011) J. Clin. Oncol. 29:2046-2051; Stephens et al. (2004) Nature 431:525-526; Jin et al. (2010) Lung Cancer 69:279-283; Malanga et al. (2008) Cell Cycle 7:665-669).

Phase 2 (Max. Coverage)

For each exon with SNVs covering ≧5 patients in LUAD and SCC, we selected the exon with highest RI that identified at least 1 new patient when compared to the prior phase. Among exons with equally high RI, we added the exon with minimum overlap among patients already captured by the selector. This was repeated until no further exons met these criteria.
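The phase 2 procedure is a greedy selection and might be sketched as follows. The data layout (exon identifier mapped to its length and the set of patients with SNVs in it) is hypothetical, and the final alphabetical tie-break is an added assumption to make the result deterministic.

```python
def phase2_max_coverage(exons, covered, min_patients=5):
    """Greedily add exons by RI until no exon contributes a new patient.

    exons: dict exon_id -> (length_bp, set of patient ids with SNVs).
    covered: patients already captured by the prior phase.
    Returns (list of chosen exon ids in order, final covered set).
    """
    covered = set(covered)
    remaining = {e: v for e, v in exons.items() if len(v[1]) >= min_patients}
    chosen = []
    while True:
        # Candidates must identify at least one new patient.
        candidates = [e for e, (_, pts) in remaining.items() if pts - covered]
        if not candidates:
            break

        def rank(e):
            length, pts = remaining[e]
            ri = len(pts) * 1000 / length
            # Highest RI first, then minimum overlap with covered patients.
            return (-ri, len(pts & covered), e)

        best = min(candidates, key=rank)
        chosen.append(best)
        covered |= remaining.pop(best)[1]
    return chosen, covered
```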

Phase 3 (RI≧30)

For each remaining exon with an RI≧30 and with SNVs covering ≧3 patients in LUAD and SCC, we identified the exon that would result in the largest reduction in patients with only 1 SNV. To break ties among equally best exons, the exon with highest RI was chosen. This was repeated until no additional exons satisfied these criteria.

Phase 4 (RI≧20)

Same procedure as phase 3, but using RI≧20.
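Phases 3 and 4 share one greedy loop that differs only in the RI threshold. The sketch below is an interpretation, not the original code: it assumes each patient contributes one SNV per exon, measures the "reduction" as the number of patients moved from exactly 1 covered SNV to 2 or more, and stops when no eligible exon reduces that count.

```python
def reduce_single_snv_patients(exons, snv_counts, min_ri=30, min_patients=3):
    """Greedy exon selection for phase 3 (min_ri=30) or phase 4 (min_ri=20).

    exons: dict exon_id -> (length_bp, set of patient ids with SNVs).
    snv_counts: dict patient id -> SNVs already covered by the selector.
    Returns the list of chosen exon ids in selection order.
    """
    counts = dict(snv_counts)

    def ri(e):
        length, pts = exons[e]
        return len(pts) * 1000 / length

    def gain(e):
        return sum(1 for p in exons[e][1] if counts.get(p, 0) == 1)

    pool = {e for e in exons
            if ri(e) >= min_ri and len(exons[e][1]) >= min_patients}
    chosen = []
    while pool:
        # Largest reduction first; ties broken by highest RI.
        best = max(sorted(pool), key=lambda e: (gain(e), ri(e)))
        if gain(best) == 0:
            break
        chosen.append(best)
        pool.remove(best)
        for p in exons[best][1]:
            counts[p] = counts.get(p, 0) + 1
    return chosen
```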

Phase 5 (Predicted Drivers)

We included all exons from additional genes previously predicted to harbor driver mutations in NSCLC (Ding et al. (2008) Nature 455:1069-1075; Youn & Simon (2011) Bioinformatics 27:175-181).

Phase 6 (Add Fusions)

For recurrent rearrangements in NSCLC involving the receptor tyrosine kinases ALK, ROS1, and RET, the introns most frequently implicated in the fusion event and the flanking exons were included.

All exons included in the selector, along with their corresponding HUGO gene symbols and genomic coordinates, as well as patient statistics for NSCLC and a variety of other cancers, are provided in Table 1, organized by selector design phase.

C. CAPP-Seq Computational Pipeline
C1. Mutation Discovery: SNVs/Indels

For detection of somatic SNV and insertion/deletion events, we employed VarScan 2 (Koboldt et al. (2012) Genome Res 22:568-576) (somatic p-value=0.01, minimum variant frequency=5%, and otherwise default parameters). Somatic variant calls (SNV or indel) present at less than 0.5% mutant allelic frequency in the paired normal sample (PBLs), but in a position with at least 1000× overall depth in PBLs and 100× depth in the tumor, and with at least 1× read depth on each strand, were retained (Table 3). While the selector was designed to predominantly capture exons, in practice, it also captures limited sequence content flanking each targeted region. For instance, this phenomenon is the basis for the (thus far) uniformly successful recovery by CAPP-Seq of fusion partners (which are not included within the selector) for kinase genes such as ALK and ROS1 recurrently rearranged in NSCLC. As such, we also considered variant calls detected within 500 bps of defined selector coordinates. These calls were eliminated if present in non-coding repeat regions, since repeats may confound mapping accuracy. Repeat sequence coordinates were obtained using the RepeatMasker track in the UCSC table browser (hg19). Variant annotation was performed using the SeattleSeq Annotation 137 web server (http://snp.gs.washington.edu/SeattleSeqAnnotation137/). Complete details for all identified SNVs and indels are provided in Table 2.
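The post-processing thresholds applied to VarScan 2 calls can be summarized as a single predicate. The record type and field names below are hypothetical and do not correspond to VarScan 2 output columns; only the numeric thresholds come from the text.

```python
from dataclasses import dataclass


@dataclass
class SomaticCall:
    normal_af: float      # mutant allele fraction in paired PBLs
    normal_depth: int     # overall depth in PBLs
    tumor_depth: int      # overall depth in tumor
    fwd_reads: int        # variant-supporting reads, forward strand
    rev_reads: int        # variant-supporting reads, reverse strand


def keep_call(c):
    """Apply the depth, strand, and normal-fraction filters described above."""
    return (c.normal_af < 0.005
            and c.normal_depth >= 1000
            and c.tumor_depth >= 100
            and c.fwd_reads >= 1
            and c.rev_reads >= 1)
```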

By manual inspection, two patients (P2 and P6) had SNVs with frequencies consistent with potential heterozygous and homozygous alleles. We labeled these alleles accordingly (Table 3), and based on our assumption of zygosity in these two patients, we adjusted measured fractions of heterozygous reporters in plasma cfDNA to better estimate tumor burden (Table 4).

C2. Mutation Discovery: Fusions

For practical and robust de novo enumeration of genomic fusion events and breakpoints from paired-end next-generation sequencing data, we developed a novel heuristic approach, termed FACTERA (FACile Translocation Enumeration and Recovery Algorithm). FACTERA has minimal external dependencies, works directly on a preexisting .bam alignment file, and produces easily interpretable output. Major steps of the algorithm are summarized below, and are complemented by a graphical schematic to illustrate key elements of the breakpoint identification process (FIG. 4).

As input, FACTERA requires a .bam alignment file of paired-end reads produced by BWA (Li & Durbin (2009) Bioinformatics 25:1754-1760), exon coordinates in .bed format (e.g., hg19 RefSeq coordinates), and a .2bit reference genome to enable fast sequence retrieval (e.g., hg19). In addition, the analysis can be optionally restricted to reads that overlap particular genomic regions (.bed file), such as the CAPP-Seq selector used in this work.

FACTERA processes the input in three sequential phases: identification of discordant reads, detection of breakpoints at base pair-resolution, and in silico validation of candidate fusions. Each phase is described in detail below.

C2.1. Identification of Discordant Reads

To iteratively reduce the sequence space for gene fusion identification, FACTERA, like other algorithms (e.g. BreakDancer (Chen et al. (2009) Nat. Methods 6:677-681)), identifies and classifies discordant read pairs. Such reads indicate a nearby fusion event since they either map to different chromosomes or are separated by an unexpectedly large insert size (i.e. total fragment length), as determined by the BWA mapping algorithm. The bitwise flag accompanying each aligned read encodes a variety of mapping characteristics (e.g., improperly paired, unmapped, wrong orientation, etc.) and is leveraged to rapidly filter the input for discordant pairs. The closest exon of each discordant read is subsequently identified, and used to cluster discordant pairs into distinct gene-gene groups, yielding a list of genomic regions R adjacent to candidate fusion sites. For each member gene of a discordant gene pair, the genomic region R_i is defined by taking the minimum of all 3′ exon/read coordinates in the cluster, and the maximum of all 5′ exon/read coordinates in the cluster. These regions are used to prioritize the search for breakpoints in the next phase (FIG. 4 (a)).
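Filtering on the bitwise flag and grouping by gene pair can be illustrated with standard SAM flag bits. This is a simplified sketch, not FACTERA's implementation: it treats any paired read with both mates mapped but not flagged as properly paired as discordant, and it assumes the nearest gene for each mate has already been looked up.

```python
from collections import Counter

# Standard SAM FLAG bits.
PAIRED = 0x1
PROPER_PAIR = 0x2
UNMAPPED = 0x4
MATE_UNMAPPED = 0x8


def is_discordant(flag):
    """True for paired reads with both mates mapped but not properly paired."""
    if not flag & PAIRED:
        return False
    return not (flag & (PROPER_PAIR | UNMAPPED | MATE_UNMAPPED))


def cluster_gene_pairs(nearest_genes):
    """Count discordant pairs per unordered gene-gene group.

    nearest_genes: iterable of (gene_of_read1, gene_of_read2) tuples.
    """
    return Counter(tuple(sorted(pair)) for pair in nearest_genes)
```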

C2.2 Detection of Breakpoints at Base Pair-Resolution

Discordant read pairs may be introduced by NGS library preparation and/or sequencing artifacts (e.g., jumping PCR). However, they are also likely to flank the breakpoints of bona fide fusion events. As such, all discordant gene pairs identified in the preceding phase are searched for soft-clipped reads near the candidate fusion site. If the mapped region of one read matches the soft-clipped region of the other, FACTERA records a putative fusion event. To assess inter-read concordance (e.g. see reads 1 and 2 in FIG. 4 (c)), FACTERA employs the following algorithm. The mapped region of read 1 is parsed into all possible subsequences of length k (i.e., k-mers) using a sliding window (k=10, by default). Each k-mer, along with its lowest sequence index in read 1, is stored in a hash table data structure, allowing k-mer membership to be assessed in constant time (FIG. 4 (c), left panel). Subsequently, the soft-clipped sequence of read 2 is parsed into non-overlapping subsequences of length k, and the hash table is interrogated for matching k-mers (FIG. 4 (c), right panel). If a minimum matching threshold is achieved (0.5 × the minimum length of the two compared subsequences), then the two reads are considered concordant. FACTERA will process at most 1000 (by default) putative breakpoint pairs for each discordant gene pair. Moreover, for each gene pair, FACTERA will only compare reads whose orientations are compatible with valid fusions. Such reads have soft-clipped sequences facing opposite directions (FIG. 4 (d), top panel). When this condition is not satisfied, FACTERA uses the reverse complement of read 1 for k-mer analysis (FIG. 4 (d), bottom panel).
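The k-mer comparison can be sketched with a dictionary as the hash table. This is an illustrative reading of the description rather than FACTERA's source: in particular, counting the match in bases (k per matching k-mer) against the 0.5 × minimum-length threshold is an assumption.

```python
def kmer_index(seq, k=10):
    """Map each k-mer of seq to its lowest start index (sliding window)."""
    index = {}
    for i in range(len(seq) - k + 1):
        index.setdefault(seq[i:i + k], i)
    return index


def reads_concordant(mapped_read1, clipped_read2, k=10, min_frac=0.5):
    """Compare read 1's mapped region with read 2's soft-clipped sequence."""
    index = kmer_index(mapped_read1, k)
    # Non-overlapping k-mers from the soft-clipped sequence of read 2.
    chunks = [clipped_read2[i:i + k]
              for i in range(0, len(clipped_read2) - k + 1, k)]
    matched_bases = sum(k for c in chunks if c in index)
    threshold = min_frac * min(len(mapped_read1), len(clipped_read2))
    return matched_bases >= threshold
```

Because the index is a dictionary, each membership check runs in constant expected time, as noted above.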

In some instances, genomic subsequences flanking the true breakpoint may be nearly or completely identical, causing the aligned portions of soft-clipped reads to overlap. Unfortunately, this prevents an unambiguous determination of the breakpoint. As such, FACTERA incorporates a simple algorithm to arbitrarily adjust the breakpoint in one read (i.e., read 2) to match the other (i.e., read 1). Depending upon read orientation, there are two ways this can occur, both of which are illustrated in FIG. 4 (e). For each read, FACTERA calculates the distance between the breakpoint and the read coordinate corresponding to the first k-mer match between reads. For example, as anecdotally illustrated in FIG. 4 (e), x is defined as the distance between the breakpoint coordinate of read 1 and the index of the first matching k-mer, j, whereas y denotes the corresponding distance for read 2. The offset is estimated as the difference in distances (x, y) between the two reads (see FIG. 4 (e)).

C2.3. In Silico Validation of Candidate Fusions

To confirm each candidate breakpoint in silico, FACTERA performs a local realignment of reads against a template fusion sequence (±500 bp around the putative breakpoint) extracted from the .2bit reference genome. BLAST is currently employed for this purpose, although BLAT or other fast aligners could be substituted. A BLAST database is constructed by collecting all reads that map to each candidate fusion sequence, including discordant reads and soft-clipped reads, as well as all unmapped reads in the original input .bam file. All reads that map to a given fusion candidate with at least 95% identity and a minimum length of 90% of the input read length (by default) are retained, and reads that span or flank the breakpoint are counted. As a final step, output redundancies are minimized by removing fusion sequences within a 20 bp interval of any fusion sequence with greater read support and with the same sequence orientation (to avoid removing reciprocal fusions).
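The retention and redundancy rules can be expressed compactly, as in the sketch below. The dictionary keys are hypothetical, candidates are assumed to belong to the same gene pair, and "within a 20 bp interval" is interpreted as a breakpoint distance of at most 20 bp.

```python
def retain_alignment(identity_pct, aln_len, read_len,
                     min_identity=95.0, min_len_frac=0.90):
    """Keep a read aligned to a fusion template if it meets both defaults."""
    return identity_pct >= min_identity and aln_len >= min_len_frac * read_len


def remove_redundant_fusions(fusions, window=20):
    """Drop candidates near a better-supported one with the same orientation.

    fusions: list of dicts with keys 'pos', 'orientation', and 'support'.
    """
    def dominated(f):
        return any(g["orientation"] == f["orientation"]
                   and abs(g["pos"] - f["pos"]) <= window
                   and g["support"] > f["support"]
                   for g in fusions)

    return [f for f in fusions if not dominated(f)]
```

Requiring the same orientation keeps reciprocal fusions, which share nearby breakpoints but opposite orientations.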

FACTERA produces a simple output text file, which includes for each fusion sequence, the gene pair, the chromosomal sequence coordinates of the breakpoint, the fusion orientation (e.g., forward-forward or forward-reverse), the genomic sequences within 50 bp of the breakpoint, and depth statistics for reads spanning and flanking the breakpoint. Fusions identified in patients analyzed in this work are provided in Table 3.

C2.4. Experimental Validation of FACTERA

To experimentally evaluate the performance of FACTERA, we generated NGS data from two NSCLC cell lines, HCC78 (21.5M×100 bp paired-end reads) and NCI-H3122 (19.4M×100 bp paired-end reads), each of which has a known rearrangement (ROS1 and ALK, respectively) (Bergethon et al. (2012) J. Clin. Oncol. 30:863-870; McDermott et al. (2008) Cancer Res. 68:3389-3395) with a breakpoint that has, to the best of our knowledge, not been previously published. FACTERA readily revealed evidence for a reciprocal SLC34A2-ROS1 translocation in the former and an EML4-ALK fusion in the latter. Precise breakpoints predicted by FACTERA were experimentally validated by PCR amplification and Sanger sequencing (FIG. 5; see also Validation of Variants Detected by CAPP-Seq). Importantly, FACTERA completed each run in practical time (˜90 sec), using only a single thread on a hexa-core 3.4 GHz Intel Xeon E5690 chip. These initial results illustrate the utility of FACTERA as part of the CAPP-Seq analysis pipeline.

C2.5. Templated Fusion Discovery

We implemented a user-directed option to “hunt” for fusions within expected candidate genes. A fusion could be missed by FACTERA if the fusion detection criteria employed by FACTERA are incompletely satisfied—such as if discordant reads, but not soft-clipped reads, are identified—and will most likely occur when fusion allele frequency in the tumor is extremely low. As input, the method is supplied with candidate fusion gene sequences as “baits”. All unmapped and soft-clipped reads in the input .bam file are subsequently aligned to these templates (using blastn) to identify reads that have sufficient similarity to both (for each read, 95% identity, e-value <1.0e-5, and at least 30% of the read length must map to the template, by default). Such reads are output as a list to the user for manual analysis.

We tested this simple approach on a low purity tumor sample found to harbor an ALK fusion by FISH, but not FACTERA (i.e., case P9). Using templates for ALK and its common fusion partner, EML4, we identified 4 reads that mapped to both, in a region with an overall depth of ~1900×. The estimated allele frequency of 0.21% is strikingly similar to the 0.22% tumor purity measured by FACS (FIG. 15), confirming the utility of the templated fusion discovery method. We subsequently FACS-depleted CD45+ immune populations and re-sequenced this patient's tumor. In the enriched tumor sample, FACTERA identified the EML4-ALK fusion, along with two novel ROS1 fusions (FIG. 4 (e), Table 3).

C3. Mutation Recovery: SNVs/Indels

Using a custom Perl script, previously identified reporter alleles were intersected with a SAMtools mpileup file generated for each plasma cfDNA sample, and the number and frequency of supporting reads was calculated for each reporter allele. Only reporters in properly paired reads at positions with at least 500× overall depth were considered.
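The intersection step can be illustrated with a short sketch. It is written in Python rather than Perl and is not the script used in this work. It assumes that per-position base counts have already been extracted from the SAMtools mpileup output and restricted to properly paired reads; the function name, data layout, and coordinates are hypothetical.

```python
MIN_DEPTH = 500  # minimum overall depth stated in the text


def recover_reporters(reporters, base_counts):
    """Count supporting reads and allele frequency for each reporter allele.

    reporters: list of (chrom, pos, alt_base) tuples from the tumor analysis.
    base_counts: dict mapping (chrom, pos) -> {base: read count} for one
                 plasma cfDNA sample (assumed to be precomputed).
    Returns a dict mapping each evaluable reporter to (support, frequency).
    """
    results = {}
    for chrom, pos, alt in reporters:
        counts = base_counts.get((chrom, pos), {})
        depth = sum(counts.values())
        if depth < MIN_DEPTH:
            continue  # position not covered deeply enough to be considered
        support = counts.get(alt, 0)
        results[(chrom, pos, alt)] = (support, support / depth)
    return results


# Illustrative, made-up positions and counts.
example_reporters = [("chr1", 100, "T"), ("chr2", 200, "A")]
example_counts = {
    ("chr1", 100): {"C": 990, "T": 10},   # depth 1000, evaluable
    ("chr2", 200): {"G": 300, "A": 5},    # depth 305, below the threshold
}
recovered = recover_reporters(example_reporters, example_counts)
```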

C4. Mutation Recovery: Fusions

For enumeration of fusion frequency in sequenced plasma DNA, FACTERA executes the last step of the discovery phase (i.e., in silico validation of candidate fusions, above) using the set of previously identified fusion templates. The fusion allele frequency is calculated as α/β, where α is the number of breakpoint-spanning reads, and β is the mean overall depth within a genomic region ±5 bps around the breakpoint. Regarding the NSCLC selector described in this work, the latter calculation was always performed on the single gene contained in the NSCLC selector library. If both fusion genes are targeted within a selector library, overall depth is estimated by taking the mean depth calculated for both genes.
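The α/β calculation can be sketched as below. This is an illustration under stated assumptions, not FACTERA code: the function names and the per-position depth dictionary are hypothetical, and the two-gene case is interpreted as the mean of the two per-gene mean depths.

```python
from statistics import mean

FLANK_BP = 5  # window of +/- 5 bp around the breakpoint, as in the text


def window_depths(depth_by_position, breakpoint):
    """Depths at each position within +/- FLANK_BP of the breakpoint."""
    return [
        depth_by_position.get(pos, 0)
        for pos in range(breakpoint - FLANK_BP, breakpoint + FLANK_BP + 1)
    ]


def fusion_allele_frequency(spanning_reads, depths_gene_a, depths_gene_b=None):
    """Return alpha / beta.

    alpha: number of breakpoint-spanning reads.
    beta: mean overall depth around the breakpoint; if both fusion partners
          are targeted, the mean of the two per-gene mean depths.
    """
    beta = mean(depths_gene_a)
    if depths_gene_b is not None:
        beta = mean([beta, mean(depths_gene_b)])
    if beta == 0:
        raise ValueError("no coverage around the breakpoint")
    return spanning_reads / beta


# Values reported above for case P9: 4 reads at an overall depth of ~1900x.
p9_frequency = fusion_allele_frequency(4, [1900] * (2 * FLANK_BP + 1))
p9_percent = round(100 * p9_frequency, 2)
```

With the case P9 values, the sketch reproduces the estimated allele frequency of approximately 0.21%.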

Notably, in some cases we observed lower fusion allele frequencies than would be expected for heterozygous alleles (e.g., see cell line fusions in Table 3). This was seen in cell lines, in an empirical spiking experiment, and in one patient's tumor and plasma samples (i.e., P6), and could potentially result from inefficient “pull-down” of fusions whose partners are not represented in the selector. Regardless, fusions are useful reporters—they possess virtually no background signal and show linear behavior over defined concentrations in a spiking experiment (FIG. 10 (d)). Moreover, allelic frequencies in plasma are easily adjusted for such inefficiencies by dividing the measured frequency in plasma by the corresponding frequency in the tumor. In cases where sequenced tumor tissue is impure, tumor content can be estimated using the frequencies of SNVs (or indels) as a reference frame, allowing the fusion fraction to be normalized accordingly (Table 4). As for SNVs/indels, only fusions present in at least one plasma sample were included in calculations of tumor burden.

C5. Screening Plasma cfDNA without Knowledge of Tumor DNA

We devised the following statistical algorithm as a first step toward non-invasive cancer screening with plasma cfDNA. The method identifies candidate SNVs using iterative models of (i) background noise in paired germline DNA (in this work, PBLs), (ii) base-pair resolution background frequencies in plasma cfDNA across the selector, and (iii) sequencing error in cfDNA. Anecdotal examples are provided in FIG. 17. The algorithm works in four main steps, detailed below.

As input, the algorithm takes allele frequencies from a single plasma cfDNA sample and analyzes high quality background alleles, defined in a first step for each genomic position as the non-dominant base with highest fractional abundance. Only alleles with depth of at least 500× and strand bias <90% (conservative, by default) are analyzed. For consistency with variant calling, we allowed the screening approach to interrogate selector regions within 500 bp of defined coordinates, expanding the effective sequence space from ˜125 kb to ˜600 kb.

Second, the binomial distribution is used to test whether a given input cfDNA allele is significantly different from the corresponding paired germline allele (FIG. 17 (a)-(b)). Here the probability of success is taken to be the frequency of the background allele in PBLs, and the number of trials is the allele's corresponding depth in plasma cfDNA. To avoid contributions from alleles in rare circulating tumor cells that might contaminate PBLs, input alleles with a fractional abundance greater than 0.5% in paired PBLs (by default) or a Bonferroni-adjusted binomial probability greater than 2.08×10⁻⁸ are not further considered (alpha of 0.05/[~600 kb*4 alleles per position]).
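A minimal sketch of this second step is given below, using only the Python standard library. It assumes a one-sided (upper-tail) binomial test, which the text does not specify, and all function names are hypothetical. Note that a PBL frequency of exactly zero yields a probability of zero for any allele with supporting reads; how such cases are handled is not stated in the text.

```python
import math

# Bonferroni-adjusted threshold: 0.05 / (~600 kb * 4 alleles per position).
ALPHA = 0.05 / (600_000 * 4)
MAX_PBL_FRACTION = 0.005  # 0.5% default cutoff in paired PBLs


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space for stability."""
    if k <= 0:
        return 1.0
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        log_term = (
            math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * log_p + (n - i) * log_q
        )
        total += math.exp(log_term)
    return min(total, 1.0)


def passes_germline_filter(supporting_reads, cfdna_depth, pbl_fraction):
    """True if the cfDNA allele is retained after comparison with PBLs."""
    if pbl_fraction > MAX_PBL_FRACTION:
        return False
    p_value = binomial_upper_tail(supporting_reads, cfdna_depth, pbl_fraction)
    return p_value <= ALPHA
```

For example, under these assumptions an allele with 50 supporting reads at 5000× depth and a PBL background frequency of 0.05% would be retained, whereas the same allele with only 5 supporting reads would not.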

Third, a database of cfDNA background allele frequencies is assembled. Here, we used samples analyzed in the present study (i.e., pre-treatment NSCLC samples and 1 sample from a healthy volunteer), except the input sample is left out to avoid bias. Based on the assumption that all background allele fractions follow a normal distribution, a Z-test is employed to test whether a given input allele differs significantly from typical cfDNA background at the same position (FIG. 17 (a)-(b)). All alleles within the selector are evaluated, and those with an average background frequency of 5% or greater (by default) or a Bonferroni-adjusted single-tailed Z-score <5.6 are not further considered (alpha of 0.05, adjusted as above).
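The third step can be sketched as follows. The database layout (a mapping from sample name to background allele fraction at one position), the function name, and the use of the sample standard deviation are assumptions made for illustration; only the 5% and 5.6 thresholds and the leave-one-out rule come from the text.

```python
import statistics

Z_CUTOFF = 5.6               # alleles with a Z-score below this are discarded
MAX_MEAN_BACKGROUND = 0.05   # alleles with mean background >= 5% are discarded


def passes_background_filter(input_sample, fractions_by_sample):
    """Z-test of one allele against cfDNA background at the same position.

    fractions_by_sample: dict mapping sample name -> allele fraction at this
    position, including the input sample (hypothetical layout).
    The input sample is left out of the background to avoid bias.
    """
    background = [
        fraction for sample, fraction in fractions_by_sample.items()
        if sample != input_sample
    ]
    if len(background) < 2:
        return False  # not enough samples to estimate a standard deviation
    mean_bg = statistics.mean(background)
    sd_bg = statistics.stdev(background)
    if mean_bg >= MAX_MEAN_BACKGROUND or sd_bg == 0:
        return False
    z_score = (fractions_by_sample[input_sample] - mean_bg) / sd_bg
    return z_score >= Z_CUTOFF


# Illustrative, made-up allele fractions at a single position.
example_db = {"s1": 0.0010, "s2": 0.0020, "s3": 0.0015, "s4": 0.0012}
strong = passes_background_filter("input", {**example_db, "input": 0.02})
weak = passes_background_filter("input", {**example_db, "input": 0.002})
```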

Finally, candidate alleles are tested for remaining possible sequencing errors. This step leverages the observation that non-tumor variants (i.e., “errors”) in plasma cfDNA tend to have a higher duplication rate than bona fide variants detectable in the patient's tumor (data not shown). As such, the number of supporting reads is compared for each input allele between nondeduped (all fragments meeting QC criteria; see Methods) and deduped data (only unique fragments meeting QC criteria). An outlier analysis is then used to distinguish candidate tumor-derived SNVs from remaining background noise (FIG. 17 (a)-(c)). Specifically, to reveal outlier tendency in the data, the square root of the robust distance Rd (Mahalanobis distance) is compared against the square root of the quantiles of a chi-squared distribution Cs. This transformation reveals natural separation between true SNVs and false positives in cancer patients (FIG. 17 (a), (c)), and notably, reveals an absence of outlier structure in patient samples lacking tumor-derived SNVs (FIG. 17 (b), (c)). To automatically call SNVs without prior knowledge, the screening approach iterates through data points by decreasing Rd and recalculating the Pearson's correlation coefficient Rho between Rd and Cs for points 1 to i, where Rd_i is the current maximum Rd. The algorithm iteratively reports outliers (i.e., candidate SNVs) until it terminates when Rho ≥ 0.85.
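The iterative termination rule can be sketched as below. This is one possible reading of the procedure, not the implementation used in this work. It assumes that squared robust distances have already been computed by some robust covariance estimator (not shown), that the chi-squared distribution has 2 degrees of freedom (one per read-count variable, which permits a closed-form quantile), that quantiles are taken at plotting positions (j - 0.5)/n and held fixed during iteration, and that at least three points are needed to compute a correlation. All function names are hypothetical.

```python
import math

RHO_STOP = 0.85  # termination threshold stated in the text


def chi2_quantiles_df2(n):
    """Chi-squared quantiles (2 d.o.f.) at plotting positions (j - 0.5) / n."""
    return [-2.0 * math.log(1.0 - (j - 0.5) / n) for j in range(1, n + 1)]


def pearson(xs, ys):
    """Pearson correlation coefficient, or None if it is undefined."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0.0 or syy == 0.0:
        return None
    return sxy / math.sqrt(sxx * syy)


def call_outliers(squared_robust_distances):
    """Return input indices of reported outliers, largest distance first."""
    rd = squared_robust_distances
    order = sorted(range(len(rd)), key=rd.__getitem__)   # ascending Rd
    sqrt_rd = [math.sqrt(rd[i]) for i in order]
    sqrt_cs = [math.sqrt(q) for q in chi2_quantiles_df2(len(rd))]
    outliers = []
    i = len(order)
    while i >= 3:
        rho = pearson(sqrt_rd[:i], sqrt_cs[:i])          # points 1 to i
        if rho is None or rho >= RHO_STOP:
            break                                        # no outlier structure left
        outliers.append(order[i - 1])                    # current maximum Rd
        i -= 1
    return outliers
```

In this sketch, a data set whose distances track the chi-squared quantiles closely terminates immediately with no calls, which corresponds to the absence of outlier structure described above for samples lacking tumor-derived SNVs.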

Importantly, this approach positively identified 60% of the cancer samples with tumor-derived SNVs analyzed in this study with no false positive calls (FIG. 11 (g)). When corresponding germline DNA from PBLs is unavailable, one can skip the second step in this screening routine. After removal of germline SNVs with an allelic fraction >20%, this modified approach identified no SNVs when applied to a healthy volunteer.

All patents, patent publications, and other published references mentioned herein are hereby incorporated by reference in their entireties as if each had been individually and specifically incorporated by reference herein.

While specific examples have been provided, the above description is illustrative and not restrictive. Any one or more of the features of the previously described embodiments can be combined in any manner with one or more features of any other embodiments in the present invention. Furthermore, many variations of the invention will become apparent to those skilled in the art upon review of the specification. The scope of the invention should, therefore, be determined by reference to the appended claims, along with their full scope of equivalents.
