The following relates to the medical arts, oncology arts, genomic arts, and related arts. It is described with particular reference to oncological tumor delineation applications; however, the following is more generally applicable in medical or veterinary research and development, screening, diagnosis, clinical monitoring of metastasis or other conditions, interventional planning, and other medical or veterinary applications directed toward oncological conditions and other adverse conditions.
Cancer arises when normal body cells mutate or otherwise transform into cancerous cells that divide and multiply in an uncontrolled manner. In some cancers the cancerous cells remain localized, at least initially, so as to form a malignant tumor, which often invades surrounding tissue with microinfiltrations. At this point the cancer can sometimes be treated by removing the tumor; however, such removal should be complete, since otherwise the remaining cancer cells can continue to multiply and lead to a recurrence of the cancer. In addition to surgical removal, an adjuvant and/or neoadjuvant therapy or therapies may be applied, such as radiation therapy, chemotherapy, or so forth, which may address any incompleteness of the malignant tissue removal. A cancer metastasizes when it becomes delocalized and spreads to substantial portions of the body through the bloodstream or through the lymphatic system. Metastatic cancer is typically treated by administration of drugs (chemotherapy) or radiation in the form of radioactive implants (brachytherapy) or direct application of ionizing radiation (radiation therapy). These techniques may also be used prior to metastasis, either instead of surgical tumor removal in cases for which surgical removal of the malignancy is contraindicated, or in addition to surgical tumor removal to cull any cancer cells that remain after the tumor removal.
A known tool for cancer identification is genetic analysis. Typically, this entails performing genotyping to identify whether a suspect cell includes a particular genetic variant, or combination of variants, that has (have) been shown in clinical studies to correlate with a type of cancer. Ongoing oncology research is continually expanding the database of such genetic signatures for identifying various types of cancer.
The effectiveness of these genetic approaches is contingent upon there being a known genetic signature for the specific cancer condition of the subject (e.g., human oncology patient or veterinary oncology subject) under investigation. This may not always be the case. Some variants that are actually related to cancer may be novel (e.g., specific to a particular subject and not generally observed in the pool of patients with that cancer), or may be population specific (e.g., specific to a particular ethnic group, gender, geographical region, or so forth).
Although the number of variant-cancer correlations identified in the oncology literature is always expanding, which should, in principle, increase the effectiveness of genetic analysis for cancer diagnosis, there are practical limitations. The adoption of newly published variants for clinical diagnosis and monitoring can be delayed by concerns about validation and/or by government regulatory delays. Moreover, a larger variant database translates into longer processing time as more and more variants must be acquired and tested. Acquisition delays can be reduced by acquiring a whole genome sequence (WGS) using advanced sequencing technologies. The downstream processing delays, however, are not reduced by WGS acquisition.
Moreover, the variants database cannot encompass unique (or nearly unique) variants that occur in a portion of the cancer pool that is too small to be statistically detectable in clinical studies. A larger variants database also increases the likelihood of ambiguous or irreconcilable data, such as studies drawing contradictory conclusions as to the correlation (or lack thereof) between a particular variant and a particular cancer. In such cases existing genetic analyses are unlikely to yield a clinically useful result.
The following contemplates improved apparatuses and methods that overcome the aforementioned limitations and others.
According to one aspect, a method comprises: processing a suspect tissue sample acquired from a subject to generate a suspect whole genome sequence; processing a normal tissue sample acquired from the subject to generate a normal whole genome sequence; computing a whole genome sequence comparison metric comparing the suspect whole genome sequence with the normal whole genome sequence; and identifying whether the suspect tissue sample comprises cancer tissue based on the computed whole genome sequence comparison metric.
According to another aspect, a non-transitory storage medium stores instructions executable by an electronic data processing device to perform a method as set forth in the immediately preceding paragraph. According to another aspect, an apparatus comprises an electronic data processing device configured to perform a method as set forth in the immediately preceding paragraph. According to another aspect, a method as set forth in the immediately preceding paragraph further comprises: acquiring tissue samples from the subject at a plurality of sampling locations in or near a tumor; recording the sampling locations; performing the processing, computing, and identifying for each tissue sample; and delineating a boundary of the tumor based on the identifying and the recorded sampling locations.
According to another aspect, a method comprises: classifying tissue samples acquired from a subject at sampling locations in or near a tumor respective to cancer based on genetic testing of the tissue samples; and delineating a boundary of the tumor based on the classifying and knowledge of the sampling locations from which the samples were acquired.
According to another aspect, a method comprises: acquiring a plurality of probative tissue samples from a subject in or near a tumor; recording the sampling locations of the probative tissue samples; classifying each probative tissue sample respective to cancer based on genetic testing of the probative tissue sample; and delineating a boundary of the tumor based on the classifications of the probative tissue samples and the recorded sampling locations.
One advantage resides in providing identification of cancer cells based on WGS data with sufficient rapidity for use in time-critical clinical applications such as tumor delineation preparatory to an interventional oncology procedure.
Another advantage resides in providing cancer cell identification based on WGS that is not reliant upon calling specific cancer-correlative variants.
Another advantage resides in providing broad-based cancer cell identification that is not limited to specific known cancer types having identified correlative genetic variants.
Another advantage resides in providing tumor delineation that is not dependent upon the cancer cells exhibiting distinctive morphology or staining characteristics.
Numerous additional advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description.
The invention may take form in various components and arrangements of components, and in various process operations and arrangements of process operations. The drawings are only for the purpose of illustrating preferred embodiments and are not to be construed as limiting the invention.
Existing genetic analyses correlate observable genetic variants with specific types of cancer. This approach assumes that cancers fall into well-defined types, and that a given type of cancer can be characterized by correlative genetic variants that are common to patients (or veterinary subjects, in the veterinary context) having that type of cancer.
However, it is recognized herein that these assumptions may not be met in many situations. For example, reported studies in both oestrogen receptor-positive and oestrogen receptor-negative breast cancer have shown that substantial complexity and heterogeneity is actually observed between cancer genomes from different patients with the same breast cancer histopathological phenotype (inter-tumoural heterogeneity). See Shah et al., “Mutational evolution in a lobular breast tumour profiled at single nucleotide resolution”, Nature vol. 461 pages 809-813 (2009); Stephens et al., “Complex landscapes of somatic rearrangement in human breast cancer genomes”, Nature vol. 462 pages 1005-1010 (2009); and Ding et al., “Genome remodelling in a basal-like breast cancer metastasis and xenograft”, Nature vol. 464, pages 999-1005 (2010). For example, none of the novel fusion genes identified by Stephens et al. were present more than once in any of the twenty-four cancers studied, and three expressed in-frame fusion genes selected for follow-up were not present in an additional 288 breast cancers studied as reported in Shah et al. Another study has described substantial heterogeneity within individual breast tumors (intra-tumoral heterogeneity), where multiple tumor subpopulations have been identified, each with distinct genomic profiles. See Navin et al., “Inferring tumor progression from genomic heterogeneity”, Genome Res. Vol. 20 pages 68-80 (2010).
Moreover, it is known that differences in variant-cancer correlation can occur between populations, such that genomic signatures (e.g., mutations, single-nucleotide polymorphisms (SNPs), insertions or deletions (indels), etc.) reported in literature for a particular population may be inappropriate for use in another population. For example, in one study of sequence variants flagged as disease mutations, 74% of the studied variants turned out to be polymorphisms. Still further, even if a mutation is cited in literature as correlating with a certain type of cancer, this does not guarantee that it is indeed the causative mutation. In fact, 27% of the cited disease mutations were found to be likely polymorphisms or to be misannotated in the same study.
Indeed, the conventional model for carcinogenesis, namely a gradual accumulation of individual, relatively discrete genetic mutations transitioning normal cells into cancer cells, has been challenged. For example, a recently developed model for some instances of carcinogenesis is chromothripsis. In this model, a chromosome undergoes large scale fracturing followed by inaccurate reassembly. See Stephens et al., “Massive Genomic Rearrangement Acquired in a Single Catastrophic Event during Cancer Development”, Cell vol. 144 no. 1 pages 27-40 (January 2011). The chromothripsis model does not predict that a particular type of cancer would be likely to be associated with correlative discrete genetic variants. Another model that is becoming popular hypothesizes driver and passenger mutations. This model is based on the observation that many cancer genomes are riddled with mutations. In this model, the vast majority of these mutations are likely to be passengers, that is, mutations that do not contribute to the development of cancer but instead have occurred during the growth of the cancer. See http://www.news-medical.net/news/20100219/Cancer-genomes-Distinguishing-between-driver-and-passenger-mutations.aspx (last accessed Oct. 27, 2011). According to this model, most of the mutations in the biological databases will be passenger mutations.
Cancer identification techniques disclosed herein reduce or eliminate reliance upon literature-based cancer-correlative genetic variants. The disclosed techniques rely instead upon first principles considerations that are expected to be valid for all cancers regardless of the carcinogenesis mechanism. The disclosed techniques also leverage the availability of a whole genome sequence (WGS), which is provided by some existing commercially available genome sequencers or sequencing services (suitable sequencers or sequencing services are available, for example, from: Illumina®, San Diego, Calif., USA; Knome®, Cambridge, Mass., USA; Roche 454 (available from Roche, Basel, Switzerland); and Ion Torrent, Guilford, Conn., USA).
The techniques disclosed herein are premised on the following observation: All cancers are associated with abnormal changes to the genome. This is true regardless of the particular mechanism of carcinogenesis, and regardless of the particular type of cancer. Based on this observation, the disclosed techniques rely upon comparison of the WGS of a suspect cell with the WGS of a normal cell from the same individual. If the suspect cell is indeed a cancer cell, then the difference between its WGS and the WGS of a normal cell from the same individual is expected to be larger than the difference between the WGS of two different normal cells from the same individual. Thus, by comparing the WGS of a suspect tissue sample taken from a subject (e.g., a human medical subject, or a veterinary subject) with the WGS of a normal tissue sample taken from the same subject, the likelihood that the suspect tissue sample actually comprises cancer tissue is readily assessed. The WGS of normal tissue is employed as a filter to remove portions of the genome that are unrelated to cancer, leaving only the unique variants that are probative of whether the suspect tissue is actually cancer tissue.
This approach has substantial advantages. It substantially reduces the likelihood of misinterpreting a benign (i.e., not cancer-related) variant as a cancer signature, since such benign variants will be filtered out by comparison with the normal WGS of the same subject. On the other hand, a unique cancer-related variant that would not be detected by comparison with variant-cancer correlates from the literature is readily detected using the disclosed approach.
The disclosed approach determines whether the suspect tissue sample comprises cancer; however, it does not identify which type of cancer. The skilled artisan might view this as a substantial disadvantage for cancer diagnosis and monitoring. However, it is recognized herein that this potentially perceived disadvantage is not as substantial as might initially be thought. First, because the disclosed approaches do not rely upon exhaustive comparison of genetic material with a reference database of variants, they are substantially faster than conventional variant-based cancer identification. Thus, they can be used in initial cancer screening (with follow-up in the form of a conventional variant-based cancer identification in cases where the disclosed approach indicates a likelihood of cancer). The disclosed approaches are also useful in cancer monitoring, since in that case the type of cancer is (usually) already known and the information being sought is the progression of the cancer. As further disclosed herein, the speed of the disclosed approaches may even make them viable techniques for use in delineating a tumor during planning for an interventional procedure such as surgical removal or radiation therapy.
With reference to
In any of these embodiments, the sampling laboratory 8 extracts at least two tissue samples from the subject 6, namely a “suspect” tissue sample 10 and a “normal” tissue sample 12. The suspect tissue sample 10 is a tissue sample acquired from a location or region of the subject 6 that is suspected of comprising cancer tissue. For example, the suspect tissue sample 10 may be acquired from a tumor suspected or known to be malignant (it is to be understood that as used herein “suspected” encompasses “known”), or from a lung suspected to have lung cancer, or from a breast cancer lesion known or suspected to be malignant, or so forth. The normal tissue sample 12 is acquired from the same subject 6, but from a region or location of the subject 6 that is effective to ensure that the normal tissue sample 12 does not comprise cancer tissue. The identification of such a “normal” region from which the normal tissue sample 12 may be extracted can be based on various types of information. For example, in the case of a malignant tumor that has not (yet) metastasized the normal tissue sample 12 can be safely drawn from a location of the same type of tissue that is sufficiently far away from the tumor that it is unlikely to contain a non-negligible quantity of cancer cells. In the case of metastatic cancer, the normal tissue sample 12 may be drawn from tissue of a type that is unlikely to contain a non-negligible quantity of metastasized cancer cells. For example, if the cancer is unlikely to have spread to oral tissue, then the normal tissue sample 12 may be an oral sample. In general, the suspect tissue sample 10 and the normal tissue sample 12 may or may not be of the same tissue type.
It will be noted that in illustrative
The tissue samples 10, 12 are conveyed from the sampling laboratory 8 to the genomics laboratory 4 (unless the laboratories 4, 8 are the same physical establishment). At the genomics laboratory 4, each sample 10, 12 is suitably prepared and processed using a genetic sequencing apparatus 14 to generate a suspect whole genome sequence (suspect WGS) 20 and a normal whole genome sequence (normal WGS) 22, corresponding to the suspect tissue sample 10 and the normal tissue sample 12 respectively. The genetic sequencing apparatus 14 can employ substantially any sequencer that is capable of generating a whole genome sequence (WGS). Some suitable sequencing apparatus are available from Illumina®, San Diego, Calif., USA; Knome®, Cambridge, Mass., USA; Roche 454 (available from Roche, Basel, Switzerland); and Ion Torrent, Guilford, Conn., USA.
As used herein, a “whole genome sequence”, or WGS (also referred to in the art as a “full”, “complete”, or “entire” genome sequence), or similar phraseology is to be understood as encompassing a substantial, but not necessarily complete, genome of a subject. In the art the term “whole genome sequence”, or WGS, is used to refer to a nearly complete genome of the subject, such as at least 95% complete in some usages. The term “whole genome sequence”, or WGS, as used herein does not encompass “sequences” employed for gene-specific techniques such as single nucleotide polymorphism (SNP) genotyping, for which typically less than 0.1% of the genome is covered. The term “whole genome sequence”, or WGS, as used herein does not require that the genome be aligned with any reference sequence, and does not require that variants or other features be annotated.
The WGS 20, 22 are processed by an electronic data processing device 24, which in illustrative
The disclosed cancer identification tests are based on comparison of the suspect whole genome sequence 20 with the normal whole genome sequence 22, with the general premise being that the larger the difference is between these WGS 20, 22, the more likely it is that the suspect tissue sample 10 comprises cancer tissue. In the case of cancerous cells, the changes in the genome become more pronounced, with large indels (insertions/deletions), wide copy number variations (CNVs), chromosomal aberrations and rearrangements, and, in extreme cases of highly malignant and dedifferentiated tumors, aneuploidy. Again, this is true regardless of the mechanism of carcinogenesis. These genomic changes induce significant alterations or errors in the whole genome, causing the WGS of cancer cells to deviate substantially from the WGS of normal cells. In general, this is a matter of degree. Even the WGS of normal cells are expected to deviate from one another. These deviations are expected to be substantially larger for cancer cells. This premise can also be applied to monitoring cancer progression from one cancer stage to the next, as the later cancer stages are expected to exhibit greater divergence (versus earlier stage cancer cells) respective to the normal cell WGS. Indeed, the WGS of later stage cancer cells are expected to exhibit a quantifiable increase in divergence as compared with the WGS of earlier-stage cancer cells. Advantageously, these changes can be determined even before subjecting the WGS of the suspect tissue sample to the detailed analysis pipeline (e.g., including full alignment/assembly, variant calling and annotation, and comparison with literature variant-cancer correlation databases).
Toward this end, an operation 30 computes a WGS comparison metric providing a quantitative comparison between the suspect whole genome sequence 20 and the normal whole genome sequence 22. A decision operation 32 determines whether the quantitative WGS comparison metric satisfies a cancer criterion. Depending upon the decision reached at the decision operation 32, the suspect tissue sample 10 is either classified as normal tissue (operation 34) or is classified as cancer tissue (operation 36). In this regard, the decision operation 32 can also be viewed as a classifier or classification operation.
Note that although a binary (i.e., either cancer or normal) classification is employed in the illustrative classifier 32 of
The classifier 32 does not opine as to the type of cancer, but only as to whether or not the suspect sample 10 comprises cancer. The output 34, 36 may be interpreted and/or utilized in various ways. In the illustrative example of
In the illustrative example of
Having provided an overview of the cancer testing techniques disclosed herein with reference to
With reference to
The WGS comparison metric computation operation 301 described with reference to
A property of the Bloom filter is that it never erroneously indicates that a read is not in the Bloom filter when it actually is; however, there is a possibility that the Bloom filter may indicate a read is in the filter when it is not. Id. This can occur if other add operations have set all of the bits that would have been set by adding the read of the query, so that the query returns all 1's even though the read of the query has not actually been added to the Bloom filter. Such an error is not particularly significant for this application, however, because it will only result in the number of duplicate reads being overestimated by one (since the first time the read is checked it will show up as being a duplicate when it is not; thereafter, any repeat of that read check will actually be a duplicate and will be correctly recognized as such). Moreover, the Bloom filter can be fine-tuned for the required accuracy and for the time taken to report by adjusting the number of bits in the array and the number of hash functions.
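By way of illustration only, the Bloom filter behavior described above may be sketched in Python as follows. The class name `BloomFilter`, the helper `count_duplicate_reads`, the use of salted SHA-256 digests as the hash functions, and the default sizes are assumptions made for this sketch and are not taken from the disclosure; reads are assumed to be plain strings.

```python
import hashlib


class BloomFilter:
    """Fixed-size bit array probed by several hash functions (illustrative)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, read: str):
        # Assumption: salted SHA-256 digests stand in for the hash functions.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{read}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, read: str) -> None:
        # Adding a read sets every bit position selected by the hashes.
        for pos in self._positions(read):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, read: str) -> bool:
        # True only if all probed bits are 1. This may be a false positive,
        # but it is never a false negative for a read that was added.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(read))


def count_duplicate_reads(reads, num_bits=1 << 20, num_hashes=4) -> int:
    """Count reads that the filter reports as having been seen before."""
    seen = BloomFilter(num_bits, num_hashes)
    duplicates = 0
    for read in reads:
        if read in seen:
            duplicates += 1
        else:
            seen.add(read)
    return duplicates
```

In this sketch, enlarging `num_bits` lowers the false positive rate at the cost of memory, while `num_hashes` trades query time against accuracy, which corresponds to the tuning noted above.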
The WGS comparison metric 301 of
With reference to
In performing the operation 64, the property that the Bloom filter never erroneously indicates that a read is not in the filter when it actually is ensures that the set of unique reads 66 does not include any reads that are part of the normal WGS. However, it is possible that a few unique reads may be erroneously filtered out by the operation 64, since the Bloom filter 62 can erroneously indicate a read is in the filter when it is not. Thus, it is assured that the reads 66 are all unique to the suspect WGS 20, although some unique reads may have been missed.
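The filtering of suspect reads against a Bloom filter loaded with the normal reads may be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the salted SHA-256 hashing, and the filter dimensions are hypothetical, and reads are assumed to be plain strings rather than any particular sequencer output format.

```python
import hashlib


def bit_positions(read: str, num_bits: int, num_hashes: int):
    """Bit positions probed for a read (assumed salted SHA-256 hashing)."""
    return [
        int.from_bytes(hashlib.sha256(f"{i}:{read}".encode()).digest()[:8],
                       "big") % num_bits
        for i in range(num_hashes)
    ]


def build_normal_filter(normal_reads, num_bits: int, num_hashes: int) -> bytearray:
    """Load every read of the normal WGS into a bit array."""
    bits = bytearray((num_bits + 7) // 8)
    for read in normal_reads:
        for pos in bit_positions(read, num_bits, num_hashes):
            bits[pos // 8] |= 1 << (pos % 8)
    return bits


def unique_suspect_reads(suspect_reads, normal_reads,
                         num_bits: int = 1 << 20, num_hashes: int = 4):
    """Return suspect reads that the normal-read filter reports as absent.

    A read reported as absent is certainly not among the normal reads, so
    every returned read is unique to the suspect sample. A false positive
    can only cause a truly unique read to be dropped.
    """
    normal_filter = build_normal_filter(normal_reads, num_bits, num_hashes)
    unique = []
    for read in suspect_reads:
        present = all(normal_filter[pos // 8] & (1 << (pos % 8))
                      for pos in bit_positions(read, num_bits, num_hashes))
        if not present:
            unique.append(read)
    return unique
```

One possible derived metric, offered here only as an example, is `len(unique) / len(suspect_reads)`, that is, the fraction of suspect reads not found among the normal reads.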
The set of unique reads 66 can be treated as the WGS comparison metric, or alternatively a WGS comparison metric can be derived from the set 66. In the illustrative embodiment of
Alternatively, as also shown in
In the approach of
With reference to
In one approach, the WGS comparison metric comprises the quantity of the unique variants found only in the suspect WGS (again, optionally normalized by the total number of variants in the aligned suspect WGS 72 or by another normalization factor). In the illustrative example, this WGS comparison metric serves as input to a classifier 323 which compares the quantity of the unique variants found only in the suspect WGS against a suitable cancer criterion. Typically, a higher number of unique variants in the suspect WGS 20 tends to suggest cancer, and so the cancer criterion employed by the classifier 323 is suitably a threshold above which the suspect tissue sample 10 is labeled as cancer.
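The variant-count metric and threshold criterion just described may be sketched as follows. The representation of a variant as a `(chromosome, position, reference, alternate)` tuple, the function names, and the threshold value used in the example are assumptions for illustration; in practice a suitable threshold would have to be established empirically.

```python
def unique_variant_metric(suspect_variants, normal_variants,
                          normalize: bool = True) -> float:
    """Quantity of variants found only in the suspect WGS.

    Variants are assumed to be hashable tuples such as
    (chromosome, position, reference, alternate). With normalize=True the
    count is divided by the total number of distinct suspect variants.
    """
    suspect = set(suspect_variants)
    unique = suspect - set(normal_variants)
    if not normalize:
        return float(len(unique))
    return len(unique) / len(suspect) if suspect else 0.0


def classify_sample(metric: float, threshold: float) -> str:
    """Label the suspect sample as cancer when the metric exceeds the threshold."""
    return "cancer" if metric > threshold else "normal"


# Example with made-up variants and a hypothetical threshold of 0.25.
normal_calls = [("chr1", 100, "A", "T"), ("chr2", 200, "G", "C")]
suspect_calls = normal_calls + [("chr3", 300, "C", "CT"), ("chr4", 400, "T", "G")]
metric = unique_variant_metric(suspect_calls, normal_calls)
label = classify_sample(metric, threshold=0.25)
```

Normalizing by the total number of suspect variants makes the metric less sensitive to differences in calling depth between samples, which is one reason a normalized form might be preferred.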
In another approach also depicted as an alternative classifier 3233 in
With reference to
With continuing reference to
The disclosed cancer tests based on WGS data provide fast assessment for pre-screening the massive WGS for probable genomic alterations attributable to cancer, thus providing a guide for the computationally intensive and time-consuming analysis pipeline. The disclosed cancer tests are also expected to be useful for quantification of the progression of cancer. The disclosed cancer test embodiments effectively measure the genomic damage incurred due to the cancer on the scale of the entire WGS. These results are obtainable quickly without waiting for detailed specific variant-based genomic analysis. The disclosed cancer tests can be used to select a defined analysis pipeline for cancer, which is different from normal genome analysis and employs a limited computational infrastructure. The WGS comparison metric is a suitable measure of the dedifferentiation/malignancy level of the cancer and thus is of prognostic value.
In some practical cancer diagnosis applications, suspect and normal tissue samples 10, 12 are sequenced to the same coverage and the raw sequencing reads are used to measure the randomness of the cancer genome. The baseline (i.e., normal) WGS 22 is prepared from the subject 6 by performing whole genome sequencing on normal tissue samples 12 which may, for example, be white blood cells (WBC), cells from the buccal cavity, or so forth. The suspect WGS 20 is obtained by sequencing the cancerous cells. The raw reads are directly compared and the WGS difference metric is obtained.
For detection of cancer progression, suspect tissue samples 10 are collected from different regions of the cancer tissue and its boundary, and also from an involved lymph node or nodes in the case of nodal progression of the disease (where possible). Suspect tissue samples 10 may also be collected from metastatic foci (where possible and applicable). Normal tissue samples 12 are collected from appropriate normal tissue, such as normal lung tissue in the case of small cell lung carcinoma, or from a skin biopsy in the case of basal cell carcinoma/cutaneous squamous cell carcinoma. The normal tissue samples 12 serve as a control or baseline.
Another application of the cancer cell identification approaches disclosed herein pertains to tumor delineation. As part of the planning process for surgical tumor removal, gamma knife surgery, or radiation therapy, the tumor should be accurately delineated. However, because cancer cells are closely related to, and hence may be difficult to distinguish from, normal body cells, such delineation can be difficult. Imaging techniques such as computed tomography (CT) or magnetic resonance imaging (MRI) may fail to provide a crisp delineation between the tumor and surrounding healthy tissue, and the imaged boundary (even if well defined in the image) may not precisely match the physical distribution of cancer cells due to microinfiltrations or the like. Histopathology can also be employed. Here, suspect tissue is extracted and examined microscopically, possibly in conjunction with probative staining, in order to differentiate and identify cancer cells. Histopathology is reliant upon the cancer cells having morphologically distinct characteristics and/or an identifiable coloration under appropriate staining conditions. Unfortunately, this is not always the case. Where the differentiation from normal cells is subtle, accurate histopathology assessment is reliant upon the skill of the human technician and hence is prone to human error. Indeed, in some cases the cancer cells may be morphologically identical with normal cells, making histopathology ineffective.
The rapid throughput provided by the disclosed cancer cell identification techniques facilitates the use of these techniques in tumor boundary delineation.
With reference to
Once the tissue samples are collected, they are processed as disclosed herein with reference to
In one approach, the tissue samples 104 are collected from different depths of the tumor radially outwards from center to outside the boundary indicated by imaging, as shown in
In some embodiments, genetic variants such as single nucleotide polymorphisms (SNPs), indels, structural variants (SVs), copy number variants (CNVs), and so forth are extracted using conventional genetic analysis, and expression patterns are extracted and compared against a database of signatures that are reported to be associated with the type of cancer corresponding to the tumor 100. The resection boundary 110 is drawn across points where normal sequence patterns are observed.
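Drawing a boundary across the points where normal patterns are observed may be sketched as follows, assuming (for illustration only) that each classified sample is recorded as a `(direction, radius, label)` triple, with the radius measured outward from the tumor center along a named sampling direction. The function name and this record layout are hypothetical.

```python
from collections import defaultdict


def boundary_points(samples):
    """Locate one boundary point per sampling direction.

    samples: iterable of (direction, radius, label) with label being
    "cancer" or "normal". For each direction the boundary is placed at the
    innermost normal sample lying outside every cancer sample. Directions
    with no such normal sample are omitted from the result.
    """
    by_direction = defaultdict(list)
    for direction, radius, label in samples:
        by_direction[direction].append((radius, label))

    boundary = {}
    for direction, entries in by_direction.items():
        cancer_radii = [r for r, lab in entries if lab == "cancer"]
        outermost_cancer = max(cancer_radii, default=0.0)
        normal_beyond = [r for r, lab in entries
                         if lab == "normal" and r > outermost_cancer]
        if normal_beyond:
            boundary[direction] = min(normal_beyond)
    return boundary


# Example with made-up radii in millimeters.
recorded = [
    ("north", 5, "cancer"), ("north", 10, "cancer"),
    ("north", 15, "normal"), ("north", 20, "normal"),
    ("east", 5, "cancer"), ("east", 10, "normal"), ("east", 15, "normal"),
    ("south", 5, "cancer"),
]
points = boundary_points(recorded)
```

A direction that yields no boundary point, such as "south" in the example, would indicate that further samples need to be collected farther from the tumor center along that direction.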
However, it is generally not necessary to identify the type of cancer, as the nature of the tumor 100 is generally known before scheduling radiation therapy, gamma knife surgery, surgical tumor removal, or the like. Accordingly, the disclosed approach, e.g. as described herein with reference to operations 30, 32 of
In a variant approach, tissue samples 104 are collected as described with reference to
In another variant approach, sample collection is as described with reference to
In another variant approach, sample collection is as shown in
In another variant approach, image guided tissue sample collection is performed as described with reference to
In another variant approach, the sequencing reads from different tissue samples 104 are subtracted from each other. A percentage of variation within normal tissue is determined (e.g., using the normal tissue samples 108). A variation of around 1.5-2.5% is generally expected for normal tissue. Cancer tissue samples are expected to exhibit a larger variation than normal tissue, thus enabling the boundary 110 to be detected. For example, in some such embodiments, if the read similarity between two tissue samples is less than 97.5%, this may be regarded as a difference in cell types and the boundary 110 may be defined accordingly.
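The pairwise read comparison described above may be sketched as follows. The disclosure does not specify how read similarity is computed; this sketch assumes, purely for illustration, that similarity is the percentage of distinct reads shared by two samples (a Jaccard index), and that samples are supplied in order along a sampling path. The function names are hypothetical.

```python
def read_similarity(reads_a, reads_b) -> float:
    """Percentage of distinct reads shared by two samples (assumed Jaccard index)."""
    a, b = set(reads_a), set(reads_b)
    union = a | b
    if not union:
        return 100.0
    return 100.0 * len(a & b) / len(union)


def find_boundary(samples, threshold: float = 97.5):
    """Return adjacent location pairs whose read similarity falls below threshold.

    samples: list of (location, reads) ordered along the sampling path.
    Each returned pair brackets a position where the cell type is taken to
    change, and hence where the boundary may be placed.
    """
    crossings = []
    for (loc_a, reads_a), (loc_b, reads_b) in zip(samples, samples[1:]):
        if read_similarity(reads_a, reads_b) < threshold:
            crossings.append((loc_a, loc_b))
    return crossings


# Example with synthetic reads: two tumor-like samples, then two normal-like ones.
inner = [f"tumor{i}" for i in range(1000)]
middle = inner[:995] + [f"extra{i}" for i in range(5)]
outer = [f"normal{i}" for i in range(1000)]
outermost = outer[:995] + [f"other{i}" for i in range(5)]
path = [(0, inner), (5, middle), (10, outer), (15, outermost)]
crossings = find_boundary(path)
```

The default threshold of 97.5% mirrors the figure given above; a different similarity measure would generally call for a different threshold.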
The invention has been described with reference to the preferred embodiments. Obviously, modifications and alterations will occur to others upon reading and understanding the preceding detailed description. It is intended that the invention be construed as including all such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.
Number | Date | Country | Kind |
---|---|---|---|
11193637.3 | Dec 2011 | EP | regional |

Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/IB2012/056821 | 11/29/2012 | WO | 00 | 6/3/2014 |

Number | Date | Country |
---|---|---|
61568262 | Dec 2011 | US |