This application contains a sequence listing filed in electronic form as an ASCII.txt file entitled “T15956 (222110-1900) AS FILED Sequence Listing_ST25 2019_10_14” which was created on Oct. 11, 2019 and is 125 KB. The contents of the sequence listing are incorporated herein by reference in their entirety.
Due to the large number of silent mutations and the absence of tools to detect them, the measurement of genome-wide chromosomal DNA abnormalities is not routine. Current next-generation sequencing (NGS) methods make inherent sequencing errors. These errors can be partially alleviated by increasing the number of runs and improving the purity of the sample; however, even deep sequencing methods suffer from false detections. The sensitivity for detecting genome-wide variants is rarely better than 1%. This is far from the 10⁻⁶ sensitivity expected to be required for detecting the rate of silent mutations accumulating with each cell division or after an exposure to a mutagen, for example, a low-dose particle exposure. Therefore, the current NGS methods are insufficient to estimate genome-wide accumulated mutations and/or the rate of mutations, for example, point mutations and insertion/deletion (indel) mutations.
The premalignant genome-wide accumulated mutations and/or the rate of mutations usually cause few or no phenotypic effects but stochastically (randomly but extremely rarely) lead to driver mutations which cause cancer. The driver mutations accelerate the oncogenic mutational process and lead to evolutionary selection of more drivers in cancer-causing genes. The genome-wide accumulated mutations and/or the rate of mutations are dominated by point mutations (~95%) and short indel mutations (~1-3%) and include a small component of translocations when high-linear energy transfer (LET) radiation is considered (~1%). Driver mutations are far too complex to measure and interpret and are too infrequent to detect in small tissue volumes.
The invention relates to materials and methods of determining accumulated mutations and the rate of mutations in a target genomic sequence (also referred to herein as “a target sequence”), particularly a target sequence which is a part of a short interspersed element (SINE), a long interspersed element (LINE), any highly repeated sequence in a cell's genome, and/or the mitochondrial genome. Thus, the target sequence is present in a large number of copies per genome as SINEs, LINEs, or other highly repeated sequences within the genome of the cell, or the mitochondrial genome which is highly repeated in each cell in each mitochondrion. Because the entire mitochondrial genome is highly repeated due to the number of genome copies in each mitochondrion and the number of mitochondria in each cell, the mitochondrial genome, in addition to a SINE and/or LINE, can serve as a target sequence in one embodiment of the invention. The assay to determine the number of accumulated mutations in a target sequence utilizes a combination of a target sequence clamp with digital PCR (dPCR). The target sequence clamp binds only to the wild-type target sequence, prevents PCR amplification of only the amplicons that have the wild-type target sequence and permits PCR amplification of only the amplicons that have the mutated target sequence. The dPCR, where the sample is separated into a large number of partitions, detects the presence of the DNA fragments containing the mutant target sequences in the large number of analyzed genomic DNA fragments. The PCR amplification in a partition of the sample indicates the presence of the mutant target sequence in that partition. The accumulated mutations in the target sequence can be calculated based on the number of fragments of the genomic DNA that arise from one genome and the number of fragments of the genomic DNA per genome that contain the mutated target sequence.
The accumulated mutations in a target sequence can be used to determine the rate of mutations in the target sequence, the accumulated mutations in the genome (genome-wide mutations) and the rate of mutations in the genome (genome-wide rate of mutations).
Accumulated mutations and the rate of mutations in the genome are directly proportional to genomic age and the risk of cancer. Accordingly, the invention also provides a method of calculating the genomic age and/or the risk of cancer in a subject. In one embodiment, the risk of cancer in a tissue or organ is determined.
Furthermore, the invention provides a kit to carry out the methods of the invention.
The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication, with color drawing(s), will be provided by the Office upon request and payment of the necessary fee.
SEQ ID NO: 1: Forward Primer B1.
SEQ ID NO: 2: Reverse Primer B1 R001.
SEQ ID NO: 3: Clamp1.
SEQ ID NO: 4: Clamp2.
SEQ ID NO: 5: Clamp3.
SEQ ID NO: 6: Sequence of mouse B1.
SEQ ID NOs: 7-10: Example of clamp sequences for Alu SINEs.
SEQ ID NOs: 11-98: Examples of target amplicons in Alu SINEs.
SEQ ID NOs: 99-274: Forward and reverse primers for clamp sequence of SEQ ID NOs: 7-10 and Alu SINE sequences of 11-98.
SEQ ID NOs: 275-316: Sequence of SINEs indicated in Table 1.
SEQ ID NOs: 317-416: Examples of frequent sequences in human genomes that can be used as clamp sequences.
SEQ ID NOs: 417-478: Examples of Alu sequences in humans.
SEQ ID NO: 479: An example of a forward primer for B1 having the sequence of SEQ ID NO: 281.
SEQ ID NO: 480: An example of a reverse primer for B1 having the sequence of SEQ ID NO: 281.
SEQ ID NO: 481: An example of a clamp for B1 having the sequence of SEQ ID NO: 281.
As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. Furthermore, to the extent that the terms “including”, “includes”, “having”, “has”, “with”, or variants thereof are used in either the detailed description and/or the claims, such terms are intended to be inclusive in a manner similar to the term “comprising”.
The term “about” or “approximately” means within an acceptable error range for the particular value as determined by one of ordinary skill in the art, which will depend in part on how the value is measured or determined, i.e., the limitations of the measurement system. For example, “about” can mean within 1 or more than 1 standard deviation, per the practice in the art. Alternatively, “about” can mean a range of up to 0 to 20%, 0 to 10%, 0 to 5%, or up to 1% of a given value. Alternatively, particularly with respect to biological systems or processes, the term can mean within an order of magnitude, preferably within 5-fold, and more preferably within 2-fold, of a value. Where particular values are described in the application and claims, unless otherwise stated, the term “about” means within an acceptable error range for the particular value. In the context of compositions containing amounts of ingredients where the terms “about” or “approximately” are used, these compositions contain the stated amount of the ingredient with a variation (error range) of 0 to 10% around the value (X±10%).
In the present disclosure, ranges are stated in shorthand to avoid having to set out at length and describe each and every value within the range. Any appropriate value within the range can be selected, where appropriate, as the upper value, lower value, or the terminus of the range. For example, a range of 0.1-1.0 represents the terminal values of 0.1 and 1.0, as well as the intermediate values of 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, and all intermediate ranges encompassed within 0.1-1.0, such as 0.2-0.5, 0.2-0.8, 0.7-1.0, etc. Values having at least two significant digits within a range are envisioned, for example, a range of 5-10 indicates all the values between 5.0 and 10.0 as well as between 5.00 and 10.00, including the terminal values.
When ranges are used, such as for length of a SINE or a LINE or target sequence within a genome, primer or target sequence clamp, combinations and subcombinations of ranges (e.g., subranges within the disclosed ranges) and specific embodiments therein are intended to be explicitly included.
As used herein, the term “cancer” refers to the presence of cells possessing abnormal growth characteristics, such as uncontrolled proliferation, immortality, metastatic potential, rapid growth and proliferation rate, perturbed oncogenic signaling, and certain characteristic morphological features. This includes but is not limited to the growth of: (1) benign or malignant cells (e.g., tumor cells) that correlate with overexpression of a serine/threonine kinase; or (2) benign or malignant cells (e.g., tumor cells) that correlate with abnormally high levels of serine/threonine kinase activity or lipid kinase activity. Serine/threonine kinases implicated in cancer include but are not limited to PI-3K, mTOR and AKT. Exemplary lipid kinases include but are not limited to PI3 kinases such as PI3Kα, PI3Kβ, PI3Kδ, and PI3Kγ.
“Subject” refers to an animal, such as a mammal, for example a human. The methods described herein can be useful in both humans and non-human animals. In some embodiments, the subject is a mammal, and in some embodiments, the subject is human. The invention can be used in a subject selected from non-limiting examples of a human, non-human primate, rat, mouse, pig, dog or cat. Additional embodiments of the animals in which the invention can be practiced are well-known to a person of ordinary skill in the art and such embodiments are within the purview of the invention.
Where the term “and/or” is used within the application, it is intended that the elements recited within the phrase where the term “and/or” is used can be assessed individually or in any combination of the recited elements. For example, the phrase “target SINE, target LINE and/or the genome” is meant to convey that each element of the phrase can be assessed individually or in any possible combination (e.g., target SINE alone, target LINE alone, genome alone, target SINE and target LINE in combination, target SINE and genome in combination, target LINE and genome in combination, or target SINE, target LINE, and genome in combination).
The invention relates to materials and methods of determining accumulated mutations and the rate of mutations in a target genomic sequence (also referred to herein as “a target sequence”), particularly a target sequence which is a part of a short interspersed element (SINE), a long interspersed element (LINE), any highly repeated sequence in a cell's genome, and/or the mitochondrial genome. Thus, the target sequence is present in a large number of copies per genome as SINEs, LINEs, or other highly repeated sequences within the genome of the cell. Thus, the invention provides an assay to determine the accumulated mutations and/or the rate of mutations in a target sequence within a short interspersed element (SINE), a long interspersed element (LINE), the mitochondrial genome and/or the genome (as used herein, the target sequence is any highly repeated sequence within the genome of a cell). The accumulated mutations and/or the rate of mutations in a target SINE, target LINE and/or the genome integrate various causes of DNA damage. As used herein, the term “genome” refers to highly repeated sequences within the genome of a cell, such as mitochondrial genomes. “Highly repeated sequences” are nucleotide sequences that are repeated hundreds to thousands of times within the genome of a cell. As such, the invention provides materials and methods for quantitative estimation of mutations and rate of mutations. The invention also provides an assay to measure point mutations and indels (which together comprise >95% of all mutations) in a target sequence within a target SINE and/or target LINE of a cell, for example, Alu in humans or B1 in mice, or a target LINE such as those provided in Tables 6-7 (which provide GenBank accession numbers for partial LINE1 sequences and full length LINE1 sequences for humans and mice, each of which is hereby incorporated by reference in its entirety).
The accumulated mutations and/or the rate of mutations in a target sequence within a target SINE or genome of the cell can be extrapolated to measure point mutations and indels in the genome.
The current approaches to estimating cancer risk require screening many animals for long periods of time. These methods generally feature genetically defined animals with driver mutations that cause cancers and incidence rates that are not representative of spontaneously occurring human cancers. The invention provides cancer risk assessments that can be performed on a subject-by-subject basis and on an organ-by-organ basis, thus allowing for subject-specific and organ-specific estimates of cancer risk. The methods of the invention can also be used to determine the effects on cancer risk of genetic and non-genetic factors, for example, race, family lineage, and environmental factors such as food, lifestyle choices, smoking, etc.
The accumulated mutations and/or the rate of mutations in a target sequence within the target SINE, target LINE, and/or the genome is directly proportional to the individual's chronological age. Specifically, an individual with a higher chronological age has more accumulated mutations and/or a higher rate of mutations in a target sequence within a target SINE, target LINE, and/or the genome compared to an individual with a relatively lower chronological age. An embodiment of the invention provides an assay to determine accumulated mutations and/or the rate of mutations in a target sequence in a target SINE, target LINE and/or the genome. The genome-wide accumulated mutations and the rate of mutations can be used to estimate genomic age of an individual for correlation with the individual's chronological age.
For the purpose of the invention, the phrase “chronological age” is the age of a subject based on the subject's date of birth. Accordingly, compared to a chronologically younger subject, a chronologically older subject has a date of birth earlier in time.
Age-related DNA damage is random, is not confined to coding regions and increases with age. Therefore, the number of mutations in certain tumors is directly proportional to the age of the patient (
While almost all mutations are “passenger” mutations, i.e., silent mutations, occasionally a “driver” mutation occurs. Driver mutations confer a competitive advantage upon the reproduction of the affected cell. Thus, while it would be difficult to detect cells with a driver mutation without complete organs to study and without unrealistic databases and sequencing resources, passenger mutation frequencies can be used to mathematically determine driver mutation frequency and downstream cancer incidence. The process from driver mutation to tumor is typically estimated to be 1 to 15 years, and tumors typically have only a few driver mutations (≤10). Notably, 10 times fewer driver mutations are affected by chromosome changes than by point mutations, and high-LET radiation induces both types of mutations.
Since the rate of mutations, and consequently, the chances of the occurrence of driver mutations increase with increasing chronological age, chronological age typically correlates with cancer risk in the general population, i.e., higher chronological age of a subject typically, but not necessarily, indicates higher risk of cancer in the subject. For example, cancer incidence per 100,000 is 17 for ages less than 20, which increases by a factor of 10 to 157 for ages 20-49, increases another 5-fold for ages 50 to 64, and further increases another 3-fold for ages over 75 (>2,200/100,000), for a total increase of more than 130-fold (
For the purpose of the invention, the “genomic age” indicates the accumulated mutations and/or the rate of mutations in the genome of a subject (
The comparison of a subject's chronological age and genomic age can be expressed as the genomic age minus the chronological age (Δage). Therefore, a positive Δage indicates that a subject is aging at a higher rate than average, whereas a negative Δage indicates that a subject is aging at a lower rate than average.
A standard scale for the genomic age for a particular species can be determined based on the average accumulated mutations and/or the average rate of mutations in different groups of individuals of varying ages. As such, the standard scale for the genomic age for the species indicates the average accumulated mutations and/or the average rate of mutations in a target sequence in a target SINE, target LINE, and/or the genomes of individuals belonging to the species at increasing chronological ages.
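By way of a non-limiting illustration, the comparison of genomic age with chronological age against a standard scale can be sketched as follows. The scale values, the function names and the use of linear interpolation are assumptions made only for this example; the disclosure does not prescribe a particular scale or interpolation method.

```python
from bisect import bisect_left

# HYPOTHETICAL standard scale: (chronological age in years, average accumulated
# mutations per genome in the target sequence). Illustrative values only.
STANDARD_SCALE = [(0, 0.0), (20, 100.0), (40, 200.0), (60, 300.0), (80, 400.0)]


def genomic_age(mutations, scale=STANDARD_SCALE):
    """Interpolate the age at which the standard scale reaches `mutations`.

    Assumes the mutation values in `scale` are strictly increasing with age.
    Values outside the scale are clamped to its youngest or oldest age.
    """
    ages = [age for age, _ in scale]
    loads = [load for _, load in scale]
    if mutations <= loads[0]:
        return float(ages[0])
    if mutations >= loads[-1]:
        return float(ages[-1])
    i = bisect_left(loads, mutations)  # first scale point at or above the value
    frac = (mutations - loads[i - 1]) / (loads[i] - loads[i - 1])
    return ages[i - 1] + frac * (ages[i] - ages[i - 1])


def delta_age(mutations, chronological_age, scale=STANDARD_SCALE):
    """Delta-age = genomic age minus chronological age."""
    return genomic_age(mutations, scale) - chronological_age
```

Under these assumed values, a subject with 250 accumulated mutations at a chronological age of 45 would have a genomic age of 50 and a positive Δage of 5 years.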
Accordingly, the invention provides methods for measuring accumulated mutations and/or the rate of mutations in a target sequence in a target SINE, target LINE and/or the genome to determine the genomic age of a subject and, consequently, to determine the cancer risk of the subject. The genomic age of a subject can be compared to the known chronological age of the subject and the standard scale of genomic age to identify the subject's risk for cancer and to offer enhanced cancer screening if the subject has a higher risk of cancer. Low-risk groups, on the other hand, can be spared from unnecessary screening tests. The accumulated mutations and/or mutation rates can also be used to evaluate the impact of environment (e.g., insecticides), lifestyle changes (e.g., weight loss or smoking cessation), and therapies (e.g., X-rays, medications) on genotoxic load, mutation rate, and, consequently, cancer risk.
Accordingly, an embodiment of the invention provides an assay to determine the number of accumulated mutations in a target sequence within a target SINE and/or target LINE and/or genome of a subject. For the purpose of this invention, this assay is called the clamp/dPCR combination assay.
The clamp/dPCR combination assay comprises the steps of:
a) obtaining a genomic DNA sample from the subject and fragmenting the genomic DNA sample, or obtaining a fragmented genomic DNA sample from the subject,
b) mixing a predetermined number of fragments of the genomic DNA that arise from a predetermined number of genomes with a reagent mixture to produce a reaction mixture, the reagent mixture comprising:
i) a pair of polymerase chain reaction primers that amplify a target amplicon comprising the target sequence within the target SINE,
ii) a target sequence clamp which binds only to the wild-type target sequence within the SINE, wherein the target sequence clamp prevents the PCR amplification of only those target amplicons that have the target wild-type sequence within the SINE and permits the PCR amplification of only those target amplicons that have the target mutated sequence within the SINE, and
iii) a DNA polymerase enzyme and the reactants for a digital PCR (dPCR),
c) subjecting the reaction mixture to the dPCR,
d) identifying the number of fragments of the genomic DNA comprising the target amplicon having the target mutated sequence within the SINE based on the number of positive PCR amplifications in the dPCR,
e) calculating the number of accumulated mutations per genome in the target sequence within the target SINE and/or target LINE and/or genome based on the number of fragments of the genomic DNA that arise from one genome and the number of fragments of the genomic DNA per genome that comprise the target amplicons having the target mutated sequence within the target SINE and/or target LINE and/or genome, wherein the presence of the target mutated sequence within the target SINE and/or target LINE and/or genome is indicated by the positive PCR amplification in the dPCR.
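As a non-limiting sketch of step d), the number of mutant fragments can be estimated from the number of positive partitions. The Poisson correction used below is common practice in dPCR analysis but is not recited in the steps above; it is an assumption of this example, as are the function and parameter names.

```python
import math


def mutant_fragments_from_dpcr(positive_partitions, total_partitions):
    """Estimate the number of mutant template fragments loaded into a dPCR run.

    ASSUMPTION: mutant fragments distribute randomly over partitions, so a
    positive partition may hold more than one mutant fragment. With p the
    fraction of positive partitions, the mean number of mutant fragments per
    partition is lambda = -ln(1 - p).
    """
    if total_partitions <= 0:
        raise ValueError("total_partitions must be positive")
    if not 0 <= positive_partitions < total_partitions:
        raise ValueError("positive_partitions must be in [0, total_partitions)")
    p = positive_partitions / total_partitions
    lam = -math.log(1.0 - p)
    return lam * total_partitions
```

When only a small fraction of partitions is positive, the estimate is close to the raw count of positive partitions; the correction matters mainly at higher occupancy.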
LINEs are transposons that are about 5-6 kb long, contain an internal polymerase II promoter and encode two open reading frames (ORFs). Upon translation, a LINE RNA assembles and moves to the nucleus, where an endonuclease activity makes a single-stranded nick and the reverse transcriptase uses the nicked DNA to prime reverse transcription from the 3′ end of the LINE RNA. Reverse transcription frequently fails to proceed to the 5′ end, resulting in many truncated, nonfunctional insertions. Most LINE-derived repeats are short, with an average size of 900 bp for all LINE1 copies, and a median size of 1,070 bp for copies of the currently active LINE1 element. Three distantly related LINE families are found in the human genome: LINE1, LINE2 and LINE3, with LINE1 being the only remaining active LINE. Exemplary target LINE1 sequences are provided in Tables 6-7, which provide both partial and full length LINE1 sequences for humans and mice, identified by GenBank accession number. Other LINE1 sequences, including those of other animal species, are known in the art and can be easily identified in various databases, such as GenBank.
A SINE is a highly repetitive sequence that retrotransposes into a eukaryotic genome through intermediates transcribed by RNA polymerase III (pol III). In many species, SINEs are ubiquitously dispersed throughout the genome and can constitute a significant mass fraction of total genome, for example, typically about 10% or even above 10% in some cases.
SINEs cause mutations both by their retrotransposition within genes and by unequal recombination.
SINEs are relatively short (<700 bp) nonautonomous retroposons transcribed by pol III from an internal promoter and reverse transcribed by the reverse transcriptase of long interspersed elements. Eukaryotic genomes typically contain hundreds of thousands, and sometimes even more, of SINE copies (see Table 1, column: copy number). A SINE typically consists of a head, body and tail. The 5′-terminal head originates from one of the cellular RNAs synthesized by pol III: tRNA, 7SL RNA or 5S rRNA; the body can contain a central domain which may be shared by distant SINE families; and the 3′-terminal tail is a sequence of variable length consisting of simple and often degenerate repeats. Various aspects of SINE structure, biology and evolution have been reviewed in Vassetzky et al. (2012). Vassetzky et al. also provide a database of SINE families from animals, flowering plants and green algae (see sines.eimb.ru). Non-limiting examples of SINEs from certain animals from the database provided by Vassetzky et al. are provided in Table 1. Several variants of the SINEs provided in Table 1 are also described by Vassetzky et al. (see sines.eimb.ru). Additional examples of SINEs suitable according to the methods of the claimed invention are well-known to a person of ordinary skill in the art and such embodiments are within the purview of the invention.
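[Table 1: non-limiting examples of SINEs from certain animals, including copy number. The table body was not recoverable; surviving species fragments: Mus musculus, menadoensis, Callosciurus, tridecemlineatus.]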
A SINE and/or LINE and/or target sequence within the genome used in an embodiment of the invention is spread throughout the genome and covers at least a certain percentage of the entire genome, and thus forms a representative sample of the entire genome. For example, a preferred SINE and/or LINE and/or target sequence within the genome for use in the assay of the invention typically covers about 4-15%, about 5-14%, about 6-13%, about 7-12%, about 8-11%, about 9-10% or about 10% of the genome.
The SINE and/or LINE and/or target sequence within the genome used for a particular assay depends on the species to which the subject belongs and the prevalence of the SINE and/or LINE and/or target sequence within the genome in the species' genome. A preferred SINE and/or LINE and/or target sequence within the genome for use in the assay of the invention typically comprises about 50-500, about 100-400, about 100-250, about 200-300, about 250-350 or about 300 bp.
A person of ordinary skill in the art can identify a suitable SINE for use in a particular species. Also, depending on the sequence of the SINE and/or LINE and/or target sequence within the genome, a person of ordinary skill in the art can design a target sequence clamp and a primer pair to conduct the assay of the invention. Such embodiments are within the purview of the invention.
The target sequence and, accordingly, the clamp can be designed based on the sequence of the SINE and/or LINE and/or target sequence within the genome. A person of ordinary skill in the art can appreciate that variation exists even within the large number of copies of a particular SINE and/or LINE and/or target sequence within the genome throughout the genome. However, within the larger sequence of a SINE and/or LINE and/or target sequence within the genome, certain portions of the SINE and/or LINE and/or target sequence within the genome do not show much variability among the copies of the SINE and/or LINE and/or target sequence within the genome throughout the genome. For example, in humans, Alu is about 300 bp long; however, portions of Alu that are about 5-50 bp, about 10-40 bp, about 20-30 bp, about 25 bp or about 10-15 bp are highly conserved among the large number of copies of Alu throughout the genome. In certain embodiments, the clamp sequences are designed based on sequences that are conserved across different human races or animal breeds/strains.
In one embodiment, the clamp sequence is selected from the sequences provided in Table 2 below. These sequences are the most common sequences that occur in human Chromosome 1.
Homo sapiens genome, Chromosome 1.
Therefore, in one embodiment, the target sequence, and, accordingly, the clamp, is designed based on the highly conserved portion of a particular SINE.
Examples of Alu SINEs, suitable clamp sequences for particular Alu SINEs and corresponding primer pairs are given in Table 3.
Table 4 below shows the frequencies of each of the clamp sequences having SEQ ID NOs: 7 to 10 on each of the human chromosomes.
In certain embodiments, a number of clamps encompassing a highly conserved region are designed by “walking across” the highly conserved region. Each of these clamps can be tested to identify the clamp which exhibits maximum experimental ease, accuracy and reproducibility. For the purposes of this invention, the term “walking across” indicates the process of designing a plurality of clamps that bind to an area of interest, wherein each of the plurality of clamps has a specific length, for example, 10-15 bp, each of the plurality of clamps begins at a particular nucleotide of the area of interest and the plurality of clamps as a whole cover the entire area of interest. In general, the clamps are about 15-50 bp, about 15-40 bp, about 15-30 bp, about 16-28 bp, about 17-26 bp, about 18-24 bp, about 19-22 bp, or about 20 bp.
For example, if a SINE is about 300 bp containing a highly conserved region of 50 bp, a plurality of clamps, each of about 10-15 bp, is designed by walking across the highly conserved region of 50 bp. For example, a plurality of clamps of 10 bp are designed based on a highly conserved region of 50 bp by designing clamps that have the sequence of 1-10 bases, 2-11 bases, 3-12 bases, . . . , and 41-50 bases of the highly conserved region. As such, 41 different clamps can be designed and tested to identify the preferred clamp for use in an assay according to the invention.
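The “walking across” design described above can be illustrated with the following minimal sketch, which enumerates every clamp of a fixed length within a conserved region. The function name and the placeholder 50-base region are assumptions for illustration only; the placeholder is not a sequence from the sequence listing.

```python
def walk_across(region, clamp_length):
    """Return every candidate clamp of `clamp_length` bases within `region`.

    Clamp k (0-based) starts at base k of the region, so together the
    candidates cover the whole region one base at a time.
    """
    if clamp_length <= 0 or clamp_length > len(region):
        raise ValueError("clamp_length must be between 1 and len(region)")
    last_start = len(region) - clamp_length
    return [region[i:i + clamp_length] for i in range(last_start + 1)]


# Placeholder 50-base region (hypothetical, for illustration only).
conserved_region = "ACGT" * 12 + "AC"
candidate_clamps = walk_across(conserved_region, 10)
print(len(candidate_clamps))
```

For a 50-base region and 10-base clamps, the sketch yields the 41 candidates noted above (bases 1-10 through bases 41-50).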
A person of ordinary skill in the art can design a plurality of clamps of a particular length within an area of interest, and such embodiments are within the purview of the invention.
For example, the sequence for Alu is highly conserved, particularly among the first 50 bp, and other sites in the Alu sequence. A sequence alignment between certain Alu sequences is provided in
Additional examples of Alu sequences are provided as SEQ ID NOs: 417 to 478. Conserved domains in these sequences can be determined by a person of ordinary skill in the art to design appropriate clamp sequences and primer sequences.
In a further embodiment, multiple versions of clamps can be used in an assay where the different versions of the clamps are directed to variants of Alu sequences. Therefore, as a whole, the multiple versions of clamps can be used to identify mutations in a region of interest beyond the variability naturally observed in the region of interest.
In preferred embodiments, the clamps are designed based on a region of about 16 to 20 highly conserved bp. The clamp sites also require the presence of suitable 5′ and 3′ primers for the PCR component.
Examples of 14 bp clamps derived from the areas of greatest frequency in Alu, which occur in both mice and humans, are provided in SEQ ID NOs: 7-10. The clamp sequence CCTGTAATCCCAGC (SEQ ID NO: 7) has about 16,000 repeats on chromosome 12; the clamp sequences CTAAAAATACAAAA (SEQ ID NO: 8) and TGCACTCCAGCCTG (SEQ ID NO: 9) each have approximately 10,000 repeats on chromosome 12; and the clamp sequence TCTCAAAAAAAAAA (SEQ ID NO: 10) has approximately 7,000 repeats on chromosome 12. All of these clamps are suitable for appropriate 3′ and 5′ PCR primers.
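Repeat counts of this kind can, in principle, be obtained by counting occurrences of a clamp sequence in a chromosome sequence. The following is a minimal sketch on a toy sequence; the function name and the toy sequence are assumptions, only the given strand is searched, and the sketch does not reproduce how the counts stated above were obtained.

```python
def count_occurrences(sequence, clamp):
    """Count occurrences of `clamp` in `sequence`, including overlapping ones."""
    count = 0
    start = sequence.find(clamp)
    while start != -1:
        count += 1
        # Advance by one base so that overlapping matches are also counted.
        start = sequence.find(clamp, start + 1)
    return count


CLAMP_SEQ_ID_7 = "CCTGTAATCCCAGC"  # SEQ ID NO: 7, as given in the text

# Toy sequence (hypothetical): two copies of the clamp with flanking bases.
toy_sequence = "GG" + CLAMP_SEQ_ID_7 + "TT" + CLAMP_SEQ_ID_7 + "AA"
print(count_occurrences(toy_sequence, CLAMP_SEQ_ID_7))
```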
In one embodiment, the subject is a mouse and the SINE is B1 having the sequence of SEQ ID NO: 6. An example of the primer pair used in mice with this B1 (SEQ ID NO: 6) as the SINE comprises the primers having the sequences of SEQ ID NOs: 1 and 2 and the target sequence clamp having a sequence selected from SEQ ID NO: 3, 4 and 5.
In another embodiment, the subject is a mouse and the SINE is B1 having the sequence of SEQ ID NO: 281. An example of the primer pair used in mice with this B1 (SEQ ID NO: 281) as the SINE comprises the primers having the sequences of SEQ ID NOs: 479 and 480 and the target sequence clamp having a sequence selected from SEQ ID NO: 3, 4 and 481.
In a further embodiment of the invention, the subject is human and the SINE is Alu.
The step of obtaining a genomic DNA sample from the subject and fragmenting the DNA sample can be performed based on methods well known in the art. Methods and parameters used in fragmenting genomic DNA depend on the size of the genome and the desired average and median size of the fragments. The desired size of the fragments depends on the size of the target SINE, target LINE and/or target sequence within the genome, i.e., most of the fragments must allow binding of the primers and the target sequence clamps for the assay to be successful. For example, for a target SINE, target LINE and/or target sequence within the genome of about 100 to 500 bp, a substantial number of the DNA fragments are about 800-1500 bp each. For example, in a typical genomic DNA fragment sample, at least about 80-99%, about 82-98%, about 84-96%, about 86-94%, about 88-92%, or about 90% of all of the genomic DNA fragments have the desired size. A person of ordinary skill in the art can determine the optimal size of the genomic fragments for a particular assay. Also, the techniques of producing genomic fragments of a desired size are well-known in the art and such embodiments are within the purview of the invention.
A person of ordinary skill in the art would appreciate that since the fragmenting of the genomic DNA is random, a fragment can contain more than one SINE and/or one or more LINE and/or target sequence within the genome. A fragment can also contain only a part of the SINE or part of a LINE and/or target sequence from within the genome. These issues can be addressed by using a large sample and/or running multiple repeats of the assay so that the possible errors are diluted and a more accurate estimation of the mutations is obtained.
Mixing a predetermined number of fragments of the genomic DNA that arise from a predetermined number of genomes with a reagent mixture to produce a reaction mixture is intended to provide the values used in the calculations of the accumulated mutations and/or rate of mutations in the target sequence within the target SINE and/or target LINE and/or the genome. For example, a person of ordinary skill in the art can calculate the accumulated mutations and/or the rate of mutations in the target SINE and/or target LINE and/or target sequence within the genome based on the number of fragments arising from one genome, the number of fragments subjected to dPCR, the number of mutated target sequences as indicated by the positive PCR amplification results, and the size and frequency in the genome of the target SINE and/or target LINE and/or target sequence within the genome. Also, based on the size and frequency of the target SINE and/or target LINE and/or target sequence within the genome and the accumulated mutations and/or the rate of mutations in the target SINE and/or target LINE and/or target sequence within the genome, a person of ordinary skill in the art can calculate the accumulated mutations and/or the rate of mutations in the genome.
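One possible form of these calculations is sketched below as a non-limiting illustration. The function names, the example numbers, and the simplifying assumptions (each mutated target sequence is counted as one mutation, and mutations are spread uniformly over the genome) are introduced only for this example and are not specified in the disclosure.

```python
def mutated_targets_per_genome(mutant_fragments, fragments_analyzed,
                               fragments_per_genome):
    """Mutated target sequences per genome.

    The number of genomes represented in the run is the number of fragments
    analyzed divided by the number of fragments that arise from one genome.
    """
    genomes_analyzed = fragments_analyzed / fragments_per_genome
    return mutant_fragments / genomes_analyzed


def genome_wide_mutations(mutated_per_genome, target_length_bp,
                          target_copies_per_genome, genome_size_bp):
    """Extrapolate from the surveyed target bases to the whole genome.

    ASSUMPTIONS: one mutation per mutated target sequence, and a uniform
    mutation density across the genome.
    """
    bases_surveyed = target_length_bp * target_copies_per_genome
    mutations_per_bp = mutated_per_genome / bases_surveyed
    return mutations_per_bp * genome_size_bp


# Hypothetical example values, chosen only to make the arithmetic easy to follow.
per_genome = mutated_targets_per_genome(
    mutant_fragments=50, fragments_analyzed=1_000_000,
    fragments_per_genome=100_000)
genome_wide = genome_wide_mutations(
    per_genome, target_length_bp=20,
    target_copies_per_genome=1_000_000, genome_size_bp=3_000_000_000)
print(per_genome, genome_wide)
```

Dividing the per-genome result by the time or number of cell divisions over which the mutations accumulated would give a rate; that step is omitted here because it depends on the study design.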
In addition to the primer pair, the target sequence clamp and the DNA polymerase, the reagent mixture contains reagents for the dPCR. The reagent mixture comprises deoxyribonucleotides (dNTPs), metal ions (for example, Mg2+ and Mn2+), and a buffer. Additional reagents which may be used in a dPCR reaction are well-known to a person of ordinary skill in the art and such embodiments are within the purview of the invention.
The pair of polymerase chain reaction primers amplifies a target amplicon that comprises the target sequence within the target SINE and/or target LINE and/or target sequence within the genome. The primers are designed so that an amplicon is not produced when the target sequence clamp is bound to the target sequence within the target SINE and/or target LINE and/or target sequence within the genome.
Based on a particular target SINE and/or target LINE and/or target sequence within the genome, for example, a target SINE selected from Table 1, a person of ordinary skill in the art can design appropriate primers and the target sequence clamp. For a particular target SINE and/or target LINE and/or target sequence within the genome, a person of ordinary skill in the art can test multiple primer pairs and/or target sequence clamps to identify the optimal combination of primers and target sequence clamps and such embodiments are within the purview of the invention.
A target sequence within a target SINE and/or target LINE and/or target sequence within the genome is the sequence to which the target sequence clamp binds. Particularly, the target sequence clamp is complementary to the target sequence within the target SINE and/or target LINE and/or target sequence within the genome.
A wild-type target sequence does not contain any mutations. A mutated target sequence contains one or more point mutations and/or indel mutations. Accordingly, a wild-type target amplicon contains the wild-type target sequence and a mutated target amplicon contains a mutated target sequence.
In one embodiment, the target sequence clamp is designed based on the sequence of the SINE, LINE and/or genomic sequence and is about 15-50 bp, about 15-40 bp, about 15-30 bp, about 16-28 bp, about 17-26 bp, about 18-24 bp, about 19-22 bp, or about 20 bp.
In an embodiment of the invention, the target sequence clamp is designed so that the melting temperature of the target sequence clamp with the target sequence is higher than the temperatures used in the PCR cycle. The higher melting temperature of the clamp ensures that the clamp is bound to the clamp target sequence during the PCR cycles when the clamp is perfectly matched with the target sequence. A mutation in a target sequence reduces the melting temperature of the target sequence clamp with the mutated target sequence and the target sequence clamp is not bound to the mutated target sequence at the temperatures of the PCR cycles, particularly the annealing steps and the amplification steps of the PCR cycles. Therefore, the target sequence clamp prevents PCR amplification of the target amplicon when the amplicon contains the wild-type target sequence and the clamp permits PCR amplification of the target amplicon when the amplicon contains a mutated target sequence.
In another embodiment of the invention, the target sequence clamp comprises a xeno nucleic acid (XNA). A variety of XNAs are known in the art. The target sequence XNA clamp also suppresses PCR amplification of the amplicons containing wild-type clamp target sequences and allows selective PCR amplification of only the amplicons containing mutated clamp target sequences. An XNA, for example, can contain amino acid linkages rather than phosphate linkages between bases, which causes it to bind tightly to the wild-type clamp target sequence and reduces hydration and heat instability. Therefore, a target XNA sequence clamp does not melt off the wild-type clamp target sequence at the usual PCR temperatures when the match is perfect.
In a further embodiment, a target XNA clamp of about 13-20 bp is used. In other embodiments, the XNA clamp is about 15-50 bp, about 15-40 bp, about 15-30 bp, about 16-28 bp, about 17-26 bp, about 18-24 bp, about 19-22 bp, or about 20 bp. When a 13-20-bp XNA is used, a single-point mutation in the target sequence within the target SINE and/or target LINE and/or target sequence within the genome lowers the melting point between the two binding sequences by 15-20° C. Indel mutations lower the melting point between a target XNA sequence clamp and the wild-type clamp target sequence by more than 15-20° C. Because the mutated target sequence does not bind the target XNA sequence clamp, only the amplicons containing the mutated target sequences are amplified during PCR.
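By way of a non-limiting illustration, the clamp behavior described above can be sketched as a simple temperature comparison. The sketch treats binding as a two-state threshold, which is a simplification, and the function name and all numeric values other than the 15-20° C. reduction stated above are hypothetical assumptions chosen only for illustration.

```python
def clamp_state(tm_perfect_c, tm_drop_c, anneal_c, extend_c):
    """Classify the expected outcome for one amplicon.

    tm_perfect_c: melting temperature of the clamp on the wild-type
                  target sequence (hypothetical value supplied by caller).
    tm_drop_c:    reduction in melting temperature caused by a mutation
                  (0 for wild-type; about 15-20 for a point mutation).
    anneal_c, extend_c: annealing and amplification step temperatures.
    """
    tm = tm_perfect_c - tm_drop_c
    if tm > max(anneal_c, extend_c):
        # Clamp stays bound through annealing and amplification.
        return "blocked"
    if tm < min(anneal_c, extend_c):
        # Clamp is not bound at either step, so the amplicon is amplified.
        return "amplified"
    # Between the two step temperatures the simple model cannot decide.
    return "indeterminate"


# Hypothetical cycle: 62 C annealing, 72 C amplification, clamp Tm of 76 C.
wild_type = clamp_state(76.0, 0.0, 62.0, 72.0)
point_mutant = clamp_state(76.0, 15.0, 62.0, 72.0)
```

Under these assumed temperatures the wild-type amplicon is classified as blocked and the point-mutated amplicon as amplified; a smaller reduction in melting temperature would fall in the indeterminate band, which illustrates why the clamp length and cycle temperatures are selected together.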
Accordingly, when the target sequence clamp binds to the wild-type target sequence within the target SINE and/or target LINE and/or target sequence within the genome, the target sequence clamp prevents the PCR amplification of the target amplicons that have the target wild-type sequence within the SINE, LINE and/or target genomic sequence. In contrast, when the target sequence contains a mutation, the target sequence clamp cannot bind to the target sequence, which allows the PCR amplification of the target amplicons that have the target mutated sequence within the SINE, LINE and/or target genomic sequence.
dPCR, as used in the claimed invention, refers to a PCR where the PCR reaction is carried out as a single reaction within a sample; however, the sample is separated into a large number of partitions and the reaction is carried out in each partition individually and separately from the other partitions. dPCR involves identification of the amplification of the target amplicons in each of the large number of partitions. dPCR enables precise and highly sensitive quantification of nucleic acids. An overview of dPCR is provided by Baker (2012), the contents of which are incorporated herein in their entirety.
In one embodiment of the invention, the dPCR used in the assay is droplet digital PCR (ddPCR). In ddPCR, a PCR sample is partitioned into a large number of droplets, for example, 20,000 droplets, using water-oil emulsion droplet technology. After amplification, droplets containing the target sequence are detected by fluorescence and scored as positive, and droplets without fluorescence are scored as negative. Poisson statistical analysis of the numbers of positive and negative droplets yields absolute quantitation of the target sequences. An overview of ddPCR is provided by Hindson et al. (2011), the contents of which are incorporated herein in their entirety.
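By way of a non-limiting illustration, the Poisson statistical analysis of positive and negative droplets can be sketched as follows. The sketch uses the standard Poisson correction, in which the mean number of target copies per droplet is the negative natural logarithm of the fraction of negative droplets; the function names and the droplet counts are hypothetical and are not taken from any particular instrument software.

```python
import math


def copies_per_partition(positive, total):
    """Mean target copies per partition from the Poisson correction.

    With copies distributed randomly, the fraction of negative
    partitions equals exp(-lambda), so lambda = -ln(1 - positive/total).
    """
    if total <= 0 or not 0 <= positive <= total:
        raise ValueError("counts must satisfy 0 <= positive <= total, total > 0")
    if positive == total:
        raise ValueError("all partitions positive: sample is saturated")
    return -math.log(1.0 - positive / total)


def total_copies(positive, total):
    """Estimated number of target copies across all analyzed partitions."""
    return copies_per_partition(positive, total) * total


# Hypothetical run: 40 positive droplets out of 20,000 analyzed droplets.
estimated_mutant_copies = total_copies(40, 20000)
```

When only a small fraction of droplets is positive, as expected for rare mutated target sequences, the corrected estimate is only slightly larger than the raw count of positive droplets.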
When the dPCR results are obtained, the number of accumulated mutations per genome in the target sequence within the target SINE and/or target LINE and/or target sequence within the genome is calculated based on the number of fragments of the genomic DNA that arise from one genome and the number of fragments per genome that comprise the target amplicons having the target mutated sequence within the target SINE and/or target LINE and/or target sequence within the genome. As discussed above, the presence of the target mutated sequence within the target SINE and/or target LINE and/or target sequence within the genome is indicated by the positive PCR amplification in the dPCR.
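By way of a non-limiting illustration, the calculation described above can be sketched as a normalization of the number of mutant fragments by the number of genomes analyzed. The sketch assumes that each positive partition corresponds to one fragment carrying a mutated target sequence; the function name and the numeric values are hypothetical.

```python
def mutant_fragments_per_genome(mutant_fragments, fragments_analyzed,
                                fragments_per_genome):
    """Number of fragments with a mutated target sequence per genome.

    mutant_fragments:     fragments scored positive in the dPCR.
    fragments_analyzed:   total fragments subjected to the dPCR.
    fragments_per_genome: fragments that arise from one genome.
    """
    if fragments_analyzed <= 0 or fragments_per_genome <= 0:
        raise ValueError("fragment counts must be positive")
    # Number of genome equivalents represented in the analyzed sample.
    genomes_analyzed = fragments_analyzed / fragments_per_genome
    return mutant_fragments / genomes_analyzed


# Hypothetical values: 30 positives among 6,000,000 fragments,
# with 3,000,000 fragments arising from one genome.
per_genome = mutant_fragments_per_genome(30, 6_000_000, 3_000_000)
```

In this hypothetical example the analyzed fragments represent two genome equivalents, so the 30 mutant fragments correspond to 15 mutant fragments per genome.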
In standard dPCR or a ddPCR mix, the assay of the invention enables the detection of 1-2 mutant DNA fragments in a pool of 100,000 wild-type amplicons.
As such, an embodiment of the invention provides a method of estimating the accumulated mutations in a target sequence within a target SINE and/or target LINE. The number of accumulated mutations in a target sequence within a target SINE and/or target LINE and/or target sequence within the genome can be used to determine the rate of mutations in the target sequence within the target SINE and/or target LINE and/or target sequence within the genome. For example, the accumulated mutations in the target sequence within the target SINE and/or target LINE and/or target sequence within the genome can be determined at two time points and the rate of mutations can be calculated based on the difference in the number of accumulated mutations in the target sequence within the target SINE and/or target LINE and/or target sequence within the genome and the duration between the two time points.
In another embodiment, a first sample is obtained from the subject at Time 1 and a second sample is obtained from the subject at Time 2. The accumulated mutations in the target sequence within the target SINE and/or target LINE and/or target sequence within the genome are estimated in the first and the second samples according to the clamp/dPCR combination assay and the rate of mutations in the target sequence within the target SINE and/or target LINE and/or target sequence within the genome can be calculated based on the difference in the number of accumulated mutations in the target sequence within the target SINE and/or target LINE and/or target sequence within the genome and the duration between Time 1 and Time 2.
In a specific embodiment, a sample is obtained from the subject at birth. This sample provides accumulated mutations in the target sequence within the target SINE and/or target LINE and/or target sequence within the genome at birth, which can be considered as baseline mutations or the state of no mutations. The accumulated mutations estimated in a sample obtained from the subject at a later time can be compared to the baseline mutations or the state of no mutations.
Accordingly, an embodiment of the invention provides a method for calculating the rate of mutations in a target sequence within a target SINE and/or target LINE and/or target sequence within the genome in a subject. The method comprises the steps of:
a) according to the clamp/dPCR combination assay, determining the number of accumulated mutations in the target sequence within the target SINE and/or target LINE and/or target sequence within the genome in a first sample obtained from the subject at a first time point,
b) according to the clamp/dPCR combination assay, determining the number of accumulated mutations in the target sequence within the target SINE and/or target LINE and/or target sequence within the genome in a second sample obtained from the subject at a second time point,
c) calculating the rate of mutations in the target sequence within the target SINE and/or target LINE and/or target sequence within the genome in the subject based on the difference between the number of accumulated mutations in the target sequence within the target SINE and/or target LINE and/or target sequence within the genome in the subject at the first time point and the second time point and the duration between the first time point and the second time point.
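By way of a non-limiting illustration, steps a) to c) above can be sketched as a difference quotient. The function name, the units (years) and the numeric values are hypothetical.

```python
def mutation_rate(mutations_t1, mutations_t2, time_1, time_2):
    """Rate of mutations between two time points.

    mutations_t1, mutations_t2: accumulated mutations in the target
        sequence determined in the first and second samples.
    time_1, time_2: sampling times in the same unit (for example, years).
    """
    if time_2 <= time_1:
        raise ValueError("the second time point must be later than the first")
    return (mutations_t2 - mutations_t1) / (time_2 - time_1)


# Hypothetical values: 120 accumulated mutations at age 30, 180 at age 40.
rate_per_year = mutation_rate(120, 180, 30.0, 40.0)
```

With these hypothetical inputs the difference of 60 accumulated mutations over 10 years corresponds to a rate of 6 mutations per year.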
The number of accumulated mutations and/or the rate of mutations in the target sequence within the target SINE and/or target LINE and/or target sequence within the genome can be used to estimate the accumulated mutations and/or the rate of mutations in the genome of a subject. For example, the number of accumulated mutations in the genome of a subject can be calculated based on the frequency of occurrence of a target SINE and/or target LINE and/or target sequence within the genome throughout the genome and the number of accumulated mutations in the target sequence within the target SINE and/or target LINE and/or target sequence within the genome.
Similarly, the rate of mutations in a target sequence within a target SINE and/or target LINE and/or target sequence within the genome can be used to estimate the rate of mutations in the genome of the subject. For example, the rate of accumulated mutations in the genome of the subject can be calculated based on the frequency of occurrence of the target SINE and/or target LINE and/or target sequence within the genome throughout the genome and the rate of mutations in the target sequence within the target SINE and/or target LINE and/or target sequence within the genome.
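By way of a non-limiting illustration, one possible extrapolation from the target sequence to the genome is sketched below. The sketch assumes, as a simplification not stated above, that mutations occur at the same density per base pair in the target sequence as in the rest of the genome; the function name and all numeric values are hypothetical. The same proportional scaling can be applied to a rate of mutations in place of a number of accumulated mutations.

```python
def genome_wide_mutations(target_mutations_per_genome, target_length_bp,
                          target_copies_per_genome, genome_size_bp):
    """Scale mutations observed in the target sequence to the genome.

    Assumes a uniform mutation density per base pair (a simplification).
    """
    # Total base pairs surveyed by the target sequence in one genome.
    surveyed_bp = target_length_bp * target_copies_per_genome
    if surveyed_bp <= 0 or genome_size_bp <= 0:
        raise ValueError("sizes and copy numbers must be positive")
    mutations_per_bp = target_mutations_per_genome / surveyed_bp
    return mutations_per_bp * genome_size_bp


# Hypothetical values: 15 mutations per genome found in a 20-bp target
# sequence present in 1,000,000 copies, scaled to a 3,000,000,000-bp genome.
estimate = genome_wide_mutations(15, 20, 1_000_000, 3_000_000_000)
```

In this hypothetical example the target sequences cover 20,000,000 bp per genome, so the scaling factor to the whole genome is 150 and the estimate is about 2,250 mutations per genome.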
As such, an embodiment of the invention provides a method for determining accumulated mutations and/or the rate of the mutations in a target sequence within a target SINE and/or target LINE and/or the genome of the subject.
Accumulated mutations and/or the rate of mutations typically increase with age. For example, the number of accumulated mutations and/or the rate of mutations in a chronologically older subject are typically higher than the corresponding values in a chronologically younger subject. Also, different individuals age at different rates, i.e., the accumulated mutations and/or the rate of mutations in two individuals of the same age can be different. For example, individuals exposed to higher levels of mutagens like carcinogens, mutagenic chemicals, radiation, stress, etc. typically have more accumulated mutations and/or a higher rate of mutations compared to individuals not exposed to such mutagens or exposed to relatively lower levels of mutagens.
A standard scale for genomic age for a particular species can be determined based on average accumulated mutations and/or the average rate of mutations in different groups of individuals of varying ages that are living under the conditions of exposure to only natural mutagens and/or the conditions of minimal exposure to man-made mutagens.
For the purpose of the invention, the phrase “the conditions of exposure to only natural mutagens” indicates exposure to only unavoidable natural mutagens, for example, cosmic radiation, ultraviolet rays from the sun, mutagens that may be naturally (i.e., without interference from humans) present in soil, air, water, and food or other environmental factors. Additional examples of unavoidable natural mutagens can be readily envisioned by a person of ordinary skill in the art.
For example, an individual living in the conditions of exposure to only natural mutagens is living in conditions that are free from:
a) exposure to man-made mutagens, such as synthetic carcinogens, synthetic pollutants, radiation from man-made sources, etc., and
b) avoidable/unnecessary exposure to natural mutagens, for example, smoking, using tobacco and other avoidable/unnecessary exposure to natural carcinogens.
Similarly, for the purpose of the invention, the phrase “the conditions of minimal exposure to man-made mutagens” indicates minimal exposure to unavoidable natural mutagens (discussed above) and minimal exposure to man-made mutagens, such as synthetic carcinogens, synthetic pollutants and radiation from man-made sources. The conditions of minimal exposure to man-made mutagens are also free from avoidable/unnecessary exposure to natural mutagens, for example, smoking, using tobacco and other avoidable/unnecessary exposure to natural carcinogens.
An example of an individual living under the conditions of exposure to only natural mutagens and/or the conditions of minimal exposure to man-made mutagens is an individual living in the countryside. Because of the industrialized lifestyle of almost everyone in the world, it is very difficult and almost impossible to find individuals living under the conditions of exposure to only natural mutagens. Therefore, the standard scale for the genomic age for a particular species can be determined based on the average accumulated mutations and/or the average rate of mutations in different groups of individuals of varying ages that are living under the conditions of minimal exposure to man-made mutagens.
Accordingly, a standard scale for the genomic age of humans can be produced by determining the average accumulated mutations and/or the average rate of mutations in humans of varying ages that live in the conditions of minimal exposure to man-made mutagens, for example, people living in the countryside. Such a scale of genomic age can be used to determine the genomic age of an individual based on the individual's accumulated mutations and/or rate of mutations in the genome.
The exposure to avoidable/unnecessary natural mutagens and/or the exposure to man-made mutagens typically increase the accumulated mutations and/or the rate of mutations in the genome of a subject. For example, a person living in the countryside typically has fewer accumulated mutations and/or a lower rate of mutations compared to a person living in a city, particularly a polluted city. Therefore, a person of a particular chronological age living in the countryside typically has a lower genomic age compared to the genomic age of a person of the same chronological age living in a city.
Accordingly, the clamp/dPCR combination assay for determining the accumulated mutations and/or the rate of the mutations in a target sequence in a target SINE, target LINE and/or the genome of a subject can be used to determine the genomic age of the subject. The method comprises the steps of:
a) preparing a standard genomic age scale for individuals belonging to the species of the subject and living under the conditions of exposure to only natural mutagens or the conditions of minimal exposure to man-made mutagens, or obtaining a pre-determined standard genomic age scale for the species of the subject,
b) determining the accumulated mutations and/or the rate of mutations in the subject according to the clamp/dPCR combination assay, and
c) estimating the genomic age of the subject based on the comparison of the accumulated mutations and/or the rate of mutations in the subject with the standard scale for the genomic age of the subject.
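By way of a non-limiting illustration, step c) above can be sketched as a lookup against a standard scale represented as pairs of chronological age and average accumulated mutations. The use of linear interpolation between scale points, the function name and the scale values are hypothetical assumptions made only for this sketch.

```python
def genomic_age(subject_mutations, scale):
    """Estimate genomic age by interpolating on a standard scale.

    scale: list of (age, average_accumulated_mutations) pairs in which
           the averages strictly increase with age (assumed here).
    """
    points = sorted(scale)
    for (_, m0), (_, m1) in zip(points, points[1:]):
        if m1 <= m0:
            raise ValueError("scale averages must strictly increase with age")
    # Values outside the scale are reported as the nearest scale age.
    if subject_mutations <= points[0][1]:
        return float(points[0][0])
    if subject_mutations >= points[-1][1]:
        return float(points[-1][0])
    for (a0, m0), (a1, m1) in zip(points, points[1:]):
        if m0 <= subject_mutations <= m1:
            fraction = (subject_mutations - m0) / (m1 - m0)
            return a0 + (a1 - a0) * fraction


# Hypothetical standard scale and subject value.
standard_scale = [(20, 100), (40, 200), (60, 400)]
age_estimate = genomic_age(150, standard_scale)
```

With this hypothetical scale, a subject with 150 accumulated mutations falls midway between the 20-year and 40-year averages and is assigned a genomic age of 30 years, irrespective of the subject's chronological age.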
Similar to increasing age, a subject having cancer or having a higher risk of developing cancer exhibits an increase in the accumulated mutations and/or the rate of mutations in a target sequence in a target SINE, target LINE and/or the genome. For example, a person of a particular chronological age having more accumulated mutations and/or a higher rate of mutations in a target sequence in a target SINE, target LINE and/or the genome is at a higher risk of developing cancer compared to a person of the same chronological age who has relatively fewer accumulated mutations and/or a lower rate of mutations in a target sequence in a target SINE, target LINE and/or the genome.
Also, chronologically older individuals are at a higher risk of developing cancer because the accumulated mutations and/or the rate of mutations in a target sequence in a target SINE, target LINE and/or the genome are typically higher in chronologically older individuals compared to the corresponding values in chronologically younger individuals.
Accordingly, an embodiment of the invention provides a method of identifying a higher risk of cancer development in a subject based on the accumulated mutations and/or the rate of mutations in a target sequence in a target SINE, target LINE and/or the genome of the individual and a standard scale of cancer risk in the species to which the subject belongs.
The standard scale of cancer risk indicates the risk of cancer in a subject based on the accumulated mutations and/or the rate of mutations in a target sequence in a target SINE, target LINE and/or the genome of the subject. For example, more accumulated mutations and/or a higher rate of mutations in a target sequence in a target SINE, target LINE and/or the genome in a subject indicates a higher risk of cancer development in the subject compared to an individual of the same chronological age as the subject and having relatively fewer accumulated mutations and/or a lower rate of mutations in a target sequence in a target SINE, target LINE and/or the genome.
A standard scale of the cancer risk for a species, for example, humans, can be produced by determining the average accumulated mutations and/or the average rate of mutations in a target sequence in a target SINE, target LINE and/or the genome in individuals of varying ages that are free from cancer and/or are known to have a low risk of cancer development. The standard scale of cancer risk can be used to determine the risk of cancer development in a subject based on the subject's accumulated mutations and/or rate of mutations in a target sequence in a target SINE, target LINE and/or the genome and the subject's chronological age. As such, the standard scale of cancer risk in the species indicates, at increasing chronological age, the average accumulated mutations and/or the average rate of mutations in the target sequence in the target SINE, target LINE and/or the genomes of individuals of varying ages that belong to the species and are free from cancer and/or are known to have a low risk of cancer development.
Accordingly, an embodiment of the invention provides a method for determining the risk of cancer development of a subject based on accumulated mutations and/or rate of mutations in a target sequence in a target SINE, target LINE and/or the genome of the subject. The method comprises the steps of:
a) preparing a standard scale for cancer risk by determining the average accumulated mutations and/or the average rate of mutations in a target sequence in a target SINE, target LINE and/or the genome in individuals of varying ages that are free from cancer and/or are known to have a low risk of cancer development, or obtaining a pre-determined standard scale of cancer risk,
b) determining the accumulated mutations and/or the rate of mutations in the target sequence in the target SINE, target LINE and/or the genome in the subject, and
c) estimating the risk of cancer development of the subject based on the comparison of the accumulated mutations and/or the rate of mutations in the target sequence in the target SINE, target LINE and/or the genome of the subject with the standard scale of cancer risk.
The step of estimating the risk of cancer development of the subject based on the comparison of the accumulated mutations and/or the rate of mutations in the target sequence in the target SINE, target LINE and/or the genome of the subject with the standard scale can be:
a) identifying the subject as having a higher risk of cancer development if the accumulated mutations and/or the rate of mutations in the target sequence in the target SINE, target LINE and/or the genome of the subject are higher than the corresponding values in the standard scale of cancer risk, or
b) identifying the subject as having a lower risk or no risk of cancer development if the accumulated mutations and/or the rate of mutations in the target sequence in the target SINE, target LINE and/or the genome of the subject are lower than or equal to the corresponding values in the standard scale of cancer risk.
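By way of a non-limiting illustration, the comparison in steps a) and b) above can be sketched as follows. The representation of the standard scale as a mapping from chronological age to an average value, the function name and the numeric values are hypothetical.

```python
def classify_cancer_risk(subject_value, subject_age, risk_scale):
    """Compare a subject's value with the standard scale for the same age.

    subject_value: accumulated mutations and/or rate of mutations
                   determined for the subject.
    risk_scale:    mapping of chronological age to the corresponding
                   average value in the standard scale of cancer risk.
    """
    if subject_age not in risk_scale:
        raise KeyError("no scale entry for this chronological age")
    reference = risk_scale[subject_age]
    if subject_value > reference:
        return "higher risk"
    # Values lower than or equal to the scale value.
    return "lower or no risk"


# Hypothetical standard scale of cancer risk and subject values.
scale_of_cancer_risk = {20: 100, 40: 200, 60: 400}
result_a = classify_cancer_risk(250, 40, scale_of_cancer_risk)
result_b = classify_cancer_risk(200, 40, scale_of_cancer_risk)
```

In this hypothetical example, a 40-year-old subject with a value of 250 is identified as having a higher risk, whereas a value equal to the scale value of 200 is identified as a lower risk or no risk.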
Every individual always carries some risk of cancer development. For example, spontaneous oncogenic mutations occur even among individuals living with minimal exposure to mutagens. Therefore, for the purpose of this invention, a higher risk of cancer development of a subject refers to a higher risk of cancer development compared to the risk of cancer development in the population of the same chronological age as the subject that is free from cancer and/or is known to have a low risk of cancer development. Similarly, a lower risk of cancer development of a subject refers to a lower risk of cancer development compared to the risk of cancer development in the average population of the same chronological age as the subject that is free from cancer and/or is known to have a low risk of cancer development.
If a subject is identified as having a higher risk of cancer development, enhanced screening for cancer can be administered to the subject for early detection and treatment of cancer. As is well-established in the art, early detection and treatment of cancer typically results in cancer-free survival. Therefore, administering enhanced screening to a subject based on the subject's identification as having a higher risk of cancer development ensures that the cancer, if developed, is identified during the early stages, thereby increasing the chances of cancer-free survival of the subject.
Enhanced screening for cancer indicates that the cancer screening is administered more frequently than recommended for a healthy individual. For example, if recommended screening frequency for cancer for a healthy individual is once a year, an individual identified as having a higher risk of cancer development can be screened every six months. Recommended cancer screening schedules for various cancers and the modifications which can be done to the recommended schedules to produce an enhanced screening schedule are well-known to a person of ordinary skill in the art and such embodiments are within the purview of the invention.
Also, if a subject is identified as having a higher risk of cancer development, lifestyle changes can be recommended to the subject to reduce the risk of cancer development. Non-limiting examples of lifestyle changes which can reduce the risk of cancer development include cessation of smoking, reducing the exposure to a known carcinogen, or changing a profession or job which poses increased exposure to a particular carcinogen. Additional examples of lifestyle changes which can reduce the risk of cancer development are well-known to a person of ordinary skill in the art and such embodiments are within the purview of the invention.
If a subject is identified as having a low risk or no risk of cancer development, enhanced screening for cancer is withheld from the subject and, optionally, routine screening is administered. Withholding enhanced screening for cancer from a subject based on the subject's identification as having a lower risk or no risk of cancer development ensures that the subject does not receive any unnecessary cancer screening. Avoiding unnecessary cancer screening may be significant because sometimes the cancer screening itself uses mutagens, for example, x-rays for the identification of breast cancer.
Personal lifestyle and environmental exposures affect a subject's risk for cancer. These factors are extrinsic to a subject's inherited genetics. For example, smoking causes as much as a 10-fold increased rate of accumulation of pulmonary epithelial mutations. Therefore, an embodiment of the invention provides a tissue- or organ-specific prediction of the risk of cancer development. Most solid tumors in adults have 33 to 66 genes with subtle somatic mutations expected to alter their proteins.
Accordingly, a further embodiment of the invention provides a method of identifying a higher risk of cancer development in a tissue or organ of a subject based on the accumulated mutations and/or the rate of mutations in a target sequence in a target SINE, target LINE and/or the genome of the cells of the tissue or organ.
A standard scale of the cancer risk for a tissue or organ in a species, for example, breast cancer in humans, can be produced by determining the average accumulated mutations and/or the average rate of mutations in a target sequence in a target SINE, target LINE and/or the genome in the cells of the tissue or organ from individuals of varying ages that are free from cancer and/or are known to have a low risk of cancer development. The standard scale of cancer risk for a tissue or organ can be used to determine the risk of cancer development in the tissue or organ of a subject based on the subject's accumulated mutations and/or rate of mutations in a target sequence in a target SINE, target LINE and/or the genome in the cells of the tissue or organ and the subject's chronological age. As such, a standard scale of cancer risk for a tissue or organ in a species indicates, at increasing chronological ages, the average accumulated mutations and/or the average rate of mutations in a target sequence in a target SINE, target LINE and/or the genomes of the cells in the tissues or organs of individuals of varying ages that belong to the species and are free from cancer and/or are known to have a low risk of cancer development in the tissue or organ.
Non-limiting examples of the tissue or organ which can be used in the methods of the invention include placenta, brain, eyes, pineal gland, pituitary gland, thyroid gland, parathyroid glands, thorax, heart, lung, esophagus, thymus gland, pleura, adrenal glands, appendix, gall bladder, urinary bladder, large intestine, small intestine, kidneys, liver, pancreas, spleen, stomach, ovaries, uterus, testis, skin, blood or buffy coat sample of blood. Additional examples of organs and tissues are well-known to a person of ordinary skill in the art and such embodiments are within the purview of the invention.
In certain embodiments, the methods of the current invention are practiced to determine the risk of cancer, wherein the cancer is selected from acute lymphoblastic leukemia, acute myeloid leukemia, adrenocortical carcinoma, AIDS-related cancers, AIDS-related lymphoma, anal cancer, appendix cancer, astrocytoma, cerebellar astrocytoma, basal cell carcinoma, bile duct cancer, extrahepatic bladder cancer, bladder cancer, bone cancer, osteosarcoma and malignant fibrous histiocytoma, brain stem glioma, brain tumor, central nervous system embryonal tumors, cerebral astrocytoma/malignant glioma, ependymoblastoma, medulloblastoma, medulloepithelioma, pineal parenchymal tumors of intermediate differentiation, supratentorial primitive neuroectodermal tumors and pineoblastoma, visual pathway and hypothalamic glioma, brain and spinal cord tumors, breast cancer, bronchial tumors, Burkitt lymphoma, carcinoid tumor, gastrointestinal cancer, carcinoma of the head and neck, central nervous system embryonal tumors, central nervous system lymphoma, cervical cancer, chordoma, chronic lymphocytic leukemia, chronic myelogenous leukemia, chronic myeloproliferative disorders, colorectal cancer, cutaneous T-cell lymphoma, embryonal tumors, endometrial cancer, ependymoblastoma, ependymoma, esophageal cancer, Ewing family of tumors, extracranial germ cell tumor, extrahepatic bile duct cancer, eye cancer, intraocular melanoma, retinoblastoma, gallbladder cancer, gastric (stomach) cancer, gastrointestinal carcinoid tumor, gastrointestinal stromal tumor (GIST), extracranial germ cell tumor, germ cell tumor, extragonadal germ cell tumor, ovarian cancer, gestational trophoblastic tumor, glioma, brain stem glioma, hairy cell leukemia, head and neck cancer, hepatocellular (liver) cancer, Hodgkin's lymphoma, hypopharyngeal cancer, hypothalamic and visual pathway glioma, intraocular melanoma, islet cell tumors (endocrine pancreas), Kaposi sarcoma, kidney (renal cell) cancer, kidney cancer, laryngeal cancer, chronic lymphocytic leukemia, chronic leukemia, myelogenous leukemia, lip and oral cavity cancer, lung cancer, non-small cell lung cancer, small cell lymphoma, AIDS-related lymphoma, cutaneous T-cell lymphoma, non-Hodgkin's lymphoma, macroglobulinemia, Waldenström macroglobulinemia, malignant fibrous histiocytoma of bone and osteosarcoma, medulloblastoma, medulloepithelioma, melanoma, intraocular Merkel cell carcinoma, mesothelioma, metastatic squamous neck cancer with occult primary, mouth cancer, multiple endocrine neoplasia syndrome, multiple myeloma/plasma cell neoplasm, mycosis fungoides, myelodysplastic syndromes, myelodysplastic/myeloproliferative diseases, myelogenous leukemia, multiple myeloproliferative disorders, nasal cavity and paranasal sinus cancer, nasopharyngeal cancer, neuroblastoma, non-small cell lung cancer, oral cancer, oral cavity cancer, lip and oropharyngeal cancer, osteosarcoma and malignant fibrous histiocytoma of bone, ovarian epithelial cancer, ovarian germ cell tumor, ovarian low malignant potential tumor, pancreatic cancer, papillomatosis, paranasal sinus and nasal cavity cancer, parathyroid cancer, penile cancer, pharyngeal cancer, pheochromocytoma, pineal parenchymal tumors of intermediate differentiation, pineoblastoma and supratentorial primitive neuroectodermal tumors, pituitary tumor, plasma cell neoplasm/multiple myeloma, pleuropulmonary blastoma, primary central nervous system lymphoma, prostate cancer, rectal cancer, renal pelvis and ureter cancer, transitional cell cancer, respiratory tract carcinoma involving the NUT gene on chromosome 15, retinoblastoma, rhabdomyosarcoma, salivary gland cancer, sarcoma, soft tissue sarcoma, uterine Sézary syndrome, skin cancer (nonmelanoma), skin carcinoma, small cell lung cancer, small intestine cancer, squamous cell carcinoma, squamous neck cancer with occult primary cancer, supratentorial primitive neuroectodermal tumors, T-cell lymphoma, testicular cancer, throat cancer, thymoma and thymic carcinoma, thyroid cancer, transitional cell cancer of the renal pelvis and ureter, gestational trophoblastic tumor, carcinoma of unknown primary site, urethral cancer, uterine cancer, endometrial uterine sarcoma, vaginal cancer, visual pathway and hypothalamic glioma, vulvar cancer, and Wilms' tumor.
Accordingly, an embodiment of the invention provides a method for determining the risk of cancer development of a tissue or organ of a subject based on accumulated mutations and/or rate of mutations in a target sequence in a target SINE, target LINE and/or the genome in the cells of the tissue or organ of the subject. The method comprises the steps of:
a) preparing a standard scale for the cancer risk for the tissue or organ by determining the average accumulated mutations and/or the average rate of mutations in a target sequence in a target SINE, target LINE and/or the genome in the cells of the tissue or organ in individuals of varying ages that are free from cancer and/or are known to have a low risk of cancer development, or obtaining a pre-determined standard scale of cancer risk,
b) according to the clamp/dPCR combination assay of the invention, determining the accumulated mutations and/or the rate of mutations in the target sequence in the target SINE, target LINE and/or the genome in the cells of the tissue or organ of the subject, and
c) estimating the risk of cancer development of the tissue or organ of the subject based on the comparison of the accumulated mutations and/or the rate of mutations in the target sequence in the target SINE, target LINE and/or the genome in the cells of the tissue or organ of the subject with the standard scale of cancer risk for the tissue or organ.
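Steps a) to c) amount to comparing a subject's measurement with an age-indexed reference. The following is a minimal sketch only, assuming a hypothetical standard scale that stores the mean and standard deviation of accumulated mutations (per million target-sequence copies) at a few ages. The numbers, the linear interpolation between ages, and the z-score cutoff are illustrative placeholders and are not taken from this specification.

```python
from bisect import bisect_left

# Hypothetical standard scale (step a): age in years -> (mean, sd) of
# accumulated mutations per million target-sequence copies in cancer-free
# individuals. All values are illustrative placeholders, not measured data.
STANDARD_SCALE = [
    (20, 10.0, 2.0),
    (40, 20.0, 4.0),
    (60, 30.0, 6.0),
    (80, 40.0, 8.0),
]


def reference_at(age):
    """Linearly interpolate the reference mean and sd at a given age."""
    ages = [row[0] for row in STANDARD_SCALE]
    if age <= ages[0]:
        return STANDARD_SCALE[0][1:]
    if age >= ages[-1]:
        return STANDARD_SCALE[-1][1:]
    hi = bisect_left(ages, age)
    a0, m0, s0 = STANDARD_SCALE[hi - 1]
    a1, m1, s1 = STANDARD_SCALE[hi]
    w = (age - a0) / (a1 - a0)
    return (m0 + w * (m1 - m0), s0 + w * (s1 - s0))


def relative_risk_flag(age, subject_mutations, z_cutoff=2.0):
    """Compare a subject (step b) with the age-matched reference (step c)."""
    mean, sd = reference_at(age)
    z = (subject_mutations - mean) / sd
    if z > z_cutoff:
        return z, "higher"
    if z < -z_cutoff:
        return z, "lower"
    return z, "typical"
```

In this sketch a 50-year-old subject with 40 mutations per million copies sits three standard deviations above the interpolated reference and would be flagged as "higher". A real standard scale would be built from measured cohorts and could use any other comparison rule.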
A person of ordinary skill in the art would appreciate that the steps described above for estimating the risk of cancer development in the subject and the steps taken after a subject is identified as having a higher or lower risk of cancer development are relevant to cancer development of a tissue or organ and are within the purview of the invention as it relates to the estimation of a risk of cancer development of a tissue or organ.
Lifestyle changes are prescribed by medical professionals as means of improving overall health and well-being of humans, including decreasing the chances of cancer development or slowing down the rate of aging. An embodiment of the invention provides a method of determining whether a particular lifestyle change or a combination of lifestyle changes effectively reduced the risk of cancer development in a subject or effectively slowed down the rate of aging.
For example, a person's risk of cancer development, such as a cancer of a particular tissue or organ, can be determined before and after a lifestyle change is initiated. Similarly, a person's rate of aging, for example, rate of increase in the genomic age, can be determined before and after a lifestyle change is initiated.
Non-limiting examples of lifestyle changes that are recommended for reducing the risk of cancer and/or slowing down the rate of aging include weight loss, cessation of smoking, limiting the exposure to a known carcinogen, change of a profession or job to avoid exposure to a particular carcinogen, dietary changes, etc. Additional examples of lifestyle changes that can be prescribed to reduce the risk of cancer development and/or slow down the rate of aging are well-known to a person of ordinary skill in the art and such embodiments are within the purview of the invention.
If the risk of cancer development and/or the rate of aging are reduced, indicated, for example, by a reduced rate of mutations in a target sequence in a target SINE, target LINE and/or the genome of the subject, the lifestyle change is considered to be successful in achieving the intended goal.
On the other hand, if the risk of cancer development and/or the rate of aging are not reduced or are not reduced to the desired extent, indicated, for example, by no reduction or lack of reduction to the preferred extent in the rate of mutations in a target sequence in a target SINE, target LINE and/or the genome of the subject, the lifestyle change is considered to be unsuccessful. In such cases, different and/or additional lifestyle changes can be recommended to the subject for achieving the desired result.
Exposure to mutagens, changes in environment or changes in lifestyle can affect the overall health and well-being of humans, including altering the chances of cancer development and/or changing the rate of aging. An embodiment of the invention provides a method of determining whether a lifestyle change or a combination of lifestyle changes, exposure to mutagens, or changes in environment altered the risk of cancer development of a subject and/or changed the rate of aging. For example, a person's risk of cancer development, such as a cancer of a particular tissue or organ, and/or the rate of aging as indicated by the genomic age can be determined before and after the lifestyle change was initiated. Non-limiting examples of lifestyle changes which can alter a subject's risk of cancer development and/or rate of aging include weight gain, smoking, exposure to a known carcinogen, change of a profession or job causing increased exposure to a particular carcinogen, dietary changes, etc. Additional examples of lifestyle changes that can alter the risk of cancer development and/or change the rate of aging are well-known to a person of ordinary skill in the art and such embodiments are within the purview of the invention.
Accordingly, an embodiment of the invention provides a method of identifying the effect of a lifestyle change on the risk of cancer development and/or the rate of aging of a subject. The method comprises:
a) according to the clamp/dPCR combination assay, determining the rate of mutations in a target sequence within a target SINE or target LINE and/or in the genome of the subject immediately before the lifestyle change is initiated,
b) according to the clamp/dPCR combination assay, determining the rate of mutations in the target sequence within the target SINE or target LINE and/or in the genome of the subject after the lifestyle change is initiated, and
c) comparing the rates of mutations in the target sequence within the target SINE or target LINE and/or the genome of the subject before and after the lifestyle change is initiated to determine the effect of the lifestyle change on the risk of cancer development and/or the rate of aging of the subject.
If the risk of cancer development increases and/or the rate of aging increases, indicated, for example, by a higher rate of mutations in a target sequence in a target SINE, target LINE and/or the genome of the subject, the lifestyle change is considered to increase the risk of cancer development and/or increase the rate of aging. The subject can then be advised to eliminate the lifestyle change that increased the risk of cancer and/or increased the rate of aging. Alternatively, another lifestyle change intended to counter the earlier lifestyle change can be recommended.
On the other hand, if the risk of cancer development and/or the rate of aging does not increase or does not increase to an alarming extent, indicated, for example, by no increase or lack of increase to an alarming extent in the rate of mutations in a target sequence in a target SINE, target LINE and/or the genome of the subject, the lifestyle change is considered to be harmless. In such cases, no unnecessary changes in lifestyle are recommended.
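The before/after comparison in steps a) to c) can be sketched as follows. This is an illustration under stated assumptions rather than the method of the invention: it assumes that each rate is obtained from at least two serial measurements of accumulated mutations, that the rate is taken as a least-squares slope, and that a hypothetical 10% tolerance separates a meaningful change from no change. The function names and the example values are invented for illustration.

```python
def mutation_rate(measurements):
    """Least-squares slope of accumulated mutations versus time.

    `measurements` is a list of (time, accumulated_mutations) pairs.
    """
    n = len(measurements)
    if n < 2:
        raise ValueError("a rate needs at least two time points")
    mean_t = sum(t for t, _ in measurements) / n
    mean_m = sum(m for _, m in measurements) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in measurements)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - mean_t) * (m - mean_m) for t, m in measurements)
    return sxy / sxx


def classify_change(before, after, tolerance=0.10):
    """Compare mutation rates before and after a lifestyle change."""
    rate_before = mutation_rate(before)
    rate_after = mutation_rate(after)
    if rate_before <= 0:
        raise ValueError("baseline rate must be positive to form a ratio")
    relative_change = (rate_after - rate_before) / rate_before
    if relative_change > tolerance:
        label = "increased"
    elif relative_change < -tolerance:
        label = "reduced"
    else:
        label = "unchanged"
    return rate_before, rate_after, label


# Hypothetical serial measurements: (years, mutations per million copies).
before = [(0, 100), (1, 110), (2, 120)]
after = [(3, 126), (4, 132), (5, 138)]
result = classify_change(before, after)
```

The same comparison applies unchanged to measurements taken before and after an exposure to a mutagen.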
Similarly, an embodiment of the invention provides a method of identifying the effect of an exposure to a mutagen on the risk of cancer development and/or the rate of aging of a subject. The method comprises:
a) according to the clamp/dPCR combination assay, determining the rate of mutations in a target sequence within a target SINE or target LINE and/or in the genome of the subject immediately before the exposure to the mutagen,
b) according to the clamp/dPCR combination assay, determining the rate of mutations in the target sequence within the target SINE or target LINE and/or in the genome of the subject after the exposure to the mutagen, and
c) comparing the rates of mutations in the target sequence within the target SINE or target LINE and/or the genome of the subject before and after the exposure to the mutagen to determine the effect of the exposure to the mutagen on the risk of cancer development and/or the rate of aging of the subject.
A further embodiment of the invention provides a kit comprising reagents to carry out the clamp/dPCR assay of the invention. In one embodiment, the kit comprises primers and/or probes specific for a SINE of interest in a species of interest. The kit can also comprise chemicals for treating the tissue or the genomic DNA sample obtained from the subject, for example, deproteination, degradation of non-DNA nucleotides, removal of other impurities, etc. The kit can further contain reagents and/or instrumentation for fractionating the genomic DNA into fragments of a desired size. Additionally, the kit can contain reagents and/or instrumentation for conducting the dPCR reaction. A manual containing instructions to carry out various methods of the invention can also be included in the kit.
All patents, patent applications, provisional applications, and publications referred to or cited herein are incorporated by reference in their entirety, including all figures and tables, to the extent they are not inconsistent with the explicit teachings of this specification.
Following are examples which illustrate procedures for practicing the invention. These examples should not be construed as limiting. All percentages are by weight and all solvent mixture proportions are by volume unless otherwise noted.
Protein-coding regions of the human genome occupy only ~1.5% of the DNA, accounting for approximately 21,000 genes on the 23 chromosomes. A large component of the remaining DNA is composed of SINEs. Alu elements are the most abundant SINE in the human genome. Similarly, B1 elements are the most abundant SINE in the mouse genome. Alu elements are short with approximately 300-350 base pairs and contain a restriction enzyme site. With approximately 500,000 to 1,500,000 copies, B1 elements and Alu make up about 11% of the mouse and human genomes, respectively.
An embodiment of the invention provides assaying point mutations in 11% of the genome formed by the Alu elements. The rate of mutations in the genome-wide Alu elements can be used to obtain an accurate estimation of mutations in the genome.
Accordingly, in one embodiment, the invention provides a clamp/dPCR assay for Alu, a human SINE, to serially quantify the total Alu mutations in an organ in a subject, for example, examining 10⁹ Alu loci in an hour. In another embodiment, the invention provides a method of determining the accumulated mutations in Alu. In an even further embodiment, the invention provides a method of using the accumulated mutations in conjunction with chronological age and Surveillance, Epidemiology and End Results (SEER) cancer statistics to quantitatively predict cancer risk.
The clamp/dPCR assay of the invention can detect 1-2 mutant DNA fragments in a pool of 100,000 wild-type fragments.
In a 45-min cycle of a dPCR, for example, ddPCR (BioRad QX200 AutoDG ddPCR, Hercules, Calif.), over 10⁶ DNA fragments can be analyzed in each of the 8 channels for the presence of mutations. The clamp/dPCR assay of the invention allows using 100 or 1,000 fragments per drop instead of ~1-2 fragments per drop because the target sequence clamp prevents amplification of most of the target SINE sequences, which are likely to be wild-type. Since about 10% of those DNA fragments likely contain the target SINE, ~10⁵ target SINEs are analyzed at one fragment per well or drop. Therefore, the clamp/dPCR combination assay of the invention improves the screening of Alu fragments by 2 or 3 orders of magnitude, i.e., up to 10⁸ Alu/channel. Assuming 10⁻⁶ mutations per cell division and/or per week of age, the assay has the capacity to estimate the rate of mutations in genome-wide Alu sequences.
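The throughput figures in the preceding paragraph can be checked with a few lines of arithmetic. The sketch below is illustrative only. It assumes that about one million fragments are analyzed per channel at one fragment per drop, which is the value implied by the stated 10% SINE fraction and the stated 100,000 target SINEs per channel; the constant and function names are hypothetical.

```python
# Assumed inputs. The per-channel fragment count is an inference from the
# stated 10% SINE fraction and ~1e5 target SINEs per channel.
FRAGMENTS_PER_CHANNEL = 10**6  # fragments per channel at one fragment per drop
SINE_PERCENT = 10              # about 10% of fragments carry the target SINE
CHANNELS = 8                   # channels per run


def target_sines_per_channel(fragments_per_drop=1):
    """Target SINEs screened in one channel at a given loading per drop."""
    return FRAGMENTS_PER_CHANNEL * fragments_per_drop * SINE_PERCENT // 100


def target_sines_per_run(fragments_per_drop=1):
    """Target SINEs screened across all channels in one run."""
    return CHANNELS * target_sines_per_channel(fragments_per_drop)
```

Under these assumptions, loading 100 or 1,000 fragments per drop raises the per-channel count from 10⁵ to 10⁷ or 10⁸, and eight channels at 1,000 fragments per drop give 8 × 10⁸ target SINEs per run, which is on the order of 10⁹.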
Target sequence clamps for mouse B1 and human Alu with base sizes of 16-20 are provided. In a test run, excess idealized wild-type B1 with a trace of mutant B1 was tested using a 16-base Clamp2 (SEQ ID NO: 4) at the dilutions of 1:1000 and 1:10,000. Additional clamps can be designed, and alleles for any clamp can be prepared and multiplexed to eliminate common variations in SINE sequences.
According to an embodiment, any tissue can be tested with the clamp/dPCR combination assay. For example, in humans, a skin or mucosal scrape or 1-2 μl of blood is sufficient. Since reagents used in routine PCR are used, the assay provides high quality and reproducibility. Therefore, an assay can be repeated economically in the same subject or organ.
Like natural age-related genome-wide accumulated mutations and/or rate of mutations, radiation-related genome-wide accumulated mutations and/or rate of mutations occur at different rates in individuals and directly lead to different risks and rates for cancer. Radiation-induced mutations should be random with respect to their distribution across the genome. Although some location-specific effects likely arise due to repair mechanisms, testing for mutations near specific genes is likely to be futile. On the other hand, the overall mutation load (e.g., the genome-wide accumulated mutations and/or the rate of mutations) and the incidence of driver mutations have a linear relationship.
A dose- and LET-dependent increase in genome-wide mutations is expected. Basal and serial studies can be performed following ¹H, ¹n, ²⁸Si, or ⁵⁶Fe irradiation (dose ≤0.5 Gy). Both sexes and individual organs can be studied. Several assays can be designed based on Table 7.
Point and indel mutations increase with aging, and radiation of different qualities and doses affects the rate of mutations. Point and indel mutations comprise over 90-95% of all mutations. The test provided in this example is simple and inexpensive and requires only 1-2 ng of DNA. As such, it can be performed on a number of mouse strains, on each individual animal and in different organs. Strains representing a range of cancer predilections (including sex-related cancers, such as breast and ovary) can also be studied. A preliminary evaluation of the various potential clamp sites can be made to determine the most robust set of target sequence clamps for use in an animal of interest, for example, a mouse strain.
Various tissues, including blood, muscle, brain, heart, lung, skin, breast, large bowel, liver, and spleen, can be tested for the effect of radiation. Because cancer rates increase in progeny after high-LET radiation exposure, testes and ovarian tissues can be tested to evaluate germline genome-wide mutation levels. These organs are chosen because they are all known to have cancer predispositions following radiation exposure or, like skin, might sustain the highest radiation exposure.
Non-limiting examples of specific organs which can be tested according to this example of the invention include skin, lung, breast, and white blood cells (WBC). Skin receives a higher exposure than most organs and leukemia is common after irradiation. Also, WBC genome-wide mutations are needed for human comparison and lung and breast tumors are relatively common in mice and humans. These tissues are particularly preferred to study the effects of radiation.
The test can be carried out on tissues obtained at 0 hours, 24 hours, 1 month, 6 months, 1 year, 2 to 5 years or longer after exposure to radiation. These analyses can be done on individual mice for animal-specific organ comparisons. Thus, intra- and inter-strain comparisons can be made to compute the difference between the genomic age and the chronological age (Δage).
The number of the genome-wide mutations in a spontaneous tumor is similar to or larger than the number of these mutations in the normal tissue. A spontaneous tumor refers to a tumor which arises in a subject that is not exposed to known carcinogens or tumor-promoting factors, e.g., ionizing radiation, mutagens, oncogenic viruses, etc.
Tumors and the source tissue can be examined from the same subject. NIH Swiss white mice with a female to male ratio of 1:2 can be used. Having fewer females is also logical as breast and ovarian cancers are common in this strain, leading to good representation of females in the final tumor population. This strain has a ~10-20% cumulative lifetime risk of malignancy, with lung>ovary>breast>leukemia>sarcoma>gastrointestinal (GI) cancers. The GI cancers include an even mix of stomach, colon and liver.
Genetically defined animals with cancer predilection can also be used. For example, the KrasJA1 model of lung cancer, the Apc heterozygote knockout model for GI cancers, as well as other cancer-predisposed models featuring Trp53−/− can be used. However, as most of these mice develop cancer without radiation and simply demonstrate shorter latency or more aggressive pathology when irradiated, they are good models for radiation-induced cancer progression but less effective for emulating spontaneous carcinogenesis. Therefore, the strain chosen for studying passenger DNA damage (PDD) in spontaneous tumors should be a typically healthy strain.
Spontaneous oncogenesis studies can be long-term, with latency to cancer of 300-800 days in mice, and can involve large and laborious animal cohorts as the lifetime risks are only 10-20% in non-irradiated and 15-30% in irradiated animals. Certain algorithms can be used which do not require validation by correlation with the rate of cancers in animals, thereby allowing the use of smaller cohorts. Also, algorithms can be used which require the accumulation of DNA damage after irradiation to be allometrically scalable between species so that genomic age and Δage can be calculated. Accordingly, human epidemiological statistics can be applied to predict driver frequency and human cancer risk.
PDD mutations increase with normal human aging processes and can progress in different subjects at different rates. Subjects in the various age groups/ranges can be studied. PDD mutations are expected to correlate with increasing age.
Framework of PDD and Cancer Risk:
Subject-specific rates of genomic aging can be calculated as markers for cancer risk. Genomic aging can be quantified through serial measurements of point and indel mutations. A subject's genomic aging status can be tracked against age-related cancer incidence trajectories at the population level for risk estimation. The rate of PDD accumulation is a primary quantity of interest. Germline variations, which could otherwise confound the risk estimates, can be implicitly adjusted for. Specifically, at birth, there exist some number of germline abnormalities (point mutations and indels) in each cell in the body. Notably, the overwhelming majority of germline abnormalities behave as passengers and do not confer heightened cancer risk. A subject's neonatal blood can be used as the subject's mutation-free state, which can be used to fully disentangle PDD accumulation over the subject's lifetime from benign germline variations. However, sequential measurements of PDD over a period can be used to estimate rates of future damage accumulation that can be compared against population averages to determine relative risk for a subject.
Established age-specific cancer incidence data, which show that most cancers follow a power law in chronological age, can be used as a frame of reference. In particular, incidence curves are commonly observed as $I(t) \propto t^{k}$, where t and I(t) respectively denote chronological age and age-specific incidence. The constant k is estimated from population data and is specific to cancer type, sex, race, and other epidemiologic factors. Appropriate functional forms for population subgroups can be determined using the SEER data.
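One common way to estimate the exponent k of a power law is a straight-line fit on logarithmic axes, since log I(t) is linear in log t with slope k. The sketch below illustrates that generic technique; it is not stated in this specification to be the fitting procedure used, and the incidence values are synthetic numbers generated from a known power law rather than SEER data. The function name is hypothetical.

```python
import math


def fit_power_law(ages, incidences):
    """Fit I(t) = a * t**k by least squares on log I versus log t."""
    xs = [math.log(t) for t in ages]
    ys = [math.log(i) for i in incidences]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    k = sxy / sxx                         # slope on log-log axes
    a = math.exp(mean_y - k * mean_x)     # prefactor from the intercept
    return a, k


# Synthetic incidence generated from I(t) = 2e-7 * t**5 (not real data).
ages = [30, 40, 50, 60, 70, 80]
incidences = [2e-7 * t ** 5 for t in ages]
a_hat, k_hat = fit_power_law(ages, incidences)
```

On noise-free synthetic input the fit recovers the generating exponent and prefactor to within floating-point error. Real incidence tables are noisy, so a weighted or count-based regression may be more appropriate in practice.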
The occurrence of PDD is proportional to the occurrence of driver mutations and, hence, proportional to cancer incidence. Therefore,

$$\mathrm{PDD}(t) = c\,t^{k}, \qquad \frac{d}{dt}\,\mathrm{PDD}(t) = c\,k\,t^{k-1},$$

where c is a constant. The constant c can be estimated for each subject because the rate of PDD accumulation, $\frac{d}{dt}\,\mathrm{PDD}(t)$, can be calculated as a simple difference, and k and t have been derived from the population analysis. This analysis does not require an assumption-laden back-calculation to cellular abnormalities at birth. In short, subjects with large estimated values of c will be identified as having a high genomic age and an elevated future risk of cancer, since their high mutation accumulation rate implies a history of harmful exposure and an enhanced likelihood of driver mutations.
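The estimation of c from serial measurements can be sketched as follows. This is an illustration under explicit assumptions, not the procedure of the invention: it assumes that PDD follows the same power law as incidence, PDD(t) = c·t^k, so that its rate of change is c·k·t^(k−1); that the rate is approximated by the difference between two measurements; and that the rate is attributed to the midpoint age. The function name and the example values are hypothetical.

```python
def estimate_c(t1, pdd1, t2, pdd2, k):
    """Estimate c in the assumed model PDD(t) = c * t**k.

    (t1, pdd1) and (t2, pdd2) are two serial measurements of one subject;
    k is the population-derived exponent.
    """
    if t2 <= t1:
        raise ValueError("measurements must be in increasing age order")
    rate = (pdd2 - pdd1) / (t2 - t1)   # rate as a simple difference
    t_mid = 0.5 * (t1 + t2)            # age the rate is attributed to
    return rate / (k * t_mid ** (k - 1))


# Hypothetical subject generated from PDD(t) = 3 * t**2, measured at 40 and 42.
c_subject = estimate_c(40, 3 * 40 ** 2, 42, 3 * 42 ** 2, k=2)

# Hypothetical population-average constant for comparison.
c_population = 2.0
relative_c = c_subject / c_population
```

A ratio above one would mark the subject as accumulating damage faster than the hypothetical population average. Because only a difference between measurements enters the estimate, no value for PDD at birth is needed.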
Genomic aging can predict adverse responses to sustained low-dose irradiation. Flexible regression strategies (e.g., spline fitting) can be used to optimally link molecular markers of radiation sensitivity to PDD. Data from the non-irradiated animals can be paired with analogous data from human subjects to permit allometric scaling of marker quantities from mouse to human. Both the rate of PDD accumulation at age t and PDD itself at age t can be estimated through the system of equations specified separately for chronological age. PDD is expected to represent a more comprehensive measure of accumulated DNA damage.
It should be understood that the examples and embodiments described herein are for illustrative purposes only and that various modifications or changes in light thereof will be suggested to persons skilled in the art and are to be included within the spirit and purview of this application and the scope of the appended claims. In addition, any elements or limitations of any invention or embodiment thereof disclosed herein can be combined with any and/or all other elements or limitations (individually or in any combination) or any other invention or embodiment thereof disclosed herein, and all such combinations are contemplated within the scope of the invention without limitation thereto.
This application claims the benefit of U.S. Provisional Application Ser. No. 62/318,879, filed Apr. 6, 2016, the disclosure of which is hereby incorporated by reference in its entirety, including all figures, tables and amino acid or nucleic acid sequences.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US17/26061 | 4/5/2017 | WO | 00 |
Number | Date | Country | |
---|---|---|---|
62318879 | Apr 2016 | US |