Many exciting diagnostic and prognostic applications using cell-free DNA (cfDNA) have been developed for noninvasive prenatal testing and cancer liquid biopsies (Chiu et al., Proc Natl Acad Sci USA. 2008; 105:20458-20463; Chan et al., N Engl J Med. 2017; 377:513-522). Plasma cfDNA is essentially a mixture of short DNA molecules with a modal size of 166 bp that are released from different tissues in the body, including, but not limited to, hematopoietic tissues, brain, liver, lung, colon, and pancreas (Sun et al., Proc Natl Acad Sci USA. 2015; 112:E5503-12; Lehmann-Werman et al., Proc Natl Acad Sci USA. 2016; 113: E1826-34; Moss et al., Nat Commun. 2018; 9: 5068).
Exploiting the unique pattern of methylation in multiple cell types, cfDNA has been interrogated at differentially methylated regions to determine the tissue-of-origin of cfDNA molecules, where the increase of cfDNA from specific tissues can allow for the site of pathology to be localized (Sun et al., Proc Natl Acad Sci USA. 2015; 112:E5503-12; Guo et al., Nat Genet. 2017; 49:635-642). For example, genome-wide analysis of DNA methylation differences between cancer and normal cells has been utilized for cancer detection (Chan et al., Proc Natl Acad Sci USA. 2013; 110:18761-18768; Kang et al., Genome Bio. 2017:18).
While cfDNA methylation is a promising marker for cancer and tissue-of-origin testing, the field has only just begun to explore the biology behind the fragmentation of cfDNA. In this regard, the fragmentation of DNA into cfDNA has been found to be non-random and to reflect the underlying position of nucleosomes (Sun et al., Genome Res. 2019; 29:418-427; Snyder et al., Cell. 2016; 164:57-68; Ivanov et al., BMC Genomics. 2015; 16:S1; Chandrananda et al., BMC Med Genomics. 2015; 8:29). By studying the fragmentomics of cfDNA, we have previously shown that different nuclease deficiencies affect cfDNA fragment ends and size profiles (Serpas et al., Proc Natl Acad Sci USA. 2019; 116:641-649; Han et al., Am J Hum Genet. 2020; 106:202-214; Chan et al., Am J Hum Genet. 2020; 1-13). The fragmentomic profile of cfDNA has been revealed as an emerging biomarker for cancer (Jiang et al., Cancer Discov. 2020; 10:664-673).
Some embodiments of the present disclosure describe practical implementations of cfDNA methylation measurements for determining nuclease-mediated cfDNA fragmentation, which can be used for determining a level of cancer and a fractional concentration of cfDNA in a sample. Certain levels of nuclease activity may be correlated with certain levels of methylation in certain regions.
As examples, methylation levels in certain genomic regions can be analyzed to classify nuclease activity. The relative abundance of fragments covering sites that are hypomethylated or hypermethylated can be used to determine a level of a condition (e.g., a disease or disorder) in a subject, a classification of nuclease activity, or a fractional concentration of clinically-relevant DNA molecules in a sample. The classification of whether a gene exhibits a genetic disorder or the efficacy of a treatment can also be determined using methylation levels at certain sites.
DNA fragments from a subject with a condition may have a greater tendency to be within certain regions (e.g., open chromatin regions). The number of copy number aberrations within these regions compared to copy number aberrations including those outside these regions can be used to determine whether a subject has a condition.
In other examples, methylation of cfDNA in a biological sample can also be used to provide information on the sample itself. Analyzing fragments from sites that are hypomethylated or hypermethylated in a reference genome can be used to estimate the fractional concentration of clinically-relevant DNA molecules in a biological sample.
In addition to using linear cfDNA, cfDNA from extrachromosomal circular DNA (eccDNA) can be used to analyze a biological sample. The size distribution of cfDNA fragments from eccDNA can be used to determine a classification of whether a gene exhibits a genetic disorder, a classification of nuclease activity, or an efficacy of treatment. A parameter value based on the amount of eccDNA in a sample can be used to determine whether a gene exhibits a genetic disorder. Embodiments described herein also include methods for determining a quantity in a mixture of cell-free DNA fragments from eccDNA.
These and other embodiments of the disclosure are described in detail below. For example, other embodiments are directed to systems, devices, and computer readable media associated with methods described herein.
A better understanding of the nature and advantages of embodiments of the present invention may be gained with reference to the following detailed description and the accompanying drawings.
A “tissue” corresponds to a group of cells that group together as a functional unit. More than one type of cells can be found in a single tissue. Different types of tissue may consist of different types of cells (e.g., hepatocytes, alveolar cells or blood cells), but also may correspond to tissue from different organisms (mother vs. fetus) or to healthy cells vs. tumor cells. “Reference tissues” can correspond to tissues used to determine tissue-specific methylation levels. Multiple samples of a same tissue type from different individuals may be used to determine a tissue-specific methylation level for that tissue type.
A “biological sample” refers to any sample that is taken from a subject (e.g., a human (or other animal), such as a pregnant woman, a person with cancer or other disorder, or a person suspected of having cancer or other disorder, an organ transplant recipient or a subject suspected of having a disease process involving an organ (e.g., the heart in myocardial infarction, or the brain in stroke, or the hematopoietic system in anemia)) and contains one or more nucleic acid molecule(s) of interest. The biological sample can be a bodily fluid, such as blood, plasma, serum, urine, vaginal fluid, fluid from a hydrocele (e.g., of the testis), vaginal flushing fluids, pleural fluid, ascitic fluid, cerebrospinal fluid, saliva, sweat, tears, sputum, bronchoalveolar lavage fluid, discharge fluid from the nipple, aspiration fluid from different parts of the body (e.g., thyroid, breast), intraocular fluids (e.g., the aqueous humor), etc. Stool samples can also be used. In various embodiments, the majority of DNA in a biological sample that has been enriched for cell-free DNA (e.g., a plasma sample obtained via a centrifugation protocol) can be cell-free, e.g., greater than 50%, 60%, 70%, 80%, 90%, 95%, or 99% of the DNA can be cell-free. The centrifugation protocol can include, for example, centrifuging at 3,000 g for 10 minutes, obtaining the fluid part, and re-centrifuging at, for example, 30,000 g for another 10 minutes to remove residual cells. As part of an analysis of a biological sample, a statistically significant number of cell-free DNA molecules can be analyzed (e.g., to provide an accurate measurement). In some embodiments, at least 1,000 cell-free DNA molecules are analyzed. In other embodiments, at least 10,000 or 50,000 or 100,000 or 500,000 or 1,000,000 or 5,000,000 cell-free DNA molecules, or more, can be analyzed. At least a same number of sequence reads can be analyzed.
A “sequence read” refers to a string of nucleotides sequenced from any part or all of a nucleic acid molecule. For example, a sequence read may be a short string of nucleotides (e.g., 20-150 nucleotides) sequenced from a nucleic acid fragment, a short string of nucleotides at one or both ends of a nucleic acid fragment, or the sequencing of the entire nucleic acid fragment that exists in the biological sample. A sequence read may be obtained in a variety of ways, e.g., using sequencing techniques or using probes, e.g., in hybridization arrays or capture probes as may be used in microarrays, or amplification techniques, such as the polymerase chain reaction (PCR) or linear amplification using a single primer or isothermal amplification. As part of an analysis of a biological sample, at least 1,000 sequence reads can be analyzed. As other examples, at least 10,000 or 50,000 or 100,000 or 500,000 or 1,000,000 or 5,000,000 sequence reads, or more, can be analyzed. An amount of sequence reads can be used as a proxy for the number of DNA fragments. To determine the number of DNA fragments from the amount of sequence reads, a calculation may be performed to account for paired-end sequencing and/or bias of sequencing techniques.
A sequence read can include an “ending sequence” associated with an end of a fragment. The ending sequence can correspond to the outermost N bases of the fragment, e.g., 1-30 bases at the end of the fragment. If a sequence read corresponds to an entire fragment, then the sequence read can include two ending sequences. When paired-end sequencing provides two sequence reads that correspond to the ends of the fragments, each sequence read can include one ending sequence.
An “ending position” or “end position” (or just “end”) can refer to the genomic coordinate or genomic identity or nucleotide identity of the outermost base, i.e., at the extremities, of a cell-free DNA molecule, e.g., a plasma DNA molecule. The end position can correspond to either end of a DNA molecule. In this manner, if one refers to a start and end of a DNA molecule, both would correspond to an ending position. In practice, one end position is the genomic coordinate or the nucleotide identity of the outermost base on one extremity of a cell-free DNA molecule that is detected or determined by an analytical method, such as but not limited to massively parallel sequencing or next-generation sequencing, single molecule sequencing, double- or single-stranded DNA sequencing library preparation protocols, polymerase chain reaction (PCR), or microarray. Such in vitro techniques may alter the true in vivo physical end(s) of the cell-free DNA molecules. Thus, each detectable end may represent the biologically true end, or the end may be one or more nucleotides inwards or one or more nucleotides extended from the original end of the molecule, e.g., through 5′ blunting and 3′ filling of overhangs of non-blunt-ended double-stranded DNA molecules by the Klenow fragment. The genomic identity or genomic coordinate of the end position could be derived from results of alignment of sequence reads to a human reference genome, e.g., hg19. It could be derived from a catalog of indices or codes that represent the original coordinates of the human genome. It could refer to a position or nucleotide identity on a cell-free DNA molecule that is read by, but not limited to, target-specific probes, mini-sequencing, or DNA amplification.
A “preferred end” (or “recurrent ending position”) refers to an end that is more highly represented or prevalent (e.g., as measured by a rate) in a biological sample having a physiological (e.g., pregnancy) or pathological (disease) state (e.g., cancer) than in a biological sample not having such a state, or than at different time points or stages of the same pathological or physiological state, e.g., before or after treatment. A preferred end therefore has an increased likelihood or probability of being detected in the relevant physiological or pathological state relative to other states. The increased probability can be compared between the pathological state and a non-pathological state, for example in patients with and without a cancer, and quantified as a likelihood ratio or relative probability. The likelihood ratio can be determined based on the probability of detecting at least a threshold number of preferred ends in the tested sample or based on the probability of detecting the preferred ends in patients with such a condition relative to patients without such a condition. Examples of thresholds for likelihood ratios include, but are not limited to, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.8, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5, 6, 8, 10, 20, 40, 60, 80 and 100. Such likelihood ratios can be measured by comparing relative abundance values of samples with and without the relevant state. Because the probability of detecting a preferred end in a relevant physiological or disease state is higher, such preferred ending positions would be seen in more than one individual with that same physiological or disease state. With the increased probability, more than one cell-free DNA molecule can be detected as ending on a same preferred ending position, even when the number of cell-free DNA molecules analyzed is far less than the size of the genome.
Thus, the preferred or recurrent ending positions are also referred to as the “frequent ending positions.” In some embodiments, a quantitative threshold may be used to require that ends be detected at least multiple times (e.g., 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, or 50) within the same sample or same sample aliquot to be considered as a preferred end. A relevant physiological state may include a state when a person is healthy, disease-free, or free from a disease of interest. Similarly, a “preferred ending window” corresponds to a contiguous set of preferred ending positions.
A “rate” of DNA molecules ending on a position relates to how frequently a DNA molecule ends on the position. Such a rate can be referred to as an “end density.” The rate may be based on a number of DNA molecules that end on the position normalized against a number of DNA molecules analyzed. The normalization can also be based on the average, median, or total number of ends in the surrounding region. The surrounding region used for normalization may include, but is not limited to, 500, 1000, 3000, 5000, etc. bp upstream and/or downstream of the position.
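As an illustration of the rate defined above, the following minimal sketch computes an end density at one position, normalized by the average number of ends per position in a surrounding region. The function name `end_density`, its parameters, and the choice of the average (rather than the median or total) for normalization are assumptions made for this example only.

```python
from collections import Counter


def end_density(end_positions, position, flank=1000):
    """Rate of fragments ending at `position`, normalized by the mean
    number of ends per position in the surrounding window.

    end_positions: iterable of genomic coordinates, one per fragment end
    flank: bp upstream and downstream of `position` used for normalization
    """
    counts = Counter(end_positions)
    window = range(position - flank, position + flank + 1)
    window_total = sum(counts[p] for p in window)
    if window_total == 0:
        return 0.0  # no ends observed anywhere in the region
    mean_ends_per_position = window_total / len(window)
    return counts[position] / mean_ends_per_position


# Toy example: three of five fragment ends fall on coordinate 100
ends = [99, 100, 100, 100, 101]
print(end_density(ends, 100, flank=2))
```

A value above 1 indicates that the position carries more ends than the regional average in this toy normalization.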
The term “alleles” refers to alternative DNA sequences at the same physical genomic locus, which may or may not result in different phenotypic traits. In any particular diploid organism, with two copies of each chromosome (except the sex chromosomes in a male human subject), the genotype for each gene comprises the pair of alleles present at that locus, which are the same in homozygotes and different in heterozygotes. A population or species of organisms typically include multiple alleles at each locus among various individuals. A genomic locus where more than one allele is found in the population is termed a polymorphic site. Allelic variation at a locus is measurable as the number of alleles (i.e., the degree of polymorphism) present, or the proportion of heterozygotes (i.e., the heterozygosity rate) in the population. As used herein, the term “polymorphism” refers to any inter-individual variation in the human genome, regardless of its frequency. Examples of such variations include, but are not limited to, single nucleotide polymorphism, simple tandem repeat polymorphisms, insertion-deletion polymorphisms, mutations (which may be disease causing) and copy number variations. The term “haplotype” as used herein refers to a combination of alleles at multiple loci that are transmitted together on the same chromosome or chromosomal region. A haplotype may refer to as few as one pair of loci or to a chromosomal region, or to an entire chromosome or chromosome arm.
A “relative frequency” (also referred to just as “frequency”) may refer to a proportion (e.g., a percentage, fraction, or concentration). In particular, a relative frequency of a particular end motif (e.g., CCGA or just a single base) can provide a proportion of cell-free DNA fragments in a sample that are associated with the end motif CCGA, e.g., by having an ending sequence of CCGA.
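The relative frequency of an end motif can be sketched as follows. This example assumes that each fragment is given as a nucleotide string and that the ending sequence is taken from the first k bases of that string; the function name `end_motif_frequencies` and the toy sequences are hypothetical.

```python
from collections import Counter


def end_motif_frequencies(fragments, k=4):
    """Proportion of fragments whose first k bases equal each k-mer.

    fragments: iterable of nucleotide strings; fragments shorter than k
    are skipped because no k-base ending sequence can be read from them.
    """
    motifs = [seq[:k].upper() for seq in fragments if len(seq) >= k]
    total = len(motifs)
    if total == 0:
        return {}
    return {motif: n / total for motif, n in Counter(motifs).items()}


# Toy example with four fragments, three of which start with CCGA
freqs = end_motif_frequencies(["CCGATTT", "CCGAAAA", "ACGTACG", "ccgaGG"])
print(freqs)
```

Setting `k=1` gives the single-base case mentioned above.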
An “aggregate value” may refer to a collective property, e.g., of relative frequencies of a set of ending positions. Examples include a mean, a median, a sum of relative frequencies, a variation among the relative frequencies (e.g., standard deviation (SD), the coefficient of variation (CV), interquartile range (IQR) or a certain percentile cutoff (e.g. 95th or 99th percentile) among different relative frequencies), or a difference (e.g., a distance) from a reference pattern of relative frequencies, as may be implemented in clustering.
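The aggregate values listed above can be computed with standard statistics routines. The sketch below is one possible implementation, assuming the relative frequencies are supplied as a list of at least two numbers; the function names and the use of a Euclidean distance for the difference from a reference pattern are illustrative choices, not requirements.

```python
import math
import statistics


def aggregate_values(freqs):
    """Summary statistics over a set of relative frequencies (>= 2 values)."""
    mean = statistics.fmean(freqs)
    sd = statistics.stdev(freqs)  # sample standard deviation
    q1, _, q3 = statistics.quantiles(freqs, n=4)  # quartile cut points
    return {
        "mean": mean,
        "median": statistics.median(freqs),
        "sum": math.fsum(freqs),
        "sd": sd,
        "cv": sd / mean if mean else float("nan"),
        "iqr": q3 - q1,
    }


def distance_from_reference(freqs, reference):
    """Euclidean distance between a sample pattern and a reference pattern."""
    if len(freqs) != len(reference):
        raise ValueError("patterns must have the same length")
    return math.sqrt(sum((f - r) ** 2 for f, r in zip(freqs, reference)))


print(aggregate_values([0.1, 0.2, 0.3, 0.4]))
print(distance_from_reference([0.1, 0.2], [0.1, 0.2]))
```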
A “calibration sample” can correspond to a biological sample whose desired measured value (e.g., nuclease activity, classification of a genetic disorder, or other desired property) is known or determined via a calibration method, e.g., using other measurement techniques such as clotting measurements for effective dosage or ELISA for measuring nuclease quantity or assays quantifying the rate of DNA digestion by nucleases for measuring nuclease activity (an example method can involve fluorometric or spectrophotometric measurement of DNA quantity before and after, or in real-time, the addition of a nuclease-containing sample; another example is using radial enzyme diffusion methods). For a calibration sample, separate measured values (e.g., an amount of fragments with a particular end motif or with a particular size) can be determined, to which the desired measured value can be correlated.
A “calibration data point” includes a “calibration value” (e.g., an amount of fragments with a particular end motif or with a particular size) and a measured or known value that is desired to be determined for other test samples. The calibration value can be determined from various types of data measured from DNA molecules of the sample, (e.g., an amount of fragments with an end motif or with a particular size). The calibration value corresponds to a parameter that correlates to the desired property, e.g., classification of a genetic disorder, nuclease activity, or efficacy of anticoagulant dosage. For example, a calibration value can be determined from measured values as determined for a calibration sample, for which the desired property is known. The calibration data points may be defined in a variety of ways, e.g., as discrete points or as a calibration function (also called a calibration curve or calibration surface). The calibration function could be derived from additional mathematical transformation of the calibration data points.
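One simple way to turn discrete calibration data points into a calibration function is piecewise-linear interpolation, sketched below. The interpolation scheme, the clamping outside the calibrated range, the function name `make_calibration_function`, and the numeric values are all assumptions for illustration; a fitted curve or surface could be used instead.

```python
def make_calibration_function(points):
    """Build a calibration function from (calibration_value, known_value) pairs.

    Values between calibration points are linearly interpolated; values
    outside the calibrated range are clamped to the nearest end point.
    """
    if not points:
        raise ValueError("at least one calibration data point is required")
    pts = sorted(points)

    def calibrate(x):
        if x <= pts[0][0]:
            return pts[0][1]
        if x >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= x <= x1:
                return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

    return calibrate


# Hypothetical calibration: measured parameter -> known property value
calibrate = make_calibration_function([(0.0, 0.0), (10.0, 100.0), (20.0, 150.0)])
print(calibrate(5.0), calibrate(15.0))
```

A test sample's measured parameter would then be passed to `calibrate` to estimate the desired property.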
A “site” (also called a “genomic site”) corresponds to a single site, which may be a single base position or a group of correlated base positions, e.g., a CpG site, TSS site, DNase hypersensitivity site, or larger group of correlated base positions. A “locus” may correspond to a region that includes multiple sites. A locus can include just one site, which would make the locus equivalent to a site in that context.
A “separation value” corresponds to a difference or a ratio involving two values, e.g., two fractional contributions or two methylation levels. The separation value could be a simple difference or ratio. As examples, a direct ratio of x/y is a separation value, as well as x/(x+y). The separation value can include other factors, e.g., multiplicative factors. As other examples, a difference or ratio of functions of the values can be used, e.g., a difference or ratio of the natural logarithms (ln) of the two values. A separation value can include a difference and a ratio.
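The forms of separation value named above can be written out directly. The sketch below assumes two positive input values and uses a hypothetical function name, `separation_values`; it returns the simple difference, the direct ratio x/y, the ratio x/(x+y), and the difference of natural logarithms.

```python
import math


def separation_values(x, y):
    """Several separation values for two positive quantities x and y."""
    return {
        "difference": x - y,
        "ratio": x / y,
        "fraction": x / (x + y),
        "log_difference": math.log(x) - math.log(y),
    }


# Example with two methylation levels expressed as percentages
print(separation_values(75.0, 25.0))
```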
A “separation value” and an “aggregate value” (e.g., of relative frequencies) are two examples of a parameter (also called a metric) that provides a measure of a sample that varies between different classifications (states), and thus can be used to determine different classifications. An aggregate value can be a separation value, e.g., when a difference is taken between a set of relative frequencies of a sample and a reference set of relative frequencies, as may be done in clustering.
A “relative abundance” is a type of separation value that relates an amount (one value) of cell-free DNA molecules ending within one window of genomic positions to an amount (other value) of cell-free DNA molecules ending within another window of genomic positions. The two windows may overlap, but would be of different sizes. In other implementations, the two windows would not overlap. Further, the windows may be of a width of one nucleotide, and therefore be equivalent to one genomic position. An end density is a type of relative abundance.
The term “classification” as used herein refers to any number(s) or other character(s) that are associated with a particular property of a sample. For example, a “+” symbol (or the word “positive”) could signify that a sample is classified as having deletions or amplifications. The classification can be binary (e.g., positive or negative) or have more levels of classification (e.g., a scale from 1 to 10 or 0 to 1).
The terms “cutoff” and “threshold” refer to predetermined numbers used in an operation. For example, a cutoff size can refer to a size above which fragments are excluded. A threshold value may be a value above or below which a particular classification applies. Either of these terms can be used in either of these contexts. A cutoff or threshold may be “a reference value” or derived from a reference value that is representative of a particular classification or discriminates between two or more classifications. Such a reference value can be determined in various ways, as will be appreciated by the skilled person. For example, metrics can be determined for two different cohorts of subjects with different known classifications, and a reference value can be selected as representative of one classification (e.g., a mean) or a value that is between two clusters of the metrics (e.g., chosen to obtain a desired sensitivity and specificity). As another example, a reference value can be determined based on statistical simulations of samples. A particular value for a cutoff, threshold, reference, etc. can be determined based on a desired accuracy (e.g., a sensitivity and specificity).
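As a sketch of how a reference value might be selected from two cohorts with known classifications, the example below picks a cutoff from the control cohort to reach a desired specificity and then reports the resulting sensitivity in the case cohort. It assumes that higher metric values indicate the positive classification; the function names and cohort values are hypothetical.

```python
import math


def cutoff_for_specificity(control_values, specificity=0.95):
    """Smallest observed control value at or below which at least the
    requested fraction of control samples fall."""
    ordered = sorted(control_values)
    idx = math.ceil(specificity * len(ordered)) - 1
    idx = max(0, min(idx, len(ordered) - 1))
    return ordered[idx]


def sensitivity_specificity(case_values, control_values, cutoff):
    """Samples with a metric above the cutoff are called positive."""
    true_pos = sum(v > cutoff for v in case_values)
    true_neg = sum(v <= cutoff for v in control_values)
    return true_pos / len(case_values), true_neg / len(control_values)


controls = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # metric in subjects without the condition
cases = [8, 12, 15, 20]                     # metric in subjects with the condition
cutoff = cutoff_for_specificity(controls, specificity=0.9)
print(cutoff, sensitivity_specificity(cases, controls, cutoff))
```

Raising the requested specificity moves the cutoff upward and generally lowers sensitivity, which is the trade-off referred to above.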
A “level of pathology” (or level of a disorder) can refer to the amount, degree, or severity of pathology associated with an organism. An example is a cellular disorder in expressing a nuclease. Another example of pathology is a rejection of a transplanted organ. Other example pathologies can include autoimmune attack (e.g., lupus nephritis damaging the kidney or multiple sclerosis), inflammatory diseases (e.g., hepatitis), fibrotic processes (e.g., cirrhosis), fatty infiltration (e.g., fatty liver diseases), degenerative processes (e.g., Alzheimer's disease) and ischemic tissue damage (e.g., myocardial infarction or stroke). A healthy state of a subject can be considered a classification of no pathology. The pathology can be cancer.
The term “level of cancer” can refer to whether cancer exists (i.e., presence or absence), a stage of a cancer, a size of tumor, whether there is metastasis, the total tumor burden of the body, the cancer's response to treatment, and/or other measure of a severity of a cancer (e.g. recurrence of cancer). The level of cancer may be a number or other indicia, such as symbols, alphabet letters, and colors. The level may be zero. The level of cancer may also include premalignant or precancerous conditions (states). The level of cancer can be used in various ways. For example, screening can check if cancer is present in someone who is not previously known to have cancer. Assessment can investigate someone who has been diagnosed with cancer to monitor the progress of cancer over time, study the effectiveness of therapies or to determine the prognosis. In one embodiment, the prognosis can be expressed as the chance of a patient dying of cancer, or the chance of the cancer progressing after a specific duration or time, or the chance or extent of cancer metastasizing. Detection can mean ‘screening’ or can mean checking if someone, with suggestive features of cancer (e.g. symptoms or other positive tests), has cancer.
The “methylation index” or “methylation status” for each genomic site (e.g., a CpG site) can refer to the proportion of DNA fragments (e.g., as determined from sequence reads or probes) showing methylation at the site over the total number of reads covering that site. A “read” can correspond to information (e.g., methylation status at a site) obtained from a DNA fragment. A read can be obtained using reagents (e.g., primers or probes) that preferentially hybridize to DNA fragments of a particular methylation status. Typically, such reagents are applied after treatment with a process that differentially modifies or differentially recognizes DNA molecules depending on their methylation status, e.g., bisulfite conversion, or methylation-sensitive restriction enzyme, or methylation binding proteins, or anti-methylcytosine antibodies, or single molecule sequencing techniques that recognize methylcytosines and hydroxymethylcytosines.
The “methylation density” of a region can refer to the number of reads at sites within the region showing methylation divided by the total number of reads covering the sites in the region. The sites may have specific characteristics, e.g., being CpG sites. Thus, the “CpG methylation density” of a region can refer to the number of reads showing CpG methylation divided by the total number of reads covering CpG sites in the region (e.g., a particular CpG site, CpG sites within a CpG island, or a larger region). For example, the methylation density for each 100-kb bin in the human genome can be determined from the total number of cytosines not converted after bisulfite treatment (which corresponds to methylated cytosine) at CpG sites as a proportion of all CpG sites covered by sequence reads mapped to the 100-kb region. This analysis can also be performed for other bin sizes, e.g., 500 bp, 5 kb, 10 kb, 50 kb or 1 Mb, etc. A region could be the entire genome or a chromosome or part of a chromosome (e.g., a chromosomal arm). The methylation index of a CpG site is the same as the methylation density for a region when the region only includes that CpG site. The “proportion of methylated cytosines” can refer to the number of cytosine sites, “C's”, that are shown to be methylated (for example, unconverted after bisulfite conversion) over the total number of analyzed cytosine residues, i.e., including cytosines outside of the CpG context, in the region. The methylation index, methylation density and proportion of methylated cytosines are examples of “methylation levels.” Apart from bisulfite conversion, other processes known to those skilled in the art can be used to interrogate the methylation status of DNA molecules, including, but not limited to, enzymes sensitive to the methylation status (e.g., methylation-sensitive restriction enzymes), methylation binding proteins, and single molecule sequencing using a platform sensitive to the methylation status (e.g., nanopore sequencing (Schreiber et al. Proc Natl Acad Sci 2013; 110: 18910-18915) or Pacific Biosciences single molecule real-time analysis (Tse et al. Proc Natl Acad Sci USA 2021; 118: e2019768118)).
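The methylation index of a site and the methylation density of fixed-size bins can be sketched as follows. The input layout (one tuple of position, methylated read count, and total read count per CpG site on a single chromosome) and the function names are assumptions made for this example.

```python
from collections import defaultdict


def methylation_index(methylated_reads, total_reads):
    """Proportion of reads showing methylation at one site."""
    return methylated_reads / total_reads if total_reads else None


def methylation_density(site_counts, bin_size=100_000):
    """Methylation density per bin on one chromosome.

    site_counts: iterable of (position, methylated_reads, total_reads),
    one entry per CpG site. Returns {bin_start: density}.
    """
    methylated = defaultdict(int)
    covered = defaultdict(int)
    for position, meth, total in site_counts:
        bin_start = (position // bin_size) * bin_size
        methylated[bin_start] += meth
        covered[bin_start] += total
    return {b: methylated[b] / covered[b] for b in covered if covered[b] > 0}


sites = [(10, 8, 10), (50_000, 2, 10), (150_000, 5, 5)]
print(methylation_index(8, 10))
print(methylation_density(sites))
```

When a bin contains a single CpG site, the density returned for that bin equals the methylation index of the site, consistent with the definition above.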
The term “hypomethylation” can refer to a site or set of sites (e.g., a region) that has below a specified value for a methylation level, e.g., at or below 80%, 75%, 70%, 65%, or 60% for the methylation level. The term “hypermethylation” can refer to a site or set of sites (e.g., a region) that has above a specified value for a methylation level, e.g., at or above 80%, 75%, 70%, 65%, or 60% for the methylation level.
The name of a gene is typically written in italics. A human gene is typically also written in all capital letters. A mouse gene may not be capitalized after the first letter. The protein is conventionally written in all capital letters and without italics. As examples, a mouse may have the Dnase1l3 gene and the DNASE1L3 protein, while a human may have the DNASE1L3 gene and the DNASE1L3 protein.
The term “about” or “approximately” can mean within an acceptable error range for the particular value as determined by one of ordinary skill in the art, which will depend in part on how the value is measured or determined, i.e., the limitations of the measurement system. For example, “about” can mean within 1 or more than 1 standard deviation, per the practice in the art. Alternatively, “about” can mean a range of up to 20%, up to 10%, up to 5%, or up to 1% of a given value. Alternatively, particularly with respect to biological systems or processes, the term “about” or “approximately” can mean within an order of magnitude, within 5-fold, and more preferably within 2-fold, of a value. Where particular values are described in the application and claims, unless otherwise stated the term “about” meaning within an acceptable error range for the particular value should be assumed. The term “about” can have the meaning as commonly understood by one of ordinary skill in the art. The term “about” can refer to ±10%. The term “about” can refer to ±5%.
Cell-free DNA (cfDNA) is a powerful, non-invasive biomarker for cancer and prenatal testing and circulates in plasma (as well as other cell-free samples) as short fragments. Cell-free DNA includes both linear DNA and extrachromosomal circular DNA (eccDNA). eccDNA in plasma includes DNA from a subset of mitochondrial genomes of certain tissues. However, cfDNA has not previously been used to understand nuclease activity in individuals. Different nuclease activities can indicate different levels of disease and different tissue types. Additionally, the effect of nuclease activity on DNA fragmentation and methylation has not previously been accounted for in analyzing cfDNA. In this disclosure, we apply relationships between nuclease activity, cfDNA methylation, and size profile in analyzing biological samples.
Different nuclease deficiencies may affect the apparent methylation level of plasma cfDNA on a genome-wide level. DNASE1L3 and DNASE1 in mice were studied as examples of nucleases. Different nuclease activity affected hypomethylation/hypermethylation levels of fragments in certain genomic regions, for example, transcriptional start sites (TSSs). The relative abundance of fragments covering sites that are hypomethylated or hypermethylated can be used to determine a level of a condition (e.g., a disease or disorder) in a subject, a classification of nuclease activity, or a fractional concentration of clinically-relevant DNA molecules in a sample. The relative abundance may be determined based on the number of fragments at certain sites compared to the number of fragments at other sites. For example, more fragments from CpG sites that are hypomethylated may indicate that a condition exists.
A greater number of fragments from certain regions may indicate a level of a condition. Samples from a subject having a condition may have more fragments in certain regions or higher or lower methylation in certain regions. The regions may include open chromatin regions (OCRs), CpG islands (CGIs), or near TSSs. A number of copy number aberrations within certain regions may be used to determine whether a subject has a condition.
The fractional concentration of clinically-relevant DNA can be determined by analyzing fragments from sites that are hypomethylated or hypermethylated. For example, fetal DNA may have fewer fragments from methylated CpG sites or from OCRs and CGIs.
Different sizes of cfDNA may be associated with different methylation levels depending on nuclease activity. Certain size fragments may be relatively hypermethylated, while other size fragments may be relatively hypomethylated. Thus, different genomic regions may not be represented evenly in the different sizes of cfDNA with different nuclease activity conditions.
eccDNA can be used to analyzed a biological sample. The size distribution of cfDNA fragments from eccDNA can be used to determine a classification of whether a gene exhibits a genetic disorder, a classification of nuclease activity, or an efficacy of treatment. Certain nuclease deficiencies may result in more longer cfDNA fragments. A parameter value based on the amount of eccDNA in a sample can be used to determine whether a gene exhibits a genetic disorder. Embodiments described herein also include methods for determining a quantity in a mixture of cell-free DNA fragments from eccDNA.
The effect of nucleases on cell-free DNA methylation is described. Cell-free DNA from organisms with nuclease deficiencies was analyzed. Changes in methylation and size profile were observed, including at certain genomic sites or genomic regions. Samples from mice and humans deficient in certain nucleases were studied. Results from example nucleases can be applied to other nucleases based on the cleaving and other characteristics of the other nucleases. Size profile, methylation amount, and normalized end density were seen to vary based on nuclease deficiency, including at or near certain genomic sites.
A. Experimental Design and Results
In an example analysis, we performed whole genome bisulfite sequencing of pooled plasma cfDNA and buffy coat genomic DNA from mice deficient in either DNASE1L3 or DNASE1 and their wildtype counterparts to compare their cfDNA profiles (including size and methylation profiles). Similar results were found with samples from human subjects.
In the analysis of cfDNA from nuclease-deficient subjects, the overall CpG methylation percentage of each sample was determined. The method used to determine CpG methylation in this analysis was bisulfite sequencing. Other methods may involve direct electrochemical detection, single-molecule real-time detection, methylated DNA immunoprecipitation, microarray analysis, methylation-specific PCR, or matrix-assisted laser desorption ionization time-of-flight mass spectrometry.
In bisulfite sequencing, bisulfite is used to convert unmethylated cytosines into uracil, leaving methylated cytosines (Cs) intact. Subsequent PCR amplification of the modified DNA using methylation-specific and non-specific primer pairs replaces all uracil nucleotides with thymine (T). This produces methylation-specific single-nucleotide alterations that can be identified with sequencing and alignment against a reference sequence. The methylation percentage of a given cytosine in the reference genome is calculated from the sequenced counts as C/(C+T) at that cytosine. The overall methylation of a sample can be calculated using the sequenced portion of all fragments (i.e., read 1 and read 2) and determining the counts of Cs and Ts at each reference C. The methylation percentage may be limited to the Cs in CpG dinucleotides or may include Cs followed by any other nucleotide (CHs, where H may be adenine, thymine, or cytosine).
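The C/(C+T) calculation can be illustrated with a minimal sketch. The function names and the dictionary layout of per-site counts are assumptions made for illustration only; they are not part of the analysis pipeline used in this work.

```python
def methylation_percentage(c_count, t_count):
    """Return 100 * C / (C + T) for reads aligned to reference cytosines."""
    total = c_count + t_count
    if total == 0:
        raise ValueError("no informative reads cover this cytosine")
    return 100.0 * c_count / total


def overall_methylation(site_counts):
    """Pool C and T counts over many reference cytosines (e.g., all CpG sites).

    site_counts is a hypothetical mapping:
    {(chrom, pos): (reads showing C, reads showing T)}
    """
    c_total = sum(c for c, _ in site_counts.values())
    t_total = sum(t for _, t in site_counts.values())
    return methylation_percentage(c_total, t_total)
```

Pooling counts before dividing weights each site by its read depth, which matches the description of counting Cs and Ts at each reference C across all sequenced fragments.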
Bisulfite sequencing can return a methylation status for each genomic site. The methylation status can be used to determine a methylation density of a region. A site or a region can be determined to be hypomethylated or hypermethylated based on the methylation density. The methylation analysis can be used along with fragment size analysis to determine characteristics of the sample or the subject having a nuclease deficiency.
1. Changes in Plasma Methylation for Nuclease-Deficient Mice
The overall methylation percentage of CpG sites was studied across different genotypes for nuclease deficiency and for different sample types. Additionally, CpG methylation percentage was measured for different genomic regions.
Comparing between the nuclease genotypes, plasma cfDNA from Dnase1l3-deficient mice was strikingly more hypomethylated than plasma cfDNA from WT mice (Wilcoxon rank-sum test, p=0.002). On the other hand, plasma cfDNA from Dnase1-deficient mice was relatively hypermethylated. In contrast to the differing methylation levels in plasma cfDNA, the CpG methylation percentages of genomic DNA between WT, Dnase1l3-deficient mice, and Dnase1-deficient mice were not appreciably different from each other. Altogether, these data suggested that while the methylation levels of DNA inside the buffy coat cells of different genotypes were largely unaffected by DNASE1L3 or DNASE1 deficiency, the apparent methylation of plasma cfDNA was affected by the absence of either one of these nucleases.
2. Effects of Nuclease Deficiency and Methylation on the cfDNA Size Profile
The effect of these different nucleases on the plasma cfDNA size profile has previously been characterized, and the median size profile of each genotype is shown in
Previously, our group also found that hypomethylated cfDNA was shorter than hypermethylated cfDNA in human plasma (Lun et al., 2013). We checked to see if this relationship was still true in the plasma of mice with different nuclease genotypes. We identified cfDNA fragments with at least three CpGs and categorized the fragments with none of these CpGs methylated as 0% methylated fragments and the fragments with all of their CpGs methylated as 100% methylated fragments. We compared the median size profiles of these 0% methylated fragments and 100% methylated fragments in each of the three genotypes (
Knowing that unmethylated fragments tended to be shorter and that cfDNA from Dnase1l3-deficient mice had more short fragments raised the possibility that cfDNA from Dnase1l3-deficient mice was more hypomethylated solely because of the increase in short fragments. To tease out the relationship between these interrelated factors, we examined the median cfDNA size profile of each genotype within the 0% and the 100% methylated fragments to control for the methylation level (
We then explored the median cfDNA size profile changes in 0% and 100% fragments from Dnase1-deficient mice (blue lines 512 and 524,
In summary, while hypomethylated cfDNA tended to have a shorter size profile than hypermethylated cfDNA, the absence of these nucleases also exerted an independent effect on the cfDNA size profile. The cfDNA from Dnase1-deficient mice revealed that the increased frequency of short ≤150 bp fragments, especially the ultrashort ≤80 bp, in the 0% methylated fragments was associated with DNASE1 activity.
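The fragment categorization used in this comparison (fragments with at least three CpGs, split into 0% and 100% methylated groups) can be sketched as follows. The input layout, a fragment size paired with a list of per-CpG methylation calls, is a hypothetical representation, and the sketch builds a size profile for a single sample rather than the median profile across samples of a genotype.

```python
from collections import Counter


def classify_fragment(cpg_calls, min_cpgs=3):
    """cpg_calls: list of booleans, True meaning a methylated CpG on the fragment."""
    if len(cpg_calls) < min_cpgs:
        return None  # too few CpGs to categorize
    methylated = sum(cpg_calls)
    if methylated == 0:
        return "0%"
    if methylated == len(cpg_calls):
        return "100%"
    return None  # partially methylated fragments are not used here


def size_profiles(fragments):
    """fragments: iterable of (size_bp, cpg_calls).

    Returns {category: {size_bp: frequency in percent}} for one sample.
    """
    counts = {"0%": Counter(), "100%": Counter()}
    for size, calls in fragments:
        label = classify_fragment(calls)
        if label is not None:
            counts[label][size] += 1
    profiles = {}
    for label, counter in counts.items():
        total = sum(counter.values())
        profiles[label] = (
            {s: 100.0 * n / total for s, n in counter.items()} if total else {}
        )
    return profiles
```

Comparing the two resulting profiles within one genotype, and then the same category across genotypes, separates the effect of methylation from the effect of the nuclease genotype.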
3. The Role of OCR and CGI Fragments in cfDNA Methylation
We next explored the genomic origins of these DNASE1 activity-associated, short, unmethylated fragments in the cfDNA of Dnase1l3-deficient mice. We hypothesized that they might be associated with OCRs and CpG islands (CGIs) since these regions were known to be hypomethylated compared to the genome as a whole. We classified the ±500 bp regions flanking the center of TSS and Pol II regions, and regions with H3K27ac and/or H3K4me3 as OCRs and merged these regions with CGIs.
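One way to build the merged OCR and CGI region set described above is sketched here. The interval representation (half-open `(chrom, start, end)` tuples), the function names, and the linear overlap scan are illustrative assumptions; a production analysis would typically use an interval index or a dedicated genomics tool.

```python
def flank_windows(centers, flank=500):
    """Turn (chrom, center) pairs into windows of +/- flank bp around each center."""
    return [(chrom, max(0, c - flank), c + flank) for chrom, c in centers]


def merge_intervals(intervals):
    """Merge overlapping or touching half-open (chrom, start, end) intervals."""
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def overlaps_any(fragment, regions):
    """True if a (chrom, start, end) fragment overlaps any region (linear scan)."""
    chrom, start, end = fragment
    return any(c == chrom and start < e and s < end for c, s, e in regions)
```

For example, TSS and Pol II centers would pass through `flank_windows`, be concatenated with histone-mark regions and CGIs, and then be merged once; `overlaps_any` can then flag fragments for counting or for bioinformatic masking.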
We observed that these OCR and CGI regions had increased end density in the cfDNA of Dnase1l3-deficient mice (e.g., red line 608) and decreased end density in the cfDNA of Dnase1-deficient mice (e.g., blue line 612) compared to WT (e.g., green line 604). In comparison, the normalized end density in random regions of the genome was similar and overlapping in the cfDNA of WT, Dnase1l3-deficient, and Dnase1-deficient mice (
The percentages of fragments within these selected OCRs and CGIs are shown in
To explore the contribution of these OCR and CGI fragments to the methylation differences of cfDNA from mice of the different nuclease genotypes, we recalculated the overall methylation levels of cfDNA in each of the genotypes after bioinformatically masking these fragments from the OCRs and CGIs (
4. Differential Methylation Levels and OCR and CGI Proportions by Fragment Size
We then analyzed the methylation level of cfDNA by fragment size. For all the fragments of a particular size, the CpG methylation percentage was calculated and the median of each genotype was plotted in
The troughs in cfDNA methylation percentage were around fragment sizes 270 bp and 460 bp. These troughs in cfDNA methylation corresponded to higher proportions of OCR and CGI fragments for all genotypes (
To tease out the relationship between the methylation level and OCR and CGI fragment proportion among different fragment sizes, we bioinformatically masked the OCR and CGI fragments and replotted the CpG methylation level of each fragment size after masking (
In
After masking, while the periodic pattern persisted, the peak-trough difference decreased and the methylation percentage of all fragment sizes increased for all genotypes (
Comparing between the nuclease genotypes in
In contrast, the relative hypermethylation of cfDNA from Dnase1-deficient mice occurred only in certain size ranges, most obviously around the 166 bp and 360 bp methylation peaks (
5. DNASE1L3 Cuts Methylated CpGs
While we had demonstrated that DNASE1 activity in the hypomethylated OCRs and CGIs was a major contributor to the relative hypomethylation in cfDNA of Dnase1l3-deficient mice, DNASE1 activity appeared to be only part of the whole picture. Even after masking the OCR and CGI fragments, the relative hypomethylation of cfDNA from Dnase1l3-deficient mice compared with cfDNA from WT mice persisted (Wilcoxon rank-sum test, p=0.008) (
We devised a method to interrogate whether or not DNASE1L3 could cut methylated CpGs. To do this, we first identified methylated and unmethylated CpGs. From a downloaded dataset comprising bisulfite sequencing of eight different mouse tissues (bone marrow, thymus, spleen, kidney, heart, liver, large intestines, small intestines) with two replicates each, we mined for CpGs that were methylated in 90% of all tissue and replicate reads and identified them as putatively methylated CpGs (545,720 CpGs in total). Similarly, we also mined for CpGs that were unmethylated in 80% of the reads in the dataset and identified them as putatively unmethylated CpGs (7,140 CpGs in total). Using CpGs unmethylated in 90% of reads for subsequent analysis was more difficult due to the extremely low number of CpGs that fulfilled this condition (11 CpGs in total). With these putatively methylated and unmethylated CpGs identified from this downloaded dataset, we confirmed that the actual methylation levels of these CpGs in our plasma dataset were similar to their expected methylation levels. The putatively methylated CpGs had a >90% methylation level in the plasma cfDNA of each sample, and the putatively unmethylated CpGs had a <20% methylation level in the plasma cfDNA of each sample (
With these putatively methylated and unmethylated CpGs identified, we calculated the normalized end density over these CpGs and their surrounding regions. When aggregated together placing the putatively methylated C at position 0, there was an end density pattern over the surrounding ±1000 bp that was strongly periodic, reminiscent of the nucleosomal array found around CTCF regions (
On the other hand, the putatively unmethylated CpGs appeared to originate from very different genomic regions compared with the putatively methylated CpGs. The surrounding region of the putatively unmethylated CpGs demonstrated a generalized increase in normalized end density in the −400 to +400 bp regions around the putatively unmethylated CpGs (
6. DNASE1L3-Deficient Human Subjects
To extrapolate our findings to human cfDNA, we performed bisulfite sequencing of plasma samples from three DNASE1L3-deficient subjects (H2, H4, and V11) and one heterozygous parent (H1) (Chan et al., 2020). Similar to Dnase1l3-deficient mice, the plasma cfDNA of DNASE1L3-deficient subjects was hypomethylated compared to both controls and the heterozygous parent (CpG methylation of DNASE1L3-deficient subjects H2: 69.66%, H4: 70.1%, and V11: 69.32%, vs. median of 8 controls: 74.90%, and H1: 73.84%) (
Similarly, the plasma cfDNA of DNASE1L3-deficient patients had a shorter size profile that is more exaggerated in 0% methylated fragments than in 100% methylated fragments (
The shorter size profile corresponds to an increase in normalized end density in hypomethylated open chromatin TSS regions (
The cutting preference of DNASE1L3 was also demonstrated in human cfDNA. Control plasma cfDNA was found to end preferentially at putatively methylated CpGs (
B. Size Profile and Methylation Changes
In this work, we have discovered that different nuclease deficiencies profoundly affect the apparent methylation level and size profile of plasma cfDNA on a genome-wide level. We have found that the plasma cfDNA of Dnase1l3-deficient mice and DNASE1L3-deficient humans is much more hypomethylated than cfDNA from control samples and has a shorter size profile with an increase in short ≤150 bp fragments and a decrease in 166 bp fragments. This is in contrast to the cfDNA of Dnase1-deficient mice, which is slightly more hypermethylated than WT cfDNA and has a slightly longer size profile with a decrease in short ≤150 bp fragments and an increase in 166 bp fragments. Since the methylation levels of the buffy coat genomic DNA are similar among the different genotypes, the differences in plasma cfDNA methylation are likely related to the nuclease activities during the DNA fragmentation process.
In our exploration of the cause of hypomethylation and hypermethylation in the plasma cfDNA of Dnase1l3-deficient and Dnase1-deficient mice, respectively, we found that cfDNA from Dnase1l3-deficient mice had more hypomethylated fragments originating from increased fragmentation of open chromatin regions and CpG islands across the whole genome. The reduction of these fragments in cfDNA of Dnase1-deficient mice revealed the culprit to be DNASE1. The absence of DNASE1 activity in Dnase1-deficient mice allowed us to deduce that DNASE1 increased the fragmentation of these OCRs and CGIs and gave rise to an increased proportion of short fragments, especially ultrashort fragments, in these regions. This understanding of DNASE1 activity is consistent with the established use of DNASE1 to probe DNase I hypersensitive regions in DNase-seq (Boyle et al., 2008).
Bioinformatically masking these OCR and CGI fragments demonstrated that these regions were a major contributor to the relative hypomethylation seen in the plasma cfDNA of Dnase1l3-deficient mice. Furthermore, we found that these OCR and CGI fragments were relatively enriched in plasma cfDNA, generally, and that this enrichment explained the relative hypomethylation of plasma cfDNA compared to its genomic DNA (
The cfDNA size profile actually changes most dramatically in the absence of DNASE1L3. Our analysis with the putatively methylated and unmethylated CpGs shed some light on the reason. We demonstrated that the absence of DNASE1L3 decreased cuts at methylated CpGs. This is supported by existing literature showing that DNASE1L3 can cleave chromatin with high efficiency to almost undetectable levels without proteolytic help (Sisirak et al., 2016; Napirei et al., 2009).
Since the genome is >97% heterochromatin with most of its CpGs methylated, most of the genome is susceptible to DNASE1L3 activity but less so to DNASE1. Thus, it is not surprising that the absence of DNASE1L3 would markedly affect the cfDNA size profile. One of the more noticeable changes of the cfDNA size profile in cfDNA from Dnase1l3-deficient mice is the diminished prominence of the 166 bp peak. We hypothesize that the 166 bp fragment size may be produced by the relatively strong local preference for cutting these methylated Cs by DNASE1L3 in the linker regions of chromatin. It is striking to note that in the absence of DNASE1L3, two new fragment end preferences appear that are exactly 10 bp away from each other. This may also account for the increased prominence of the 10 bp periodicity in the cfDNA of Dnase1l3-deficient mice.
In fact, this preference by DNASE1L3 for creating 166 bp fragments is apparent in cfDNA from Dnase1-deficient mice. In such mice, both 0% and 100% methylated cfDNA were fragmented to a very similar size profile with a very sharp 166 bp peak and exhibited remarkably limited shortening of unmethylated fragments. Thus, in the absence of DNASE1, DNASE1L3 appears to have limited preference to cut unmethylated fragments into smaller fragments. In fact, the end density over the putatively unmethylated CpGs decreased in the cfDNA of Dnase1-deficient mice. These results suggest that DNASE1L3 cuts largely agnostically to DNA methylation status, which would increase the methylated portion of plasma cfDNA since methylated CpGs are more abundant than unmethylated CpGs in the genome (
This work also reveals that different sizes of cfDNA are associated with different methylation levels. cfDNA fragments with sizes that are widely presumed to be associated with mono-, di-, and tri-nucleosomes (around 170 bp, 360 bp, and 550 bp) are relatively hypermethylated, while fragments with intermediary sizes (around 270 bp and 460 bp) are relatively hypomethylated. Masking the OCR and CGI fragments disproportionately affected the hypomethylation of fragments ≤80 bp and of fragments around the troughs for all three genotypes. These fragment sizes have a higher proportion of OCR and CGI fragments and may better reflect the activity of DNASE1. We have thus demonstrated that different genomic regions are not represented evenly in the different sizes of cfDNA.
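The per-size methylation level discussed here (the CpG methylation percentage over all fragments of a given size) could be computed along the following lines. The tuple layout of the input is a hypothetical representation, and the sketch covers a single sample; the median across samples of a genotype would be taken afterwards.

```python
from collections import defaultdict


def methylation_by_size(fragments):
    """fragments: iterable of (size_bp, methylated_cpgs, unmethylated_cpgs).

    Returns {size_bp: CpG methylation percentage pooled over all
    fragments of that size}, skipping sizes with no CpG observations.
    """
    pooled = defaultdict(lambda: [0, 0])  # size -> [methylated, unmethylated]
    for size, meth, unmeth in fragments:
        pooled[size][0] += meth
        pooled[size][1] += unmeth
    return {
        size: 100.0 * m / (m + u)
        for size, (m, u) in sorted(pooled.items())
        if m + u > 0
    }
```

Running the same function on the fragment set before and after removing OCR and CGI fragments gives the masked and unmasked curves that are compared in the text.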
Examining the differences in the methylation level of each cfDNA size between the genotypes reveals that DNASE1L3 plays a role as well. DNASE1L3, which can cut methylated CpGs, appeared to give rise to more 166 bp fragments that are methylated in the cfDNA of Dnase1-deficient mice. Mono-nucleosomally sized fragments in the cfDNA of Dnase1-deficient mice appear to be the most methylated with the methylation level decreasing with each additional nucleosome, suggesting that DNASE1L3 contribution of methylated fragments is highest for mononucleosomes (
In this paper, we have been able to deduce the actions and preferences of DNASE1 and DNASE1L3. We have shown not only that nucleases affect the apparent cfDNA methylation level, but also how each nuclease affects it. We have also demonstrated that the cfDNA size profile, which is quintessentially the end product of the fragmentation process, reflects these differential nuclease activities on methylation. Thus, we have shed some light on these fundamental properties of cfDNA.
These findings have been replicated in human cfDNA with DNASE1L3 deficiency. Homozygous DNASE1L3 deficiency in humans results in familial autosomal recessive forms of childhood systemic lupus erythematosus (SLE) and vasculitis (Al-Mayouf et al., 2011; Ozcakar et al., 2013; Carbonella et al., 2017). The loss of DNA self-tolerance with Dnase1l3 deletion is presumably related to the disrupted clearance of nucleosomes by DNASE1L3 (Sisirak et al., 2016; Napirei et al., 2000). Even in SLE patients who do not have DNASE1L3 deficiency, we have previously found an increased proportion of short, hypomethylated cfDNA similar to the profile seen in DNASE1L3-deficient patients (Chan et al., 2014). This may be related to a functional aberration in nuclease clearance of nucleosomes; more studies would help clarify the relationship between nuclease activity and the pathogenesis of SLE.
These observations have profound implications for the field of cfDNA. The fragmentation process of cfDNA contributes to the apparent methylation of cfDNA. The nuclease activity in a person could affect the overall cfDNA methylation and result in a false-positive test result. Since certain fragment sizes have different methylation levels reflecting different proportions of different genomic regions, it may be advantageous to focus diagnostic testing on certain fragment sizes because of this underlying biology. As cfDNA fragmentomics is an emerging cancer biomarker, a deeper understanding of the effect of nucleases on cfDNA fragmentation is vital. Ultimately, the combination of size-based and nuclease-based analysis is a powerful approach for investigating cfDNA biology and may have diagnostic applications.
C. Using Methylation in Regions to Analyze Samples
Analyzing certain biological samples using only the methylation level may be difficult. For example, methylation levels may not differ significantly between samples of subjects having different conditions. The amount of fragments in open chromatin regions (OCRs) and around certain CpG sites in a biological sample may vary depending on certain conditions of the subject. For example, the amount of fragments in OCRs and around CpG sites may differ depending on a cancer classification of the subject or on whether the fragments in a sample are maternal or fetal. Analyzing CpG sites that are putatively methylated or putatively unmethylated may aid in analyzing biological samples to distinguish between different conditions or different tissue types.
1. Cancer
Measuring the proportion of fragments in OCR (500 bp upstream and downstream of TSS, H3K27ac, and H3K4me3 markers) and CGI regions yields a statistically significant difference between cancer and non-cancer samples. In one embodiment, plasma cfDNA from 8 healthy controls, 17 patients infected with chronic hepatitis B virus (HBV), and 34 patients with HCC were bisulfite sequenced with a median of 38 million paired-end reads (range, 18-65 million).
On the other hand, in
2. Fetal Vs Maternal-Specific Fragments
3. SLE
Autoimmune disease occurs when the body's immune system loses self-tolerance and mistakenly attacks the cells or tissues of the body itself. Systemic lupus erythematosus (SLE), in particular, is characterized by autoantibodies to double-stranded DNA (dsDNA). Levels of anti-DNA autoantibodies are correlated with disease activity, and the deposition of immune complexes formed by DNA and anti-DNA autoantibodies is associated with the development of lupus nephritis (Soni et al. Current Opinion in Immunology, 2018; 55:31-37). Previously, we have observed that the plasma of SLE patients has an increased proportion of short cfDNA, and high resolution analysis of the genomic and epigenetic signatures of plasma DNA has been shown to reflect disease activity in SLE patients (Chan et al. Proc. Natl. Acad. Sci USA 111, E5302-E5311). Plasma cfDNA of patients with SLE may show aberrant genomic representations (copy number changes) that may mimic those of patients with cancer. In the following, we show an example active SLE case with aberrant genomic representation and how analysis using OCR can be useful in distinguishing these aberrant genomic representations from those seen in cancer.
In SLE, especially active SLE, aberrant genomic representation across the genome is observable in cfDNA (
The measured genomic representation (MGR) is shown in
This is in contrast to
4. Coverage of Putatively Methylated or Unmethylated CpGs
The end density at putatively methylated or unmethylated CpGs was previously shown to demonstrate the cutting preference of specific nucleases at putatively methylated or unmethylated CpGs. Putatively methylated or unmethylated CpGs were identified from 9 human tissues that underwent whole genome bisulfite sequencing as part of the Roadmap Epigenomics Project. CpG sites that were methylated in ≥90% of all fragments in all tissues were considered putatively methylated CpGs, and CpGs that were methylated in ≤20% of all fragments in all tissues were considered putatively unmethylated CpGs. Fragments overlapping with either the C or G of the identified CpGs were considered as covering the CpG and included in calculating the coverage.
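The selection thresholds and the coverage rule described above can be sketched as follows. Two points are assumptions made for illustration: the thresholds are applied to each tissue separately (the text could also be read as pooling fragments across tissues), and fragments are half-open intervals on a single chromosome with the CpG occupying the position of the C and the next base.

```python
def classify_cpg(per_tissue_fraction, high=0.90, low=0.20):
    """per_tissue_fraction: methylated fraction of fragments at one CpG,
    one value per tissue. Returns a putative label or None."""
    if all(f >= high for f in per_tissue_fraction):
        return "methylated"
    if all(f <= low for f in per_tissue_fraction):
        return "unmethylated"
    return None


def covers_cpg(frag_start, frag_end, c_pos):
    """True if the half-open fragment [frag_start, frag_end) overlaps the C
    (at c_pos) or the G (at c_pos + 1) of the CpG."""
    return frag_start < c_pos + 2 and c_pos < frag_end


def coverage(fragments, c_pos):
    """Number of (start, end) fragments covering the CpG at c_pos."""
    return sum(covers_cpg(s, e, c_pos) for s, e in fragments)
```

A fragment that ends on the C, or starts on the G, still counts toward coverage under this rule, which matters when the quantity of interest is how often fragments end at or near the CpG.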
D. Example Methods
Methods may include using methylation statuses or levels in different ways to analyze a sample. Methylation levels may be determined from only particular sites. For example, the sites may include CpG sites that are all methylated or unmethylated and may include or exclude certain regions. The relative abundance of sequence reads from certain sites that are all hypomethylated or all hypermethylated can be used to analyze a sample. Relative abundance of sequence reads in a sample can be used to diagnose a subject or to determine the fractional concentration of clinically-relevant DNA in the sample. Additionally, copy number aberrations determined in regions excluding open chromatin regions and copy number aberrations determined using other regions may be used to determine whether a subject has a condition. A condition detected by any method may be treated in the subject by any treatment described herein (e.g., Section III).
1. Methylation Level to Analyze Biological Samples
The methylation levels determined using methylation statuses at certain sites can be used to determine various characteristics of a biological sample or the subject from which the biological sample is obtained. The certain sites used may be CpG sites that are all methylated or all unmethylated. The sites may include or exclude sites in OCRs or CGIs. The methylation levels may be used to detect a genetic disorder for a gene associated with a nuclease, to determine an efficacy of a treatment for a blood disorder, or to monitor nuclease activity.
Detecting a genetic disorder for a gene associated with a nuclease
At block 3902, first sequence reads obtained from sequencing first cell-free DNA fragments in a first biological sample of a subject are received. The sequencing may be performed in various ways, e.g., as described herein. The sequencing may be targeted sequencing as described herein. For example, the biological sample can be enriched for DNA fragments from a particular region. The enriching can include using capture probes that bind to a portion of, or an entire genome, e.g., as defined by a reference genome.
The sequence reads may indicate methylation statuses at sites of the cell-free DNA fragments. For example, the methylation status at sites of the cfDNA fragments can be interrogated using bisulfite conversion, as described herein. Apart from bisulfite conversion, other processes known to those skilled in the art can be used to interrogate the methylation status of DNA molecules, including, but not limited to, enzymes sensitive to the methylation status (e.g., methylation-sensitive restriction enzymes), methylation binding proteins, and single molecule sequencing using a platform sensitive to the methylation status (e.g., nanopore sequencing (Schreiber et al. Proc Natl Acad Sci 2013; 110: 18910-18915) and Pacific Biosciences single molecule real time analysis (Tse et al. Proc Natl Acad Sci USA 2021; 118: e2019768118)).
At block 3904, the methylation statuses of the sequence reads are used to determine a methylation level of the cell-free DNA fragments. The methylation level may be determined using all of the sequence reads or just certain ones that satisfy certain criteria, e.g., location or size. The methylation level may be determined using sequence reads at a plurality of sites. The sites may have specific characteristics, e.g., being CpG sites. The methylation level can be determined from the total number of cytosines not converted after bisulfite treatment (which corresponds to methylated cytosine). For example, the methylation level can be expressed as the proportion of methylated CpG sites among all CpG sites covered by sequence reads mapped to a region of interest (e.g., a 100-kb region). This analysis can also be performed for regions with other bin sizes, e.g., 500 bp, 5 kb, 10 kb, 50 kb, or 1 Mb. A region could be the entire genome, a chromosome, or part of a chromosome (e.g., a chromosomal arm).
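As a rough illustration of the region-based calculation at block 3904, the sketch below bins per-read CpG calls into fixed-size windows and reports the methylated proportion in each. The input layout and function name are hypothetical; fixed, non-overlapping bins are only one of the region definitions contemplated above.

```python
from collections import defaultdict


def methylation_density_by_bin(cpg_calls, bin_size=100_000):
    """cpg_calls: iterable of (chrom, pos, is_methylated), one entry per
    CpG observed on a sequence read.

    Returns {(chrom, bin_index): methylated proportion}, where bin_index
    is pos // bin_size.
    """
    counts = defaultdict(lambda: [0, 0])  # bin -> [methylated, total]
    for chrom, pos, is_methylated in cpg_calls:
        key = (chrom, pos // bin_size)
        counts[key][0] += int(is_methylated)
        counts[key][1] += 1
    return {key: m / n for key, (m, n) in counts.items()}
```

Changing `bin_size` (e.g., to 500, 5,000, or 1,000,000) gives the other window sizes mentioned; a whole-genome level corresponds to ignoring the bin key entirely.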
At block 3906, the methylation level of the cell-free DNA fragments is compared to a reference value to determine a classification of whether the gene exhibits the genetic disorder in the subject. The reference value may comprise or be used to determine a cutoff or a threshold value. The cutoff or threshold may be derived from a reference value that is representative of a particular classification or discriminates between two or more classifications. A subject with methylation levels above or below the cutoff (threshold) value may be classified as carrying a genetic disorder. The cutoff value may be defined by a statistical metric (e.g., significance, P-value, Z-score) relative to a reference value, e.g., so that the methylation level is statistically different. Alternatively, calibration value(s) may be used as the reference value. For example, the methylation level of cfDNA in a calibration sample (whose classification is known) can be used to determine the classification of whether the gene exhibits the genetic disorder in the subject based on the methylation level of the cell-free DNA fragments. The calibration sample may have known methylation levels at certain positions, regions, or the entire genome, as well as having a known classification.
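A Z-score cutoff of the kind mentioned at block 3906 could look like the following sketch. The cutoff of 3 standard deviations, the labels, and the function names are illustrative assumptions rather than values taken from this disclosure; the reference levels would come from a cohort with a known classification (e.g., healthy controls).

```python
from statistics import mean, stdev


def z_score(level, reference_levels):
    """Number of sample standard deviations between level and the reference mean."""
    return (level - mean(reference_levels)) / stdev(reference_levels)


def classify_by_z(level, reference_levels, cutoff=3.0):
    """Label a methylation level relative to a reference cohort.

    cutoff is a hypothetical threshold in standard deviations.
    """
    z = z_score(level, reference_levels)
    if z <= -cutoff:
        return "hypomethylated"
    if z >= cutoff:
        return "hypermethylated"
    return "not different"
```

With only a handful of reference samples the standard deviation is poorly estimated, so in practice the cutoff would be chosen together with the size and composition of the reference cohort.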
The reference value may be determined in various ways, as will be appreciated by the skilled person. For example, a reference value may be determined from a wildtype animal or a healthy human subject. A reference value may be determined from a tissue-specific sample or a portion of a sample obtained from the same subject (e.g., sequence reads obtained from plasma but at a different time or masked (e.g., for OCR or CGI) or a buffy coat portion of a sample), as shown, for example, in
The reference value may comprise a plurality of cutoffs or threshold values. Methylation levels may fall between two cutoffs or threshold values, denoting a subtype of the genetic disorder or a level of progression of the genetic disorder. For example, methylation level for two or more different cohorts of subjects with different known classifications can be determined, and a reference value can be selected as representative of one classification (e.g., a mean) or a value that is between two clusters of the metrics (e.g., chosen to obtain a desired sensitivity and specificity).
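The use of several cutoffs, and the selection of a cutoff for a desired sensitivity and specificity, can be sketched as below. The labels, the cutoff values passed in, and the assumption that affected subjects have lower methylation levels than unaffected subjects are all illustrative choices, not parameters defined in this disclosure.

```python
from bisect import bisect_right


def classify_with_cutoffs(level, cutoffs, labels):
    """cutoffs: ascending threshold values; labels: one label per interval,
    so len(labels) must equal len(cutoffs) + 1."""
    if len(labels) != len(cutoffs) + 1:
        raise ValueError("need exactly one more label than cutoffs")
    return labels[bisect_right(cutoffs, level)]


def sensitivity_specificity(cutoff, affected_levels, unaffected_levels):
    """Evaluate a candidate cutoff, calling a sample positive when its
    level is below the cutoff (assumes affected subjects are hypomethylated)."""
    true_pos = sum(level < cutoff for level in affected_levels)
    true_neg = sum(level >= cutoff for level in unaffected_levels)
    return true_pos / len(affected_levels), true_neg / len(unaffected_levels)
```

Scanning `sensitivity_specificity` over candidate cutoffs between the two cohorts is one simple way to pick a value that sits between the clusters with the desired trade-off.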
A presence of a genetic disorder or an extent of such disorder in the subject, based on comparison of the methylation level of the cell-free DNA fragments to a reference value, may be determined using statistical approaches or machine learning methods, including but not limited to logistic regression, support vector machines (SVM), decision trees, the CART algorithm (Classification and Regression Trees), naïve Bayes classification, clustering algorithms, principal component analysis, singular value decomposition (SVD), t-distributed stochastic neighbor embedding (tSNE), artificial neural networks, and ensemble methods, which construct a set of classifiers and then classify new data points by taking a weighted vote of their predictions.
In some implementations, the cell-free DNA fragments are filtered before determining the methylation level(s) or the classification. For example, only fragments from a certain region (e.g., transcription start sites, RNA polymerase II sites, H3K4me3 marker regions, H3K27ac marker regions, or random regions) may be used to determine the methylation level or consequently the classification of the genetic disorder in the subject.
a) Determining an Efficacy of Treatment for a Blood Disorder
At block 4002, sequence reads obtained from sequencing cell-free DNA fragments in a blood sample of the subject are received. The blood sample is obtained after the subject was administered a first dosage of an anticoagulant. The anticoagulant can be heparin. The sequence reads may indicate methylation statuses at sites of the cell-free DNA fragments. The blood sample can be obtained from the subject prior to receiving the sequence reads. Consequently, sequencing of the cell-free DNA fragments in the blood sample can be performed to obtain the sequence reads.
At block 4004, the methylation statuses of the sequence reads are used to determine a methylation level of the cell-free DNA fragments. The methylation level may be determined using sequence reads at a plurality of sites. The methylation level may be determined for cell-free DNA fragments that have all CpG sites methylated or unmethylated (e.g., as shown in
At block 4006, the methylation level of the cell-free DNA fragments is compared to a reference value to determine a classification of the efficacy of the treatment. A second dosage of the anticoagulant can be administered to the subject based on the comparison, the second dosage being greater than the first dosage. In other examples, the second dosage can be less than the first dosage, e.g., if the amount overshoots the reference value. Treatments may include hemodialysis, a kidney transplant, or any treatment described herein.
In some embodiments, genomic sites that are located in open chromatin regions or in CpG islands are excluded when determining the methylation level (e.g., as shown in
The reference value can correspond to a measurement previously performed in the subject before administering the anticoagulant. The change in the amount from the previous measurement can indicate an efficacy of the dosage of the anticoagulant. In another implementation, the reference value can correspond to the amount measured in a healthy subject. An efficacious dosage can be one that brings the amount to within a threshold of the reference value for the healthy subject. In yet another implementation, the reference value can correspond to the amount measured in a subject that has the blood disorder (e.g., as may be previously measured in the subject before administering the anticoagulant). For example, a reference value may be obtained from a wildtype animal or a healthy human subject. A reference value may be obtained from a tissue-specific sample or a portion of a sample obtained from the same subject (e.g., sequence reads obtained from plasma or buffy coat portion of a sample), as shown, for example, in
b) Monitoring Nuclease Activity
At block 4102, sequence reads of cell-free DNA fragments in the biological sample of the subject may be received. The sequence reads may indicate methylation statuses at sites of the cell-free DNA fragments. Receiving may be similar to block 3902 or 4002.
At block 4104, the methylation statuses of the sequence reads are used to determine a methylation level of the cell-free DNA fragments. The methylation level may be determined using sequence reads at a plurality of sites. The methylation level may be determined for cell-free DNA fragments that have all CpG sites methylated or unmethylated (e.g., as shown in
At block 4106, the methylation level of the cell-free DNA fragments is compared to a reference value to determine a classification of the activity of the nuclease (e.g., as shown in
2. Region-Specific Sequence Read Quantification
The proportion of sequence reads from a certain set of CpG sites can be used to analyze a biological sample. The certain set of CpG sites may be CpG sites that are all hypomethylated or hypermethylated in a reference genome. The relative abundance of sequence reads covering these particular sites may differ for samples from subjects with different levels of a condition and for samples from subjects with different nuclease activities.
a) Differentiating Genotypes and Phenotypes
At block 4202, a first set of CpG sites that are all hypomethylated or all hypermethylated in a reference genome can be identified. The reference genome may be obtained from a healthy individual or a population of healthy individuals. The reference genome may contain regions that are predominantly (or putatively) unmethylated or predominantly methylated. The methylation level in the reference genome may be compared to a first threshold. A methylation level below the first threshold may indicate a hypomethylated site (e.g., in FIG. 15A, the first threshold may be set below the median methylation level for wildtype). In other embodiments, the methylation level in the reference genome may be compared to a second threshold. A methylation level above the second threshold may indicate a hypermethylated site (e.g., in
At block 4204, sequence reads obtained from sequencing cell-free DNA fragments in the biological sample of the subject are received. Block 4204 may be performed in a similar manner as described in block 3902.
At block 4206, the sequence reads are aligned to the reference genome to determine genomic positions in the reference genome corresponding to the cell-free DNA fragments. For example, a sequence read of the entire DNA fragment (or a pair of reads from the ends, or just one read from one end) can be aligned to the reference genome (e.g., hg19 or other reference) using any one of various alignment tools, such as BLAST, BOWTIE, or SOAP. As part of the alignment, a coordinate for at least one end of the DNA fragment can be determined. In this manner, the coverage (number of reads/fragments) can be determined for just end positions or for any position covered by a DNA fragment. Accordingly, a genomic position in the reference genome may correspond to one end of one or more cfDNA fragments or to other parts of one or more cfDNA fragments.
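As a non-limiting illustration of how coverage can be tallied once fragment coordinates are known, the following Python sketch counts fragment ends per genomic position and counts fragments covering a given position. It assumes alignment has already been performed by an external tool and that each fragment is represented as a (chromosome, start, end) tuple in 0-based, half-open coordinates; the names and the representation are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

# (chrom, start, end), 0-based half-open coordinates (assumption)
AlignedFragment = Tuple[str, int, int]


def end_coverage(frags: List[AlignedFragment]) -> Counter:
    """Count how many fragment ends fall on each (chrom, position)."""
    counts: Counter = Counter()
    for chrom, start, end in frags:
        counts[(chrom, start)] += 1    # first base of the fragment
        counts[(chrom, end - 1)] += 1  # last base of the fragment
    return counts


def position_coverage(frags: List[AlignedFragment], chrom: str, pos: int) -> int:
    """Count fragments that cover a position anywhere along their length."""
    return sum(1 for c, s, e in frags if c == chrom and s <= pos < e)
```

The first function corresponds to coverage determined for just end positions; the second corresponds to coverage for any position covered by a DNA fragment.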
At block 4208, a relative abundance of the cell-free DNA fragments covering the first set of CpG sites is determined by using the aligned sequence reads. The relative abundance may be determined in various ways. For example, the relative abundance may comprise a percentage of putatively methylated (or hypermethylated) or unmethylated (or hypomethylated) CpG sites in the reference genome that may be covered by cfDNA fragments (e.g., as shown in
At block 4210, the relative abundance is then compared to a reference value to determine a level of a condition of the subject. A reference value may comprise the level of abundance determined using sequence reads from a biological sample from a healthy individual. The relative abundance of cfDNA fragments that cover positions that map to a CpG site in the first set of CpG sites may be different (e.g., significantly lower, or significantly higher) than the reference value for relative abundance. The observed difference may be used to determine a level or classification of a condition of the subject. The condition may comprise enzyme deficiencies. The condition may comprise a cancer (e.g., as shown in
b) Monitoring Nuclease Activity
At block 4302, a first set of CpG sites that are all hypomethylated or all hypermethylated in a reference genome can be identified. The reference genome may be obtained from a healthy individual or a population of healthy individuals. The reference genome may contain CpG sites that are predominantly unmethylated or predominantly methylated. An individual who is not carrying a genetic disorder or a disease of interest may be considered a healthy individual. Block 4302 may be performed in a similar manner as block 4202.
At block 4304, sequence reads obtained from sequencing cell-free DNA fragments in the biological sample of the subject are received.
At block 4306, the sequence reads are aligned to the reference genome to determine genomic positions in the reference genome corresponding to the cell-free DNA fragments. Block 4306 may be performed in a similar manner as block 4206.
At block 4308, a relative abundance of the cell-free DNA fragments covering the first set of CpG sites is determined by using the aligned sequence reads, as described in block 4208. The relative abundance may be a number of DNA molecules covering the first set of CpG sites normalized against the number of DNA molecules analyzed.
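As a non-limiting illustration of the normalization described above, the following Python sketch computes the number of fragments covering at least one CpG site of the first set, divided by the number of fragments analyzed. The data layout (fragments as (chromosome, start, end) tuples in 0-based, half-open coordinates, and CpG sites as sorted position lists per chromosome) and the function name are assumptions made for illustration.

```python
from bisect import bisect_left
from typing import Dict, Iterable, List, Tuple

AlignedFragment = Tuple[str, int, int]  # (chrom, start, end), half-open (assumption)


def relative_abundance(
    frags: Iterable[AlignedFragment],
    cpg_sites: Dict[str, List[int]],
) -> float:
    """Fraction of analyzed fragments covering at least one site in the set.

    cpg_sites maps a chromosome name to a sorted list of site positions.
    """
    total = 0
    covering = 0
    for chrom, start, end in frags:
        total += 1
        sites = cpg_sites.get(chrom, [])
        i = bisect_left(sites, start)  # index of first site at or after start
        if i < len(sites) and sites[i] < end:
            covering += 1
    if total == 0:
        raise ValueError("no fragments analyzed")
    return covering / total
```

Because the site lists are sorted, a binary search per fragment suffices to decide whether any site falls within the fragment.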
In another example, the relative abundance may be a percentage of cfDNA fragments covering a site (e.g., OCRs or CGIs) as shown, for example, in
The relative abundance may be an end density of sequence reads covering the first set of CpG sites. The biological sample obtained from the subject may contain more or fewer cfDNA fragments that are enzymatically cleaved at CpG sites that are predominantly hypermethylated, hypomethylated, or unmethylated.
The relative abundance of the cell-free DNA fragments may be determined for the cell-free DNA fragments having a specified size (e.g., as shown in
At block 4310, the relative abundance is then compared to a reference value to determine a first classification of an activity of the enzyme. A reference value may comprise the level of abundance determined using sequence reads from a biological sample from a healthy individual. Activity of an enzyme (e.g., a nuclease such as DFFB, DNASE1, or DNASE1L3) can be classified and used to determine a genetic disorder or a change in a genetic disorder for the efficacy of a treatment, as described in block 4104. The level of the condition may be determined based at least in part on a genetic disorder, where the gene is associated with a nuclease (e.g., as shown in
3. Fractional Concentration
At block 4402, a first set of CpG sites that are all hypomethylated or all hypermethylated in a reference genome can be identified. The reference genome may be obtained from a healthy individual or a population of healthy individuals and/or nonpregnant subjects. The reference genome may contain positions or regions that are predominantly unmethylated or predominantly methylated. An individual who is not carrying a genetic disorder or a disease of interest may be considered a healthy individual. The sites may be identified as described with block 4202.
At block 4404, sequence reads obtained from sequencing cell-free DNA fragments in the biological sample of the subject are received.
At block 4406, the sequence reads are aligned to the reference genome to determine genomic positions in the reference genome corresponding to the cell-free DNA fragments. Block 4406 may be performed in a similar manner as block 4206.
At block 4408, a relative abundance of the cell-free DNA fragments covering the first set of CpG sites is determined by using the aligned sequence reads. The relative abundance may comprise a percentage of fragments covering putatively methylated or unmethylated sites (e.g., OCR or CpG (or CGI) sites) (e.g., as shown in
At block 4410, a fractional concentration of the clinically-relevant DNA molecules in the biological sample can be estimated by comparing the relative abundance to one or more calibration values determined from one or more calibration samples whose fractional concentrations of the clinically-relevant DNA molecules are known. As shown in
Calibration data points can include a relative abundance and a measured/known fraction of the clinically-relevant DNA. The comparison can involve comparing to a calibration curve (composed of the calibration data points), and thus the comparison can identify the point on the curve having the measured relative abundance for the test sample. The fractional concentration corresponding to the identified point can then be used as the estimate of the fractional concentration for the test sample. For example, the relative abundance can be provided as an input to the calibration function (e.g., a linear or non-linear fit) to obtain an output of the fractional concentration.
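As a non-limiting illustration of a calibration function, the following Python sketch fits a straight line through calibration data points by ordinary least squares and then maps a measured relative abundance to an estimated fractional concentration. A linear fit is only one of the options mentioned above; the function names are hypothetical, and no particular calibration values are implied.

```python
from typing import List, Tuple


def fit_calibration(points: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Least-squares line through (relative_abundance, known_fraction) points.

    Returns (slope, intercept) of: fraction = slope * abundance + intercept.
    """
    if len(points) < 2:
        raise ValueError("at least two calibration points are needed")
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("calibration abundances must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_fraction(abundance: float, slope: float, intercept: float) -> float:
    """Apply the fitted calibration function to a test-sample abundance."""
    return slope * abundance + intercept
```

In use, the calibration points would come from calibration samples with known fractional concentrations, and the fitted parameters would be applied to the relative abundance measured in the test sample.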
4. Determining a Condition
At block 4502, sequence reads obtained from sequencing cell-free DNA fragments in a blood sample of the subject are received. The biological sample (e.g., whole blood, plasma, etc.), as described herein, can be obtained from the subject and a sequencing of the cell-free DNA fragments in the sample can be performed to obtain sequence reads.
At block 4504, genomic positions in the reference genome corresponding to at least one end of the cell-free DNA fragments are determined using the sequence reads.
At block 4506, a first amount of sequence reads in each segment of a plurality of segments is determined. For example, the reference genome may be divided into segments (or bins) of a specific size. As examples, the segment or bin size may be about 10 kb, 50 kb, 100 kb, 500 kb, 1 Mb or more. The segment size may be between any two sizes mentioned here. The frequency or copy numbers of sequence reads corresponding to a segment may be determined. In other embodiments, the amount can be a statistical value of a size distribution of the DNA fragments for that segment. Other properties can also be used, e.g., a methylation level for the region.
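As a non-limiting illustration of dividing the reference genome into fixed-size bins, the following Python sketch counts reads per bin from their aligned positions. The input representation (a (chromosome, position) pair per read), the default bin size of 100 kb, and the function name are assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, Tuple


def count_reads_per_bin(
    read_positions: Iterable[Tuple[str, int]],
    bin_size: int = 100_000,  # 100 kb, one of the example sizes
) -> Counter:
    """Count reads per (chrom, bin_index), where bin_index = position // bin_size."""
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    counts: Counter = Counter()
    for chrom, pos in read_positions:
        counts[(chrom, pos // bin_size)] += 1
    return counts
```

Each key identifies one segment; the per-segment counts can then be compared with reference values for the same segments.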
At block 4508, the first amount is compared to a first reference value to determine whether the segment has a copy number aberration. The reference value may comprise an amount of sequence reads obtained from a healthy individual that would correspond to each segment. In a segment, a difference between the amount of sequence reads from a sample obtained from the subject and the reference value may denote a copy number aberration. For example, measured genomic representation (e.g., as shown in
At block 4510, a first number of segments that have copy number aberrations is determined. For a plurality of segments, the processes mentioned in blocks 4506 and 4508 can be repeated to determine a first number of segments that have copy number aberrations.
At block 4512, a second amount of sequence reads is determined in each masked segment of a plurality of segments that are masked to exclude open chromatin regions. The masking may comprise in silico masking (e.g., using bioinformatic methods to exclude a segment). Certain regions can be masked, e.g., as shown in
At block 4514, the second amount is compared to a second reference value to determine whether the masked segment has a copy number aberration. The second reference value may comprise an amount of sequence reads obtained from a healthy individual that are in the masked segment. A difference in the amount of reads for a masked segment obtained from the subject and the reference value (for the same segment) can be used to determine a copy number aberration.
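As a non-limiting illustration of comparing a per-segment amount to a reference, the following Python sketch computes a z-score against amounts for the same segment in reference (e.g., healthy) samples and flags a gain or loss when the z-score exceeds a cutoff. The use of a z-score, the default cutoff of 3, and the function name are assumptions made for illustration; the comparison is not limited to this statistic.

```python
from statistics import mean, stdev
from typing import Sequence


def classify_segment(
    amount: float,
    reference_amounts: Sequence[float],
    z_cutoff: float = 3.0,  # assumed cutoff, for illustration only
) -> str:
    """Return "gain", "loss", or "none" for one segment (masked or unmasked)."""
    mu = mean(reference_amounts)
    sd = stdev(reference_amounts)  # needs at least two reference values
    if sd == 0:
        raise ValueError("reference amounts have zero variance")
    z = (amount - mu) / sd
    if z > z_cutoff:
        return "gain"
    if z < -z_cutoff:
        return "loss"
    return "none"
```

The same function can be applied to the first amounts (before masking) and the second amounts (after masking), each with its own reference values.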
At block 4516, steps described in blocks 4512 and 4514 can be repeated for one or more masked segments to determine a second number of masked segments that have copy number aberrations. Accordingly, a measured genomic representation before and after masking, described herein (e.g., as shown in
At block 4518, a condition of a subject is determined based at least on the first number and the second number. The first number and the second number can be used in a variety of ways. For example, a difference between the two numbers and/or an analysis of the individual numbers can be performed. For example, an initial classification can be made that a condition exists using the first number (e.g., that an auto-immune disease or a cancer exists), e.g., using a cutoff of a few percent of bins (e.g., 3% or 5%). Then, the second number can be used to determine the specific type of condition, e.g., whether it is an auto-immune disease or a cancer; for instance, a cutoff of about 25% can distinguish between SLE and cancer. Thus, the condition may be an auto-immune disease, e.g., SLE. For example, a percentage of bins with aberrant MGRs determined before masking (as described at blocks 4504-4510) and after masking (as described at blocks 4512-4516) can be used to determine a condition (e.g., SLE or HCC) in a subject.
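As a non-limiting illustration of combining the first number and the second number, the following Python sketch converts per-bin aberration flags into percentages and applies two cutoffs. The default cutoffs of 5% and 25% are taken from the examples above; the function names and the returned fields are hypothetical. The sketch deliberately does not encode which side of the second cutoff corresponds to which condition, since that mapping depends on the data described elsewhere herein.

```python
from typing import Dict, Sequence, Union


def percent_aberrant(flags: Sequence[bool]) -> float:
    """Percentage of bins flagged as having a copy number aberration."""
    if not flags:
        raise ValueError("no bins provided")
    return 100.0 * sum(flags) / len(flags)


def summarize_aberrations(
    flags_before_masking: Sequence[bool],
    flags_after_masking: Sequence[bool],
    first_cutoff: float = 5.0,    # example cutoff for the first number
    second_cutoff: float = 25.0,  # example cutoff for the second number
) -> Dict[str, Union[float, bool]]:
    """Report both percentages and how each compares with its cutoff."""
    first_pct = percent_aberrant(flags_before_masking)
    second_pct = percent_aberrant(flags_after_masking)
    return {
        "first_pct": first_pct,
        "second_pct": second_pct,
        "condition_suspected": first_pct > first_cutoff,
        "second_above_cutoff": second_pct > second_cutoff,
    }
```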
Cell-free DNA (cfDNA) molecules are present in plasma either in linear or circular forms (T. Paulsen, P. Kumar, M. M. Koseoglu, A. Dutta, Discoveries of Extrachromosomal Circles of DNA in Normal and Tumor Cells. Trends Genet. 34, 270-278 (2018); Y. M. D. Lo, D. S. C. Han, P. Jiang, R. W. K. Chiu, Epigenetics, fragmentomics, and topology of cell-free DNA in liquid biopsies. Science 372, eaaw3616 (2021)). A subset of mitochondrial genomes of certain tissues-of-origin was present in circular form in plasma (M.-J. L. Ma, et al., Topologic Analysis of Plasma Mitochondrial DNA Reveals the Coexistence of Both Linear and Circular Molecules. Clin. Chem., clinchem.2019.308122 (2019)). Additionally, cell-free extrachromosomal circular DNA (eccDNA) was detectable in plasma of pregnant women, normal subjects and patients with cancer, albeit at lower abundance than its linear counterparts (S. T. K. Sin, et al., Identification and characterization of extrachromosomal circular DNA in maternal plasma. Proc. Natl. Acad. Sci. U.S.A 117, 1658-1665 (2020); J. Zhu, et al., Molecular characterization of cell-free eccDNAs in human plasma. Sci. Rep. 7, 10968 (2017); P. Kumar, et al., Normal and Cancerous Tissues Release Extrachromosomal Circular DNA (eccDNA) into the Circulation. Mol. Cancer Res. 15, 1197-1205 (2017)).
In contrast to the linear cfDNA with one predominant peak at 166 bp, size profiles of eccDNA in plasma exhibited two major peak clusters with summits at around 202 bp and 338 bp and sharp 10-bp periodicities within both clusters, reflecting possible involvement of nucleosomal structures (Sin et al., PNAS (2020)). Fetal-specific eccDNA molecules were detectable in the maternal plasma of pregnant women, and were shorter and less methylated than the maternal ones (Sin et al., PNAS (2020); S. T. K. Sin, et al., Characteristics of Fetal Extrachromosomal Circular DNA in Maternal Plasma: Methylation Status and Clearance. Clin. Chem. 67 (2021)). Therefore, the biological properties of eccDNA molecules might be dependent on their tissues-of-origin.
The fragmentation of linear cfDNA is a non-random process. Multiple lines of evidence suggested that such fragmentation patterns could be linked to the activity of various nucleases (D. S. C. Han, Y. M. D. Lo, The Nexus of cfDNA and Nuclease Biology. Trends Genet. 0 (2021)). For instance, it has been reported that deoxyribonuclease 1 like 3 (DNASE1L3) contributes to cell-free DNA fragmentation and preferentially generates fragments with CC-ends both in mouse and in human (Serpas et al., PNAS (2019), R. W. Y. Chan, et al., Plasma DNA Profile Associated with DNASE1L3 Gene Mutations: Clinical Observations, Relationships to Nuclease Substrate Preference, and In Vivo Correction. Am. J. Hum. Genet., 1-13 (2020)). Han et al. systematically studied the effects of deoxyribonuclease 1 (DNASE1), DNASE1L3, and DNA fragmentation factor subunit beta (DFFB) on cell-free DNA fragmentation and found that these enzymes act on DNA degradation during cell apoptosis in a stepwise manner (D. S. C. Han, et al., The Biology of Cell-free DNA Fragmentation and the Roles of DNASE1, DNASE1L3, and DFFB. Am. J. Hum. Genet. 106, 202-214 (2020)). In addition, fragmented cfDNA was undetectable in mice with double deletion of Dnase1l3 and Dffb (T. Watanabe, S. Takada, R. Mizuta, Cell-free DNA in blood circulation is generated by DNase1L3 and caspase-activated DNase. Biochem. Biophys. Res. Commun. 516, 790-795 (2019)). It is therefore important to find out if certain nucleases might also play roles in the generation and/or degradation of eccDNA in plasma.
Herein, knockout mouse models were used to explore whether nucleases such as DNASE1L3 and DNASE1 would affect the biological properties of plasma eccDNA. By comparing the extents of eccDNA size shifts between plasma and tissue eccDNA in mice deficient in either nuclease, the ability of these nucleases to act on eccDNA intracellularly or extracellularly was analyzed. Furthermore, by applying a mouse pregnancy model, the effects of extracellular DNASE1L3 on cell-free eccDNA were examined. Further evidence of nuclease effects on eccDNA in humans was provided by comparing the cell-free eccDNA profiles between DNASE1L3-mutated patients and healthy controls.
Overall, Dnase1l3 deletion can lengthen eccDNA in plasma. EccDNA size profiles of mouse tissues were seemingly not affected by Dnase1l3 deletion, suggesting that the size alterations of cell-free eccDNA by DNASE1L3 can be related to the degradation instead of the generation of eccDNA. Such mechanistic insight was further highlighted by data from a mouse pregnancy model showing that the extracellular DNASE1L3 released by the fetuses can digest maternal cell-free eccDNA. Notably, human subjects with DNASE1L3 deficiency exhibited longer plasma eccDNA size distributions than healthy controls, which was consistent with the effects of Dnase1l3 deficiency in mice. The methods provided herein can use cell-free eccDNA as biomarkers for DNASE1L3 deficiency-related diseases, such as systemic lupus erythematosus and certain types of cancer. The experimental design, materials and methods, and results are described in more detail herein.
A. Experimental Design and Results
Knockout mouse models were employed to investigate the effects of deoxyribonuclease 1 (DNASE1) and deoxyribonuclease 1 like 3 (DNASE1L3) on the characteristics of plasma extrachromosomal circular DNA (eccDNA). The plasma eccDNA counts were found to be elevated in Dnase1l3−/− mice when compared to wild-type mice, with no significant change observed in Dnase1−/− mice. The cell-free eccDNA in Dnase1l3−/− mice exhibited larger size distributions than those of wild-type mice. Notably, such size alterations were not found in tissue eccDNA of either Dnase1−/− or Dnase1l3−/− mice. These data suggest that DNASE1L3 may digest cell-free eccDNA extracellularly. Intracellular eccDNA may be digested at a lower rate than cell-free eccDNA. This may be partially due to accessibility of the extracellular eccDNA compared to the intracellular eccDNA. Profiling plasma eccDNA in a mouse pregnancy model showed that in Dnase1l3−/− mice pregnant with Dnase1l3+/− fetuses, the eccDNA in the maternal plasma was shortened compared with that of Dnase1l3−/− mice carrying Dnase1l3−/− fetuses. Therefore, DNASE1L3 released from the Dnase1l3+/− fetuses into the maternal blood circulation can have systemic activity. This pregnancy model highlighted that circulating DNASE1L3 could degrade the maternal eccDNA molecules in a cell-extrinsic manner. Furthermore, plasma eccDNA in human subjects with DNASE1L3 loss-of-function mutations also exhibited longer size distributions compared to that of control subjects (e.g., healthy individuals).
1. Study Design
2. Amounts of eccDNA in Mouse Plasma
The frequency or amount of eccDNA may indicate whether a gene exhibits a genetic disorder. Plasma eccDNA libraries, prepared using the tagmentation-based eccDNA library preparation protocol as previously described (4), were sequenced to a median of 17,463,304 paired-end reads (range: 11,845,852-27,836,098), and a median of 15,337 eccDNA loci (range: 3,309-94,248) was identified.
3. EccDNA Size Distributions in Mouse Plasma
Size distributions of eccDNA can distinguish between organisms with and without certain nuclease deficiencies. Plasma eccDNA molecules were pooled into three groups according to the mouse genotypes. Their size frequencies (Y-axis) are plotted in
In summary, these data indicated that the plasma eccDNA molecules of Dnase1l3−/− mice were longer than those of wild-type and Dnase1−/− mice. These figures show that the size distributions of plasma eccDNA molecules can be used to distinguish mice having certain nuclease deficiencies from mice not having the nuclease deficiencies.
4. EccDNA Size Distributions in Mouse Tissues
To explore whether the size differences of plasma eccDNA among different genotypes of mice described above occurred intracellularly or extracellularly, the eccDNA extracted from the liver and buffy coat collected from wild-type, Dnase1−/−, and Dnase1l3−/− mice was profiled. Two approaches for tissue eccDNA identification were applied in parallel: the tagmentation-based approach and the rolling circle amplification (RCA)-based approach. Tissue eccDNA molecules were pooled for size profiling according to the mouse genotypes and tissue types. EccDNA size differences were observed among different genotypes in plasma but not in tissue.
For both tagmentation and RCA approaches, eccDNA identified in each tissue type was pooled for each genotype of mice and size profiled. The eccDNA molecules originating from these tissues all displayed bimodal size distributions with the two summits at around 200 bp and 350 bp. Of note, the two peak clusters of the liver eccDNA were sharper than those of the buffy coat eccDNA. The 10-bp periodic oscillations were also apparent in the liver eccDNA (reminiscent of plasma eccDNA patterns) but relatively obscure in the buffy coat eccDNA. Such variations possibly hinted that the characteristics of eccDNA might depend on their tissues of origin. No obvious difference in eccDNA size distributions could be observed among wild-type, Dnase1−/− and Dnase1l3−/− mice for either the liver or buffy coat.
The results showing that the eccDNA size differences among genotypes were observed in plasma but not in tissue suggested that the effects of DNASE1L3 on intracellular eccDNA may be insignificant. In contrast, this enzyme may be able to act on eccDNA after these molecules were released into the blood circulation.
5. Dnase1l3−/− Mouse Pregnancy Model
To test the hypothesis that the size differences of eccDNA observed in plasma between wild-type and Dnase1l3−/− mice were due to extracellular DNASE1L3 effects, the Dnase1l3−/− mouse pregnancy model was employed. In this model, female mice of the C57BL/6 (B6) strain with or without Dnase1l3 deficiency were crossed with wild-type mice from the BALB/c genomic background. As such, three mating groups were generated: (1) wild-type females pregnant with wild-type fetuses; (2) Dnase1l3−/− females pregnant with Dnase1l3−/− fetuses; (3) Dnase1l3−/− females pregnant with Dnase1l3+/− (heterozygous) fetuses. The genomic differences between the B6 and BALB/c strains may also be used to distinguish fetal-specific molecules from those shared by the mother and the fetuses (i.e., shared molecules) (see details in Materials and Methods). The results show that DNASE1L3 released by the fetus can digest the eccDNA in maternal plasma.
6. Human Subjects with DNASE1L3 Deficiency
The effects of DNASE1L3 deficiency on plasma eccDNA were further investigated in patients with DNASE1L3 loss-of-function mutations (i.e., DNASE1L3−/−). Detailed sample information of these subjects is described in the Materials and Methods.
B. Size Profile Changes
Biological properties of eccDNA could be affected by the activity of nucleases. By utilizing knockout mouse models of Dnase1 and Dnase1l3, it is shown herein that the deficiency of Dnase1l3 would significantly lengthen the plasma eccDNA in mice. However, such effects were not observed in mice with Dnase1 deficiency. DNASE1L3 may be one of the main contributors affecting the size characteristics of cell-free eccDNA.
Intriguingly, the plasma eccDNA size distributions in wild-type mice (
Notably, at the intracellular level, neither Dnase1−/− mice nor Dnase1l3−/− mice showed observable change in eccDNA size profiles (
The extracellular function of DNASE1L3 on cell-free eccDNA was further evidenced by our findings from the Dnase1l3 pregnancy mouse model. It has previously been established that in Dnase1l3-deficient mice pregnant with Dnase1l3+/− fetuses, DNASE1L3 released from the fetuses could degrade linear cfDNA molecules in a systemic manner (Serpas et al., PNAS (2019)). Similarly, a partial restoration of eccDNA size profiles towards the wild-type patterns in the maternal plasma under the same pregnancy setting was observed here. This finding suggested that the extracellular DNASE1L3 produced by the fetuses may act on the eccDNA in the maternal blood circulation, mediating the degradation of maternal cell-free eccDNA. As to the local effects of DNASE1L3 (Serpas et al., PNAS (2019)), a shortening of eccDNA derived from the Dnase1l3+/− fetuses when compared to their Dnase1l3−/− mothers was not observed. We speculated that there might be certain features of cell-free eccDNA that remain to be unveiled, such as whether those eccDNA molecules would be associated with extracellular vesicles or histone proteins.
A biological link between nuclease activity and the properties of cell-free eccDNA is established using mouse and human models with DNASE1L3 deficiency. Since aberrant expression of DNASE1L3 has been reported in multiple disorders such as systemic lupus erythematosus (R. W. Y. Chan, et al., Plasma DNA Profile Associated with DNASE1L3 Gene Mutations: Clinical Observations, Relationships to Nuclease Substrate Preference, and In Vivo Correction. Am. J. Hum. Genet., 1-13 (2020); J. Hartl, et al., Autoantibody-mediated impairment of DNASE1L3 activity in sporadic systemic lupus erythematosus. J. Exp. Med. 218 (2021)) and certain types of cancer (D. Shiokawa, S. Tanuma, Characterization of human DNase I family endonucleases and activation of DNase gamma during apoptosis. Biochemistry 40, 143-152 (2001); M. Napirei, S. Wulf, D. Eulitz, H. G. Mannherz, T. Kloeckl, Comparative characterization of rat deoxyribonuclease 1 (Dnase1) and murine deoxyribonuclease 1-like 3 (Dnase1l3). Biochem. J. 389, 355-64 (2005)), size pattern analyses of cell-free eccDNA can be used for biomarker development for these diseases.
C. Example Methods
The size profile of eccDNA can be used to determine various characteristics of a biological sample or the subject from which the biological sample is obtained. The amounts of certain sizes of eccDNA may be used to compare different size profiles. The raw amounts of certain sizes or the ratios of different sizes can be used. Genetic disorders, including disorders related to under- or over-production of enzymes, may be detected. The activity of nucleases may be monitored. The condition of a subject can be determined based on nuclease activity. The amounts of eccDNA can be used to determine a genetic disorder.
1. Determining a Genetic Disorder Associated with a Nuclease Using eccDNA Size Distribution
At block 5502, sequence reads obtained from sequencing cell-free DNA fragments from eccDNA in the biological sample of the subject are received. The sequencing may be performed in various ways, e.g., as described herein. The sequencing may be targeted sequencing as described herein. For example, the biological sample can be enriched for eccDNA first and/or also enriched for fragments from a particular region or regions within eccDNA. Different methods for tissue eccDNA identification and enrichment are shown in
At block 5504, the sequence reads are used to determine a size value of a size distribution of the cell-free DNA fragments. A size value may characterize a size distribution. A size value may characterize an amount of cell-free DNA fragments at one or more sizes. In some embodiments, the size distribution is of the cell-free eccDNA.
In some embodiments, the size value is a ratio of a first amount of the cell-free DNA fragments having a first size relative to a second amount of the cell-free DNA fragments having a second size (e.g.,
In some embodiments, the size value is based on an amount of cell-free DNA fragments in any of the sizes described herein rather than a ratio of amounts of different sizes. For example, the size value may be the amount of cell-free DNA fragments having sizes of 300 bp to 400 bp. The size value may be a frequency (e.g., percentage) or a count of DNA fragments. The frequency may be an area under curve (AUC) of frequency for a size range. For example, the AUC may be the AUC under one of the two peaks in
At block 5506, the size value from the sample (e.g., from a human subject, or another mammal) is then compared to a reference size value obtained from one or more reference samples. The samples may be obtained from subjects pregnant with a fetus.
The reference samples may comprise a sample obtained from a subject pregnant with a fetus. The reference samples may comprise a sample obtained from a healthy subject, for example a subject that does not have a nuclease activity deficiency, or any genetic disorder for a gene associated with a nuclease. The healthy subject may have normal nuclease activity. The reference samples may comprise a sample from a subject that has a nuclease activity deficiency, or a genetic disorder for a gene associated with a nuclease. The reference sample may be obtained from a tissue or blood (e.g., plasma or serum) or any biological sample described herein. The reference sample may be from subjects at a same or similar gestational age (e.g., same trimester or a gestational age within 1, 2, 3, or 4 weeks of the subject).
The reference value may be determined in the same manner as the size value. The difference between a size value of a sample and a reference value of the reference sample may be used to classify whether a gene exhibits the genetic disorder. The reference value may be a cutoff value that determines a statistically significant difference from the reference samples. For example, the reference value may be one, two, or three standard deviations away from an average of reference subjects with or without the genetic disorder. In some cases, the reference sample is obtained from the subject prior to or after a treatment, where the treatment affects the activity of a nuclease. In some embodiments, the treatment is hemodialysis.
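As a non-limiting illustration of a size value and a cutoff-type reference value, the following Python sketch computes a ratio of the number of eccDNA molecules in one size range to the number in another, and derives a cutoff a chosen number of standard deviations above the mean of reference subjects. The 300-400 bp range is taken from the example above; the second range (150-250 bp), the choice of an upper cutoff, and the function names are assumptions made for illustration.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def size_ratio(
    sizes: Sequence[int],
    first_range: Tuple[int, int] = (300, 400),   # example range from the text
    second_range: Tuple[int, int] = (150, 250),  # assumed range, illustration only
) -> float:
    """Ratio of molecule counts in two inclusive size ranges (in bp)."""
    first = sum(1 for s in sizes if first_range[0] <= s <= first_range[1])
    second = sum(1 for s in sizes if second_range[0] <= s <= second_range[1])
    if second == 0:
        raise ValueError("no molecules in the second size range")
    return first / second


def cutoff_from_references(reference_values: Sequence[float], k: float = 2.0) -> float:
    """Cutoff set k standard deviations above the mean of reference subjects."""
    return mean(reference_values) + k * stdev(reference_values)


def exceeds_cutoff(value: float, reference_values: Sequence[float], k: float = 2.0) -> bool:
    """True if the sample's size value lies above the reference cutoff."""
    return value > cutoff_from_references(reference_values, k)
```

With these particular ranges, a larger ratio reflects a greater share of longer eccDNA molecules, which is the direction of change described herein for DNASE1L3 deficiency; whether a given ratio indicates a disorder depends on the reference samples used.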
In some embodiments, the comparison to the reference can involve a machine learning model, e.g., trained using supervised learning. The size values (and potentially other criteria, such as copy number and methylation levels) and the known conditions of training subjects from whom training samples were obtained can form a training data set. The parameters of the machine learning model can be optimized based on the training set to provide an optimized accuracy in classifying the level of the condition. Example machine learning models include neural networks, decision trees, clustering, and support vector machines. Comparisons may be carried out as described in block 3906 of
At block 5506, a classification of whether the gene associated with a nuclease exhibits a genetic disorder is determined based on the comparison of the size values. In some embodiments, the subject is pregnant with a fetus. The sample may contain cell-free eccDNA from the subject and the fetus. The size value comparison may then be used to determine a classification of whether the fetus has a nuclease activity deficiency, or a genetic disorder for a gene associated with a nuclease. In some cases, the same sample may be used to determine a classification of whether the pregnant subject has a nuclease activity deficiency, or a genetic disorder for a gene associated with a nuclease based on the comparison. The genetic disorder may be a disorder of the DNASE1L3 gene. Genetic disorders may include disorders of one or more of the following genes: DNASE1, DFFB, TREX1 (Three Prime Repair Exonuclease 1), AEN (Apoptosis Enhancing Nuclease), EXO1 (Exonuclease 1), DNASE2 (Deoxyribonuclease 2), ENDOG (Endonuclease G), APEX1 (Apurinic/Apyrimidinic Endodeoxyribonuclease 1), FEN1 (Flap Structure-Specific Endonuclease 1), DNASE1L1 (Deoxyribonuclease 1 Like 1), DNASE1L2 (Deoxyribonuclease 1 Like 2), and EXOG (Exo/Endonuclease G).
In some cases, the sample may be used to detect if a maternal allele, a paternal allele (e.g., a fetal-specific allele), or both alleles of a gene, associated with a nuclease, exhibit a genetic disorder (e.g.,
2. Determining an Efficacy of Treatment for a Blood Disorder
At block 5710, sequence reads obtained from sequencing cell-free DNA fragments in a blood sample of the subject are received. The blood sample of the subject may be obtained after the subject has undergone a treatment (e.g., a first dosage of a treatment). Treatments may include anticoagulants, hemodialysis, a kidney transplant, or any treatment described herein. The sequence reads may be obtained similar to the manner described with block 4002 of
At block 5720, a size value of a size distribution of the cell-free DNA fragments is determined. The size value characterizes an amount of cell-free DNA fragments at one or more sizes. Block 5720 may be performed in a similar manner as block 5504.
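As a minimal sketch of how such a size value might be computed, the following assumes fragment sizes are already available as a list of integers in bp. The function names and the example size ranges are illustrative assumptions, not part of the disclosure.

```python
def size_fraction(sizes_bp, low_bp, high_bp):
    """Fraction of fragments whose size lies in [low_bp, high_bp] (inclusive)."""
    if not sizes_bp:
        raise ValueError("no fragment sizes supplied")
    in_range = sum(1 for s in sizes_bp if low_bp <= s <= high_bp)
    return in_range / len(sizes_bp)


def size_ratio(sizes_bp, long_range, short_range):
    """Ratio of fragment counts in a longer size range to a shorter size range."""
    n_long = sum(1 for s in sizes_bp if long_range[0] <= s <= long_range[1])
    n_short = sum(1 for s in sizes_bp if short_range[0] <= s <= short_range[1])
    if n_short == 0:
        raise ValueError("no fragments in the shorter size range")
    return n_long / n_short


# Hypothetical fragment sizes in bp.
sizes = [180, 200, 320, 350, 380, 410]
print(size_fraction(sizes, 300, 400))             # frequency for 300 bp to 400 bp
print(size_ratio(sizes, (300, 400), (150, 250)))  # longer-to-shorter ratio
```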
At block 5730, the size value is compared to a reference value to determine a classification of the efficacy of the treatment. A second dosage of the anticoagulant can be administered to the subject based on the comparison, the second dosage being greater than the first dosage. In other examples, the second dosage can be less than the first dosage, e.g., if the amount overshoots the reference value. Treatments may be continued, increased, or discontinued based on the comparison.
The reference value can correspond to a measurement previously performed in the subject before administering the treatment. The change in the amount from the previous measurement can indicate an efficacy of the treatment. In another implementation, the reference value can correspond to the amount measured in a healthy subject. An efficacious treatment can be one that brings the amount to within a threshold of the reference value for the healthy subject. In yet another implementation, the reference value can correspond to the amount measured in a subject that has the blood disorder (e.g., as may be previously measured in the subject before administering the treatment). For example, a reference value may be obtained from a wildtype animal or a healthy human subject. A reference value may be obtained from a tissue-specific sample or a portion of a sample obtained from the same subject (e.g., sequence reads obtained from plasma or buffy coat portion of a sample), as shown, for example, in
3. Monitoring Activity of a Nuclease Using eccDNA
At block 5802, similar to block 5502, sequence reads obtained from sequencing cell-free DNA fragments from eccDNA in the biological sample of the subject are received.
At block 5804, similar to block 5504, the sequence reads are used to determine a size value of a size distribution of the cell-free DNA fragments.
At block 5806, similar to block 5506, the size value from the sample (e.g., from a human subject, or another mammal) is compared to a reference size value obtained from one or more reference samples. A classification of the activity of the nuclease may then be determined based on the comparison. The nuclease may be DNASE1L3, DNASE1, DFFB, TREX1 (Three Prime Repair Exonuclease 1), AEN (Apoptosis Enhancing Nuclease), EXO1 (Exonuclease 1), DNASE2 (Deoxyribonuclease 2), ENDOG (Endonuclease G), APEX1 (Apurinic/Apyrimidinic Endodeoxyribonuclease 1), FEN1 (Flap Structure-Specific Endonuclease 1), DNASE1L1 (Deoxyribonuclease 1 Like 1), or DNASE1L2 (Deoxyribonuclease 1 Like 2).
In some cases, the pregnant subject, the fetus, or both have a nuclease activity deficiency, or a genetic disorder for a gene associated with a nuclease. In some other cases, only one of the pregnant subject or the fetus has a nuclease activity deficiency, or a genetic disorder for a gene associated with a nuclease. The gene may be any member of the DNase I family (e.g., in humans). In some embodiments, the gene is DNASE1 or DNASE1L3. A loss of both alleles (i.e., homozygosity for a null allele) or one of the two alleles (i.e., heterozygosity) of the gene may be associated with a disease. For example, homozygosity for a null allele in DNASE1L3 may be associated with a condition such as systemic lupus erythematosus. In another example, heterozygosity for a null allele in DNASE1L3 may be associated with a condition such as rheumatoid arthritis. The size value comparison may be used to determine a nuclease activity deficiency in the subject (i.e., the sample obtained from the subject).
Genotype information of the fetus may be obtained from comparing the size value to the reference value without genotyping the mother. However, in some embodiments, the mother may be genotyped. The mother may be homozygous (e.g., loss of both alleles or wild type for both alleles) or heterozygous. A plurality of reference values may be obtained for groups where (1) the mother is homozygous (wild type) and does not have a deficiency of a gene and the fetus is wild type (e.g.,
The classification may be that the nuclease activity is deficient. The size value may indicate longer cell-free DNA fragments than the reference value. For example, the size value may be a ratio of a cluster of longer sizes to shorter sizes, and the ratio may be larger than the reference value.
In some cases, the nuclease activity deficiency is a hallmark of a condition such as a cancer. As such, in some embodiments, the size value comparison described herein is used to classify a subject (i.e., a sample obtained from the subject) as having the condition (e.g., a cancer). The classification may be that the subject has the condition (e.g., cancer) characterized by the deficient nuclease activity. The reference value may be determined from one or more reference subjects having the condition or from one or more reference subjects without the condition.
4. Determining a Genetic Disorder Using eccDNA Amount
At block 5902, similar to blocks 5502 or 5802, sequence reads obtained from sequencing cell-free DNA fragments from eccDNA in the biological sample of the subject are received.
At block 5904, the sequence reads are used to determine a value of a parameter corresponding to an amount of eccDNA in the biological sample. The parameter corresponding to the amount of eccDNA in the biological sample may be, for example, a ratio of the amount of eccDNA to a total amount of mappable sequence reads from the biological sample. For example, the parameter may be the eccDNA per million mappable reads (EPM) described in
At block 5906, the value of the parameter corresponding to an amount of eccDNA in a sample may be compared to a reference value of the parameter of eccDNA in a reference subject to determine a classification of whether the gene exhibits the genetic disorder in the subject.
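A minimal sketch of the eccDNA-per-million-mappable-reads (EPM) normalization and the comparison to a reference value is given below. The function names and the `direction` argument are illustrative assumptions; the direction in which the parameter departs from the reference depends on the disorder and is not fixed here.

```python
def eccdna_per_million(ecc_read_count, mappable_read_count):
    """eccDNA reads per million mappable reads (EPM)."""
    if mappable_read_count <= 0:
        raise ValueError("mappable_read_count must be positive")
    return ecc_read_count * 1_000_000 / mappable_read_count


def exceeds_reference(value, reference_value, direction):
    """Compare a parameter value to a reference value.

    direction is 'greater' or 'less' (a hypothetical argument selecting
    which side of the reference value indicates the disorder)."""
    if direction == "greater":
        return value > reference_value
    if direction == "less":
        return value < reference_value
    raise ValueError("direction must be 'greater' or 'less'")


# Hypothetical counts: 500 eccDNA reads among 10 million mappable reads.
epm = eccdna_per_million(500, 10_000_000)
print(epm, exceeds_reference(epm, 30.0, "greater"))
```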
The reference subject may be any of the reference subjects described herein (e.g., a subject with the gene associated with a nuclease that does not exhibit a genetic disorder, a subject that does not have a nuclease activity deficiency). The biological sample may be processed to enrich cell-free DNA fragments from eccDNA. The processing may include physical treatment (e.g., filtering, centrifugation, etc.), chemical treatment (e.g., enzymatic digestion), or a combination thereof. In some embodiments, the sample is treated with a nuclease to remove linear DNA before the sequencing of cell-free DNA fragments from eccDNA. The genetic disorder may be any disorder described herein, including a disorder related to DNASE1L3. The sample may be further processed by treating the sample with a nuclease followed by sequencing the cell-free DNA fragments to produce the sequence reads. In some cases, the nuclease is exonuclease V.
5. Quantifying eccDNA in a Sample
The absolute quantity of eccDNA in a sample can be determined by spiking in known quantities of circular DNA. The amount of sequence reads corresponding to the known quantities of spike-in molecules can then be used to determine the quantity of eccDNA in the sample. A calibration curve relating known quantities of spike-in molecules to amounts of sequence reads may be used to determine the quantity of eccDNA.
Cell-free DNA may be extracted to form the biological sample. The extraction may be similar to step 5601 of
The mixture of DNA from the biological sample and the spiked-in circular DNA may then be treated with exonuclease V for linear DNA digestion (e.g., step 5615 of
A conversion formula derived from the calibration curve can be applied to convert the read counts of eccDNA of various sizes to absolute quantities. For example, if 1 ng of spiked-in circular DNA (e.g., 200 bp) gives 10,000 reads, then 10,000 reads of 400 bp eccDNA of interest would correspond to 2 ng of such molecules in the sample, because each 400 bp molecule carries twice the mass of a 200 bp molecule. Such a conversion formula might also take into account factors such as, but not limited to, sizes of eccDNA identified, sequencing depth, sequencing length, DNA mappability, and PCR duplication rates. The quantities of eccDNA in a sample may be used to distinguish between healthy control and patient groups using this parameter without considering batch-to-batch variations.
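The worked example above can be written as a short sketch. It assumes a single spike-in data point and a simple proportional conversion with a size ratio; the function and parameter names are hypothetical, and the other factors mentioned (sequencing depth, mappability, PCR duplication) are not modeled.

```python
def reads_to_quantity_ng(ecc_reads, ecc_size_bp,
                         spike_reads, spike_quantity_ng, spike_size_bp):
    """Convert an eccDNA read count to a mass in ng using one spike-in
    data point, scaling by the ratio of eccDNA size to spike-in size."""
    if spike_reads <= 0 or spike_size_bp <= 0:
        raise ValueError("spike-in reads and size must be positive")
    read_ratio = ecc_reads / spike_reads
    size_ratio = ecc_size_bp / spike_size_bp
    return spike_quantity_ng * read_ratio * size_ratio


# 1 ng of 200 bp spike-in gives 10,000 reads; 10,000 reads of 400 bp eccDNA.
print(reads_to_quantity_ng(10_000, 400, 10_000, 1.0, 200))  # 2.0
```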
At block 6010, a first set of sequence reads obtained from sequencing cell-free DNA fragments from extrachromosomal circular DNA (eccDNA) in a mixture prepared from the biological sample of the subject is received. The first set of sequence reads may be obtained by any method described herein, including for example, block 3902 from
The known quantity of circular DNA may be added to the biological sample to obtain the mixture, which may then be processed as described with
The same size of circular DNA may be added to each mixture. The size may be a size including, but not limited to, 100 nt, 200 nt, 500 nt, 1000 nt, 2000 nt, 3000 nt, 4000 nt, 5000 nt, or a size within a range specified by any two of these sizes.
The mixture and the one or more additional mixtures may be processed as described with
At block 6020, sizes of the cell-free DNA fragments are measured using the first set of sequence reads. These fragments include the fragments from the eccDNA and from the spiked-in circular DNA. The sizes of the cell-free DNA fragments may be measured using alignment of the fragments to a reference genome. The genomic positions of the outermost nucleotides at the ends of a fragment may be determined. The size of the fragment may be calculated using the difference between the genomic positions. In some embodiments, the fragment may be sequenced, and the size may be determined by counting the nucleotides in the fragment.
At block 6030, a first amount of the first set of sequence reads corresponding to a first size is determined. The first size may correspond to fragments from the eccDNA and not the spiked-in circular DNA. The first size may be a specific size or a size range. For example, the size range may be a range from 50 bp to about 250 bp, about 50 bp to about 100 bp, about 100 bp to about 150 bp, about 150 bp to about 200 bp, or from about 200 bp to about 250 bp, about 250 bp to about 300 bp, 300 bp to about 350 bp, about 350 bp to about 400 bp, about 400 bp to about 450 bp, or about 450 bp to about 500 bp, about 500 bp to about 550 bp, 550 bp to about 600 bp, 600 bp to about 650 bp, 500 to about 650 bp, 650 bp to about 700 bp, 700 bp to about 750 bp, 750 bp to about 800 bp, 700 to 800 bp, or 800 to 850 bp. The first amount may be a number of fragments or a total length of fragments.
At block 6040, a second amount of the first set of sequence reads corresponding to a second size is determined. The second size may be the particular size of the known quantity of circular DNA in the mixture. The second size may be any size described herein for the sizes of circular DNA. The second size may be a size that is different from sizes resulting from fragments of eccDNA.
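Blocks 6020 through 6040 can be sketched as follows. The sketch assumes 1-based inclusive genomic coordinates for the outermost nucleotides and hypothetical size windows; both are illustrative choices, not requirements of the method.

```python
def fragment_size(left_pos, right_pos):
    """Fragment size from the genomic positions of its outermost nucleotides
    (assumes 1-based, inclusive coordinates with left_pos <= right_pos)."""
    if right_pos < left_pos:
        raise ValueError("right_pos must not be smaller than left_pos")
    return right_pos - left_pos + 1


def amount_in_range(sizes_bp, low_bp, high_bp, as_total_length=False):
    """Amount of reads in [low_bp, high_bp]: a fragment count by default,
    or the summed fragment length when as_total_length is True."""
    selected = [s for s in sizes_bp if low_bp <= s <= high_bp]
    return sum(selected) if as_total_length else len(selected)


# Hypothetical sizes; 200 bp is taken here as the spike-in size.
sizes = [fragment_size(1000, 1165), 180, 200, 200, 420]
first_amount = amount_in_range(sizes, 150, 190)   # eccDNA size window
second_amount = amount_in_range(sizes, 200, 200)  # spike-in size
print(first_amount, second_amount)
```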
At block 6050, the first amount is compared to a calibration data point. The calibration data point may be determined using the second amount of a second set of sequence reads corresponding to the second size. The calibration data point may include a coordinate for the amount as the number of sequence reads and another coordinate for the quantity in the mixture or the biological sample. The calibration data point may be a point of a calibration curve. The calibration curve may be determined using one or more additional amounts of sequence reads corresponding to the second size. Each of the one or more additional amounts may correspond to one or more additional known quantities of the circular DNA in the one or more additional mixtures. The additional known quantities may be different from the known quantity. The known quantity and additional known quantities may be any described herein.
The calibration curve may be a curve determined by a plurality of calibration data points. The plurality of calibration data points may be from a plurality of amounts of sequence reads and a plurality of known quantities of circular DNA. The calibration curve may be a curve that relates the amounts of sequence reads to the known quantities. The calibration curve may be a fit or a regression to a plurality of calibration data points. In some embodiments, the calibration curve may be a function relating amounts of sequence reads to quantities of the circular DNA in a mixture or a biological sample.
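One way to obtain such a function is an ordinary least-squares line through the calibration data points. The sketch below assumes a linear relationship between read amounts and spike-in quantities, which is an illustrative choice; the calibration curve is not limited to a straight line, and the function names are hypothetical.

```python
def fit_calibration_line(read_amounts, known_quantities):
    """Least-squares fit of quantity = slope * reads + intercept."""
    n = len(read_amounts)
    if n < 2 or n != len(known_quantities):
        raise ValueError("need at least two paired calibration data points")
    mean_x = sum(read_amounts) / n
    mean_y = sum(known_quantities) / n
    sxx = sum((x - mean_x) ** 2 for x in read_amounts)
    if sxx == 0:
        raise ValueError("read amounts must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(read_amounts, known_quantities))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def quantity_from_reads(read_amount, slope, intercept):
    """Apply the fitted calibration line to a read amount."""
    return slope * read_amount + intercept


# Hypothetical spike-in series: reads observed for 1, 2, and 4 ng.
slope, intercept = fit_calibration_line([10_000, 20_000, 40_000], [1.0, 2.0, 4.0])
print(quantity_from_reads(30_000, slope, intercept))
```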
At block 6060, a quantity of cell-free DNA fragments from eccDNA corresponding to the first size in the mixture is determined using the comparison. The quantity may be a mass, number of fragments, or length of fragments. The determination of the quantity may include taking the known quantity associated with the calibration data point and adjusting the known quantity by factors including sizes of eccDNA, sequencing depth, sequencing length, DNA mappability, and PCR duplication rates. For example, the known quantity associated with the calibration data point may be multiplied by a ratio of the size of the eccDNA of interest over the size of the circular DNA.
The quantity of sizes of eccDNA other than the first size may be determined. The calibration data point may be a first calibration data point. The quantity may be a first quantity. The known quantity of the first calibration data point may be a first known quantity. A third amount of sequence reads corresponding to a third size of cell-free DNA fragments from eccDNA in the mixture may be determined. The third size may be different from the first size and the second size. The third amount may be compared to a second calibration data point. The second calibration data point may be determined using a fourth amount of a third set of sequence reads corresponding to the second size of the added circular DNA. For example, the second calibration data point may relate the fourth amount with a second known quantity of the second size. The second known quantity and the third amount may be from a second mixture. The third amount may be closer to the fourth amount than the second amount, so the second calibration data point is used in place of or in addition to the first calibration data point. A second quantity of cell-free DNA fragments from eccDNA corresponding to the third size may be determined using the comparison. In some embodiments, the eccDNA of the third size may be in a different mixture than the eccDNA of the first size.
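Choosing the calibration data point whose read amount is closest to the measured amount can be sketched as below. Representing each calibration data point as a `(spike_read_amount, known_quantity)` tuple is an assumption made for illustration.

```python
def nearest_calibration_point(read_amount, calibration_points):
    """Return the (spike_read_amount, known_quantity) pair whose read
    amount is closest to the measured read amount."""
    if not calibration_points:
        raise ValueError("no calibration data points supplied")
    return min(calibration_points, key=lambda point: abs(point[0] - read_amount))


# Hypothetical calibration data points from two mixtures.
points = [(10_000, 1.0), (50_000, 5.0)]
print(nearest_calibration_point(42_000, points))
print(nearest_calibration_point(12_000, points))
```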
A value of a parameter may be determined using the quantity of cell-free DNA fragments from eccDNA corresponding to the first size in the mixture. The parameter may be a normalized value of the quantity. For example, the parameter may be the quantity divided by the volume or mass of the mixture or the biological sample. The parameter may be a concentration. In some embodiments, the parameter may be determined using one or more physical characteristics of the subject from which the biological sample is obtained. For example, the parameter may use the weight or height of the subject. The value of the parameter may be determined using the second quantity of the cell-free DNA fragments from eccDNA corresponding to the third size. Additional sizes in a size range may be used for the parameter, including any size range described herein. In some embodiments, the parameter may be the quantity of a size or sizes.
The value of the parameter may be compared to a reference value to determine a classification of whether the gene exhibits the genetic disorder in the subject. The gene and genetic disorder may be any described herein. The reference value may be determined from subjects with the gene not exhibiting a genetic disorder for a gene associated with a nuclease or from subjects exhibiting the genetic disorder. The reference value may be a cutoff value or a threshold value indicating a statistically different value of the parameter for the reference subjects. The genetic disorder may be characterized by deficiency in the nuclease. The classification may be that the gene exhibits the genetic disorder if the value of the parameter is greater than the reference value or if the value of the parameter is less than the reference value.
Embodiments may also include treating the genetic disorder. Treatment may include any treatment described herein.
Process 6000 may include additional implementations, such as any single implementation or any combination of implementations described below and/or in connection with one or more other processes described elsewhere herein.
Although
D. Materials and Methods
Experiments were performed using both mouse models and human subjects to study effects related to differences in the DNASE1L3 gene.
1. Animal Models
Mice with deletion of the Dnase1 gene (Dnase1−/−) were obtained from the Knockout Mouse Project Repository of the University of California at Davis; mice with deletion of the Dnase1l3 gene (Dnase1l3−/−) were obtained from the Jackson Laboratory. Mice were maintained in the Laboratory Animal Center of The Chinese University of Hong Kong (CUHK) with all experimental procedures approved by the Animal Experimentation Ethics committee of CUHK in compliance with the Guide for the Care and Use of Laboratory Animals (8th ed., 2011) established by the National Institutes of Health.
2. Human Subjects
Four healthy human subjects were recruited with written informed consent. Three human subjects with DNASE1L3 mutations were recruited from the Istituto Giannina Gaslini (Italy). One of these three DNASE1L3-mutated subjects provided blood samples at both pre- and post-hemodialysis timepoints. Thus, four blood samples in total were obtained from this patient cohort.
3. Mouse Sample Collection and Processing
Blood samples were collected from 12 wild-type, 11 Dnase1−/− and 11 Dnase1l3−/− mice by cardiac punctures and centrifuged at 1,600×g for 10 min at 4° C., followed by another centrifugation step at 16,000×g for 10 min at 4° C. of the plasma portion to remove cell debris. The buffy coat portion was centrifuged at 5,000×g for 5 min at room temperature to remove residual plasma. Mouse liver tissues were collected and immediately stored at −80° C. Plasma DNA was extracted using QIAamp Circulating Nucleic Acid Kits (Qiagen). Buffy coat (6 wild-type, 4 Dnase1−/− and 5 Dnase1l3−/− mice) and liver (5 wild-type, 5 Dnase1−/− and 5 Dnase1l3−/− mice) tissue DNA was extracted using QIAamp DNA Mini Kits (Qiagen).
4. EccDNA Library Preparation and Sequencing
EccDNA library constructions from plasma samples were performed using the tagmentation-based method as detailed previously (Sin et al., PNAS (2020)). For eccDNA enrichment from tissue DNA of the liver and buffy coat, we employed a dual size selection approach using solid phase reversible immobilization (SPRI) beads (Beckman Coulter).
The following provide more information regarding obtaining circular DNA. Additional information regarding circular DNA can be found in US Patent Publication No. 2020/0407799 A1, filed Mar. 25, 2020, the contents of which are incorporated herein by reference for all purposes.
In step 5615, the workflow first reduces (e.g., to essentially eliminate) linear DNA in the plasma DNA samples by exonuclease digestion (e.g., using exonuclease V). Other techniques can also be used to reduce linear DNA, e.g., cesium chloride-ethidium bromide (CsCl-EB) density gradient centrifugation.
We then followed this with an approach to open up the circles (e.g., of eccDNA) to form linearized DNA molecules. The linearization of the eccDNA can be performed in various ways. In one example, we utilize restriction enzyme digestion to open up the circles at particular cleavage sites having a cutting sequence motif, which is a type of cutting tag. In another example, we use a transposase (e.g., via tagmentation [step 5620]) for opening up the circles, e.g., to insert a cutting tag that is recognizable like the cutting sequence motif for restriction enzyme digestion. Library preparation and next-generation sequencing of the resultant linearized DNA can then be performed.
Among the various examples using enzyme digestion, one implementation can use the restriction enzyme MspI (cutting of CCGG sequence; methylation-insensitive). In another implementation, we used the restriction enzyme HpaII (cutting of CCGG sequence; methylation-sensitive). In yet another implementation, we combined data generated through the use of MspI and HpaII to arrive at novel insights of eccDNA.
Restriction enzymes other than MspI and HpaII can be used. As an illustration, DpnI and DpnII, both of which recognize the GATC sequence, could also be used. DpnI cleaves only when the recognition site (A base) is methylated. On the other hand, DpnII is not sensitive to methylation status. The number of bases recognized and cut can vary. For example, both MspI and HpaII are 4-base cutters. Restriction enzymes other than a 4-base cutter can be used, such as 6-base cutters.
When compared to rolling circle amplification of eccDNA (Shibata et al. Science. 2012; 336:82-86) and shearing (e.g., by a nebulizer) to form linearized DNA, an approach using cutting tags (e.g., restriction enzyme or transposase approach) can provide more stringent criteria in the definition (identification) of eccDNA reads. For example, an eccDNA molecule can be accurately identified using two or more anchors comprising the known sequence (cutting tag) where a cut has been made (e.g., CCGG fragment ends) and the absence of a gap between the two end sequences of the sequence read(s). Such signature anchors can be used to accurately identify eccDNA reads and to determine their location in a reference genome. The absence of a gap can be determined using the reference genome via an alignment procedure, as described in more detail below.
The information from the cutting tag (e.g., CCGG read ends) not only facilitates more accurate identification of eccDNA; the complementary information provided by the numbers of eccDNA detected using methylation-insensitive and methylation-sensitive restriction enzymes also allows one to deduce the methylation levels of the eccDNA. Such information was not available through previously documented approaches. Moreover, the absence of CCGG fragment ends in the eccDNA fragments (or other recognition sequences specific for other types of restriction enzymes, i.e., other types of cutting tags) can provide insights into pre-existing eccDNA damage, which refers to linearization of eccDNA prior to restriction enzyme cutting. Such linearization might result from mechanical shearing during DNA processing, nuclease attacks in the blood stream, etc. Such eccDNA molecules, although detected with junctional sites, often lack restriction enzyme cutting motifs at one or both ends of the fragment. Such cases can be referred to as “pre-existent eccDNA damage.” Such information was also not obtainable by previously documented approaches. Such information could provide valuable knowledge about the biological mechanisms of eccDNA generation and processing in vivo.
Restriction enzyme digestion has been used in the creation of recombinant plasmids for molecular cloning. However, there are clear differences between such an application and the present disclosure. Firstly, eccDNA molecules are generated from the genome of organisms with clear start and end positions when mapped to the genome, whereas such concepts do not exist in a bacterial plasmid. Secondly, the restriction enzyme approaches for eccDNA study can provide insights into the host genome sequences. But for the bacterial plasmid DNA, restriction enzyme digestion approaches only allow one to peek into the plasmid DNA information and not the host genome itself (Shintani et al. Front Microbiol. 2015; 31; 6:242).
The restriction enzyme approach relies on the presence of specific recognition sites on the eccDNA for its digestion and linearization. A tagmentation approach, which makes use of random cutting of DNA by a transposase, does not require specific DNA sequences. Therefore, the tagmentation approach could potentially provide a higher number of linearized eccDNA for library construction and sequencing. In a previous report, the use of tagmentation for eccDNA analysis in tissues was described (Shoura et al. G3 (Bethesda). 2017; 7(10):3295-3303). Shoura et al. used cesium chloride-ethidium bromide density gradient centrifugation to enrich eccDNA from tissue genomic DNA. In contrast, in the present approach, such a step does not need to be performed. Therefore, a tagmentation approach of the present disclosure can be more suitable for plasma DNA and other bodily fluids or stool that include circulating DNA.
a) Principle and Bioinformatics Approach for eccDNA Identification
An eccDNA 6110 is shown having a circular junction locus 6112 that includes the two regions 6102 and 6106 from genome 6100. The ends of region 6102 and 6106 include nucleotides at two separated genomic locations that are immediately adjacent to one another in eccDNA 6110 to form circular junction locus 6112. At step 6120, digestion is performed at site 6104 to generate linearized DNA molecule 6125. At step 6130, end repair is performed, e.g., as described above, to generate end-repaired DNA molecule 6135. At step 6140, sequencing (e.g., paired-end sequencing or single molecule sequencing) is performed to obtain sequence 6145, which includes circular junction locus 6112. As shown, sequence 6145 can include read1 and read2.
If read1 and read2 are sequenced with a sufficient read length, there is a high likelihood of obtaining sequence reads across the circular junction locus 6112 (indicated by the chimeric arrows) in the step of paired-end sequencing. Read1 extends from the left end of linearized DNA molecule 6125, where read1 is blue on the left side of circular junction locus 6112 and red to the right of circular junction locus 6112. Read2 extends from the right end of linearized DNA molecule 6125, where read2 is red on the right side of circular junction locus 6112 and blue to the left of circular junction locus 6112.
At step 6150, alignment is performed to the reference genome. When read1 and/or read2 cover the circular junction locus 6112, in the alignment results, we would observe read1 and read2 sequences of linearized molecules (e.g., cutting by MspI) mapping to a reference genome in unique mapping directionalities. For illustration purposes, we define an unmapped segment 6152 (red arrow after the alignment step, “b→a” segment) in read1, which would correspond to the sequence across the junction derived from the other genomic region being joined to form a circular DNA. Similarly, we define an unmapped segment 6154 (blue arrow after the alignment step, “e→f” segment) in read2, which would correspond to the sequence across the junction derived from the other genomic region being joined to form a circular DNA molecule.
Such unique mapping directionalities are covered by the below two scenarios that involve a reversed direction between the read and the reference genome:
Such unique mapping directionalities differ from conventional mapping directions for a pair of paired-end reads originating from an initially linear DNA. Thus, such criteria can be used to identify a circular molecule. For example, read1 is fully aligned in a forward strand and read2 is fully aligned in a reversed strand when the smallest mapping coordinate of read1 is equal to or smaller than the smallest mapping coordinate of read2; or read1 is fully aligned in a reversed strand and read2 is fully aligned in a forward strand when the smallest mapping coordinate of read2 is equal to or smaller than the smallest mapping coordinate of read1. Bioinformatically, searching the mapping sites in the reference genome of the unmapped segments present in read1 and/or read2 would allow for delineating the junctions. The distance between junction sites deduced from the unmapped segments from a fragment would indicate the size of a circular DNA. For example, the distance between region 6102 and site 6104 provides the size of the circular DNA.
Another feature is that two nucleotides overlap between the mapped read1 and read2 if a circular DNA was cut only once. This two-nucleotide overlapping sequence between read1 and read2 is introduced by the staggered ends (i.e., jagged ends) created by MspI or HpaII, or another digestion enzyme. MspI or HpaII would make two staggered single-stranded breaks and the distance between the two breaks would be 2 bp. Such 5′ protruding 2-nt single-stranded ends (complementary to each other) would be filled to form blunt ends during the end-repair step. Therefore, the resultant DNA sequences would carry a 2 bp overlap between the ends of the read1 and read2 sequences. In other words, during the library preparation step, there will be an “end repair” step, which will complete the jagged ends into blunt ends by adding two nucleotides to each end. Therefore, the resultant DNA sequences will have two blunt ends instead of two jagged ends. When the two sequencing reads are aligned to the genome, the two nucleotides added during the end repair step will appear as two extra base pairs that overlap between the two reads, which can be used in addition or alternatively to identify a circular DNA molecule.
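The orientation check can be sketched as follows, reading the example in the preceding paragraph as a description of the conventional (linear-origin) orientation and flagging opposite-strand pairs that depart from it. This reading, the strand encoding (`"+"`/`"-"`), the 1-based inclusive junction coordinates, and all function names are assumptions for illustration; a flagged pair is only a candidate and would still be checked against the junction and cutting-tag criteria.

```python
def is_conventional_pair(read1_strand, read1_min_pos, read2_strand, read2_min_pos):
    """True if the pair maps as expected for an initially linear fragment:
    the forward-strand read has the smaller (or equal) smallest coordinate."""
    if read1_strand == "+" and read2_strand == "-":
        return read1_min_pos <= read2_min_pos
    if read1_strand == "-" and read2_strand == "+":
        return read2_min_pos <= read1_min_pos
    return False


def is_candidate_circular_pair(read1_strand, read1_min_pos, read2_strand, read2_min_pos):
    """True for opposite-strand pairs whose directionality is reversed
    relative to the conventional orientation."""
    opposite = {read1_strand, read2_strand} == {"+", "-"}
    return opposite and not is_conventional_pair(
        read1_strand, read1_min_pos, read2_strand, read2_min_pos)


def circle_size(junction_left, junction_right):
    """Size of the circular DNA from the two deduced junction sites
    (assumes 1-based, inclusive coordinates on the same chromosome)."""
    return junction_right - junction_left + 1


print(is_candidate_circular_pair("+", 400, "-", 150))
print(circle_size(1001, 1400))
```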
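The origin of the 2 bp overlap can be illustrated with a toy simulation. The sketch assumes a single CCGG site that does not span the end of the input string, a cut after the first C on each strand (C^CGG), and fill-in of the resulting 5′ CG overhangs; the function names and the toy sequence are hypothetical.

```python
def linearize_and_end_repair(circle, site_index):
    """Top-strand sequence of a circle cut once at C^CGG and end-repaired.

    circle: top strand of the circular molecule written as a linear string.
    site_index: 0-based index of the first C of the CCGG site."""
    if circle[site_index:site_index + 4] != "CCGG":
        raise ValueError("no CCGG motif at site_index")
    cut = site_index + 1                       # top-strand break after the first C
    rotated = circle[cut:] + circle[:cut]      # linear molecule starting at the cut
    return rotated + circle[cut:cut + 2]       # fill-in duplicates the CG overhang


def mapped_overlap(a_start, a_end, b_start, b_end):
    """Number of reference positions shared by two inclusive intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start) + 1)


linear = linearize_and_end_repair("AACCGGTT", 2)
print(linear, len(linear))
# One fragment end maps from the CGG of the site onward (e.g., 101 to 160) and
# the other maps up to the CCG of the same site (e.g., 50 to 102).
print(mapped_overlap(101, 160, 50, 102))
```

The end-repaired molecule is two nucleotides longer than the circle, and both of its ends carry the same CG dinucleotide, which is what appears as the overlap after alignment.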
Taken together, in an example eccDNA identification approach, there can be four “diagnostic features”, including:
Such diagnostic features can greatly improve the specificity in identifying the genome-wide eccDNA molecules in plasma DNA. In some implementations, sequencing reads fulfilling at least one of these “diagnostic features” can be defined as a candidate circular DNA. For a circular DNA being cut multiple times by a restriction enzyme, read1 and read2 would not bear repeated sequences (overlapped bases) between each other. In other implementations, only one read from a pair might cross the junction site and the other would not carry the junction. As another example, both reads from a pair would not carry a junction, but show unique mapping directions implying a circular DNA. In yet another example, even though one could not directly observe the complete restriction enzyme cutting tags in the sequencing reads, one could retrieve the reference sequence from the reference genome between these deduced junction sites of one circular DNA. Then one could bioinformatically investigate if any restriction enzyme cutting tags (motifs) exists in such a retrieved reference sequence. Such inferred restriction enzyme cutting motifs would increase the confidence that the identification of a circular DNA species was indeed correct.
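The motif check on the retrieved reference sequence can be sketched as follows. This is a minimal illustration under stated assumptions: the reference is available as an in-memory string, coordinates are 0-based and half-open, the motif is CCGG (MspI/HpaII), and the function name `find_cut_sites` is hypothetical.

```python
def find_cut_sites(ref_seq, start, end, motif="CCGG"):
    """Return reference coordinates of every motif occurrence that lies
    entirely within ref_seq[start:end], the span between two deduced
    junction sites (0-based, half-open)."""
    region = ref_seq[start:end]
    hits = []
    pos = region.find(motif)
    while pos != -1:
        hits.append(start + pos)
        pos = region.find(motif, pos + 1)  # step by 1 to allow overlapping hits
    return hits


def supports_circular_call(ref_seq, start, end, motif="CCGG"):
    """A candidate gains support if its span holds at least one cut motif."""
    return len(find_cut_sites(ref_seq, start, end, motif)) > 0
```

A candidate whose span contains no motif is not necessarily a false positive; the check is only meant to add confidence, as described above.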
Accordingly, a method can use a restriction enzyme as part of analyzing eccDNA. Such a technique can be used in combination with other methods described herein, e.g., for analysis of eccDNA as well as mtDNA. Downstream analysis can include measurement of properties of the sample using the detection of the circular DNA.
In a first step, a biological sample of an organism can be received. Examples of biological samples are provided herein, such as plasma and serum. The biological sample includes a plurality of extrachromosomal circular DNA (eccDNA) molecules. The eccDNA may be from any number of chromosomes, including the autosomes and/or sex chromosomes. Each of the plurality of eccDNA molecules includes a junction at which nucleotides at two separated genomic locations are immediately adjacent to one another. Circular junction locus 6112 is an example of such a junction with regions 6102 and 6106 including such two separated genomic locations that are immediately adjacent to one another.
In a second step (e.g., step 6120), digestion is performed using a restriction enzyme. In some implementations, more than one type of restriction enzyme can be used. Digesting the plurality of eccDNA molecules can form a set of linearized DNA molecules that each includes the junction. Each restriction enzyme can cut at a different motif, with the resulting linearized DNA fragments having a different cutting tag. The term “linearized DNA fragments” differs from a “linear DNA fragment,” which was already linear before any digestion.
In a third step (e.g., step 6140), for each of the linearized DNA molecules, sequencing of at least both ends of the linearized DNA molecules can be performed to obtain one or more sequence reads. The one or more sequence reads may or may not include the junction. If a read does not include the junction, an eccDNA molecule can still be identified using the directionality of the mapping. In some embodiments, two sequence reads (one for each end) can be obtained. In other embodiments, a single sequence read of the entire linearized DNA molecule can include both ends, as is described herein.
After the sequence reads are obtained, the sequence reads can be mapped (aligned) to a reference genome, e.g., to see if they map in a reverse orientation. If they do map in a reverse orientation (example criterion), then the corresponding linearized DNA molecule can be identified as originally being circular. Accordingly, for each of the linearized DNA molecules, a pair of end sequences for the linearized DNA molecule from the one or more sequence reads can be selected. The pair of end sequences do not include the junction. Examples of such end sequences are end sequence 6146 and end sequence 6148 in
The mapped reversed end sequences can be analyzed to measure a property of the biological sample. Examples of such measurements are provided herein. Such analysis can use a collective value (e.g., count, size, or methylation) of the detected eccDNA. Accordingly, the method can further include identifying the linearized DNA molecule as originating from an eccDNA molecule based on the pair of reversed end sequences mapping to the reference genome, and determining a collective value of the identified eccDNA molecules, wherein analyzing the mapped reversed end sequences to measure the property of the biological sample uses the collective value.
b) Identification Technique
As explained above, various criteria can be used to identify the circular DNA molecules. Additionally, various procedures may be used in the analysis of the raw sequence reads (e.g., read1 and read2 from
The raw sequence reads can be pre-processed. For example, the duplicated reads, sequencing adapters, and low-quality bases on the 3′ end of a sequencing read can be removed. Further, a specified number of bases of paired-end reads (or from the ends of a single-molecule read) can be selected for alignment.
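The trimming and truncation steps can be sketched as below. This is only an illustration: the quality cutoff of 20 and the function name `preprocess_read` are assumptions, while the 50-base default follows the truncation length used in the alignment step that follows.

```python
def preprocess_read(seq, quals, min_quality=20, keep=50):
    """Trim low-quality bases from the 3' end, then keep the first `keep` bases.

    seq   -- base calls as a string
    quals -- per-base Phred scores, same length as seq
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    end = len(seq)
    # Walk inward from the 3' end until a base passes the quality cutoff.
    while end > 0 and quals[end - 1] < min_quality:
        end -= 1
    return seq[:end][:keep]
```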
(1) Putative eccDNA Identification
The bioinformatically truncated read1 and read2, consisting of the first 50 bp of read1 and read2 in pre-processed paired-end reads, can be used for alignment to a human reference genome using an alignment procedure, e.g., Bowtie 2 (Langmead et al. Nat Methods. 2012; 9:357-9) in a paired-end mode. Other alignment techniques can also be used. Other lengths of each read may be used besides 50 bp, e.g., at least 20, 25, 30, 35, 40, or 45 bp. A first pass at alignment can try a standard orientation, e.g., read1 is aligned with the left end at a lower genomic position than the last base in the read. For those paired-end reads that are aligned normally (i.e., in a forward direction), the mapping directionality regarding read1 and read2 would be determined in a first pass. In contrast to conventionally properly mapped paired ends, if a fragment's read1 and read2 corresponded to circular DNA, the forward orientation would not provide proper alignment of the pair, as such reads have circular DNA specific mapping directions (
If the pair of reads are not aligned with a forward orientation, a reverse orientation can be tried in a second alignment pass. As shown in
(2) Probing the Junctions of eccDNA Molecules
To accurately locate the genomic location of an eccDNA with single-base resolution, some implementations fine-tuned the realignment for each putative read separately. Taking read1 as an example, the first 20 bp and the last 20 bp of the read1 sequence were used as seeds (seed A and seed B, respectively) to determine the candidate genomic regions that might carry a junction. Using shortened sequences to search for candidate locations helped to minimize the likelihood that a searched sequence contained a junction, which would affect the alignment accuracy and the precise determination of a junction site. In this step, multiple hits (e.g., no more than 10 hits for each seed) may be allowed, so as to maximize the sensitivity to detect the junctions. If the seed B sequence was not placed downstream of the seed A mapping position in the same direction, it would suggest that such a read1 carries a junction.
Next, we used a searching approach to probe the junction at single-base resolution for the read1 that was identified as potentially carrying a junction.
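The seed test can be illustrated with a toy example. The sketch below is an assumption-laden simplification: it uses exact string matching on the forward strand of an in-memory reference instead of a real aligner, and the function names (`split_seeds`, `exact_hits`, `may_carry_junction`) are hypothetical.

```python
def split_seeds(read, k=20):
    """Return seed A (first k bases) and seed B (last k bases) of a read."""
    return read[:k], read[-k:]


def exact_hits(ref, seed, max_hits=10):
    """Forward-strand exact-match positions of a seed, capped at max_hits."""
    hits = []
    pos = ref.find(seed)
    while pos != -1 and len(hits) < max_hits:
        hits.append(pos)
        pos = ref.find(seed, pos + 1)
    return hits


def may_carry_junction(ref, read, k=20):
    """True if no seed B hit lies downstream of a seed A hit."""
    seed_a, seed_b = split_seeds(read, k)
    hits_a = exact_hits(ref, seed_a)
    hits_b = exact_hits(ref, seed_b)
    if not hits_a or not hits_b:
        return False  # undetermined here; treated as no evidence of a junction
    return not any(b > a for a in hits_a for b in hits_b)


ref = "GGGGACGTCATTGCAACCCC"
# Read spanning the junction of a circle formed from ref[4:16] (k reduced to 4).
print(may_carry_junction(ref, "TTGCAAACGTCA", k=4))
# Read taken colinearly from the same region.
print(may_carry_junction(ref, "ACGTCATTGCAA", k=4))
```

In the toy reference, seed B of the first read maps upstream of seed A, so the read is flagged; the second read maps colinearly and is not.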
In
Such searching was also applied to read2 sequences independently. The read2 sequence would be used to further improve the specificity. For example, there would be two scenarios for the read2 sequence: (1) The read2 sequence carried a junction, as read1 did. Such junction information should be compatible with the results deduced from the read1 sequence. (2) The read2 sequence did not carry a junction. In this case, the read2 sequence should be fully aligned within the regions demarcated by the sequences at either end of the junction site, which was deduced from the read1 sequence (i.e., part A and part B). The processing order for read1 and read2 would be exchangeable. In yet another embodiment, the total number of mismatches along the whole read carrying the deduced junction was required to be no more than a specified number (e.g., 2).
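One way such a single-base search could work is sketched below. This is an assumed implementation, not the procedure of the source: it takes the reference position where the start of the read aligns (part A) and the position where the end of the read aligns (part B), scans every split point, and keeps the split with the fewest mismatches, subject to the example limit of 2. The name `probe_junction` and the 0-based, half-open coordinates are illustrative.

```python
def count_mismatches(a, b):
    """Number of differing positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def probe_junction(ref, read, a_start, b_end, max_mismatches=2):
    """Locate the junction inside a read at single-base resolution.

    a_start -- reference index where the first base of the read aligns
    b_end   -- reference index one past where the last base of the read aligns
    Returns (mismatches, part_a_end, part_b_start) or None if no split passes.
    """
    n = len(read)
    best = None
    for k in range(1, n):  # k bases belong to part A, n - k to part B
        b_start = b_end - (n - k)
        if a_start + k > len(ref) or b_start < 0:
            continue
        mismatches = (count_mismatches(read[:k], ref[a_start:a_start + k])
                      + count_mismatches(read[k:], ref[b_start:b_end]))
        if mismatches <= max_mismatches and (best is None or mismatches < best[0]):
            best = (mismatches, a_start + k, b_start)
    return best


ref = "GGGGACGTCATTGCAACCCC"
print(probe_junction(ref, "TTGCAAACGTCA", a_start=10, b_end=10))
```

For the toy read, the best split has zero mismatches and places the junction between reference index 16 (exclusive end of part A) and index 4 (start of part B), so the deduced circle spans ref[4:16] and is 12 nt long.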
5. EccDNA Identification and Size Profiling
Details of the bioinformatics principles for mouse eccDNA identification and size profiling were modified from a previous study (Sin et al., PNAS (2020)) with minor adjustments, including the fact that mouse genomes were used as reference genomes. For the mouse pregnancy model, mating pairs were set up as follows: female mice of the C57BL/6 genomic background (wild-type or Dnase1l3−/−) were crossed with male mice from either the BALB/c (wild-type) or the C57BL/6 (Dnase1l3−/−) genomic background (
6. Statistical Analysis
Kruskal-Wallis test followed by Dunn's Multiple Comparison Test was applied to compare three groups of data. Wilcoxon rank-sum test was applied to compare two groups of data. These statistical tests were performed using GraphPad Prism 8.0 (GraphPad Software). Statistical significance was defined as P<0.05.
Embodiments may further include treating the genetic disorder or low nuclease activity (e.g., lower than a threshold) in the patient after determining a classification for the subject. The classification for the subject after treatment may or may not involve adding anticoagulants in vivo or in vitro to enhance the cfDNA end profile. Further, the treatment can be determined as an alternative to a current treatment (e.g., an anticoagulant) when the current dosage has low efficacy, e.g., an increase in dosage or a different anticoagulant can be used. Treatment can be provided according to a determined level of a disorder, any identified mutations, and/or a tissue of origin. For example, an identified mutation (e.g., for polymorphic implementations) can be targeted with a particular drug or chemotherapy. The tissue of origin can be used to guide a surgery or any other form of treatment. And, the level of a disorder can be used to determine how aggressive to be with any type of treatment, which may also be determined based on the level of disorder. A disorder (e.g., cancer) may be treated by chemotherapy, drugs, diet, therapy, and/or surgery. In some embodiments, the more the value of a parameter (e.g., amount or size) exceeds the reference value, the more aggressive the treatment may be.
Treatment may include chemotherapy, which is the use of drugs to destroy cancer cells, usually by keeping the cancer cells from growing and dividing. The drugs may involve, for example but are not limited to, mitomycin-C (available as a generic drug), gemcitabine (Gemzar), and thiotepa (Tepadina) for intravesical chemotherapy. The systemic chemotherapy may involve, for example but not limited to, cisplatin gemcitabine, methotrexate (Rheumatrex, Trexall), vinblastine (Velban), doxorubicin, and cisplatin.
In some embodiments, treatment may include immunotherapy. Immunotherapy may include immune checkpoint inhibitors that block the protein PD-1 or its ligand PD-L1. Inhibitors may include but are not limited to atezolizumab (Tecentriq), nivolumab (Opdivo), avelumab (Bavencio), durvalumab (Imfinzi), and pembrolizumab (Keytruda).
Treatment embodiments may also include targeted therapy. Targeted therapy is a treatment that targets the cancer's specific genes and/or proteins that contribute to cancer growth and survival. For example, erdafitinib is a drug given orally that is approved to treat people with locally advanced or metastatic urothelial carcinoma with FGFR3 or FGFR2 genetic mutations that has continued to grow or spread.
Some treatments may include radiation therapy. Radiation therapy is the use of high-energy x-rays or other particles to destroy cancer cells. In addition to each individual treatment, combinations of the treatments described herein may be used. In some embodiments, when the value of the parameter exceeds a threshold value, which itself exceeds a reference value, a combination of the treatments may be used. Information on treatments in the references is incorporated herein by reference.
Experimental techniques used in studying the nuclease effect on linear cfDNA and eccDNA are described. These techniques can be applied to any method described herein.
In example murine models, mice with a CRISPR/Cas9-targeted deletion of exon 5 in Dnase1l3 (mm9 Chr14: 8,809,531-8,810,216) on a C57BL/6NJ background were generated by The Jackson Laboratory. Mice carrying a targeted allele of Dnase1 [Dnase1tm1.1(KOMP)Vlcg] on B6 background and WT control mice on B6 background were obtained from the Knockout Mouse Project Repository of the University of California at Davis. All experimental procedures were approved by the Animal Experimentation Ethics committee of The Chinese University of Hong Kong (CUHK) and performed in compliance with “Guide for the Care and Use of Laboratory Animals” (8th edition, 2011) established by the National Institutes of Health. The mice were maintained in the Laboratory Animal Center of CUHK. Male and female mice aged 14-20 weeks were used for experiments. An analysis on the influence of sex and gender on the results was not done since their blood samples were pooled together.
In example murine sample collection, mice were euthanized and exsanguinated by cardiac puncture. Whole blood was placed into EDTA-containing tubes (1.3 mL K3E microtubes from Sarstedt) and immediately separated by a double centrifugation protocol (1,600×g for 10 minutes at 4° C., then recentrifugation of the plasma at 16,000×g for 10 minutes at 4° C.) (Chiu et al., 2001). Plasma from 3-4 mice was collected into each pool, yielding 1.1-1.9 mL plasma per pool. In total, we created 6 pools of WT from plasma of 20 WT mice, 6 pools of Dnase1l3−/− from plasma of 20 Dnase1l3−/− mice, and 2 pools of Dnase1−/− from plasma of 8 Dnase1−/− mice.
In example human subjects, 3 subjects with DNASE1L3 deficiency (H2, H4, and V11) and 1 heterozygous parent (H1) were recruited from the Istituto Giannina Gaslini (Italy) and The Hospital for Sick Children (SickKids) (Canada) with written informed consent. The 3 DNASE1L3-deficient subjects (H2, H4, and V11) have a homozygous frameshift c.290_291delCA (p.Thr97Ilefs*2) mutation, and H1 is the heterozygous parent of H2 and H4. Plasma data of 8 healthy individuals from a previously published dataset were used as controls (Chan et al., 2013). Plasma was collected for all human subjects, but paired buffy coat was available only for H1, H2, and H4. The study was approved by the Joint Chinese University of Hong Kong-Hospital Authority New Territories East Cluster Clinical Research Ethics Committee, the ethics committee of the Istituto Giannina Gaslini (approval BIOL 6/5/04), and the SickKids Research Ethics Board.
A. DNA Extraction and Bisulfite DNA Sequencing
In an example, plasma DNA was extracted with the QIAamp Circulating Nucleic Acid Kit (Qiagen), and buffy coat DNA was extracted with the QIAamp DNA Blood Mini Kit (Qiagen) then sonicated to a median size of 350 bp (Covaris). Indexed DNA libraries were constructed using the TruSeq DNA Nano Library Prep Kit (Illumina) with bisulfite modification using the EpiTect Bisulfite Kit (Qiagen). The bisulfite-converted DNA libraries were enriched with 12 cycles of PCR and analyzed on Agilent 4200 TapeStation (Agilent Technologies) using the High Sensitivity D1000 ScreenTape System (Agilent Technologies) for quality control and gel-based size determination. Libraries were quantified by the Qubit dsDNA high sensitivity assay kit (Thermo Fisher Scientific) before sequencing. 2×75 bp paired-end sequencing was performed on the HiSeq 4000 platform (Illumina) for the plasma libraries and on the NextSeq 500 platform (Illumina) for the buffy coat libraries.
B. Quality Control, Trimming, and Alignment of Bisulfite Sequencing Data
In an example, sequences were assigned to their corresponding samples based on their six-base index sequence. The adapter sequences were removed, and low-quality bases with a Phred score below 20 were trimmed from the paired-end bisulfite sequencing reads. Cleaned reads were aligned to the reference genome (mouse: NCBI MGSCv37/UCSC mm9; human: NCBI GRCh37/UCSC hg19; non-repeat-masked) with a maximum of two mismatches. Paired-end reads sharing the same start and end genomic coordinates were deemed PCR duplicates and were discarded from downstream analysis. The methylation densities of all CpG sites across the genome were generated by Methy-Pipe (Jiang et al., 2010).
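The duplicate filter described above can be sketched as follows. The record layout (chromosome, start, end, followed by any further fields) and the name `remove_pcr_duplicates` are assumptions for illustration; the sketch keeps the first fragment seen for each coordinate pair.

```python
def remove_pcr_duplicates(fragments):
    """Keep one fragment per (chromosome, start, end) coordinate triple.

    fragments -- iterable of tuples whose first three fields are
                 chromosome, start coordinate, and end coordinate
    """
    seen = set()
    kept = []
    for fragment in fragments:
        key = tuple(fragment[:3])
        if key not in seen:
            seen.add(key)
            kept.append(fragment)
    return kept
```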
C. Calculation of End Density and Methylation Level Around Different Regions
In an example, RNA polymerase II (Pol II), H3K4me3, and H3K27ac regions were downloaded from the Human and Mouse ENCODE project (Shen et al., 2012; Dunham et al., 2012). The transcriptional start sites (TSSs) of all genes and the CpG islands (CGIs) were downloaded from UCSC. 10,000 non-overlapping regions of 10,000 bp length were randomly selected across the whole genome by BEDTools (v2.27.1) (Quinlan and Hall 2010). Using a visualization window size of ±1000 bp, the fragment end counts were normalized by the median end count in the ±3000 bp region to obtain the normalized end density. The methylation level of these regions was calculated from the CpG sites in the corresponding regions. The respective sample medians were calculated and plotted.
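The normalization can be sketched as below, assuming that fragment ends have already been aggregated as signed offsets (in bp) from the center of each region. The function name `normalized_end_density` and the choice to raise an error when the median is zero are illustrative assumptions.

```python
from collections import Counter
from statistics import median


def normalized_end_density(end_offsets, view=1000, norm=3000):
    """End counts in the +/-view window divided by the median per-position
    end count in the +/-norm window.

    end_offsets -- signed distances of fragment ends from the region center
    Returns a dict mapping each offset in [-view, view] to its density.
    """
    counts = Counter(o for o in end_offsets if -norm <= o <= norm)
    per_position = [counts.get(o, 0) for o in range(-norm, norm + 1)]
    baseline = median(per_position)
    if baseline == 0:
        raise ValueError("median end count is zero; coverage too low to normalize")
    return {o: counts.get(o, 0) / baseline for o in range(-view, view + 1)}
```

A value of 1 then corresponds to the typical end count in the flanking region, so enrichment or depletion of ends around the region center can be read directly from the profile.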
D. cfDNA Size of 0% and 100% Methylated Fragments
In an example, the genome coordinates of the aligned ends were used to deduce the size of the whole fragment of the sequenced cfDNA. To identify 0% and 100% methylated fragments, fragments with three or more CpG sites were used to calculate the methylation percentage. Those with zero out of at least three CpGs methylated were labelled as a 0% methylated fragment, and those with all out of at least three CpGs methylated were labelled as 100% methylated fragments. The median size of each genotype in these fragment types was plotted.
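The fragment labelling can be sketched as below. The representation of each fragment as a size plus a list of per-CpG calls (1 for methylated, 0 for unmethylated) and the function names are assumptions for illustration.

```python
from statistics import median


def classify_fragment(cpg_calls, min_cpgs=3):
    """Label a fragment '0%' or '100%' methylated, or None otherwise.

    cpg_calls -- list of 1 (methylated) / 0 (unmethylated), one per CpG
    """
    if len(cpg_calls) < min_cpgs:
        return None  # too few CpGs to call
    methylated = sum(cpg_calls)
    if methylated == 0:
        return "0%"
    if methylated == len(cpg_calls):
        return "100%"
    return None  # partially methylated


def median_size_by_class(fragments):
    """Median fragment size for each label; fragments are (size, cpg_calls)."""
    sizes = {}
    for size, calls in fragments:
        label = classify_fragment(calls)
        if label is not None:
            sizes.setdefault(label, []).append(size)
    return {label: median(values) for label, values in sizes.items()}
```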
E. OCR and CGI Fragment Analysis
In an example, the regions ±500 bp around the centers of the TSS, Pol II, H3K4me3, and H3K27ac regions were merged with the CGI regions. Fragments were considered to be within these regions if at least one base overlapped with these regions. The fragment percentage and the size profile of the fragments within these regions were calculated, and the methylation level and size profile were recalculated after masking these regions. For the circos plot, the reference genome was split into 1 Mb bins, and each dot in the circos plot represents the methylation level of each bin deduced from all the CpG sites within the 1 Mb bin.
F. Analysis of Putatively Methylated and Unmethylated CpGs
In example murine models, whole-genome bisulfite sequencing (WGBS) data for 8 mouse tissues with 2 biological replicates were obtained from the ENCODE portal (https://www.encodeproject.org/) using the following identifiers: ENCFF874IPH, ENCFF249MKR, ENCFF916JME, ENCFF012ENO, ENCFF283GDL, ENCFF348XNA, ENCFF978EJO, ENCFF282MIR, ENCFF779LLA, ENCFF060ISR, ENCFF853NGK, ENCFF373MDU, ENCFF306KYH, ENCFF663AVX, ENCFF678IZX, ENCFF918TYN, ENCFF098RUM, ENCFF585VLM, ENCFF847MPY, ENCFF980YJZ, ENCFF073OSB, ENCFF804QBF, ENCFF192LZC, ENCFF442AJP, ENCFF541AEY, ENCFF753BBR, ENCFF798LHE, ENCFF082ZSO, ENCFF623FPU, ENCFF422TOH, ENCFF240XBY, ENCFF566GDN, ENCFF340YVI, ENCFF703DEV, ENCFF802SFU, ENCFF306ZPW. WGBS data for 9 human tissues were obtained from the Roadmap Epigenomics Project using the following identifiers: GSM1010983, GSM1010981, GSM983648, GSM983649, GSM1010984, GSM983650, GSM916049, GSM983647, GSM983651, GSM1010987, GSM983645, GSM983646, GSM983652, GSM1120324, GSM1010978, GSM1058027, GSM1059433, GSM1120321. Alignment and methylation analysis of these datasets were performed by Bismark with the ENCODE WGBS single-end pipeline (Krueger and Andrews 2011).
Putatively unmethylated and methylated CpG sites were identified from these datasets with methylation level cutoffs at ≤20% and >90%, respectively. From the mouse dataset, 545,720 putatively methylated CpGs, and 7,140 putatively unmethylated CpGs were identified. From the human dataset, 439,114 putatively methylated CpGs were identified.
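The cutoff rule can be sketched as below. The source does not state how the cutoffs were combined across tissues; this sketch assumes a CpG site must satisfy the cutoff in every dataset considered, with methylation levels given as fractions between 0 and 1. The function name `classify_cpg` is hypothetical.

```python
def classify_cpg(levels, unmeth_max=0.20, meth_min=0.90):
    """Label a CpG site from its methylation levels across datasets.

    levels -- methylation fractions (0 to 1), one per tissue dataset
    Returns 'unmethylated' if every level is <= unmeth_max,
    'methylated' if every level is > meth_min, otherwise None.
    """
    if not levels:
        return None  # no data for this site
    if all(level <= unmeth_max for level in levels):
        return "unmethylated"
    if all(level > meth_min for level in levels):
        return "methylated"
    return None
```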
For the end density analysis, the respective CpG sites were aggregated, and the normalized end density within a ±1000 bp window and a ±20 bp window is shown. The normalized end density is the end count divided by the median end count of the ±1000 bp region. Fragments with any of their bases covering either the C or the G of the identified CpGs were used in the calculation of the CpG methylation at these putatively unmethylated or methylated CpG sites.
G. Statistical Analysis
Analysis was performed by in-house bioinformatics programs, which were written in Perl and R languages. A P value of less than 0.05 was considered statistically significant and all probabilities were two-tailed.
Logic system 6530 may be, or may include, a computer system, ASIC, microprocessor, graphics processing unit (GPU), etc. It may also include or be coupled with a display (e.g., monitor, LED display, etc.) and a user input device (e.g., mouse, keyboard, buttons, etc.). Logic system 6530 and the other components may be part of a stand-alone or network connected computer system, or they may be directly attached to or incorporated in a device (e.g., a sequencing device) that includes detector 6520 and/or assay device 6510. Logic system 6530 may also include software that executes in a processor 6550. Logic system 6530 may include a computer readable medium storing instructions for controlling measurement system 6500 to perform any of the methods described herein. For example, logic system 6530 can provide commands to a system that includes assay device 6510 such that sequencing or other physical operations are performed. Such physical operations can be performed in a particular order, e.g., with reagents being added and removed in a particular order. Such physical operations may be performed by a robotics system, e.g., including a robotic arm, as may be used to obtain a sample and perform an assay.
System 6500 may also include a treatment device 6560, which can provide a treatment to the subject. Treatment device 6560 can determine a treatment and/or be used to perform a treatment. Examples of such treatment can include surgery, radiation therapy, chemotherapy, immunotherapy, targeted therapy, hormone therapy, and stem cell transplant. Logic system 6530 may be connected to treatment device 6560, e.g., to provide results of a method described herein. The treatment device may receive inputs from other devices, such as an imaging device and user inputs (e.g., to control the treatment, such as controls over a robotic system).
Any of the computer systems mentioned herein may utilize any suitable number of subsystems. Examples of such subsystems are shown in
The subsystems shown in
A computer system can include a plurality of the same components or subsystems, e.g., connected together by external interface 81, by an internal interface, or via removable storage devices that can be connected and removed from one component to another component. In some embodiments, computer systems, subsystems, or apparatuses can communicate over a network. In such instances, one computer can be considered a client and another computer a server, where each can be part of a same computer system. A client and a server can each include multiple systems, subsystems, or components.
Aspects of embodiments can be implemented in the form of control logic using hardware circuitry (e.g. an application specific integrated circuit or field programmable gate array) and/or using computer software stored in a memory with a generally programmable processor in a modular or integrated manner, and thus a processor can include memory storing software instructions that configure hardware circuitry, as well as an FPGA with configuration instructions or an ASIC. As used herein, a processor can include a single-core processor, multi-core processor on a same integrated chip, or multiple processing units on a single circuit board or networked, as well as dedicated hardware. Based on the disclosure and teachings provided herein, a person of ordinary skill in the art will know and appreciate other ways and/or methods to implement embodiments of the present disclosure using hardware and a combination of hardware and software.
Any of the software components or functions described in this application may be implemented as software code to be executed by a processor using any suitable computer language such as, for example, Java, C, C++, C#, Objective-C, Swift, or a scripting language such as Perl or Python using, for example, conventional or object-oriented techniques. The software code may be stored as a series of instructions or commands on a computer readable medium for storage and/or transmission. A suitable non-transitory computer readable medium can include random access memory (RAM), a read only memory (ROM), a magnetic medium such as a hard-drive or a floppy disk, or an optical medium such as a compact disk (CD) or DVD (digital versatile disk) or Blu-ray disk, flash memory, and the like. The computer readable medium may be any combination of such devices. In addition, the order of operations may be re-arranged. A process can be terminated when its operations are completed, but could have additional steps not included in a figure. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, etc. When a process corresponds to a function, its termination may correspond to a return of the function to the calling function or the main function.
Such programs may also be encoded and transmitted using carrier signals adapted for transmission via wired, optical, and/or wireless networks conforming to a variety of protocols, including the Internet. As such, a computer readable medium may be created using a data signal encoded with such programs. Computer readable media encoded with the program code may be packaged with a compatible device or provided separately from other devices (e.g., via Internet download). Any such computer readable medium may reside on or within a single computer product (e.g. a hard drive, a CD, or an entire computer system), and may be present on or within different computer products within a system or network. A computer system may include a monitor, printer, or other suitable display for providing any of the results mentioned herein to a user.
Any of the methods described herein may be totally or partially performed with a computer system including one or more processors, which can be configured to perform the steps. Thus, embodiments can be directed to computer systems configured to perform the steps of any of the methods described herein, potentially with different components performing a respective step or a respective group of steps. Although presented as numbered steps, steps of methods herein can be performed at a same time or at different times or in a different order. Additionally, portions of these steps may be used with portions of other steps from other methods. Also, all or portions of a step may be optional. Additionally, any of the steps of any of the methods can be performed with modules, units, circuits, or other means of a system for performing these steps.
The specific details of particular embodiments may be combined in any suitable manner without departing from the spirit and scope of embodiments of the disclosure. However, other embodiments of the disclosure may be directed to specific embodiments relating to each individual aspect, or specific combinations of these individual aspects.
The above description of example embodiments of the present disclosure has been presented for the purposes of illustration and description. It is not intended to be exhaustive or to limit the disclosure to the precise form described, and many modifications and variations are possible in light of the teaching above.
A recitation of “a”, “an” or “the” is intended to mean “one or more” unless specifically indicated to the contrary. The use of “or” is intended to mean an “inclusive or,” and not an “exclusive or” unless specifically indicated to the contrary. Reference to a “first” component does not necessarily require that a second component be provided. Moreover, reference to a “first” or a “second” component does not limit the referenced component to a particular location unless expressly stated. The term “based on” is intended to mean “based at least in part on.”
All patents, patent applications, publications, and descriptions mentioned herein are incorporated by reference in their entirety for all purposes. None is admitted to be prior art. Where a conflict exists between the instant application and a reference provided herein, the instant application shall dominate.
This application claims the benefit of U.S. Provisional Patent Application No. 63/315,468, entitled “CELL-FREE DNA METHYLATION AND NUCLEASE-MEDIATED FRAGMENTATION,” filed on Mar. 1, 2022, and U.S. Provisional Patent Application No. 63/172,542, entitled “CELL-FREE DNA METHYLATION AND NUCLEASE-MEDIATED FRAGMENTATION,” filed on Apr. 8, 2021, both of which are hereby incorporated by reference in their entirety and for all purposes.
Number | Date | Country
---|---|---
63315468 | Mar 2022 | US
63172542 | Apr 2021 | US