Cell-free DNA (cfDNA) is a rich source of information that can be applied to the diagnosis and prognostication of many physiological and pathological conditions such as pregnancy and cancer (Chan, K. C. A. et al. (2017), New England Journal of Medicine 377, 513-522; Chiu, R. W. K. et al. (2008), Proceedings of the National Academy of Sciences of the United States of America 105, 20458-20463; Lo, Y. M. D. et al., (1997), The Lancet 350, 485-487). Though circulating cfDNA is now commonly used as a non-invasive biomarker and is known to circulate in the form of short fragments, the physiological factors governing the fragmentation and molecular profile of cfDNA remain elusive.
Recent work has suggested that the fragmentation of cfDNA is a non-random process associated with the positioning of nucleosomes (Chandrananda, D. et al., (2015), BMC Medical Genomics 8, 29; Ivanov, M. et al., (2015), BMC Genomics 16, 51; Lo, Y. M. D. et al. (2010), Science Translational Medicine 2, 61ra91-61ra91; Snyder, M. W. et al., (2016), Cell 164, 57-68; Sun, K. et al., (2019), Genome Research 29, 418-427). We previously demonstrated that the DNASE1L3 nuclease contributes to the size profile of cfDNA in plasma (Serpas, L. et al. (2019), Proceedings of the National Academy of Sciences 116, 641-649).
Various embodiments use quantitative fragmentation information of cell-free DNA (cfDNA) for detecting a genetic disorder in a gene associated with a nuclease, for determining an efficacy of a dosage of an anticoagulant, and for monitoring an activity of a nuclease. Measured parameter values can be compared to a reference value to determine classifications of a genetic disorder, efficacy, or activity. An amount of a particular base (e.g., in an end motif) at fragment ends, an amount of a particular base at fragment ends of a particular size, or a total amount of cell-free DNA fragments (e.g., as a concentration) can be used. Certain samples may be treated with an anticoagulant, and different incubation times can be used in some embodiments.
Some embodiments are provided for detecting a genetic disorder for a gene, e.g., using an amount of a particular base at fragment ends relative to a reference value, using an amount of a particular base at fragment ends of a particular size in a sample treated with an anticoagulant, and comparing amounts of a particular base at fragment ends for samples incubated with an anticoagulant over different times.
Some embodiments are provided for determining an efficacy of a dosage of an anticoagulant, e.g., using an amount of a particular base at fragment ends in a sample of a subject administered an anticoagulant and using an amount of a particular base at fragment ends of a particular size in a sample of a subject administered an anticoagulant.
Some embodiments are provided for monitoring an activity of a nuclease, e.g., using an amount of a particular base at fragment ends in a sample relative to a reference value and using an amount of a particular base at fragment ends of a particular size in a sample.
These embodiments and other embodiments of the disclosure are described in detail below. For example, other embodiments are directed to systems, devices, and computer readable media associated with methods described herein.
A better understanding of the nature and advantages of embodiments of the present disclosure may be gained with reference to the following detailed description and the accompanying drawings.
A “tissue” corresponds to a group of cells that group together as a functional unit. More than one type of cell can be found in a single tissue. Different types of tissue may consist of different types of cells (e.g., hepatocytes, alveolar cells or blood cells), but also may correspond to tissue from different organisms (mother vs. fetus) or to healthy cells vs. tumor cells. “Reference tissues” can correspond to tissues used to determine tissue-specific methylation levels. Multiple samples of a same tissue type from different individuals may be used to determine a tissue-specific methylation level for that tissue type.
A “biological sample” refers to any sample that is taken from a subject (e.g., a human (or other animal), such as a pregnant woman, a person with cancer or other disorder, or a person suspected of having cancer or other disorder, an organ transplant recipient or a subject suspected of having a disease process involving an organ (e.g., the heart in myocardial infarction, or the brain in stroke, or the hematopoietic system in anemia)) and contains one or more nucleic acid molecule(s) of interest. The biological sample can be a bodily fluid, such as blood, plasma, serum, urine, vaginal fluid, fluid from a hydrocele (e.g. of the testis), vaginal flushing fluids, pleural fluid, ascitic fluid, cerebrospinal fluid, saliva, sweat, tears, sputum, bronchoalveolar lavage fluid, discharge fluid from the nipple, aspiration fluid from different parts of the body (e.g. thyroid, breast), intraocular fluids (e.g. the aqueous humor), etc. Stool samples can also be used. In various embodiments, the majority of DNA in a biological sample that has been enriched for cell-free DNA (e.g., a plasma sample obtained via a centrifugation protocol) can be cell-free, e.g., greater than 50%, 60%, 70%, 80%, 90%, 95%, or 99% of the DNA can be cell-free. The centrifugation protocol can include, for example, 3,000 g × 10 minutes, obtaining the fluid part, and re-centrifuging at, for example, 30,000 g for another 10 minutes to remove residual cells. As part of an analysis of a biological sample, a statistically significant number of cell-free DNA molecules can be analyzed (e.g., to provide an accurate measurement). In some embodiments, at least 1,000 cell-free DNA molecules are analyzed. In other embodiments, at least 10,000 or 50,000 or 100,000 or 500,000 or 1,000,000 or 5,000,000 cell-free DNA molecules, or more, can be analyzed. At least a same number of sequence reads can be analyzed.
A “sequence read” refers to a string of nucleotides sequenced from any part or all of a nucleic acid molecule. For example, a sequence read may be a short string of nucleotides (e.g., 20-150 nucleotides) sequenced from a nucleic acid fragment, a short string of nucleotides at one or both ends of a nucleic acid fragment, or the sequencing of the entire nucleic acid fragment that exists in the biological sample. A sequence read may be obtained in a variety of ways, e.g., using sequencing techniques or using probes, e.g., in hybridization arrays or capture probes as may be used in microarrays, or amplification techniques, such as the polymerase chain reaction (PCR) or linear amplification using a single primer or isothermal amplification. As part of an analysis of a biological sample, at least 1,000 sequence reads can be analyzed. As other examples, at least 10,000 or 50,000 or 100,000 or 500,000 or 1,000,000 or 5,000,000 sequence reads, or more, can be analyzed.
A sequence read can include an “ending sequence” associated with an end of a fragment. The ending sequence can correspond to the outermost N bases of the fragment, e.g., 1-30 bases at the end of the fragment. If a sequence read corresponds to an entire fragment, then the sequence read can include two ending sequences. When paired-end sequencing provides two sequence reads that correspond to the ends of the fragments, each sequence read can include one ending sequence.
A “sequence motif” may refer to a short, recurring pattern of bases in DNA fragments (e.g., cell-free DNA fragments). A sequence motif can occur at an end of a fragment, and thus be part of or include an ending sequence. An “end motif” can refer to a sequence motif for an ending sequence that preferentially occurs at ends of DNA fragments, potentially for a particular type of tissue. An end motif may also occur just before or just after ends of a fragment, thereby still corresponding to an ending sequence. A nuclease can have a specific cutting preference for a particular end motif, as well as a second most preferred cutting preference for a second end motif.
The term “alleles” refers to alternative DNA sequences at the same physical genomic locus, which may or may not result in different phenotypic traits. In any particular diploid organism, with two copies of each chromosome (except the sex chromosomes in a male human subject), the genotype for each gene comprises the pair of alleles present at that locus, which are the same in homozygotes and different in heterozygotes. A population or species of organisms typically includes multiple alleles at each locus among various individuals. A genomic locus where more than one allele is found in the population is termed a polymorphic site. Allelic variation at a locus is measurable as the number of alleles (i.e., the degree of polymorphism) present, or the proportion of heterozygotes (i.e., the heterozygosity rate) in the population. As used herein, the term “polymorphism” refers to any inter-individual variation in the human genome, regardless of its frequency. Examples of such variations include, but are not limited to, single nucleotide polymorphisms, simple tandem repeat polymorphisms, insertion-deletion polymorphisms, mutations (which may be disease causing) and copy number variations. The term “haplotype” as used herein refers to a combination of alleles at multiple loci that are transmitted together on the same chromosome or chromosomal region. A haplotype may refer to as few as one pair of loci or to a chromosomal region, or to an entire chromosome or chromosome arm.
A “relative frequency” (also referred to just as “frequency”) may refer to a proportion (e.g., a percentage, fraction, or concentration). In particular, a relative frequency of a particular end motif (e.g., CCGA or just a single base) can provide a proportion of cell-free DNA fragments in a sample that are associated with the end motif CCGA, e.g., by having an ending sequence of CCGA.
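The definition above can be illustrated with a minimal sketch. It assumes the ending sequences have already been extracted as plain strings read from the 5′ end; the function name `end_motif_frequencies` and the example data are hypothetical and are not taken from the disclosure.

```python
from collections import Counter

def end_motif_frequencies(ending_sequences, k=1):
    """Return the relative frequency of each k-base end motif.

    ending_sequences: iterable of strings, each starting at a fragment end.
    k: motif length (k=1 counts only the ending base).
    """
    # Keep only sequences long enough to supply a full k-base motif.
    motifs = [seq[:k].upper() for seq in ending_sequences if len(seq) >= k]
    counts = Counter(motifs)
    total = sum(counts.values())
    if total == 0:
        return {}
    # Normalize each raw count by the total number of counted ends.
    return {motif: n / total for motif, n in counts.items()}

# Hypothetical ending sequences, for illustration only.
example_ends = ["CCGA", "CCGA", "ACGT", "CCCA"]
freqs_4mer = end_motif_frequencies(example_ends, k=4)
freqs_1mer = end_motif_frequencies(example_ends, k=1)
```

With this toy input, half of the ends carry the 4-mer CCGA and three quarters carry a C as the ending base.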
An “aggregate value” may refer to a collective property, e.g., of relative frequencies of a set of end motifs. Examples include a mean, a median, a sum of relative frequencies, a variation among the relative frequencies (e.g., entropy, standard deviation (SD), the coefficient of variation (CV), interquartile range (IQR) or a certain percentile cutoff (e.g. 95th or 99th percentile) among different relative frequencies), or a difference (e.g., a distance) from a reference pattern of relative frequencies, as may be implemented in clustering.
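As a hedged illustration of two of the aggregate values named above, the following sketch computes an entropy and a coefficient of variation over a set of relative frequencies. The choice of Shannon entropy in bits and of the population standard deviation are assumptions made for the example; the function names and data are hypothetical.

```python
import math
import statistics

def motif_entropy(frequencies):
    """Shannon entropy (in bits) of a set of relative frequencies."""
    # Zero frequencies contribute nothing and are skipped to avoid log(0).
    return -sum(p * math.log2(p) for p in frequencies if p > 0)

def coefficient_of_variation(frequencies):
    """Population standard deviation divided by the mean."""
    mean = statistics.fmean(frequencies)
    return statistics.pstdev(frequencies) / mean

# Hypothetical relative frequencies of the four single-base end motifs.
uniform = [0.25, 0.25, 0.25, 0.25]
skewed = [0.70, 0.10, 0.10, 0.10]

entropy_uniform = motif_entropy(uniform)
entropy_skewed = motif_entropy(skewed)
cv_uniform = coefficient_of_variation(uniform)
cv_skewed = coefficient_of_variation(skewed)
```

A sample in which one ending base dominates has lower entropy and a higher coefficient of variation than a sample in which all four bases are equally represented.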
A “calibration sample” can correspond to a biological sample whose desired measured value (e.g., nuclease activity, classification of a genetic disorder, or other desired property) is known or determined via a calibration method, e.g., using other measurement techniques such as clotting measurements for effective dosage or ELISA for measuring nuclease quantity or assays quantifying the rate of DNA digestion by nucleases for measuring nuclease activity. An example measurement can involve fluorometric or spectrophotometric measurement of cfDNA quantity, which may be done on its own or before, after, and/or in real-time with, the addition of a nuclease-containing sample. Another example is using radial enzyme diffusion methods. A calibration sample can have separate measured values (e.g., an amount of fragments with a particular end motif or with a particular size) to which the desired measured value can be correlated.
A “calibration data point” includes a “calibration value” (e.g., an amount of fragments with a particular end motif or with a particular size) and a measured or known value that is desired to be determined for other test samples. The calibration value can be determined from various types of data measured from DNA molecules of the sample (e.g., an amount of fragments with an end motif or with a particular size). The calibration value corresponds to a parameter that correlates to the desired property, e.g., classification of a genetic disorder, nuclease activity, or efficacy of anticoagulant dosage. For example, a calibration value can be determined from measured values as determined for a calibration sample, for which the desired property is known. The calibration data points may be defined in a variety of ways, e.g., as discrete points or as a calibration function (also called a calibration curve or calibration surface). The calibration function could be derived from additional mathematical transformation of the calibration data points.
A “site” (also called a “genomic site”) corresponds to a single site, which may be a single base position or a group of correlated base positions, e.g., a CpG site, TSS site, DNase hypersensitivity site, or larger group of correlated base positions. A “locus” may correspond to a region that includes multiple sites. A locus can include just one site, which would make the locus equivalent to a site in that context.
A “cfDNA profile” may refer to the relationship of ending sequences (e.g., 1-30 bases) of cfDNA fragments (also just referred to as DNA fragments) in a sample. Various relationships can be provided, e.g., an amount of cfDNA fragments with a particular ending sequence (end motif) or a relative frequency of cfDNA fragments with a particular ending sequence compared to one or more other ending sequences, and other parameters, such as size, can be included. A cfDNA profile can be provided for various sizes of cfDNA fragments. Such a cfDNA profile (sometimes referred to as a cfDNA size profile) can be provided in various ways that illustrate an amount of cfDNA fragments having one or more particular ending sequences for a given size (single length or size range).
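One way to turn discrete calibration data points into a calibration function is piecewise-linear interpolation. The sketch below is an assumption-laden example rather than the method of the disclosure: the interpolation scheme, the clamping at the range boundaries, the function name `make_calibration_function`, and the numeric values are all hypothetical.

```python
from bisect import bisect_left

def make_calibration_function(points):
    """Build a piecewise-linear calibration function.

    points: iterable of (calibration value, known value) pairs with
    distinct calibration values.
    """
    pts = sorted(points)
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]

    def calibrate(x):
        # Clamp outside the calibrated range instead of extrapolating.
        if x <= xs[0]:
            return ys[0]
        if x >= xs[-1]:
            return ys[-1]
        # bisect_left gives i with xs[i - 1] < x <= xs[i].
        i = bisect_left(xs, x)
        x0, x1 = xs[i - 1], xs[i]
        y0, y1 = ys[i - 1], ys[i]
        return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

    return calibrate

# Hypothetical calibration data points:
# (fraction of fragments with a given ending base, activity in arbitrary units).
calibration_points = [(0.20, 10.0), (0.30, 50.0), (0.40, 100.0)]
estimate_activity = make_calibration_function(calibration_points)
estimated = estimate_activity(0.25)
```

A test sample's measured parameter is passed to `estimate_activity` to read off the property of interest from the calibration curve.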
A “separation value” corresponds to a difference or a ratio involving two values, e.g., two fractional contributions or two methylation levels. The separation value could be a simple difference or ratio. As examples, a direct ratio of x/y is a separation value, as well as x/(x+y). The separation value can include other factors, e.g., multiplicative factors. As other examples, a difference or ratio of functions of the values can be used, e.g., a difference or ratio of the natural logarithms (ln) of the two values. A separation value can include a difference and a ratio.
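The forms of separation value listed above can be written out directly. This is a minimal sketch that assumes both inputs are positive numbers; the function name `separation_values` and the example inputs are hypothetical.

```python
import math

def separation_values(x, y):
    """Return several separation values for two positive quantities."""
    return {
        "difference": x - y,                     # simple difference
        "ratio": x / y,                          # direct ratio x/y
        "fraction": x / (x + y),                 # x/(x+y)
        "log_ratio": math.log(x) - math.log(y),  # difference of natural logs
    }

# Hypothetical amounts, e.g., counts of fragments with two different ending bases.
sep = separation_values(30.0, 10.0)
```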
A “separation value” and an “aggregate value” (e.g., of relative frequencies) are two examples of a parameter (also called a metric) that provides a measure of a sample that varies between different classifications (states), and thus can be used to determine different classifications. An aggregate value can be a separation value, e.g., when a difference is taken between a set of relative frequencies of a sample and a reference set of relative frequencies, as may be done in clustering.
The term “classification” as used herein refers to any number(s) or other character(s) that are associated with a particular property of a sample. For example, a “+” symbol (or the word “positive”) could signify that a sample is classified as having deletions or amplifications. The classification can be binary (e.g., positive or negative) or have more levels of classification (e.g., a scale from 1 to 10 or 0 to 1).
The terms “cutoff” and “threshold” refer to predetermined numbers used in an operation. For example, a cutoff size can refer to a size above which fragments are excluded. A threshold value may be a value above or below which a particular classification applies. Either of these terms can be used in either of these contexts. A cutoff or threshold may be “a reference value” or derived from a reference value that is representative of a particular classification or discriminates between two or more classifications. Such a reference value can be determined in various ways, as will be appreciated by the skilled person. For example, metrics can be determined for two different cohorts of subjects with different known classifications, and a reference value can be selected as representative of one classification (e.g., a mean) or a value that is between two clusters of the metrics (e.g., chosen to obtain a desired sensitivity and specificity). As another example, a reference value can be determined based on statistical simulations of samples. A particular value for a cutoff, threshold, reference, etc. can be determined based on a desired accuracy (e.g., a sensitivity and specificity).
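As a simple illustration of deriving a reference value from two cohorts with known classifications, the sketch below takes the midpoint between the two cohort means and then classifies a test value against it. The midpoint rule is an assumption chosen for brevity; in practice a cutoff may instead be tuned to a desired sensitivity and specificity. All names and numbers are hypothetical.

```python
import statistics

def midpoint_reference_value(cohort_a, cohort_b):
    """Reference value halfway between the means of two cohorts."""
    return (statistics.fmean(cohort_a) + statistics.fmean(cohort_b)) / 2.0

def classify(value, reference_value, label_below, label_at_or_above):
    """Assign one of two classifications by comparing to the reference value."""
    return label_below if value < reference_value else label_at_or_above

# Hypothetical metric values (e.g., a fraction of fragments with a given
# ending base) for two cohorts with different known classifications.
cohort_low = [0.20, 0.22, 0.24]
cohort_high = [0.30, 0.32, 0.34]

reference_value = midpoint_reference_value(cohort_low, cohort_high)
call = classify(0.23, reference_value, "classification 1", "classification 2")
```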
A “level of pathology” (or level of a disorder) can refer to the amount, degree, or severity of pathology associated with an organism. An example is a cellular disorder in expressing a nuclease. Another example of pathology is a rejection of a transplanted organ. Other example pathologies can include autoimmune attack (e.g., lupus nephritis damaging the kidney or multiple sclerosis), inflammatory diseases (e.g., hepatitis), fibrotic processes (e.g. cirrhosis), fatty infiltration (e.g. fatty liver diseases), degenerative processes (e.g. Alzheimer's disease) and ischemic tissue damage (e.g., myocardial infarction or stroke). A healthy state of a subject can be considered a classification of no pathology. The pathology can be cancer.
The term “level of cancer” can refer to whether cancer exists (i.e., presence or absence), a stage of a cancer, a size of tumor, whether there is metastasis, the total tumor burden of the body, the cancer's response to treatment, and/or other measure of a severity of a cancer (e.g. recurrence of cancer). The level of cancer may be a number or other indicia, such as symbols, alphabet letters, and colors. The level may be zero. The level of cancer may also include premalignant or precancerous conditions (states). The level of cancer can be used in various ways. For example, screening can check if cancer is present in someone who is not previously known to have cancer. Assessment can investigate someone who has been diagnosed with cancer to monitor the progress of cancer over time, study the effectiveness of therapies or to determine the prognosis. In one embodiment, the prognosis can be expressed as the chance of a patient dying of cancer, or the chance of the cancer progressing after a specific duration or time, or the chance or extent of cancer metastasizing. Detection can mean ‘screening’ or can mean checking if someone, with suggestive features of cancer (e.g. symptoms or other positive tests), has cancer.
The term “about” or “approximately” can mean within an acceptable error range for the particular value as determined by one of ordinary skill in the art, which will depend in part on how the value is measured or determined, i.e., the limitations of the measurement system. For example, “about” can mean within 1 or more than 1 standard deviation, per the practice in the art. Alternatively, “about” can mean a range of up to 20%, up to 10%, up to 5%, or up to 1% of a given value. Alternatively, particularly with respect to biological systems or processes, the term “about” or “approximately” can mean within an order of magnitude, within 5-fold, and more preferably within 2-fold, of a value. Where particular values are described in the application and claims, unless otherwise stated the term “about” meaning within an acceptable error range for the particular value should be assumed. The term “about” can have the meaning as commonly understood by one of ordinary skill in the art. The term “about” can refer to ±10%. The term “about” can refer to ±5%.
Cell-free DNA (cfDNA) is a powerful non-invasive biomarker for cancer and prenatal testing and circulates in plasma (as well as other cell-free samples) as short fragments. In this disclosure, to elucidate the biology of cfDNA fragmentation, we investigated the respective roles of DNASE1, DNASE1L3, and DNA fragmentation factor subunit beta (DFFB, also known as Caspase-Activated DNase) using mice deficient in each of these nucleases.
In an example analysis, we compared the cfDNA profiles (including cfDNA size profiles) between mice deficient in each type of nuclease and their wildtype counterparts, including the ending base of cfDNA fragments. The ending base of a DNA fragment is a type of end motif, and measurements of relative amounts (e.g., proportions) of cfDNA fragments ending with a particular base can provide information about cfDNA fragments, the source of cfDNA fragments related to the tissue nuclease activity, nuclease function, and disorders affecting nucleases. We found that each nuclease served a different but complementary role in cfDNA fragmentation.
By comparing the ends of cfDNA fragments in each type of nuclease-deficient mice with those in wildtype mice, we show that each nuclease has a specific cutting preference (e.g., a particular end motif) that reveals the stepwise process of cfDNA fragmentation. We demonstrate that cfDNA is first generated intracellularly by DFFB, intracellular DNASE1L3, and other nucleases. Then, cfDNA fragmentation continues extracellularly with circulating DNASE1L3 and DNASE1. Using heparin to disrupt the nucleosomal structure, we also show that the 10 bp periodicity originates from the cutting of DNA within an intact nucleosomal structure. Altogether, this disclosure establishes a model of cfDNA fragmentation.
Various embodiments are provided for detecting a genetic disorder in a gene associated with a nuclease, for determining an efficacy of a dosage of an anticoagulant, and for monitoring an activity of a nuclease.
Various techniques are provided for detecting a genetic disorder for a gene, e.g., using an amount of a particular base at fragment ends relative to a reference value, using an amount of a particular base at fragment ends of a particular size in a sample treated with an anticoagulant, and comparing amounts of a particular base at fragment ends for samples incubated with an anticoagulant over different times.
Various techniques are provided for determining an efficacy of a dosage of an anticoagulant, e.g., using an amount of a particular base at fragment ends in a sample of a subject administered an anticoagulant and using an amount of a particular base at fragment ends of a particular size in a sample of a subject administered an anticoagulant.
Various techniques are provided for monitoring an activity of a nuclease, e.g., using an amount of a particular base at fragment ends in a sample relative to a reference value and using an amount of a particular base at fragment ends of a particular size in a sample.
An end motif relates to the ending sequence of a cell-free DNA fragment, e.g., the sequence for the K bases at either end of the fragment. The ending sequence can be a k-mer having various numbers of bases, e.g., 1, 2, 3, 4, 5, 6, 7, etc. The end motif (or “sequence motif”) relates to the sequence itself as opposed to a particular position in a reference genome. Thus, a same end motif may occur at numerous positions throughout a reference genome. The end motif may be determined using a reference genome, e.g., to identify bases just before a start position or just after an end position. Such bases will still correspond to ends of cell-free DNA fragments, e.g., as they are identified based on the ending sequences of the fragments.
As shown in
At block 120, the DNA fragments are subjected to paired-end sequencing. In some embodiments, the paired-end sequencing can produce two sequence reads from the two ends of a DNA fragment, e.g., 30-120 bases per sequence read. These two sequence reads can form a pair of reads for the DNA fragment (molecule), where each sequence read includes an ending sequence of a respective end of the DNA fragment. In other embodiments, the entire DNA fragment can be sequenced, thereby providing a single sequence read, which includes the ending sequences of both ends of the DNA fragment. The two ending sequences at both ends can still be considered paired sequence reads, even if generated together from a single sequencing operation.
At block 130, the sequence reads can be aligned to a reference genome. This alignment is to illustrate different ways to define a sequence motif, and may not be used in some embodiments. For example, the sequences at the end of a fragment can be used directly without needing to align to a reference genome. However, alignment can be desired to have uniformity of an ending sequence, which does not depend on variations (e.g., SNPs) in the subject. For instance, the ending base could be different from the reference genome due to a variation or a sequencing error, but the base in the reference may be the one counted. Alternatively, the base on the end of the sequence read can be used, so as to be tailored to the individual. The alignment procedure can be performed using various software packages, such as (but not limited to) BLAST, FASTA, Bowtie, BWA, BFAST, SHRiMP, SSAHA2, NovoAlign, and SOAP.
Technique 140 shows a sequence read of a sequenced fragment 141, with an alignment to a genome 145. With the 5′ end viewed as the start, a first end motif 142 (CCCA) is at the start of sequenced fragment 141. A second end motif 144 (TCGA) is at the tail of the sequenced fragment 141. When analyzing the end predominance of cfDNA fragments, this sequence read would contribute to a C-end count for the 5′ end. Such end motifs might, in one embodiment, occur when an enzyme recognizes CCCA and then makes a cut just before the first C. If that is the case, CCCA will preferentially be at the end of the plasma DNA fragment. For TCGA, an enzyme might recognize it, and then make a cut after the A. When a count is determined for the A, this sequence read would contribute to an A-end count.
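Technique 140 can be sketched as reading the motifs straight from the sequenced fragment, with no reference genome involved. In this minimal example the interior bases of the fragment are invented for illustration, and the function name `fragment_end_motifs` is hypothetical; only the two motifs CCCA and TCGA come from the description above.

```python
def fragment_end_motifs(fragment, k=4):
    """Return (start motif, tail motif) read directly from a fragment.

    The fragment is given 5' to 3' on the sequenced strand.
    """
    if len(fragment) < k:
        raise ValueError("fragment is shorter than the motif length")
    return fragment[:k], fragment[-k:]

# First motif CCCA and second motif TCGA as described; the bases in
# between are hypothetical filler.
fragment_141 = "CCCA" + "GTTAGCATGCAT" + "TCGA"
start_motif, tail_motif = fragment_end_motifs(fragment_141)

start_base = start_motif[0]  # outermost base at the start (a C-end)
tail_base = tail_motif[-1]   # outermost base at the tail (an A-end)
```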
Technique 160 shows a sequence read of a sequenced fragment 161, with an alignment to a genome 165. With the 5′ end viewed as the start, a first end motif 162 (CGCC) has a first portion (CG) that occurs just before the start of sequenced fragment 161 and a second portion (CC) that is part of the ending sequence for the start of sequenced fragment 161. A second end motif 164 (CCGA) has a first portion (GA) that occurs just after the tail of sequenced fragment 161 and a second portion (CC) that is part of the ending sequence for the tail of sequenced fragment 161. Such end motifs might, in one embodiment, occur when an enzyme recognizes CGCC and then makes a cut between the G and the following C. If that is the case, CC will preferentially be at the end of the plasma DNA fragment with CG occurring just before it, thereby providing an end motif of CGCC. As for the second end motif 164 (CCGA), an enzyme can cut between C and G. If that is the case, CC will preferentially be at the end of the plasma DNA fragment. For technique 160, the number of bases from the adjacent genome regions and sequenced plasma DNA fragments can be varied and is not necessarily restricted to a fixed ratio, e.g., instead of 2:2, the ratio can be 2:3, 3:2, 4:4, 2:4, etc.
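Technique 160 can be sketched as combining reference bases outside the aligned fragment with bases inside it. The example assumes a 0-based, half-open alignment interval on a reference held as a plain string; the function name `flanking_end_motifs`, its parameters, and the toy reference sequence are hypothetical.

```python
def flanking_end_motifs(reference, start, end, outside=2, inside=2):
    """Return (start motif, tail motif) that span each fragment end.

    reference: reference sequence as a string.
    start, end: 0-based half-open interval [start, end) of the alignment.
    outside: number of reference bases taken from outside the fragment.
    inside: number of bases taken from inside the fragment.
    """
    if start - outside < 0 or end + outside > len(reference):
        raise ValueError("not enough flanking reference sequence")
    if end - start < inside:
        raise ValueError("fragment is shorter than the inside portion")
    start_motif = reference[start - outside:start + inside]
    tail_motif = reference[end - inside:end + outside]
    return start_motif, tail_motif

# Toy reference: CG precedes a fragment that begins and ends with CC,
# and GA follows it. The remaining bases are hypothetical filler.
fragment_161 = "CCATGCATGGATCC"
reference_165 = "TTCG" + fragment_161 + "GATT"
start, end = 4, 4 + len(fragment_161)

motif_162, motif_164 = flanking_end_motifs(reference_165, start, end)
# A 2:3 split between outside and inside bases.
motif_2_3, _ = flanking_end_motifs(reference_165, start, end, outside=2, inside=3)
```

Changing `outside` and `inside` gives the other splits mentioned above, such as 3:2 or 4:4, provided enough flanking reference sequence is available.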
The higher the number of nucleotides included in the cell-free DNA end signature, the higher the specificity of the motif because the probability of having 6 bases ordered in an exact configuration in the genome is lower than the probability of having 2 bases ordered in an exact configuration in the genome. Thus, the choice of the length of the end motif can be governed by the needed sensitivity and/or specificity of the intended use application.
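The effect of motif length on specificity can be made concrete under a simplifying assumption that the four bases are independent and equally likely, which real genomes only approximate. The function name `chance_match_probability` is hypothetical.

```python
def chance_match_probability(k):
    """Probability that k bases match one specific k-mer by chance,
    assuming independent and equally likely bases (a simplification)."""
    return 0.25 ** k

p_2mer = chance_match_probability(2)  # 1 in 16
p_6mer = chance_match_probability(6)  # 1 in 4,096
```

Under this assumption a specific 6-base motif is 256 times less likely to occur by chance than a specific 2-base motif.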
As the ending sequence is used to align the sequence read to the reference genome, any sequence motif determined from the ending sequence or just before/after is still determined from the ending sequence. Thus, technique 160 makes an association of an ending sequence to other bases, where the reference is used as a mechanism to make that association. A difference between techniques 140 and 160 would be to which two end motifs a particular DNA fragment is assigned, which affects the particular values for the relative frequencies. However, the overall result (e.g., detecting a genetic disorder, determining efficacy of a dosage, monitoring activity of a nuclease, etc.) would not be affected by how a DNA fragment is assigned to an end motif, as long as a consistent technique is used, e.g., for any training data to determine a reference value, as may occur using a machine learning model.
The DNA fragments having an ending sequence corresponding to a particular end motif (e.g., a particular base) may be counted (e.g., with counts stored in an array in memory) to determine an amount of the particular end motif. The amount can be measured in various ways, such as a raw count or a frequency, where the amount is normalized. The normalization may be done using (e.g., dividing by) a total number of DNA fragments or a number in a specified group of DNA fragments (e.g., from a specified region, having a specified size, or having one or more specified end motifs). Differences in amounts of end motifs have been detected when a genetic disorder exists, as well as when an effective dose of an anticoagulant has been administered, as well as when the activity of a nuclease changes (e.g., increases or decreases).
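The counting and normalization just described can be sketched as follows, here normalizing either by all fragments or by the fragments in a specified size range. The input layout (one `(ending base, size)` pair per fragment end), the function name `end_base_amounts`, and the data are assumptions made for the example.

```python
from collections import Counter

def end_base_amounts(fragments, size_range=None):
    """Count ending bases and normalize within the selected group.

    fragments: iterable of (ending base, fragment size in bp) pairs.
    size_range: optional (low, high) inclusive bounds; when given, only
        fragments in that range are counted and used for normalization.
    Returns (raw counts, normalized frequencies).
    """
    selected = list(fragments)
    if size_range is not None:
        low, high = size_range
        selected = [(b, s) for b, s in selected if low <= s <= high]
    counts = Counter(base for base, _ in selected)
    total = sum(counts.values())
    freqs = {b: n / total for b, n in counts.items()} if total else {}
    return counts, freqs

# Hypothetical fragment ends, for illustration only.
example = [("C", 166), ("C", 150), ("A", 320), ("G", 170), ("A", 260)]

counts_all, freqs_all = end_base_amounts(example)
counts_short, freqs_short = end_base_amounts(example, size_range=(100, 200))
```

The same raw counts give different frequencies depending on the normalizing group, so the group used for a test sample should match the one used for the reference value.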
Circulating cfDNA can be found directly from a sample obtained from a subject, e.g., blood or plasma. Such circulating cfDNA exists in cell-free form in the body. Thus, the cell-free DNA was produced (e.g., via apoptosis or necrosis) from cells within the body, and then the cell-free DNA began to circulate (e.g., in blood). In contrast, fresh cfDNA is obtained from cells from the body, and then the cell-free DNA is generated while the cell is outside the body, e.g., by having the cell die in any of various ways, such as incubation. Differences in preferred ending sequence(s) were observed.
A. C-End Preference in Typical Circulating cfDNA
We analyzed the base content proportions at the 5′ end of cfDNA fragments in different genomic regions in wildtype (WT) mice to test the hypothesis that cfDNA fragmentation is not random. For blood samples, EDTA can be used as an anticoagulant and inhibit plasma nucleases to preserve the size profile, frequencies of end motifs, and the concentration of cell-free DNA relatively close to an initial state when kept at cool temperatures, e.g., standard refrigerator temperatures, such as between −5° C. and 20° C. If incubated at a higher temperature (e.g., room temperature), fresh cfDNA will be generated at an amount dependent on the amount of incubation time. A time of 0 indicates no incubation at room temperature.
1. Defining Base Content Percentage for End Motif of Fragments
A vertical line 260 illustrates how a percentage is determined for each position. The percentage is of reads labeled with a particular base, which as mentioned above, corresponds to the ending base at the 5′ end. Thus, the calculation of the percentage at a given position uses all of the fragments that end at that position. In
2. End Base Content Relative to General Base Content of Reference
Accordingly, this pattern of asymmetric representation was also seen in cfDNA aligning to TSS and Pol II regions. Because CTCF regions contain an array of well-positioned nucleosomes flanking the CTCF binding site and because TSS and Pol II regions are known open chromatin regions, both nucleosomal and open regions of the genome display the same C-end overrepresentation.
3. End Base Content for Different Fragment Sizes
B. Fragmentation Pattern in Fresh cfDNA (e.g., for DFFB)
Fresh DNA can be obtained from cells in a whole blood sample, where the cells are caused to die by incubating the whole blood at room temperature in EDTA for a period of time. In this manner, the resulting plasma sample can be enriched for fresh DNA.
We explored whether this typical cfDNA profile (i.e., as shown in the previous section) was created ‘as is’ from cellular sources or produced after further digestion within the plasma. Thus, we sought to capture and analyze cfDNA that was freshly generated from dying cells and to compare its profile with the typical C-end predominant cfDNA profile shown above.
1. Changes in Amounts of cfDNA with Incubation
Such behavior in
2. A-End and G-End Preference in Fresh cfDNA
Besides an increase in cfDNA as a result of incubation with EDTA, changes in base end content were also investigated. The incubation of the blood sample with EDTA results in increases in the A-end and G-end content relative to the typical base end content in blood samples that have not been incubated. This increase is seen in various regions, including random regions, CTCF regions, TSS regions, and Pol II regions.
Therefore, fresh cfDNA after whole blood incubation was enriched for A- and G-end fragments when compared to typical cfDNA. Since the fresh cfDNA profile from dying cells does not appear similar to the typical C-end predominant cfDNA found in baseline samples, we inferred that the typical C-end predominant cfDNA would be created in a subsequent step. Since the fragment end preference (e.g., for enrichment of A-ends) after incubation is different (e.g., A-end vs C-end), we also reasoned that the generation of fresh cfDNA likely originated from a different mechanism than that which created the typical cfDNA. The enrichment for A-ends occurs in longer cfDNA as shown in later sections.
3. A-Ends and G-Ends Among Fresh cfDNA of Different Sizes
We also explored the base end preference by fragment size. We identified fragments by their two end nucleotides and analyzed the fragments in which both ends terminated with A, G, C, or T. These fragments where both ends were identified were denoted with their end nucleotides and the symbol < > in between, such that a fragment with both ends as A would be designated as A< >A. We compared the proportional representation of A< >A, G< >G, C< >C, and T< >T fragments among different sizes, reasoning that any preference for cutting a particular nucleotide would be best visualized with these fragment types where both ends encompassed the same nucleotide preference. Of these four types of fragments, 6 h samples enriched with fresh cfDNA had a significantly higher proportion of A< >A fragments at sizes >150 bp, and the proportion increased further in long fragments ≥250 bp. On the other hand, G< >G, C< >C, and T< >T fragments did not differ significantly by size. Thus, fresh cfDNA was enriched for A-end fragments that were longer than 150 bp.
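As an illustration only, the tabulation of same-base fragment types by size described above could be sketched as follows. The input format (tuples of the two end bases and the fragment size), the function name, the size bins, and the choice of all fragments in a bin as the denominator are assumptions made for this sketch and are not the analysis pipeline used to generate the data.

```python
from collections import Counter

def same_end_proportions(fragments, bins):
    """Proportion of X< >X fragments (both ends the same base) per size bin.

    fragments: iterable of (end1_base, end2_base, size_bp) tuples
               (hypothetical input format).
    bins: list of (low, high) size ranges in bp; low inclusive, high exclusive.
    The denominator is every fragment in the bin (an assumption).
    """
    totals = Counter()
    same = Counter()
    for base1, base2, size in fragments:
        for low, high in bins:
            if low <= size < high:
                totals[(low, high)] += 1
                if base1 == base2:
                    same[((low, high), base1)] += 1
                break
    result = {}
    for size_bin in bins:
        n = totals[size_bin]
        result[size_bin] = {
            f"{b}< >{b}": (same[(size_bin, b)] / n if n else 0.0)
            for b in "ACGT"
        }
    return result
```

Comparing the returned proportions between a 0 h sample and a 6 h sample, bin by bin, would mirror the comparison described in this section.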
Surprisingly, the increase in long A-end fragments was concentrated at specific size ranges, with peaks at ˜200 bp and 400 bp that were reminiscent of nucleosomal ladder sizes. G-end fragments also had a similar but weaker periodicity at these sizes. We hypothesized that these A-end (and G-end) cfDNA fragments were likely created by cleaving between nucleosomes, such that the full length of an intact nucleosomal DNA was retained. The peaks in periodicity would support a true preference for cutting at the inter-nucleosomal regions 5′ to an A with a slightly smaller preference for cutting 5′ to a G.
4. Effects of DFFB on cfDNA with A-Ends
Since A-end long fragments were generated freshly from dying cells, we examined the role of apoptosis in their generation. Since DFFB is the major intracellular nuclease involved in DNA fragmentation during apoptosis, we investigated samples from Dffb-deficient mice, which have that gene knocked out in both alleles, signified by Dffb−/−.
We further investigated the overall change in cfDNA after incubation and for fragment size, as well as for different regions. There was essentially no change after incubation.
If the change in
While the above analysis characterizes the end base content and size profiles of freshly generated cfDNA, this section analyzes the process in which the typical C-end predominance was produced in plasma cfDNA. This clear preference for C-ends in all sizes of circulating cfDNA fragments seen in
In
These results suggest that DNASE1L3 generates both C- and T-end fragments, with a greater preference for C-ends since C< >C fragment percentages are more significantly reduced.
Hence, it appeared that DNASE1L3 deficiency resulted in exposing the profile of fresh cfDNA. In a substrate-enzyme-product relationship, when the enzyme is deficient, the product would decrease and the substrate would increase. Thus, DNASE1L3-deficient cfDNA seemed to have revealed its substrate cfDNA profile, which appeared to be the cfDNA profile created by DFFB. This suggests that at least some cutting by DNASE1L3 occurs in circulating blood while DFFB cutting tends to occur within the cell.
With a more detailed look at the fragment types using both ends of a cfDNA fragment, we found that only A< >A, A< >G, and A< >C fragments demonstrated this nucleosomal periodic pattern in both Dnase1l3-deficient samples and WT EDTA 6 h samples enriched with fresh cfDNA.
There were a number of notable differences between the fragments of these two sample types. In Dnase1l3-deficient mice, the periodic pattern of the A< >A, A< >G, and A< >C fragments was very prominent (
On the other hand, the periodic pattern seen in the fresh cfDNA was attenuated, which was especially noticeable amongst A< >C fragments (
While we have demonstrated the steps involved in creating a typical cfDNA fragment with C-end predominance, we also explore how a cfDNA fragment might be further digested, so that a full picture of the homeostasis of cfDNA can be constructed. While C-end fragments continue to be the most prevalent even in short fragments<150 bp, we noted an enrichment of T-end fragments in sizes ˜50-150 bp and ˜250 bp in the typical cfDNA profile (
A. Effect of Deletion in Dnase1
To identify DNASE1's cutting preference, we collected whole blood from Dnase1−/−, Dnase1+/−, and WT mice, pooled the samples within each type, and equally distributed each pool into tubes for 0 h or 6 h incubation with heparin. Heparin was used instead of EDTA since it is known to enhance DNASE1 activity while inhibiting DNASE1L3 (Napirei, M. et al., (2005), The Biochemical journal 389, 355-364). Heparin has also been shown to displace nucleosomes.
To show that this effect is due to Dnase1, the blue curve 1720 (WT heparin 6 h) can be compared to the red curve 1740 (Dnase1−/−, which is plasma collected from mice with homozygous knockout of Dnase1). When Dnase1 is not present, there is no increase in the very short DNA molecules. And there is still an emergence (although less) of the very short DNA molecules in the green curve 1730 for Dnase1+/−, which is heterozygous such that only one allele has the gene missing. The logarithmic plot helps to show the change in the amounts of longer fragments.
Accordingly, embodiments can detect a disorder in Dnase1 (e.g., a deletion) by treating a sample with heparin and comparing the sample to a WT size distribution.
We also examined these samples for a difference in fragment end proportions.
In WT and Dnase1+/− mice after 6 h heparin incubation, T-end fragment proportions increased in fragments sized ˜50-150 bp (
Combining the increase in cfDNA amount in all three genotypes with the literature on heparin incubation inducing apoptosis (Manaster, J. et al., (1996), British Journal of Haematology 94, 48-52), the presence of the A-end DFFB signature from freshly apoptotic cfDNA was consistent. The fresh A-end fragments from DFFB were quickly digested to short T-end fragments (due to heparin enhancement of DNASE1 in WT mice), suggesting that DNASE1 prefers to cut 5′ to T.
B. Periodicity from Fragments Cut from Nucleosomes
We analyzed the periodicity of fragments with EDTA, heparin, and varying time of incubation. The results are consistent with DNASE1 having a preference to cut T-ends, and with heparin disrupting the nucleosome structure in plasma.
Other than the C-end preference for all cfDNA sizes, there was no particular end preference related to the 10 bp period fragments. Thus, it would be unlikely that a single particular nuclease would be responsible for the 10 bp periodicity. In fact, the prevailing theory is that the 10 bp periodicity results from nuclease digestion of DNA within an intact nucleosome. This was postulated from the combined effect of restricted nuclease access to the DNA wrapped around histones and the periodic exposure of one strand of DNA over the other due to the 10 bp per turn structure of the DNA helix (Klug, A., and Lutter, L. C. (1981), Nucleic Acids Res 9, 4267-4283).
A CTCF region is special in that the nucleosomal spacing is very clear. Looking at the gray lines 2320 (EDTA and heparin with no incubation), there is a very good periodicity, but the wave pattern is reduced in the presence of heparin (red line 2310), which disrupts the nucleosomal structure so that cutting may occur at places in the nucleosomal DNA that are usually relatively inaccessible. Accordingly, at the well-phased nucleosomes in the CTCF region, fragment ends within the nucleosome increase with heparin 6 h incubation in WT. Thus, the disrupted nucleosome structure (as a result of heparin incubation) resulted in intra-nucleosomal DNA being cut.
Since the linker areas are already cut by other enzymes (C/G/A ends) and the T-cutting enzyme is a weak competitor, the linker regions are still richer in C/G ends compared with T ends. (This internucleosomal cutting in the cell is still guided by the presence of nucleosomes). However, once the nucleosomes are in plasma and exposed to heparin, the structure gets disrupted, and then the intranucleosomal regions can be cut by the heparin-enhanced DNASE1 with a large T-end preference.
The other bases (i.e., not T) in
The above observations allow a determination of the base end cutting preferences for DFFB, DNASE1, and DNASE1L3, as well as whether the nucleases have a prevalence for cutting within a cell or within an extracellular environment, such as plasma.
From this work on cfDNA fragment ends in different mouse models, we can piece together a model outlining the fragmentation process that generated cfDNA. In our analysis of the newly released cfDNA spontaneously created after incubating whole blood in EDTA, we have demonstrated that the fresh longer cfDNA are enriched for A-end fragments. In particular, A< >A, A< >G, and A< >C fragments demonstrate a strong nucleosomal periodicity at ˜200 bp and 400 bp. When this same experimental model is applied to the whole blood of Dffb-deficient mice, no long A-end fragment enrichment is seen. Thus, we can conclude that DFFB is likely responsible for generating these A-end fragments.
This hypothesis is substantiated by literature published on the DFFB enzyme, which plays a major role in DNA fragmentation during apoptosis (Elmore, S. (2007), Toxicologic pathology 35, 495-516; Larsen, B. D. and Sorensen, C. S. (2017), The FEBS Journal 284, 1160-1170). Enzyme characterization studies have shown that DFFB creates blunt double-strand breaks in open internucleosomal DNA regions with a preference for A and G nucleotides (purines) (Larsen, B. D. and Sorensen, C. S. (2017), The FEBS Journal 284, 1160-1170; Widlak, P., and Garrard, W. T. (2005), Journal of cellular biochemistry 94, 1078-1087; Widlak, P. et al., (2000), The Journal of biological chemistry 275, 8226-8232). This biology of blunt double-stranded cutting only at internucleosomal linker regions would explain the nucleosomal patterning in A< >A, A< >G, and A< >C fragments, e.g., as exemplified by
In this work, we have also demonstrated that typical cfDNA in plasma obtained before incubation predominantly ends in C across all fragment sizes; this C-end overrepresentation is consistent in multiple different regions across the genome. Because the typical profile of cfDNA is so different from fresh cfDNA, we can infer that 1) one or more other nucleases (i.e., other than DFFB) create(s) this profile, 2) this nuclease or these nucleases dominate(s) the cleaving process in typical cfDNA, and 3) this process largely occurs after the generation of fresh A-end fragments (e.g., from DFFB).
Since this C-end predominance is lost in Dnase1l3-deficient mice, we believe that one nuclease responsible for creating this C-end fragment overrepresentation is DNASE1L3. While there is no existing enzymatic study that investigates the specific nucleotide cleavage preference of DNASE1L3, DNASE1L3 is known to cleave chromatin with high efficiency to almost undetectable levels without proteolytic help (Napirei, M. et al., (2009), The FEBS Journal 276, 1059-1073; Sisirak, V. et al. (2016), Cell 166, 88-101). The fairly uniform abundance of C-end fragments among all fragment sizes suggests that DNASE1L3 can cleave all DNA, even intranucleosomal DNA, efficiently.
DNASE1L3 has interesting properties: it is expressed in the endoplasmic reticulum to be secreted extracellularly as one of the major serum nucleases, and it translocates to the nucleus upon cleavage of its endoplasmic reticulum-targeting motif after apoptosis is induced (Errami, Y. et al. (2013), The Journal of biological chemistry 288, 3460-3468; Napirei, M. et al., (2005), The Biochemical journal 389, 355-364). In its role as an apoptotic intracellular endonuclease, it has been suggested that DNASE1L3 cooperates with DFFB in DNA fragmentation (Errami, Y. et al. (2013), The Journal of biological chemistry 288, 3460-3468; Koyama, R. et al., (2016), Genes to Cells 21, 1150-1163). When comparing the fragment end profiles of fresh cfDNA (e.g., in
As a plasma nuclease, DNASE1L3 would help digest the DNA in circulation that had escaped phagocytosis after apoptosis. Hence, DNASE1L3 would likely exert its effect on fragmented cfDNA after intracellular fragmentation had occurred. In a two-step process, inhibiting the second step should reveal the usually transient outcome of the first step (i.e., the intracellular fragmentation). The plasma of Dnase1l3-deficient mice would have this second step of DNASE1L3 action inhibited and expose the cfDNA profile of the first step, the intracellular DNA fragmentation from apoptosis. This is exactly what we found, with the cfDNA fragment profile of Dnase1l3-deficient mice (e.g.,
While we previously found that the size profile of cfDNA from Dnase1-deficient mice did not appear to be substantially different from that of WT mice (
The use of heparin incubation and end analysis have also provided a unique insight into the origin of the 10 bp periodicity. Since every fragment type demonstrates a 10 bp periodicity (
Recently, Watanabe et al. induced in vivo hepatocyte necrosis and apoptosis with acetaminophen overdose and anti-Fas antibody treatments in mice deficient in Dnase1L3 and Dffb (Watanabe, T. et al., (2019), Biochemical and biophysical research communications 516, 790-795). While Watanabe et al. claim to have shown that cfDNA is generated by DNASE1L3 and DFFB, their data only show that serum cfDNA does not appear to increase after hepatocyte injury in Dnase1l3- and Dffb-double knockout mice. Even then, the degree of hepatocyte injury from their methods is hugely variable even in wildtype mice, with surprisingly low correlation with cfDNA amount in their apoptotic anti-Fas antibody experiments. In addition to these inconsistencies, which cast uncertainty on the degree of apoptosis induced in their knockout mice, they provide none of the detail on fragment ends offered in this study.
In this study, we have demonstrated that the typical cfDNA fragment might be created in two major steps: 1) intracellular DNA fragmentation by DFFB, intracellular DNASE1L3, and other apoptotic nucleases, and 2) extracellular DNA fragmentation by serum DNASE1L3. Then, likely with in vivo proteolysis, DNASE1 can further degrade cfDNA into short T-end fragments (compare difference T-end graphs between
With this link between nuclease biology and cfDNA physiology established, there are many important and practical implications for the field of cfDNA. Firstly, aberrations in nuclease biology with pathological consequences may be reflected in abnormal cfDNA profiles (Al-Mayouf et al. (2011), Nat Genet 43, 1186-1188; Jimenez-Alcazar, M. et al. (2017), Science (New York, N.Y.) 358, 1202-1206; Ozcakar, Z. B. et al., (2013), Arthritis Rheum 65, 2183-2189). Secondly, plasma end motif analysis is a powerful approach for investigating cfDNA biology and may have diagnostic applications. Lastly, pre-analytical variables such as anticoagulant type and time delay in blood separation are vital confounders to bear in mind when mining cfDNA for epigenetic and genetic information. Example applications for such cfDNA profiling are described below.
Additionally, even though the data is provided for mice, such biological functionality is common to all organisms that have blood or other cell-free samples.
As described above, various techniques can be used to detect genetic disorders, e.g., associated with a nuclease. The genetic disorders can relate to a mutation (e.g., a deletion) of a nuclease corresponding to a particular gene. Such a mutation can cause the nuclease to not exist or to function in an irregular manner. A normal/reference cfDNA profile (e.g., by fragment ends and/or by size) can be determined for when the genetic disorder does not exist, and a comparison can be made for a new sample. The normal/reference cfDNA profiles can be determined from other subjects or for the same subject, but with different conditions (e.g., sample taken at an earlier time or with a different amount of incubation). Examples of such methods are described in the following flowcharts. Techniques described for one flowchart are applicable to other flowcharts, and are not repeated for the sake of being concise.
A. Detecting Genetic Disorder Using Incubation Over Time
Different amounts of incubation of a sample can result in different cfDNA profiles depending on whether the genetic disorder exists. As a particular cfDNA profile behavior can depend on whether a particular nuclease is expressed and functioning properly, a change in such behavior from normal can indicate that the genetic disorder exists.
At block 2710, first sequence reads obtained from sequencing first cell-free DNA fragments in a first biological sample of a subject are received. Example biological samples are provided herein, e.g., blood, plasma, serum, urine, and saliva. The sequencing may be performed in various ways, e.g., as described herein. Example sequencing techniques include massively parallel sequencing or next-generation sequencing, using single molecule sequencing, and/or using double- or single-stranded DNA sequencing library preparation protocols. The skilled person will appreciate the variety of sequencing techniques that may be used. As part of the sequencing, it is possible that some of the resulting sequence reads may correspond to cellular nucleic acids.
The sequencing may be targeted sequencing as described herein. For example, a biological sample can be enriched for DNA fragments from a particular region, such as CTCF regions, TSS regions, DNase hypersensitivity sites, or Pol II regions. The enriching can include using capture probes that bind to a portion of, or an entire genome, e.g., as defined by a reference genome. As another example, the enriching can use primers to amplify (e.g., via PCR, rolling circle amplification, or multiple displacement amplification (MDA)) certain regions of the genome.
The first biological sample can be treated with an anticoagulant and incubated for a first length of time. The incubation can be at a certain temperature or higher, e.g., above 5°, 10°, 15°, 20°, 25°, or 30° Celsius. Storage at lower temperatures may not count as part of the incubation time. The first length of time can be zero. In other implementations, the first biological sample is incubated for the first length of time without being treated with an anticoagulant. As examples, the anticoagulant can be EDTA or heparin. The EDTA can help to inhibit plasma nucleases (e.g., DNASE1 and DNASE1L3) to preserve cfDNA for analysis.
At block 2720, the first sequence reads are used to determine a first amount of the first cell-free DNA fragments that end with a particular base. The particular base can be determined by identifying an end of the first sequence read corresponding to an end of the fragment, which for paired-end sequencing can be determined using an orientation of the read (e.g., the first base sequenced). A particular fragment end can be used, e.g., the 5′ end or the 3′ end. The first amount can be determined for a particular end motif that includes the particular base. Thus, the first amount can be for a particular ending sequence that may be for more than one base. The first amount is an example of a parameter value.
In some embodiments, the first amount can be for DNA fragments that have a first end motif (e.g., a first base) at one end of the fragment and that have a second end motif (e.g., a second base) at the other end of the fragment.
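A minimal sketch of determining such an amount as a percentage is shown below. It assumes each read is given as a string that begins at the 5′ end of its fragment; the function name and the `end_len` parameter (the motif length) are hypothetical and introduced only for illustration.

```python
from collections import Counter

def end_base_content(reads, end_len=1):
    """Percentage of fragments whose 5' end begins with each motif.

    reads: iterable of read sequences, each assumed to start at the
           5' end of a fragment (assumption for this sketch).
    end_len: motif length; 1 gives single-base end content.
    """
    counts = Counter(
        read[:end_len].upper() for read in reads if len(read) >= end_len
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {motif: 100.0 * n / total for motif, n in counts.items()}
```

Calling the function with `end_len=1` yields the single-base end content, while a larger `end_len` yields amounts for longer end motifs that include the particular base.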
In some implementations, the first cell-free DNA fragments are filtered before determining the first amount, e.g., only fragments from a certain region (e.g., CTCF) may be used to determine the first amount. The first sequence reads may be aligned to a reference genome. Then, a first set of sequence reads can be identified that end at a particular location or at a specified distance from the particular location in the reference genome, where the particular location corresponds to a particular coordinate or a genomic position with a specified property in the reference genome. The first amount can then be determined as an amount of the first set of sequence reads that end with the particular base. The genomic position can be a center of a CTCF region. As other examples, genomic positions can be associated with open chromatin regions, Pol II regions, TSS regions, and/or hypersensitive sites for a particular enzyme (e.g., a particular DNase).
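The region-based filtering can be sketched as follows, under the assumption that alignment has already been performed and each fragment end is available as a (chromosome, coordinate, end base) tuple. The function name and input format are hypothetical.

```python
def amount_near_position(aligned_ends, center, max_distance, base):
    """Percentage of fragment ends near a genomic position that carry `base`.

    aligned_ends: iterable of (chrom, end_coordinate, end_base) tuples
                  (hypothetical, produced by a prior alignment step).
    center: (chrom, coordinate), e.g., the center of a CTCF region.
    max_distance: maximum distance in bp from the center to keep an end.
    """
    chrom, coord = center
    selected = [
        end_base
        for end_chrom, end_coord, end_base in aligned_ends
        if end_chrom == chrom and abs(end_coord - coord) <= max_distance
    ]
    if not selected:
        return 0.0
    return 100.0 * sum(b == base for b in selected) / len(selected)
```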
At block 2730, second sequence reads obtained from sequencing second cell-free DNA fragments in a second biological sample of the subject are received. The second biological sample can be treated with the anticoagulant and incubated for a second length of time that is greater than the first length of time. In other implementations, the second biological sample can be incubated without being treated by the anticoagulant. The length of time can include a temperature factor, e.g., a higher temperature can act as a weighting factor multiplied by a time unit to obtain the length of time. In this manner, a greater/same amount of cell death can occur in a same/shorter amount of time due to the incubation at a higher temperature.
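One way to picture the temperature factor is as a weighted sum over incubation segments, as in the sketch below. The weights in `example_weight` are purely hypothetical placeholders, not measured values; in practice the weighting would need to be calibrated.

```python
def effective_incubation_time(segments, weight_for_temperature):
    """Temperature-weighted length of time.

    segments: iterable of (hours, temperature_celsius) tuples.
    weight_for_temperature: function mapping a temperature to a weight.
    """
    return sum(
        hours * weight_for_temperature(temp_c) for hours, temp_c in segments
    )

def example_weight(temp_c):
    """Hypothetical weights: cold storage does not count, warmer counts more."""
    if temp_c < 5:
        return 0.0
    if temp_c < 30:
        return 1.0
    return 2.0
```

With these placeholder weights, 2 h of cold storage, 3 h at 22° C., and 1 h at 37° C. would count as an effective length of time of 5 h.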
At block 2740, the second sequence reads are used to determine a second amount of the second cell-free DNA fragments that end with the particular base. In some implementations, the first amount and the second amount are of cell-free DNA fragments having both ends with the particular base. The second amount can also be determined for a particular end motif that includes the particular base. Thus, the second amount can be for a particular ending sequence that may be for more than one base. In some embodiments, the second amount can likewise be for DNA fragments that have a first end motif (e.g., a first base) at one end of the fragment and that have a second end motif (e.g., a second base) at the other end of the fragment.
The amounts can be determined as a percentage, also referred to herein as a base content or a frequency. In other implementations, the amounts can be raw amounts that are not directly normalized using (e.g., dividing by) a measured amount of DNA fragments (e.g., as measured by sequence reads). Instead, indirect normalization can occur by using a same size sample or by sequencing a same number of DNA fragments for the two samples.
The amounts can relate to sizes of the DNA fragments. For instance, the first sequence reads can be used to determine first sizes of the first cell-free DNA fragments that end with the particular base or larger end motif. The first amount can be determined using a first set of the first cell-free DNA fragments having a particular size. The second sequence reads can be used to determine second sizes of the second cell-free DNA fragments that end with the particular base or larger end motif. The second amount can be determined using a second set of the second cell-free DNA fragments having the particular size. The particular size can be a size range. Example uses of size can be found in
At block 2750, the first amount is compared to the second amount to determine a classification of whether the gene exhibits the genetic disorder in the subject. In some implementations, comparing the first amount to the second amount includes determining whether the first amount differs from the second amount by at least a threshold amount, and can include which amount is larger than the other when there is a statistically significant difference or other separation value. Accordingly, the classification can be that the genetic disorder exists when the first amount is within a threshold of the second amount.
In some embodiments, the comparison of the amounts can include determining a separation value between the first amount and the second amount. The separation value can be compared to a reference value (e.g., a cutoff) to determine the classification. The reference value can be a calibration value determined using calibration (reference) samples, which have known classifications and can be analyzed collectively to determine a reference value or calibration function (e.g., when the classifications are continuous variables). The first amount and second amounts are examples of a parameter value that can be compared to a reference/calibration value. Such techniques can be used for all methods herein, and further details are provided in other sections.
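The comparison can be illustrated with the sketch below. It shows two possible separation values (a difference and a ratio) and one example rule in which the disorder is classified as present when the two amounts stay within a cutoff of each other. The function names and the cutoff are hypothetical; an actual cutoff would come from calibration samples.

```python
def separation_value(first_amount, second_amount, kind="difference"):
    """Separation between the amounts measured at two incubation times."""
    if kind == "difference":
        return second_amount - first_amount
    if kind == "ratio":
        if first_amount == 0:
            raise ValueError("ratio is undefined for a zero first amount")
        return second_amount / first_amount
    raise ValueError(f"unknown kind: {kind}")

def disorder_if_unchanged(first_amount, second_amount, cutoff):
    """Example rule: disorder is called when the amounts are within `cutoff`."""
    return abs(separation_value(first_amount, second_amount)) < cutoff
```

Other rules (e.g., requiring that the second amount be lower than the first by at least a threshold) would follow the same pattern with a different comparison.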
The classification can be a level or severity of the disorder, e.g., from whether a coding gene for the nuclease is missing in both chromosomes, in only one chromosome, or in only certain tissue, or whether the mutation reduces expression but does not eliminate the existence of the nuclease. Such a partial reduction in the expression of the nuclease can occur when the mutation (e.g., a deletion) is only in certain tissue or when the mutation is within a supporting region, e.g., in a non-coding region such as miRNA that affects the level of expression of the nuclease. The classification can thus be of different levels or severity of the genetic disorder, as a result of differing amounts of difference relative to the reference level. Multiple reference levels can be used to determine the different classifications.
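Using multiple reference levels amounts to placing the measured separation value into one of several intervals. A minimal sketch, with hypothetical reference levels and labels, is:

```python
from bisect import bisect_right

def severity_level(separation, reference_levels, labels):
    """Map a separation value to a label using ascending reference levels.

    reference_levels: ascending cutoffs (hypothetical; from calibration).
    labels: one label per interval, so len(labels) == len(reference_levels) + 1.
    """
    if len(labels) != len(reference_levels) + 1:
        raise ValueError("need exactly one more label than reference levels")
    return labels[bisect_right(reference_levels, separation)]
```

For instance, with reference levels `[2.0, 6.0]` and labels `["severe", "intermediate", "none"]`, a small change after incubation maps to "severe" and a large change maps to "none"; both the levels and the labels are placeholders for this illustration.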
In some examples, when the first amount is within a threshold of the second amount, the classification can be that the genetic disorder exists, e.g., as in
In other examples, when the second amount is less than the first amount by at least a threshold (e.g., for T-ends), the classification can be that the genetic disorder exists, e.g., as in
In other examples, both the WT and the mutation can cause a same change (e.g., an increase or a decrease) of DNA fragments with a particular end motif, but the amount of change can be different. For example,
The type of genetic disorder being tested can provide the type of criteria used for determining whether the disorder exists, as the cfDNA behavior will be different.
As an example, the genetic disorder can include a deletion of the gene. As examples, the genes can be DFFB, DNASE1L3, or DNASE1. The nuclease can be one that cuts intracellular DNA, e.g., DFFB or DNASE1L3. The nuclease can be one that cuts extracellular DNA, e.g., DNASE1 or DNASE1L3.
B. Detecting Genetic Disorder Using Reference Value
As described above, a difference or other separation value (e.g., whether small or large) in a particular base content between samples with different incubations can be used to classify a genetic disorder for a gene associated with a nuclease. Alternatively, the measured amount of a particular base can be compared to a reference value. Such a reference value can correspond to the amount of the particular base measured in a healthy subject.
For instance, a comparison of
Another example can be seen in
At block 2810, first sequence reads obtained from sequencing first cell-free DNA fragments in a first biological sample of a subject are received. The sequencing may be performed in various ways, e.g., as described herein. The first biological sample can be treated with an anticoagulant and incubated for at least a specified amount of time, e.g., as described for
At block 2820, the first sequence reads are used to determine a first amount of the first cell-free DNA fragments that end with a particular base. Similar techniques as used for block 2720 may be used in block 2820. For example, certain sizes of sequence reads can be used for determining the amount that end with a particular base. As another example, the amount can be determined for a particular end motif that includes the particular base.
At block 2830, the first amount is compared to a reference value to determine a classification of whether the gene exhibits the genetic disorder in the subject. In various embodiments, comparing the first amount to the reference value can include: (1) determining whether the first amount differs from the reference value by at least a threshold amount or the difference is less than the threshold amount; (2) determining whether the first amount is less than the reference value by at least a threshold amount; or (3) determining whether the first amount is greater than the reference value by at least a threshold amount. The first amount is an example of a parameter value and the reference value can be a calibration value or determined from calibration values of calibration samples. Further details are provided for other methods but equally apply to method 2800.
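The three comparison options can be expressed compactly, as in the following sketch. The function name and the returned keys are hypothetical; which of the three outcomes indicates the disorder depends on the gene and end motif being tested.

```python
def compare_to_reference(amount, reference, threshold):
    """Evaluate the three comparisons of an amount against a reference value."""
    diff = amount - reference
    return {
        "differs": abs(diff) >= threshold,  # option (1)
        "lower": diff <= -threshold,        # option (2)
        "higher": diff >= threshold,        # option (3)
    }
```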
C. Detecting Genetic Disorder Using Size
As described above, fragments of a certain size can be used to determine the amount of sequence reads with the particular base. In some implementations, size may be used alone without a determination of a base content or other measured amount of fragments that end in a particular base. Such an example is shown in
At block 2910, first sequence reads obtained from sequencing first cell-free DNA fragments in a first biological sample of a subject are received. The biological sample can be treated with an anticoagulant and incubated for at least a specified amount of time. As an example, the anticoagulant can be heparin.
At block 2920, the first sequence reads can be used to determine a first amount of the first cell-free DNA fragments that have a particular size, e.g., as described in
At block 2930, the first amount is compared to a reference value to determine a classification of whether the gene exhibits the genetic disorder in the subject. A separation value can be determined between the first amount and the reference value. In one example, the gene is DNASE1. The classifications of method 2900 can be the same as described for other methods, e.g., being of different levels or severity of the genetic disorder, as a result of differing amounts of difference relative to the reference level. Multiple reference levels can be used to determine the different classifications.
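A size-only amount can be as simple as the fraction of fragments falling within a size range, as sketched below. The function name and the size range are hypothetical; the resulting fraction would then be compared to a reference value obtained from reference samples.

```python
def size_fraction(sizes, low, high):
    """Fraction of fragments with low <= size < high (sizes in bp)."""
    sizes = list(sizes)
    if not sizes:
        return 0.0
    return sum(low <= s < high for s in sizes) / len(sizes)
```

For a heparin-incubated sample, a fraction of very short fragments that stays close to the non-incubated value, rather than rising as in WT, would be the kind of difference described above for Dnase1 deletion.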
The first amount is an example of a parameter value. The reference value can be part of a calibration data point that is determined from one or more calibration samples having known efficacy for a given measurement of the parameter (e.g., for a given calibration value). The known efficacy can be determined using blood clotting tests, as described later.
In various embodiments of methods 2700-2900, the reference value can be determined from one or more reference samples that do not have the genetic disorder and/or from one or more reference samples that have the genetic disorder.
Some people are treated with anticoagulants, e.g., for deep vein thrombosis (DVT), which results in clots in some veins. One treatment is heparin. Some embodiments can determine whether the anticoagulant is working. As examples, the effect of heparin can be seen with an increase in cfDNA quantity and/or an increase in DNASE1 activity and/or an increase in short fragments. This can be seen in the size profile or the shift in median size or the increase in fragments of a particular size, e.g., less than 150 bp.
A. Determining Efficacy Using Amount of a Particular Base at Fragment Ends
In some embodiments, the efficacy can be determined using an amount (e.g., base content) of a particular base at fragment ends.
At block 3010, sequence reads obtained from sequencing cell-free DNA fragments in a blood sample of the subject are received. The blood sample is obtained after the subject was administered a first dosage of an anticoagulant. The anticoagulant can be heparin. Method 3000 can include administering the first dosage of the anticoagulant to the subject.
Prior to receiving the sequence reads, the blood sample can be obtained from the subject, and a sequencing of the cell-free DNA fragments in the blood sample can be performed to obtain the sequence reads.
At block 3020, the sequence reads can be used to determine an amount of the cell-free DNA fragments that end with a particular base. As examples, the amount can be at a particular size (e.g., as shown in
Besides an amount of the cell-free DNA fragments that end with a particular base, a total amount of cfDNA (i.e., for any ends) can be determined and used, e.g., as shown later in
At block 3030, the amount can be compared to a reference value to determine a classification of the efficacy of the treatment. The reference value can be determined in various ways, e.g., as described herein. For instance, an expected amount can be determined for patients that respond as desired. The amount of difference between the amount and the reference value can provide the classification. If the difference is sufficiently small (e.g., less than a cutoff), then the first dosage can be classified as effective. If the difference is greater than the cutoff, then the first dosage can be determined as not effective. There may be different levels of ineffective dosage, e.g., intermediate or large inefficacy, which may be determined by using one or more additional cutoff values.
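As a non-limiting illustration, the comparison of block 3030 with one additional cutoff could be sketched as follows. The function name `classify_efficacy`, the use of an absolute difference, and the default cutoff values are all assumptions chosen for illustration; actual cutoffs would be derived from calibration samples.

```python
def classify_efficacy(amount, reference, cutoffs=(0.05, 0.15)):
    """Classify dosage efficacy from the difference to a reference value.

    amount: measured parameter value (e.g., fraction of T-end fragments).
    reference: value expected for patients that respond as desired.
    cutoffs: two ascending cutoffs on the absolute difference; the numbers
        here are hypothetical placeholders, not values from the text.
    """
    first_cutoff, second_cutoff = cutoffs
    if not 0 < first_cutoff < second_cutoff:
        raise ValueError("cutoffs must be positive and ascending")
    difference = abs(amount - reference)
    if difference < first_cutoff:
        return "effective"
    if difference < second_cutoff:
        return "intermediate inefficacy"
    return "large inefficacy"
```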
If the amount does not match the reference value (e.g., within a specified range of the reference value), a second dosage of the anticoagulant can be administered to the subject based on the comparison, the second dosage being greater than the first dosage. In other examples, the second dosage can be less than the first dosage, e.g., if the amount overshoots the reference value.
The amount is an example of a parameter value. The reference value can be part of a calibration data point that is determined from one or more calibration samples having known efficacy for a given measurement of the parameter (e.g., for a given calibration value). The known efficacy can be determined using blood clotting tests, as described later. Further details are provided for other methods and sections but equally apply to method 3000.
As an example, the reference value can correspond to a measurement previously performed in the subject before administering the anticoagulant. The change in the amount from the previous measurement can indicate an efficacy of the dosage of the anticoagulant. In another implementation, the reference value can correspond to the amount measured in a healthy subject. An efficacious dosage can be one that brings the amount to within a threshold of the reference value for the healthy subject. In yet another implementation, the reference value can correspond to the amount measured in a subject that has the blood disorder (e.g., as may be previously measured in the subject before administering the anticoagulant or measured in another subject who has the blood disorder).
B. Determining Efficacy Using Size of Fragments
In some embodiments, the efficacy can be determined using the sizes of the fragments.
At block 3110, sequence reads obtained from sequencing cell-free DNA fragments in a blood sample of the subject are received. The blood sample is obtained after the subject was administered a first dosage of an anticoagulant. The anticoagulant can be heparin. Method 3100 can include administering the first dosage of the anticoagulant to the subject.
At block 3120, the sequence reads can be used to determine an amount of the cell-free DNA fragments that have a particular size. Block 3120 may be performed in a similar manner as block 1120 in method 1100. The effect on the size can be as illustrated in
At block 3130, the amount can be compared to a reference value to determine a classification of the efficacy of the treatment. The reference value can be determined in a similar manner as for method 3000. The amount is an example of a parameter value and the reference value can be a calibration value or determined from calibration values of calibration samples. Further details are provided for other methods but equally apply to method 3100.
If the amount does not match the reference value (e.g., within a specified range of the reference value), a second dosage of the anticoagulant can be administered to the subject based on the comparison, the second dosage being greater than the first dosage. In other examples, the second dosage can be less than the first dosage, e.g., if the amount overshoots the reference value.
C. Results
The second row lists the method used to determine the concentration of cfDNA in the plasma samples. The third row shows the concentration of cell-free DNA in GE/ml. The fourth row shows the reference value determined from 3,844 reference samples that are not treated with an anticoagulant and that do not have a blood disorder. The fifth and sixth rows show the difference between the measured value in the third row and the reference value in the fourth row. As one can see, there is a significant increase. The last row shows significant deviations from the mean for cell-free DNA quantity, which shows that the dosage of heparin is affecting the amount of cell-free DNA, resulting in a significant increase.
As shown in rows five and six, the amount of cell-free DNA increases significantly as the heparin works to prevent coagulation. Thus, the total amount of DNA can be used to determine an efficacy of dosage. As described below, the absolute or fold change in the cfDNA can be determined and compared to a target to determine the efficacy of a current dose and/or to determine how much the dosage should increase or decrease. If the parameter is too high, the dosage can be decreased to meet the target.
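As a non-limiting illustration, the absolute change, fold change, and deviation from a reference mean could be computed as sketched here. The function name `compare_cfdna_quantity` is hypothetical, and expressing the deviation as a z-score (number of standard deviations from the reference mean) is an assumption about how the deviation might be quantified.

```python
def compare_cfdna_quantity(measured, reference_mean, reference_sd):
    """Compare a measured cfDNA concentration (e.g., in GE/ml) to a reference.

    reference_mean, reference_sd: mean and standard deviation of the
        concentration in reference samples (hypothetical inputs).
    Returns the absolute change, the fold change, and a z-score.
    """
    if reference_mean <= 0 or reference_sd <= 0:
        raise ValueError("reference mean and SD must be positive")
    return {
        "absolute_change": measured - reference_mean,
        "fold_change": measured / reference_mean,
        "z_score": (measured - reference_mean) / reference_sd,
    }
```

The resulting fold change could then be compared to a target value to decide whether the current dosage meets, undershoots, or overshoots the target.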
Blood clotting tests can be used as calibration data for each subject with a particular dosage of the anticoagulant to identify what change in amount or size correlates to an effective change in the amount/size. For example, correlation studies done in a group of patients (e.g., DVT patients) who are given anticoagulants can determine the fold change in total amount of cfDNA, change in amount having a particular end motif, or change in size profile that may result in the optimal speed of clearance of a DVT clot. The measured change (absolute or fold) can correspond to a calibration value that corresponds to the target or measured property (e.g., optimal speed for clearance). This value or range of values for amount/size can be a target for treatment for monitoring therapy. Blood of a subject may be allowed to undergo clotting in vitro, and then anticoagulants can be titrated in vitro for the dose in which the anticoagulant is effective. The cfDNA amount/size can be measured in the sample after the clot is dissolved, and these values or a range of values can be the treatment target for the subject. For example, a clotting test can identify that the subject is clotting at the proper amount, and the corresponding amount/size can be used as the reference (calibration) value, which may be used to classify the efficacy of a current dosage.
The dosage can vary per person in order to achieve the effective change, which is why such techniques can be advantageous as they allow measurement of the resulting changes. Such a change in the size or amount of fragments can measure the actual effects within the body, as opposed to just expecting every person to react in the same way to the same dose.
Some embodiments can be used to monitor the activity of a nuclease, e.g., DFFB, DNASE1, and DNASE1L3. Such activity can be from internal nucleases (i.e., as a natural process of the body) and/or from the result of adding a nuclease, e.g., DNASE1. Such monitoring can be used to determine a change in a genetic disorder or the efficacy of a treatment. For example, DNASE1 can be used to treat a subject. An effect of the treatment can be measured by analyzing the T-end fragment percentage or size. In some embodiments, DNASE1 (e.g., exogenously added) can be used to treat auto-immune conditions, such as SLE. Depending on the determination of the activity, the dosage of treatment of the nuclease can be changed.
The determination of abnormal nuclease activity (e.g., above or below a reference value corresponding to normal/healthy values) can indicate a level of pathology alone or in combination with other factors. The pathology can be cancer.
A. Effect of Adding DNASE1 to Samples
The T-end fragments 3312 increase with DNASE1 dose. As shown, the red line 3312 increases from left to right with the higher dosage. This dependency of base content (total or per size) on nuclease activity can allow a classification of a test sample as having a particular activity. The total amount of T-end could be used or a particular amount at a particular size or size range. Any of the features described elsewhere in this disclosure and that depend on nuclease activity can be used (e.g., content for other bases at certain sizes or across all fragment sizes) to measure nuclease activity, e.g., using reference values determined in other samples having a known classification.
A size profile 3305 can also reflect DNASE1 activity. For example, an increase in smaller DNA fragments can show an increase in DNASE1 activity. The number of smaller DNA fragments increases with higher dosage of DNASE1, as can be seen in the progression from left to right in the figure, with more small DNA fragments with the highest dose of 20 U/ml.
Any of the data from any of these plots can be used as a reference value or compared to a reference value. For example, the frequency of DNA fragments at a particular size range (including a specific size) can be determined for each of the doses. Then, a measurement for a new sample can be compared to each of these reference values to determine a relative amount of activity in the test sample. Such a classification of nuclease activity can be qualitative (e.g., low, medium, or high) or quantitative (a particular numerical value). Since these samples correspond to a known activity, they can act as calibration values for determining an activity in the test sample. If desired, interpolation or regression can be used to estimate a particular activity for the measured value in the test sample.
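As a non-limiting illustration, estimating an activity for a test sample by interpolating between calibration samples of known activity could be sketched as follows. The function name `interpolate_activity`, the piecewise-linear scheme, and the clamping at the ends of the calibration range are assumptions made for illustration; any numerical calibration values used with it would be placeholders, not measured data.

```python
def interpolate_activity(measured, calibration_points):
    """Estimate nuclease activity by piecewise-linear interpolation.

    measured: parameter value for the test sample (e.g., frequency of
        fragments in a size range, or T-end percentage).
    calibration_points: list of (parameter_value, known_activity) pairs
        from calibration samples (hypothetical inputs).
    Values outside the calibrated range are clamped to the nearest point.
    """
    points = sorted(calibration_points)
    if len(points) < 2:
        raise ValueError("at least two calibration points are required")
    if measured <= points[0][0]:
        return points[0][1]
    if measured >= points[-1][0]:
        return points[-1][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= measured <= x1:
            if x1 == x0:
                return y0
            fraction = (measured - x0) / (x1 - x0)
            return y0 + fraction * (y1 - y0)
```

A qualitative classification (e.g., low, medium, or high) could be obtained by binning the interpolated value.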
B. DNASE1 Activity in Urine
Other cell-free samples can be used for any of the methods described herein. As an example, urine can be used. The amount of nucleases in urine can differ from that in plasma, resulting in a different cfDNA profile, including size.
C. Monitoring Using Amount of a Particular Base at Fragment Ends
Accordingly, some embodiments can monitor nuclease activity using an amount of DNA fragments having a particular base at the end. Various figures herein show example data for such monitoring using samples of one or more subjects.
At block 3810, sequence reads are received. The sequence reads can be obtained from sequencing cell-free DNA fragments in a biological sample of a subject.
At block 3820, an amount of the cell-free DNA fragments that end with a particular base is determined using the sequence reads. As with other methods, the particular base may be part of a larger end motif, e.g., a 2-mer, 3-mer, etc. Further, the particular base can be required to be on both ends of a DNA fragment, or a particular pair of different end motifs can be used to select a particular set of DNA fragments.
The amount is an example of a parameter value. The measured amount in this method and other methods can be normalized, e.g., using a property of the sample (e.g., volume or mass of the sample) or using another amount of cell-free DNA fragments or sequence reads satisfying specified criteria (e.g., a total amount of DNA fragment in the sample or a number of fragments with a different end motif). Such normalization can be performed for any of the amounts (parameters) described herein.
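As a non-limiting illustration, the two kinds of normalization mentioned above (by a property of the sample or by another amount of fragments) could be sketched as follows. The function name `normalize_amount` and its parameter names are hypothetical.

```python
def normalize_amount(count, total_fragments=None, sample_volume_ml=None):
    """Normalize a raw fragment count.

    Exactly one normalizer must be given:
      total_fragments: total number of fragments (or reads) in the sample,
          giving a proportion.
      sample_volume_ml: sample volume, giving a count per ml.
    """
    if (total_fragments is None) == (sample_volume_ml is None):
        raise ValueError("supply exactly one normalizer")
    denominator = (
        total_fragments if total_fragments is not None else sample_volume_ml
    )
    if denominator <= 0:
        raise ValueError("normalizer must be positive")
    return count / denominator
```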
At block 3830, the amount is compared to a reference value to determine a classification of an activity of the nuclease. In some embodiments, if the activity is below the reference value, the subject can be classified as having a disorder. In such a case, the subject can be treated, e.g., as described herein. The classification can be a numerical classification value, which can be compared to a cutoff to determine a second classification of whether a gene associated with the nuclease exhibits a genetic disorder in the subject.
The reference value can be a calibration value determined using calibration (reference) samples, which have known classifications and can be analyzed collectively to determine a reference value or calibration function (e.g., when the classifications are continuous variables). For example, the nuclease activity can be a continuous variable, and the comparison of the amount to the reference value can be determined by inputting the amount to a calibration function, e.g., as is described herein.
D. Monitoring Using Size of Fragments
Embodiments can also monitor nuclease activity using an amount of DNA fragments at a particular size range, including at a particular size value. Various figures herein show example data for such monitoring using samples of one or more subjects.
At block 3910, sequence reads are received. The sequence reads can be obtained from sequencing cell-free DNA fragments in a biological sample of a subject. The biological sample can be treated with an anticoagulant and incubated for at least a specified amount of time.
At block 3920, an amount of the cell-free DNA fragments that have a particular size is determined using the sequence reads. As with other methods, the particular base may be part of a larger end motif, e.g., a 2-mer, 3-mer, etc. Further, the particular base can be required to be on both ends of a DNA fragment, or a particular pair of different end motifs can be used to select a particular set of DNA fragments. The amount is an example of a parameter value.
At block 3930, the amount is compared to a reference value to determine a classification of an activity of the nuclease. In some embodiments, if the activity is below the reference value, the subject can be classified as having a disorder. In such a case, the subject can be treated, e.g., as described herein.
Regardless of the amount of a particular base or use of size, the reference value can be determined from a calibration sample having a first classification of the activity of the nuclease. If the amount is similar to the reference value, then the biological sample (and the subject from whom it was obtained) can be identified as having the first classification for the nuclease activity. As examples, the first classification can be normal, increased, or decreased.
In various embodiments, comparing the amount to the reference value can include determining whether the amount differs from the reference value by at least a threshold amount. Comparing the amount to the reference value includes determining whether the amount is less than the reference value by at least a threshold amount. Comparing the amount to the reference value includes determining whether the amount is greater than the reference value by at least a threshold amount.
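As a non-limiting illustration, the threshold comparisons described above could be combined into a single classification of nuclease activity as normal, increased, or decreased. The function name `classify_activity` and the symmetric use of one threshold in both directions are assumptions made for illustration.

```python
def classify_activity(amount, reference, threshold):
    """Classify nuclease activity relative to a reference value.

    Returns "increased" if the amount exceeds the reference by at least
    the threshold, "decreased" if it falls below the reference by at least
    the threshold, and "normal" otherwise.
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    delta = amount - reference
    if delta >= threshold:
        return "increased"
    if delta <= -threshold:
        return "decreased"
    return "normal"
```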
As examples, the nuclease can be DFFB, DNASE1L3, or DNASE1. The biological sample can be obtained from a subject treated with the nuclease. The method can further include determining a classification of the efficacy of the treatment based on the comparison of the amount to the reference value.
As described herein, the reference values can be determined using one or more reference (calibration) samples that have a known classification. For example, the reference samples can be known to be healthy or known to have a genetic disorder. As other examples, the reference/calibration samples can have known or measured nuclease activities or efficacy values for a given calibration value (e.g., a parameter including any of the amounts described herein).
The one or more calibration values can be one or more reference values or be used to determine a reference value. The reference values can correspond to particular numerical values for the classifications. For example, calibration data points (calibration value and measured property, such as nuclease activity or level of efficacy) can be analyzed via interpolation or regression to determine a calibration function (e.g., a linear function). Then, a point of the calibration function can be used to determine the numerical classification as an output based on the input of the measured amount or other parameter (e.g., a separation value between two amounts or between a measured amount and a reference value). Such techniques may be applied to any of the methods described herein.
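As a non-limiting illustration, a linear calibration function could be fitted to calibration data points by ordinary least squares, as sketched here. The function name `fit_linear_calibration` is hypothetical, and a linear form is only one of the possible calibration functions.

```python
def fit_linear_calibration(calibration_points):
    """Fit activity = slope * amount + intercept by least squares.

    calibration_points: list of (calibration_value, measured_property)
        pairs, e.g., (T-end percentage, nuclease activity) for calibration
        samples (hypothetical inputs).
    Returns (calibration_function, slope, intercept).
    """
    n = len(calibration_points)
    if n < 2:
        raise ValueError("at least two calibration points are required")
    mean_x = sum(x for x, _ in calibration_points) / n
    mean_y = sum(y for _, y in calibration_points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in calibration_points)
    if sxx == 0:
        raise ValueError("calibration values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in calibration_points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    def calibration_function(amount):
        # Maps a measured amount to a numerical classification.
        return slope * amount + intercept

    return calibration_function, slope, intercept
```

The returned function takes a measured amount as input and gives the numerical classification as output, which can then be compared to a cutoff.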
For an example with methods 3000 and 3100, the reference value can be determined using one or more reference samples having a known or measured classification for the efficacy of the treatment. The efficacy of treatment for the one or more reference samples can be measured by performing a clotting test on the one or more reference samples. The corresponding amount (e.g., the amount in block 3020 or 3120) can be measured in the one or more reference samples, thereby providing calibration data points comprising the two measurements for the reference/calibration samples. The one or more reference samples can be a plurality of reference samples. A calibration function can be determined that approximates calibration data points corresponding to the measured efficacies and measured amounts for the plurality of reference samples, e.g., by interpolation or regression.
For an example with methods 3800 and 3900, the reference value can be determined using one or more reference samples having a known or measured classification for the activity of the nuclease. The activity of the nuclease for the one or more reference samples can be measured as described herein, e.g., fluorometric or spectrophotometric measurement of cfDNA quantity, which may be done on its own or before, after, and/or in real-time with, the addition of a nuclease-containing sample. Another example is using radial enzyme diffusion methods. The corresponding amount (e.g., the amount in block 3820 or 3920) can be measured in the one or more reference samples, thereby providing calibration data points comprising the two measurements for the reference/calibration samples. The one or more reference samples can be a plurality of reference samples. A calibration function can be determined that approximates calibration data points corresponding to the measured activities and measured amounts for the plurality of reference samples, e.g., by interpolation or regression.
Embodiments may further include treating the genetic disorder or low nuclease activity (e.g., lower than a threshold) in the patient after determining a classification for the subject. The classification for the subject after treatment may or may not involve adding anticoagulants in vivo or in vitro to enhance the cfDNA end profile. Further, the treatment can be determined as an alternative to a current treatment (e.g., an anticoagulant) when the current dosage has low efficacy, e.g., an increase in dosage or a different anticoagulant can be used. Treatment can be provided according to a determined level of a disorder, any identified mutations, and/or a tissue of origin. For example, an identified mutation (e.g., for polymorphic implementations) can be targeted with a particular drug or chemotherapy. The tissue of origin can be used to guide a surgery or any other form of treatment. And, the level of a disorder can be used to determine how aggressive to be with any type of treatment, which may also be determined based on the level of disorder. A disorder (e.g., cancer) may be treated by chemotherapy, drugs, diet, therapy, and/or surgery. In some embodiments, the more the value of a parameter (e.g., amount or size) exceeds the reference value, the more aggressive the treatment may be.
Treatments may include transurethral bladder tumor resection (TURBT). This procedure is used for diagnosis, staging and treatment. During TURBT, a surgeon inserts a cystoscope through the urethra into the bladder. The tumor is then removed using a tool with a small wire loop, a laser, or high-energy electricity. For patients with NMIBC, TURBT may be used for treating or eliminating the cancer. Another treatment may include radical cystectomy and lymph node dissection. Radical cystectomy is the removal of the whole bladder and possibly surrounding tissues and organs.
Treatment may include chemotherapy, which is the use of drugs to destroy cancer cells, usually by keeping the cancer cells from growing and dividing. The drugs may include, for example but not limited to, mitomycin-C (available as a generic drug), gemcitabine (Gemzar), and thiotepa (Tepadina) for intravesical chemotherapy. The systemic chemotherapy may include, for example but not limited to, cisplatin, gemcitabine, methotrexate (Rheumatrex, Trexall), vinblastine (Velban), and doxorubicin.
In some embodiments, treatment may include immunotherapy. Immunotherapy may include immune checkpoint inhibitors that block a protein called PD-1 or its ligand PD-L1. Inhibitors may include but are not limited to atezolizumab (Tecentriq), nivolumab (Opdivo), avelumab (Bavencio), durvalumab (Imfinzi), and pembrolizumab (Keytruda).
Treatment embodiments may also include targeted therapy. Targeted therapy is a treatment that targets the cancer's specific genes and/or proteins that contribute to cancer growth and survival. For example, erdafitinib is a drug given orally that is approved to treat people with locally advanced or metastatic urothelial carcinoma with FGFR3 or FGFR2 genetic mutations that has continued to grow or spread.
Some treatments may include radiation therapy. Radiation therapy is the use of high-energy x-rays or other particles to destroy cancer cells. In addition to each individual treatment, combinations of these treatments described herein may be used. In some embodiments, when the value of the parameter exceeds a threshold value, which itself exceeds a reference value, a combination of the treatments may be used. Information on treatments in the references is incorporated herein by reference.
A. Mice
Plasma DNA data for Dnase1l3−/− mice were retrieved from the European Genome-phenome Archive (EGA; accession number EGAS00001003174) (Serpas, L. et al. (2019), Proceedings of the National Academy of Sciences 116, 641-649). Mice carrying a targeted allele of Dnase1 [Dnase1tm1.1(KOMP)Vlcg] and mice carrying a targeted allele of Dffb [DffbC57BL/6N-Dffbem1Wtsi] both on B6 background were obtained from the Knockout Mouse Project Repository of the University of California at Davis. See “Key Resources Table” for details. The mice were maintained in the Laboratory Animal Center of The Chinese University of Hong Kong (CUHK). All experimental procedures were approved by the Animal Experimentation Ethics committee of CUHK and performed in compliance with “Guide for the Care and Use of Laboratory Animals” (8th edition, 2011) established by the National Institutes of Health. Male and female mice of 13-17 weeks were used for experiments. An analysis of the influence of sex and gender on the results was not done since the blood samples were pooled.
B. Murine Sample Collection
Mice were killed and exsanguinated by cardiac puncture. Blood from each mouse was pooled and immediately distributed evenly into experimental conditions: EDTA with 0 h incubation and EDTA with 6 h incubation, or heparin with 0 h incubation and heparin with 6 h incubation. For the Dffb−/− experiments, 5 pools of blood were created, each containing blood from 2-4 mice using a total of 14 WT and 14 Dffb−/− mice. For the Dnase1−/− experiments, one pool was created for each genotype, from a total of 12 WT, 12 Dnase1+/−, and 11 Dnase1−/− mice. The EDTA tubes were commercially bought 1.3 mL K3E micro tubes (Sarstedt). Heparin tubes were 2 mL microcentrifuge tubes with 18 IU heparin (Sigma-Aldrich) per mL blood added. Incubation was done at room temperature (12-20° C.) on a rocker.
After the room temperature (RT) incubation time was completed, the blood samples were separated by a double centrifugation protocol (1,600×g for 10 minutes at 4° C., then recentrifugation of the plasma at 16,000×g for 10 minutes at 4° C.) (Chiu, R. W. K. et al., (2001), Clinical Chemistry 47, 1607-1613). The resulting plasma was collected, yielding 0.4-1.5 mL of plasma for each condition and time point.
C. Plasma DNA Extraction and Library Preparation
Plasma DNA was extracted with the QIAamp Circulating Nucleic Acid Kit (Qiagen) according to the manufacturer's protocol. Indexed plasma DNA libraries were constructed using a TruSeq DNA Nano Library Prep Kit according to the manufacturer's instructions. The adaptor-ligated DNA was enriched with 8 cycles of PCR and analyzed on Agilent 4200 TapeStation (Agilent Technologies) using the High Sensitivity D1000 ScreenTape System (Agilent Technologies) for quality control and gel-based size determination. Libraries were quantified by the Qubit dsDNA high sensitivity assay kit (Thermo Fisher Scientific) before sequencing.
D. DNA Sequencing and Alignment
Multiplexed DNA libraries were sequenced for 2×75 bp paired-end reads on the NextSeq 500 platform (Illumina). Sequences were assigned to their corresponding samples based on their six-base index sequence. Using the Short Oligonucleotide Alignment Program 2 (SOAP2), the paired-end reads from mouse plasma were aligned to the reference mouse genome (NCBI build 37/UCSC mm9; non-repeat-masked) (Li, R. et al., (2009), Bioinformatics 25, 1966-1967). Up to two nucleotide mismatches were allowed. Only paired-end reads aligned to the same chromosome in the correct orientation and spanning an insert size of <600 bp were retained for downstream analysis. Paired-end reads sharing the same start and end genomic coordinates were deemed PCR duplicates and were discarded from downstream analysis.
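As a non-limiting illustration, the retention criteria described above (same chromosome, correct orientation, insert size below 600 bp, duplicate removal by identical start and end coordinates) could be sketched as follows. The `ReadPair` record and the function name `filter_read_pairs` are hypothetical, and keeping the first occurrence of each duplicated coordinate pair is an assumption about how duplicates are handled.

```python
from typing import NamedTuple


class ReadPair(NamedTuple):
    """Minimal, hypothetical representation of an aligned read pair."""
    chrom1: str               # chromosome of read 1
    chrom2: str               # chromosome of read 2
    start: int                # leftmost aligned coordinate of the fragment
    end: int                  # rightmost aligned coordinate (inclusive)
    proper_orientation: bool  # True if the mates face each other


def filter_read_pairs(pairs, max_insert_bp=600):
    """Keep pairs passing the alignment filters and drop PCR duplicates."""
    seen = set()
    kept = []
    for pair in pairs:
        if pair.chrom1 != pair.chrom2 or not pair.proper_orientation:
            continue
        insert_size = pair.end - pair.start + 1
        if insert_size >= max_insert_bp:
            continue
        key = (pair.chrom1, pair.start, pair.end)
        if key in seen:
            continue  # same start and end coordinates: treated as duplicate
        seen.add(key)
        kept.append(pair)
    return kept
```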
E. Base-End Analysis and Fragment Type Analysis
CTCF and Pol II regions were downloaded from the mouse ENCODE project (Shen, Y. et al. (2012), Nature 488, 116-120). The transcription start sites (TSS) of all genes in the reference mouse genome UCSC mm9 were downloaded from UCSC. Using BEDTools (v2.27.1), 10,000 non-overlapping regions of 10,000 bp each were randomly selected across the whole genome (Quinlan, A. R. and Hall, I. M. (2010), Bioinformatics 26, 841-842). We used a window size of ±500 bp. For the end density analysis, the end density of ±1500 bp window of CTCF regions was normalized by the median end counts in ±3000 bp CTCF regions.
For the random, CTCF, and Pol II regions, only cfDNA fragments oriented in the direction of the Watson strand were used for analysis. For the TSS region, only cfDNA fragments oriented in the same direction as the TSS region were used. At each position in these regions, the first nucleotide on the 5′ end was identified for each fragment and the base-end percentage was calculated (e.g. A-end fragments/All fragments, with all fragments including A-end, G-end, C-end, and T-end fragments). To analyze the base end percentages by fragment size, both 5′ ends (on the respective Watson or Crick strands) of a cfDNA fragment were counted per fragment and the base end percentages at each size were calculated.
For fragment type analysis, each fragment was assigned to a fragment type based on its two ending nucleotides. Fragments for which both ends were identified were denoted with their end nucleotides and the symbol < > in between, such that a fragment with both ends as A would be designated as A< >A. All fragments include A< >A, A< >G, A< >C, A< >T, C< >C, C< >G, C< >T, G< >G, G< >T, T< >T fragments. Each fragment type percentage was calculated (e.g. A< >A fragment percent=A< >A fragments/All fragments).
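As a non-limiting illustration, the end density normalization could be sketched as follows. The function name `normalized_end_density` and the input layout (a mapping from position relative to the CTCF site to aggregated end count) are hypothetical, and taking the median over per-position counts in the ±3000 bp window is an assumption about how the median was computed.

```python
from statistics import median


def normalized_end_density(end_counts, inner_bp=1500, outer_bp=3000):
    """Normalize fragment-end counts around CTCF sites.

    end_counts: dict mapping relative position (bp from the site center)
        to the number of fragment ends aggregated over all sites.
    Counts within +/- inner_bp are divided by the median per-position
    count within +/- outer_bp.
    """
    background = [end_counts.get(pos, 0) for pos in range(-outer_bp, outer_bp + 1)]
    background_median = median(background)
    if background_median == 0:
        raise ValueError("median end count is zero; cannot normalize")
    return {
        pos: end_counts.get(pos, 0) / background_median
        for pos in range(-inner_bp, inner_bp + 1)
    }
```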
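As a non-limiting illustration, the base-end percentage calculation could be sketched as follows. The function names are hypothetical, and the by-size variant assumes each fragment is supplied as its Watson-strand sequence, so that the Crick-strand 5′ end is the complement of the last base.

```python
from collections import Counter, defaultdict

BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def base_end_percentages(five_prime_bases):
    """Percentage of A-, C-, G-, and T-end fragments among all fragments."""
    counts = Counter(b for b in five_prime_bases if b in COMPLEMENT)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no A/C/G/T ends supplied")
    return {b: 100.0 * counts[b] / total for b in BASES}


def base_end_percentages_by_size(fragment_sequences):
    """Base-end percentages per fragment size, counting both 5' ends."""
    ends_by_size = defaultdict(list)
    for seq in fragment_sequences:
        seq = seq.upper()
        if not seq:
            continue
        watson_end = seq[0]                       # 5' end on the Watson strand
        crick_end = COMPLEMENT.get(seq[-1], "N")  # 5' end on the Crick strand
        ends_by_size[len(seq)].extend([watson_end, crick_end])
    result = {}
    for size, ends in ends_by_size.items():
        if any(b in COMPLEMENT for b in ends):
            result[size] = base_end_percentages(ends)
    return result
```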
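As a non-limiting illustration, the assignment of fragments to the ten fragment types and the percentage calculation could be sketched as follows. The function name `fragment_type_percentages` and the input layout (pairs of end nucleotides) are hypothetical; the two ends are treated as unordered, which is consistent with the ten types listed above.

```python
from collections import Counter
from itertools import combinations_with_replacement

BASES = set("ACGT")
# The ten unordered end-pair types, written in the notation of the text.
FRAGMENT_TYPES = [
    f"{a}< >{b}" for a, b in combinations_with_replacement("ACGT", 2)
]


def fragment_type_percentages(end_pairs):
    """Percentage of each fragment type among fragments with both ends known.

    end_pairs: iterable of (end_nucleotide_1, end_nucleotide_2) tuples.
    """
    counts = Counter()
    for first, second in end_pairs:
        first, second = first.upper(), second.upper()
        if first in BASES and second in BASES:
            a, b = sorted((first, second))
            counts[f"{a}< >{b}"] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no fragments with both ends identified")
    return {t: 100.0 * counts[t] / total for t in FRAGMENT_TYPES}
```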
F. cfDNA Quantification
Heparin was found to have significant positive interference with the Qubit dsDNA high sensitivity assay (ThermoFisher Scientific) (data not shown). Instead, the Bio-Rad QX200 Droplet Digital PCR (ddPCR) platform was used for all cfDNA quantification since the heparin interference of DNA target molecules can be ameliorated by the reaction partitioning of ddPCR (Dingle, T. C. et al., (2013), Clin Chem 59, 1670-1672). Heparin samples were diluted 5-fold and at least four wells per sample were run. Mouse cfDNA was quantified by the mouse TaqMan Copy number reference assay (ThermoFisher Scientific) targeting the transferrin receptor gene (Tfrc).
G. Quantification and Statistical Analysis
Analysis was performed using custom-built programs written in Python and R languages. Statistical differences were calculated using Mann-Whitney U tests unless otherwise specified. A P value of less than 0.05 was considered statistically significant and all probabilities were two-tailed.
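As a non-limiting illustration of the statistical comparison, a two-tailed Mann-Whitney U test could be sketched in plain Python as follows. This is not the custom-built program referred to above; the function name `mann_whitney_u` is hypothetical, and the p-value uses a normal approximation without tie or continuity correction, which is an assumption that is only reasonable for larger samples. In practice, an established routine such as `scipy.stats.mannwhitneyu` would typically be preferred.

```python
import math


def mann_whitney_u(x, y):
    """Two-tailed Mann-Whitney U test (normal approximation).

    Returns (U, p), where U is the smaller of the two U statistics.
    Ties receive average ranks; no tie or continuity correction is applied.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    u1 = rank_sum_x - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    u = min(u1, u2)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # two-tailed p-value
    return u, p
```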
Logic system 4230 may be, or may include, a computer system, ASIC, microprocessor, etc. It may also include or be coupled with a display (e.g., monitor, LED display, etc.) and a user input device (e.g., mouse, keyboard, buttons, etc.). Logic system 4230 and the other components may be part of a stand-alone or network connected computer system, or they may be directly attached to or incorporated in a device (e.g., a sequencing device) that includes detector 4220 and/or assay device 4210. Logic system 4230 may also include software that executes in a processor 4250. Logic system 4230 may include a computer readable medium storing instructions for controlling measurement system 4200 to perform any of the methods described herein. For example, logic system 4230 can provide commands to a system that includes assay device 4210 such that sequencing or other physical operations are performed. Such physical operations can be performed in a particular order, e.g., with reagents being added and removed in a particular order. Such physical operations may be performed by a robotics system, e.g., including a robotic arm, as may be used to obtain a sample and perform an assay.
Measurement system 4200 may also include a treatment device 4260, which can provide a treatment to the subject. Treatment device 4260 can determine a treatment and/or be used to perform a treatment. Examples of such treatment can include surgery, radiation therapy, chemotherapy, immunotherapy, targeted therapy, hormone therapy, and stem cell transplant. Logic system 4230 may be connected to treatment device 4260, e.g., to provide results of a method described herein. The treatment device may receive inputs from other devices, such as an imaging device and user inputs (e.g., to control the treatment, such as controls over a robotic system).
Any of the computer systems mentioned herein may utilize any suitable number of subsystems. Examples of such subsystems are shown in
The subsystems shown in
A computer system can include a plurality of the same components or subsystems, e.g., connected together by external interface 81, by an internal interface, or via removable storage devices that can be connected and removed from one component to another component. In some embodiments, computer systems, subsystem, or apparatuses can communicate over a network. In such instances, one computer can be considered a client and another computer a server, where each can be part of a same computer system. A client and a server can each include multiple systems, subsystems, or components.
Aspects of embodiments can be implemented in the form of control logic using hardware circuitry (e.g. an application specific integrated circuit or field programmable gate array) and/or using computer software with a generally programmable processor in a modular or integrated manner. As used herein, a processor can include a single-core processor, multi-core processor on a same integrated chip, or multiple processing units on a single circuit board or networked, as well as dedicated hardware. Based on the disclosure and teachings provided herein, a person of ordinary skill in the art will know and appreciate other ways and/or methods to implement embodiments of the present invention using hardware and a combination of hardware and software.
Any of the software components or functions described in this application may be implemented as software code to be executed by a processor using any suitable computer language such as, for example, Java, C, C++, C#, Objective-C, Swift, or scripting language such as Perl or Python using, for example, conventional or object-oriented techniques. The software code may be stored as a series of instructions or commands on a computer readable medium for storage and/or transmission. A suitable non-transitory computer readable medium can include random access memory (RAM), read-only memory (ROM), a magnetic medium such as a hard drive or a floppy disk, or an optical medium such as a compact disk (CD) or DVD (digital versatile disk) or Blu-ray disk, flash memory, and the like. The computer readable medium may be any combination of such storage or transmission devices.
Such programs may also be encoded and transmitted using carrier signals adapted for transmission via wired, optical, and/or wireless networks conforming to a variety of protocols, including the Internet. As such, a computer readable medium may be created using a data signal encoded with such programs. Computer readable media encoded with the program code may be packaged with a compatible device or provided separately from other devices (e.g., via Internet download). Any such computer readable medium may reside on or within a single computer product (e.g. a hard drive, a CD, or an entire computer system), and may be present on or within different computer products within a system or network. A computer system may include a monitor, printer, or other suitable display for providing any of the results mentioned herein to a user.
Any of the methods described herein may be totally or partially performed with a computer system including one or more processors, which can be configured to perform the steps. Thus, embodiments can be directed to computer systems configured to perform the steps of any of the methods described herein, potentially with different components performing a respective step or a respective group of steps. Although presented as numbered steps, steps of methods herein can be performed at a same time or at different times or in a different order. Additionally, portions of these steps may be used with portions of other steps from other methods. Also, all or portions of a step may be optional. Additionally, any of the steps of any of the methods can be performed with modules, units, circuits, or other means of a system for performing these steps.
The specific details of particular embodiments may be combined in any suitable manner without departing from the spirit and scope of embodiments of the invention. However, other embodiments of the invention may be directed to specific embodiments relating to each individual aspect, or specific combinations of these individual aspects.
The above description of example embodiments of the present disclosure has been presented for the purposes of illustration and description. It is not intended to be exhaustive or to limit the disclosure to the precise form described, and many modifications and variations are possible in light of the teaching above.
A recitation of “a”, “an” or “the” is intended to mean “one or more” unless specifically indicated to the contrary. The use of “or” is intended to mean an “inclusive or,” and not an “exclusive or” unless specifically indicated to the contrary. Reference to a “first” component does not necessarily require that a second component be provided. Moreover, reference to a “first” or a “second” component does not limit the referenced component to a particular location unless expressly stated. The term “based on” is intended to mean “based at least in part on.”
All patents, patent applications, publications, and descriptions mentioned herein are incorporated by reference in their entirety for all purposes. None is admitted to be prior art.
This application claims the benefit of U.S. Provisional Patent Application No. 62/949,867, entitled “Cell-Free DNA Fragmentation And Nucleases,” filed on Dec. 18, 2019, and U.S. Provisional Patent Application No. 62/958,651, entitled “Cell-Free DNA Fragmentation And Nucleases,” filed on Jan. 8, 2020, which are hereby incorporated by reference in their entirety and for all purposes.
Number | Date | Country
---|---|---
62958651 | Jan 2020 | US
62949867 | Dec 2019 | US