The invention is generally related to methods of detecting length polymorphisms by amplifying a target genomic region and imaging the amplified region using atomic force microscopy.
Length polymorphisms associated with genetic disease are currently among the most difficult variant types to characterize, and their clinical impact is significant. Nucleotide repeat expansions are found in individuals with Fragile X syndrome, Huntington disease, and most hereditary ataxias,1,2 while copy number variations and tandem duplications are associated with lung,3 breast,4,5 blood,6 and prostate7 cancers. In these cases, the insertion or deletion is of variable length, composed to some degree of a repeated sequence, and often longer than the short reads used in next-generation sequencing (NGS)—challenges that limit the detection of variants and accurate reporting of indel length by traditional assays like microarrays and targeted sequencing.1
For example, FLT3 internal tandem duplications (ITDs), identified in about 30% of acute myeloid leukemia (AML) cases, are associated with a poor prognosis, especially if the variant allele frequency (VAF) is greater than 50%.6,9-11 In wild-type (WT) patients, the FLT3 gene expresses an fms-like tyrosine kinase receptor that is involved in cell growth and division pathways.12 FLT3-ITDs are associated with overactivation of the pathway, yielding uncontrolled cellular proliferation and formation of myeloid blasts characteristic of AML. Most detected ITDs are reported to be less than 100 bp in length, with a mode length of about 40 bp.13,14 Since the expression level of the FLT3-ITD, mutation size, and insertion location are linked to survivability6,9,11,15,16 and determine sensitivity to different drugs,9,12 quantification of variant length and frequency is of urgent concern for informed AML treatment.
Existing assays to detect FLT3-ITDs present shortcomings in throughput, accuracy, or feasibility. In PCR-CE techniques, e.g. the widely used commercial Leukostrat CDx FLT3 Mutation Assay, PCR is used to amplify the 368 bp ITD region within the FLT3 gene, capillary gel electrophoresis (CE) gives the length of the resulting amplicons, and amplicon length indicates the presence and length of an insertion.13 However, due to PCR amplification bias and inherent limitations of CE, the lowest VAF that can be detected is 5%, the sensitivity to longer variants is even more limited, and, crucially, the VAF is not reported due to lack of reliability.13,17 Multi-parameter flow cytometry can detect AML-associated combinations of cell antigens at VAFs of 10⁻³ to 10⁻⁴, but the genetic character of a mutation is not directly identified, and the technique requires live cells and extensive expertise.17 Denaturing high performance liquid chromatography can detect insertion sites and lengths at VAF>0.025,11,18,19 but its expense and length limitations narrow its appeal.
Karyotyping lacks sufficient resolution to effectively detect FLT3-ITDs.6 Quantitative PCR (qPCR) is a reliable and cost-effective approach that can detect many AML-associated chromosomal rearrangements and mutations, but the lack of a consistent target in length polymorphisms like FLT3-ITDs is a major barrier, as is the attenuation of variant signal due to amplification bias in favor of a shorter WT target.8,17 NGS is a strong candidate for characterizing FLT3-ITDs: FLT3 is included amongst the targeted genes in major NGS myeloid panels,20 and minimum residual disease detection of FLT3-ITDs at extremely low (<10⁻⁵) variant fractions was recently demonstrated.15,20 However, the feasibility of NGS for comprehensive and accurate FLT3-ITD reporting remains a concern; notably, (a) repetitive inserts of variable length, especially those longer than the NGS reads, present a significant challenge,20 and (b) the cost and time required for multiple specialized NGS assays over the course of a single patient's treatment would necessitate considerable additional expense. Single-molecule sequencing platforms may eventually overcome the repeatability and length limitations of NGS, but cost and reliability remain issues for these emergent techniques for the foreseeable future.21 To improve upon the often-serious health outcomes of patients with length polymorphisms, there is a need for a rapid, low-cost, and accurate assay that can meet the challenges of length heterogeneity and nucleotide repetition.
Described herein are methods for determining the length and frequency of a length polymorphism that combine an amplification technique, such as digital polymerase chain reaction (dPCR), with subsequent single-molecule length measurement using atomic force microscopy (AFM).
An aspect of the disclosure provides a method for detecting length polymorphisms in a DNA sample, comprising amplifying a target region of the DNA in the sample, wherein the DNA sample is diluted prior to the amplifying step to provide a plurality of amplified samples in which 0 or 1 target DNA strand is amplified in each amplified sample and wherein at least one amplified sample includes 1 target DNA strand; depositing the plurality of amplified samples onto a surface; imaging the plurality of amplified samples deposited on the surface using AFM to determine an amplicon length distribution; comparing the amplicon length distribution to a corresponding amplicon length distribution obtained from a reference DNA sample that does not contain a length polymorphism in the target region; and detecting a length polymorphism in the target region of the DNA sample when the amplicon length distribution is distinct from the corresponding amplicon length distribution.
In some embodiments, the amplifying step is performed using dPCR. In some embodiments, the DNA sample and the reference DNA sample are combined into a single sample before the amplifying step in a 1:1 ratio. In some embodiments, the method further comprises a step of determining a percentage frequency of the length polymorphism in the DNA sample as compared to the reference DNA sample. In some embodiments, a Bayesian statistical analysis is used to determine whether the amplicon length distribution is distinct from the corresponding amplicon length distribution of the reference DNA sample.
In some embodiments, the AFM is high-speed AFM. In some embodiments, the amplicon length distribution is determined by a computer-implemented method comprising the steps of flattening an AFM image; thresholding to identify DNA strands; skeletonizing the thresholded strands; determining the longest backbone of each skeleton; measuring the length of the longest backbone; and applying a quality filter. In some embodiments, the length polymorphism is an internal tandem duplication. In some embodiments, the target region is labeled with CRISPR associated protein 9 (Cas9).
Embodiments of the disclosure provide the use of atomic force microscopy (AFM) in combination with DNA amplification techniques to detect length polymorphisms. The term “polymorphism”, as used herein, refers to the coexistence of more than one form of a gene or portion thereof.
Exemplary amplification techniques include polymerase chain reaction (PCR), including digital PCR (dPCR), droplet digital PCR (ddPCR), ligase chain reaction, antisense RNA amplification, NASBA, etc. In contrast to conventional PCR in which one reaction is performed per well, dPCR involves partitioning the PCR solution into tens of thousands of nanoliter-sized droplets, where a separate PCR reaction takes place in each one. In methods of the present disclosure, the sample is diluted so that at most one target molecule is present in a single PCR reaction. This allows the determination of how many different length target molecules are in the sample. A PCR solution is made similarly to a TaqMan assay, which consists of template DNA (or RNA), fluorescence-quencher probes, primers, and a PCR master mix, which contains DNA polymerase, dNTPs, MgCl2, and reaction buffers at optimal concentrations. Several different methods can be used to partition samples, including microwell plates, capillaries, oil emulsion, and arrays of miniaturized chambers with nucleic acid binding surfaces. The PCR solution is divided into smaller reactions, each of which then undergoes PCR individually. After multiple PCR amplification cycles, the samples are checked for fluorescence with a binary readout of “0” or “1”. The fraction of fluorescing droplets is recorded. The partitioning of the sample allows one to estimate the number of different molecules by assuming that the molecule population follows the Poisson distribution, thus accounting for the possibility of multiple target molecules inhabiting a single droplet. Using Poisson's law of small numbers, the distribution of target molecules within the sample can be accurately approximated, allowing for a quantification of the target strand in the PCR product. This model simply predicts that as the number of samples containing at least one target molecule increases, the probability of the samples containing more than one target molecule also increases.
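The Poisson reasoning above can be illustrated with a minimal sketch. The function names are hypothetical and are not part of the disclosure; the sketch assumes that target molecules are distributed independently across partitions of equal volume, so that the mean number of copies per partition follows from the fraction of fluorescing partitions.

```python
import math

def copies_per_partition(n_positive, n_total):
    """Estimate mean target copies per partition (lambda) from the positive fraction.

    Under a Poisson model, P(partition is empty) = exp(-lambda), so
    lambda = -ln(1 - positive_fraction).
    """
    if not 0 <= n_positive < n_total:
        raise ValueError("need 0 <= n_positive < n_total")
    return -math.log(1.0 - n_positive / n_total)

def multi_occupancy_fraction(lam):
    """Fraction of positive partitions expected to hold more than one target copy."""
    if lam <= 0:
        return 0.0
    p_positive = 1.0 - math.exp(-lam)   # at least one copy
    p_single = lam * math.exp(-lam)     # exactly one copy
    return (p_positive - p_single) / p_positive

# Illustrative numbers only (not data from the disclosure):
lam = copies_per_partition(n_positive=30, n_total=96)
print(f"lambda = {lam:.3f}, multi-occupied share = {multi_occupancy_fraction(lam):.3f}")
```

As the positive fraction rises, the estimated share of partitions holding more than one target molecule rises as well, which is the behavior the model predicts.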
In conventional PCR, the number of PCR amplification cycles is proportional to the starting copy number. Digital PCR uses statistical power to provide relative quantification.
In some exemplary embodiments, the methods described herein are performed on DNA, e.g. genomic DNA, with sizes ranging from tens to hundreds of thousands of base pairs. In other embodiments, smaller strands of DNA are analyzed, for example, under 300 bp in length, or under 250 bp or under 200 bp in length.
Patient samples can be extracted with a variety of methods known in the art to provide nucleic acid (e.g. genomic DNA) for use in the methods described herein. For example, a DNA sample may be extracted from blood, tissue (e.g. tumor sample), saliva, urine, semen, etc.
As used herein, the term “reference DNA sample” or “control DNA sample” refers to genomic DNA obtained from a healthy individual who does not have a length polymorphism or a disease or disorder associated with a length polymorphism, also referred to herein as “wild type”. The term “wild type” as used herein refers to the normal, or non-mutated, or functional form of a gene.
With reference to
AFM is a very-high-resolution type of scanning probe microscopy (SPM), with demonstrated resolution on the order of fractions of a nanometer, more than 1000 times better than the optical diffraction limit. AFM systems compatible with the present disclosure are known in the art, e.g. as disclosed in U.S. Pat. No. 9,926,589 incorporated herein by reference. In some embodiments, the AFM is high-speed AFM. High-speed AFM (HS-AFM) is a type of AFM that, unlike conventional AFM, can take an image very quickly and with relatively low imaging force. High-speed AFM may be defined as >200,000 pixels per second. Typical pixel sizes range from 1×1 nm to 10×10 nm. High-speed AFM allows for the characterization of the conformation of a plurality of fixed molecules across a wide range of area. In some embodiments, the sample is dry, i.e. not immersed in liquid. In some embodiments, a scan speed of 0.5-2 Hz is used, e.g. a 1 Hz scan speed.
Tracing software may be used to process the images and to determine the amplicon length distribution in each well. For example, the computer-implemented method may include steps of flattening an AFM image; thresholding to identify DNA strands; skeletonizing the thresholded strands; determining the longest backbone of each skeleton; measuring the length of the longest backbone; and applying a quality filter. A decision test, e.g. using a Bayesian statistical analysis, may then be applied to the distributions to determine whether the amplicons in each well most likely originated from a WT or variant template.
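As an illustration of the "longest backbone" step only, the following minimal sketch measures one already-skeletonized strand. It is not the tracing software of the disclosure: the function names are hypothetical, the flattening, thresholding, and skeletonizing steps are assumed to have been performed already, and the skeleton is assumed to be a one-pixel-wide, 8-connected, roughly tree-like set of pixels. The pixel size is an input (for example, 1.8 nm for an 1800 nm scan sampled at 1000 pixels).

```python
import heapq
import math

def _farthest(pixels, start):
    """Dijkstra over 8-connected pixels; returns the farthest pixel and its distance."""
    dist = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if d > dist[(r, c)]:
            continue  # stale heap entry
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nb = (r + dr, c + dc)
                if nb == (r, c) or nb not in pixels:
                    continue
                nd = d + math.hypot(dr, dc)  # 1 for straight steps, sqrt(2) for diagonal
                if nd < dist.get(nb, math.inf):
                    dist[nb] = nd
                    heapq.heappush(heap, (nd, nb))
    end = max(dist, key=dist.get)
    return end, dist[end]

def backbone_length_nm(skeleton_pixels, nm_per_pixel):
    """Length of the longest end-to-end path through one connected skeleton.

    Uses the double-sweep method: the farthest pixel from any start is one end
    of the longest path, and the farthest pixel from that end is the other.
    This is exact for tree-like skeletons and an approximation otherwise.
    """
    pixels = set(skeleton_pixels)
    if not pixels:
        return 0.0
    end_a, _ = _farthest(pixels, next(iter(pixels)))
    _, length_px = _farthest(pixels, end_a)
    return length_px * nm_per_pixel

# A straight 11-pixel strand with a short 3-pixel side branch (illustrative only).
strand = {(0, c) for c in range(11)} | {(1, 5), (2, 5), (3, 5)}
print(backbone_length_nm(strand, nm_per_pixel=1.8))
```

Side branches shorter than the main strand do not contribute to the reported length, which is why the longest path rather than the total skeleton length is measured.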
Bayesian statistics is based on the Bayesian interpretation of probability where probability expresses a degree of belief in an event. The degree of belief may be based on prior knowledge about the event, such as the results of previous experiments, or on personal beliefs about the event. Bayesian statistical methods use Bayes' theorem to compute and update probabilities after obtaining new data. Bayes' theorem describes the conditional probability of an event based on data as well as prior information or beliefs about the event or conditions related to the event. For example, in Bayesian inference, Bayes' theorem can be used to estimate the parameters of a probability distribution or statistical model. Since Bayesian statistics treats probability as a degree of belief, Bayes' theorem can directly assign a probability distribution that quantifies the belief to the parameter or set of parameters.
A Bayesian approach may be used to determine the most credible mean amplicon length within a given reaction/well and to characterize the distribution as WT or variant. In some embodiments, the 95% highest density interval (HDI) of each distribution's mean length (a measure similar to a confidence interval that gives the 95% most credible values of the mean) is determined. Then, the well may be characterized as WT or variant based on whether its 95% HDI falls within a region of practical equivalence (ROPE): a range of lengths centered at the most credible WT length (null value). In some embodiments, a receiver operating characteristic (ROC) curve is used to set the optimal ROPE. The VAF for a given sample may be equal to n_var/(n_var + n_WT), where n_var is the number of variant wells and n_WT is the number of WT wells according to the decision rule.
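The decision rule and VAF calculation can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the disclosure: the function names and numeric values are hypothetical, the posterior of the mean length is assumed to be available as a list of samples, and treating an HDI that only partly overlaps the ROPE as "undecided" is an assumption of this sketch (the disclosure describes wells as WT or variant).

```python
import math

def hdi(samples, mass=0.95):
    """Narrowest interval containing the requested fraction of the samples."""
    xs = sorted(samples)
    n = len(xs)
    k = min(n, max(1, math.ceil(mass * n)))  # number of samples inside the interval
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]

def classify_well(hdi_low, hdi_high, rope_low, rope_high):
    """Compare a well's HDI of the mean length against the ROPE around the WT length."""
    if rope_low <= hdi_low and hdi_high <= rope_high:
        return "WT"            # HDI entirely inside the ROPE
    if hdi_low > rope_high or hdi_high < rope_low:
        return "variant"       # HDI entirely outside the ROPE
    return "undecided"         # partial overlap (assumption of this sketch)

def variant_allele_frequency(calls):
    """VAF = n_var / (n_var + n_WT); undecided wells are excluded."""
    n_var = sum(1 for c in calls if c == "variant")
    n_wt = sum(1 for c in calls if c == "WT")
    if n_var + n_wt == 0:
        raise ValueError("no classified wells")
    return n_var / (n_var + n_wt)

# Hypothetical HDIs (nm) for four wells and a hypothetical ROPE of 145-155 nm.
wells = [(148.0, 152.0), (160.0, 165.0), (147.5, 151.0), (149.0, 153.5)]
calls = [classify_well(lo, hi, 145.0, 155.0) for lo, hi in wells]
print(calls, variant_allele_frequency(calls))
```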
In some embodiments, the target region is labeled with CRISPR associated protein 9 (Cas9). In some embodiments other molecules or nanoparticles are used as labels instead of Cas9. The chemistry to attach such labels is known in the art. Labeling a target motif with programmable Cas9 and imaging to reveal the Cas9 molecules bound to strands allows on-target strands to be distinguished from off-target, improving specificity. Some embodiments provide methods of Cas9 labeling as disclosed in U.S. Pat. No. 9,926,589 incorporated herein by reference. In some embodiments, the target region is not labeled with Cas9 or any other labeling molecule.
The methods described herein can be used to detect length polymorphisms including additions and deletions such as nucleotide repeat expansions, copy number variations, and tandem duplications, such as internal tandem duplications. The present methods can be applied to any length polymorphism as long as the site of the polymorphism is known and can be amplified.
Length polymorphisms are associated with several diseases and disorders such as Fragile X syndrome, Huntington disease, hereditary ataxias, and various cancers such as lung, breast, blood, and prostate cancers. Accordingly, some embodiments of the disclosure provide methods of diagnosing a subject with a disease or disorder associated with a length polymorphism using a detection method as described herein. Further embodiments provide a method of treating the disease or disorder comprising diagnosing the disease or disorder using a detection method as described herein and subsequently treating the subject with a suitable therapy for the disease or disorder, e.g. an anti-cancer agent.
The terms “subject” and “patient” are used interchangeably herein, and refer to an animal such as a mammal, which is afflicted with or suspected of having, at risk of, or being pre-disposed to a disease or disorder associated with a length polymorphism. In general, the terms refer to a human. The terms also include domestic animals bred for food, sport, or as pets, including horses, cows, sheep, poultry, fish, pigs, cats, dogs, and zoo animals, goats, apes (e.g. gorilla or chimpanzee), and rodents such as rats and mice. Typical subjects include persons susceptible to, suffering from or that have suffered a disease or disorder associated with a length polymorphism.
Before exemplary embodiments of the present invention are described in greater detail, it is to be understood that this invention is not limited to particular embodiments described, as such may, of course, vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting, since the scope of the present invention will be limited only by the appended claims.
Where a range of values is provided, it is understood that each intervening value, to the tenth of the unit of the lower limit unless the context clearly dictates otherwise, between the upper and lower limit of that range and any other stated or intervening value in that stated range, is encompassed within the invention. The upper and lower limits of these smaller ranges may independently be included in the smaller ranges and are also encompassed within the invention, subject to any specifically excluded limit in the stated range. Where the stated range includes one or both of the limits, ranges excluding either or both of those included limits are also included in the invention.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although any methods and materials similar or equivalent to those described herein can also be used in the practice or testing of the present invention, representative illustrative methods and materials are now described.
All publications and patents cited in this specification are herein incorporated by reference as if each individual publication or patent were specifically and individually indicated to be incorporated by reference and are incorporated herein by reference to disclose and describe the methods and/or materials in connection with which the publications are cited. The citation of any publication is for its disclosure prior to the filing date and should not be construed as an admission that the present invention is not entitled to antedate such publication by virtue of prior invention. Further, the dates of publication provided may be different from the actual publication dates which may need to be independently confirmed.
It is noted that, as used herein and in the appended claims, the singular forms “a”, “an” and “the” include plural referents unless the context clearly dictates otherwise. It is further noted that the claims may be drafted to exclude any optional element. As such, this statement is intended to serve as antecedent basis for use of such exclusive terminology as “solely,” “only” and the like in connection with the recitation of claim elements, or use of a “negative” limitation.
As will be apparent to those of skill in the art upon reading this disclosure, each of the individual embodiments described and illustrated herein has discrete components and features which may be readily separated from or combined with the features of any of the other several embodiments without departing from the scope or spirit of the present invention. Any recited method can be carried out in the order of events recited or in any other order which is logically possible.
The invention is further described by the following non-limiting examples which further illustrate the invention, and are not intended, nor should they be interpreted to, limit the scope of the invention.
Length polymorphisms are found in a host of serious diseases, and assessment of mutation length and variant allele frequency (VAF) is often critical for accurate diagnosis. However, characterization of length polymorphisms remains challenging due to their frequently variable or repetitive nature. Here, we present digital polymerase chain reaction (dPCR) followed by high-speed atomic force microscopy (HSAFM) as an effective method for quantifying length polymorphisms. We focused on internal tandem duplications (ITDs) located within the FLT3 gene, which are associated with acute myeloid leukemia and often indicative of a poor prognosis. In analysis of over 1.5 million HSAFM-imaged amplicons from cell line and clinical samples containing FLT3-ITDs, dPCR-HSAFM returned the expected variant length and VAF, down to 5% VAF samples. As a flexible method with single-molecule resolution, dPCR-HSAFM thus represents a new frontier for HSAFM imaging and a powerful tool for the diagnosis of length polymorphisms.
Materials and Methods
To test and validate dPCR-HSAFM, we focused on the case of internal tandem duplications (ITDs) within exons 14 and 15 of the FLT3 gene, a length polymorphism associated with acute myeloid leukemia (AML).6,8 Using a collection of over 200,000 atomic force microscopy images, we show dPCR-HSAFM to be an accurate, straightforward, and low-cost option to characterize FLT3-ITDs.
Samples
Human female genomic DNA was obtained from Promega (Madison, Wis.), synthetic DNA was ordered from Integrated DNA Technologies (Coralville, Iowa), and genomic DNA from cell lines MV4-11 and PL-21 was obtained from DSMZ (Braunschweig, Germany). Since PL-21 is a heterozygote containing one WT FLT3 allele and one FLT3 allele with a 126 bp insert, we used a 126 bp synthetic template to obtain a 100% VAF sample for our 126 bp insert series. With IRB approval, de-identified clinical samples in the form of slides with bone marrow aspirate smears from confirmed AML patients were obtained from the Department of Pathology at Virginia Commonwealth University. FLT3-ITD status and insertion length measured via the Leukostrat CDx FLT3 mutation assay were collected from uploaded reports in the patients' electronic medical records (Table 1). We extracted and purified genomic DNA from slides using the MagAttract HMW DNA kit (Qiagen, Gaithersburg, Md.), and concentrations of DNA were determined with the Qubit dsDNA HS assay (Invitrogen, ThermoFisher Scientific, Carlsbad, Calif.). Samples are listed in Table 1.
Digital PCR
We employed dPCR to amplify our target gene (FLT3) and determine the proportion of DNA within a sample containing a variant allele. We used 1× Luna Universal Probe qPCR master mix (New England Biolabs, Ipswich, Mass.), 0.025 units/μL Antarctic Thermolabile UDG (New England Biolabs, Ipswich, Mass.), 25 nM of forward and reverse primers, 25 nM of fluorogenic probe, and template DNA at a concentration of 1 copy per 10-15 μl of dPCR solution, for a final volume of 5 μl per well. PCR was performed on a C1000 Touch Thermal Cycler with CFX96 Real-Time System optical head (Bio-Rad Laboratories, Hercules, Calif.). We targeted exons 14 and 15 of the FLT3 gene for amplification using the following forward and reverse primer sequences, respectively: ACTGCCTATTCCTAACTGACTCATC (SEQ ID NO: 1) and CTTTCAGCATTTTGACGGCAACC (SEQ ID NO: 2). We used the fluorogenic probe 6-FAM-CAGGGAAGG/ZEN/TACTAGGATCAGGTGCT-IABkFQ (SEQ ID NO: 3), where “6-FAM” indicates fluorescein, “ZEN” indicates ZEN quencher, and “IABkFQ” indicates Iowa Black Fluorescence quencher. Our amplification protocol involved 25° C. for 10 min, 95° C. for 1 min, 50 cycles at 95° C. for 15 sec each, and 60° C. for 2 min. To determine the presence of amplicons in dPCR wells, qPCR kinetic curves were recorded for each well and visually inspected; only wells with distinct S-like amplification curves were interpreted to contain DNA and thereby used for further analysis. Non-template control (NTC) plates were run regularly to detect environmental contamination, and no kinetic curves were detected in NTC plates. The amplicons from the positive wells were purified with 5 μL of AMPure XP (Beckman Coulter, Brea, Calif.) per well and eluted in 10 μl of TE buffer.
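The effect of this dilution can be estimated with a short calculation. The sketch below is illustrative only: it reads "1 copy per 10-15 μl" as a range of 10 to 15 μl per template copy (an assumption), treats template loading as a Poisson process, and uses hypothetical function names.

```python
import math

def expected_copies_per_well(well_volume_ul, ul_per_copy):
    """Mean number of template copies loaded into one well."""
    return well_volume_ul / ul_per_copy

def positive_fraction(lam):
    """Poisson probability that a well receives at least one template copy."""
    return 1.0 - math.exp(-lam)

WELL_VOLUME_UL = 5.0   # final volume per well stated above
WELLS_PER_PLATE = 96   # assumption: one 96-well plate per run

for ul_per_copy in (10.0, 15.0):
    lam = expected_copies_per_well(WELL_VOLUME_UL, ul_per_copy)
    p = positive_fraction(lam)
    print(f"{ul_per_copy:.0f} ul/copy: lambda = {lam:.2f}, "
          f"expected positive wells per plate = {WELLS_PER_PLATE * p:.1f}")
```

Under these assumptions, roughly one quarter to two fifths of the wells would be expected to show amplification, with the remainder empty.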
Deposition for HSAFM Scanning
Since atomic force microscopy depends on resolving the topography of a desired feature against the substrate background, imaging the 2 nm-diameter backbone of DNA requires an atomically flat substrate. We thus deposited our dPCR samples onto 15 mm×15 mm squares of cleaved mica (Ted Pella) for scanning. We subdivided our mica into 25 sample regions by printing UV-cured epoxy into a 5×5 grid on the mica surface. The applied dPCR-HSAFM procedure was equivalent for all samples except sW-0; in that case, aliquots were deposited directly onto mica without first undergoing dPCR, thereby providing a control to determine the effect of dPCR on the length distributions. To deposit each sample other than sW-0, we followed a slightly modified version of our previously described procedure.23,24 We first added MgCl2 to the purified DNA in order to facilitate adhesion of negatively-charged DNA to negatively-charged mica and diluted the sample to yield the optimal density of DNA molecules; we found the ideal deposition concentration to be 0.1 ng/μL of DNA with 2.5 mM MgCl2. We then deposited approximately 0.2 μL into each grid square, waited 1 minute after the final sample deposition to allow for DNA relaxation on the surface, and washed the entire grid 3 times with 600 μL of Millipore water. We then immediately loaded the sample into a fixed apparatus with a wide-angle nozzle (ImpactRM) to keep the angle and pressure (2.5 psi) of airflow consistent, and applied compressed air to dry the grid. We transferred the grid to an infrared oven and baked at 120° C. for 10 minutes. DNA was not deposited in at least one grid square of each mica sheet in order to provide a baseline for background noise and to detect carryover of DNA between wells during washing. Empty wells consistently revealed negligible carryover of DNA (<1 strand per 20 frames) due to the unfavorable conditions for DNA deposition in a wash of excess pure water.
HSAFM
Individual DNA molecules were imaged using HSAFM and measured using custom computer vision software (MATLAB). Our HSAFM is a contact-mode system with a laser vibrometer to detect height and a flexure stage for high-speed lateral displacement.25 For DNA imaging, we found optimal conditions to be a 1 Hz scan speed, an 1800 nm×1800 nm scan size, and 1000×1000 pixel resolution. These settings yielded ~5 nm (15 bp) length accuracy,24 while the laser vibrometer yielded 0.015 nm height resolution. We used measurements from our synthetic control sample (sW-0) as a fiducial marker26 to calibrate the lateral dimensions of our HSAFM images and to correct for strand length variation due to tip wear.27 In each grid square containing the amplicons from a single dPCR well, we nominally captured 55 frames, which required approximately 1 minute to land the HSAFM probe onto the surface and 2 minutes to scan. Briefly, our tracing algorithm flattened the raw AFM image, thresholded to identify DNA strands, skeletonized those thresholded strands, discerned the longest backbone of each skeleton, measured the length of the backbone, and applied a quality filter. To mitigate the inclusion of nanoscale surface contamination in our DNA length measurements, we discarded strands less than 70 nm (≈210 bp) in length. We also discarded wells with fewer than 150 strands after tracing and filtering as a conservative cutoff in order to mitigate the influence of background contamination and achieve sufficient confidence in our estimates of mean length.
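The two cutoffs stated above can be expressed as a short filter. This is a minimal sketch with hypothetical names, not the MATLAB software used in this work; it covers only the 70 nm strand cutoff and the 150-strand well cutoff, and the conversion of 3 bp per nm is an assumption inferred from the equivalences quoted above (5 nm ≈ 15 bp, 70 nm ≈ 210 bp).

```python
MIN_STRAND_NM = 70.0         # strands shorter than this are discarded
MIN_STRANDS_PER_WELL = 150   # wells with fewer kept strands are discarded
BP_PER_NM = 3.0              # assumption inferred from 70 nm ~ 210 bp

def filter_well(lengths_nm):
    """Return the kept strand lengths for one well, or None if the well is rejected."""
    kept = [x for x in lengths_nm if x >= MIN_STRAND_NM]
    return kept if len(kept) >= MIN_STRANDS_PER_WELL else None

def mean_length_bp(lengths_nm):
    """Mean strand length converted from nanometers to base pairs."""
    return BP_PER_NM * sum(lengths_nm) / len(lengths_nm)

# Illustrative well: 10 short contaminant traces plus 150 amplicon traces.
well = [60.0] * 10 + [120.0] * 150
kept = filter_well(well)
if kept is not None:
    print(len(kept), mean_length_bp(kept))
```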
Bayesian Analysis
We used a Bayesian approach to determine the most credible mean amplicon length within a given well and to characterize the distribution as WT or variant. In contrast to null hypothesis testing using a frequentist approach, Bayesian inference does not depend on sampling or testing intentions; instead, prior knowledge is explicitly described and included in the calculation of the posterior distribution of parameter values by Bayes' rule.28 This posterior distribution, which we derived by Markov chain Monte Carlo (MCMC) sampling, directly imparts the credibility of each parameter value.
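To make the MCMC step concrete, the following is a minimal Metropolis sampler for the posterior of a well's mean length. The model is an assumption of this sketch and is not specified above: strand lengths are treated as normally distributed with the spread fixed at the sample standard deviation, and the prior on the mean is normal. The function name, prior values, and tuning constants are hypothetical.

```python
import math
import random
import statistics

def sample_posterior_mean(lengths, prior_mean, prior_sd,
                          n_samples=5000, burn_in=1000, seed=0):
    """Draw posterior samples of the mean length with a random-walk Metropolis sampler."""
    rng = random.Random(seed)
    n = len(lengths)
    xbar = statistics.fmean(lengths)
    sigma = statistics.stdev(lengths)      # spread treated as known (assumption)
    step = 2.4 * sigma / math.sqrt(n)      # proposal scale near the posterior width

    def log_posterior(mu):
        # Normal likelihood (up to a constant) plus normal prior on the mean.
        log_like = -n * (xbar - mu) ** 2 / (2.0 * sigma ** 2)
        log_prior = -(mu - prior_mean) ** 2 / (2.0 * prior_sd ** 2)
        return log_like + log_prior

    mu, lp = xbar, log_posterior(xbar)
    draws = []
    for i in range(n_samples + burn_in):
        candidate = mu + rng.gauss(0.0, step)
        lp_candidate = log_posterior(candidate)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, lp_candidate - lp)):
            mu, lp = candidate, lp_candidate
        if i >= burn_in:
            draws.append(mu)
    return draws

# Synthetic, deterministic strand lengths (nm) for illustration only.
lengths = [150 + ((i * 7) % 11 - 5) for i in range(200)]
draws = sample_posterior_mean(lengths, prior_mean=150.0, prior_sd=50.0)
print(round(statistics.fmean(draws), 2))
```

The resulting draws can be summarized by a highest density interval and compared with a region of practical equivalence to classify the well.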
Results
To validate the dPCR-HSAFM technique, we tested (a) synthetic and cell line samples of known length and spiked-in VAF, and (b) clinical samples previously analyzed using the standard PCR-CE clinical test, Leukostrat, which reports insert length but not VAF. Table 1 shows a list of all tested samples.
Example scan images from two samples (W-0 and M-1 are shown in
To determine if the amplicons within a given well likely originated from a WT or variant template, we used a decision rule based on Bayesian inference that is similar to an equivalence test.30 First, we determined the 95% highest density interval (HDI) of each distribution's mean length—a measure similar to a confidence interval that gives the 95% most credible values of the mean. Then, the well was characterized as WT or variant based on whether its 95% HDI fell within a region of practical equivalence (ROPE): a range of lengths centered at the most credible WT length (null value).
After establishing the optimal ROPE with W-0 and M-100, we applied our decision rule to all samples and determined their VAFs (
Accuracy and Reproducibility
Our testing of FLT3-ITD samples shows that dPCR-HSAFM successfully detected insertions of 30 bp and 126 bp at clinically relevant VAFs. As shown in
Accuracy and reproducibility of dPCR-HSAFM length measurements (
dPCR-HSAFM analysis of clinical samples generally matched the Leukostrat assay results, although notable exceptions reveal the limits and promise of the approach. First, the mean variant length of ClinA as determined by dPCR-HSAFM is strikingly greater than the value returned by Leukostrat (
As for the VAF characterization of ClinA, both tests designated the sample as clearly positive for FLT3-ITD, and there was good agreement in both variant length and FLT3-ITD positive result for ClinB, ClinC, and ClinD as well. In contrast, ClinE and ClinF were both found to be FLT3-ITD negative by the Leukostrat assay, but the dPCR-HSAFM length distributions of a small number of wells from those samples were found to fall above the ROPE, thus returning VAF>0. However, the few variant-identified wells returned 95% HDIs that were only marginally greater than the upper ROPE bound—a factor that can and should be considered when assessing the credibility of a positive or negative FLT3-ITD assignation. Additional rigorous testing with a greater number of positive and negative controls will allow for the refinement of the ROPE values and methodology of the dPCR-HSAFM decision test, but it is notable that the resolution and quantitative nature of the dPCR-HSAFM data enable a more informative and transparent FLT3-ITD result to be reported than a binary positive or negative.
As a proof of principle, the close agreement between our spiked-in VAF and detected VAF (
dPCR
dPCR offers distinct advantages over bulk PCR in the quantification of variants within a mixed sample.40 By “digitizing” the sample constituents into a homogeneous population of amplicons within each well, length analysis and determination of VAF is straightforward. dPCR also mitigates PCR bias: by separating single target DNA molecules into their own wells for amplification, the preferential amplification of shorter over longer strands within a single solution is avoided. As indicated by our negative non-template control results, we showed that dPCR contamination can be avoided using amplification with dUTP and careful procedures. Finally, dPCR is a single-molecule technique, as only a single template molecule is required to initiate amplification within a well. While this study focused on FLT3-ITDs, dPCR-HSAFM could be applied to any length polymorphism as long as the site of the polymorphism is known and can be amplified.1,2
For the qPCR kit we employed in this study, a maximum amplicon length of only 200 bp is recommended.41 While longer amplification is possible,42 a decrease in amplification efficiency relative to the shorter WT template is expected in proportion to variant length. It is likely that this decrease in efficiency will be consistent and correctable by a spiked-in VAF v. detected VAF calibration curve,15 but any decrease in efficiency demands an increase in well sampling in order to achieve an equivalent VAF sensitivity; indeed, the inverse relationship between well number and the lowest detectable VAF is another dPCR consideration. With our current setup and assuming a binomial distribution of variants within dPCR wells, 3000 samples would be needed to detect a 10⁻³ VAF with 95% confidence; in comparison, a recent study showed that NGS can detect a VAF as low as 10⁻⁵.15 Gains in miniaturization and software automation are fully achievable.40,43 Furthermore, while minimal residual disease detection down to 10⁻⁵ VAF may be useful for some prognoses, a lower-cost option that gives an accurate report of VAF and variant length at higher VAFs would still be useful. For example, in the case of FLT3-ITDs, the risk of poor outcome appears stratified over a range of VAFs, from 10⁻⁵ to 50%.6,9-14,15
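The well-count estimate follows from the binomial assumption. In the minimal sketch below (hypothetical function name), each analyzed well is assumed to hold a single template that is a variant with probability equal to the VAF, and "detection" is taken to mean observing at least one variant well, so the required number of wells n satisfies 1 − (1 − VAF)^n ≥ confidence.

```python
import math

def wells_needed(vaf, confidence=0.95):
    """Smallest number of template-containing wells giving at least one variant
    well with the requested probability, under a binomial model."""
    if not (0.0 < vaf < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("vaf and confidence must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - vaf))

for vaf in (0.05, 1e-2, 1e-3):
    print(f"VAF {vaf:g}: {wells_needed(vaf)} wells")
```

For a VAF of 10⁻³ this gives just under 3000 wells, consistent with the figure quoted above, and the requirement grows roughly in inverse proportion to the VAF.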
Recently, microfluidics has enabled the development of miniaturized dPCR platforms like droplet digital PCR (ddPCR), wherein a diluted dPCR solution is hypercompartmentalized into oil-separated droplets that are each amplified, yielding thousands or even millions of distinct fluorescence readings.40 Using a nonspecific DNA-binding dye, VAF of a mixed sample was determined down to 20%.38 These encouraging results are nevertheless, like CE, limited by the bulk nature of the fluorescence output for a single ddPCR well, which indicates both amplicon length and VAF and must be calibrated for amplification efficiency, droplet volume, and the DNA template volume. Multiplexed ddPCR with multi-channel fluorescent probes or dyes may help mitigate the confounding of amplicon length and VAF,44 but reproducible and accurate partitioning of the multiplexed signals presents its own challenges.45 The gains in throughput and sensitivity of dPCR due to automation and miniaturization mirror advancements in AFM technology, and we believe there is great use in both techniques for broader applications.
HSAFM
This study represents the first statistically robust diagnosis of a genetic variant with atomic force microscopy (AFM). AFM was invented in the 1980s46 and has served as a highly flexible research tool for discerning the topography and mechanical properties of a surface with sub-nanometer resolution,47 but limitations in scan speed (typically 0.001 Hz for a 1000×1000 pixel image) have hampered its widespread adoption in large-scale analyses. Previous AFM studies of DNA involved tracing dozens or hundreds of DNA molecules by hand, sometimes with the aid of smoothing algorithms to better fit the traces to strand curvature.23,24,48-53 The development of HSAFM, which can acquire 1000×1000 pixel images at rates of 1 Hz or greater, enabled an exciting range of new possibilities, but HSAFM biological research has mainly focused on characterizing biomolecular kinetics or whole-cell imaging at nanoscale resolution and with different imaging modes.47,54,55 Recently, some groups have explored a large-area approach to collecting robust statistical measurements of features,43,56,57 but to our knowledge, fully automated flattening and tracing as exhibited in this work has never been developed to the extent that it could be applied to such a large sample size (>200,000 images and >1.5 million DNA molecules traced). The scanning procedures and automated processing routines developed here enabled gigapixel-scale analysis.
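To put the quoted scan rates in perspective, the following back-of-the-envelope sketch compares pure acquisition time for a data set of this size. It assumes, purely for illustration, that every image is 1000×1000 pixels and is acquired at exactly the quoted frame rate with no overhead for stage motion or approach. The helper names are hypothetical.

```python
PIXELS_PER_IMAGE = 1000 * 1000  # assumed image size, taken from the quoted example


def scan_time_hours(n_images: int, frames_per_second: float) -> float:
    """Pure acquisition time in hours, ignoring all overhead (assumption)."""
    return n_images / frames_per_second / 3600


def total_gigapixels(n_images: int) -> float:
    """Total pixel count in gigapixels under the assumed image size."""
    return n_images * PIXELS_PER_IMAGE / 1e9


n_images = 200_000  # lower bound quoted in the text
print(f"HSAFM at 1 Hz:            {scan_time_hours(n_images, 1.0):.1f} h")
print(f"Conventional at 0.001 Hz: {scan_time_hours(n_images, 0.001) / 24 / 365:.1f} years")
print(f"Total data:               {total_gigapixels(n_images):.0f} gigapixels")
```

Under these assumptions the same image count that takes a few days of scanning at 1 Hz would take years at 0.001 Hz.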
HSAFM offers unique benefits in the characterization of genetic variants. First, the single-molecule sensitivity of HSAFM, which complements the single-molecule amplification of dPCR, has many advantages. A single-molecule imaging approach requires less material: to diagnose a well containing WT or variant amplicons, we scanned an area <200 μm², which usually contained a few hundred strands. These numbers point to the potential of the process to be scaled down and paired with other microscale single-molecule techniques, such as ddPCR or amplification-free analysis. It is notable that both ddPCR and dPCR detect the presence of single molecules before amplification, but HSAFM scans produce data with single-molecule resolution after amplification, enabling more powerful statistical analysis than the bulk fluorescence intensity measurements achieved with CE or ddPCR. For example, individual species can be identified and quantified after multiplexing with low-cycle number PCR,23 and in the detection of nuanced differences between populations, the Bayesian approach we applied here can be extended to sophisticated hierarchical models.28
The direct-imaging approach of HSAFM confers additional advantages. Unlike a narrow-range CE like that employed in the Leukostrat assay, HSAFM is an extremely wide-range length measurement system: by stitching together multiple overlapping HSAFM images, strands of virtually any length can be imaged and traced.24,58 HSAFM images can also convey more than length measurement: as we previously showed,24 labeling a target motif with programmable Cas9 and imaging to reveal the Cas9 molecules bound to strands allows on-target strands to be distinguished from off-target, improving specificity.
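The length readout from a traced strand can be sketched as a contour-length calculation. The example below is a minimal illustration, not the tracing routine used in this work: it treats a trace as an ordered list of (x, y) points in nanometers, sums the segment lengths, and converts to base pairs using an assumed rise of 0.34 nm per bp for B-form DNA. In practice the conversion factor would need calibration against strands of known length, and the names `contour_length_nm` and `length_in_bp` are hypothetical.

```python
from math import hypot

NM_PER_BP = 0.34  # assumed B-form rise per base pair; calibrate in practice


def contour_length_nm(points):
    """Sum of straight segment lengths along an ordered (x, y) trace in nm."""
    return sum(
        hypot(x2 - x1, y2 - y1)
        for (x1, y1), (x2, y2) in zip(points, points[1:])
    )


def length_in_bp(points, nm_per_bp=NM_PER_BP):
    """Convert a traced contour length to an approximate length in bp."""
    return contour_length_nm(points) / nm_per_bp


# Hypothetical trace of a gently bent strand, coordinates in nm.
trace = [(0.0, 0.0), (30.0, 40.0), (30.0, 110.0)]
print(f"{contour_length_nm(trace):.1f} nm, about {length_in_bp(trace):.0f} bp")
```

Because the calculation depends only on the ordered points, a trace stitched across several overlapping images can be measured in the same way.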
In mixed cell line and clinical samples containing FLT3-ITDs, dPCR-HSAFM successfully reported variant length and allele fraction. Using high-throughput scanning and automated tracing of DNA strands, we acquired a large data set that facilitated robust analysis by Bayesian inference. The scale and single-molecule resolution of the data, plus the ability of the method to report VAF, make it an attractive alternative to the standard FLT3-ITD clinical assay, which returns bulk fluorescence data and cannot quantify the VAF with confidence. Accuracy and reproducibility of dPCR-HSAFM results were similar to those of wide-range capillary gel electrophoresis. With single-molecule sensitivity before and after amplification, the ability to image molecules of virtually any length, high throughput, and low cost, dPCR-HSAFM serves as a valuable tool for the diagnosis of FLT3-ITDs and other length polymorphisms.
While the invention has been described in terms of its preferred embodiments, those skilled in the art will recognize that the invention can be practiced with modification within the spirit and scope of the appended claims. Accordingly, the present invention should not be limited to the embodiments as described above, but should further include all modifications and equivalents thereof within the spirit and scope of the description provided herein.
This application claims priority to U.S. Provisional Application 63/045,877 filed on Jun. 30, 2020. The complete content thereof is herein incorporated by reference.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2021/039746 | 6/30/2021 | WO |
Number | Date | Country | |
---|---|---|---|
63045877 | Jun 2020 | US |