This invention relates to methods and reagent kits for isolating and amplifying dsDNA fragments with sequence specificity from a larger dsDNA piece or genomic dsDNA. One of the applications of the invention is for targeted genomic enrichment, for example, to isolate a DNA region of interest from the whole genome for DNA sequencing.
The advancement of next-generation sequencing technologies has improved our ability to sequence large genomes at a lower cost and faster speed than ever before. However, it is still not feasible to apply whole genome sequencing routinely in clinical settings. The primary reason is that the cost and time of sequencing the entire genome with an accuracy level sufficient to call the variant of interest are still prohibitively high. Contrary to the common conception that a person only needs to have his or her genome sequenced once in a lifetime, sequencing may be required multiple times, each for a specific purpose. For example, in cancer diagnostics, heterogeneous cell populations such as tumor cells and normal cells would be sequenced at the same time. In analyzing disease progression, cells from the same source may need to be sequenced at different times. Sequencing may also be applied in prenatal diagnostics to specific cell populations.
In many applications, the goal is only to get an accurate picture of a certain region or regions of the genome of these particular cell populations. Without isolating the specific genomic region, whole genome sequencing is not only wasteful, but also causes delay and inaccuracy. Therefore, a genomic enrichment method that allows isolation of a specific region or regions of interest will lower the cost of sequencing, improve accuracy, and cut time to result significantly.
A number of methods have been used for genome enrichment. One method is PCR based, in which multiple PCR primers are designed and tested. However, the PCR amplification and normalization process is labor intensive, and as a result, this method cannot be applied universally. In addition, PCR can only be used for DNA fragments of certain limited size ranges, and the complexity of the genome makes it hard to achieve highly multiplexed PCR with consistent results. A second method is based on sequence specific ligation followed by universal PCR. Again, ligation probe design, process optimization, and size limitation make it less than ideal. A third method is microarray hybridization based. The genomic DNA is sheared into small pieces, and a subset of genomic DNA sequences is captured based on complementary sequence identity. The captured DNA fragments are then taken through the typical library construction protocol.
A common characteristic of the existing targeted genomic enrichment methods is that the DNA region of interest, if more than a few hundred bases long, is captured in small fragments no more than a few hundred bases long. In PCR based methods, the length of each fragment is limited by the ability to reliably and consistently PCR amplify the fragment and is generally a few hundred bases long. In hybridization methods, the genomic DNA is randomly sheared into pieces about a few hundred bases long, and then each piece is captured through hybridization. There are many inherent problems with capturing a long DNA region of interest in small fragments: (1) not all fragments are captured with the same efficiency, and some fragments may be missed altogether, and (2) many probes will have to be designed and made to cover the entire length of the region of interest, resulting in higher cost. Additionally, PCR may introduce errors into the amplified fragments. For hybridization, the specificity is low, and the processing time is long.
The key to overcoming the shortcomings of the existing targeted genomic enrichment methods is to be able to specifically cleave and isolate a long DNA region of interest in large fragments, preferably in one whole piece, rather than isolating many short fragments as in current methods. This requires the ability to (1) cleave a target DNA with sequence specificity at predetermined sites, and (2) isolate the cleaved DNA region of interest.
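By way of non-limiting illustration, the following Python sketch contrasts the number of short capture fragments needed to tile a long region with the two cut sites needed to excise it in one piece. The 175-kb region length is taken from the ABL1 example described later in this specification; the 300-base fragment length, the default overlap of zero, and the function name are illustrative assumptions, not values from the disclosure.

```python
import math


def tiling_fragments(region_bp: int, fragment_bp: int, overlap_bp: int = 0) -> int:
    """Minimum number of fragments of fragment_bp needed to cover region_bp.

    Adjacent fragments share overlap_bp bases (assumed value, 0 by default).
    """
    if fragment_bp <= overlap_bp:
        raise ValueError("fragment length must exceed the overlap")
    if region_bp <= fragment_bp:
        return 1
    step = fragment_bp - overlap_bp
    return 1 + math.ceil((region_bp - fragment_bp) / step)


REGION_BP = 175_000    # approximate ABL1 length given in this specification
FRAGMENT_BP = 300      # assumed "few hundred bases" capture fragment

short_fragments = tiling_fragments(REGION_BP, FRAGMENT_BP)
cut_sites_for_one_piece = 2   # one sequence specific cut on each flank

print(short_fragments, cut_sites_for_one_piece)
```

Under these assumptions, several hundred separately captured fragments stand against two cut sites, which is the cost and uniformity argument made above.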
Described herein are methods and a reagent kit for cleaving and purifying a DNA fragment of interest 5×10² to 1×10⁸ base pairs long, enabling targeted genomic enrichment and selective genomic sequencing with higher specificity, simpler work flow, and lower cost. Central to the invention are a sequence specific DNA nuclease that is capable of cleaving the target DNA with sequence specificity, the protection of targeted DNA segments after the nuclease cleavage, the removal of non-target DNAs using exonuclease, and the isolation of the cleaved DNA fragment. Sequence specific means that the engineered nuclease is capable of cutting DNA with sequence specificity of eight base pairs or better. A specific sequence must be present for the engineered nuclease to cut. The cutting point may or may not be precisely at any particular base, but will be close to where it is directed by the targeting sequence. Non-specific background cutting may also be present.
In an embodiment of the invention, a method utilizes a sequence specific DNA nuclease. The nuclease includes one or more targeting oligonucleotides. The nuclease is capable of cutting a target double-stranded DNA with sequence specificity greater than eight base pairs long. The cleaved DNA has cohesive ends. The cleaved DNA segment of interest has cohesive ends flanking the target DNA. In another embodiment, a method for cutting out a DNA fragment of interest from a target DNA includes: cleaving target DNAs with a sequence specific DNA nuclease described above; adding modified nucleotides to the cohesive ends with polymerases, or adding linkers with modified bases to the ends with ligases; removing non-modified DNAs with exonucleases; and tagging enriched target DNAs for further purification and manipulation.
In yet another embodiment, a reagent kit includes a sequence specific DNA nuclease, wherein the DNA nuclease is capable of cutting a target double-stranded DNA with sequence specificity greater than eight base pairs long; a polymerase capable of adding modified nucleotides to the cohesive ends, or a ligase capable of adding linkers with modified bases to the target DNAs; an exonuclease capable of removing non-target DNAs from the reaction mix; and an enzyme capable of adding tags to the enriched target DNAs.
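The practical meaning of a recognition length of eight base pairs or better can be sketched with a simple estimate of how often a given recognition sequence would occur by chance. The sketch assumes a random sequence with equal base frequencies, a single strand, and a genome length of about 3.1×10⁹ base pairs for a haploid human genome; these are simplifying assumptions introduced here, and the function name is hypothetical.

```python
def expected_random_sites(k: int, genome_length: int) -> float:
    """Expected chance occurrences of one specific k-base sequence.

    Assumes each base is independent and equally likely (probability 1/4),
    and counts start positions on a single strand only.
    """
    positions = max(genome_length - k + 1, 0)
    return positions / 4 ** k


HUMAN_GENOME_BP = 3_100_000_000   # assumed approximate haploid genome size

for k in (8, 12, 16, 20):
    expected = expected_random_sites(k, HUMAN_GENOME_BP)
    print(f"recognition length {k:2d} bp: about {expected:.3g} chance sites")
```

In this simplified model an eight-base sequence recurs tens of thousands of times in a genome of that size, while the expected count falls below one at around twenty bases, which is why longer targeting sequences give cuts at predetermined sites with less background.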
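The sequence of steps recited above (cleave, protect the nuclease-generated ends, digest unprotected DNA) can be sketched as a toy model in Python. This is a deliberately simplified illustration, not a description of the enzymology: it assumes a linear molecule, assumes that every end created by the sequence specific nuclease is successfully modified, and assumes that a fragment survives exonuclease treatment only if both of its ends are protected. All names and coordinates are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    start: int
    end: int
    left_protected: bool
    right_protected: bool


def cleave_and_protect(length: int, cut_sites: list[int]) -> list[Fragment]:
    """Split a linear molecule [0, length) at the given cut sites.

    Ends created by the nuclease are marked protected (assumed to be filled
    in with modified nucleotides); the original molecule ends are not.
    """
    sites = sorted({s for s in cut_sites if 0 < s < length})
    bounds = [0] + sites + [length]
    last = len(bounds) - 2
    return [
        Fragment(bounds[i], bounds[i + 1],
                 left_protected=i > 0,
                 right_protected=i < last)
        for i in range(len(bounds) - 1)
    ]


def exonuclease_digest(fragments: list[Fragment]) -> list[Fragment]:
    """Keep only fragments protected at both ends (toy survival rule)."""
    return [f for f in fragments if f.left_protected and f.right_protected]


# Hypothetical 10-kb molecule with a region of interest between 2000 and 7000.
fragments = cleave_and_protect(10_000, [2_000, 7_000])
survivors = exonuclease_digest(fragments)
print(survivors)
```

In this model the two flanking pieces each retain one unprotected end and are removed, leaving only the segment bounded by the two sequence specific cuts.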
The term “Cas12a-associated guide RNA” refers to an RNA oligonucleotide that binds the Cas12a protein, recognizes the target DNA region of interest, and directs the Cas12a nuclease there for editing.
The term “Cas12a protein” refers to Cas12a (CRISPR associated protein 12a, previously known as Cpf1), which is a subtype of Cas12 proteins and forms part of the CRISPR system in some bacteria and archaea.
The terms “cohesive” and “cohesive ends” refer to double-stranded DNAs (dsDNAs) that have unpaired (single-stranded) DNA nucleotides, known as overhangs, on either the 5′- or 3′-strand. This is illustrated in
The terms “cut,” “cutting,” “cleave,” and “cleaved” refer to making breaks in dsDNA strands through the sugar-phosphate backbone of DNA strands.
The term “enrich” or “enriched” refers to an increase in the concentration of a DNA fragment of interest compared to the DNAs not of interest within a sample of DNA.
The term “exonuclease” refers to an enzyme capable of digesting single- and/or double-stranded DNA, including, but not limited to, exonuclease III, T7 exonuclease, exonuclease V, exonuclease VIII, Lambda exonuclease, T5 exonuclease, nuclease Bal-31, their variants and truncated forms, and combinations thereof.
The term “genomic DNA” as used herein refers to double-stranded DNA (dsDNA) from a cell, tissue or culture sample of prokaryotic or eukaryotic origin.
The term “guide RNA(s)” (gRNA) refers to a piece of RNA that functions as a guide for RNA- or DNA-targeting enzymes, including but not limited to CRISPR endonucleases, with which it forms complexes. The complexes are able to cleave DNA at specific sites on, within, at or flanking a target DNA sequence within a genomic DNA sample.
The terms “ligase” and “ligases” refer to an enzyme(s) capable of attaching an oligonucleotide and/or nucleotide sequence to either the 5′ or 3′ end of a DNA and/or RNA molecule or sequence. The ligase can include, but is not limited to, T4 DNA ligase, T4 DNA ligase 2, T4 RNA ligase 1, T4 RNA ligase 2, SplintR ligase, RtcB ligase, T3 DNA ligase, Taq DNA ligase, 9° N DNA ligase, E. coli DNA ligase, as well as their variants and truncated forms and combinations thereof.
The term “ligate” refers to the covalent linking of two ends of DNA or RNA molecules.
The term “linkers” refers to the double-stranded DNA or RNA molecules that can be covalently linked to the ends of double-stranded DNA or RNA molecules.
The term “modified” refers to oligonucleotides, nucleotides and the like that are chemically modified in their triphosphate moiety, sugar moiety, or in their bases. These nucleotides include, but are not limited to, alpha-phosphorothioate nucleotide triphosphates, morpholino triphosphates, peptide nucleic acids, peptide nucleic acid analogs, and sugar modified nucleotide triphosphates and combinations thereof.
The term “pre-defined” refers to defined, or established in advance.
The terms “protect” and “protected” refer to keeping safe or shielded from unwanted treatment by an enzyme, including but not limited to, digestion and/or another chemical, physical or mechanical means of treatment or exposure.
The terms “specific” and “specificity” refer to the quality of belonging or relating uniquely to a target DNA sequence or DNA sequence fragment.
The term “sequence specificity” refers to a clearly defined region of a DNA or RNA chain/sequence.
In one embodiment, disclosed is a method of target DNA enrichment. In a single sample having at least one target DNA fragment sequence and at least one specific DNA nuclease with targeting oligonucleotides (ON) that are homologous to their respective selected binding sites on a target double-stranded DNA (“dsDNA”) (as illustrated in
In one aspect of the embodiment, as shown in
As used herein, a Cas12a variant includes a mutant of Cas12a that maintains part or all of Cas12a functions, or a Cas12a homolog derived from a common ancestor that performs the same or similar function as Cas12a. As illustrated in
In yet another aspect of the embodiment, multiple pairs of sequence specific DNA nucleases are employed in one reaction. Thus, multiple, different sequence specific DNA fragments covering multiple, sequence specific regions of DNA sequences of interest may be cut and isolated in a multiplex reaction within a single vial.
In another aspect of the embodiment, multiple pairs of sequence specific DNA nucleases are employed in one reaction to cut out the same DNA sequence of interest from the same target DNA, but at different cutting points, resulting in multiple fragments of interest all including the same DNA region of interest. By carrying out such redundant cuts for the same DNA sequence of interest, the overall efficiency, i.e., the percentage of target DNA cut, may be increased. By combining the foregoing two embodiments as illustrated in
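The effect of redundant nuclease pairs on overall efficiency can be sketched with a simple probability estimate. The sketch assumes that each pair excises the region of interest from a given target molecule independently and with the same probability; neither assumption is stated in this disclosure, and the per-pair probabilities used below are purely illustrative.

```python
def fraction_excised(p_single: float, n_pairs: int) -> float:
    """Fraction of target molecules cut out by at least one nuclease pair.

    Assumes each of n_pairs pairs succeeds independently with probability
    p_single on any given molecule (an idealized, illustrative model).
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be between 0 and 1")
    if n_pairs < 0:
        raise ValueError("n_pairs must be non-negative")
    return 1.0 - (1.0 - p_single) ** n_pairs


for n in (1, 2, 3):
    print(n, fraction_excised(0.5, n))
```

Under this model, adding pairs raises the excised fraction with diminishing returns; real reactions may deviate if the pairs compete for substrate or differ in activity.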
As Cas12a endonuclease generates cohesive ends after cleavage, modified nucleotides are added to the cohesive ends of the cleaved DNA segment of interest by polymerases 106 (
The modified nucleotides can include, but are not limited to, triphosphates comprising alpha-phosphorothioate nucleotide triphosphates, morpholino triphosphates, peptide nucleic acids, peptide nucleic acid analogs, and/or sugar modified nucleotide triphosphates.
In the enrichment step, non-modified DNAs are digested with exonucleases, including but not limited to, exonuclease III, T7 exonuclease, exonuclease V, exonuclease VIII, Lambda exonuclease, T5 exonuclease, nuclease Bal-31, as well as variants and truncated forms.
All publications, published patent documents, and patent applications cited herein are hereby incorporated by reference to the same extent as though each individual publication, published patent document, or patent application was specifically and individually indicated as being incorporated by reference.
Enrichment of Plasmid pGEM-3Zf DNA With the Claimed Method
This scheme was first experimentally demonstrated in the plasmid pGEM-3Zf. In an experiment as illustrated in
Isolation of ABL1 as the Target Gene From Human Genomic dsDNA Using the Claimed Method
To demonstrate that the CRISPR/Cas12a based enrichment strategy can be used to selectively enrich long DNA sequences from the actual human genome, we selected human ABL1 as the target gene for the confirmatory experiments. The ABL1 gene is approximately 175-kb long, consisting of multiple intron and exon segments as illustrated in
To cleave the ABL1 gene from within human genomic DNA, 1 μg of total DNA isolated from cultured HEK293 cells was treated with complexes of CRISPR/Cas12a with ABL-5′-guide RNA and ABL-3′-guide RNA for 30 min. at 37° C. After the genomic DNA was cleaved with the CRISPR-based Cas12a/guide RNA complexes, a mixture of dNTP analogues at final concentrations of 25 μM each and 10 Units of Taq DNA polymerase were added to the cleavage reaction mixture. After 30 min. incubation at 72° C., the reaction products were purified with ZYMO DNA Clean Kit (ZYMO Research, Tustin, CA, USA) by following the manufacturer's instructions. Purified DNA products were eluted with TE buffer to a final concentration of 20 ng/μl.
After the 5′-overhang ends of the CRISPR/Cas12a cleaved DNA fragments were filled in with alpha-phosphorothioate deoxyribonucleotides and purified with ZYMO DNA Clean Kit, the isolated DNA products were subjected to treatment with exonuclease III (New England Biolabs) following the manufacturer's instructions. Each reaction was carried out in a total volume of 20 μl containing ˜0.1 μg of isolated DNA and 50 Units of exonuclease III. The reaction mixtures were incubated for 30-60 min. at 37° C. followed by incubation at 70° C. for 15 min. to inactivate remaining exonuclease activity. The final products were then analyzed by real-time quantitative PCR assays for genomic DNA samples.
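The volumes implied by this reaction setup can be worked out with a short calculation. The DNA amount (about 0.1 μg), the eluted DNA concentration (20 ng/μl), the enzyme amount (50 Units), and the total volume (20 μl) come from the description above; the enzyme stock concentration of 100 Units/μl and the use of a 10× reaction buffer are assumptions made for illustration and should be checked against the supplier's product sheet.

```python
def exo_reaction_volumes(
    dna_ng: float = 100.0,                 # ~0.1 ug of isolated DNA
    dna_stock_ng_per_ul: float = 20.0,     # eluted DNA concentration
    enzyme_units: float = 50.0,            # exonuclease III per reaction
    enzyme_stock_u_per_ul: float = 100.0,  # ASSUMED stock concentration
    total_ul: float = 20.0,                # total reaction volume
    buffer_fold: float = 10.0,             # ASSUMED 10x reaction buffer
) -> dict[str, float]:
    """Return component volumes in microliters for one digestion reaction."""
    dna_ul = dna_ng / dna_stock_ng_per_ul
    enzyme_ul = enzyme_units / enzyme_stock_u_per_ul
    buffer_ul = total_ul / buffer_fold
    water_ul = total_ul - dna_ul - enzyme_ul - buffer_ul
    if water_ul < 0:
        raise ValueError("components exceed the total reaction volume")
    return {
        "dna_ul": dna_ul,
        "enzyme_ul": enzyme_ul,
        "buffer_ul": buffer_ul,
        "water_ul": water_ul,
    }


volumes = exo_reaction_volumes()
for name, value in volumes.items():
    print(f"{name}: {value:.1f}")
```

Passing different stock concentrations to the function adjusts the water volume so that the total stays at 20 μl.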
In the TaqMan real-time qPCR assay, each qPCR reaction consists of a 5 min. initial incubation at 94° C. followed by 40 amplification cycles of 10 seconds at 94° C. and 40 seconds at 60° C. per cycle. The qPCR uses sequence-specific primers and probes for ABL1 (probe labeled with Cy5) and the housekeeping gene GAPDH (probe labeled with FAM), so the assay is able to detect both genes simultaneously within a single reaction. As shown in
The relative changes in Ct value between the TaqMan assays with and without exonuclease treatment (ΔCt=0.4 for ABL1 and ΔCt=9 for GAPDH) were determined from the Ct values in 5A and 5B. As the Ct difference for GAPDH is greater than 8, it was estimated that there is at least 100-fold enrichment for ABL1 relative to GAPDH in this experiment (2⁸=256-fold difference, as each unit of Ct difference represents a 2-fold difference in concentration). These results demonstrate that the claimed method can isolate and enrich large genomic dsDNA segments from human genomic DNA efficiently.
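The arithmetic behind this estimate can be sketched as follows. The sketch takes ΔCt as the Ct with exonuclease treatment minus the Ct without it, and assumes ideal amplification in which the product doubles every cycle (efficiency of 1.0); real assays often fall somewhat below this, so the numbers are indicative only. The function names are hypothetical.

```python
def fold_change(delta_ct: float, efficiency: float = 1.0) -> float:
    """Fold difference in template amount implied by a Ct difference.

    efficiency = 1.0 means perfect doubling per cycle (assumed).
    """
    return (1.0 + efficiency) ** delta_ct


def relative_enrichment(dct_target: float, dct_reference: float,
                        efficiency: float = 1.0) -> float:
    """Enrichment of the target relative to the reference gene."""
    return fold_change(dct_reference - dct_target, efficiency)


DCT_ABL1 = 0.4    # target, from the experiment described above
DCT_GAPDH = 9.0   # reference, from the experiment described above

print(fold_change(8))                              # conservative bound in the text
print(fold_change(DCT_GAPDH))                      # depletion of GAPDH
print(relative_enrichment(DCT_ABL1, DCT_GAPDH))    # ABL1 relative to GAPDH
```

Using both ΔCt values rather than the conservative bound of 8 cycles gives a somewhat higher figure than 256-fold under the same doubling assumption, which is consistent with the statement of at least 100-fold enrichment.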
In a further embodiment, the target DNAs can be modified in the enrichment step by adding linkers or tags including, but not limited to, biotin or another affinity tag used to bind the target DNA to a solid support and pull down the target DNA from the reaction solution mix. The resulting isolated target DNA can undergo further purification and manipulation, e.g., for sequencing analysis, as is known to the skilled artisan.
While embodiments and applications of this disclosure have been shown and described, the terms, and the following claims should not be construed to limit the claims to the disclosed specification's specific embodiments. It would be apparent to those skilled in the art that many more modifications and improvements than mentioned above are possible without departing from the inventive concepts herein. All possible embodiments along with the full scope of equivalents should be construed for the disclosed terms and to which such claims are entitled such that the claims are not limited by the disclosure.
This application claims the benefit of priority to U.S. Provisional application No. 63/596,263, filed Nov. 4, 2023, entitled “METHOD AND REAGENT KIT FOR TARGETED GENOMIC ENRICHMENT”, and which is hereby incorporated by reference in its entirety. All references cited and discussed in this specification are incorporated herein by reference in their entireties and to the same extent as if each reference were individually incorporated by reference.
Number | Date | Country
---|---|---
63596263 | Nov 2023 | US