The present invention relates to the measuring or testing processes involving nucleic acid. In particular, the present invention relates to the detection, quantification, and identification of DNA.
Detection and quantification of rare genetic events, including low level microbial DNA, are inherently complicated. Typically, high-throughput detection methodologies, which are characterized by an error rate of 0.1-1% (i.e. between 1 in every 1000 and 1 in every 100 bases is called incorrectly due to artifacts introduced during sample preparation and sequencing), are needed to detect and quantify rare genetic events. High-throughput detection methodologies known in the art, however, require repeated sampling or deep sequencing of a large number of molecules, which may not be readily possible due to limitations of sample input amount. To overcome limitations of sample input, the person skilled in the art typically would have to amplify the nucleic acid sequences present in the sample. However, it is generally accepted that amplification methods known in the art are not reliable and do not retain the degree of accuracy demanded for the detection of genomic alterations that occur at extremely low frequencies (i.e. <1%) in the background of otherwise unchanged DNA.
Additionally, conventional methods for simultaneously evaluating point mutations, small INDELs and structural variants make use of the hybridization-based approach capture methods which tend to capture off-target regions besides (or in addition to) sequences targeted by capture probes. These off-target regions consume sequencing capacity which is undesirable from the viewpoint of cost-reduction and simplification of analytical methods. Hybridization methods also take much longer for library preparation and have lower specificity of target capture with off-target regions being captured by the hybridization probes. On the other hand, conventional methods for target capture using forward and reverse primers flanking the target loci, are limited to being able to capture only structural variants with previously known or characterized breakpoints. For the detection of genomic rearrangements with unknown fusion partners, the conventional method (e.g. a pure PCR-based approach) is therefore not applicable. Therefore, there is a need for an alternative method for capturing and identifying distinct targets within a DNA sample. The method should seek to retain specificity of target capture while being able to identify targets of multiple classes.
Thus, the method of the present invention seeks to impart specificity of target capture while not being limited to capturing target regions with previously known sequence changes. The present invention also seeks to provide an alternative method of detecting and/or quantifying genetic alterations that provides reliable detection and a system of verification to ensure that errors occurring during amplification are removed from further processing.
In one aspect, the present invention provides a method of simultaneously capturing and identifying distinct targets within a DNA sample, wherein the distinct targets comprise a defined target region and an undefined target region, wherein the undefined target region comprises structural variations or rearrangement or fusion, comprising the steps of:
In one embodiment, the barcode sequence is an oligonucleotide comprising 10 to 16 random nucleotides, or 10 to 15 random nucleotides, or 10 to 13 random nucleotides, or 10 random nucleotides, or 11 random nucleotides, or 12 random nucleotides, or 13 random nucleotides, or 14 random nucleotides, or 15 random nucleotides, or 16 random nucleotides. In another embodiment, the barcode sequence is an oligonucleotide comprising 10 random nucleotides.
In yet another embodiment, the primer A, the primer B, the primer C and/or the double stranded oligonucleotide further comprises an adapter sequence.
In yet another embodiment, the structural variation is selected from the group consisting of deletion, duplication, insertion, inversion, transversion, and translocation.
In yet another embodiment, the sequencing result is further used to detect a point mutation within the undefined target regions.
In yet another embodiment, step o further comprises:
In yet another embodiment, the length of the target-specific sequence A, the target-specific sequence B, and/or the target-specific sequence C is from 16 nucleotides to 30 nucleotides, or from 19 nucleotides to 29 nucleotides, or from 20 nucleotides to 28 nucleotides, or from 21 nucleotides to 27 nucleotides, or from 22 nucleotides to 26 nucleotides, or 16 nucleotides, or 17 nucleotides, or 18 nucleotides, or 19 nucleotides, or 20 nucleotides, or 21 nucleotides, or 22 nucleotides, or 23 nucleotides, or 24 nucleotides, or 25 nucleotides, or 26 nucleotides, or 27 nucleotides, or 28 nucleotides, or 29 nucleotides, or 30 nucleotides.
In yet another embodiment, the separation molecule is selected from the group consisting of biotin, digoxigenin (DIG), and fluorescein isothiocyanate (FITC). In yet another embodiment, the separation molecule is biotin.
In yet another embodiment, the bead that binds the separation molecule comprises streptavidin, anti-digoxigenin, or anti-FITC. In yet another embodiment, the bead that binds the separation molecule comprises streptavidin.
In yet another embodiment, the DNA sample is obtained from a subject having and/or suspected of having a disease. In yet another embodiment, the disease is cancer or infectious disease. In yet another embodiment, the cancer is selected from the group consisting of lung cancer, colorectal cancer, breast cancer, pancreatic cancer, prostate cancer, nasopharyngeal cancer, liver cancer, cholangiocarcinoma, esophageal cancer, urothelial cancer, and gastrointestinal cancer. In yet another embodiment, the infectious disease is viral infection or bacterial infection.
In yet another embodiment, the DNA sample is a liquid sample, a tissue sample, or a cell sample. In yet another embodiment, the liquid sample is bodily fluids selected from the group consisting of blood, bone marrow, cerebral spinal fluid, peritoneal fluid, pleural fluid, lymph fluid, ascites, serous fluid, sputum, lacrimal fluid, stool, urine, saliva, ductal fluid from breast, gastric juice, and pancreatic juice. In yet another embodiment, the bodily fluid is blood. In yet another embodiment, the tissue sample is a frozen tissue sample or a fixed tissue sample.
In yet another embodiment, the length of the DNA fragment A and/or the DNA fragment B is from 80 base pairs to 220 base pairs, or from 90 base pairs to 210 base pairs, or from 100 base pairs to 200 base pairs, or from 110 base pairs to 190 base pairs, or from 120 base pairs to 180 base pairs, or from 130 base pairs to 170 base pairs, or from 140 base pairs to 160 base pairs, or about 80 base pairs, or about 90 base pairs, or about 100 base pairs, or about 110 base pairs, or about 120 base pairs, or about 130 base pairs, or about 140 base pairs, or about 150 base pairs, or about 160 base pairs, or about 170 base pairs, or about 180 base pairs, or about 190 base pairs, or about 200 base pairs, or about 210 base pairs, or about 220 base pairs. In yet another embodiment, the length of the DNA fragment A and/or the DNA fragment B is about 150 base pairs.
In yet another embodiment, the amount of DNA sample is from 10 ng to 200 ng, or from 20 ng to 190 ng, or from 30 ng to 180 ng, or from 40 ng to 170 ng, or from 50 ng to 160 ng, or from 60 ng to 150 ng, or from 70 ng to 140 ng, or from 80 ng to 130 ng, or from 90 ng to 120 ng, or from 100 ng to 110 ng, or about 10 ng, or about 20 ng, or about 30 ng, or about 40 ng, or about 50 ng, or about 60 ng, or about 70 ng, or about 80 ng, or about 90 ng, or about 100 ng, or about 110 ng, or about 120 ng, or about 130 ng, or about 140 ng, or about 150 ng, or about 160 ng, or about 170 ng, or about 180 ng, or about 190 ng, or about 200 ng. In yet another embodiment, the amount of DNA sample is about 100 ng.
In yet another embodiment, the DNA sample is selected from the group consisting of a eukaryotic DNA sample, a prokaryotic DNA sample, a viral DNA sample, and a mixture thereof. In yet another embodiment, the prokaryotic DNA sample is a bacterial DNA sample.
In yet another embodiment, the eukaryotic DNA sample is selected from the group consisting of a protozoa DNA sample, a fungal DNA sample, an algae DNA sample, a plant DNA sample, and an animal DNA sample. In yet another embodiment, the animal DNA sample is a mammalian DNA sample. In yet another embodiment, the mammalian DNA sample is a human DNA sample. In yet another embodiment, the DNA sample is a cell free DNA or DNA of a lysed cell.
Advantageously, the method described herein allows for simultaneous capture and identification of both defined target regions and undefined target regions within a DNA sample, which increases efficiency of the detection, quantification, and identification of DNA.
Advantageously, the method described herein does not require initial splitting of the sample at the target capture step, and a single sample is used for capturing both the defined target region and the undefined target region. Thus, the copy number of the DNA fragments that can be accessed by both the primer that targets the defined target region (i.e. primer A) and the primer that targets the undefined target region (i.e. primer B) is not reduced. Accordingly, the method achieves high sensitivity and specificity.
Advantageously, the method described herein is able to achieve simultaneous detection of: 1) viral DNA; 2) microsatellite instability; 3) structural rearrangements; and 4) SNVs and INDELs, from samples ranging from cfDNA from plasma (or cerebrospinal fluid or pleural effusion) to DNA from fixed tissue.
In another aspect, the present invention provides a kit comprising a plurality of primer A as defined herein, a plurality of primer B as defined herein, a plurality of primer C as defined herein, a bead that binds the separation molecule as defined herein, and a double stranded oligonucleotide as defined herein. In yet another embodiment, the kit further comprises a DNA polymerase, a Taq polymerase, a ligase, and a plurality of deoxyribonucleotide triphosphate (dNTPs).
The invention will be better understood with reference to the detailed description when considered in conjunction with the non-limiting examples and the accompanying drawings, in which:
The platform technology allows the simultaneous capture of targeted regions of the human and/or viral genome, as defined by pairs of primers, and of regions not defined by primer pairs, allowing the capture of genomic regions undergoing alterations at unspecified locations within a defined region of interest. In the capture step, a unique molecular tag (i.e. barcode sequence) is attached to each target DNA molecule being captured. The molecular tag (i.e. barcode sequence) allows the tracking of each target DNA fragment as it undergoes processing to form a DNA library. The presence of a molecular tag (i.e. barcode sequence) is detected using bioinformatics methods known in the art to count and assign each target DNA sequence from high-throughput sequencing to an original DNA molecule from the sample, carrying the same molecular tag (i.e. barcode sequence). The molecular tags (i.e. barcode sequences) are used to define molecular families, each member of which should carry the exact same sequence unaltered by the processes of capture and conversion to DNA library. Molecular families are then considered together for each region of interest to identify deviations from the expected DNA sequence. Precise deviations from the original DNA sequence are detectable from the application of the ‘agreement rule’ within molecular families, where lack of agreement among members of a molecular family results in the entire family being removed from consideration in molecular counts of a region of interest. In the absence of this rule, deviations within molecular families would erroneously lead to the conclusion of a sequence variant, when in fact the disagreement most likely arose from an inevitable process error.
Similarly, tags (i.e. barcode sequence) defining molecular families are also used to determine the number of unique molecules corresponding to each region of interest. Therefore, detection and accurate quantification of rare variants becomes possible through the precise and confident detection of molecules with variant sequence and those without. As exemplified in the Experimental Section, the method as described herein is also capable of detecting non-human genomic sequences such as microbial DNA in a mixture with human DNA.
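By way of a non-limiting illustration, the ‘agreement rule’ and the counting of unique molecules described above may be sketched as follows. This is a minimal sketch under stated assumptions and not the implementation used by the inventors: the function name `count_unique_molecules`, the region labels, and the representation of each sequencing result as a (region, barcode, sequence) tuple are hypothetical, and exact string identity is assumed as the test for agreement among family members.

```python
from collections import defaultdict


def count_unique_molecules(reads):
    """Count barcode-defined molecular families per region of interest.

    `reads` is an iterable of (region, barcode, sequence) tuples
    (a hypothetical input format chosen for this sketch).
    Families whose members disagree are removed entirely, following the
    'agreement rule'; agreement is assumed to mean identical sequences.
    """
    # Group the distinct sequences observed for each (region, barcode) family.
    families = defaultdict(set)
    for region, barcode, sequence in reads:
        families[(region, barcode)].add(sequence)

    # A family contributes one molecule only if all members agree.
    counts = defaultdict(int)
    for (region, _barcode), sequences in families.items():
        if len(sequences) == 1:
            counts[region] += 1
    return dict(counts)


example_reads = [
    ("region_1", "CATTACATAC", "ACGT"),
    ("region_1", "CATTACATAC", "ACGT"),  # agrees with the read above
    ("region_1", "GCGTGGACAA", "ACGT"),
    ("region_1", "GCGTGGACAA", "ACTT"),  # disagreement: family is discarded
    ("region_2", "TTTTTAGACA", "GGCC"),
]
molecule_counts = count_unique_molecules(example_reads)
```

In this sketch a region whose families are all discarded simply does not appear in the returned dictionary; a practical pipeline would likely also tolerate sequencing-quality differences rather than require exact identity.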
The present invention can also be broadly illustrated by the following features. Firstly, a group of primers will bind to DNA fragments comprising the defined (or fully defined) target regions and another group of primers will bind to DNA fragments comprising the undefined (or partly defined) target regions. Secondly, the primers that annealed to the DNA fragments comprising part of the defined target region (i.e. product A) are separated from the primers that annealed to the DNA fragments comprising part of the undefined target region (i.e. product B). Thirdly, upon separation, the two products will undergo two different treatments. For product A, a reverse primer will be added. For product B, a double stranded oligonucleotide is added and ligated to the end that is not connected to the separation molecule that binds the separation beads in an earlier separation step. Fourthly, product A and product B that have been processed are recombined, amplified together, and the resulting amplicons are sequenced.
The method of the present invention is advantageous because it allows for simultaneous capture and identification of both the defined (or fully defined) target regions and the undefined (or partly defined) target regions (i.e. target regions that are prone to undergo sequence changes which are not previously characterized). The simultaneous capture allows a smaller amount of DNA sample to be used. The reason for having a separate method for the undefined (or partly defined) target regions is that these regions cannot be captured by a pair of primers because the sequence changes can happen at positions within the target that cannot be known when the target capture is being performed (i.e. the precise location and sequence change are unknown). Because the location and the sequence change are unknown, it is not possible to use a pair of primers flanking the target region, as happens in conventional methods. Further, the use of primers and polymerase-mediated extension affords greater specificity of target capture, compared to conventional methods based on probe hybridization.
Further to the above, another advantage that the present invention has is that despite separate workflows for converting the defined (or the fully defined) targets and the undefined (or the partly defined) targets into sequencing libraries, the method does not require initial splitting of the sample. By not requiring such splitting, the copy number of the DNA fragments that can be accessed by both the primer that targets the defined target region (i.e. primer A) and the primer that targets the undefined target region (i.e. primer B) is not reduced.
Thus, in one aspect, the present invention provides a method of simultaneously capturing and identifying distinct targets within a DNA sample, wherein the distinct targets comprise a defined (or a fully defined) target region and an undefined (or a partly defined) target region, wherein the undefined (or the partly defined) target region comprises structural variations or rearrangement or fusion, comprising the steps of:
In one example, the present invention provides a method of simultaneously identifying a defined region and an undefined region within a DNA sample, wherein the undefined region comprises a structural variation, comprising the steps of:
For example, the method as described herein is illustrated by the schematic diagrams presented in
As used herein, the term “defined region” is defined as a region in a DNA fragment that is free of structural variations that may be found in the undefined region (i.e. structural variations that are not previously characterized). That is, the “defined region” comprises a region of DNA fragment that structurally is identical to or substantially the same as DNA fragments from a reference sequence. In other words, a “fully defined target region” is a target for which the sequence identity (i.e. the start and end of the target) is fully defined prior to capture. In the present disclosure, the terms “defined region”, “defined target region”, and “fully defined target region” are used interchangeably. Thus, it would be understood by the person skilled in the art that the term “undefined region” would encompass a region of DNA fragment that has structural variations that are not previously characterized. In other words, a “partly defined target region” is a target for which the sequence identity is not fully defined prior to target capture and comprises a target region prone to undergo sequence changes (such as structural rearrangements). It is appreciated that the precise sequence composition of a “partly defined target region” cannot be predetermined and thus it may be impossible to design a pair of defining primers for such a region. The sequence definition of an “undefined region”, or a “partly defined target region”, such as detection of genomic rearrangements with unknown fusion partners, is determinable only once the sequencing results are obtained. It would also be apparent to the person skilled in the art that the defined region and undefined region would have different DNA sequences. Thus, in some examples, the target specific sequence A and the target specific sequence B do not overlap.
As would be understood by the person skilled in the art, the term “undefined target region” does not mean that 100% of the DNA sequence within the target region is unknown in the art. As used herein, the “undefined target region” refers to a target region wherein about 5%, or about 10%, or about 20%, or about 30%, or about 40%, or about 50%, or about 60%, or about 70%, or about 80%, or about 90%, or about 95% of the DNA sequence within the target region is unknown in the art. In the present disclosure, the terms “undefined region”, “undefined target region”, and “partly defined target region” are used interchangeably. As used herein, the term “barcode sequence” is a commonly used term in the art of nucleic acid sequencing and is used within the definition as known in the art. Thus, the term “barcode sequence” refers to the encoded molecules or barcodes that include a variable amount of information within the nucleic acid sequence. For example, the barcode sequence is a tag that can be read out using any of a variety of sequence identification techniques, for example, nucleic acid sequencing, probe hybridization based assay, and the like. In some examples, the barcode sequence is used in the method as described herein to append different target specific sequences, such that when the barcode sequence and target specific sequence anneal to the (target) DNA fragment, each different (target) DNA fragment would then have a unique barcode sequence that is attached to it and read out with the sequence of the (target) DNA fragment from that sample. The barcode sequence allows the pooled analysis of multiple unique DNA fragments, where the resulting sequence information from the pool can be later attributed back to each starting DNA fragment. That is, after the process of amplification, the barcode sequence is used to group amplicons to form a family of amplicons having the same oligonucleotide with a randomly assigned nucleic acid sequence (i.e. same barcode oligonucleotide).
In some examples, the barcode sequence is an overhang that does not complement any sequence within DNA fragment A and DNA fragment B. In some examples, the barcode sequence may be an oligonucleotide comprising 10 to 16 random nucleotides, or 10 to 15 random nucleotides, or 10 to 13 random nucleotides, or 10 random nucleotides, or 11 random nucleotides, or 12 random nucleotides, or 13 random nucleotides, or 14 random nucleotides, or 15 random nucleotides, or 16 random nucleotides. In one example, the barcode sequence is an oligonucleotide comprising 10 random nucleotides. As exemplified in the Experimental Section, the barcode sequence may be defined as NNNNNNNNNN (SEQ ID NO: 1), which may have sequences such as, but not limited to, CATTACATAC (SEQ ID NO: 2), GCGTGGACAA (SEQ ID NO: 3), TTTTTAGACA (SEQ ID NO: 4), TAAGAGGTCC (SEQ ID NO: 5), and the like.
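For illustration only, the size of the barcode space for a barcode of 10 random nucleotides, and a check that a given sequence fits the NNNNNNNNNN pattern, may be sketched as follows. The names `random_barcode` and `is_valid_barcode` are hypothetical, and the sketch assumes that each position is drawn uniformly from A, C, G and T, which the disclosure does not state.

```python
import random
import re

BARCODE_LENGTH = 10  # the 10-nucleotide example described above
BARCODE_PATTERN = re.compile(r"[ACGT]{%d}" % BARCODE_LENGTH)

# Number of distinct barcodes available with 4 possible bases per position.
BARCODE_DIVERSITY = 4 ** BARCODE_LENGTH  # 1,048,576


def random_barcode(rng, length=BARCODE_LENGTH):
    """Draw one barcode, assuming uniform sampling of A, C, G and T."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def is_valid_barcode(sequence):
    """Return True if the sequence is exactly BARCODE_LENGTH bases of A/C/G/T."""
    return BARCODE_PATTERN.fullmatch(sequence) is not None


# The example barcodes listed above (SEQ ID NO: 2 to 5).
listed_examples = ["CATTACATAC", "GCGTGGACAA", "TTTTTAGACA", "TAAGAGGTCC"]
all_listed_valid = all(is_valid_barcode(s) for s in listed_examples)
sample_barcode = random_barcode(random.Random(0))
```

With roughly one million possible 10-mers, two distinct input molecules are unlikely to receive the same barcode by chance when the number of captured molecules per region is small relative to the barcode space.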
As used herein, the term “at the 3′ end” corresponds to the last nucleotide of a single DNA strand. As used herein, the term “close to the 3′ end” corresponds to a distance of from 1 to 100 nucleotides, or from 5 to 90 nucleotides, or from 10 to 80 nucleotides, or from 15 to 70 nucleotides, or from 20 to 60 nucleotides, or about 1 nucleotide, or about 5 nucleotides, or about 10 nucleotides, or about 15 nucleotides, or about 20 nucleotides, or about 25 nucleotides, or about 30 nucleotides, or about 35 nucleotides, or about 40 nucleotides, or about 50 nucleotides, or about 60 nucleotides, or about 70 nucleotides, or about 80 nucleotides, or about 90 nucleotides, or about 100 nucleotides from the 3′ end of a single DNA strand. In one example, when the term “close to the 3′ end” is used to define a reverse primer, the binding site of the reverse primer (for example, primer C) is predetermined such that the overall length of the target region defined by the combination of the forward primer (for example, primer A) and the reverse primer is from 80 base pairs (bp) to 200 bp, or from 100 bp to 180 bp, or from 120 bp to 160 bp, or from 140 bp to 150 bp, or about 80 bp, or about 90 bp, or about 100 bp, or about 110 bp, or about 120 bp, or about 130 bp, or about 140 bp, or about 150 bp, or about 160 bp, or about 170 bp, or about 190 bp, or about 200 bp.
In regard to step i of the present invention (i.e. the step of connecting a single nucleotide to the 3′ end of the single stranded elongated primer B of the double stranded complex B in the mixture B), a person skilled in the art is aware that the single nucleotide that is to be connected with the 3′ end of the single stranded elongated primer B can be any nucleotide. In one example, the single nucleotide may include, but is not limited to, adenine (A), cytosine (C), guanine (G), thymine (T), and the like. In one example, when the single nucleotide to be connected is adenine (A), Taq polymerase is used and the connecting step is known as “A-tailing”. The A-tailing step exploits the intrinsic terminal transferase activity of Taq polymerase by which it catalyzes the template-independent addition of an adenine residue to the 3′ end of both strands of DNA molecules. In the presence of a mixture of four dNTPs, dA is added preferentially to the 3′ end of the DNA molecule by Taq polymerase. Other nucleotides can be added but would require differing reaction conditions for Taq activity. Therefore, under standard reaction conditions, in the presence of dNTPs, Taq polymerase will preferentially incorporate dA at the 3′ end of the DNA molecules.
As the method as described herein utilises sequencing platforms/methods known in the art, it would be apparent to the person skilled in the art that the DNA fragment processed through the steps of the method as described herein may have to be prepared to comprise additional nucleic acid sequences recognised by the sequencing platforms/methods (i.e. adapter sequences). Thus, in some examples, the primer A, the primer B, the primer C and/or the double stranded oligonucleotide further comprises an adapter sequence.
As used herein, the term “adapter sequence” refers to an oligonucleotide sequence bound to the 5′ and 3′ ends of each DNA fragment in a sequencing library. The adapter sequences are complementary to the plurality of oligonucleotides present on the surface of flow cells of the sequencing tools, thereby allowing the DNA fragment to attach to the sequencing tools. In some examples, when the sequencing utilized is Illumina Sequencing (i.e. Illumina® sequencing technology), the adapter may be a universal P5 adapter as follows: AATGATACGGCGACCACCGAGATCT (SEQ ID NO: 13), and/or an indexed P7 adapter as follows: CAAGCAGAAGACGGCATACGAGAT (SEQ ID NO: 14) (see Table 1).
As described herein, the distinct targets within the DNA sample that can be simultaneously captured and identified by the method of the present invention comprise a defined target region (or a fully defined target region) and an undefined target region (or a partly defined target region). In one example, the undefined target region (or the partly defined target region) comprises structural variations or rearrangement or fusion, which are not previously characterized. In one example, the undefined target region (or the partly defined target region) is prone to undergo a structural rearrangement or sequence changes. As used herein, the term “structural variations” refers to variations in the structure of the genome, i.e. in the order of sections of the DNA (as opposed to the smaller variation to the sequence alone which maintains the overall order of the DNA sections with respect to the genome). As used herein, the term “rearrangement” refers to rearrangements in the order of sections of the DNA (interchangeable with “structural variations”). As used herein, the term “fusion” refers to structural variants produced through interchromosomal or intrachromosomal rearrangements. In one example, the structural variations may include, but are not limited to, deletion, duplication, insertion, inversion, transversion, translocation, and the like. As used herein, the term “deletion” refers to a sequence change where more than 50 nucleotides are removed. As used herein, the term “duplication” refers to a sequence change where a copy of one or more nucleotides is inserted directly 3′-flanking of the original copy. As used herein, the term “insertion” refers to a sequence change where more than 50 nucleotides are inserted between two nucleotides but where the insertion is not a copy of a sequence immediately 5′-flanking.
As used herein, the term “inversion” refers to a sequence change where more than one nucleotide replacing the original sequence are the reverse complement of the original sequence. As used herein, the term “translocation” refers to rearrangement of parts between non-homologous chromosomes, which can result in “fusion”.
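As a simple illustration of the definition of “inversion” given above, the following sketch replaces a stretch of sequence with its reverse complement. The function names, the toy sequence and the 0-based, end-exclusive coordinates are assumptions made for this example only.

```python
# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return sequence.translate(COMPLEMENT)[::-1]


def apply_inversion(sequence, start, end):
    """Replace sequence[start:end] (0-based, end-exclusive) by its reverse complement."""
    return sequence[:start] + reverse_complement(sequence[start:end]) + sequence[end:]


original = "TTAACGGG"
inverted = apply_inversion(original, 2, 6)  # inverts the segment "AACG"
```

Applying the same inversion twice restores the original sequence, which is a convenient sanity check when simulating such events.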
As would be apparent to the person skilled in the art, the method as described herein can also be used to detect single nucleotide variations such as substitution. In some examples, the sequencing result is further used to detect a single nucleotide variation. In some examples, the sequencing result is further used to detect a single nucleotide variation within the undefined target region (or the partly defined target region). In some examples, the sequencing result is further used to detect a single nucleotide variation within the defined target region (or the fully defined target region). As used herein, the term “single nucleotide variation”, “single nucleotide sequence variation”, and “point mutation” may be used interchangeably.
In one example, the defined target region (or the fully defined target region) comprises single nucleotide sequence variations, small insertion, small deletion, genomic copy number alteration, deletion of homopolymeric region, foreign DNA sequences (e.g. wherein the DNA sample is human DNA, microbial DNA sequences are considered foreign DNA sequences), polymorphisms or single-nucleotide variations in microbial DNA sequence, and the like. In one example, the deletion of homopolymeric region may include, but is not limited to, microsatellite instability. As used herein, the term “single nucleotide sequence variations” or “single nucleotide variations” refers to variation in a single nucleotide that occurs at a specific position in the genome, differing from the nucleotide defining the position in the reference genome. As used herein, the term “small insertion” refers to a sequence change where fewer than 50 nucleotides are inserted between two nucleotides but where the insertion is not a copy of a sequence immediately 5′-flanking. As used herein, the term “small deletion” refers to a sequence change where fewer than 50 nucleotides are removed. As used herein, the term “copy number alteration” refers to the repetition of sections of the genome (duplication) or loss of sections of the genome (deletion). As used herein, the term “deletions of homopolymeric regions” refers to the shortening of a homopolymeric tract in the genome. An example of “deletions of homopolymeric region” is GCGAAAAAAAAAAAAAAATA becoming GCGAAATA, which is a deletion of 12 A's from the homopolymeric tract of 15 A's. As used herein, the term “polymorphism” refers to a variation in a single nucleotide that occurs at a specific position in the genome, and is a variation in all copies of the organism's genome, differing from the nucleotide defining the position in the organism's population (reference).
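The arithmetic of the homopolymer example above can be checked with a short sketch. The helper `longest_homopolymer` is hypothetical and simply measures the longest uninterrupted run of a given base; the reference sequence is built programmatically from the stated tract length of 15 A's.

```python
import re


def longest_homopolymer(sequence, base):
    """Length of the longest uninterrupted run of `base` in `sequence`."""
    runs = re.findall(re.escape(base) + "+", sequence)
    return max((len(run) for run in runs), default=0)


# Reference tract of 15 A's and shortened tract of 3 A's, as in the example above.
reference = "GCG" + "A" * 15 + "TA"
observed = "GCG" + "A" * 3 + "TA"

reference_tract = longest_homopolymer(reference, "A")
observed_tract = longest_homopolymer(observed, "A")
deleted_bases = reference_tract - observed_tract
```

Comparing tract lengths between a sample and a reference in this way is one simple means of flagging shortened homopolymeric regions; it does not by itself distinguish true deletions from polymerase slippage introduced during amplification.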
As used herein, the term “microsatellite instability” refers to genetic instability in short nucleotide repeats or microsatellites, a microsatellite being a tract of tandemly repeated (i.e. adjacent) DNA motifs ranging from one to six or up to ten nucleotides in length, with each motif repeated 5 to 50 times. A person skilled in the art is aware that the sum of all of the variants within the defined target region (or the fully defined target region) is known as total mutation (or variant) load or tumour mutational burden (TMB). A person skilled in the art is also aware that determining the total mutation (or variant) load or tumour mutational burden (TMB) is useful in determining the therapeutic target of certain diseases (such as cancer).
The method of the present disclosure can also be used to detect certain diseases. Thus, in one example, the DNA sample for the method of the present disclosure is obtained from a subject having and/or suspected of having a disease. In some examples, the disease may include, but is not limited to, cancer, infectious disease, and the like. In some examples, the cancer may include, but is not limited to, lung cancer, colorectal cancer, breast cancer, pancreatic cancer, prostate cancer, nasopharyngeal cancer, liver cancer, cholangiocarcinoma, esophageal cancer, urothelial cancer, gastrointestinal cancer, and the like. In some examples, the infectious disease may include, but is not limited to, viral infection, bacterial infection, and the like.
To reduce false positive alterations that typically arise in amplification process, the barcode sequences used in the method as described herein can be used to form subgroups of sequences and to arrive at consensus sequences that are then used for further analysis or determination of whether mutation is truly present in the target DNA fragments. Thus, in some examples, step n further comprises:
As used herein, the term “reference sequence” refers to nucleotide sequences (such as DNA sequences or RNA sequences) known in the art that may be obtainable from public databases.
As used herein, the term “consensus sequence” refers to a nucleotide sequence obtained from consensus calling. In one example, consensus calling is performed by identifying the nucleotide at each position for each sequencing result within the subgroup, comparing the identity for the nucleotide at each position across the plurality of sequencing results, and determining a majority nucleotide at each position. If the majority nucleotide count is above a threshold set for determining majority for specific position, the assignment for said position is the majority nucleotide. If the majority nucleotide count is below this threshold, no assignment is made for said position. The threshold is variable for every position and is a function of the total number of sequencing results corresponding to a specific position.
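The consensus calling described above may be illustrated by the following minimal sketch. It assumes that all sequencing results in a subgroup are already aligned to the same length, that the threshold is a fixed fraction (`min_fraction`, a hypothetical parameter) of the number of sequencing results covering the position, and that positions without an assignment are written as “N”. None of these choices is specified in the disclosure.

```python
from collections import Counter


def call_consensus(reads, min_fraction=0.6):
    """Call a consensus sequence for one barcode subgroup.

    `reads` is a list of equal-length, pre-aligned sequences.
    At each position the majority nucleotide is assigned only if its
    count is above `min_fraction` times the number of reads; otherwise
    no assignment is made and 'N' is emitted (an assumed convention).
    """
    if not reads:
        return ""
    length = len(reads[0])
    if any(len(read) != length for read in reads):
        raise ValueError("reads must be aligned to the same length")

    consensus = []
    for position in range(length):
        counts = Counter(read[position] for read in reads)
        majority_base, majority_count = counts.most_common(1)[0]
        # The threshold depends on the number of results at this position.
        threshold = min_fraction * len(reads)
        consensus.append(majority_base if majority_count > threshold else "N")
    return "".join(consensus)


agreeing = call_consensus(["ACGT", "ACGT", "ACTT"])  # 2 of 3 reads support G
ambiguous = call_consensus(["ACGT", "ACTT"])         # 1 of 2 reads supports G
```

In the first call the majority nucleotide at the third position exceeds the threshold and is assigned; in the second call it does not, so that position is left unassigned.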
In some examples, the length of the target-specific sequence A, the target-specific sequence B, and/or the target-specific sequence C is from 17 nucleotides to 31 nucleotides, or from 19 nucleotides to 29 nucleotides, or from 20 nucleotides to 28 nucleotides, or from 21 nucleotides to 27 nucleotides, or from 22 nucleotides to 26 nucleotides, or 18 nucleotides, or 19 nucleotides, or 20 nucleotides, or 21 nucleotides, or 22 nucleotides, or 23 nucleotides, or 24 nucleotides, or 25 nucleotides, or 26 nucleotides, or 27 nucleotides, or 28 nucleotides, or 29 nucleotides, or 30 nucleotides. In some examples, the length of the target-specific sequence A, the target-specific sequence B, and/or the target-specific sequence C is 22 nucleotides. A person skilled in the art is also aware that, in order to determine the length of the primer A, the primer B, the primer C, the target-specific sequence A, the target-specific sequence B, and/or the target-specific sequence C, other primer properties also have to be considered, including, but not limited to, melting temperature (Tm), GC content (guanine-cytosine content, or GC %), and the propensity of a primer to dimerize with other primers and with itself.
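The primer properties mentioned above can be screened computationally. The following is a minimal sketch, not the inventors' design procedure: it computes GC content, a rough melting temperature using the Wallace rule (Tm = 2(A+T) + 4(G+C), a coarse approximation that is less reliable for oligonucleotides of this length), and a naive check for 3′-end complementarity between two primers. All function names are hypothetical. The example sequence is the 22 bases that follow the barcode in the exemplary primer A given in the Material section.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Rough melting temperature in degrees C by the Wallace rule."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def three_prime_can_anneal(primer: str, other: str, k: int = 4) -> bool:
    """True if the last k bases of `primer` are complementary to a site in `other`.

    A deliberately naive dimer screen; real designs would use a
    thermodynamic model rather than exact k-mer matching.
    """
    return reverse_complement(primer[-k:]) in other.upper()


target_specific = "GGTGACCCTTGTCTCTGTGTTC"
print(f"length={len(target_specific)}, "
      f"GC={gc_content(target_specific):.1%}, "
      f"Tm~{wallace_tm(target_specific)} C")
```

Passing the same primer as both arguments to `three_prime_can_anneal` gives a crude self-dimer check.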
As used herein, a “separation molecule” refers to a tag or molecule that is capable of binding to a bead to thereby allow for the separation of the nucleotide that is connected to the separation molecule. As illustrated in
In addition, the method as described herein is compatible with multiple sources of DNA material, including circulating DNA from blood plasma or cerebrospinal fluid (CSF), fragmented formalin-fixed paraffin embedded DNA (FFPE DNA), genomic DNA from leukocytes and from other cells. The method as described herein could also cover more than 50 targeted genes, over 500 targeted regions in the human genome and 15 DNA virus families, and is readily expandable for future inclusion of target regions. As the sequencing library is based on the use of primers for the capture of target regions, it works with equivalent specifications on multiple sample types such as circulating DNA and FFPE DNA. For example, primer-based capture of FFPE DNA is not hindered by fragmentation, as long as the expected amplicon size as defined by primers is limited to a reasonably short length of about 160-bp. Up to eight classes of target regions such as single-nucleotide variations or fusions can also be simultaneously captured using the first set of primers from a single sample of DNA. Following the primer-based capture, steps are taken for the completion of amplicons or ends with sequencing adapters and final amplification before high-throughput sequencing. The combination of primer and PCR-based methods for sequencing analysis allows for a smaller input DNA to be worked with without losing sensitivity. As such, the inventors of the present disclosure envisaged that the method as described herein can be performed in a liquid sample or tissue sample. Thus, in some examples, the sample is a liquid sample, a tissue sample, or a cell sample.
In some examples, the liquid sample may include, but is not limited to, bodily fluids such as, but not limited to, blood, bone marrow, cerebrospinal fluid, peritoneal fluid, pleural fluid, lymph fluid, ascites, serous fluid, sputum, lacrimal fluid, stool, urine, saliva, ductal fluid from breast, gastric juice, pancreatic juice, and the like. In one example, the bodily fluid is blood. The liquid sample that is useful for the method of the present technology is a liquid that comprises DNA which is circulating and not contained within cells (or cell free DNA). The DNA within the liquid can be isolated from the liquid in a form that is free from impurities (or pure form).
In some examples, the tissue sample may include, but is not limited to, a frozen tissue sample and a fixed tissue sample (such as a formalin-fixed tissue sample).
The method of the present invention is optimized for DNA fragments having certain sizes. A person skilled in the art is aware that when the DNA sample comprises full-length DNA, the full-length DNA can be processed and fragmented to a length that is suitable for the method of the present invention. In some examples, the length of the DNA fragment A and/or the DNA fragment B is from 80 base pairs to 220 base pairs, or from 90 base pairs to 210 base pairs, or from 100 base pairs to 200 base pairs, or from 110 base pairs to 190 base pairs, or from 120 base pairs to 180 base pairs, or from 130 base pairs to 170 base pairs, or from 140 base pairs to 160 base pairs, or about 80 base pairs, or about 90 base pairs, or about 100 base pairs, or about 110 base pairs, or about 120 base pairs, or about 130 base pairs, or about 140 base pairs, or about 150 base pairs, or about 160 base pairs, or about 170 base pairs, or about 180 base pairs, or about 190 base pairs, or about 200 base pairs, or about 210 base pairs, or about 220 base pairs. In one example, the length of the DNA fragment A and/or the DNA fragment B is about 150 base pairs.
Since primers are used to detect defined or undefined target regions, the inventors found the method as described herein to be useful for small amounts of DNA sample. Thus, in some examples, the amount of DNA sample may be from 10 ng to 200 ng, or from 20 ng to 190 ng, or from 30 ng to 180 ng, or from 40 ng to 170 ng, or from 50 ng to 160 ng, or from 60 ng to 150 ng, or from 70 ng to 140 ng, or from 80 ng to 130 ng, or from 90 ng to 120 ng, or from 100 ng to 110 ng, or about 10 ng, or about 20 ng, or about 30 ng, or about 40 ng, or about 50 ng, or about 60 ng, or about 70 ng, or about 80 ng, or about 90 ng, or about 100 ng, or about 110 ng, or about 120 ng, or about 130 ng, or about 140 ng, or about 150 ng, or about 160 ng, or about 170 ng, or about 180 ng, or about 190 ng, or about 200 ng. In some examples, the amount of DNA sample is about 100 ng.
Since the method as described herein can be used to detect undefined regions comprising structural variations that have not previously been characterized, the DNA sample to be used in the method as described herein may include, but is not limited to, a eukaryotic DNA sample, a prokaryotic DNA sample, a viral DNA sample, and a mixture thereof. In some examples, the prokaryotic DNA sample is a bacterial DNA sample. In some examples, the eukaryotic DNA sample may include, but is not limited to, a protozoan DNA sample, a fungal DNA sample, an algal DNA sample, a plant DNA sample, an animal DNA sample, and the like. In some examples, the animal DNA sample is a mammalian DNA sample (such as a human DNA sample). In some examples, the DNA sample may be cell free DNA or DNA of a lysed cell.
In another aspect, the present invention provides for a kit comprising a plurality of primer A as defined herein, a plurality of primer B as defined herein, a plurality of primer C as defined herein, a bead that binds the separation molecule as defined herein, and a double stranded oligonucleotide as defined herein. In some examples, the kit of the present invention further comprises a DNA polymerase, a Taq polymerase, a ligase, and a plurality of deoxyribonucleotide triphosphates (dNTPs). In some examples, the reagents provided in the kit as described herein may be provided separately, with the components independently distributed in one or more containers. As the method as described herein relates to sequencing (such as high-throughput sequencing), further components required in sequencing processes could be easily determined by the person skilled in the art.
As used in this application, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise. For example, the term “a primer” includes a plurality of primers, including mixtures and combinations thereof.
As used herein, the terms “increase” and “decrease” refer to the relative alteration of a chosen trait or characteristic in a subset of a population in comparison to the same trait or characteristic as present in the whole population. An increase thus indicates a change on a positive scale, whereas a decrease indicates a change on a negative scale. The term “change”, as used herein, also refers to the difference between a chosen trait or characteristic of an isolated population subset in comparison to the same trait or characteristic in the population as a whole. However, this term is without valuation of the difference seen.
As used herein, the term “about” in the context of concentration of a substance, size of a substance, length of time, or other stated values means +/−5% of the stated value, or +/−4% of the stated value, or +/−3% of the stated value, or +/−2% of the stated value, or +/−1% of the stated value, or +/−0.5% of the stated value.
Throughout this disclosure, certain embodiments may be disclosed in a range format. It should be understood that the description in range format is merely for convenience and brevity and should not be construed as an inflexible limitation on the scope of the disclosed ranges. Accordingly, the description of a range should be considered to have specifically disclosed all the possible sub-ranges as well as individual numerical values within that range. For example, description of a range such as from 1 to 6 should be considered to have specifically disclosed sub-ranges such as from 1 to 3, from 1 to 4, from 1 to 5, from 2 to 4, from 2 to 6, from 3 to 6 etc., as well as individual numbers within that range, for example, 1, 2, 3, 4, 5, and 6. This applies regardless of the breadth of the range.
The invention illustratively described herein may suitably be practiced in the absence of any element or elements, limitation or limitations, not specifically disclosed herein. Thus, for example, the terms “comprising”, “including”, “containing”, etc. shall be read expansively and without limitation. Additionally, the terms and expressions employed herein have been used as terms of description and not of limitation, and there is no intention in the use of such terms and expressions of excluding any equivalents of the features shown and described or portions thereof, but it is recognized that various modifications are possible within the scope of the invention claimed. Thus, it should be understood that although the present invention has been specifically disclosed by preferred embodiments and optional features, modification and variation of the inventions embodied herein disclosed may be resorted to by those skilled in the art, and that such modifications and variations are considered to be within the scope of this invention.
The invention has been described broadly and generically herein. Each of the narrower species and subgeneric groupings falling within the generic disclosure also forms part of the invention. This includes the generic description of the invention with a proviso or negative limitation removing any subject matter from the genus, regardless of whether or not the excised material is specifically recited herein.
Other embodiments are within the following claims and non-limiting examples. In addition, where features or aspects of the invention are described in terms of Markush groups, those skilled in the art will recognize that the invention is also thereby described in terms of any individual member or subgroup of members of the Markush group.
Material
Exemplary Molecular Tag Complex or Primers when Target is EGFR-exon18_1
An example of a “primer” when the target sequence is EGFR-exon18_1 (an example of primer A, illustrated in
ACACGACGCTCTTCCGATCT NNNNNNNNNN GGTGACCCTTGTCTCTGTGTTC,
wherein the first segment (italic and underlined in the original) is an example of an adapter sequence, the second segment (bold in the original) represents the barcode sequence, and the third segment (underlined in the original) is an example of a target-specific sequence.
An example of subsequent primers for the “completion of amplicon” (an example of primer C, illustrated in
where bases in underline are target-specific primers.
Expected Amplicon (Only Target-Specific Region)
Product after amplicon completion (in two steps) (Only one strand of the double stranded product is shown.):
TTCttgtcccccccagcttgtggagcctcttacacccagtggagaagct
cccaaccaagctctcttgaggatcttgaaggaaactgaattcAAAAAGA
TCAAAGTGCTGGGCTCAGATCGGAAGAGCACACGTC,
where the bases in underline is target nucleic acid.
Universal amplification primer 1 (an example of the primer for amplifying product C or D):
Universal amplification primer 2 (indexed) (an example of the primer for amplifying product C or D, the index is the bases in bold and italic font):
Final Product (Suitable for Sequencing on Illumina)
cagcttgtggagcctcttacacccagtggagaagctcccaaccaagctct
cttgaggatcttgaaggaaactgaattcAAAAAGATCAAAGTGCTGGGCT
CAGATCGGAAGAGCACACGTCTGAACTCCAGTCACCGTGATATCTCGTAT
where the bases in underline is target nucleic acid.
Methods
DNA Library Generation
The workflow for preparing DNA library is divided into three major steps. In the first step (
Briefly, in a 50 μl reaction, 10-100 ng of DNA was mixed with a primer pool in which each primer was at 0.05-0.2 μM, 0.2 mM of dNTPs, 0.5-1.5 mM MgSO4, 0.6 units of KOD enzyme and reaction buffer. Target capture and enrichment were done using the following thermocycling conditions: denaturation at 94° C. for 1 min, followed by 1 to 3 cycles of 98° C. for 1 min, 60° C. to 65° C. for 6 mins, and 68° C. for 5 mins. The length of the targets captured was dictated by the length of the template DNA fragment and the extension time allowed, such that a variety of target lengths would be captured in this first step. Three cycles were allowed to compensate for less than 100% efficiency of primer binding to targets, so as to increase target capture. At the end of this reaction, each captured target DNA had a random molecular tag linked to it. Excess unused primers were removed by two rounds of purification with 1.5×AMPure XP beads; that is, the eluate from the first round of purification was bound to 1.5×beads and subjected to a second round of purification. Final elution was done in 10 to 30 μl of buffer EB.
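The rationale for allowing more than one capture cycle can be illustrated with a simple model. Assuming, purely for illustration, that each template fragment is primed and extended independently with probability e in every cycle, the fraction of templates captured at least once after n cycles is 1 − (1 − e)^n. The efficiency value used below is hypothetical and is not a measured property of the method.

```python
def captured_fraction(efficiency: float, cycles: int) -> float:
    """Fraction of templates captured at least once after `cycles` cycles.

    Assumes independent capture attempts with the same per-cycle
    efficiency (an illustrative simplification).
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return 1.0 - (1.0 - efficiency) ** cycles


# Hypothetical per-cycle primer-binding efficiency of 60%.
for n in (1, 2, 3):
    print(f"{n} cycle(s): {captured_fraction(0.6, n):.3f}")
```

Under this model the gain from each additional cycle shrinks as the number of cycles grows.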
Product after Step 1 (Examples of Product A and Complex B, Illustrated in
An example with a very short target captured region shown for illustrative purposes is shown on
In the second step (
Targets captured on beads were washed briefly with bead wash (B&W) solution, followed by “on-bead” A-tailing reaction. Briefly, the beads with immobilized targets were resuspended in 10 μl reaction mixture containing 6.4 μl water, 1 μl 10×buffer for KOD-Plus-Neo, 1 μl of 2 mM dNTPs, 0.6 μl of 25 mM MgSO4, and 1 μl of 10× A-attachment mix (Toyobo Co., Ltd., Japan). The beads were incubated at 60° C. for 10 mins to allow A-tailing of the captured, immobilized DNA targets. The beads were washed again with 1× B&W buffer. Following this, the beads were resuspended in a ligation mix to allow “on-bead” ligation of a ds-oligo partial adapter. Briefly, beads were resuspended in a 10 μl reaction mix containing 5 μl of Blunt/TA ligase master mix (NEB, USA), 4 μl of water, and 1 μl of 10 μM adapter with a 3′ T overhang. An example of a 3′ T overhang is shown for example on
The mixture was incubated at 25° C. for 1 hr, with intermittent shaking. At the end of the hour, the mixture was chilled on ice. The beads were then washed three times with 1× B&W buffer. At the end of this step, target DNA captured on the beads would have undergone amplicon generation by the one-sided ligation of the partial adapter. Adapter ligation on the other (immobilized) end was inhibited by the overhang tail introduced during target capture and by the presence of the biotin-streptavidin complex. Finally, the completed amplicons were eluted from the streptavidin beads by disrupting the biotin-streptavidin bonds: the beads were incubated in 10 μl of elution solution (10 mM EDTA pH 8.2 and 95% formamide) at 65° C. for 5 mins. The eluate, containing the captured DNA targets (converted to amplicons), was collected following magnetic separation of the streptavidin beads and purified once with 1.5×AMPure XP beads to remove the formamide solution and replace it with EB buffer. DNA was eluted in 11.5 μl Buffer EB.
Targets that were not captured on streptavidin beads, as they lacked biotin tags, were first purified once with 1.5×AMPure XP beads to replace the B&W buffer with sample buffer. DNA was eluted in 23 μl of Buffer EB. Amplicon generation was then done using a multiplex pool of “reverse” target-specific primers. Briefly, in a 50 μl reaction, purified DNA from the target capture step was mixed with a primer pool in which each primer was at 0.05-0.2 μM, 0.5-1.5 mM of dNTPs, 1.5 mM MgSO4, 1 unit of KOD enzyme and reaction buffer. Amplicon generation was done using the following thermocycling conditions: denaturation at 94° C. for 1 min, followed by 1 to 3 cycles of 98° C. for 1 min, 60° C. for 6 mins, and 68° C. for 5 mins. The completed amplicons were purified twice from the PCR mix with 1.5×AMPure beads. DNA was eluted in 11.5 μl Buffer EB. An example of the product after step 2, if the captured target goes through adapter ligation for amplicon generation, is shown on
In the third step (
1 μl of 5-20 μM indexed P7 adapter (Table 1) and KAPA HiFi HotStart ReadyMix in a 50 μl reaction. The PCR was carried out with the following profile: denaturation at 98° C. for 45 s, followed by 22-26 cycles of 98° C. for 15 s, 60° C. for 30 s, and 72° C. for 30 s, with a final extension at 72° C. for 1 min. The amplified library was purified twice with 0.6-0.8×AMPure XP beads to remove non-specific products. The quality and quantity of the sequencing library were assessed using the 4200 TapeStation system (Agilent Technologies, USA) and the KAPA Library Quantification Kit for Illumina® Platforms (Kapa Biosystems Inc., USA), respectively. An example of the product after step 3 is shown on
Libraries were multiplexed and paired-end sequencing (2×150 bp) was done following manufacturer's instructions.
Table 1: Indexed P7 adapter sequences (5′ to 3′), with the index shown as the middle segment
5′ segment | Index | 3′ segment |
---|---|---|
CAAGCAGAAGACGGCATACGAGAT | ATCACG | GTGACTGGAGTTC |
CAAGCAGAAGACGGCATACGAGAT | CGATGT | GTGACTGGAGTTC |
CAAGCAGAAGACGGCATACGAGAT | TTAGGC | GTGACTGGAGTTC |
CAAGCAGAAGACGGCATACGAGAT | TGACCA | GTGACTGGAGTTC |
CAAGCAGAAGACGGCATACGAGAT | ACAGTG | GTGACTGGAGTTC |
CAAGCAGAAGACGGCATACGAGAT | GCCAAT | GTGACTGGAGTTC |
CAAGCAGAAGACGGCATACGAGAT | CAGATC | GTGACTGGAGTTC |
CAAGCAGAAGACGGCATACGAGAT | ACTTGA | GTGACTGGAGTTC |
CAAGCAGAAGACGGCATACGAGAT | GATCAG | GTGACTGGAGTTC |
CAAGCAGAAGACGGCATACGAGAT | TAGCTT | GTGACTGGAGTTC |
CAAGCAGAAGACGGCATACGAGAT | GGCTAC | GTGACTGGAGTTC |
CAAGCAGAAGACGGCATACGAGAT | CTTGTA | GTGACTGGAGTTC |
CAAGCAGAAGACGGCATACGAGAT | AGTCAA | GTGACTGGAGTTC |
CAAGCAGAAGACGGCATACGAGAT | AGTTCC | GTGACTGGAGTTC |
Data Analysis
FASTQ files were processed using a custom pipeline. First, expected amplicons were identified and labeled in the FASTQ files based on the expected primer sequences in Read 1 and paired Read 2. For amplicons with one unknown end, only primers in Read 1 were used for identification and labeling. Primer sequences and upstream molecular tag sequences were trimmed using cutadapt, and the primer-trimmed sequences were mapped to the reference genome using bwa-mem. For the primer-trimmed FASTQ files, the name of the primer which had the best match to a read was concatenated to the name of the mapped output reads (for both Read 1 and Read 2). The primer name assigned to Read 1 might not always match that of Read 2, which could be due to overlapping amplicons or non-specific binding. An “amplicon_name” was assigned to each paired read by combining the matching primer names of Read 1 and Read 2 (joined by a semicolon).
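The labeling step can be sketched as follows. This is an illustrative stand-in, not the custom pipeline itself (which used cutadapt for primer identification and trimming): primers are matched to the start of each read by Hamming distance, Read 1 is assumed to begin with a 10-base molecular tag followed by the forward primer, and the primer names, the reverse primer sequence, the mismatch limit and the `NA` placeholder for an unknown end are all hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def best_primer(read: str, primers: dict, offset: int = 0, max_mismatch: int = 2):
    """Name of the primer that best matches the read at `offset`, or None."""
    best_name, best_dist = None, max_mismatch + 1
    for name, seq in primers.items():
        window = read[offset:offset + len(seq)]
        if len(window) < len(seq):
            continue  # read too short to contain this primer
        dist = hamming(window, seq)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name


def amplicon_name(read1: str, read2: str, forward: dict, reverse: dict,
                  tag_len: int = 10) -> str:
    """Combine the primer names of Read 1 and Read 2, joined by a semicolon."""
    name1 = best_primer(read1, forward, offset=tag_len)
    name2 = best_primer(read2, reverse)
    return f"{name1 or 'NA'};{name2 or 'NA'}"


# Hypothetical primer pools and reads for illustration.
forward_primers = {"EGFR-exon18_1_F": "GGTGACCCTTGTCTCTGTGTTC"}
reverse_primers = {"EGFR-exon18_1_R": "GAGCCCAGCACTTTGATCTTTTT"}
read1 = "ACGTACGTAC" + "GGTGACCCTTGTCTCTGTGTTC" + "TTGTCCCCCC"
read2 = "GAGCCCAGCACTTTGATCTTTTT" + "GAATTCAGTT"
label = amplicon_name(read1, read2, forward_primers, reverse_primers)
```

When Read 2 matches no primer, as for an amplicon with one unknown end, the second half of the label falls back to the placeholder.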
Molecular tag (or barcode) sequences were included in the trimmed “primer” sequences of Read 1, and could be extracted given the unique structure of primer sequences in Read 1. The extracted molecular tag sequences were clustered in two steps: 1. initial grouping, by exact match of the combination of amplicon_name and barcode sequence; and 2. cluster reassignment, in which, within each group sharing the same amplicon_name, barcodes were further reassigned using global pairwise alignment, allowing a maximum of 2 base differences between barcodes. Barcode clusters with fewer than 3 associated reads (after cluster reassignment) were considered unreliable and removed from downstream analysis.
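The two-step clustering can be sketched as below. This is a simplified illustration under stated assumptions: barcodes are compared by Hamming distance (equal-length barcodes only) rather than by global pairwise alignment, and reassignment is done greedily, with smaller groups merged into the largest compatible group; neither choice is stated in this disclosure, and the function names are hypothetical.

```python
from collections import defaultdict


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def cluster_barcodes(reads, max_diff: int = 2, min_reads: int = 3):
    """Cluster reads by molecular tag.

    reads: iterable of (amplicon_name, barcode, read_id) tuples.
    Returns {(amplicon_name, representative_barcode): [read_id, ...]}.
    """
    # Step 1: initial grouping by exact (amplicon_name, barcode) match.
    groups = defaultdict(list)
    for amplicon, barcode, read_id in reads:
        groups[(amplicon, barcode)].append(read_id)

    # Step 2: reassignment within each amplicon_name, largest groups first.
    by_amplicon = defaultdict(list)
    for (amplicon, barcode), ids in groups.items():
        by_amplicon[amplicon].append((barcode, ids))

    clusters = {}
    for amplicon, items in by_amplicon.items():
        items.sort(key=lambda item: (-len(item[1]), item[0]))
        seeds = []
        for barcode, ids in items:
            for seed in seeds:
                if len(seed) == len(barcode) and hamming(seed, barcode) <= max_diff:
                    clusters[(amplicon, seed)].extend(ids)
                    break
            else:
                seeds.append(barcode)
                clusters[(amplicon, barcode)] = list(ids)

    # Drop clusters with fewer than `min_reads` associated reads.
    return {key: ids for key, ids in clusters.items() if len(ids) >= min_reads}


example = [
    ("ampA", "AAAAAAAAAA", "r1"), ("ampA", "AAAAAAAAAA", "r2"),
    ("ampA", "AAAAAAAAAA", "r3"), ("ampA", "AAAAAAAAAT", "r4"),
    ("ampA", "CCCCCCCCCC", "r5"), ("ampA", "CCCCCCCCCC", "r6"),
]
clusters = cluster_barcodes(example)
```

In the example, the single read whose barcode differs by one base joins the three-read cluster, while the two-read cluster is discarded as unreliable.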
Consensus calling was done for each molecular tag (or barcode) cluster, by first performing global alignment among all associated reads using MAFFT. The consensus base in each aligned position was called by determining the majority representative base type, the percentage of which had to be no less than an automatically determined threshold. The threshold was a function of the total number of reads for that barcode sequence. If no representative base could be called, the position was assigned N (as opposed to one of A, C, T, G). A new quality score was assigned to each position, which was either the 90th percentile of all the quality values from the representative base type in that position (if a consensus base was found), or the 10th percentile of all quality values in that position (if no consensus base was found). The consensus reads were written to a new FASTQ file. An exemplary result of the consensus reads mapped to the reference is shown on
The consensus FASTQ files were mapped to the reference genome, with local realignment to improve mapping. Read depth was calculated from the mapped BAM file in the target regions in the specified .bed file (of expected amplicons or regions). Variant calling was performed on consensus BAM files using Mutect2, lofreq and a custom variant caller. Exemplary result of the library generation is shown on
Exemplary results for variant detection and variant frequency in clinical samples are shown in Table 2, and exemplary results for detection of Epstein-Barr virus (EBV) microbial DNA targets in clinical samples are shown in Table 3. To generate the results in Table 2 and Table 3, clinical samples which had previously been characterized for EGFR mutations (positive or negative) and EBV DNA (present or absent) by orthogonal methods (such as quantitative PCR) were identified. Cell-free DNA (cfDNA) was extracted from those samples for which sufficient plasma was available. The extracted cfDNA was quantified and processed with the method as described herein to determine whether detection results (for EGFR mutations and EBV DNA) similar to those of the orthogonal methods (such as quantitative PCR) could be achieved. Tables 2 and 3 present the findings of the orthogonal methods (such as quantitative PCR) together with the findings from the method as described herein.
As can be seen in Table 2, 16 clinical samples (plasma) were tested both by the method as described herein and by quantitative PCR for detection of EGFR mutations (such as small nucleotide variants and small INDELs) and determination of the frequency of mutations. The results showed 98% concordance of mutation detection and agreement of mutant allele frequency between the two methods. The sample numbers in the first column of Table 2 which showed concordance between the conventional method (quantitative PCR) and the method of the present invention are: 1, 2, 3, 4, 5 (for L858R), 6, 7, 8, 9, 10, 11 (for EGFR c.2236_2250del), 12, 13, 14, 15 (for E746_A750delELRA and EGFR T790M), and 16 (for KRAS G12D). In addition, in contrast to quantitative PCR, which is used to detect various mutations in separate reactions (each reaction is used to detect one mutation, i.e. each row in column 2 (labeled “Mutation reported by AS-PCR”) of Table 2 corresponds to one single reaction), the method of the present invention is able to simultaneously detect multiple mutations in the same sample, in one single reaction (i.e. all the mutations listed in all the rows in column 6 (labeled “Variant identified by Hallmark”) of Table 2 are detected in one single reaction).
As can be seen in Table 3, detection of the Epstein-Barr virus (EBV) microbial DNA targets BamHI-W and EBNA1 in clinical samples (plasma) by the method as described herein and by quantitative PCR showed 89% concordance of detection. The sample numbers in the first column of Table 3 which showed concordance between the conventional method (quantitative PCR) and the method of the present invention are: 17, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 32, 33, 34 and 35. Additionally, mutations in human DNA were detected. Also, in serial samples from the same individual, matched mutations (such as small nucleotide variants and small INDELs) were present. Serial samples from the same individual are depicted within a black box and are shaded in grey. Thus, the method of the present invention is able to simultaneously detect viral DNA and mutations in human DNA. In addition, in contrast to quantitative PCR, which is used to detect various mutations and the viral DNA in separate reactions (each reaction is used to detect one mutation or viral DNA, i.e. each row in column 2 (labeled “EBV BamHI-W”) of Table 3 corresponds to one single reaction), the method of the present invention is able to simultaneously detect multiple mutations in human DNA and the viral DNA in the same sample, in one single reaction (i.e. all the mutations listed in all the rows in column 11 (labeled “Mutations”) of Table 3 are detected in one single reaction).
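As a worked check of the Table 3 figure, concordance can be computed as the number of concordant samples divided by the number of samples tested. The sketch below assumes that Table 3 comprises samples 17 through 35 (19 samples), which is inferred from the listed sample numbers rather than stated explicitly; the function name is hypothetical.

```python
def concordance_percent(concordant_samples, tested_samples) -> float:
    """Percentage of tested samples on which both methods agree."""
    tested = set(tested_samples)
    concordant = set(concordant_samples) & tested
    return 100.0 * len(concordant) / len(tested)


# Concordant sample numbers as listed for Table 3.
table3_concordant = [17, 19, 20, 21, 22, 23, 24, 25, 26, 27,
                     28, 29, 30, 32, 33, 34, 35]
# Assumption: the samples tested are numbered 17 to 35 inclusive.
table3_tested = range(17, 36)

table3_result = concordance_percent(table3_concordant, table3_tested)
print(f"{table3_result:.1f}% concordance "
      f"({len(table3_concordant)} of {len(table3_tested)} samples)")
```

Under that assumption, 17 of 19 samples are concordant, which rounds to the 89% reported for Table 3.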
Exemplary result for the summary of Variant allele frequency (VAF) observed using the method of the present invention vs. standards is shown on
Exemplary use of the sequencing results obtained using the method of the present invention for the detection of fusion is shown in
(1) A sample was obtained from cell line DNA with known structural variations, for the purpose of validating the method of the present invention. The DNA was fragmented to generate fragments with sizes ranging from 20-400 bp;
(2) The fragmented DNA (100 ng) was converted to a sequencing library according to the methods described in paragraph [00100]. An appropriate primer pool was used in the initial target capture, such that primers for the capture of a broad region of ROS1 known to undergo structural variations were included;
(3) Sequencing and data analysis was performed according to the methods described in the section of “Data Analysis” above;
(4) Mapped reads were inspected in the Integrative Genomics Viewer (IGV) for the presence of a) soft-clips, b) insertions and/or c) mapping of Reads 1 and 2 of a paired sequencing read to physically separated regions of the genome. Two or more such supporting paired reads, carrying the breakpoint or mapping to distant regions of the genome, were required to support the call for a structural variant. The “partner” of the structural variant was identified by the mate read location or by aligning (BLASTing) an insertion or soft-clip sequence against the human genome to identify the origin of the insertion sequence.
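The evidence rule in step (4) was applied by visual inspection in IGV; the sketch below merely illustrates the same logic programmatically and is not the inventors' implementation. It flags a read pair as supporting when either read carries a soft-clip or insertion in its CIGAR string, or when the two reads map to different chromosomes or far apart, and requires at least two supporting pairs. The `PairedRead` record, the minimum clip or insertion length, the distance cut-off and the example coordinates are all hypothetical.

```python
import re
from dataclasses import dataclass

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


@dataclass
class PairedRead:
    chrom1: str
    pos1: int
    cigar1: str
    chrom2: str
    pos2: int
    cigar2: str


def has_op(cigar: str, op: str, min_len: int) -> bool:
    """True if the CIGAR string contains operation `op` of at least min_len bases."""
    return any(o == op and int(n) >= min_len for n, o in CIGAR_OP.findall(cigar))


def is_supporting(pair: PairedRead, min_len: int = 10, max_distance: int = 1000) -> bool:
    """True if the pair shows a soft-clip, an insertion, or distant mates."""
    cigars = (pair.cigar1, pair.cigar2)
    soft_clipped = any(has_op(c, "S", min_len) for c in cigars)
    inserted = any(has_op(c, "I", min_len) for c in cigars)
    distant = (pair.chrom1 != pair.chrom2
               or abs(pair.pos1 - pair.pos2) > max_distance)
    return soft_clipped or inserted or distant


def supports_structural_variant(pairs, min_support: int = 2) -> bool:
    """Require at least `min_support` supporting read pairs."""
    return sum(is_supporting(p) for p in pairs) >= min_support


# Hypothetical read pairs for illustration.
pairs = [
    PairedRead("chr6", 117_000_000, "100M50S", "chr6", 117_000_100, "150M"),
    PairedRead("chr6", 117_000_050, "150M", "chr4", 25_000_000, "150M"),
    PairedRead("chr6", 117_000_020, "150M", "chr6", 117_000_120, "150M"),
]
call = supports_structural_variant(pairs)
```

Identifying the fusion partner would still require examining the mate location or aligning the clipped or inserted sequence against the genome, as described in step (4).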
The above process may be used for detecting structural variation in any target region known to undergo structural variation without prior knowledge of the precise location of the breakpoint. The above process may also be applied to DNA from fixed tissue (which is already fragmented to varying degrees) or cfDNA from plasma, pleural fluid or cerebrospinal fluid.
In addition, examples of other types of structural variants which may be detected using the above mentioned process are:
Inversion
An example of detection of a structural variant described as an inversion, in which a DNA sequence is reversed end to end, is shown in
The resulting inversion in a smaller target region of interest is represented in
An example of an inversion involving a region of chromosome 9 with breakpoints determined at exactly chr9:5,467,953 and chr9:6,557,405, was detected by the method of the invention (
Translocation
An example of a translocation involving a region of chromosome 6 and chromosome 4 is shown in
In principle, any of the following listed types of other structural variants:
Comparison of the Method of the Invention to Conventional Methods
Compared to conventional methods of next-generation sequencing using hybridization capture of targets or primer-based amplicon capture, the performance of the method of the invention is comparable for the detection of various types of genomic alterations, as established during the development and validation phase of the method (see Tables 2 and 3). As can be seen in Table 4 below, the method of the invention achieved more than 99% sensitivity and specificity for detecting single-nucleotide variations (SNVs) at all the mutant allele frequencies tested; more than 83.3% sensitivity and specificity for detecting INDELs at 0.1% mutant allele frequency and more than 99% sensitivity and specificity for detecting INDELs at the 1%, 5% and 10% mutant allele frequencies tested; and more than 50% sensitivity and specificity for detecting fusions at 1% mutant allele frequency and more than 99% sensitivity and specificity for detecting fusions at the 5% and 10% mutant allele frequencies tested. In addition, the various mutations listed in
In addition, compared to the conventional methods, the method of the invention possesses unexpected advantages. For example, the method of the invention is able to achieve simultaneous detection of: 1) viral DNA; 2) microsatellite instability; 3) structural rearrangements; and 4) SNVs and INDELs, from samples ranging from cfDNA from plasma (or cerebrospinal fluid or pleural effusion) to DNA from fixed tissue.
Compared to the method of the invention, conventional methods do not allow for the simultaneous detection of these genomic alteration types or are not amenable to use with multiple sources of DNA.
Number | Date | Country | Kind |
---|---|---|---|
10201805450Y | Jun 2018 | SG | national |
This application is a Continuation Track One of U.S. application Ser. No. 17/253,857, filed Dec. 18, 2020, which claims the benefit of National Stage Entry under 35 U.S.C. § 371 of International Patent Application No. PCT/SG2019/050317, filed 25 Jun. 2019, which claims the benefit of priority of Singapore patent application No. 10201805450Y, filed 25 Jun. 2018, the contents of which are hereby incorporated by reference in their entireties for all purposes.
Number | Date | Country | |
---|---|---|---|
Parent | 17253857 | US | |
Child | 17182615 | US |