Non-invasive liquid biopsy tests are proving to be the next frontier in cancer diagnostics. There is rising interest in sensitive, low-cost liquid biopsy assays to detect cancer-specific methylation events. Real-time quantitative methylation-specific PCR (qMSP) allows for the detection of rare methylated fragments; however, widespread application has not been achieved. While qMSP achieves high specificity and sensitivity with low cfDNA inputs, assay design is challenged by the mostly three-base genome of bisulfite-converted DNA and the high homology between methylated ctDNA and the excess unmethylated cfDNA background.
Disclosed herein are methods for enriching for target nucleic acid sequences. In particular embodiments, target nucleic acid sequences comprise sequences having or corresponding to one or more methylated CpG sites in comparison to other nucleic acids that include sequences, hereafter referred to as candidate sequences, that do not contain or correspond to the same one or more methylated CpG sites. These other nucleic acids can, in various embodiments, only differ from the target nucleic acid sequences due to one or more nucleotides corresponding to the differentially methylated CpG sites. As referred to herein, such other nucleic acids can include a candidate sequence comprising one or more variant sequences e.g., a variant sequence in relation to the target nucleic acid sequence.
Generally, methods for enriching for target nucleic acid sequences involve using small oligonucleotide probes (e.g., >10 bp) to block unmethylated, off-target events. Such small oligonucleotide probes, referred to herein as “blockers,” hybridize with the unmethylated sequences, e.g., a variant sequence or a portion thereof. Because the blockers occupy the variant sequences, probes that are useful for enriching target nucleic acid sequences (referred to herein as enrichment probes) can selectively hybridize with target nucleic acid sequences as opposed to variant sequences. This enables selective enrichment of target nucleic acid sequences.
Disclosed herein is a method for enriching for a target sequence, the method comprising: obtaining a mixture of nucleic acid sequences, the mixture comprising a first nucleic acid comprising the target sequence and a second nucleic acid comprising a candidate sequence comprising one or more variant sequences; providing an enrichment probe that binds to the target sequence of the first nucleic acid, or a portion thereof; providing a plurality of blocker sequences, each blocker probe comprising a sequence complementary to at least one variant sequence of the candidate sequence; and selectively enriching for the target sequence using the enrichment probe without enriching for the one or more variant sequences of the candidate sequence.
In various embodiments, the target sequence comprises a sequence comprising one or more methylated CpG sites, or a sequence derived from the sequence comprising one or more methylated CpG sites. In various embodiments, the target sequence comprises a sequence comprising at least five consecutive methylated CpG sites, or a sequence derived from the sequence comprising at least five consecutive methylated CpG sites. In various embodiments, the one or more variant sequences of the candidate sequence comprise one or more unmethylated CpG sites. In various embodiments, the one or more variant sequences comprise at least five consecutive non-methylated CpG sites. In various embodiments, each variant sequence of the one or more variant sequences comprises a single unmethylated CpG site. In various embodiments, the candidate sequence comprises five or more variant sequences. In various embodiments, the candidate sequence comprises five or more variant sequences comprising five or more consecutively unmethylated CpG sites.
In various embodiments, the candidate sequence differs from the target sequence by one or more nucleotides based on differential methylation of a corresponding one or more CpG sites. In various embodiments, the enrichment probe is between about 50 nucleotide bases and about 150 nucleotide bases. In various embodiments, the enrichment probe is between about 60 nucleotide bases and about 80 nucleotide bases. In various embodiments, the enrichment probe is about 70 nucleotide bases. In various embodiments, the enrichment probe is between about 80 nucleotide bases and about 120 nucleotide bases. In various embodiments, the enrichment probe is about 100 nucleotide bases. In various embodiments, each of the plurality of blocker sequences is between about 10 and about 20 nucleotide bases in length. In various embodiments, each of the plurality of blocker sequences is between about 15 and about 18 nucleotide bases in length. In various embodiments, each of the plurality of blocker sequences is about 17 nucleotide bases in length.
In various embodiments, the enrichment probe is between about 4 and about 6 times longer in length in comparison to an average length of the plurality of blocker sequences. In various embodiments, the enrichment probe is between about 4 and about 6 times longer in length in comparison to each of the plurality of blocker sequences. In various embodiments, selectively enriching for the target sequence using the enrichment probe without enriching for the one or more variant sequences comprises: hybridizing the enrichment probe to the target sequence, or a portion thereof; hybridizing the plurality of blocker sequences to the one or more variant sequences of the candidate sequence; and enriching for the hybridized enrichment probe while the one or more hybridized blocker probes prevent enrichment for the one or more variant sequences. In various embodiments, enriching for the hybridized enrichment probe while the one or more hybridized blocker probes prevent enrichment for the one or more variant sequences comprises: binding a streptavidin bead to a biotin group of the enrichment probe; and washing and removing unbound nucleic acids, thereby enriching for the target sequence. In various embodiments, the unbound nucleic acids comprise one or more blocker sequences and/or candidate sequences.
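By way of non-limiting illustration, the length relationship described above can be checked with simple arithmetic. The following Python sketch is an assumption-laden example and not part of the disclosed method: the function names `length_ratio` and `within_ratio` are hypothetical, and the strict bounds of 4 and 6 stand in for "about 4" and "about 6". The probe lengths (70 and 100 bases) and blocker length (17 bases) are the example values recited above.

```python
from statistics import mean


def length_ratio(probe_len: int, blocker_lens: list[int]) -> float:
    """Return the probe length divided by the average blocker length."""
    if not blocker_lens:
        raise ValueError("at least one blocker length is required")
    return probe_len / mean(blocker_lens)


def within_ratio(probe_len: int, blocker_lens: list[int],
                 low: float = 4.0, high: float = 6.0) -> bool:
    """Check whether the ratio lies in [low, high].

    The bounds are hypothetical strict limits used only for illustration.
    """
    return low <= length_ratio(probe_len, blocker_lens) <= high


if __name__ == "__main__":
    blockers = [17, 17, 17]
    for probe in (70, 100):
        ratio = length_ratio(probe, blockers)
        print(probe, round(ratio, 2), within_ratio(probe, blockers))
```

With 17-base blockers, a 70-base probe gives a ratio of roughly 4.1 and a 100-base probe gives a ratio of roughly 5.9, so both example probe lengths fall inside the recited range.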
The foregoing and other objects, features and advantages of the invention will become apparent from the following description of preferred embodiments, as illustrated in the accompanying drawings. Like referenced elements identify common features in the corresponding drawings. The drawings are not necessarily to scale, with emphasis instead being placed on illustrating the principles of the present invention, in which:
It is noted that wherever practicable similar or like reference numbers may be used in the figures and may indicate similar or like functionality. For example, a letter after a reference numeral, such as “nucleic acid molecule 415A,” indicates that the text refers specifically to the element having that particular reference numeral. A reference numeral in the text without a following letter, such as “nucleic acid molecule 415” refers to any or all of the elements in the figures bearing that reference numeral (e.g. “nucleic acid molecule 415” in the text refers to reference numerals “nucleic acid molecule 415A” and/or “nucleic acid molecule 415B” in the figures).
The articles “a” and “an” are used herein to refer to one or to more than one (i.e., to at least one) of the grammatical object of the article. By way of example, “an element” means one element or more than one element.
As used herein, the term “about” or “approximately” can mean within an acceptable error range for the particular value as determined by one of ordinary skill in the art, which can depend in part on how the value is measured or determined, e.g., the limitations of the measurement system. For example, “about” can mean within 1 or more than 1 standard deviation, per the practice in the art. “About” can mean a range of ±20%, ±10%, ±5%, or ±1% of a given value. The term “about” or “approximately” can mean within an order of magnitude, within 5-fold, or within 2-fold, of a value. Where a particular value is described in the application and claims, unless otherwise stated the term “about” meaning within an acceptable error range for the particular value can be assumed. The term “about” can have the meaning as commonly understood by one of ordinary skill in the art. The term “about” can refer to ±10%. The term “about” can refer to ±5%.
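For illustration only, one of the meanings of “about” given above (a range of ±10% of a given value) can be expressed as a small check. The helper name `is_about` and its default tolerance are assumptions chosen for this sketch; other tolerances recited above (e.g., ±20%, ±5%, or ±1%) can be passed explicitly.

```python
def is_about(measured: float, nominal: float, tolerance: float = 0.10) -> bool:
    """Return True if measured lies within +/- tolerance (a fraction) of nominal."""
    return abs(measured - nominal) <= tolerance * abs(nominal)


if __name__ == "__main__":
    # An 18-base blocker versus a nominal length of "about 17" bases.
    print(is_about(18, 17))
    # A 20-base blocker is outside +/-10% of 17 but inside +/-20%.
    print(is_about(20, 17), is_about(20, 17, tolerance=0.20))
```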
It should be understood that the expression of “at least one of” includes individually each of the recited objects after the expression and the various combinations of two or more of the recited objects unless otherwise understood from the context and use. The expression “and/or” in connection with three or more recited objects should be understood to have the same meaning unless otherwise understood from the context.
As used herein, the term “biological sample,” or “sample” refers to any sample taken from a subject, which can reflect a biological state associated with the subject, and that includes cell-free DNA. A biological sample can take any of a variety of forms, such as a liquid biopsy (e.g., blood, urine, stool, saliva, or mucus), or a tissue biopsy, or other solid biopsy. Examples of biological samples include, but are not limited to, blood, whole blood, plasma, serum, urine, cerebrospinal fluid, feces, saliva, sweat, tears, pleural fluid, pericardial fluid, or peritoneal fluid of the subject. A biological sample can include any tissue or material derived from a living or dead subject. A biological sample can be a cell-free sample. A biological sample can comprise a nucleic acid (e.g., DNA or RNA) or a fragment thereof. The term “nucleic acid” can refer to deoxyribonucleic acid (DNA), ribonucleic acid (RNA) or any hybrid or fragment thereof. The nucleic acid in the sample can be a cell-free nucleic acid. A sample can be a liquid sample or a solid sample (e.g., a cell or tissue sample). A biological sample can be a bodily fluid, such as blood, plasma, serum, urine, vaginal fluid, fluid from a hydrocele (e.g., of the testis), vaginal flushing fluids, pleural fluid, ascitic fluid, cerebrospinal fluid, saliva, sweat, tears, sputum, bronchoalveolar lavage fluid, discharge fluid from the nipple, aspiration fluid from different parts of the body (e.g., thyroid, breast), etc. A biological sample can be a stool sample. In various embodiments, the majority of DNA in a biological sample that has been enriched for cell-free DNA (e.g., a plasma sample obtained via a centrifugation protocol) can be cell-free (e.g., greater than 50%, 60%, 70%, 80%, 90%, 95%, or 99% of the DNA can be cell-free).
A biological sample can be treated to physically disrupt tissue or cell structure (e.g., centrifugation and/or cell lysis), thus releasing intracellular components into a solution which can further contain enzymes, buffers, salts, detergents, and the like which can be used to prepare the sample for analysis.
As used herein, the terms “nucleic acid” and “nucleic acid molecule” are used interchangeably. The terms refer to nucleic acids of any composition form, such as ribonucleic acid (RNA), deoxyribonucleic acid (DNA, e.g., complementary DNA (cDNA), genomic DNA (gDNA) and the like), and/or RNA/DNA analogs (e.g., containing base analogs, sugar analogs and/or a non-native backbone and the like), all of which can be in single- or double-stranded form. Unless otherwise limited, a nucleic acid can comprise known analogs of natural nucleotides, some of which can function in a similar manner as naturally occurring nucleotides. A nucleic acid can be in any form useful for conducting processes herein (e.g., linear, circular, supercoiled, single-stranded, double-stranded and the like). A nucleic acid in some embodiments can be from a single chromosome or fragment thereof (e.g., a nucleic acid sample may be from one chromosome of a sample obtained from a diploid organism). In certain embodiments nucleic acids comprise nucleosomes, fragments or parts of nucleosomes or nucleosome-like structures. Nucleic acids can comprise protein (e.g., histones, DNA binding proteins, and the like). Nucleic acids analyzed by processes described herein can be substantially isolated and are not substantially associated with protein or other molecules. Nucleic acids can also include derivatives, variants and analogs of DNA synthesized, replicated or amplified from single-stranded (“sense” or “antisense,” “plus” strand or “minus” strand, “forward” reading frame or “reverse” reading frame) and double-stranded polynucleotides. Deoxyribonucleotides can include deoxyadenosine, deoxycytidine, deoxyguanosine and deoxythymidine. A nucleic acid may be prepared using a nucleic acid obtained from a subject as a template.
As used herein, the terms “template nucleic acid” and “template nucleic acid molecule(s)” are used interchangeably. The terms refer to nucleic acid that has been obtained from a sample and processed to form an immortalized library. The template nucleic acid can be nucleic acid obtained directly from the sample, or nucleic acid that is derived from that obtained directly from the sample. Examples of nucleic acid derived from a sample include DNA that has been reverse-transcribed from RNA obtained directly from a sample, or DNA that has been amplified from DNA obtained directly from a sample, for example, by PCR.
As used herein, the term “cell-free nucleic acids” refers to nucleic acid molecules that can be found outside cells, in bodily fluids such as blood, whole blood, plasma, serum, urine, cerebrospinal fluid, feces, saliva, sweat, tears, pleural fluid, pericardial fluid, or peritoneal fluid of a subject. Cell-free nucleic acids originate from one or more healthy cells and/or from one or more cancer cells, or from non-human sources such as bacteria, fungi, or viruses. Examples of the cell-free nucleic acids include but are not limited to cell-free DNA (“cfDNA”), including mitochondrial DNA or genomic DNA, and cell-free RNA. In certain embodiments herein, instruments for assessing the quality of the cell-free nucleic acids, such as the TapeStation System from Agilent Technologies (Santa Clara, CA), can be used. Quantifying low-abundance cfDNA can be accomplished, for example, using a Qubit™ Fluorometer from Thermo Fisher Scientific (Waltham, MA).
As used herein, the term “methylation” refers to a modification of a nucleic acid where a hydrogen atom on the pyrimidine ring of a cytosine base is replaced by a methyl group, forming 5-methylcytosine. Methylation can occur at dinucleotides of cytosine and guanine referred to herein as “CpG sites”. Methylation of cytosine can occur in cytosines in other sequence contexts, for example, 5′-CHG-3′ and 5′-CHH-3′, where H is adenine, cytosine or thymine. Cytosine methylation can also be in the form of 5-hydroxymethylcytosine. Methylation of DNA can include methylation of non-cytosine nucleotides, such as N6-methyladenine. Anomalous cfDNA methylation can be identified as hypermethylation or hypomethylation, both of which may be indicative of cancer status. As is well known in the art, DNA methylation anomalies (compared to healthy controls) can cause different effects, which may contribute to cancer.
Certain portions of a genome comprise regions with a high frequency of CpG sites. A CpG site is a portion of a genome that has cytosine and guanine separated by only one phosphate group and is often denoted as “5′-C-phosphate-G-3′”, or “CpG” for short. Regions with a high frequency of CpG sites are commonly referred to as “CG islands” or “CGIs”. It has been found that certain CGIs and certain features of certain CGIs in tumor cells tend to be different from the same CGIs or features of the CGIs in healthy cells. Such CGIs and features of the genome are referred to herein as “cancer informative CGIs”, which are defined and described in more detail below. An “informative CpG” can be specified by reference to a specific CpG site, or to a collection of one or more CpG sites by reference to a CG island that contains the collection. These cancer informative CGIs tend to have methylation patterns in tumor cells that are different from the methylation patterns in healthy cells. DNA fragments from other CGIs may not exhibit such differences.
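As a non-limiting illustration of the CpG site definition above, the following Python sketch locates each 5′-CG-3′ dinucleotide in a single-stranded sequence and reports how frequently CpG sites occur. The function names and the example sequence are hypothetical, and the sketch does not implement any particular CG island calling criterion.

```python
def cpg_positions(seq: str) -> list[int]:
    """Return 0-based indices of the C in every 5'-CG-3' dinucleotide."""
    s = seq.upper()
    return [i for i in range(len(s) - 1) if s[i:i + 2] == "CG"]


def cpg_frequency(seq: str) -> float:
    """Return CpG sites per base of sequence (0.0 for an empty sequence)."""
    return len(cpg_positions(seq)) / len(seq) if seq else 0.0


if __name__ == "__main__":
    example = "ACGTCGGGCGAT"  # hypothetical sequence with three CpG sites
    print(cpg_positions(example))
    print(cpg_frequency(example))
```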
As used herein, “DNA methylation” in mammalian genomes can refer to the addition of a methyl group to position 5 of the heterocyclic ring of cytosine (e.g., to produce 5-methylcytosine) among CpG dinucleotides. Methylation of cytosine can occur in cytosines in other sequence contexts, for example, 5′-CHG-3′ and 5′-CHH-3′, where H is adenine, cytosine or thymine. Cytosine methylation can also be in the form of 5-hydroxymethylcytosine. Methylation of DNA can include methylation of non-cytosine nucleotides, such as N6-methyladenine.
The phrase “target sequence” refers to a sequence of a nucleic acid derived from a sequence comprising one or more methylated CpG sites. In particular embodiments, a “target sequence” refers to a sequence of a nucleic acid derived from a sequence comprising two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, thirteen, fourteen, fifteen, sixteen, seventeen, eighteen, nineteen, or twenty methylated CpG sites. In various embodiments, such methylated CpG sites are consecutively methylated CpG sites (e.g., no non-methylated CpG sites exist between any two of the consecutively methylated CpG sites). For example, the target sequence may be a sequence of a converted nucleic acid (e.g., where unmethylated cytosines have been converted to uracil and/or where methylated cytosines remain cytosines). In various embodiments, a target sequence includes one or more CpG sites within a region disclosed in Table 1 or Table 2. In particular embodiments, a target sequence includes five CpG sites within a region disclosed in Table 1 or Table 2. In particular embodiments, a target sequence includes five sequential CpG sites within a region disclosed in Table 1 or Table 2.
The phrase “candidate sequence” refers to a sequence of a nucleic acid derived from a sequence comprising one or more non-methylated CpG sites. In particular embodiments, a “candidate sequence” refers to a sequence of a nucleic acid derived from a sequence comprising two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, thirteen, fourteen, fifteen, sixteen, seventeen, eighteen, nineteen, or twenty non-methylated CpG sites. In various embodiments, such non-methylated CpG sites are consecutively non-methylated CpG sites (e.g., no methylated CpG sites exist between any two of the consecutively non-methylated CpG sites). For example, the candidate sequence may be a sequence of a converted nucleic acid (e.g., where unmethylated cytosines have been converted to uracil and/or where methylated cytosines remain cytosines). In various embodiments, a candidate sequence includes one or more CpG sites within a region disclosed in Table 1 or Table 2. In particular embodiments, a candidate sequence includes five CpG sites within a region disclosed in Table 1 or Table 2. In particular embodiments, a candidate sequence includes five sequential CpG sites within a region disclosed in Table 1 or Table 2.
Generally, a “candidate sequence” may differ from a corresponding “target sequence” by one or more nucleotides attributable to differential methylation of the one or more CpG sites. For example, each of a candidate sequence and a corresponding target sequence can include X CpG sites. Assuming that Y CpG sites of the total X CpG sites (where Y is less than X) are differentially methylated between the target sequence and the candidate sequence, then the sequences of the target sequence and the candidate sequence can differ by the Y different nucleotides corresponding to the Y CpG sites. The phrase “candidate sequence” may be referred to herein as including one or more variant sequences of a target sequence.
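The relationship between X CpG sites and Y differing nucleotides can be illustrated with an in silico conversion. The following Python sketch is a simplified model under stated assumptions: the function names and the 14-base example sequence are hypothetical, only one strand is modeled, and converted cytosines are written as T (as they would read after amplification) rather than as uracil.

```python
def bisulfite_convert(seq: str, methylated: set[int]) -> str:
    """Convert every unmethylated C to T; a C whose 0-based index is in
    `methylated` is treated as 5-methylcytosine and remains C."""
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq.upper())
    )


def count_differences(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    original = "TCACGTCGGACGAT"  # hypothetical; CpG cytosines at 3, 6, 10
    target = bisulfite_convert(original, methylated={3, 6, 10})
    candidate = bisulfite_convert(original, methylated={3})
    # X = 3 CpG sites; Y = 2 of them are differentially methylated.
    print(target, candidate, count_differences(target, candidate))
```

In this example the non-CpG cytosine at index 1 is converted in both sequences, so the target and candidate differ only at the two differentially methylated CpG positions.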
The phrase “sequential CpG sites” refers to CpG sites within a range of genomic locations in which all CpG sites within the range of genomic locations are part of the sequential CpG sites. Sequential CpG sites include a neighboring CpG site, i.e., a previous contiguous or next contiguous CpG site.
The terms “unmethylated” and “non-methylated” are used interchangeably and refer to nucleic acid molecules comprising sequences that include one or more unmethylated CpG sites and/or nucleic acid sequences derived from nucleic acid sequences that include one or more unmethylated CpG sites. For example, “unmethylated nucleic acid molecules” can refer to converted nucleic acid sequences (e.g., bisulfite-converted nucleic acid sequences) that are derived from cell-free DNA that include sequences with one or more unmethylated CpG sites.
As used herein, the term “amplifying” means performing an amplification reaction. In one aspect, an amplification reaction is “template-driven” in that the reactants, either nucleotides or oligonucleotides, base pair with complements in a template polynucleotide, and this base pairing is required for the creation of reaction products. In one aspect, template-driven reactions are primer extensions with a nucleic acid polymerase, or oligonucleotide ligations with a nucleic acid ligase. Such reactions include, but are not limited to, polymerase chain reactions (PCRs), bisulfite-specific qPCR (qBSP), methylation-specific qPCR (qMSP), linear polymerase reactions, nucleic acid sequence-based amplifications (NASBAs), rolling circle amplifications, and the like, disclosed in the following references, each of which is incorporated by reference herein in its entirety: Mullis et al., U.S. Pat. Nos. 4,683,195; 4,965,188; 4,683,202; 4,800,159 (PCR); Gelfand et al., U.S. Pat. No. 5,210,015 (real-time PCR with “taqman” probes); Wittwer et al., U.S. Pat. No. 6,174,670; Kacian et al., U.S. Pat. No. 5,399,491 (“NASBA”); Lizardi, U.S. Pat. No. 5,854,033; Aono et al., Japanese patent publ. JP 4-262799 (rolling circle amplification); and the like. As used herein, bisulfite-specific PCR refers to PCR that amplifies converted DNA with no or limited methylation bias. As used herein, methylation-specific PCR refers to PCR that is methylation-specific and bisulfite-specific, which amplifies converted methylated DNA. In one aspect, the amplification reaction is PCR. An amplification reaction may be a “real-time” amplification if a detection chemistry is available that permits a reaction product to be measured as the amplification reaction progresses, e.g., “real-time PCR”, or “real-time NASBA” as described in Leone et al., Nucleic Acids Research, 26:2150-2155 (1998), and like references.
The terms “fragment” or “segment”, as used interchangeably herein, refer to a portion of a larger polynucleotide molecule. A polynucleotide, for example, can be broken up, or fragmented into, a plurality of segments. Various methods of fragmenting nucleic acid are well known in the art. These methods may be, for example, either chemical or physical or enzymatic in nature. Enzymatic fragmentation may include partial degradation with a DNase; partial depurination with acid; the use of restriction enzymes; intron-encoded endonucleases; DNA-based cleavage methods, such as triplex and hybrid formation methods, that rely on the specific hybridization of a nucleic acid segment to localize a cleavage agent to a specific location in the nucleic acid molecule; or other enzymes or compounds which cleave a polynucleotide at known or unknown locations. Physical fragmentation methods may involve subjecting a polynucleotide to a high shear rate. High shear rates may be produced, for example, by moving DNA through a chamber or channel with pits or spikes, or forcing a DNA sample through a restricted size flow passage, e.g., an aperture having a cross sectional dimension in the micron or submicron range. Other physical methods include sonication and nebulization. Combinations of physical and chemical fragmentation methods may likewise be employed, such as fragmentation by heat and ion-mediated hydrolysis. See, e.g., Sambrook et al., “Molecular Cloning: A Laboratory Manual,” 3rd Ed. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (2001) (“Sambrook et al.”), which is incorporated herein by reference for all purposes. These methods can be optimized to digest a nucleic acid into fragments of a selected size range.
The terms “polymerase chain reaction” or “PCR”, as used interchangeably herein, mean a reaction for the in vitro amplification of specific DNA sequences by the simultaneous primer extension of complementary strands of DNA. In other words, PCR is a reaction for making multiple copies or replicates of a target nucleic acid flanked by primer binding sites, such reaction comprising one or more repetitions of the following steps: (i) denaturing the target nucleic acid, (ii) annealing primers to the primer binding sites, and (iii) extending the primers by a nucleic acid polymerase in the presence of nucleoside triphosphates. Usually, the reaction is cycled through different temperatures optimized for each step in a thermal cycler instrument. Particular temperatures, durations at each step, and rates of change between steps depend on many factors that are well-known to those of ordinary skill in the art, e.g., exemplified by the following references: McPherson et al., editors, PCR: A Practical Approach and PCR2: A Practical Approach (IRL Press, Oxford, 1991 and 1995, respectively). For example, in a conventional PCR using Taq DNA polymerase, a double stranded target nucleic acid may be denatured at a temperature>90° C., primers annealed at a temperature in the range 50-75° C., and primers extended at a temperature in the range 72-78° C. The term “PCR” encompasses derivative forms of the reaction, including, but not limited to, RT-PCR, real-time PCR, nested PCR, quantitative PCR, multiplexed PCR, and the like. The particular format of PCR being employed is discernible by one skilled in the art from the context of an application. Reaction volumes can range from a few hundred nanoliters, e.g., 200 nL, to a few hundred μL, e.g., 200 μL. 
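For illustration, the repeated denature, anneal, and extend cycle described above leads to approximately exponential growth in copy number. The following Python sketch uses the commonly taught idealized model in which each cycle multiplies the copy number by (1 + E), where E is a per-cycle efficiency between 0 and 1. The function name `pcr_copies` and the example inputs are assumptions for this sketch, and the model ignores plateau effects seen in real reactions.

```python
def pcr_copies(initial_copies: float, cycles: int,
               efficiency: float = 1.0) -> float:
    """Idealized copy number after `cycles` PCR cycles.

    Each cycle multiplies the copy number by (1 + efficiency);
    efficiency = 1.0 corresponds to perfect doubling.
    """
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    # Ten starting copies, 30 cycles, perfect versus 90% efficiency.
    print(pcr_copies(10, 30))
    print(pcr_copies(10, 30, efficiency=0.9))
```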
“Reverse transcription PCR,” or “RT-PCR,” means a PCR that is preceded by a reverse transcription reaction that converts a target RNA to a complementary single stranded DNA, which is then amplified, an example of which is described in Tecott et al., U.S. Pat. No. 5,168,038, the disclosure of which is incorporated herein by reference in its entirety. “Real-time PCR” means a PCR for which the amount of reaction product, i.e., amplicon, is monitored as the reaction proceeds. There are many forms of real-time PCR that differ mainly in the detection chemistries used for monitoring the reaction product, e.g., Gelfand et al., U.S. Pat. No. 5,210,015 (“taqman”); Wittwer et al., U.S. Pat. Nos. 6,174,670 and 6,569,627 (intercalating dyes); Tyagi et al., U.S. Pat. No. 5,925,517 (molecular beacons); the disclosures of which are hereby incorporated by reference herein in their entireties. Detection chemistries for real-time PCR are reviewed in Mackay et al., Nucleic Acids Research, 30:1292-1305 (2002), which is also incorporated herein by reference. “Nested PCR” means a two-stage PCR wherein the amplicon of a first PCR becomes the sample for a second PCR using a new set of primers, at least one of which binds to an interior location of the first amplicon. As used herein, “initial primers” in reference to a nested amplification reaction mean the primers used to generate a first amplicon, and “secondary primers” mean the one or more primers used to generate a second, or nested, amplicon. “Asymmetric PCR” means a PCR wherein one of the two primers employed is in great excess concentration so that the reaction is primarily a linear amplification in which one of the two strands of a target nucleic acid is preferentially copied. The excess concentration of asymmetric PCR primers may be expressed as a concentration ratio. Typical ratios are in the range of from 10 to 100. 
“Multiplexed PCR” means a PCR wherein multiple target sequences (or a single target sequence and one or more reference sequences) are simultaneously amplified in the same reaction mixture, e.g., Bernard et al., Anal. Biochem., 273:221-228 (1999) (two-color real-time PCR). Usually, distinct sets of primers are employed for each sequence being amplified. Typically, the number of target sequences in a multiplex PCR is in the range of from 2 to 50, or from 2 to 40, or from 2 to 30. In particular embodiments, the number of target sequences in a multiplex PCR is about 4. “Quantitative PCR” means a PCR designed to measure the abundance of one or more specific target sequences in a sample or specimen. Quantitative PCR includes both absolute quantitation and relative quantitation of such target sequences. Quantitative measurements are made using one or more reference sequences or internal standards that may be assayed separately or together with a target sequence. The reference sequence may be endogenous or exogenous to a sample or specimen, and in the latter case, may comprise one or more competitor templates. Typical endogenous reference sequences include segments of transcripts of the following genes: β-actin, GAPDH, β2-microglobulin, ribosomal RNA, and the like. Techniques for quantitative PCR are well-known to those of ordinary skill in the art, as exemplified in the following references, which are incorporated by reference herein in their entireties: Freeman et al., Biotechniques, 26:112-126 (1999); Becker-Andre et al., Nucleic Acids Research, 17:9437-9447 (1989); Zimmerman et al., Biotechniques, 21:268-279 (1996); Diviacco et al., Gene, 122:3013-3020 (1992); and Becker-Andre et al., Nucleic Acids Research, 17:9437-9446 (1989).
The term “primer” as used herein means an oligonucleotide, either natural or synthetic, that is capable, upon forming a duplex with a polynucleotide template, of acting as a point of initiation of nucleic acid synthesis and being extended from its 3′ end along the template so that an extended duplex is formed. Extension of a primer is usually carried out with a nucleic acid polymerase, such as a DNA or RNA polymerase. The sequence of nucleotides added in the extension process is determined by the sequence of the template polynucleotide. Usually, primers are extended by a DNA polymerase. Primers usually have a length in the range of from 14 to 40 nucleotides, or in the range of from 18 to 36 nucleotides. Primers are employed in a variety of nucleic acid amplification reactions, for example, linear amplification reactions using a single primer, or polymerase chain reactions, employing two or more primers. Guidance for selecting the lengths and sequences of primers for particular applications is well known to those of ordinary skill in the art, as evidenced by the following reference that is incorporated by reference herein in its entirety: Dieffenbach, editor, PCR Primer: A Laboratory Manual, 2nd Edition (Cold Spring Harbor Press, New York, 2003).
As used herein, the term “subject” refers to any living or non-living organism, including but not limited to a human (e.g., a male human, female human, fetus, pregnant female, child, or the like), a non-human animal, a plant, a bacterium, a fungus or a protist. Any human or non-human animal can serve as a subject, including but not limited to mammal, reptile, avian, amphibian, fish, ungulate, ruminant, bovine (e.g., cattle), equine (e.g., horse), caprine and ovine (e.g., sheep, goat), swine (e.g., pig), camelid (e.g., camel, llama, alpaca), monkey, ape (e.g., gorilla, chimpanzee), ursid (e.g., bear), poultry, dog, cat, mouse, rat, dolphin, whale and shark. In some embodiments, a subject is a male or female of any age (e.g., a man, a woman or a child).
As used herein, “selective enrichment of nucleic acid molecules” refers to the increased enrichment of nucleic acid molecules in relation to other nucleic acids. In various embodiments, selective enrichment refers to at least a fold enrichment of nucleic acid molecules relative to other nucleic acid molecules. Thus, selective enrichment of nucleic acid molecules does not require complete abatement or elimination of other nucleic acid molecules. Rather, at least a fold increase in the amplification and/or hybrid capture of nucleic acid molecules is achieved in comparison to other nucleic acid molecules. In various embodiments, selective enrichment of nucleic acid molecules refers to at least a 2-fold increase, at least a 3-fold increase, at least a 4-fold increase, at least a 5-fold increase, at least a 6-fold increase, at least a 7-fold increase, at least an 8-fold increase, at least a 9-fold increase, at least a 10-fold increase, at least a 15-fold increase, at least a 20-fold increase, at least a 25-fold increase, at least a 50-fold increase, at least a 100-fold increase, at least a 200-fold increase, at least a 500-fold increase, or at least a 1000-fold increase of the nucleic acid molecules relative to other nucleic acids.
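By way of non-limiting example, a fold increase of target molecules relative to other molecules can be computed from counts taken before and after enrichment. The following Python sketch assumes one possible reading of fold enrichment, namely the ratio of the target-to-other ratio after enrichment to the same ratio before enrichment. The function names and the example counts are hypothetical and are not measured data.

```python
def fold_enrichment(target_before: float, other_before: float,
                    target_after: float, other_after: float) -> float:
    """Return (target/other after) divided by (target/other before).

    Computed as a single quotient of products to limit rounding error.
    """
    if target_before <= 0 or other_after <= 0:
        raise ValueError("target_before and other_after must be positive")
    return (target_after * other_before) / (other_after * target_before)


def meets_threshold(fold: float, threshold: float = 2.0) -> bool:
    """Check a fold value against a chosen threshold (e.g., at least 2-fold)."""
    return fold >= threshold


if __name__ == "__main__":
    # Hypothetical counts: 10 target and 10,000 other molecules before;
    # 8 target and 80 other molecules after enrichment.
    fold = fold_enrichment(10, 10_000, 8, 80)
    print(fold, meets_threshold(fold, 100))
```

In this hypothetical example some target molecules are lost during enrichment, yet the target-to-other ratio still rises, which is consistent with the statement above that selective enrichment does not require complete elimination of other nucleic acid molecules.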
The practice of the present disclosure will employ, unless otherwise indicated, conventional techniques of molecular biology (including recombinant techniques), microbiology, cell biology, biochemistry and immunology, which are within the skill of the art. Such techniques are explained fully in the literature, such as, Molecular Cloning: A Laboratory Manual, second edition (Sambrook et al., 1989) Cold Spring Harbor Press; Oligonucleotide Synthesis (M. J. Gait, ed., 1984); Methods in Molecular Biology, Humana Press; Cell Biology: A Laboratory Notebook (J. E. Cellis, ed., 1998) Academic Press; Animal Cell Culture (R. I. Freshney, ed., 1987); Introduction to Cell and Tissue Culture (J. P. Mather and P. E. Roberts, 1998) Plenum Press; Cell and Tissue Culture: Laboratory Procedures (A. Doyle, J. B. Griffiths, and D. G. Newell, eds., 1993-1998) J. Wiley and Sons; Methods in Enzymology (Academic Press, Inc.); Gene Transfer Vectors for Mammalian Cells (J. M. Miller and M. P. Calos, eds., 1987); Current Protocols in Molecular Biology (F. M. Ausubel et al., eds., 1987); PCR: The Polymerase Chain Reaction, (Mullis et al., eds., 1994); Sambrook and Russell, Molecular Cloning: A Laboratory Manual, 3rd. ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY (2001); Ausubel et al., Current Protocols in Molecular Biology, John Wiley & Sons, NY (2002); Harlow and Lane Using Antibodies: A Laboratory Manual, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY (1998); Coligan et al., Short Protocols in Protein Science, John Wiley & Sons, NY (2003); Short Protocols in Molecular Biology (Wiley and Sons, 1999).
Enzymatic reactions and purification techniques are performed according to manufacturer's specifications, as commonly accomplished in the art or as described herein. The nomenclatures used in connection with, and the laboratory procedures and techniques of, analytical chemistry, biochemistry, immunology, molecular biology, synthetic organic chemistry, and medicinal and pharmaceutical chemistry described herein are those well-known and commonly used in the art. Standard techniques are used for chemical syntheses, and chemical analyses.
Throughout this specification and embodiments, the word “comprise,” or variations such as “comprises” or “comprising,” will be understood to imply the inclusion of a stated integer or group of integers but not the exclusion of any other integer or group of integers.
It is understood that wherever embodiments are described herein with the language “comprising,” otherwise analogous embodiments described in terms of “consisting of” and/or “consisting essentially of” are also provided.
Any example(s) following the term “e.g.” or “for example” is not meant to be exhaustive or limiting.
Unless otherwise required by context, singular terms shall include pluralities and plural terms shall include the singular.
Notwithstanding that the numerical ranges and parameters setting forth the broad scope of the disclosure are approximations, the numerical values set forth in the specific examples are reported as precisely as possible. Any numerical value, however, inherently contains certain errors necessarily resulting from the standard deviation found in their respective testing measurements. Moreover, all ranges disclosed herein are to be understood to be inclusive of the numbers defining the range and to encompass any and all subranges subsumed therein. For example, a stated range of “1 to 10” should be considered to include any and all subranges between (and inclusive of) the minimum value of 1 and the maximum value of 10; that is, all subranges beginning with a minimum value of 1 or more, e.g., 1 to 6.1, and ending with a maximum value of 10 or less, e.g., 5.5 to 10.
Unless otherwise defined herein, scientific and technical terms used in this application shall have the meanings that are commonly understood by those of ordinary skill in the art.
Where aspects or embodiments of the disclosure are described in terms of a Markush group or other grouping of alternatives, the present disclosure encompasses not only the entire group listed as a whole, but also each member of the group individually, all possible subgroups of the main group, and the main group absent one or more of the group members. The present disclosure also envisages the explicit exclusion of one or more of any of the group members in an embodiment of the disclosure.
Exemplary methods and materials are described herein, although methods and materials similar or equivalent to those described herein can also be used in the practice or testing of the present disclosure. The materials, methods, and examples are illustrative only and not intended to be limiting.
Disclosed herein are methods for enriching nucleic acid molecules comprising a target sequence. Such methods are useful for enriching for a signal in a sample, such as a signal informative for determining presence or absence of cancer in the sample. In various embodiments, methods disclosed herein are useful for enriching for nucleic acids including sequences derived from methylated CpG sites by inhibiting or blocking other sequences derived from non-methylated CpG sites. For example, this enrichment technology can be used for methylation detection from bisulfite-converted DNA in qPCR, dPCR, hybridization capture, etc. Methods include providing short blocker sequences that bind to candidate sequences or portions thereof (e.g., a variant sequence), thereby inhibiting longer enrichment probe sequences from hybridizing with candidate sequences. This prevents the downstream processing and capturing of candidate sequences. In contrast, the short blocker sequences do not bind, or exhibit reduced affinity for target sequences, thereby allowing for longer probe sequences to hybridize with target sequences. This enables the downstream processing, capturing, and enrichment of target sequences.
In various embodiments, methods for enriching a target sequence comprise: obtaining a mixture of nucleic acid sequences, the mixture comprising a first nucleic acid comprising the target sequence and a second nucleic acid comprising a candidate sequence comprising one or more variant sequences; providing an enrichment probe that binds to the target sequence of the first nucleic acid, or a portion thereof; providing a plurality of blocker sequences, each blocker sequence comprising a sequence complementary to at least one variant sequence of the candidate sequence; and selectively enriching for the target sequence using the enrichment probe without enriching for the one or more variant sequences of the candidate sequence. Generally, a blocker sequence of the plurality of blocker sequences may contain a sequence that is at least 90%, at least 95%, or 100% complementary to the candidate sequence, or a portion thereof (e.g., a variant sequence). Given that the first nucleic acid molecule was derived from a sequence comprising one or more methylated CpG sites, the plurality of blocker sequences each comprise a sequence that contains one or more mismatches relative to the target sequence, or a portion thereof. Thus, the blocker does not bind to the first nucleic acid (or binds to the first nucleic acid at a rate that is less than a rate at which the blocker binds to the candidate sequence, or portion thereof).
Reference is now made to
In various embodiments, the sample obtained from the subject is a liquid biopsy sample. In various embodiments, the liquid biopsy sample includes nucleic acid molecules. Example nucleic acid molecules include DNA or RNA. In particular embodiments, the nucleic acid molecules include cell-free DNA (cfDNA). In various embodiments, the cfDNA includes genomic sequences corresponding to CpG islands (CGIs) for which methylation states are informative for presence or absence of cancer. In various embodiments, the cfDNA can be derived from tumor cells and is referred to herein as circulating tumor DNA (ctDNA). In various embodiments, the nucleic acid molecules include a mixture of nucleic acid molecules that contain either methylated CpG sites or non-methylated CpG sites. For example, for a particular genomic region containing one or more CpG sites, the mixture of nucleic acid molecules includes a subset of nucleic acid molecules in which the one or more CpG sites are unmethylated and a different subset of nucleic acid molecules in which the one or more CpG sites are partially or fully methylated.
Step 120 involves converting the nucleic acid molecules from the sample obtained from the subject. In various embodiments, converting the nucleic acid involves converting unmethylated nucleotides (e.g., cytosines) to another nucleotide (a “converted nucleotide”, as used herein). In various embodiments, methylated cytosines are protected from conversion (e.g., deamination) during the conversion step. Further details of performing conversion of nucleic acid molecules (e.g., step 120) are described herein.
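The conversion of step 120 can be sketched in silico as follows. This is a minimal illustration that assumes a bisulfite-style conversion in which every unmethylated cytosine is ultimately read as thymine and every methylated cytosine is retained; the function name `convert` and the example sequence are hypothetical.

```python
def convert(seq, methylated_positions=frozenset()):
    """Return seq with every unmethylated C replaced by T.

    methylated_positions holds 0-based indices of cytosines that are
    protected from conversion. Reading converted cytosines as T (rather
    than U) is an assumption reflecting how they appear after amplification.
    """
    out = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated_positions:
            out.append("T")  # unmethylated cytosine is converted
        else:
            out.append(base)  # methylated C and all other bases are kept
    return "".join(out)


# Hypothetical 8-nt region with CpG sites starting at indices 1 and 5.
region = "ACGTCCGA"
unmethylated_read = convert(region)        # both CpG sites unmethylated
methylated_read = convert(region, {1, 5})  # both CpG sites methylated
print(unmethylated_read, methylated_read)
```

After conversion the two molecules differ only at the positions that correspond to the CpG cytosines, which is the difference that blockers and enrichment probes exploit.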
Although not shown in
Step 125 involves enriching for a subset of nucleic acid molecules (e.g., nucleic acid molecules comprising target sequences). As shown in
Although steps 130, 135, and 140 are shown in
Step 130 involves providing one or more enrichment probes that bind to a target sequence, or one or more portions thereof. In various embodiments, an enrichment probe comprises a sequence that is at least 90% complementary, at least 95% complementary, at least 96% complementary, at least 97% complementary, at least 98% complementary, at least 99% complementary, or 100% complementary to the target sequence, or a portion thereof.
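One simple way to evaluate the complementarity thresholds above is an ungapped, position-by-position comparison between a probe and the perfect reverse complement of the region it is meant to bind. The sketch below assumes probe and region have equal length and contain only A, C, G, and T; the helper names and sequences are hypothetical, and other definitions of percent complementarity (e.g., alignment-based) are possible.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence containing only A, C, G, T."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def percent_complementarity(probe, target_region):
    """Percent of probe positions that pair with target_region (ungapped)."""
    if len(probe) != len(target_region):
        raise ValueError("this sketch compares equal-length sequences only")
    perfect_probe = reverse_complement(target_region)
    matches = sum(p == q for p, q in zip(probe.upper(), perfect_probe))
    return 100.0 * matches / len(probe)


# Hypothetical 10-nt converted target region retaining two CG dinucleotides.
target_region = "ACGTTCGAAC"
perfect = reverse_complement(target_region)
one_mismatch = perfect[:4] + "A" + perfect[5:]  # replaces a G with an A
print(percent_complementarity(perfect, target_region))
print(percent_complementarity(one_mismatch, target_region))
```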
In various embodiments, a target sequence contains a sequence that includes or corresponds to X CpG sites that were partially or fully methylated. In particular embodiments, a target sequence contains a sequence that includes or corresponds to X CpG sites that were fully methylated. Thus, by binding the enrichment probe to the target sequence, or a portion thereof, the subsequent processing and enrichment steps can enrich for the target sequence using the enrichment probe.
Step 135 involves providing a plurality of blockers that bind to a candidate sequence, or portions thereof (e.g., a variant sequence). In various embodiments, step 135 involves providing copies of 1 blocker sequence, 2 blocker sequences, 3 blocker sequences, 4 blocker sequences, 5 blocker sequences, 6 blocker sequences, 7 blocker sequences, 8 blocker sequences, 9 blocker sequences, 10 blocker sequences, 11 blocker sequences, 12 blocker sequences, 13 blocker sequences, 14 blocker sequences, 15 blocker sequences, 16 blocker sequences, 17 blocker sequences, 18 blocker sequences, 19 blocker sequences, or 20 blocker sequences. In particular embodiments, the plurality of blocker sequences comprise 5 blocker sequences that are complementary to 5 different variant sequences of a candidate sequence. Each blocker sequence can be complementary to a single variant sequence of the candidate sequence.
Returning to
In various embodiments, the probe is complementary to a portion of the candidate sequence that is bound by the blocker. For example, the probe may be complementary to at least one nucleotide, at least two nucleotides, at least three nucleotides, at least four nucleotides, at least five nucleotides, at least six nucleotides, at least seven nucleotides, at least eight nucleotides, at least nine nucleotides, at least ten nucleotides, at least eleven nucleotides, at least twelve nucleotides, at least thirteen nucleotides, at least fourteen nucleotides, at least fifteen nucleotides, at least sixteen nucleotides, at least seventeen nucleotides, at least eighteen nucleotides, at least nineteen nucleotides, or at least twenty nucleotides that are bound by one or more blockers. In various embodiments, the one or more blockers outcompete the enrichment probe for binding to the candidate sequence, or portion thereof. For example, the plurality of blockers can each be bound to a different portion of the candidate sequence, thereby preventing the enrichment probe from hybridizing to the candidate sequence. As another example, the plurality of blockers can outcompete binding of the enrichment probe. Thus, when the subsequent enrichment step is performed, the candidate sequence is not enriched.
In various embodiments, step 140 involves performing an enrichment step. An example enrichment step includes performing any one of hybrid capture, use of DNA-binding proteins to enrich a target sequence or a subset of target sequences, and nucleic acid amplification (e.g., polymerase chain reaction). In particular embodiments, the enrichment step involves capturing target nucleic acids on a solid support through an affinity moiety. For example, the enrichment step can involve capturing target nucleic acids through a biotin-streptavidin interaction, where the enrichment probe includes a biotin group and a solid support is coated with streptavidin groups. Therefore, the target nucleic acid is captured by the solid support through the biotin-streptavidin interaction.
In various embodiments, step 140 involves performing one or more washes and/or selections to remove unwanted DNA fragments, such as single stranded DNA fragments, excess primers, excess adapters, and other molecules. For example, nucleic acids that remain unbound (e.g., not bound to a streptavidin coated solid support) are washed and removed, while retaining bound target nucleic acids. Thus, the bound target nucleic acids are enriched relative to the unbound nucleic acids.
Step 150 involves detecting the selectively enriched nucleic acids. In various embodiments, detecting the selectively enriched nucleic acids involves performing sequencing to determine the sequences of the first set of nucleic acids. In various embodiments, sequencing data can be demultiplexed e.g., using barcode sequences. In various embodiments, sequencing data can be aligned to a reference genome and/or trimmed. In various embodiments, sequencing data can be further analyzed to determine the enriched signal in the sample e.g., for determining presence or absence of cancer in the sample. In various embodiments, detecting the selectively enriched nucleic acids involves quantifying a signal from the first set of nucleic acids. For example, the signal may be a fluorescent signal. Thus, quantifying the fluorescent signal can be informative for determining a total quantity of the first set of nucleic acid molecules (or nucleic acids, such as amplicons, derived from the first set of nucleic acid molecules). Detecting the selectively enriched nucleic acids by quantifying a signal from the first set of nucleic acids can be performed e.g., when performing quantitative PCR.
a. Exemplary Methods
As disclosed herein in reference to
Beginning with
Additionally, nucleic acid molecule 415B includes a candidate sequence 408 that is derived from a sequence comprising one or more non-methylated CpG sites. Specifically,
One or more blockers (e.g., blocker 420A and blocker 420B) are introduced to the converted nucleic acids 410. The one or more blockers 420 bind to portions of candidate sequence 408 of nucleic acid molecule 415B. Here, each of blocker 420A and blocker 420B bind to a portion of candidate sequence 408. Although
In various embodiments, the provided number of different blocker sequences is dependent on the number of CpG sites in the candidate sequence 408. For example, if the candidate sequence 408 includes X total CpG sites, then X different blocker sequences can be provided. In such embodiments, each blocker sequence can comprise a sequence that is complementary to a variant sequence of the candidate sequence 408, where each variant sequence of the candidate sequence 408 includes a single CpG site of the X total CpG sites. Thus, every CpG site in a variant sequence of the candidate sequence 408 can be bound by at least a corresponding blocker sequence. As an example,
In various embodiments, if the candidate sequence 408 includes X total CpG sites, then fewer than X different blocker sequences can be provided. Here, each of the blocker sequences can bind to at least one of the X total CpG sites. In some scenarios, at least one of the blocker sequences can bind to two or more of the X total CpG sites. Therefore, fewer than X different blocker sequences can be provided while still binding to the X total CpG sites. This may be beneficial in situations where adjacent CpG sites are located near to each other such that a single blocker sequence can include a sequence that is complementary to a variant sequence comprising the adjacent CpG sites. In various embodiments, two CpG sites are located near each other if they are within 50 nucleotides, within 40 nucleotides, within 30 nucleotides, within 25 nucleotides, within 20 nucleotides, within 19 nucleotides, within 18 nucleotides, within 17 nucleotides, within 16 nucleotides, within 15 nucleotides, within 14 nucleotides, within 13 nucleotides, within 12 nucleotides, within 11 nucleotides, within 10 nucleotides, within 9 nucleotides, within 8 nucleotides, within 7 nucleotides, within 6 nucleotides, within 5 nucleotides, within 4 nucleotides, within 3 nucleotides, within 2 nucleotides, or within 1 nucleotide of each other. As an example, X−1 different blocker sequences can be provided where a blocker sequence binds to a variant sequence comprising 2 CpG sites. The remaining blocker sequences can each bind to one of the remaining CpG sites.
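The idea of covering X CpG sites with fewer than X blockers can be sketched as a greedy grouping of nearby sites. The grouping rule, the function name `group_cpg_sites`, and the example positions are assumptions for illustration; the default blocker length of 17 nucleotides is one of the example lengths mentioned in this disclosure.

```python
def group_cpg_sites(cpg_positions, blocker_length=17):
    """Greedily group CpG sites so that each group fits within one blocker.

    cpg_positions holds the 0-based index of the C of each CpG site in the
    candidate sequence. A site occupies two nucleotides (pos and pos + 1),
    so it joins the current group only if the span from the group's first
    site through pos + 1 is no longer than blocker_length.
    """
    groups = []
    for pos in sorted(cpg_positions):
        if groups and (pos + 1) - groups[-1][0] + 1 <= blocker_length:
            groups[-1].append(pos)  # close enough to share a blocker
        else:
            groups.append([pos])  # start a new blocker
    return groups


# Hypothetical candidate sequence with X = 5 CpG sites.
sites = [3, 9, 30, 52, 60]
groups = group_cpg_sites(sites)
print(f"{len(sites)} CpG sites covered by {len(groups)} blockers: {groups}")
```

In this example two pairs of neighbouring sites each share a blocker, so three blocker sequences cover all five CpG sites.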
In various embodiments, two blockers 420 are designed such that when each blocker is hybridized with a portion of the candidate sequence 408, the two blockers 420 are immediately adjacent to each other. As an example, a first blocker may bind to a variant sequence of the candidate sequence 408 that is immediately adjacent to another variant sequence of the candidate sequence 408 that is bound by a second blocker. As shown in
In various embodiments, two blockers 420 are designed such that when each blocker is hybridized with a portion of the candidate sequence 408, the two blockers 420 are bound to two different variant sequences of the candidate sequence 408 that are separated by 1 or more nucleotides. In various embodiments, the two blockers 420 are bound to two different variant sequences of the candidate sequence 408 that are separated by 2 nucleotides, 3 nucleotides, 4 nucleotides, 5 nucleotides, 6 nucleotides, 7 nucleotides, 8 nucleotides, 9 nucleotides, 10 nucleotides, 11 nucleotides, 12 nucleotides, 13 nucleotides, 14 nucleotides, 15 nucleotides, 16 nucleotides, 17 nucleotides, 18 nucleotides, 19 nucleotides, 20 nucleotides, 25 nucleotides, 30 nucleotides, 35 nucleotides, 40 nucleotides, 45 nucleotides, 50 nucleotides, 55 nucleotides, 60 nucleotides, 65 nucleotides, 70 nucleotides, 75 nucleotides, 80 nucleotides, 90 nucleotides, or 100 nucleotides. In various embodiments, the two blockers 420 are bound to two different variant sequences of the candidate sequence 408 that are separated by between 100 and 200 nucleotides, between 200 and 300 nucleotides, between 300 and 400 nucleotides, between 400 and 500 nucleotides, between 500 and 600 nucleotides, between 600 and 700 nucleotides, between 700 and 800 nucleotides, between 800 and 900 nucleotides, between 900 and 1000 nucleotides, or between 1000 and 5000 nucleotides.
In various embodiments, the plurality of blockers 420 are designed such that when each blocker is hybridized with a portion of the candidate sequence 408, the entire candidate sequence 408 is bound by one of the plurality of blockers 420. In various embodiments, when the plurality of blockers 420 are each hybridized with a portion of the candidate sequence 408, there may be other portions of the candidate sequence 408 that remain unbound. In such embodiments, the plurality of blockers 420 may bind the variant sequences (e.g., variant sequences including CpG sites) but need not bind all of the other portions of the candidate sequence 408. Generally, the plurality of blockers 420 bind to a sufficient proportion of the candidate sequence 408 such that the plurality of blockers 420 prevents binding of a subsequent enrichment probe to the candidate sequence 408.
In various embodiments, when the plurality of blockers 420 are each hybridized with a portion of the candidate sequence 408, between 8% and 50% of the candidate sequence 408 is bound. In various embodiments, when the plurality of blockers 420 are each hybridized with a portion of the candidate sequence 408, between 10% and 30% of the candidate sequence 408 is bound. In various embodiments, when the plurality of blockers 420 are each hybridized with a portion of the candidate sequence 408, between 15% and 25% of the candidate sequence 408 is bound. In various embodiments, when the plurality of blockers 420 are each hybridized with a portion of the candidate sequence 408, between 18% and 22% of the candidate sequence 408 is bound. In various embodiments, when the plurality of blockers 420 are each hybridized with a portion of the candidate sequence 408, about 8%, about 9%, about 10%, about 11%, about 12%, about 13%, about 14%, about 15%, about 16%, about 17%, about 18%, about 19%, about 20%, about 21%, about 22%, about 23%, about 24%, about 25%, about 26%, about 27%, about 28%, about 29%, about 30%, about 31%, about 32%, about 33%, about 34%, about 35%, about 36%, about 37%, about 38%, about 39%, about 40%, about 41%, about 42%, about 43%, about 44%, about 45%, about 46%, about 47%, about 48%, about 49%, or about 50% of the candidate sequence 408 is bound.
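The percentage of the candidate sequence that is bound can be estimated from the footprint of each blocker on the candidate. The sketch below assumes each blocker is described by a half-open interval of 0-based positions and counts a position once even if two blockers overlap it; the function name and the numbers are hypothetical.

```python
def fraction_bound(candidate_length, blocker_intervals):
    """Fraction of candidate positions covered by at least one blocker.

    blocker_intervals is an iterable of (start, end) pairs, 0-based and
    half-open, giving where each blocker hybridizes on the candidate.
    """
    covered = set()
    for start, end in blocker_intervals:
        if not 0 <= start < end <= candidate_length:
            raise ValueError(f"interval ({start}, {end}) is out of range")
        covered.update(range(start, end))  # overlaps are counted only once
    return len(covered) / candidate_length


# Hypothetical 170-nt candidate bound by two immediately adjacent 17-nt blockers.
bound = fraction_bound(170, [(10, 27), (27, 44)])
print(f"{bound:.0%} of the candidate sequence is bound")
```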
In various embodiments, a blocker 420 does not bind, or minimally binds, to target sequence 405. Given the mismatches between the sequence of the plurality of blockers 420 and the nucleotides of the target sequence 405 that correspond to methylated CpG sites 402, the blocker 420 may fail to hybridize, or hybridizes to a lesser extent, with the target sequence 405. In various embodiments, less than 50% of nucleic acid molecule 415A (e.g., first set of nucleic acid molecules that are derived from sequences with one or more methylated CpG sites) are bound to a blocker 420. In various embodiments, less than 40%, less than 30%, less than 25%, less than 20%, less than 15%, less than 10%, less than 9%, less than 8%, less than 7%, less than 6%, less than 5%, less than 4%, less than 3%, less than 2%, less than 1%, less than 0.9%, less than 0.8%, less than 0.7%, less than 0.6%, less than 0.5%, less than 0.4%, less than 0.3%, less than 0.2%, or less than 0.1% of nucleic acid molecule 415A are bound to a blocker 420.
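The mismatch argument above can be made concrete by counting the positions at which a blocker fails to pair with a converted template. The sketch assumes an ungapped comparison, a blocker that is the exact reverse complement of the unmethylated (converted) region, and hypothetical 10-nt sequences; it says nothing about actual hybridization thermodynamics.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def mismatches(blocker, template_region):
    """Count positions where blocker does not pair with template_region."""
    if len(blocker) != len(template_region):
        raise ValueError("this sketch compares equal-length sequences only")
    perfect = template_region.upper().translate(_COMPLEMENT)[::-1]
    return sum(a != b for a, b in zip(blocker.upper(), perfect))


# Hypothetical converted regions covering the same two CpG sites.
candidate_region = "AATGTTTGAA"  # unmethylated: each CpG now reads TG
target_region = "AACGTTCGAA"     # methylated: each CpG is retained as CG
blocker = "TTCAAACATT"           # reverse complement of candidate_region

print(mismatches(blocker, candidate_region))  # pairing with the candidate
print(mismatches(blocker, target_region))     # pairing with the target
```

The blocker pairs perfectly with the candidate region but carries one mismatch per methylated CpG site against the target region, which is the basis for its reduced binding to the target sequence.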
Next, enrichment probes are provided. Reference is made to
Given that the enrichment probe 430 is bound to nucleic acid molecule 415A and not bound to nucleic acid molecule 415B, subsequent enrichment can be performed to enrich for nucleic acid molecule 415A using the bound enrichment probe 430. As an example, methods can involve performing hybrid capture e.g., using a solid support to capture the enrichment probe 430 bound to the target sequence 405 of the nucleic acid molecule 415A. As another example, methods can involve capturing the enrichment probe 430 and nucleic acid molecule 415A via an affinity moiety (e.g., a biotin and/or streptavidin) using a solid support. Thus, the nucleic acid molecule 415A can be enriched while eliminating or reducing the quantity of nucleic acid molecule 415B.
b. Example Blocker Sequences and Enrichment Probe Sequences
As disclosed herein, a blocker may contain a sequence that is at least 90%, at least 95%, or 100% complementary to one or more variant sequences of a candidate sequence, or a portion thereof. In various embodiments, a blocker may be designed to be at least 90%, at least 95%, or 100% complementary to one or more variant sequences of a candidate sequence that includes one or more CpG sites within a region disclosed in Table 1 or Table 2. In various embodiments, a blocker may be designed to be at least 90%, at least 95%, or 100% complementary to one or more variant sequences of a candidate sequence that includes two or more, three or more, four or more, five or more, six or more, seven or more, eight or more, nine or more, or ten or more CpG sites within a region disclosed in Table 1 or Table 2. In various embodiments, a blocker may be designed to be at least 90%, at least 95%, or 100% complementary to one or more variant sequences of a candidate sequence that includes two, three, four, five, six, seven, eight, nine, or ten CpG sites within a region disclosed in Table 1 or Table 2. In various embodiments, a blocker may be designed to be at least 90%, at least 95%, or 100% complementary to one or more variant sequences of a candidate sequence that includes five sequential CpG sites within a region disclosed in Table 1 or Table 2. In various embodiments, a blocker is designed to be 100% complementary to one or more variant sequences of a candidate sequence that includes five sequential CpG sites within a region disclosed in Table 1 or Table 2.
In various embodiments, a blocker may be designed to be complementary to one or more variant sequences of a candidate sequence, the candidate sequence containing X CpG sites, where the X CpG sites are fully unmethylated. In various embodiments, a blocker may be designed to be complementary to one or more variant sequences of a candidate sequence that contains X CpG sites, where at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, or at least 90% of the X CpG sites are unmethylated.
As referred to herein, the term “K*n #” refers to a sequence having “*” CpG sites, where “#” of the CpG sites are methylated. Therefore, the term “K5n5” refers to a sequence including 5 CpG sites in which 5 of the CpG sites are methylated. As another example, the term “K6n5” refers to a sequence including 6 CpG sites in which 5 of the CpG sites are methylated. In various embodiments, “*” can be any of 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10. In various embodiments, “#” can be any of 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10. In some embodiments, a candidate sequence is less than 200 bp, less than 150 bp, less than 100 bp, less than 75 bp, less than 50 bp, or less than 35 bp in length. In some embodiments, a candidate sequence is less than 150 bp in length. In some embodiments, a candidate sequence is less than 100 bp in length. In some embodiments, a candidate sequence is less than 75 bp in length. In some embodiments, a candidate sequence is less than 50 bp in length. In some embodiments, a candidate sequence is less than 35 bp in length.
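For bookkeeping purposes, a “K*n #” label can be split into its two counts with a few lines of code. The parser below is a hypothetical convenience, not part of the disclosed methods; it accepts an optional space before the second number and also accepts a count of zero methylated sites, since labels such as K5n0 are used elsewhere in this disclosure.

```python
import re


def parse_k_label(label):
    """Split a label such as "K6n5" into (total_cpg_sites, methylated_sites)."""
    match = re.fullmatch(r"K(\d+)n\s?(\d+)", label)
    if match is None:
        raise ValueError(f"not a K*n# label: {label!r}")
    total, methylated = int(match.group(1)), int(match.group(2))
    if methylated > total:
        raise ValueError("methylated sites cannot exceed total CpG sites")
    return total, methylated


for label in ("K5n5", "K6n5", "K5n0"):
    total, methylated = parse_k_label(label)
    print(f"{label}: {total} CpG sites, {methylated} methylated")
```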
In various embodiments, a blocker may be designed to be fully complementary to one or more variant sequences of a candidate sequence derived from a “K*n #” sequence. For example, a blocker may be designed to be complementary to one or more variant sequences of a candidate sequence derived from a K5n0 sequence (e.g., fully unmethylated sequence of 5 CpG sites). Thus, as each of the CpG sites of the K5n0 sequence is unmethylated, the resulting sequence corresponding to each CpG site may be “TG”, or a complement thereof. Thus, a blocker may be designed to have a sequence that is complementary to a variant sequence including one of the five “TG” dinucleotides, or a complement thereof.
In various embodiments, a blocker may be designed to be complementary to one or more variant sequences of a candidate sequence derived from any of a K1n0 sequence, a K2n0 sequence, a K3n0 sequence, a K4n0 sequence, a K5n0 sequence, a K6n0 sequence, a K7n0 sequence, a K8n0 sequence, a K9n0 sequence, or a K10n0 sequence. In various embodiments, a blocker may be designed to be complementary to one or more variant sequences of a candidate sequence derived from any of a K2n1 sequence, a K3n1 sequence, a K4n1 sequence, a K5n1 sequence, a K6n1 sequence, a K7n1 sequence, a K8n1 sequence, a K9n1 sequence, a K10n1 sequence, a K3n2 sequence, a K4n2 sequence, a K5n2 sequence, a K6n2 sequence, a K7n2 sequence, a K8n2 sequence, a K9n2 sequence, a K10n2 sequence, a K4n3 sequence, a K5n3 sequence, a K6n3 sequence, a K7n3 sequence, a K8n3 sequence, a K9n3 sequence, a K10n3 sequence, a K5n4 sequence, a K6n4 sequence, a K7n4 sequence, a K8n4 sequence, a K9n4 sequence, a K10n4 sequence, a K6n5 sequence, a K7n5 sequence, a K8n5 sequence, a K9n5 sequence, a K10n5 sequence, a K7n6 sequence, a K8n6 sequence, a K9n6 sequence, a K10n6 sequence, a K8n7 sequence, a K9n7 sequence, a K10n7 sequence, a K9n8 sequence, a K10n8 sequence, or a K10n9 sequence.
In various embodiments, a blocker is designed to be complementary to a “K*n0” variant sequence, where “*” is between 1 and 5. In particular embodiments, a blocker is designed to be complementary to a K1n0 variant sequence (e.g., a variant sequence including 1 CpG site that was previously unmethylated). In various embodiments, a blocker is designed to be complementary to a K2n0 variant sequence (e.g., a variant sequence including 2 CpG sites that were each previously unmethylated), a K3n0 variant sequence (e.g., a variant sequence including 3 CpG sites that were each previously unmethylated), a K4n0 variant sequence (e.g., a variant sequence including 4 CpG sites that were each previously unmethylated), or a K5n0 variant sequence (e.g., a variant sequence including 5 CpG sites that were each previously unmethylated).
In particular embodiments, a candidate sequence is a K5n0 sequence and the plurality of blockers include 5 different blockers, each designed to be complementary to a particular K1n0 variant sequence. Each blocker may be designed to bind to a different K1n0 variant sequence, thereby covering all CpG sites in the K5n0 candidate sequence.
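A minimal sketch of designing one blocker per CpG site is shown below. It assumes a fully unmethylated candidate in which every cytosine reads as thymine after conversion, centres a fixed-length window on each former CpG site, and returns the reverse complement of that window as the blocker. The function name, the window-placement rule, and the toy sequence (three CpG sites and a 6-nt blocker, chosen for brevity) are all illustrative assumptions; real designs would also weigh melting temperature and strand choice.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def design_blockers(original, blocker_length=17):
    """Return one blocker per CpG site of a fully unmethylated candidate.

    original is the pre-conversion sequence (A, C, G, T only). Each blocker
    is the reverse complement of a window of the converted sequence that
    contains the TG dinucleotide derived from one CpG site.
    """
    original = original.upper()
    converted = original.replace("C", "T")  # every unmethylated C reads as T
    cpg_starts = [i for i in range(len(original) - 1) if original[i:i + 2] == "CG"]
    blockers = []
    for pos in cpg_starts:
        # Centre the window on the site, then clamp it to the sequence ends.
        start = pos + 1 - blocker_length // 2
        start = max(0, min(start, len(converted) - blocker_length))
        window = converted[start:start + blocker_length]
        blockers.append(window.translate(_COMPLEMENT)[::-1])
    return blockers


# Hypothetical 24-nt candidate with CpG sites starting at indices 4, 12 and 20.
candidate = "AATTCGAATTAACGTTAATTCGAA"
blockers = design_blockers(candidate, blocker_length=6)
print(blockers)
```

Because the sites in this toy sequence are spaced farther apart than the blocker length, each blocker spans exactly one former CpG site, mirroring the case in which each blocker is complementary to a different K1n0 variant sequence.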
In various embodiments, a blocker may include a sequence that is between 1 and 40 nucleotide bases in length. In various embodiments, a blocker may include a sequence that is between 5 and 30 nucleotide bases in length, between 10 and 20 nucleotide bases in length, between 13 and 19 nucleotide bases in length, or between 15 and 18 nucleotide bases in length. In various embodiments, a blocker may include a sequence that is 17 nucleotide bases in length.
As disclosed herein, methods may include providing a plurality of blocker sequences (e.g., see
As disclosed herein, an enrichment probe may contain a sequence that is at least 90% complementary, at least 95% complementary, at least 96% complementary, at least 97% complementary, at least 98% complementary, at least 99% complementary, or 100% complementary to the target sequence, or a portion thereof. In various embodiments, an enrichment probe may be designed to be at least 90%, at least 95%, or 100% complementary to a target sequence that includes one or more CpG sites within a region disclosed in Table 1 or Table 2. In various embodiments, an enrichment probe may be designed to be at least 90%, at least 95%, or 100% complementary to a target sequence that includes two or more, three or more, four or more, five or more, six or more, seven or more, eight or more, nine or more, or ten or more CpG sites within a region disclosed in Table 1 or Table 2. In various embodiments, an enrichment probe may be designed to be at least 90%, at least 95%, or 100% complementary to a target sequence that includes two, three, four, five, six, seven, eight, nine, or ten CpG sites within a region disclosed in Table 1 or Table 2. In various embodiments, an enrichment probe may be designed to be at least 90%, at least 95%, or 100% complementary to a target sequence that includes five sequential CpG sites within a region disclosed in Table 1 or Table 2. In various embodiments, an enrichment probe is designed to be 100% complementary to a target sequence that includes five sequential CpG sites within a region disclosed in Table 1 or Table 2.
In various embodiments, an enrichment probe may be designed to be complementary to a target sequence that contains X CpG sites, where the X CpG sites are fully methylated. In various embodiments, an enrichment probe may be designed to be complementary to a target sequence that contains X CpG sites, where at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, or at least 90% of the X CpG sites are methylated.
In various embodiments, an enrichment probe may be designed to be fully complementary to a target sequence derived from a “K*n #” sequence. For example, an enrichment probe may be designed to be complementary to a target sequence derived from a K5n5 sequence (e.g., fully methylated sequence of 5 CpG sites). Thus, as each of the CpG sites of the K5n5 sequence is methylated, the resulting sequence corresponding to each CpG site may be “CG”, or a complement thereof. Thus, the probe may be designed to have a sequence that is complementary to each of the five “CG”, or complement thereof.
In various embodiments, an enrichment probe may be designed to be complementary to a target sequence derived from any of a K1n1 sequence, a K2n2 sequence, a K3n3 sequence, a K4n4 sequence, a K5n5 sequence, a K6n6 sequence, a K7n7 sequence, a K8n8 sequence, a K9n9 sequence, or a K10n10 sequence. In various embodiments, an enrichment probe may be designed to be complementary to a target sequence derived from any of a K2n1 sequence, a K3n1 sequence, a K4n1 sequence, a K5n1 sequence, a K6n1 sequence, a K7n1 sequence, a K8n1 sequence, a K9n1 sequence, a K10n1 sequence, a K3n2 sequence, a K4n2 sequence, a K5n2 sequence, a K6n2 sequence, a K7n2 sequence, a K8n2 sequence, a K9n2 sequence, a K10n2 sequence, a K4n3 sequence, a K5n3 sequence, a K6n3 sequence, a K7n3 sequence, a K8n3 sequence, a K9n3 sequence, a K10n3 sequence, a K5n4 sequence, a K6n4 sequence, a K7n4 sequence, a K8n4 sequence, a K9n4 sequence, a K10n4 sequence, a K6n5 sequence, a K7n5 sequence, a K8n5 sequence, a K9n5 sequence, a K10n5 sequence, a K7n6 sequence, a K8n6 sequence, a K9n6 sequence, a K10n6 sequence, a K8n7 sequence, a K9n7 sequence, a K10n7 sequence, a K9n8 sequence, a K10n8 sequence, or a K10n9 sequence.
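The combinations listed above can be enumerated programmatically. The sketch below assumes that a label of the form KXnY denotes a sequence with X CpG sites of which Y are methylated; this reading is inferred from the K5n5 example (a fully methylated sequence of 5 CpG sites) and is an assumption rather than a definition given herein. The function names are hypothetical.

```python
import re


def kn_labels(max_sites: int = 10):
    """Enumerate fully and partially methylated labels up to max_sites CpG sites."""
    # Fully methylated: K1n1 ... K10n10
    full = [f"K{k}n{k}" for k in range(1, max_sites + 1)]
    # Partially methylated: K2n1 ... K10n1, K3n2 ... K10n2, ..., K10n9
    partial = [
        f"K{k}n{n}"
        for n in range(1, max_sites)
        for k in range(n + 1, max_sites + 1)
    ]
    return full, partial


def methylated_fraction(label: str) -> float:
    """Fraction of CpG sites methylated, under the assumed KXnY reading."""
    match = re.fullmatch(r"K(\d+)n(\d+)", label)
    if match is None:
        raise ValueError(f"not a KXnY label: {label!r}")
    sites, methylated = int(match.group(1)), int(match.group(2))
    return methylated / sites


full, partial = kn_labels()
print(len(full), len(partial))       # 10 45
print(methylated_fraction("K10n5"))  # 0.5
```

The enumeration reproduces the 10 fully methylated and 45 partially methylated labels recited above, in the same order.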
In various embodiments, an enrichment probe includes a sequence that is between about 50 nucleotide bases and about 150 nucleotide bases. In various embodiments, an enrichment probe includes a sequence that is between about 55 nucleotide bases and about 120 nucleotide bases, between about 60 nucleotide bases and about 80 nucleotide bases, or between about 65 nucleotide bases and about 75 nucleotide bases. In various embodiments, an enrichment probe includes a sequence that is about 70 nucleotide bases. In various embodiments, an enrichment probe includes a sequence that is between about 80 nucleotide bases and about 120 nucleotide bases, between about 90 nucleotide bases and about 110 nucleotide bases, or between about 95 nucleotide bases and about 105 nucleotide bases. In various embodiments, an enrichment probe includes a sequence that is about 100 nucleotide bases.
In various embodiments, an enrichment probe further includes an affinity moiety. An affinity moiety can be biotin, streptavidin, heparin, an aptamer, a click-chemistry moiety, digoxigenin, primary amine(s), carboxyl(s), hydroxyl(s), aldehyde(s), ketone(s), or any combination thereof. In various embodiments, the enrichment probe includes an affinity moiety located at the 3′ end of the enrichment probe. In various embodiments, the enrichment probe includes an affinity moiety located at the 5′ end of the enrichment probe. In various embodiments, the enrichment probe includes an affinity moiety located on a nucleotide of the enrichment probe. In particular embodiments, the enrichment probe includes a biotin group. Thus, the enrichment probe can be captured by a corresponding affinity moiety (e.g., a streptavidin group) for purposes of performing enrichment. For example, a streptavidin coated solid support can be used to capture the enrichment probe and therefore, capture the hybridized target sequence for purposes of enrichment.
Generally, the enrichment probe is longer than a blocker sequence. In various embodiments, the enrichment probe is longer than every blocker sequence provided (e.g., blocker sequences provided as shown in
d. Enrichment Steps
In certain embodiments, the target sequence or a subset of target sequences in the nucleic acid can be enriched using an enrichment step. The enrichment step can be performed using any enrichment method known in the art. Non-limiting examples include hybrid capture, use of DNA-binding proteins to enrich a target sequence or a subset of target sequences, and nucleic acid amplification (e.g., polymerase chain reaction). In various embodiments, the additional enrichment step involves performing indexing PCR amplification, e.g., using library adapters (e.g., P5/P7 adapters). Thus, selective amplification of nucleic acids with hybridized primer sequences can be performed using the library adapters. In contrast, nucleic acids with which primer sequences are incapable of hybridizing do not undergo PCR amplification.
One or more additional enrichment steps can be performed before or after the blocking of the unmethylated nucleic acids, as described herein. For example, in certain embodiments, the method comprises a first step of depleting a first subset of nucleic acids (e.g., unmethylated nucleic acids or converted nucleic acids derived from unmethylated nucleic acids), thereby leaving a second subset of nucleic acids (e.g., methylated nucleic acids or converted nucleic acids derived from methylated nucleic acids). The method comprises a second step of subjecting nucleic acid sequences comprising the target sequence to one or more additional enrichment steps to enrich for at least a subset of the target sequences.
In certain embodiments, the method comprises a first step of subjecting a plurality of nucleic acid molecules that include target sequences to an enrichment step to enrich for the target sequences. The method further comprises a second step of subjecting the plurality of nucleic acid molecules to the depletion method disclosed herein, which depletes a first subset of nucleic acids (e.g., unmethylated nucleic acids or converted nucleic acids derived from unmethylated nucleic acids) thereby leaving a second subset of nucleic acids (e.g., methylated nucleic acids or converted nucleic acids derived from methylated nucleic acids).
In certain embodiments, a target sequence or a subset of target sequences in the nucleic acid can be enriched by subjecting the nucleic acid comprising the target sequence or the subset of target sequences to hybrid capture. In hybrid capture, labeled (e.g., biotinylated) capture probes that can bind to one or more target sequences or subsets of target sequences are exposed to the nucleic acid comprising the one or more target sequences. The capture probes are specific to a sequence of interest, for example, a methylation pattern of interest that can be detected as a bisulfite-converted epitype. Examples of such hybrid capture probe sets include the KAPA HyperPrep KAPA HyperCap Workflow with HyperChoice Probes, Twist Bioscience Twist Fast Hybridization Custom Target Enrichment Panel, Integrated DNA Technologies xGen Custom Hybridization Capture Panel, and SeqCAP Epi Enrichment System from Roche Diagnostics (Pleasanton, CA).
In various embodiments, the enrichment step includes capturing the target sequence or a subset of target sequences and performing one or more washes to remove nucleic acids that were not successfully captured. In various embodiments, the additional enrichment step includes capturing the target sequence or a subset of target sequences using an affinity moiety. An affinity moiety can be biotin, streptavidin, heparin, an aptamer, a click-chemistry moiety, digoxigenin, primary amine(s), carboxyl(s), hydroxyl(s), aldehyde(s), ketone(s), or any combination thereof. As one example, an enrichment probe, such as an enrichment probe 430 disclosed in
In various embodiments, a solid support or substrate can be any physically separable solid to which an affinity moiety can be directly or indirectly attached including, but not limited to, surfaces provided by microarrays and wells, and particles such as beads (e.g., paramagnetic beads, magnetic beads, microbeads, nanobeads), microparticles, and nanoparticles. Solid supports also can include, for example, chips, columns, optical fibers, wipes, filters (e.g., flat surface filters), one or more capillaries, glass and modified or functionalized glass (e.g., controlled-pore glass (CPG)), quartz, mica, diazotized membranes (paper or nylon), polyformaldehyde, cellulose, cellulose acetate, paper, ceramics, metals, metalloids, semiconductive materials, quantum dots, coated beads or particles, other chromatographic materials, magnetic particles; plastics (including acrylics, polystyrene, copolymers of styrene or other materials, polybutylene, polyurethanes, TEFLON™, polyethylene, polypropylene, polyamide, polyester, polyvinylidene difluoride (PVDF), and the like), polysaccharides, nylon or nitrocellulose, resins, silica or silica-based materials including silicon, silica gel, and modified silicon, Sephadex®, Sepharose®, carbon, metals (e.g., steel, gold, silver, aluminum, silicon and copper), inorganic glasses, conducting polymers (including polymers such as polypyrrole and polyindole); micro or nanostructured surfaces such as nucleic acid tiling arrays, nanotube, nanowire, or nanoparticulate decorated surfaces; or porous surfaces or gels such as methacrylates, acrylamides, sugar polymers, cellulose, silicates, or other fibrous or stranded polymers. In some embodiments, the solid support or substrate may be coated using passive or chemically-derivatized coatings with any number of materials, including polymers, such as dextrans, acrylamides, gelatins or agarose. Beads and/or particles may be free or in connection with one another (e.g., sintered).
In some embodiments, the solid phase can be a collection of particles. In some embodiments, the particles can comprise silica, and the silica may comprise silicon dioxide. In some embodiments the silica can be porous, and in certain embodiments the silica can be non-porous. In some embodiments, the particles further comprise an agent that confers a paramagnetic property to the particles. In certain embodiments, the agent comprises a metal, and in certain embodiments the agent is a metal oxide (e.g., iron or iron oxides, where the iron oxide contains a mixture of Fe2+ and Fe3+). The affinity moiety may be linked to the solid support by covalent bonds or by non-covalent interactions and may be linked to the solid support directly or indirectly (e.g., via an intermediary agent such as a spacer molecule).
In particular embodiments, the solid support is a bead, such as a streptavidin-coated bead. In such embodiments, the probe, such as an enrichment probe 430 disclosed in
As discussed herein, step 120 in
After a nucleic acid has been treated to convert unmethylated, or, in some cases, methylated nucleotides, into another nucleotide, the nucleic acid may be amplified. During amplification, the converted nucleotide pairs with its complementary nucleotide, and in the next round of amplification, the complementary nucleotide pairs with a replacement nucleotide. For example, following the conversion of an unmethylated cytosine to a uracil, the nucleic acid may be amplified such that an adenine pairs with the uracil in the first round of replication, and in the second round of replication, the adenine pairs with a thymine. Accordingly, the thymine replaces the uracil in the original nucleic acid sequence, and is referred to herein as a “replacement nucleotide”.
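The replacement described above can be traced with a short simulation. This is a minimal sketch under stated assumptions: the 7-base sequence and the function names are hypothetical, only cytosine-to-uracil conversion is modeled, and each copied strand is written position-by-position against its template (not reversed) so that the same index can be followed through both rounds.

```python
# Pairing rules used when copying a strand; uracil pairs with adenine.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C", "U": "A"}


def convert(seq: str, methylated_positions: set) -> str:
    """Convert unmethylated C to U; methylated C (by index) is left as C."""
    return "".join(
        "U" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq)
    )


def copy_strand(template: str) -> str:
    """Complement of the template, aligned so index i pairs with index i."""
    return "".join(COMPLEMENT[base] for base in template)


original = "ACGTCGA"                    # hypothetical; C at index 1 is methylated
converted = convert(original, {1})      # unmethylated C at index 4 becomes U
first_round = copy_strand(converted)    # A is placed opposite the U
second_round = copy_strand(first_round) # T is placed opposite that A

print(converted)     # ACGTUGA
print(first_round)   # TGCAACT
print(second_round)  # ACGTTGA
```

At index 4 the unmethylated cytosine is read as thymine after two rounds (the replacement nucleotide), while the methylated cytosine at index 1 is still read as cytosine.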
In certain aspects, conversion of the nucleic acids involves selectively deaminating nucleotides.
In some embodiments, the conversion, for example, bisulfite conversion or enzymatic conversion, uses commercially available kits. Bisulfite conversion can be performed using commercially available technologies, such as an EZ DNA Methylation-Gold, EZ DNA Methylation-Direct, or EZ DNA Methylation-Lightning kit (Zymo Research Corp (Irvine, California)) or EpiTect Fast available from Qiagen (Germantown, MD). In another example, a kit such as APOBECSeq (NEBiolabs) or OneStep qMethyl-PCR Kit (Zymo Research Corp (Irvine, California)) is used.
a. Source of Nucleic Acids
Nucleic acids used in the methods described herein can be derived from any source, such as a sample taken from the environment or from a subject (e.g., a human subject). A biological sample can be treated to physically disrupt tissue or cell structure (e.g., centrifugation and/or cell lysis), thus releasing intracellular components into a solution which can further contain enzymes, buffers, salts, detergents, and the like which can be used to prepare the sample for analysis. A biological sample can take any of a variety of forms, such as a liquid biopsy (e.g., blood, urine, stool, saliva, or mucus), or a tissue biopsy, or other solid biopsy. Examples of biological samples include, but are not limited to, blood, whole blood, plasma, serum, urine, cerebrospinal fluid, feces, saliva, sweat, tears, pleural fluid, pericardial fluid, or peritoneal fluid of the subject. A biological sample can include any tissue or material derived from a living or dead subject. A biological sample can be a cell-free sample. A sample can be a liquid sample or a solid sample (e.g., a cell or tissue sample). A biological sample can be a bodily fluid, such as blood, plasma, serum, urine, vaginal fluid, fluid from a hydrocele (e.g., of the testis), vaginal flushing fluids, pleural fluid, ascitic fluid, cerebrospinal fluid, saliva, sweat, tears, sputum, bronchoalveolar lavage fluid, discharge fluid from the nipple, aspiration fluid from different parts of the body (e.g., thyroid, breast), etc.
The nucleic acid can be of any composition form, such as deoxyribonucleic acid (DNA, e.g., complementary DNA (cDNA), genomic DNA (gDNA) and the like), and/or DNA analogs (e.g., containing base analogs, sugar analogs and/or a non-native backbone and the like), and/or ribonucleic acid (RNA) and/or RNA analogs, all of which can be in single- or double-stranded form. In certain embodiments, single-stranded nucleic acids can be made double stranded prior to cutting with an enzyme. Unless otherwise limited, a nucleic acid can comprise known analogs of natural nucleotides, some of which can function in a similar manner as naturally occurring nucleotides. A nucleic acid can be in any form useful for conducting processes herein (e.g., linear, circular, supercoiled, single-stranded, double-stranded and the like). A nucleic acid in some embodiments can be from a single chromosome or fragment thereof (e.g., a nucleic acid sample may be from one chromosome of a sample obtained from a diploid organism). In certain embodiments nucleic acids comprise nucleosomes, fragments or parts of nucleosomes or nucleosome-like structures. Nucleic acids can comprise protein (e.g., histones, DNA binding proteins, and the like). Nucleic acids analyzed by processes described herein can be substantially isolated and are not substantially associated with protein or other molecules. Nucleic acids can also include derivatives, variants and analogs of DNA synthesized, replicated or amplified from single-stranded (“sense” or “antisense,” “plus” strand or “minus” strand, “forward” reading frame or “reverse” reading frame) and double-stranded polynucleotides. Deoxyribonucleotides can include deoxyadenosine, deoxycytidine, deoxyguanosine and deoxythymidine. A nucleic acid may be prepared using a nucleic acid obtained from a subject as a template.
In certain embodiments, the nucleic acid is a cell-free nucleic acid, which can be found in bodily fluids such as blood, whole blood, plasma, serum, urine, cerebrospinal fluid, feces, saliva, sweat, tears, pleural fluid, pericardial fluid, or peritoneal fluid of a subject. In certain embodiments, a plasma sample can be used directly in the methods disclosed herein (for example, in the cutting step), without prior purification or isolation of nucleic acids in the plasma. Cell-free nucleic acids originate from one or more healthy cells and/or from one or more cancer cells, or from non-human sources such as bacteria, fungi, or viruses. Examples of the cell-free nucleic acids include but are not limited to cell-free DNA (“cfDNA”), including mitochondrial DNA or genomic DNA, and cell-free RNA. In certain embodiments herein, instruments for assessing the quality of the cell-free nucleic acids, such as the TapeStation System from Agilent Technologies (Santa Clara, CA), can be used. Measuring the concentration of low-abundance cfDNA can be accomplished, for example, using a Qubit Fluorometer from Thermo Fisher Scientific (Waltham, MA).
In various embodiments, the majority of DNA in a biological sample that has been enriched for cell-free DNA (e.g., a plasma sample obtained via a centrifugation protocol) can be cell-free (e.g., greater than 50%, 60%, 70%, 80%, 90%, 95%, or 99% of the DNA can be cell-free).
A methylated nucleic acid is a nucleic acid having a modification in which a hydrogen atom on the pyrimidine ring of a cytosine base is replaced by a methyl group, forming 5-methylcytosine. Methylation can occur at dinucleotides of cytosine and guanine referred to herein as “CpG sites”, which can be a target for enrichment. Methylation of cytosine can occur in cytosines in other sequence contexts, for example, 5′-CHG-3′ and 5′-CHH-3′, where H is adenine, cytosine or thymine. Cytosine methylation can also be in the form of 5-hydroxymethylcytosine. Methylation of DNA can include methylation of non-cytosine nucleotides, such as N6-methyladenine (6mA). Anomalous cfDNA methylation can be identified as hypermethylation or hypomethylation, both of which may be indicative of cancer status. As is well known in the art, DNA methylation anomalies (compared to healthy controls) can cause different effects, which may contribute to cancer.
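For illustration, the cytosine sequence contexts named above (CpG, CHG, and CHH, where H is adenine, cytosine, or thymine) can be assigned by inspecting the one or two bases 3′ of each cytosine. The sketch below is a minimal example; the function name and the 11-base sequence are hypothetical, and the input is assumed to contain only A, C, G, and T on a single strand read 5′ to 3′.

```python
def cytosine_contexts(seq: str) -> dict:
    """Map the index of each cytosine to 'CpG', 'CHG', or 'CHH'.

    Cytosines too close to the 3' end to classify are omitted.
    """
    seq = seq.upper()
    contexts = {}
    for i, base in enumerate(seq):
        if base != "C":
            continue
        downstream = seq[i + 1:i + 3]  # up to two bases 3' of the cytosine
        if len(downstream) >= 1 and downstream[0] == "G":
            contexts[i] = "CpG"
        elif len(downstream) == 2 and downstream[1] == "G":
            contexts[i] = "CHG"   # first downstream base is A, C, or T
        elif len(downstream) == 2:
            contexts[i] = "CHH"   # both downstream bases are A, C, or T
    return contexts


# Hypothetical sequence, not taken from the disclosure.
print(cytosine_contexts("ACGTCAGCTTC"))
# {1: 'CpG', 4: 'CHG', 7: 'CHH'}
```

The final cytosine in the example has no downstream bases and is therefore left unclassified.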
In certain embodiments, the nucleic acid comprises a CpG site (i.e., cytosine and guanine separated by only one phosphate group). In certain embodiments, the nucleic acid comprises a CpG island (also referred to as a “CG island” or “CGI”) or a portion thereof, which is the target for enrichment. Because certain CGIs and certain features of certain CGIs in tumor cells tend to be different from the same CGIs or features of the CGIs in healthy cells, detection of such CGIs can be informative of a health condition. In certain embodiments, the CGI is a “cancer informative CGI”, which is defined and described in more detail below. In certain embodiments, the CpG is an “informative CpG”, e.g., a “cancer informative CGI”. Such CGIs may have methylation patterns in tumor cells that are different from the methylation patterns in healthy cells. Accordingly, detection of a cancer informative CGI can be informative regarding a subject's risk of developing cancer or can be indicative that the subject has cancer. Exemplary cancer informative CGIs, which can be target sequences as described herein, are identified in, e.g., Table 1 of U.S. Patent Publication 2020/0109456A1 and Tables 2 and 3 of WO2022/133315, each of which is hereby incorporated by reference in its entirety. Further exemplary cancer informative CGIs are shown in Tables 1 and 2 included herein.
In certain aspects, the nucleic acids of the invention have been treated to convert one or more unmethylated nucleotides (e.g., cytosines) to another nucleotide (a “converted nucleotide”, as used herein, such as a uracil), for example, prior to amplification. In certain embodiments, one or more unmethylated cytosines are converted to a nucleotide that pairs with adenine (e.g., the unmethylated cytosine may be converted to uracil). In certain embodiments, one or more unmethylated adenines are converted to a base that pairs with cytosine (e.g., the unmethylated adenine may be converted to inosine (I)). In certain embodiments, one or more methylated cytosines (e.g., 5-methylcytosine (5mC)) are converted to thymine, which pairs with adenine. In certain embodiments, methylated cytosines are protected from conversion (e.g., deamination) during the conversion step.
b. Bisulfite Conversion
Bisulfite conversion is performed on DNA by denaturation using high heat, preferential deamination (at an acidic pH) of unmethylated cytosines, which are then converted to uracil by desulfonation (at an alkaline pH). Methylated cytosines remain unchanged on the single-stranded DNA (ssDNA) product.
In some embodiments the methods include treatment of the sample with bisulfite (e.g., sodium bisulfite, potassium bisulfite, ammonium bisulfite, magnesium bisulfite, sodium metabisulfite, potassium metabisulfite, ammonium metabisulfite, magnesium metabisulfite and the like). Unmethylated cytosine is converted to uracil through a three-step process during sodium bisulfite modification. As shown in
c. Enzymatic Conversion
In certain embodiments, enzymatic treatment with a cytidine deaminase enzyme is used to convert cytosine to uracil. Enzymatic conversion can include an oxidation step, in which Tet methylcytosine dioxygenase 2 (TET2) catalyzes the oxidation of 5mC to 5hmC to protect methylated cytosines from conversion by subsequent exposure to a cytidine deaminase. Other protection steps known in the art can be used in addition to or in place of oxidation by TET2. After the oxidation step, the nucleic acid is treated with the cytidine deaminase to convert one or more unmethylated cytosines to uracils. As with bisulfite conversion, when the modified strand is copied, a G will be incorporated in the interrogation position (opposite the C being interrogated) if the C was methylated, and an A will be incorporated in the interrogation position if the C was unmethylated. When the double-stranded extension product is amplified, those Cs that were converted to Us and resulted in incorporation of A in the extended primer will be replaced by Ts. Those Cs that were not modified and resulted in the incorporation of G will remain as C.
In certain embodiments the cytidine deaminase may be APOBEC. In certain embodiments the cytidine deaminase includes activation induced cytidine deaminase (AID) and apolipoprotein B mRNA editing enzymes, catalytic polypeptide-like (APOBEC). In certain embodiments, the APOBEC enzyme is selected from the human APOBEC family consisting of: APOBEC-1 (Apo1), APOBEC-2 (Apo2), AID, APOBEC-3A, -3B, -3C, -3DE, -3F, -3G, -3H and APOBEC-4 (Apo4). In certain embodiments, the APOBEC enzyme is APOBEC-seq.
d. Nitrite Conversion
In certain embodiments, nitrite treatment is used to deaminate adenine and cytosine. As shown in
Disclosed herein is a method for enriching for a target nucleic acid sequence, the method comprising: obtaining a mixture of nucleic acid sequences, the mixture comprising the target nucleic acid sequence and one or more variant sequences of the target nucleic acid sequence; providing an enrichment probe comprising a sequence complementary to the target nucleic acid sequence; providing a plurality of blocker probes comprising a sequence complementary to one or more variant sequences of the target nucleic acid sequence; and selectively enriching for the target nucleic acid sequence using the enrichment probe without enriching for the one or more variant sequences of the target nucleic acid sequence.
In various embodiments, the target nucleic acid sequence comprises a sequence comprising one or more methylated CpG sites, or a sequence derived from the sequence comprising one or more methylated CpG sites. In various embodiments, the target nucleic acid sequence comprises a sequence comprising at least five consecutive methylated CpG sites, or a sequence derived from the sequence comprising at least five consecutive methylated CpG sites. In various embodiments, the one or more variant sequences of the target nucleic acid sequence comprise one or more unmethylated CpG sites. In various embodiments, the one or more variant sequences and the target nucleic acid sequence differ by one or more methylated or unmethylated CpG sites.
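As a non-limiting illustration of how a target sequence and a variant sequence can differ only at CpG sites, the sketch below compares the reads expected after conversion and amplification of a fully methylated template and an unmethylated template of the same region. The 21-base region and the function names are hypothetical, and non-CpG cytosines are assumed to be unmethylated.

```python
def cpg_cytosines(seq: str) -> list:
    """Indices of cytosines that are followed by guanine (CpG sites)."""
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def converted_read(seq: str, methylated) -> str:
    """Read after conversion and amplification: unmethylated C is read as T."""
    methylated = set(methylated)
    return "".join(
        base if base != "C" or i in methylated else "T"
        for i, base in enumerate(seq)
    )


# Hypothetical region with five CpG sites and one non-CpG cytosine.
region = "ATCGGACGTCCGTACGATCGA"
sites = cpg_cytosines(region)

target = converted_read(region, sites)  # all five CpG sites methylated
variant = converted_read(region, [])    # no CpG site methylated

differing = [i for i, (t, v) in enumerate(zip(target, variant)) if t != v]
print(target)     # ATCGGACGTTCGTACGATCGA
print(variant)    # ATTGGATGTTTGTATGATTGA
print(differing)  # [2, 6, 10, 14, 18]
```

In this example the two reads are identical except at the five CpG cytosines, which is the high-homology background that blocker probes complementary to the variant sequence are intended to address.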
In various embodiments, the enrichment probe is at least 50 nucleotide bases, at least 70 nucleotide bases, or at least 100 nucleotide bases in length. In various embodiments, the plurality of blocker probes are between 10 and 20 nucleotide bases in length. In various embodiments, the plurality of blocker probes are between 15 and 18 nucleotide bases in length. In various embodiments, selectively enriching for the target nucleic acid sequence using the enrichment probe without enriching for the one or more variant sequences of the target nucleic acid sequence comprises: hybridizing the enrichment probe to the target nucleic acid sequence; hybridizing one or more blocker probes to the one or more variant sequences of the target nucleic acid sequence; and enriching for the hybridized enrichment probe while the one or more hybridized blocker probes prevents enrichment for the one or more variant sequences of the target nucleic acid sequence. In various embodiments, selectively enriching for the target nucleic acid sequence using the enrichment probe without enriching for the one or more variant sequences of the target nucleic acid sequence comprises: binding a streptavidin bead to a biotin group of the enrichment probe; and washing and removing the plurality of blocker probes and/or the one or more variant sequences of the target nucleic acid sequence.
Practice of embodiments disclosed herein will be more fully understood from the following examples, which are presented herein for illustrative purposes only, and should not be construed as limiting the invention in any way.
Purpose: Test binding of 70 base and 100 base probe to Km10C- and Km10T-gblocks in TMAC and binding to SA-beads at 52° C. in the presence of blockers. Table 1 below shows the experimental conditions that were tested.
Protocol: Use 5 million copies of gblock as target. Use 100 million copies of probe as capture. Denature at 95° C. in TMAC or Tris buffer, slowly cool to 52° C., and incubate overnight. Then incubate with SA-beads at room temperature for 30 minutes while rotating. Collect supernatants on a magnet. Wash beads. Ethanol precipitate the supernatants and resuspend. Measure gblock in the supernatants and on the beads using qPCR Km10 assays.
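The copy numbers in the protocol imply a 20-fold molar excess of probe over target. The sketch below computes that ratio and shows one plausible way to summarize the qPCR readout as a bead-bound fraction. The function names are hypothetical, and the bound-fraction formula (bead copies divided by bead plus supernatant copies) is an assumption made for illustration, not an analysis method stated herein.

```python
def probe_to_target_ratio(probe_copies: float, target_copies: float) -> float:
    """Molar excess of capture probe over gblock target."""
    if target_copies <= 0:
        raise ValueError("target_copies must be positive")
    return probe_copies / target_copies


def bound_fraction(bead_copies: float, supernatant_copies: float) -> float:
    """Assumed summary metric: share of recovered gblock found on the beads."""
    total = bead_copies + supernatant_copies
    if total <= 0:
        raise ValueError("no gblock copies recovered")
    return bead_copies / total


# Copy numbers taken from the protocol: 100 million probe, 5 million gblock.
ratio = probe_to_target_ratio(100_000_000, 5_000_000)
print(ratio)  # 20.0
```

A higher bound fraction for the Km10C-gblock than for the Km10T-gblock under the same condition would be consistent with selective capture, although the experimental values themselves are not reproduced here.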
Reference is now made to
As shown in
Reference is now made to
As shown in
The entire disclosure of each of the patent and scientific documents referred to herein is incorporated by reference for all purposes.
The invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The foregoing embodiments are therefore to be considered in all respects illustrative rather than limiting on the invention described herein. Scope of the invention is thus indicated by the appended claims rather than by the foregoing description, and all changes that come within the meaning and range of equivalency of the claims are intended to be embraced therein.
This application claims the benefit of and priority to U.S. Provisional Patent Application No. 63/618,152 filed Jan. 5, 2024, the entire disclosure of which is hereby incorporated by reference in its entirety for all purposes.
| Number | Date | Country |
|---|---|---|
| 63618152 | Jan 2024 | US |