The present invention relates to stem-loop probes for single nucleotide polymorphism (SNP) genotyping of individual SNP nucleic acid target sequences. The stem-loop probes comprise first, second, and third single stranded nucleic acid portions. The second single stranded nucleic acid portion is located between the first and the third single stranded nucleic acid portions. The first and the third single stranded nucleic acid portions build a double stranded, intramolecular stem. The second single stranded nucleic acid portion forms a single stranded oligonucleotide loop with a nucleotide sequence that is complementary to individual SNP nucleic acid target sequences. The present invention also relates to a method of detecting single nucleotide polymorphism (SNP) in nucleic acid containing samples, utilizing a pair of stem-loop probes for SNP genotyping of two individual SNP nucleic acid target sequences of a sample.
In forensic medicine in particular, the analysis of DNA (deoxyribonucleic acid) is routinely used, e.g. for identifying human individuals or for profiling a group of individuals. Various methods have been established for different purposes; the most frequently used tool involves the analysis of STRs (short tandem repeats). However, in recent years, the interest in the use of SNPs (single nucleotide polymorphisms) has increased. Such SNPs are single-base variations at a unique physical location of the genomic DNA of an individual, and are currently considered to be the most common class of human polymorphism (with an estimated occurrence of 1 in every 1000 bases in the human genome). A number of SNPs are known to be associated with distinct diseases, e.g. when being located in the coding region of a gene. Furthermore, there are other SNPs known to be connected to specific populations, e.g. with US Caucasians or Hispanics, which is particularly useful in forensic medicine (taken from “SNP Typing in Forensic Genetics”, B. Sobrino and A. Carracedo, Methods in Molecular Biology 2005, Vol. 297: 107-126).
The use of SNPs offers several advantages over the analysis of STRs:
Generally, four major assay principles of SNP genotyping are known: allele specific hybridization, primer extension, oligonucleotide ligation, and invasive cleavage. Allele specific hybridization involves the generation of two allele-specific hybridization probes specific for the nucleotide polymorphism found in the analyzed SNP. Only the hybridization of probe and SNP region with a perfect nucleotide match results in stable hybrids, while the hybrid with a one-base mismatch is unstable at the same temperature. Known methods of detecting stable and unstable hybrids are e.g. FRET (Fluorescence resonance energy transfer) and Array hybridization.
Array hybridization for genotyping SNPs in humans is known, for example, from the patent document U.S. Pat. No. 7,361,468 B2. Here, short oligonucleotides including both allele specific polymorphism probes are spotted in a microarray. An advantage of this array hybridization is that many SNPs may be analyzed in parallel. However, the design of the probes when analyzing different SNPs in parallel may raise some problems, as the efficiency of hybridization and the stability of hybrids depend not only on the polymorphic site but also on the SNP flanking sequence. This in turn affects the melting temperature of the resulting hybrids. According to this patent document, the use of a multitude of immobilized probes for each SNP, with each probe differing in the respective sequences of the flanking sites, may solve this problem.
It is an object of the present invention to suggest a collection of stem-loop probes that act simultaneously in a single sample and in the same temperature range for assessing multiple SNP sites in a single sample.
It is a further object of the present invention to suggest a method of detecting multiple SNP sites in a single sample using a collection of stem-loop probes that act simultaneously in said single sample and in the same temperature range.
A first objective is achieved by a stem-loop probe for single nucleotide polymorphism (SNP) genotyping of individual SNP nucleic acid target sequences. The stem-loop probe comprises first, second, and third single stranded nucleic acid portions. The second single stranded nucleic acid portion is located between the first and the third single stranded nucleic acid portions. The first and the third single stranded nucleic acid portions build a double stranded, intramolecular stem and the second single stranded nucleic acid portion forms a single stranded oligonucleotide loop with a nucleotide sequence that is complementary to individual SNP nucleic acid target sequences. The stem-loop probe according to the present invention is characterized in that the nucleotide sequence of the stem-loop probe is chosen such that perfect match probe/target hybrids have a melting point Tm that is at least 5° C. higher than the Tm of mismatched probe/target hybrids.
A second objective is achieved by proposing a method of detecting single nucleotide polymorphism (SNP) in nucleic acid containing samples, utilizing a pair of stem-loop probes for SNP genotyping of two individual SNP nucleic acid target sequences of a sample. The stem-loop probes comprise first, second, and third single stranded nucleic acid portions. The second single stranded nucleic acid portion is located between the first and the third single stranded nucleic acid portions.
The first and the third single stranded nucleic acid portions build a double stranded, intramolecular stem and the second single stranded nucleic acid portion forms a single stranded oligonucleotide loop with a nucleotide sequence that is complementary to one of the individual SNP nucleic acid target sequences. The method of detecting SNP in nucleic acid containing samples according to the present invention is characterized in that a ratio of perfect match probe/target hybrids to mismatched probe/target hybrids is detected at a certain temperature, and in that the nucleotide sequence of the stem-loop probe is chosen such that the perfect match probe/target hybrids have a melting point Tm that is at least 5° C. higher than the Tm of mismatched probe/target hybrids.
Additional features and preferred embodiments of the present invention derive from the dependent claims in each case.
Known techniques (see e.g. U.S. Pat. No. 7,361,468 B2 and U.S. Pat. No. 7,582,421 B2) utilize arrays of capture probes over which a DNA sample is run. In contrast to this prior art, with the present invention the DNA samples are captured and the probes are run over the DNA samples. This provides the particular advantage that there is no need to utilize specially prepared microarrays.
The enclosed drawings are used to explain the present invention and shall not limit its scope. It is shown in:
The present invention relates to the detection of single nucleotide polymorphisms (SNPs) for profiling a group of individuals or for detecting individuals in particular in forensic medicine. The following is an explanation of preferred embodiments of the present invention and shall not limit the gist and scope of this invention.
Design of Stem-Loop Probes:
One of the main ideas and targets of the present invention is the design of a collection of stem-loop probes that can act simultaneously in a single sample and in the same temperature range for assessing multiple SNP sites in a single sample. More particularly, the concept relies on designing each stem-loop probe so carefully that its Tm favors the formation of hybrids with matching templates only, thereby providing significant discrimination between two very similar sequences that may differ in one nucleotide only.
As a first test model, genotyping of SNP 9 in a representative human genomic DNA sample was chosen. SNP 9 is a biallelic Guanine/Cytosine (G/C) single nucleotide polymorphism (SNP) found in human DNA on chromosome 9 (NCBI identifier rs763869). The inventors of the current invention decided to utilize two stem-loop probes to genotype this SNP since such probes are more discriminating than simple linear probes.
The sequence and length of the stem determine how stable the stem-loop structure is relative to the binding strength of the probe with the target (i.e. as a template duplex structure). Longer sequences generally make stronger stems, and a higher number of G-C base pairs generally makes stronger stems, but it is well known that the strength of the duplex is sequence context dependent (not just a function of the number of G-C base pairs). According to such considerations, the inventors decided to build a stem with 5 base pairs (see guideline No. 1, below).
The loop is what hybridizes to the SNP region, so the SNP and immediate flanking regions dictate the loop sequence. There could be longer (more tightly binding, higher Tm) loops or shorter (less tightly binding, lower Tm) loops. There could also be non-sense bases at either end of the loop region to make the stem-loop more or less stable and drive the equilibrium either direction. More specifically, the length should not be so long that there is non-specific binding at the hybridization temperature. The experiment to determine these is to examine the background measurements at the desired hybridization temperature. If too long, the mismatch will be more stable and high background will result. According to such consideration, the inventors decided to build a stem-loop with 15-18 nucleotides (see guideline No. 2, below).
During expected hybridization, the double-stranded stem opens, but does not bind to target DNA. There should be no influence of “flanking sequences” in the target sequence, i.e. the target sequence should be free of significant secondary structure at the hybridization temperature, because if the flanking sequences form a somewhat stable structure in the target without the probe present, this would make probe binding less efficient, requiring a stronger binding probe. Also, the immediately adjacent flanking nucleotides (one away from the ends of the nascent duplex) will influence binding energy of the probe in a sequence dependent way. Lastly, if there are other ways the probe can bind to the flanking region, this will effectively compete with the desired duplex. The combination of length and sequence of the loop together with the length and sequence of the stem determine the melting temperature in the hybridization reaction by affecting the equilibrium between stem-loop structure and probe-template duplex structure.
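The stem and loop considerations above can be illustrated with a short sketch. It is only a minimal illustration under stated assumptions: the third portion is taken to be the exact reverse complement of the first portion so that the two arms pair as an intramolecular stem, the 15-18 nucleotide figure is assumed to refer to the loop portion, and the helper names and the example sequences are hypothetical (they are not taken from the sequence listing).

```python
# Hypothetical sketch; the sequences below are invented for illustration only.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def build_stem_loop(stem_arm: str, loop: str) -> str:
    """Join first portion (stem arm), second portion (loop) and third
    portion (reverse complement of the stem arm) into one oligonucleotide."""
    if len(stem_arm) != 5:
        raise ValueError("this sketch expects a 5 base pair stem")
    if not 15 <= len(loop) <= 18:
        # Assumption: the 15-18 nt range is applied to the loop portion.
        raise ValueError("this sketch expects a loop of 15-18 nucleotides")
    return stem_arm.upper() + loop.upper() + reverse_complement(stem_arm)


# Invented stem arm and 16 nt loop, used only to exercise the helpers.
probe = build_stem_loop("CGCTC", "ATTGACCTGAGTTACA")
print(probe, len(probe))
```

Such a helper only enforces the length guidelines; it says nothing about the melting temperature, which still has to be calculated and verified empirically as described.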
It further has been considered by the inventors to NOT have a G adjacent to the fluorescent label in the probe sequence as the Guanine will quench the fluorescence signal (see guideline No. 5, below and
The two probes, each complementary to one of the two SNP sequences, have been designed using free software available on the IDT website and according to the following guidelines:
Guanine-Cytosine (G-C) (see
According to the above guidelines, together with some empirical determination and some design refinements, the inventors arrived at the stem-loop design according to which the sequence listing attached to this patent application has been prepared. For illustration purposes, one pair of probes is displayed here:
It is thus preferred that, when starting at the 5′ ends:
Accordingly, the preferred full length oligonucleotide comprising the 1st, 2nd, and 3rd nucleic acid portion has the SEQ ID: NO 56 and the preferred full length oligonucleotide comprising the 1st, 4th, and 3rd nucleic acid portion has the SEQ ID: NO 81. A number of 109 preferred full length oligonucleotides with conjugated fluorophors (comprising the full length oligonucleotides with the SEQ ID: NOs 56 and 81 above) are listed in the attached sequence listing.
The chosen fluorophors FAM, and/or Cy5, and/or Q670 are known in the art and exhibit the following characteristics:
These two probes have been found to perfectly match the target DNA of SNP 9 locus, either with or without polymorphism. In order to demonstrate this, the following experiments have been carried out:
Amplification of SNP Sequence from Genomic CEPH DNA:
PCR amplification of SNP 9 from CEPH genomic DNA 6984 (purchased from Coriell Institute, Camden, N.J., USA) was carried out in a total volume of 100 μl prepared by mixing together 63.2 μl water, 20 μl 5× HF buffer (NEW ENGLAND BIOLABS), 5 μl 10 μM primer M9F (5′-AAGTGATGGAGTTAGGAAAAGAACC), 5 μl 10 μM primer M9R (Biotin-5′-AAGACATTAGGTGGATTCATAGCTG), 0.8 μl 25 mM dNTPs, 1.0 μl Phusion DNA polymerase (NEW ENGLAND BIOLABS), and 5 μl of 30 ng/μl CEPH DNA.
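As a quick consistency check of the reaction mix, the listed volumes can be summed and the final concentrations derived from stock concentration times volume divided by total volume. The sketch below uses only the volumes and stock concentrations stated above; the dictionary layout and function name are illustrative assumptions.

```python
# (volume in µl, stock concentration, unit) as listed in the protocol above;
# None means the text gives no stock concentration for that component.
components = {
    "water": (63.2, None, None),
    "HF buffer (5x stock)": (20.0, 5.0, "x"),
    "primer M9F": (5.0, 10.0, "µM"),
    "primer M9R": (5.0, 10.0, "µM"),
    "dNTPs": (0.8, 25.0, "mM"),
    "Phusion DNA polymerase": (1.0, None, None),
    "CEPH DNA": (5.0, 30.0, "ng/µl"),
}

total_volume_ul = sum(volume for volume, _, _ in components.values())


def final_concentration(name: str) -> float:
    """Dilution rule: C_final = C_stock * V_added / V_total."""
    volume, stock, _unit = components[name]
    if stock is None:
        raise ValueError(f"no stock concentration given for {name}")
    return stock * volume / total_volume_ul


print(f"total volume: {total_volume_ul:.1f} µl")
for name, (_volume, stock, unit) in components.items():
    if stock is not None:
        print(f"{name}: {final_concentration(name):.2f} {unit}")
```

Under these figures the reaction contains 150 ng of genomic DNA (5 μl of 30 ng/μl).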
Additions were carried out using pipette tips with an aerosol barrier. Amplification was conducted in a thermocycler programmed for 2 min at 98° C. (hot start), 35 cycles of 30 sec at 98° C. (denaturation step), 1 min at 60° C. (annealing step), and 15 sec at 72° C. (extension step) followed by 1 min at 72° C. and storage at 4° C. A 5 μl aliquot of the PCR reaction was analyzed by electrophoresis in a 2% agarose gel in 1× TAE for purity (single band 95 bp in length) and yield (40 pmoles of product in the 100 μl PCR reaction). One strand of the PCR product was biotinylated at the 5′ end.
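The thermocycler program described above can be written down as a small data structure, which also makes the programmed hold time easy to total. This is a sketch only; the variable names are illustrative, and ramping between temperatures and the final 4° C. storage are deliberately not counted.

```python
# Thermocycler program from the text: (temperature in °C, hold time in seconds).
INITIAL_DENATURATION = (98, 120)          # 2 min hot start
CYCLE = [(98, 30), (60, 60), (72, 15)]    # denaturation, annealing, extension
N_CYCLES = 35
FINAL_EXTENSION = (72, 60)                # 1 min at 72 °C


def total_hold_time_s() -> int:
    """Sum of programmed hold times, excluding ramping and 4 °C storage."""
    per_cycle = sum(seconds for _temp, seconds in CYCLE)
    return INITIAL_DENATURATION[1] + N_CYCLES * per_cycle + FINAL_EXTENSION[1]


print(total_hold_time_s() / 60.0, "min of programmed hold time")
```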
Immobilization of PCR Product on Streptavidin-Coated Magnetic Beads:
A volume of 300 μl of a 10 mg/ml suspension of streptavidin-coated magnetic beads (Dynabeads M-280 Streptavidin; INVITROGEN) was transferred to a 0.6 ml microcentrifuge tube. The beads were then washed 3 times with 300 μl of BW buffer (50 mM Tris-HCl pH 7.5, 0.5 mM EDTA, 1 M NaCl) using a magnetic stand (INVITROGEN) to pellet the beads between washes. After the third wash the beads were resuspended in 150 μl 2× BW buffer and 25 μl aliquots (containing 0.5 mg beads) were dispensed into five 1.5 ml microcentrifuge tubes. To four of the tubes was added 25 μl of 2× BW buffer followed by either 1 μl of 80 μM T39, 1 μl of 80 μM T40, 1 μl of 40 μM T39 + 1 μl of 40 μM T40, or 1 μl water (a no-target control). Additional water was added to each of these tubes to give a final volume of 100 μl.
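The bead and target amounts in this step follow from simple dilution arithmetic, sketched below. Only the figures stated above are used; the variable and function names are illustrative assumptions.

```python
# Bead mass per aliquot, from the volumes and concentration given in the text.
bead_stock_mg_per_ml = 10.0
bead_volume_ul = 300.0
resuspension_volume_ul = 150.0
aliquot_volume_ul = 25.0

total_beads_mg = bead_stock_mg_per_ml * bead_volume_ul / 1000.0
beads_per_aliquot_mg = total_beads_mg * aliquot_volume_ul / resuspension_volume_ul


def pmol(concentration_um: float, volume_ul: float) -> float:
    """µM multiplied by µl gives pmol."""
    return concentration_um * volume_ul


# Synthetic target loaded per calibration tube, in pmol.
target_pmol = {
    "T39 only": {"T39": pmol(80, 1), "T40": 0.0},
    "T40 only": {"T39": 0.0, "T40": pmol(80, 1)},
    "T39 + T40": {"T39": pmol(40, 1), "T40": pmol(40, 1)},
    "no-target control": {"T39": 0.0, "T40": 0.0},
}

print(f"{beads_per_aliquot_mg} mg beads per 25 µl aliquot")
for tube, amounts in target_pmol.items():
    print(tube, amounts, "total:", sum(amounts.values()), "pmol")
```

Each of the three target-bearing calibration tubes therefore receives the same total amount of synthetic target, which keeps the homozygous and heterozygous standards comparable.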
T39 and T40 (shown in order below) are single-stranded biotinylated 60-mer oligonucleotides that replicate the two SNP sequences and are used as standards for calibrating the hybridization readout. To a fifth tube was added 25 μl of bead suspension, 75 μl of 2× BW buffer, and 100 μl of PCR reaction mixture. Binding of biotinylated standards or PCR product+excess biotinylated primer to the streptavidin-coated magnetic beads took place at room temperature for 30 min. After pelleting the beads on a magnetic stand, the supernatants were removed. Beads loaded with synthetic single-stranded targets were rinsed 3 times with 100 μl aliquots of BW buffer and 3 times with 100 μl aliquots of HD buffer. After resuspension in 100 μl of HD buffer (10 mM HEPES pH 7.5, 50 mM NaCl, 1 mM MgCl2), the solutions were transferred to 0.5 ml microcentrifuge tubes. Beads loaded with PCR product were processed separately as described in the next step.
Alkaline Stripping of Non-Biotinylated PCR Strand:
Beads loaded with biotinylated PCR product were washed 3 times with 100 μl aliquots of BW buffer followed by 1 wash with 100 μl of 2× BW buffer. The beads were resuspended in 100 μl of 0.1 M NaOH and incubated 5 min at room temperature. After removal of the supernatant, the beads were washed once with 100 μl of fresh 0.1 M NaOH, 3 times with 100 μl aliquots of BW buffer, and 3 times with 100 μl aliquots of HD buffer. Pelleted beads were finally resuspended in 100 μl of HD buffer and transferred to a 0.5 ml microcentrifuge tube for hybridization.
Hybridization:
A volume of 2 μl of probe cocktail (80 μM P52 and 80 μM P77) was added to each 100 μl suspension of target-bearing beads and incubated at 49° C. for 20 min. The supernatant solution was removed and the beads were incubated in 100 μl of HD buffer at 49° C. for 15 min. Finally, the hybridized probe was recovered by incubating the bead suspension in 100 μl of HD buffer at 65° C. for 15 min. The supernatants were collected and transferred to a microtiter plate for reading of fluorescence.
Signal Readout and Analysis:
Each supernatant, together with a buffer blank and a reference standard (100 μl HD buffer containing 80 pmol (picomoles) each of P52 and P77), was read for fluorescence of FAM (485 nm excitation and 520 nm emission) and Cy5 (633 nm excitation and 666 nm emission). After correcting readings for background fluorescence, the ratio of Cy5 to FAM fluorescence was used to type the CEPH 6984 DNA sample for SNP 9 as shown in
The Method of Detecting SNP in Nucleic Acid Containing Samples:
When, according to the present invention, applying the method of detecting single nucleotide polymorphism (SNP) in nucleic acid containing samples, a pair of stem-loop probes according to the present invention and as already described is utilized for SNP genotyping of two individual SNP nucleic acid target sequences of a sample. In particular, a ratio of perfect match probe/target hybrids to mismatched probe/target hybrids is detected at a certain temperature and the nucleotide sequence of the stem-loop probe is chosen such that the perfect match probe/target hybrids have a melting point Tm that is at least 5° C. higher than the Tm of mismatched probe/target hybrids.
As a possible starting material, a swab head with biological material (ideally of one single subject, e.g. a person, but possibly also a mixture from several people) is taken. The method is however not restricted to blood investigation; any biological material that yields genomic DNA of an individual (human, animal, or plant) can be utilized as starting material. Such genomic DNA is isolated from the biological material and purified. Mitochondrial deoxyribonucleic acid (mtDNA) may also co-purify, but is usually not measured or interrogated.
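The melting point criterion can be expressed as a simple check on a pair of measured or calculated Tm values. The sketch below is illustrative only: the function names are hypothetical and the example temperatures are invented, not measured values from this application.

```python
def tm_gap_c(tm_perfect_match_c: float, tm_mismatch_c: float) -> float:
    """Difference between perfect-match and mismatch hybrid Tm, in °C."""
    return tm_perfect_match_c - tm_mismatch_c


def meets_tm_criterion(tm_perfect_match_c: float, tm_mismatch_c: float,
                       min_gap_c: float = 5.0) -> bool:
    """True if the perfect-match Tm is at least min_gap_c above the mismatch Tm."""
    return tm_gap_c(tm_perfect_match_c, tm_mismatch_c) >= min_gap_c


# Invented example values, for illustration only.
print(meets_tm_criterion(57.0, 50.5))  # gap of 6.5 °C
print(meets_tm_criterion(54.0, 51.0))  # gap of 3.0 °C
```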
When utilizing microtiter plates or the electrowetting techniques and cartridge of the co-pending U.S. patent application Ser. No. 13/188,584 (which by explicit reference is incorporated herein in its entirety), and polymerase chain reaction (PCR) for one single DNA sample, 13 different genomic targets (=13 different SNP loci) can be interrogated with 26 different stem-loop probes and 13 different normalization probes according to the invention.
However, the number of target sequences may have an influence on the PCR reaction: Since input DNA may be limiting in the instance of trace samples, it would be preferable to maximize the use of the DNA and do this as a multiplex. The maximal number would be the number of amplicons that could be produced in an optimized multiplex PCR. Sanchez et al. (“Development of a multiplex PCR assay detecting 52 autosomal SNPs” International Congress Series 2006, 1288: 67-69, which is incorporated herein in its entirety) produced a 52-plex PCR reaction with 104 primers.
To ensure discrimination of certain populations, a minimum number of target sequences preferably is used in order to ensure a specific, desired conclusion as the discrimination power of the assay increases with the number of SNPs. The random match probability (R) is given by the equation:
R = p^n [1],
where p is the individual match probability for a given SNP and n is the number of SNPs. In a simple example, if the population has 50% allele 1 and 50% allele 2 (which is the best case scenario for biallelic SNPs), then the genotypes AA, AB, and BB occur with frequencies 0.25, 0.5, and 0.25 (the possibilities are AA, 2× AB, BB), so that p = 0.25² + 0.5² + 0.25² = 0.375. The discrimination power of the assay is the inverse of the random match probability R, so that with 1 SNP, R=p=0.375 and the assay could generally discriminate an individual in a population of about 3 (which is not so good). With 13 SNPs however, the assay could generally discriminate 1 in 345,000, with 21 SNPs, 1 in 881,745,755, etc. This analysis is described by D. A. Jones (“Blood samples: Probability of Discrimination” J. Forensic Sci. Soc. 1972: 355-359, which is incorporated herein in its entirety). Here, the inventors chose 13 SNPs as a compromise: as few as possible for the engineering of the device as disclosed in the co-pending U.S. patent application Ser. No. 13/188,584, but enough to be meaningful. The SNPs of the present patent application are chosen from the paper of Sanchez et al. (see above). Other applications would probably use different SNPs.
With respect to an influence of the type of SNPs used for the analysis, the SNPs preferably are chosen such that they are conservative, non-coding, and, at best, as close as possible to 50% for each allele in the populations being screened. Of course there could be other uses for stem-loop probes according to the invention, such as eye color determination of an unknown individual, for which SNPs shown to determine eye color would be used. However, following the general population rules is preferred.
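The figures quoted for equation [1] can be reproduced with a few lines of code. The sketch assumes, as the example in the text implies, that genotype frequencies follow from the allele frequencies as q², 2q(1−q), and (1−q)² (Hardy-Weinberg proportions) and that the SNPs are independent; the function names are illustrative.

```python
def genotype_match_probability(freq_allele_1: float) -> float:
    """Probability that two random individuals share a genotype at one
    biallelic SNP, assuming Hardy-Weinberg genotype proportions."""
    q = freq_allele_1
    genotype_freqs = (q * q, 2 * q * (1 - q), (1 - q) * (1 - q))
    return sum(f * f for f in genotype_freqs)


def random_match_probability(p: float, n: int) -> float:
    """Equation [1]: R = p^n for n independent SNPs with equal p."""
    return p ** n


def discrimination_power(p: float, n: int) -> float:
    """Inverse of the random match probability."""
    return 1.0 / random_match_probability(p, n)


p = genotype_match_probability(0.5)
for n in (1, 13, 21):
    print(n, "SNPs: 1 in", round(discrimination_power(p, n)))
```

With p = 0.375 this gives roughly 1 in 3 for a single SNP, about 1 in 345,000 for 13 SNPs, and 1 in 881,745,755 for 21 SNPs, in line with the values stated above.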
It is preferred to use a 2 step PCR thermal cycling protocol to reduce the time of the assay. The temperatures are determined empirically with the particular amplicons used. More specifically, a PCR protocol was selected that gives robust and specific amplifications of all the PCR products that are to be interrogated in subsequent steps. The steps for optimizing this protocol are the same as they are for forming any multiplex PCR assay, an important factor being similar melting temperatures for all the PCR primers in the reaction so that they all work at a single annealing temperature. This is accomplished by adjusting the length and position of the PCR primers so that their calculated and demonstrated melting temperatures are similar. This can also include changing the position of the PCR primers if it is shown that they bind non-specifically (i.e. anneal at multiple places in the genome). There are many software programs that can facilitate this task. The empirical optimization involves conducting PCR at various temperature regimens to show that PCR is specific and efficient around the chosen profile. The use of conventional 3-temperature PCR would also be possible.
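A first-pass comparison of primer melting temperatures can be sketched as follows. The Wallace rule used here (2° C. per A/T plus 4° C. per G/C) is a rough rule of thumb meant for short oligonucleotides and is swapped in purely for brevity; it is not the software the inventors refer to, and nearest-neighbor calculations followed by empirical testing would be needed in practice. The sequences are the SNP 9 primers M9F and M9R described above, written as continuous sequences; the function names are illustrative.

```python
def wallace_tm(seq: str) -> int:
    """Wallace rule estimate: 2 °C per A/T plus 4 °C per G/C (approximation)."""
    s = seq.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2 * at + 4 * gc


def tm_spread(primers: dict) -> int:
    """Difference between highest and lowest estimated Tm in a primer set."""
    values = [wallace_tm(seq) for seq in primers.values()]
    return max(values) - min(values)


primers = {
    "M9F": "AAGTGATGGAGTTAGGAAAAGAACC",
    "M9R": "AAGACATTAGGTGGATTCATAGCTG",
}

for name, seq in primers.items():
    print(name, len(seq), "nt, estimated Tm:", wallace_tm(seq))
print("spread:", tm_spread(primers))
```

A small spread in such estimates is only a starting point; the absolute values from this rule are not reliable for 25-mers and should not be read as annealing temperatures.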
Minimum size of an amplicon would be the total of the length of the 2 PCR primers, enough sequence to allow for the normalization probe (say 20 bp) and the hybridization region of the SNP probes (say another 20). The inventors strived to keep the PCR amplicon size to a minimum so that the complexity of the hybridization target (i.e. the PCR amplicon) is as low as possible: longer targets afford more opportunities for unwanted partial hybridization or possible secondary structure that would lower signal and increase noise.
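The minimum amplicon size described above is a simple sum, sketched here with the approximate 20-base regions named in the text as defaults. The function name is illustrative; the 25-nucleotide primer lengths are those of the SNP 9 primers M9F and M9R.

```python
def minimum_amplicon_length(forward_primer_nt: int, reverse_primer_nt: int,
                            normalization_region_nt: int = 20,
                            snp_probe_region_nt: int = 20) -> int:
    """Both primers plus room for the normalization probe and the SNP probes."""
    return (forward_primer_nt + reverse_primer_nt
            + normalization_region_nt + snp_probe_region_nt)


# Two 25 nt primers, as for the SNP 9 amplicon.
print(minimum_amplicon_length(25, 25))
```

For two 25-nucleotide primers this estimate comes to 90 bases, close to the 95 bp product reported for SNP 9.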
Whether the sense primer or the antisense primer is Biotin linked in order to isolate one single strand of the two alleles while removing the other, complementary strand is completely arbitrary and is influenced by which strand of the product (sense or antisense) performs better as a SNP assay template. One of the strands may exhibit inhibitory secondary structure that the other does not. The appropriate selection usually is determined empirically. More specifically, one of the two strands may have secondary structure when rendered single stranded which would compete with probe binding. This is seen as low hybridization efficiency and low signal. Further, complementary probes may bind non-specifically to one of the two strands, which is seen as high background in a “mismatch” primer hybridization experiment. Computer programs help to predict these, but they are proven empirically.
After unbound probe material is washed away, what is detected is the fluorescence (proportional to the amount) of each of the e.g. three probes labeled with 3 different fluorophors. A normalization probe should bind stoichiometrically with the target such that the amount of the normalization probe indicates the amount of the target. The allele specific probes are in competition for the same binding site surrounding the SNP, with the match probe being favored over the mismatch probe.
Exemplary Hybridization Experiments:
In a first series of exemplary hybridization experiments (see
It is important to note here that there are two necessary discrimination measures per SNP, i.e. the results of four experiments and two probes.
In a second series of exemplary hybridization experiments (see
It is important to note here that there are two necessary specificity measures per SNP, i.e. the results of four experiments and two targets.
The specificity was found to be greater than 3.0 or smaller than 0.3 at 47° C., at 49° C., and at 51° C., wherein specificity is defined here as the signal of perfect match hybrid divided by the signal of mismatched hybrid for the same target.
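The specificity as defined above is a plain signal ratio, and the stated thresholds can be checked as sketched below. The function names are illustrative and the signal values are invented for demonstration; they are not measured data.

```python
def specificity(perfect_match_signal: float, mismatch_signal: float) -> float:
    """Signal of perfect match hybrid divided by signal of mismatched hybrid
    for the same target (background-corrected signals assumed)."""
    return perfect_match_signal / mismatch_signal


def meets_specificity_criterion(value: float, upper: float = 3.0,
                                lower: float = 0.3) -> bool:
    """True if the ratio is greater than 3.0 or smaller than 0.3."""
    return value > upper or value < lower


# Invented background-corrected signals, for illustration only.
s = specificity(900.0, 150.0)
print(s, meets_specificity_criterion(s))
```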
In a third series of exemplary hybridization experiments (see
The resolving power was generally found to be greater than or equal to 2.5 for paired Cy5/FAM probes. For many paired Cy5/FAM probes, the resolving power was even greater than or equal to 3.5. It is important to note here that the relative signal between FAM and Cy5 is arbitrary (because the gain on each channel is arbitrary), but as the resolving power is a double ratio, this arbitrariness is removed. It is important to note that (as shown) there are four Resolving Power metrics per SNP.
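Why a double ratio removes the arbitrary channel gains can be shown numerically. The sketch below assumes one possible form of such a double ratio, namely the Cy5/FAM ratio measured on one target divided by the Cy5/FAM ratio measured on another target; this form, the function names, and the signal values are assumptions made for illustration and are not taken from the definition used in the experiments.

```python
import math


def double_ratio(cy5_a: float, fam_a: float, cy5_b: float, fam_b: float) -> float:
    """(Cy5/FAM on target A) divided by (Cy5/FAM on target B)."""
    return (cy5_a / fam_a) / (cy5_b / fam_b)


def apply_gains(signals, cy5_gain: float, fam_gain: float):
    """Scale every Cy5 reading by one gain and every FAM reading by another."""
    cy5_a, fam_a, cy5_b, fam_b = signals
    return (cy5_a * cy5_gain, fam_a * fam_gain,
            cy5_b * cy5_gain, fam_b * fam_gain)


# Invented signals (Cy5 on A, FAM on A, Cy5 on B, FAM on B).
true_signals = (8.0, 2.0, 1.0, 4.0)
raw = double_ratio(*true_signals)
rescaled = double_ratio(*apply_gains(true_signals, cy5_gain=3.7, fam_gain=0.45))
print(raw, rescaled, math.isclose(raw, rescaled))
```

The Cy5 gain appears in both numerator and denominator, as does the FAM gain, so both cancel regardless of their values.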
Probe Design:
The
The
The following Table 1 presents a selection of stem-loop probes and the respectively measured melting temperatures. These melting temperatures refer in each case to a 0.4 μM probe in HEPES buffer with 50 mM NaCl and 1 mM MgCl2. In the Table 1, the SEQ ID: NOs of the stem-loop probes, the probe identity numbers (probe ID), the probe sequences with the attached fluorophors, and the respectively measured melting temperatures (Tm) are indicated. As a key for reading the probe ID, a second number designates the SNP, a second letter indicates whether the probe has a forward or reverse sequence, and a third letter indicates the identity of the SNP (underlined in probe sequence).
Genotyping Assay:
The
First, the probe sequences from
Then, the probe sequences from
It is thus demonstrated that in an assay performed using 3′-FAM labeled stem-loop probes at 47° C., hybridization with stem-loop probes can successfully genotype Amelogenin.
The
As expected, the signal with the match arrangements is much higher than the signal with the mismatch arrangements (more probe binds to the match target).
Below the graph of signals is listed the temperature at which each SNP experiment was carried out. Here, each probe of the set (match and mismatch) was designed to perform at the same temperature as discussed above.
The desired outcome as displayed in this data is both very high signal with the match configurations (reaching a maximum at 1:1 binding of match probe to target) and very low signal in the mismatch configurations (reaching a minimum at zero mismatch probe binding to target) such that the ratio of match signal to mismatch signal is as high as possible.