The present invention relates to a method of analyzing short tandem repeats according to the preamble of the independent claim or claims. A brief overview of the current state of the technology, commercial kits, DNA separations, recovery of information from degraded DNA, and future perspectives for short tandem repeats (STRs), which are sometimes also referred to as microsatellites or simple sequence repeats (SSRs), is given by John M. Butler in the Mini-Review “Short tandem repeat typing technologies used in human identity testing” (BioTechniques 2007, Suppl. to Vol. 43, No. 4).
The analysis of short tandem repeats (STRs) of individual human genomes is routinely used, e.g. in human identity testing, and in testing of other organisms like plants and cells. Such short tandem repeats are simple sequence motifs of a few up to several dozen repeat units. The human genome comprises thousands of such STRs, which are typically located in non-coding regions. As STRs are polymorphic with respect to their number of repeat units, human individuals may be distinguished from each other by the number of repeat units per allele and per STR locus. Therefore, the analysis of STRs has turned out to be particularly useful in the identification of human individuals, e.g. in forensic medicine or parentage testing.
Typically, the analysis of STRs involves as a first step the isolation of genomic DNA of human individuals, followed by a Polymerase Chain Reaction (PCR) amplification step. Here, specific, selected STR loci are amplified, and multiplexing (the amplification of multiple STR loci simultaneously) has become routine in biological laboratories. Multiplexing allows a high discrimination rate to be achieved in a single test, because several STRs are assessed in parallel while only minor amounts of DNA have to be employed. This in turn is of particular relevance in forensic medicine, where often only minor amounts of DNA are available (and much of the DNA may be degraded).
Depending on the aim of the analysis (e.g. human identification, parentage testing, population analysis), different STRs may be used. In particular when DNA profiles should be compared among different laboratories, standardization of STR analysis is an important aspect. For example, there are at least seven well established Interpol STR loci that are used for STR analysis in European forensic laboratories (see Gill et al., “The evolution of DNA databases—Recommendations for new European STR loci”, Forensic Science International, 156 (2006), 242-244). This standardization of analyzed STR loci allows a direct comparison of DNA profiles throughout the different laboratories involved.
After the amplification step of selected genomic fragments, the length of each amplified STR is determined. Fragment length determination is widely done using e.g. capillary electrophoresis. Here, the amplified DNA products are separated by electrophoresis and detected by comparison to a standardized allelic ladder. Advantages of DNA length determination using capillary electrophoresis include highly precise sizing (e.g. to less than 1 nucleotide) and multiplexing by size, i.e. making some amplicons bigger than others while labeling all amplicons with the same fluorescent label, which increases throughput. Capillary electrophoresis also has the advantage that mixtures are much more easily interpreted, since intensity and size data are both available to the analyst. Further advantages are that commonly used instruments with a significant installed base in genomic laboratories can be utilized and that STR analysis is more easily automated than with the prior generation of slab gel electrophoresis. However, capillary electrophoresis requires the use of large instruments (e.g. the ABI 3730 Genetic Analyzer, Applied Biosystems). This increases the costs incurred and the complexity of the application. Additionally, the relatively long sample run times reduce the sample throughput and thus can result in backlogs in the respective laboratory. Since capillary electrophoresis does not directly interrogate nucleotide sequences, micro-heterogeneity of the STR due to sequence substitutions is not detected if the STR is of the same length; thus, important information can be lost. Moreover, capillary electrophoresis instrumentation is delicate, expensive, and sensitive to dust and movement. The detection window (the signal range between noise at the low end and saturation at the high end) is relatively narrow, necessitating expensive, time-consuming, and cumbersome quantitative PCR quantification of DNA and normalization to get into the “sweet spot”.
Other approaches include the use of hybridization techniques for STR fragment length determination. For example, in the document WO 96/36731, the number of repeats is determined by hybridizing a target DNA with a unique set of complementary probes containing tandem repeats of known length. If a probe containing more repeats than the target DNA hybridizes, a loop structure is formed, whereas upon hybridization of a probe with the identical number of repeats, no loop structure is formed. The length is then identified using the different fluorescent labels of the various probes without using electrophoretic separation. This is a multistep process involving digestion with a nuclease specific to S-S bonds and labeling with a DNA polymerase. This method requires synthesis of a solid-supported oligonucleotide array, and therefore cannot be done in solution.
In the document U.S. Pat. No. 6,395,493 B1 a method for determination of length polymorphism in DNA is disclosed, which also involves a hybridization reaction. This document describes an assay that involves the use of a silicon microchip composed of an arrayed set of electrodes that each contain a unique “capture probe” for each possible allele of each STR locus of interest. For example, in order to determine which of the possible eight alleles at the TPOX locus (e.g., 6-13 repeats) are present, eight different probe sites are required. The DNA sample of interest is amplified and then washed over the chip. It will hybridize to the electrodes with complementary capture probes. The “capture probe” captures the PCR-amplified STR allele by binding to the repeat region and 30-40 bases of the flanking region. After hybridization, an “electronic stringency” is then applied to each probe site by simply adjusting the electric field strength. Samples that are not a perfect match for the probe will be denatured and driven away from the probe.
After removing unbound and denatured DNA, a mixture of “reporter probes” is washed over the chips. The “reporter probe” contains 1-3 repeat units, some flanking sequence and a fluorescent dye. This probe will hybridize to the STR allele of DNA captured on the chip and generates a fluorescence signal at the probe site that can be interpreted to yield the sample's genotype. The read-out provides a genotype that corresponds to the number of repeats present in the sample even though no size-based separation has been performed.
In this method, an array of capture probes must be “printed” on the surface of the reaction vessel and the DNA is subsequently washed over this array. The Tecan method binds the DNA to any surface and then washes the probes over the DNA. In consequence, this Nanogen method requires pre-printing or purchase of a special pre-printed array. With this Nanogen assay, the intensity of the read-out signal is limited by the number of capture oligonucleotides printed on each electrode. Further amplification of the DNA sample cannot increase signal beyond the number of capture electrodes. This Nanogen method requires special instrumentation to denature mismatched hybrids prior to washing. The Tecan method requires no such special instrumentation. The Nanogen method can be performed in a single reaction vessel (i.e. microarray slide).
It is an object of the present invention to suggest a method of determining the number of tandem repeats in a nucleic acid probe.
According to a first aspect, this object is achieved by a method of detecting the number of repeat units in a selected short tandem repeat (STR) in a genomic sample according to the present invention. The method as herein disclosed comprises the steps of:
According to a second aspect, this object is achieved by proposing kits for carrying out partial genotyping by differential hybridization as herein disclosed.
Additional features of the present invention and preferred embodiments are herein disclosed as well.
Advantages of the method according to the present invention comprise:
With the help of the attached drawings, the preferred embodiments of the method and kits of the present invention are illustrated without narrowing the scope of the present invention. It is shown in:
The present invention relates to the detection of the number of repeats in selected STR loci. According to the present invention, the number of repeat units is correlated to the signal intensity of parallel hybridization experiments. By comparison to a normalization probe, the exact number of repeats can be easily determined.
Selection of STR Loci:
For human identification it is proposed to use the STR loci that are generally accepted by the respective law enforcement agency. The two major sets are the 13 FBI (US Federal Bureau of Investigation) CODIS loci and the 10 FSS (United Kingdom Forensic Science Service) SGM and SGM plus loci. Non-human DNA testing and microbial forensics are described by John M. Butler in “Forensic DNA Typing, Biology, Technology, and Genetics of STR Markers” (Elsevier Academic Press, Second Edition 2005; see chapter 11, pages 299-330). There, cat and dog STRs are described (and the sources referenced) as well as plant STRs (e.g. Cannabis sativa) and it is pointed out that “as with human STRs, marijuana STR markers are highly polymorphic, specific to unique sites in the genome, and capable of deciphering mixtures. A hexanucleotide repeat marker showed repeat units ranging from 3-40 in 108 tested marijuana samples, and primers amplifying this locus produced no cross-reactive amplicons from other 20 species of plants tested (Hsieh et al 2003)”. From microbial forensics, first steps are reported in connection with bioterrorism, including genome sequencing of Bacillus anthracis (anthrax) and phylogenetic analyses of viral strains of HIV.
Among the various types of STR systems, tetranucleotide repeats (4 nucleotides in the core repeat unit) have become more popular than di- or trinucleotide repeats (2 or 3 nucleotides in the repeat unit). Penta- and hexanucleotide repeats (5 or 6 nucleotides in the repeat unit) are less common in the human genome but are being examined by some laboratories (see Butler 2005, page 89, 3rd paragraph).
PCR:
A PCR amplification step is necessary when working with STR systems because genomic DNA would be too complex for hybridization assays. Multiplex PCR, in which a defined number or combination of STR loci is amplified simultaneously, is possible. There is no fixed maximum or minimum number of STRs; whatever is empirically feasible is preferred. A large number of STR markers have been characterized by academic and commercial laboratories for use in disease and gene location studies. For example, the Marshfield Medical Research Foundation in Marshfield, Wis. (http://research.marshfieldclinic.org/genetics) has gathered genotype data on over 8000 STRs that are scattered across the 23 pairs of human chromosomes (see Butler 2005, page 86). Many commercial kits exist, e.g. from Applied Biosystems, Promega, and Qiagen, to accomplish the appropriate multiplex.
Used Oligonucleotides:
In a given experiment, one of the repeat probes is used with the normalization probe, and with either one or more of the blockers or one or more of the flankers. When different oligonucleotides are used as a mixture and given together to each experimental sample according to the present invention, the preferred oligonucleotides are chosen such that the melting temperature (Tm) is sufficiently high for all of them to bind. Accordingly, the temperature of the experiment should be lower than Tm (for a detailed description, see below).
Strategy of Partial Genotyping of the Human CSF1PO STR by Differential Hybridization:
CSF1PO is a short tandem repeat (STR) composed of 5-16 consecutive runs of the 5′-ATCT-3′-TAGA tetramer located at a unique position in human chromosome 5.
To partially differentiate between the repeat numbers, a novel hybridization approach was applied. In this approach, a FAM-labeled 16-mer probe (5′-AGAT)4 was hybridized to a series of chemically synthesized single-stranded DNAs which contained 2-16 ATCT repeats. Each of these 5′-ATCT-3′ repeats was embedded in the middle of a longer unrelated sequence which was biotinylated at the 5′-end (see
In
In general, any discernible labeling that allows discrimination of the two probes can be applied to the STR probe P1 and to the reference probe P2. These can be fluorescent dyes as already indicated; however, donor-acceptor fluorescent pairs, e.g. FAM-3-TAM, FAM-3-ROX, or FAM-4-ROX as disclosed in U.S. Pat. No. 5,654,419, can also be used (the rhodamine derivatives TAM and ROX are dyes of Applied Biosystems Inc.). Even if some alternatives to fluorescence labeling may exist, fluorescence is preferred because it provides multiple colors and is fast and sensitive. However, any sort of measurable label that can be attached to primers and that can be multiplexed could be used, e.g. radioactive, luminescent, or chromogenic labels, tagged beads, etc.
Preferably, the kit also comprises at least one blocking oligonucleotide B1,B2 for hybridizing with a fragment of a target STR and with a fragment of the relevant single-stranded target DNA. Preferably and as depicted in
Preferably, the kit also comprises at least one flanking oligonucleotide F1,F2 for hybridizing with a fragment of the relevant single-stranded target DNA adjacent to the target STRs. Preferably and as depicted in
Importantly, the flanking oligonucleotides F1,F2 hybridize immediately adjacent to the CSF1PO repeat while each of the blocking oligonucleotides B1,B2 hybridizes to 12 nucleotides (3 repeats) of the CSF1PO sequence and to 13 nucleotides of the 5′- or 3′-flanking target sequences. It is also important to note that on the same 5′- or 3′-flanking region of the target DNA, only a blocking oligonucleotide B1,B2 or a flanking oligonucleotide F1,F2 can hybridize; thus, either two blocking oligonucleotides, i.e. B1 & B2, two flanking oligonucleotides, i.e. F1 & F2, or one blocking oligonucleotide plus one flanking oligonucleotide, i.e. B1+F2 or F1+B2 are to be used.
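The pairing rule stated above can be enumerated in a short sketch. The names `FIVE_PRIME`, `THREE_PRIME`, `valid_combinations`, and `blocked_repeats` are illustrative assumptions and not part of the disclosure; the sketch only encodes the statements that each flanking region accepts either a blocking or a flanking oligonucleotide and that each blocking oligonucleotide covers 3 repeats.

```python
from itertools import product

# Oligonucleotides that can occupy each flanking region of the target DNA.
# Labels follow the text: B = blocking oligo, F = flanking oligo.
FIVE_PRIME = ("B1", "F1")
THREE_PRIME = ("B2", "F2")


def valid_combinations():
    """Return every pairing with exactly one oligo per flanking region."""
    return list(product(FIVE_PRIME, THREE_PRIME))


def blocked_repeats(combo, repeats_per_blocker=3):
    """Count the repeat units covered by the blockers in a combination."""
    return repeats_per_blocker * sum(name.startswith("B") for name in combo)


for combo in valid_combinations():
    print(combo, "blocks", blocked_repeats(combo), "repeats")
```

The four pairings produced correspond to B1 & B2, B1+F2, F1+B2, and F1 & F2 as listed in the text.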
The following kits are preferred:
Depending on the actual number of tetranucleotide repeats present in the target DNA and on the extent of hybridization of the 16-mer CSF1PO-specific STR probe P1 in the presence of the various combinations of blocking oligonucleotides B1,B2 and flanking oligonucleotides F1,F2, the target DNAs can theoretically be divided into 10 different groups A-J, thus partially genotyping this tetranucleotide repeat (see Table 1).
In Table 1, one repeat corresponds to four base pairs (bp); a 16-mer probe P1 hybridizes to 4 repeats; each blocking oligonucleotide B1,B2 hybridizes to 3 repeats (see also
The attached
According to
According to
According to
According to
According to
According to
According to
According to
According to
Having considered the theoretical expectations, the results of some practical experiments are now inspected. The diagrams of
Initially, a series of single-stranded DNAs which contained 2-16 ATCT repeats was chemically synthesized. Each of these 5′-ATCT-3′ repeats was embedded in the middle of a longer unrelated sequence which was biotinylated at the 5′-end (see
Prior to hybridization, each of the single-stranded target DNAs as depicted in
In addition to a 16-mer STR probe P1 (as indicated in
The reference oligomers P2, flanking oligomers F1,F2, and blocking oligomers B1,B2 were 25-mers complementary to the regions of the target DNA indicated in
For hybridization, 160 pmol of the Cy5-labeled reference probe P2 together with 160 pmol each of the desired combination of blocking oligonucleotides B1,B2 and flanking oligonucleotides F1,F2 were added to 0.5 mg of streptavidin-coated magnetic beads previously loaded with 80 pmol of a specific single-stranded target DNA. Hybridization was conducted for 5 min at 65° C. followed by 15 min at 37° C. in 100 μl of Buffer A (10 mM Hepes pH 8.0, 50 mM NaCl, 10 mM MgCl2). Next, 160 pmol of FAM-labeled STR probe P1 was added to the bead suspension. Hybridization of this STR probe P1 to the target DNA was conducted for 15 min at 47° C. After removal of the hybridization solution, the beads were incubated for 15 min at 47° C. in 100 μl of fresh Buffer A. Finally, bound probes were eluted from the washed beads by incubation for 10 min at 65° C. in Buffer B (10 mM Hepes pH 8.0, mM NaCl). Supernatants containing eluted probes were collected and transferred to a microtiter plate for reading of FAM and Cy5 fluorescence in a TECAN INFINITE® 200 microplate reader (Tecan Austria GmbH, Groedig, Austria).
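The two fluorescence readings per well lend themselves to a simple normalization, sketched below. It is assumed here that the FAM signal of the STR probe P1 is divided by the Cy5 signal of the reference probe P2 from the same well, so that differences in bead loading or elution cancel out. The function name, the optional blank subtraction, and the example readings are hypothetical and are not taken from the experiments described.

```python
def normalized_signal(fam, cy5, fam_blank=0.0, cy5_blank=0.0):
    """Return the blank-corrected FAM signal relative to the Cy5 reference."""
    reference = cy5 - cy5_blank
    if reference <= 0:
        raise ValueError("reference (Cy5) signal must exceed its blank")
    return (fam - fam_blank) / reference


# Hypothetical readings (arbitrary fluorescence units) for two wells.
wells = {"no blocker": (8200.0, 4100.0), "two blockers": (4000.0, 4000.0)}
for name, (fam, cy5) in wells.items():
    print(name, round(normalized_signal(fam, cy5), 2))
```

Because every well receives the same amount of reference probe P2, a ratio of this kind would allow wells with different oligonucleotide combinations to be compared directly.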
On the horizontal axis of the diagrams in
It was thus expected that the target DNAs of the group C will show one hybridization when utilizing the kits c3), e), or d) and no hybridization when utilizing the kit b3). The height of the vertical-bar graphs (relative fluorescence signals) representing the use of the kits c3), e), or d) in the group C is strikingly higher than the height of the vertical-bar graph representing the use of the kit b3). It was also expected that the target DNAs of the group D will show two hybridizations when utilizing the kit c3), one hybridization when utilizing the kits e) or d), and no hybridization when using the kit b3). The height of the vertical-bar graph (relative fluorescence signal) representing the use of the kit c3) in the group D is about double the height of the vertical-bar graphs representing the use of the kits e) or d); the height of the vertical-bar graph (relative fluorescence signal) representing the use of the kit b3) is considerably lower than the height of the vertical-bar graphs representing the use of the kits e) or d). Although there is some noticeable hybridization in group D when utilizing the kit b3), very little signal is obtained compared with the results of the other kits. In consequence, the results expected for the groups C and D are regarded as clearly verified.
It was also expected that the target DNAs of the group E will show two hybridizations when utilizing the kit c3) and one hybridization when utilizing the kits e), d), or b3). The height of the vertical-bar graph (relative fluorescence signal) representing the use of the kit c3) in the group E is about double the height of the vertical-bar graphs representing the use of the kits e), d), or b3). In consequence, the results expected for the group E are regarded as clearly verified.
It was further expected that the target DNAs of the group F will show two hybridizations when utilizing the kits c3), e), or d) and one hybridization when utilizing the kit b3). The height of the vertical-bar graphs (relative fluorescence signals) representing the use of the kits c3), e), or d) in the group F is about double the height of the vertical-bar graph representing the use of the kit b3). In consequence, the results expected for the group F are regarded as clearly verified.
It was expected that the target DNAs of the group G will show three hybridizations when utilizing the kit c3), two hybridizations when utilizing the kits e) or d), and one single hybridization when utilizing the kit b3). The height of the vertical-bar graphs (relative fluorescence signals) representing the use of the kits e) or d) in the group G in each case is about equal and double the height of the vertical-bar graphs representing the use of the kit b3). The height of the vertical-bar graph (relative fluorescence signal) representing the use of the kit c3) in the group G in each case is considerably higher than the height of the vertical-bar graphs representing the use of the kits e) or d) and about triple the height of the vertical-bar graphs representing the use of the kit b3). In consequence, the results expected for the group G are regarded as verified.
It was expected that the target DNAs of the group H will show three hybridizations when utilizing the kit c3) and two hybridizations when utilizing the kits e), d), or b3). The height of the vertical-bar graphs (relative fluorescence signals) representing the use of the kits e), d), or b3) in the group H is about equal. The height of the vertical-bar graph (relative fluorescence signal) representing the use of the kit c3) in the group H is considerably higher than the height of the vertical-bar graphs representing the use of the kits e), d) or b3). In consequence, the results expected for the group H are regarded as verified.
It was expected that the target DNAs of the group I will show three hybridizations when utilizing the kits c3), e), or d) and two hybridizations when utilizing the kit b3). The height of the vertical-bar graphs (relative fluorescence signals) representing the use of the kits c3), e), or d) in the group I is about equal. The height of the vertical-bar graph (relative fluorescence signal) representing the use of the kit b3) in the group I is considerably lower than the height of the vertical-bar graphs representing the use of the kits c3), e), or d). In consequence, the results expected for the group I are regarded as verified.
It was expected that the target DNAs of the group J will show four hybridizations when utilizing the kit c3) (compare with
The above analysis has been discussed on the base of the
The preferred STR probe oligonucleotide P1 as utilized:
The preferred reference oligonucleotide P2 as utilized:
The preferred blocking oligonucleotide B1,B2 as utilized:
The preferred flanking oligonucleotides F1,F2 as utilized:
When carrying out the method of the present invention, the number of repeat unit sequences of the target DNA available for the STR probe oligo P1 is reduced by the addition of blocking oligos B1,B2. When the number of repeat units in the blocking oligo is known, the number of repeat units which are no longer available after binding of the blocking oligo is known too. In each case, one blocking oligo B1,B2 or one flanking oligo F1,F2 may be used for the 3′-end and for the 5′-end of the STR fragment of the target DNA. Thus, the reduction of available repeat units on the target DNA results in a reduction of fluorescence intensity compared to an experiment carried out only with the STR probe oligo but without blocking oligo, and the measured difference of fluorescence intensity is used to determine the number of repeats in the STR (comparison of the maximum intensity obtained with the probe oligo alone vs. the reduced intensity obtained with the probe oligo plus blocking oligo).
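The relationship between repeat number, blocking oligos, and expected probe binding can be sketched as follows. This is a simplified model and not the disclosed procedure itself: it assumes that the 16-mer P1 occupies 4 whole repeats, that each blocking oligo removes 3 repeats from the accessible tract, and that P1 copies bind side by side without overlap, so the expected number of bound probes is a floor division. The function names are hypothetical.

```python
from collections import defaultdict

PROBE_REPEATS = 4    # the 16-mer P1 covers 4 tetranucleotide repeats
BLOCKER_REPEATS = 3  # each blocking oligo covers 3 repeats


def expected_probe_sites(total_repeats, n_blockers):
    """Whole P1 binding sites left after blockers occupy their repeats."""
    free = max(total_repeats - BLOCKER_REPEATS * n_blockers, 0)
    return free // PROBE_REPEATS


def signature(total_repeats):
    """Expected P1 sites with 0, 1, and 2 blocking oligos present."""
    return tuple(expected_probe_sites(total_repeats, b) for b in (0, 1, 2))


# Group the model targets (2-16 repeats) by their expected signature.
groups = defaultdict(list)
for n in range(2, 17):
    groups[signature(n)].append(n)

for sig, repeat_numbers in groups.items():
    print(sig, repeat_numbers)
```

Under these assumptions the targets with 2-16 repeats fall into ten distinct signatures. Repeat numbers that share a signature (for example 8 and 9) cannot be told apart by this model, which is consistent with the genotyping being described as partial.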
Insert Probes P3:
An insert probe P3 (see
The insert probe P3:
In general, the oligos (blockers and probes) are designed so that each possible STR can be uniquely identified with a minimum number of probes. For carrying out the experiments discussed above, a number of oligonucleotides were chosen to model the CSF1PO test system (5′-AGAT-3′/5′-ATCT-3′ repeat flanked by artificial sequences). These oligonucleotides are described in the sequence listing attached to this patent application. This sequence listing comprises:
SEQ ID: NO 1, a reference target strand with 5 AGAT repeats (not synthesized);
SEQ ID: NO 2, a 16-mer probe to ATCT repeat (STR probe P1);
SEQ ID: NO 3, a 20-mer probe to ATCT repeat;
SEQ ID: NO 4, a 25-mer 5′-complementary oligo (flanking oligo F1);
SEQ ID: NO 5, a 25-mer 5′-blocking oligo (blocking oligo B1);
SEQ ID: NO 6, a 25-mer 3′-complementary oligo (flanking oligo F2);
SEQ ID: NO 7, a 25-mer 3′-blocking oligo (blocking oligo B2);
SEQ ID: NO 8, a 25-mer 3′-reference oligo (reference probe P2);
SEQ ID: NO 9, a target DNA with 2 5′-ATCT-3′ repeats;
SEQ ID: NO 10, a target DNA with 3 5′-ATCT-3′ repeats;
SEQ ID: NO 11, a target DNA with 4 5′-ATCT-3′ repeats;
SEQ ID: NO 12, a target DNA with 5 5′-ATCT-3′ repeats;
SEQ ID: NO 13, a target DNA with 6 5′-ATCT-3′ repeats;
SEQ ID: NO 14, a target DNA with 7 5′-ATCT-3′ repeats;
SEQ ID: NO 15, a target DNA with 8 5′-ATCT-3′ repeats;
SEQ ID: NO 16, a target DNA with 9 5′-ATCT-3′ repeats;
SEQ ID: NO 17, a target DNA with 10 5′-ATCT-3′ repeats;
SEQ ID: NO 18, a target DNA with 11 5′-ATCT-3′ repeats;
SEQ ID: NO 19, a target DNA with 12 5′-ATCT-3′ repeats;
SEQ ID: NO 20, a target DNA with 13 5′-ATCT-3′ repeats;
SEQ ID: NO 21, a target DNA with 14 5′-ATCT-3′ repeats;
SEQ ID: NO 22, a target DNA with 15 5′-ATCT-3′ repeats; and
SEQ ID: NO 23, a target DNA with 16 5′-ATCT-3′ repeats.
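The relationship between the ATCT repeat tract of the targets and the (AGAT)4 probe, and the numbering of the target entries in the listing above, can be checked with a minimal sketch. Only the repeat tract is modeled; the artificial flanking sequences are not reproduced here, and the helper names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def repeat_tract(n_repeats, unit="ATCT"):
    """Build the repeat tract only, without any flanking sequence."""
    return unit * n_repeats


def seq_id_for_repeats(n_repeats):
    """SEQ ID NO of the target with n repeats (NO 9 = 2 ... NO 23 = 16)."""
    if not 2 <= n_repeats <= 16:
        raise ValueError("model targets carry 2-16 repeats")
    return n_repeats + 7


# A 16-mer complementary to four ATCT repeats reads (AGAT)4 in 5'->3'.
probe_p1 = reverse_complement(repeat_tract(4))
print(probe_p1)
print(seq_id_for_repeats(10))
```

The sketch confirms that a 16-mer complementary to four consecutive ATCT repeats reads (AGAT)4, matching the description of the STR probe P1.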
The mixtures or kits used to analyze any STR locus are very dependent on the sequence of that STR locus. However, to analyze one unknown STR locus, one would need to perform at least 3 experiments as follows:
The asterisk (*) refers to probes that contain a fluorescent label. Each label within a single experiment must be different. If an STR locus contains an insert (“island”), then all three experiments would also contain an insert probe P3, also labeled. More complex STRs might also require a third probe of a different length (also fluorescently labeled) and/or additional experiment(s) using blocker(s) of different length.
For distinguishing alleles of one particular STR, an example is given in