The present application is being filed along with a Sequence Listing in electronic format. The Sequence Listing is provided as a file entitled 9936WOO1.txt. The information in the electronic format of the sequence listing is incorporated herein by reference in its entirety.
This invention relates generally to the fields of genetic mapping and genetic identity testing, including forensic testing and paternity testing. In certain aspects, the invention relates to the use of amplification and mass spectrometry in DNA analysis using tandem repeat regions of DNA. In other aspects, the invention provides for rapid and accurate forensic analysis by using mass spectrometry to characterize informative regions of DNA.
The process of human identification through DNA analysis is a common objective of forensics investigations. As used herein, “forensics” is the study of evidence, for example, that discovered at a crime or accident scene that is then used in a court of law. “Forensic science” is any science used to answer questions of interest to the legal system, in particular the criminal or civil justice system, providing impartial scientific evidence for use in the courts of law, for example, in criminal investigations and trials. Forensic science is a multidisciplinary subject, drawing principally from chemistry and biology, but also from physics, geology, psychology and social science, for example. The goal of one aspect of human forensics, forensic DNA typing, is to determine the identity or genotype of DNA acquired from a forensic sample, for example, evidence from a crime scene or DNA sample from an individual. Typical sources of such DNA evidence include hair, bones, teeth, and body fluids such as saliva, semen, and blood. There often exists a need for rapid identification of a large number of humans, human remains and/or biological samples. Such remains or samples may be associated with war-related casualties, aircraft crashes, and acts of terrorism, for example.
Tandem DNA repeat regions, which are prevalent in the human genome and exhibit a high degree of variability among individuals, are used in a number of fields, including human forensics and identity testing, genetic mapping, and linkage analysis. Various types of DNA repeat regions exist within eukaryotic genomes and can be classified based on length of their core repeat regions. Short tandem repeats (STRs), also called simple sequence repeats (SSRs), or microsatellites are repeat regions having core units of between 2-6 nucleotides in length. For a particular STR locus, individuals in a population differ in the number of these core repeat units.
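By way of illustration only, the notion that alleles at an STR locus differ in the number of core repeat units can be sketched in a few lines of Python. The sequences, flanking bases, and the GATA motif below are hypothetical placeholders chosen for the example and are not taken from any locus described herein.

```python
import re

def max_tandem_repeats(sequence, motif):
    """Return the longest run of consecutive copies of `motif` in `sequence`."""
    # Find every stretch made up of one or more back-to-back copies of the motif.
    runs = re.findall(f"(?:{re.escape(motif)})+", sequence)
    return max((len(run) // len(motif) for run in runs), default=0)

# Hypothetical alleles of one locus: same flanks, different repeat counts.
allele_a = "TTC" + "GATA" * 9 + "CCA"
allele_b = "TTC" + "GATA" * 12 + "CCA"

count_a = max_tandem_repeats(allele_a, "GATA")
count_b = max_tandem_repeats(allele_b, "GATA")
print(count_a, count_b)
```

The two hypothetical alleles differ only in repeat count, which is the property that length-based typing relies on.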
STR typing involves the amplification of multiple STR DNA loci that display a collection of alleles in the human population that differ in repeat number. Typically, the products of such amplification reactions are analyzed by polyacrylamide gel or capillary electrophoresis using fluorescent detection methods, and subsequent discrimination among different alleles based on amplification product length. Because a typical STR typing analysis will use multiple STR loci that are not genetically linked, the product rule can be applied to estimate the probability of a random match to any STR profile where population allele frequencies have been characterized for each locus (Holt C L, et. al. (2000) Forensic Sci. Int. 112(2-3): 91-109; Holland M M, et. al. (2003) Croat. Med. J. 44(3): 264-72). This leads to extremely high differentiation power with low random match probabilities within the human population. Because of the short length of STR repeats and the high degree of variability in number of repeats among individuals in a population, STR typing has become a standard in human forensics where sufficient nuclear DNA is available.
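As a rough sketch of the product rule referred to above, the following Python example multiplies per-locus genotype frequencies across unlinked loci. The allele frequencies are invented placeholder values rather than population data, and the Hardy-Weinberg genotype formula (p squared for a homozygote, 2pq for a heterozygote) is an assumption that the text does not itself state.

```python
from math import prod

def genotype_frequency(p, q=None):
    """Hardy-Weinberg genotype frequency (assumed model).

    p*p for a homozygote, 2*p*q for a heterozygote.
    """
    if q is None:
        return p * p
    return 2 * p * q

def random_match_probability(genotype_freqs):
    """Product rule: multiply genotype frequencies across unlinked loci."""
    return prod(genotype_freqs)

# Hypothetical allele frequencies at three unlinked loci (illustrative only).
profile = [
    genotype_frequency(0.10, 0.20),  # heterozygous locus
    genotype_frequency(0.05),        # homozygous locus
    genotype_frequency(0.15, 0.25),  # heterozygous locus
]
rmp = random_match_probability(profile)
print(f"random match probability ~ {rmp:.2e}")
```

Each additional unlinked locus multiplies in another factor below one, which is why profiles with many loci reach very low random match probabilities.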
A number of tetranucleotide STRs and methods for STR-typing have been explored for application in human forensics. Commercial STR-typing kits are available that target different STR loci, including a common set of loci. The FBI Laboratory has established 13 nationally recognized core STR loci that are included in a national forensic DNA database known as the Combined DNA Index System (CODIS). The 13 CODIS core loci are CSF1PO, FGA, TH01, TPOX, VWA, D3S1358, D5S818, D7S820, D8S1179, D13S317, D16S539, D18S51, and D21S11. Sequence information for these loci is available from STRBase. The range of numbers of repeat units for reported alleles for these CODIS 13 loci is 6-16, 15-51.2, 3-14, 6-13, 10-24, 9-20, 7-16, 6-15, 8-19, 5-15, 5-15, 7-27, and 24-38 respectively (Butler, J M, 2001 Forensic DNA Typing Academic Press). When profiles are available with allele information for all 13 of these core STR loci, the average probability of a random match is lower than one in a trillion among non-related individuals. STR-typing by DNA sequencing is less desirable as it presents time constraints and is labor intensive.
Y-STRs are STRs located on the Y chromosome and are designated by “DYS numbers” where “DYS” refers to “DNA Y chromosome Segment.” A core group of minimum haplotype markers has been defined which includes DYS393, DYS19, DYS391, DYS385a/b, DYS390, DYS392, DYS437, DYS438, DYS439, and DYS389I/II (Butler, J. M. Forensic DNA Typing, 2nd ed.; Elsevier Academic Press: Burlington, 2005). Y-STRs have been used by forensic laboratories to examine sexual assault evidence. In a sexual assault case, evidence will contain both female and male DNA. Differential extraction is often used to separate the male component from the female component. More often, however, the male and female components cannot be separated completely. As a result, the female component could exist prominently even in the male component after separation. When the “male DNA sample” undergoes the PCR amplification process, the female DNA component is amplified as well, sometimes masking the male DNA, which makes analysis difficult. Masking does not occur when Y-STRs are examined. Since there is no Y-STR in the female evidence, Y-STR data can only come from the assailant(s) in such a sexual assault case. The male component will be easily detected, since only this part of DNA will be amplified. The Y-STR system is especially helpful in cases with more than one assailant. The mixed pattern in the evidence can help to identify those males responsible for the assault. Y-STR analysis is also used for non-sexual assault cases where mixed samples are collected from evidence. A conventional STR analysis will often cause the masking effect if there is a very small quantity of male DNA in the mixed sample. Performing Y-STR testing can help to identify all males who have contributed to the evidence.
STR-typing using STR markers has become the human forensic “gold standard” as the combined information derived from the 13 distinct CODIS alleles provides enough information to uniquely identify an individual's DNA signature to a statistical significance of 1 in 10^9. Standard or conventional STR-typing methods, which typically use amplification and electrophoretic size determination to resolve individual alleles, have certain limitations. At low STR copy number it is not uncommon to observe allele “drop out” in which a heterozygous individual is typed as a homozygote because one of the alleles is not detected. Additionally, in cases of highly degraded or low copy DNA samples, entire markers may drop out leaving only a few STRs from which to derive a DNA profile. In certain situations, such as mass disaster victim identification, a large number of samples with varying DNA quantity and quality can exist, many of which produce only partial STR profiles. While in some cases a partial profile can be used to include or exclude a potential suspect or identity, conventional STR typing methods sometimes do not provide sufficient resolution at the available loci in the case of a partial profile. Thus, there is a need within the forensics community to increase resolution of STR-typing methods, such that it is possible to derive additional information from degraded DNA samples which yield an incomplete set of STR markers and from other samples where detection of the complete STR set is not possible.
Techniques would be beneficial that could resolve sequence polymorphisms in alleles and thus increase the observed allelic variation for several common STR loci, while maintaining the advantages of amplification-based techniques, such as rapidness and the ability to automate the procedure for high-throughput typing. Thus, there is a need for STR typing methods that provide a higher level of resolution compared with standard techniques. Moreover, there exists a need for the development of an automated platform capable of high-throughput sample processing to enable analysis of a large number of samples produced simultaneously or over a short period of time, as in the case of mass disaster or war.
Mass spectrometry provides detailed information about the molecules being analyzed, including high mass accuracy. It is also a process that can be easily automated. Electrospray ionization mass spectrometry (ESI-MS) provides a platform capable of automated sample processing, and can resolve sequence polymorphisms between STR alleles (Ecker et. al. J. Assoc. Laboratory Automation 2006, 11, 341-51).
Matrix-assisted laser desorption-ionization time-of-flight mass spectrometry (MALDI TOF MS) has been employed to analyze STR, SNP, and Y-chromosome markers. (Butler, J.; Becker, C. H. Science and Technology Research Report to NIJ 2001, NCJ 188292, October; Monforte, J. A.; Becker, C. H. Nat Med 1997, 3, 360-362; Taranenko, N. I.; Golovlev, V. V.; Allman, S. L.; Taranenko, N. V.; Chen, C. H.; Hong, J.; Chang, L. Y. Rapid Commun Mass Spectrom 1998, 12, 413-418; Butler, J. M.; Li, J.; Shaler, T. A.; Monforte, J. A.; Becker, C. H. Int J Legal Med 1999, 112, 45-49; Ross, P. L.; Belgrader, P. Anal Chem 1997, 69, 3966-3972). To obtain routinely the necessary mass accuracy and resolution using MALDI TOF MS, the amplicon size must be less than 100 bp, which often requires strategies such as enzymatic digestion and nested linear amplification. In the MALDI approach, PCR amplicons must be thoroughly desalted and co-crystallized with a suitable matrix prior to mass spectrometric analysis. The size reduction schemes and clean-up schemes employed for STR and SNP analyses in the cited reports resulted in the mass spectrometric analysis of only one strand of the PCR amplicon. By measuring the mass of only one strand of the amplicon, an unambiguous base composition may be difficult to determine and only the length of the allele may be obtained. Even with the size reduction schemes, mass measurement errors of 12 to 60 Daltons (Da) are observed for products in the size range 15000 to 25000 Da. This corresponds to mass measurement errors of 800 to 2400 ppm. Because of poor mass accuracy and mass resolution typical of MALDI, multiplexing of STRs is difficult and not routine, although in one published report three STR loci were successfully multiplexed. The issue of allelic balance has not been addressed for MALDI-TOF-MS based assays.
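The conversion from an absolute mass error to a relative error in parts per million can be checked with a short calculation. The sketch below assumes that the smallest error is paired with the smallest product mass and the largest error with the largest product mass, a pairing that reproduces the 800 to 2400 ppm range given above; the function name is chosen for the example only.

```python
def ppm_error(mass_error_da, mass_da):
    """Relative mass measurement error in parts per million (ppm)."""
    return mass_error_da / mass_da * 1e6

# Assumed pairing: 12 Da error at 15000 Da, 60 Da error at 25000 Da.
low_ppm = ppm_error(12, 15000)
high_ppm = ppm_error(60, 25000)
print(f"{low_ppm:.0f} ppm to {high_ppm:.0f} ppm")
```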
U.S. Pat. Nos. 6,764,822 and 6,090,558 relate to methods for STR-typing using mass spectrometry (MS). Use of electrospray ionization (ESI)-MS to resolve STR alleles has been reported (Hannis and Muddiman, 2001, Rapid Commun. Mass. Spectrom. 15(5): 348-50; Hannis et. al, 2000, Advances in Nucleic Acid and Protein Analysis, Manipulation and Sequencing, 3926: 1017-2661). ESI-MS provides a platform capable of automated sample processing and analysis that can resolve sequence polymorphisms (Ecker et. al. (2006) JALA. 11:341-51).
Several groups have described detection of PCR products using high resolution electrospray ionization-Fourier transform-ion cyclotron resonance mass spectrometry (ESI-FT-ICR MS). Accurate measurement of exact mass combined with knowledge of the number of at least one nucleotide allowed calculation of the total base composition for PCR duplex products of approximately 100 base pairs. (Aaserud et al., J. Am. Soc. Mass Spec., 1996, 7, 1266-1269; Muddiman et al., Anal. Chem., 1997, 69, 1543-1549; Wunschel et al., Anal. Chem., 1998, 70, 1203-1207; Muddiman et al., Rev. Anal. Chem., 1998, 17, 1-68). Electrospray ionization-Fourier transform-ion cyclotron resonance (ESI-FT-ICR) MS may be used to determine the mass of double-stranded, 500 base-pair PCR products via the average molecular mass (Hurst et al., Rapid Commun. Mass Spec. 1996, 10, 377-382).
There is an unmet need for methods and compositions for analysis of DNA forensic markers that approach the level of resolution sequencing affords, that are capable of scanning a substantial amount of the variation contained within an amplified fragment, yet that are also rapid, amenable to automation, and provide relevant information without the burden of extensive manual data interpretation. Preferably, such a method would not require a priori knowledge of the potentially informative sites within a sample to carry out an analysis. Preferably, such methods would be able to provide substantial resolving capability for forensic analyses in cases of degraded DNA or with relatively low amounts of DNA, for example, by allowing resolution of sequence polymorphisms that may allow discrimination of equal or same-length alleles based on small differences in sequence or base composition.
The methods, compositions and kits provided herein are directed to forensic analysis and identity testing based on using mass spectrometry to “weigh” DNA forensic markers with enough accuracy to yield an unambiguous base composition (i.e. the number of A's, G's, C's and T's) which in turn can be used to derive a DNA profile for an individual. Importantly, these base composition profiles can be referenced to existing forensics databases derived from STR or other forensic marker profiles. The present disclosure provides methods, primer pair compositions and kits that are capable of resolving human forensic DNA samples using STR loci based upon length and sequence polymorphisms, as measured by base composition, in a high throughput manner.
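The relationship between a measured strand mass and the base compositions consistent with it can be sketched as follows. This is a simplified illustration and not the procedure of the present disclosure: it uses approximate average residue masses, assumes a single strand of known length with hydroxyl groups at both ends, and ignores the complementary-strand information that a duplex measurement would add. The example composition and the tolerance values are hypothetical.

```python
# Approximate average masses (Da) of nucleotide residues within a DNA strand.
RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}
END_CORRECTION = -61.96  # assumes 5'-OH and 3'-OH ends, no 5' phosphate

def strand_mass(comp):
    """Approximate average mass of one strand from its base counts."""
    return sum(RESIDUE_MASS[base] * n for base, n in comp.items()) + END_CORRECTION

def candidate_compositions(measured_mass, length, tol_da):
    """List every base composition of `length` within `tol_da` of the mass."""
    hits = []
    for a in range(length + 1):
        for c in range(length + 1 - a):
            for g in range(length + 1 - a - c):
                t = length - a - c - g
                comp = {"A": a, "C": c, "G": g, "T": t}
                if abs(strand_mass(comp) - measured_mass) <= tol_da:
                    hits.append(comp)
    return hits

# Hypothetical 50-nucleotide strand used as the "true" composition.
true_comp = {"A": 14, "C": 10, "G": 11, "T": 15}
measured = strand_mass(true_comp)

tight = candidate_compositions(measured, 50, 0.5)   # high mass accuracy
loose = candidate_compositions(measured, 50, 20.0)  # poor mass accuracy
print(len(tight), "candidates at 0.5 Da;", len(loose), "candidates at 20 Da")
```

Tightening the tolerance can only shrink the candidate list. Because some base substitutions nearly cancel in mass, even a tight single-strand tolerance may leave more than one candidate.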
The present invention is directed to methods of forensic analysis of DNA. In some embodiments the methods comprise identity testing. In some embodiments they comprise STR-typing. The methods provided herein can be distinguished from conventional amplification based STR-typing. For example, the methods provided herein provide the ability to assign allele designations for STR loci based upon size as determined by mass. In addition, the methods provided herein can further resolve apparently similar alleles which differ only by one or more SNPs by deriving information from the loci nucleotide sequence as measured by mass or base composition uncovering additional alleles within the loci.
In some embodiments methods are provided for identifying a known STR allele or characterizing a previously unknown STR allele in a nucleic acid sample. A nucleic acid locus which includes the STR allele is selected and at least a portion of the locus is amplified using an oligonucleotide primer pair comprising a forward and a reverse primer, each between 13 and 40 nucleobases in length. An amplification product with a length of about 45 to about 200 nucleobases is thus generated. The amplification product duplicates the sequence of the known or unknown STR allele. The molecular mass of one or both strands of the amplification product is measured and the base composition of one or both of the strands is determined. The base composition is then compared to a plurality of database-stored base compositions of strands of amplification products of known alleles of the locus. When a match is identified between the base composition and at least one of the database-stored base compositions of amplification products comprising the sequence of the STR allele produced with the primer pair, the allele is identified. Alternatively, when the comparison fails to identify a match between the base composition and at least one of the database-stored base compositions, a previously unknown STR allele is characterized. In a preferred embodiment, the locus is located on a human Y chromosome.
In some embodiments, the base composition of the previously unknown STR allele is added to the plurality of database stored base compositions. The base composition of the previously unknown STR allele may include a single nucleotide polymorphism relative to a known STR allele. The database-stored base compositions may include molecular masses which are calculated from theoretical amplification products of known sequences of known alleles and may also include measured molecular masses or actual amplification products of known sequences of known alleles or newly characterized alleles. Newly characterized alleles are, for example, alleles which have a SNP relative to a known allele.
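The comparison of a determined base composition against database-stored base compositions, and the addition of a previously unknown allele to the database, can be sketched as below. The allele names, the base counts, and the in-memory dictionary standing in for the database are all hypothetical and serve only to illustrate the logic.

```python
def find_matches(measured, database):
    """Return names of stored alleles whose base composition equals `measured`."""
    return [name for name, comp in database.items() if comp == measured]

def identify_or_register(measured, database, new_name):
    """Identify a known allele, or store the composition as a new allele."""
    matches = find_matches(measured, database)
    if matches:
        return "identified", matches
    database[new_name] = measured  # add the previously unknown allele
    return "previously unknown", [new_name]

# Hypothetical database: allele name -> base composition as (A, G, C, T) counts.
database = {
    "allele 12": (30, 22, 28, 40),
    "allele 13": (31, 22, 29, 42),
}

# A composition that matches a stored allele.
status_known, names_known = identify_or_register((30, 22, 28, 40), database, "new-1")

# Same length as "allele 13" but with one G replaced by A (a hypothetical SNP).
snp_variant = (32, 21, 29, 42)
status_new, names_new = identify_or_register(snp_variant, database, "allele 13 (G->A)")

# A second sample with the same variant is now identified from the database.
status_again, names_again = identify_or_register(snp_variant, database, "unused")
print(status_known, status_new, status_again)
```

The hypothetical SNP variant has the same length as a stored allele, so length alone would not distinguish it, whereas its base composition does.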
In some embodiments, the step of measuring the molecular mass is performed by mass spectrometry, preferably ESI-TOF mass spectrometry.
In some embodiments, the forward primer and the reverse primer each comprise a thymidine residue at the 5′ end, thereby minimizing non-templated adenylation of the amplification product.
In another embodiment, the amplification is performed using deoxynucleotide triphosphates comprising 13C-enriched dGTP or a 13C-enriched analogue of dGTP. Preferably, this step is also performed using deoxynucleotide triphosphates comprising non-isotope enriched dCTP, dTTP and dATP.
In some embodiments, the locus is selected from the group consisting of DYS393, DYS19, DYS391, DYS385a/b, DYS390, DYS392, DYS437, DYS438, DYS439, DYS389I, and DYS389II.
In certain embodiments, the locus is DYS393. In one aspect, it is preferred if each member of the primer pair has at least 70%, at least 80%, at least 90%, at least 95% or at least 100% sequence identity with a corresponding member of a primer pair selected from the group consisting of: SEQ ID NOs: 1:43, 63:54, 67:12, 62:64, 62:55, 33:31 and 34:30, wherein, with respect to pairs of sequence identifiers (X:Y) for primer pairs, the convention as defined herein is that the sequence identifier to the left of the colon (X:) represents the forward primer and the sequence identifier to the right of the colon (:Y) represents the reverse primer. In another aspect, each member of the primer pair has at least 70%, at least 80%, at least 90%, at least 95% or at least 100% sequence identity with a corresponding member of: SEQ ID NOs: 63:54. In a preferred aspect, the primer pair is the primer pair of SEQ ID NOs: 63:54.
In some embodiments, the locus is DYS19. In one aspect, it is preferred if each member of the primer pair has at least 70%, at least 80%, at least 90%, at least 95% or at least 100% sequence identity with a corresponding member of a primer pair selected from the group consisting of: SEQ ID NOs: 16:28, 51:17 and 45:60. In another aspect, each member of the primer pair has at least 70%, at least 80%, at least 90%, at least 95% or at least 100% sequence identity with a corresponding member of: SEQ ID NOs: 51:17. In a preferred aspect, the primer pair is the primer pair of SEQ ID NOs: 51:17.
In some embodiments, the locus is DYS391. In one aspect, it is preferred if each member of the primer pair has at least 70%, at least 80%, at least 90%, at least 95% or at least 100% sequence identity with a corresponding member of a primer pair selected from the group consisting of: SEQ ID NOs: 32:50, 19:13, 19:48, and 70:57. In another aspect, each member of the primer pair has at least 70%, at least 80%, at least 90%, at least 95% or at least 100% sequence identity with a corresponding member of: SEQ ID NOs: 19:48. In a preferred aspect, the primer pair is the primer pair of SEQ ID NOs: 19:48.
In certain embodiments, the locus is DYS385a/b. In one aspect, it is preferred if each member of the primer pair has at least 70%, at least 80%, at least 90%, at least 95% or at least 100% sequence identity with a corresponding member of a primer pair selected from the group consisting of: SEQ ID NOs: 10:27, 42:27, 10:35, 42:66 and 72:67. In another aspect, each member of the primer pair has at least 70%, at least 80%, at least 90%, at least 95% or at least 100% sequence identity with a corresponding member of: SEQ ID NOs: 72:67. In a preferred aspect, the primer pair is the primer pair of SEQ ID NOs: 72:67.
In some embodiments, the locus is DYS390. In one aspect, it is preferred if each member of the primer pair has at least 70%, at least 80%, at least 90%, at least 95% or at least 100% sequence identity with a corresponding member of a primer pair selected from the group consisting of: SEQ ID NOs: 59:20, 21:49, 59:49, 39:68 and 73:74. In another aspect, each member of the primer pair has at least 70%, at least 80%, at least 90%, at least 95% or at least 100% sequence identity with a corresponding member of: SEQ ID NOs: 39:68. In a preferred aspect, the primer pair is the primer pair of SEQ ID NOs: 39:68.
In certain embodiments, the locus is DYS392. In one aspect, it is preferred if each member of the primer pair has at least 70%, at least 80%, at least 90%, at least 95% or at least 100% sequence identity with a corresponding member of a primer pair selected from the group consisting of: SEQ ID NOs: 26:11, 53:29, 25:18, and 69:18. In another aspect, each member of the primer pair has at least 70%, at least 80%, at least 90%, at least 95% or at least 100% sequence identity with a corresponding member of: SEQ ID NOs: 53:29. In a preferred aspect, the primer pair is the primer pair of SEQ ID NOs: 53:29.
In some embodiments, the locus is DYS437. In one aspect, it is preferred if each member of the primer pair has at least 70%, at least 80%, at least 90%, at least 95% or at least 100% sequence identity with a corresponding member of a primer pair selected from the group consisting of: SEQ ID NOs: 65:44, 36:14, 8:14, 38:61, and 36:37. In another aspect, each member of the primer pair has at least 70%, at least 80%, at least 90%, at least 95% or at least 100% sequence identity with a corresponding member of: SEQ ID NOs: 36:37. In a preferred aspect, the primer pair is the primer pair of SEQ ID NOs: 36:37.
In some embodiments, the locus is DYS438. In one aspect, it is preferred if each member of the primer pair has at least 70%, at least 80%, at least 90%, at least 95% or at least 100% sequence identity with a corresponding member of a primer pair selected from the group consisting of: SEQ ID NOs: 7:56, 71:41, 22:6, and 71:9. In another aspect, each member of the primer pair has at least 70%, at least 80%, at least 90%, at least 95% or at least 100% sequence identity with a corresponding member of: SEQ ID NOs: 22:6. In a preferred aspect, the primer pair is the primer pair of SEQ ID NOs: 22:6.
In some embodiments, the locus is DYS439. In one aspect, it is preferred if each member of the primer pair has at least 70%, at least 80%, at least 90%, at least 95% or at least 100% sequence identity with a corresponding member of a primer pair selected from the group consisting of: SEQ ID NOs: 3:58, 2:40, 4:52, and 2:52. In another aspect, each member of the primer pair has at least 70%, at least 80%, at least 90%, at least 95% or at least 100% sequence identity with a corresponding member of: SEQ ID NOs: 4:52. In a preferred aspect, the primer pair is the primer pair of SEQ ID NOs: 4:52.
In certain embodiments, the locus is DYS389I. In one aspect, it is preferred if each member of the primer pair has at least 70%, at least 80%, at least 90%, at least 95% or at least 100% sequence identity with a corresponding member of a primer pair selected from the group consisting of: SEQ ID NOs: 23:15, and 23:5. In another aspect, each member of the primer pair has at least 70%, at least 80%, at least 90%, at least 95% or at least 100% sequence identity with a corresponding member of: SEQ ID NOs: 23:5. In a preferred aspect, the primer pair is the primer pair of SEQ ID NOs: 23:5.
In some embodiments, the locus is DYS389II. In one aspect, it is preferred if each member of the primer pair has at least 70%, at least 80%, at least 90%, at least 95% or at least 100% sequence identity with a corresponding member of SEQ ID NOs: 24:47. In a preferred aspect, the primer pair is the primer pair of SEQ ID NOs: 24:47.
Another aspect is a purified oligonucleotide primer pair for identifying a known STR allele or characterizing a previously unknown STR allele in a nucleic acid sample. The primer pair is configured to produce an amplification product of at least a portion of an STR locus. The amplification product duplicates the sequence of the known STR allele or the previously unknown STR allele. Each member of the primer pair has at least 70%, at least 80%, at least 90%, at least 95% or at least 100% sequence identity with a corresponding member of a primer pair selected from the group consisting of: SEQ ID NOs: 16:28, 51:17, 45:60, 10:27, 42:27, 10:35, 24:46, 23:15, 23:5, 24:47, 59:20, 21:49, 59:49, 39:68, 32:50, 19:13, 19:48, 70:57, 26:11, 53:29, 25:18, 69:18, 1:43, 63:54, 67:12, 62:64, 65:44, 36:14, 8:14, 38:61, 36:37, 7:56, 71:41, 22:6, 71:9, 3:58, 2:40, 4:52, 2:52, 62:55, 33:31, 34:30, 73:74, 42:66 and 72:67.
At least one member of the primer pair may include a mass-modified nucleobase, a universal nucleobase, or a non-templated 5′-thymidine residue or any combination thereof.
In some embodiments, the primer pair is configured to produce an amplification product of at least a portion of an STR locus selected from the group consisting of DYS393, DYS19, DYS391, DYS385a/b, DYS390, DYS392, DYS437, DYS438, DYS439, DYS389I and DYS389II.
In some embodiments, the locus from which the primer pair produces the amplification product is DYS393. In one aspect of this embodiment, each member of the primer pair has at least 70%, at least 80%, at least 90%, at least 95% or at least 100% sequence identity with a corresponding member of a primer pair selected from the group consisting of: SEQ ID NOs: 1:43, 63:54, 67:12, 62:64, 62:55, 33:31 and 34:30. In another aspect, each member of the primer pair has at least 70%, at least 80%, at least 90%, at least 95% or at least 100% sequence identity with a corresponding member of: SEQ ID NOs: 63:54. In a preferred aspect, the primer pair is the primer pair of SEQ ID NOs: 63:54.
In some embodiments, the locus from which the primer pair produces the amplification product is DYS19. In one aspect of this embodiment, each member of the primer pair has at least 70%, at least 80%, at least 90%, at least 95% or at least 100% sequence identity with a corresponding member of a primer pair selected from the group consisting of: SEQ ID NOs: 16:28, 51:17 and 45:60. In another aspect, each member of the primer pair has at least 70%, at least 80%, at least 90%, at least 95% or at least 100% sequence identity with a corresponding member of: SEQ ID NOs: 51:17. In a preferred aspect, the primer pair is the primer pair of SEQ ID NOs: 51:17.
In some embodiments, the locus from which the primer pair produces the amplification product is DYS391. In one aspect of this embodiment, each member of the primer pair has at least 70%, at least 80%, at least 90%, at least 95% or at least 100% sequence identity with a corresponding member of a primer pair selected from the group consisting of: SEQ ID NOs: 32:50, 19:13, 19:48, and 70:57. In another aspect, each member of the primer pair has at least 70%, at least 80%, at least 90%, at least 95% or at least 100% sequence identity with a corresponding member of: SEQ ID NOs: 19:48. In a preferred aspect, the primer pair is the primer pair of SEQ ID NOs: 19:48.
In some embodiments, the locus from which the primer pair produces the amplification product is DYS385a/b. In one aspect of this embodiment, each member of the primer pair has at least 70%, at least 80%, at least 90%, at least 95% or at least 100% sequence identity with a corresponding member of a primer pair selected from the group consisting of: SEQ ID NOs: 10:27, 42:27, 10:35, 42:66 and 72:67. In another aspect, each member of the primer pair has at least 70%, at least 80%, at least 90%, at least 95% or at least 100% sequence identity with a corresponding member of: SEQ ID NOs: 72:67. In a preferred aspect, the primer pair is the primer pair of SEQ ID NOs: 72:67.
In some embodiments, the locus from which the primer pair produces the amplification product is DYS390. In one aspect of this embodiment, each member of the primer pair has at least 70%, at least 80%, at least 90%, at least 95% or at least 100% sequence identity with a corresponding member of a primer pair selected from the group consisting of: SEQ ID NOs: 59:20, 21:49, 59:49, 39:68 and 73:74. In another aspect, each member of the primer pair has at least 70%, at least 80%, at least 90%, at least 95% or at least 100% sequence identity with a corresponding member of: SEQ ID NOs: 39:68. In a preferred aspect, the primer pair is the primer pair of SEQ ID NOs: 39:68.
In some embodiments, the locus from which the primer pair produces the amplification product is DYS437. In one aspect of this embodiment, each member of the primer pair has at least 70%, at least 80%, at least 90%, at least 95% or at least 100% sequence identity with a corresponding member of a primer pair selected from the group consisting of: SEQ ID NOs: 65:44, 36:14, 8:14, 38:61, and 36:37. In another aspect, each member of the primer pair has at least 70%, at least 80%, at least 90%, at least 95% or at least 100% sequence identity with a corresponding member of: SEQ ID NOs: 36:37. In a preferred aspect, the primer pair is the primer pair of SEQ ID NOs: 36:37.
In some embodiments, the locus from which the primer pair produces the amplification product is DYS438. In one aspect of this embodiment, each member of the primer pair has at least 70%, at least 80%, at least 90%, at least 95% or at least 100% sequence identity with a corresponding member of a primer pair selected from the group consisting of: SEQ ID NOs: 7:56, 71:41, 22:6, and 71:9. In another aspect, each member of the primer pair has at least 70%, at least 80%, at least 90%, at least 95% or at least 100% sequence identity with a corresponding member of: SEQ ID NOs: 22:6. In a preferred aspect, the primer pair is the primer pair of SEQ ID NOs: 22:6.
In some embodiments, the locus from which the primer pair produces the amplification product is DYS439. In one aspect of this embodiment, each member of the primer pair has at least 70%, at least 80%, at least 90%, at least 95% or at least 100% sequence identity with a corresponding member of a primer pair selected from the group consisting of: SEQ ID NOs: 3:58, 2:40, 4:52, and 2:52. In another aspect, each member of the primer pair has at least 70%, at least 80%, at least 90%, at least 95% or at least 100% sequence identity with a corresponding member of: SEQ ID NOs: 4:52. In a preferred aspect, the primer pair is the primer pair of SEQ ID NOs: 4:52.
In some embodiments, the locus from which the primer pair produces the amplification product is DYS389I. In one aspect of this embodiment, each member of the primer pair has at least 70%, at least 80%, at least 90%, at least 95% or at least 100% sequence identity with a corresponding member of a primer pair selected from the group consisting of: SEQ ID NOs: 23:15, and 23:5. In another aspect, each member of the primer pair has at least 70%, at least 80%, at least 90%, at least 95% or at least 100% sequence identity with a corresponding member of: SEQ ID NOs: 23:5. In a preferred aspect, the primer pair is the primer pair of SEQ ID NOs: 23:5.
In some embodiments, the locus from which the primer pair produces the amplification product is DYS389II. In one aspect of this embodiment, each member of the primer pair has at least 70%, at least 80%, at least 90%, at least 95% or at least 100% sequence identity with a corresponding member of SEQ ID NOs: 24:47. In a preferred aspect, the primer pair is the primer pair of SEQ ID NOs: 24:47.
Another aspect is a kit which includes one or more purified oligonucleotide primer pairs for identifying a known STR allele or characterizing a previously unknown STR allele in a nucleic acid sample. The one or more primer pairs is configured to produce an amplification product of an STR locus. The amplification product duplicates the sequence of the known STR allele or the previously unknown STR allele. Each member of the one or more primer pairs has at least 70%, at least 80%, at least 90%, at least 95% or at least 100% sequence identity with a corresponding member of one or more primer pairs selected from the group consisting of: SEQ ID NOs: 16:28, 51:17, 45:60, 10:27, 42:27, 10:35, 24:46, 23:15, 23:5, 24:47, 59:20, 21:49, 59:49, 39:68, 32:50, 19:13, 19:48, 70:57, 26:11, 53:29, 25:18, 69:18, 1:43, 63:54, 67:12, 62:64, 65:44, 36:14, 8:14, 38:61, 36:37, 7:56, 71:41, 22:6, 71:9, 3:58, 2:40, 4:52, 2:52, 62:55, 33:31, 34:30, 73:74, 42:66 and 72:67.
In one embodiment of the kit, one or more primer pairs are contained within the same reaction vessel, preferably a well of a 96-well plate. In some embodiments, the well includes five primer pairs and each member of the primer pairs has at least 70%, at least 80%, at least 90%, at least 95% or at least 100% sequence identity with a corresponding member of SEQ ID NOs: 23:5, 53:29, 19:48, 63:54 and 39:68. This kit may further include at least a first additional well which includes four primer pairs and each member of the primer pairs has at least 70%, at least 80%, at least 90%, at least 95% or at least 100% sequence identity with a corresponding member of SEQ ID NOs: 24:47, 22:6, 4:52 and 36:37. This kit may further include at least a second additional well comprising an additional primer pair. Each member of this additional primer pair has at least 70%, at least 80%, at least 90%, at least 95% or at least 100% sequence identity with a corresponding member of SEQ ID NOs: 51:17. This kit may further include at least a third additional well comprising a primer pair. Each member of this primer pair has at least 70%, at least 80%, at least 90%, at least 95% or at least 100% sequence identity with a corresponding member of SEQ ID NOs: 72:67.
In some embodiments, the kit includes deoxynucleotide triphosphates comprising: 13C-enriched dGTP, dTTP, dCTP and/or dATP. In an additional embodiment, the kits and methods described herein include or use all of the components to perform polymerase chain reaction (PCR). These components include, but are not limited to, deoxynucleotide triphosphates (dNTPs) for each nucleobase, a thermostable DNA polymerase and buffers useful in performing PCR.
In another embodiment, there is provided a method of identifying an individual. A DNA-containing sample is obtained from the individual and a plurality of STR alleles of the DNA is identified according to the methods described above. The plurality of STR alleles provides an allelic profile for the individual. The allelic profile of the individual is then compared with a plurality of database-stored allelic profiles of known individuals. A match between the allelic profile and a member of the plurality of database-stored allelic profiles identifies the individual. In some embodiments, a plurality of amplification products is produced in the same reaction vessel, preferably a 96-well plate.
In some embodiments of the method of identifying an individual, the plurality of amplification products comprises five amplification products produced with five primer pairs. Preferably, each member of the five primer pairs has at least 70%, at least 80%, at least 90%, at least 95% or at least 100% sequence identity with a corresponding member of SEQ ID NOs: 23:5, 53:29, 19:48, 63:54 and 39:68. In an additional embodiment, the method includes producing four additional amplification products in at least one additional reaction vessel. The four additional amplification products are produced with four primer pairs. Preferably, each member of the four primer pairs has at least 70%, at least 80%, at least 90%, at least 95% or at least 100% sequence identity with a corresponding member of SEQ ID NOs: 24:47, 22:6, 4:52, and 36:37. In an additional embodiment, the method includes producing two additional amplification products in separate reaction vessels with two primer pairs. Preferably, each member of the two primer pairs has at least 70%, at least 80%, at least 90%, at least 95% or at least 100% sequence identity with a corresponding member of SEQ ID NOs: 51:17 and 72:67.
In another embodiment, a system is provided which includes a mass spectrometer configured to detect one or more molecular masses of amplicons produced using at least one purified oligonucleotide primer pair that comprises forward and reverse primers. The forward and reverse primers comprise nucleic acid sequences independently having at least 70%, at least 80%, at least 90%, at least 95% or at least 100% sequence identity with a corresponding member of a primer pair selected from the group consisting of: SEQ ID NOs: 16:28, 51:17, 45:60, 10:27, 42:27, 10:35, 24:46, 23:15, 23:5, 24:47, 59:20, 21:49, 59:49, 39:68, 32:50, 19:13, 19:48, 70:57, 26:11, 53:29, 25:18, 69:18, 1:43, 63:54, 67:12, 62:64, 65:44, 36:14, 8:14, 38:61, 36:37, 7:56, 71:41, 22:6, 71:9, 3:58, 2:40, 4:52, 2:52, 62:55, 33:31, 34:30, 73:74, 42:66 and 72:67. The system further includes a controller operably connected to the mass spectrometer. The controller is configured to correlate the molecular masses of the amplicons with an identity of a known STR allele. The controller is further configured to characterize a previously unknown molecular mass as representing a previously unknown STR allele.
In some embodiments, the controller is configured to determine base compositions of the amplicons from the molecular masses of the amplicons. The base compositions correspond to known STR alleles. In one aspect, the controller includes or is operably connected to a database of known molecular masses and/or known base compositions of amplicons of known STR alleles produced with the primer pair.
As used herein, a “sample” refers to anything capable of being analyzed by the methods provided herein. In preferred embodiments, the sample comprises or is suspected of comprising one or more nucleic acids capable of analysis by the methods. Preferably, the samples comprise DNA. Samples can be forensic samples, which can include, for example, evidence from a crime scene, blood, blood stains, semen, semen stains, bone, teeth, hair, saliva, urine, feces, fingernails, muscle tissue, cigarettes, stamps, envelopes, dandruff, fingerprints, and personal items. In some embodiments, the samples are mixture samples, which comprise nucleic acids from more than one subject or individual. In some embodiments, the methods provided herein comprise purifying the sample or purifying the nucleic acid(s) from the sample. In some embodiments, the sample is purified nucleic acid or DNA.
As used herein, “repeated DNA sequence,” “tandem repeat locus,” “tandem DNA repeat” and “satellite DNA” refer to repeated DNA sequences present in eukaryotic genomes. “VNTRs” (variable number tandem repeats) or “minisatellites” refer to medium sized repeat units that are about 10-100 linked nucleotides in length. The terms “short tandem repeat,” “STR,” “simple sequence repeats,” “SSR” and “microsatellite” refer to tandem DNA repeat regions having core units of between 2-6 nucleotides in length. STRs are characterized by the number of nucleotides in the core repeat unit. Dinucleotide, trinucleotide, and tetranucleotide STRs represent STRs with core repeat units of 2, 3, and 4 nucleotides, respectively.
The term “STR locus,” (also known as “STR marker”) refers to a particular place on a chromosome where the region of short tandem repeats is located. Particular sequence variations (number of repeat units and sequence polymorphisms) found at an STR locus are called “STR alleles.” There are often several STR alleles for one STR locus within any given population. An individual can have more than one STR allele (one on each chromosome—maternal and paternal) for a given STR locus. Such an individual is said to be “heterozygous” at the particular STR locus. Individual variations of such loci are called alleles. An individual with identical alleles on both chromosomes is said to be “homozygous.” It is notable that, in context of Y-STRs (STRs located on the human Y chromosome which is found only in males) each human male will carry only one instance of the STR locus and therefore, characterization as homozygous or heterozygous is not applicable. For a particular STR locus, individuals in a population differ in the number of these core repeat units. Alleles at a particular STR locus can be said to be corresponding to that STR locus.
As used herein, “same-length STR alleles” or “same-length alleles” are used to refer to two or more alleles that share a common number of linked nucleotides or sequence length at the STR locus. Same-length alleles can differ in base composition or sequence. “Sequence length” refers to the number of linked nucleotides for a given nucleic acid, nucleic acid sequence or portion or region of such a sequence.
For certain STR loci, microvariant alleles have been identified that differ from common allele variants by one or more base pairs. These variations can be in the form of nucleotide insertion, deletion or nucleotide base changes. One such variation, “single nucleotide polymorphism” or “SNP” refers to a single nucleotide change compared with a reference sequence or common sequence. In some embodiments, the methods provided herein can discriminate alleles based on one or more SNPs, and can identify SNPs in STR loci.
A common nomenclature for STR loci and STR alleles has been developed by the International Society of Forensic Haemogenetics (ISFH) (Bar et al. Int. J. Legal Med. 1997, 107, 159-160). Alleles are named based on the number of core repeat units. For example, an allele designated 12 for a particular STR locus would have 12 repeat units. Incomplete repeat units are designated with a decimal point following the whole number, for example, 12.2.
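The allele designation described above can be illustrated with a short sketch. It assumes the usual reading of the convention, in which the digit after the decimal point gives the number of nucleotides in the incomplete repeat; the function names are hypothetical and are not part of the nomenclature itself.

```python
def parse_allele_designation(designation: str) -> tuple[int, int]:
    """Split a designation such as '12.2' into (complete repeats, extra bases)."""
    whole, _, partial = designation.partition(".")
    return int(whole), int(partial) if partial else 0


def repeat_region_length(designation: str, core_unit_length: int) -> int:
    """Length in nucleotides of the repeat region implied by a designation.

    Assumption: the number after the decimal point counts the bases of the
    incomplete repeat unit, so it must be smaller than the core unit length.
    """
    complete, extra = parse_allele_designation(designation)
    if extra >= core_unit_length:
        raise ValueError("incomplete repeat cannot be as long as a full unit")
    return complete * core_unit_length + extra
```

Under these assumptions, a designation of 12.2 at a tetranucleotide locus corresponds to 12 complete repeat units plus 2 further nucleotides.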
As used herein, “forensic DNA typing” refers to forensic methods for determining a genotype of any one or more loci of an individual, nucleic acid, sample, or evidence. “STR-typing” refers to forensic DNA typing or DNA typing using methods to determine genotype of one or more STR loci. STR-typing can be used for such purposes as forensics, identity testing, paternity testing, and other human identification means. Often, STR typing involves the amplification of multiple STR DNA loci that display a collection of alleles in the human population that differ in repeat number for each locus examined.
As used herein, “conventional STR-typing” or “standard STR-typing” refer to the most common available methods used for STR typing. Specifically, the terms “conventional amplification-based STR typing” and “standard amplification-based STR typing” refer to the most common methods where STR loci are identified by amplification and resolved by assigning allele designations based on size or sequence length. Often, the products of such amplification reactions are analyzed by electrophoresis using fluorescent detection methods, and subsequent discrimination among different alleles based on amplification product length. The methods provided herein can be distinguished from conventional amplification based STR-typing. For example, the methods provided herein provide the ability to assign allele designations for STR loci based upon size as determined by mass. In addition, the methods provided herein can further resolve apparently homozygous alleles by deriving information from the loci nucleotide sequence as measured by mass or base composition uncovering additional alleles within the loci. “Allele call” in STR-typing refers to a genotype, STR-type or particular allele identified by a STR-typing method for an individual, nucleic acid or sample.
As used herein, “primers,” “primer pairs” or “oligonucleotide primer pairs” are oligonucleotides that are designed to hybridize to conserved sequence regions within target nucleic acids, wherein the conserved sequence regions are conserved among two or more nucleic acids, alleles, or individuals. A primer pair is a pair of primers and thus comprises a forward and a reverse primer. In some embodiments, the conserved sequence regions (and thus the hybridized primers) flank an intervening variable nucleic acid region that varies among two or more alleles or individuals. Upon amplification, the primer pairs yield amplification products (also called amplicons) that comprise base composition variability between two or more individuals or nucleic acids. The variability of the base compositions allows for the identification of one or more individuals or a genotype of one or more individuals based on the amplicons and their base composition distinctions. In a preferred embodiment, primer pairs are designed to hybridize to regions that are directly adjacent to or nearly adjacent to the STR locus. It will be apparent, however, that some variations of the primers provided herein will serve to provide effective amplification of desired sequences. Such variations could include, for example, adding or deleting one or a few bases from the primer and/or shifting the position of the primer relative to the STR locus or variable region.
In some embodiments of the invention, the oligonucleotide primer pairs described herein can be purified. As used herein, “purified oligonucleotide primer pair,” “purified primer pair,” or “purified” means an oligonucleotide primer pair that is chemically-synthesized to have a specific sequence and a specific number of linked nucleosides. This term is meant to explicitly exclude nucleotides that are generated at random to yield a mixture of several compounds of the same length each with randomly generated sequence.
The primer pairs are designed to generate amplicons that are amenable to molecular mass analysis. Standard primer pair nomenclature is used herein, and includes naming of a reference sequence, hybridization coordinates, and other identifying information. For example, the forward primer for primer pair number 4578 is named DYS19_AC017019_RC—118941—118971_F. The reference sequence for this primer (referred to in the name) is the reverse complement of GenBank Accession Number: AC017019. The number range “118941—118971” indicates that the primer hybridizes to these nucleotide coordinates within the reference sequence. The “F” denotes that this particular primer is the forward primer of the pair. The “RC,” when present, indicates that the primer pair was designed using the reverse complement of the indicated GenBank sequence as the reference sequence. The beginning of the primer name refers to the locus, gene, or other nucleic acid region or feature to which the primer is targeted, and thus hybridizes within. The person skilled in the art will recognize that in order to design a primer pair which has a forward and a reverse primer which hybridize to opposite strands of a double stranded DNA in order to amplify the DNA, the forward primer is designed to hybridize to a sequence of a first strand while the reverse primer is designed to hybridize to the opposite strand. The information for designing the reverse primer is included in the first strand and is conveniently obtained by generating its “reverse complement.” Continuing with the example above, primer pair number 4578 has a forward primer (DYS19_AC017019-RC—118941—118971_F) which was designed to hybridize to a reference sequence represented by the reverse complement of GenBank Accession number AC017019 at a segment extending from position 118941 to 118971. 
Primer pair number 4578 has a reverse primer (DYS19_AC017019-RC—119096—119119_R) which is designed to hybridize to the reverse complement of the reference sequence at a segment extending from position 119096 to 119119. The primer names indicate that the primers are targeted to DYS19, a particular human STR locus. The primer pairs are selected and designed, however, to hybridize with two or more nucleic acids or nucleic acids from two or more individuals. The nomenclature is therefore used merely to provide a reference sequence, and not to indicate that the primers hybridize with and generate an amplification product only from the reference sequence. Further, the sequences of the primer members of the primer pairs are not necessarily fully complementary to the conserved region of the reference sequence. Rather, the sequences are designed to be “best fit” amongst a plurality of nucleic acids at these conserved binding sequences. Therefore, the primer members of the primer pairs have substantial complementarity with the conserved regions of the nucleic acids, including the reference sequence nucleic acid.
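The naming scheme described above (locus, reference accession, optional “RC” flag, hybridization coordinates, and F/R direction) can be decomposed mechanically. The following is a minimal sketch only: the field separators appear in more than one form in the names quoted above, so the pattern tolerates an underscore, a hyphen, or a dash between fields, and the type and function names are hypothetical.

```python
import re
from typing import NamedTuple


class PrimerName(NamedTuple):
    locus: str
    accession: str
    reverse_complement: bool  # True when the "RC" flag is present
    start: int
    end: int
    direction: str  # "F" (forward) or "R" (reverse)


# Assumed layout: LOCUS_ACCESSION[sep RC] sep START sep END _ F|R,
# where sep may be "_", "-" or a dash (U+2014).
_PRIMER_NAME = re.compile(
    r"^(?P<locus>[^_]+)_(?P<accession>[A-Za-z]+\d+)"
    r"(?:[_\-\u2014](?P<rc>RC))?"
    r"[_\-\u2014](?P<start>\d+)[_\-\u2014](?P<end>\d+)_(?P<direction>[FR])$"
)


def parse_primer_name(name: str) -> PrimerName:
    """Split a primer name into its descriptive fields."""
    match = _PRIMER_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognized primer name: {name!r}")
    return PrimerName(
        locus=match["locus"],
        accession=match["accession"],
        reverse_complement=match["rc"] is not None,
        start=int(match["start"]),
        end=int(match["end"]),
        direction=match["direction"],
    )
```

For the forward primer of primer pair number 4578, the parsed coordinates 118941 to 118971 imply a primer 31 nucleobases long.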
As is used herein, the term “substantial complementarity” means that a primer member or a primer pair comprises between about 70%-100%, or between about 80-100%, or between about 90-100%, or between about 95-100%, or between about 99-100% complementarity with the conserved binding sequence of a nucleic acid from an individual. Similarly, the primer pairs provided herein may comprise between about 70%-100%, or between about 80-100%, or between about 90-100%, or between about 95-100% identity, or between about 99-100% sequence identity with the primer pairs disclosed in Table 5. These ranges of complementarity and identity are inclusive of all whole or partial numbers embraced within the recited range numbers. For example, and not limitation, 75.667%, 82%, 91.2435% and 97% complementarity or sequence identity are all numbers that fall within the above recited range of 70% to 100%, therefore forming a part of this description. In some embodiments, any oligonucleotide primer pair may have one or both primers with less than 70% sequence homology with a corresponding member of any of the primer pairs of Table 5 if the primer pair has the capability of producing an amplification product corresponding to the desired STR-identifying amplicon.
In some embodiments, the oligonucleotide primers are 13 to 40 nucleobases in length (13 to 40 linked nucleotide residues). These embodiments comprise oligonucleotide primers 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39 or 40 nucleobases in length, or any range therewithin. The present invention contemplates using both longer and shorter primers. Furthermore, the primers may also be linked to one or more other desired moieties, including, but not limited to, affinity groups, ligands, regions of nucleic acid that are not complementary to the nucleic acid to be amplified, labels, etc. In other embodiments, any oligonucleotide primer pair may have one or both primers with a length greater than 40 nucleobases if the primer pair has the capability of producing an amplification product corresponding to the desired STR-identifying amplicon.
As used herein, the term “variable region” is used to describe a region that, in some embodiments, falls between the conserved regions to which primer pairs described herein hybridize. The primers described herein can be designed such that, when hybridized to the target, they flank variable regions. Variable regions possess distinct base compositions between two or more individuals or alleles, such that at least two alleles, nucleic acids from at least two individuals, or at least two nucleic acids can be resolved from one another by determining the base composition of the amplicon generated by the primers that flank such a variable region when bound, or in other words bind to sequence regions that flank the variable region. In one embodiment, the variable region comprises an STR locus. In one aspect, the variable region comprises a distinct base composition among two or more amplicons generated from two distinct alleles that comprise the same number of nucleotides, and are thus the same length. In one aspect, the base composition of the variable region differs only in sequence, and not in length among two or more alleles.
As used herein, the terms “amplicon” and “amplification product” refer to a nucleic acid generated or capable of generation using the primer pairs and methods described herein. In particular, “STR-identifying amplicons,” also called “STR-typing amplicons,” “STR-typing amplification products,” and “STR-identifying amplification products” are amplicons that can be used to determine the genotype (or identify the particular allele) for an individual nucleic acid at an STR locus. In some embodiments, the STR-typing amplicons are generated using in silico methods using electronic PCR and an electronic representation of primer pairs. The amplicons generated using in silico methods can be used to populate a database. The amplicon is preferably double stranded DNA; however, it can be RNA and/or DNA:RNA. The amplicon comprises the sequences of the conserved regions/primer pairs and the intervening variable region. As discussed herein, primer pairs are designed to generate amplicons from two or more alleles. The base composition of any given amplicon will include the primer pair, the complement of the primer pair, the conserved regions and the variable region from the nucleic acid that was amplified to generate the amplicon. One skilled in the art understands that the incorporation of the designed primer pair sequences into any amplicon will replace the native sequences at the primer binding site, and complement thereof. After amplification of the target region using the primers, the resultant amplicons, including the primer sequences, generate the molecular mass data. Amplicons having any native sequences at the primer binding sites, or complement thereof, are undetectable because of their low abundance. This is accounted for when identifying one or more nucleic acids from one or more alleles using any particular primer pair. The amplicon further comprises a length that is compatible with mass spectrometry analysis. 
STR-identifying amplicons (STR-typing amplicons) generate base composition signatures that are preferably unique to the identity of an STR allele.
Preferably, amplicons comprise from about 45 to about 200 consecutive nucleobases (i.e., from about 45 to about 200 linked nucleosides). One of ordinary skill in the art will appreciate that this range expressly embodies compounds of 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, and 200 nucleobases in length. One of ordinary skill in the art will further appreciate that the above range is not an absolute limit to the length of an amplicon, but instead represents a preferred length range. Amplicon lengths falling outside of this range are also included herein so long as the amplicon is amenable to calculation of a base composition signature as herein described. As used herein, the term “about” means encompassing plus or minus 10%. For example, the term “about 200 nucleotides” refers to a range encompassing between 180 and 220 nucleotides.
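The definition of “about” given above reduces to simple arithmetic, sketched here for illustration. The function name and the default of 10 percent expressed as a parameter are assumptions of this sketch.

```python
def about_range(value: float, percent: float = 10.0) -> tuple[float, float]:
    """Return the (low, high) bounds covered by 'about value' at plus or minus percent."""
    delta = value * percent / 100
    return value - delta, value + delta
```

With these assumptions, `about_range(200)` gives the 180 to 220 nucleotide range stated above, and `about_range(45)` gives 40.5 to 49.5.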
As used herein, the term “molecular mass” refers to the mass of a compound as determined using mass spectrometry. Herein, the compound is preferably a nucleic acid, more preferably a double stranded nucleic acid, still more preferably a double stranded DNA nucleic acid and is most preferably an amplicon. When the nucleic acid is double stranded the molecular mass is determined for both strands. Here, the strands are separated either before introduction into the mass spectrometer, or the strands are separated by the mass spectrometer (for example, electro-spray ionization will separate the hybridized strands). The molecular mass of each strand is measured by the mass spectrometer.
As used herein, the term “base composition” refers to the number of each residue comprising an amplicon, without consideration for the linear arrangement of these residues in the strand(s) of the amplicon. The amplicon residues comprise adenosine (A), guanosine (G), cytidine (C), (deoxy)thymidine (T), uracil (U), inosine (I), nitroindoles such as 5-nitroindole or 3-nitropyrrole, dP or dK (Hill et al.), an acyclic nucleoside analog containing 5-nitroindazole (Van Aerschot et al., Nucleosides and Nucleotides, 1995, 14, 1053-1056), the purine analog 1-(2-deoxy-β-D-ribofuranosyl)-imidazole-4-carboxamide, 2,6-diaminopurine, 5-propynyluracil, 5-propynylcytosine, phenoxazines, including G-clamp, 5-propynyl deoxy-cytidine, deoxy-thymidine nucleotides, 5-propynylcytidine, 5-propynyluridine and mass tag modified versions thereof, including 7-deaza-2′-deoxyadenosine-5′-triphosphate, 5-iodo-2′-deoxyuridine-5′-triphosphate, 5-bromo-2′-deoxyuridine-5′-triphosphate, 5-bromo-2′-deoxycytidine triphosphate, 5-iodo-2′-deoxycytidine-5′-triphosphate, 5-hydroxy-2′-deoxyuridine-5′-triphosphate, 4-thiothymidine-5′-triphosphate, 5-aza-2′-deoxyuridine-5′-triphosphate, 5-fluoro-2′-deoxyuridine-5′-triphosphate, O6-methyl-2′-deoxyguanosine-5′-triphosphate, N2-methyl-2′-deoxyguanosine-5′-triphosphate, 8-oxo-2′-deoxyguanosine-5′-triphosphate or thiothymidine-5′-triphosphate. In some embodiments, the mass-modified nucleobase comprises 15N or 13C or both 15N and 13C. Preferably, the non-natural nucleosides used herein include 5-propynyluracil, 5-propynylcytosine and inosine. Herein the base composition for an unmodified DNA amplicon is notated as AwGxCyTz, wherein w, x, y and z are each independently a whole number representing the number of said nucleoside residues in an amplicon. Base compositions for amplicons comprising modified nucleosides are similarly notated to indicate the number of said natural and modified nucleosides in an amplicon. 
Base compositions are calculated from a molecular mass measurement of an amplicon, as described below. The calculated base composition for any given amplicon is then compared to a database of base compositions. In one embodiment, the database comprises base compositions of STR-typing amplicons. A match between the calculated base composition and a single database entry reveals the identity of the target nucleic acid or a genotype of an individual.
As is used herein, the term “base composition signature” refers to the base composition generated by any one particular amplicon.
As used herein, the term “database” is used to refer to a collection of base composition or molecular mass data. The base composition and/or molecular mass data in the database is indexed to specific individuals (subjects), alleles, or reference alleles and also to specific STR-identifying amplicons and primer pairs. In one embodiment, the data are indexed to particular STR loci. As used herein, a “reference allele” is an allele comprised in a database that has been previously determined to have a certain base composition, length, molecular mass, size and/or genotype. The reference allele may be indexed to primer pairs and amplicons provided herein. The base composition data reported in the database comprises the number of each nucleoside in an amplicon that would be generated for each allele or individual using each primer. The database can be populated by empirical data. In this aspect of populating the database, a nucleic acid with a particular allele or from a particular individual is selected and a primer pair is used to generate an amplicon. The molecular mass of the amplicon is determined using a mass spectrometer and the base composition calculated therefrom. An entry in the database is made to associate the base composition with the allele or individual and the primer pair used. The database may also be populated using other databases comprising allele or individual nucleic acid information. For example, using the GenBank database it is possible to perform electronic PCR using an electronic representation of a primer pair. Databases can be populated from other databases, such as FBI databases. This in silico method will provide the base composition for any or all selected allele(s) and/or individuals stored in the database. The information is then used to populate the base composition database as described above. A base composition database can be in silico, a written table, a reference book, a spreadsheet or any form generally amenable to databases. 
Preferably, it is in silico.
As used herein, the term “nucleobase” is synonymous with other terms in use in the art including “nucleotide,” “deoxynucleotide,” “nucleotide residue,” “deoxynucleotide residue,” “nucleotide triphosphate (NTP),” or “deoxynucleotide triphosphate (dNTP).” As is used herein, a nucleobase includes natural and modified residues, as described herein.
As used herein, a “wobble base” is a variation in a codon found at the third nucleotide position of a DNA triplet. Variations in conserved regions of sequence are often found at the third nucleotide position due to redundancy in the amino acid code.
The terms “homology,” “homologous” and “sequence identity” refer to a degree of identity. There may be partial homology or complete homology. A partially homologous sequence is one that is less than 100% identical to another sequence. Determination of sequence identity is described in the following example: a primer 20 nucleobases in length which is otherwise identical to another 20 nucleobase primer but having two non-identical residues has 18 of 20 identical residues (18/20=0.9 or 90% sequence identity). In another example, a primer 15 nucleobases in length having all residues identical to a 15 nucleobase segment of a primer 20 nucleobases in length would have 15/20=0.75 or 75% sequence identity with the 20 nucleobase primer. In context of the present invention, sequence identity is meant to be properly determined when the query sequence and the subject sequence are both described and aligned in the 5′ to 3′ direction. Sequence alignment algorithms such as BLAST will return results in two different alignment orientations. In the Plus/Plus orientation, both the query sequence and the subject sequence are aligned in the 5′ to 3′ direction. On the other hand, in the Plus/Minus orientation, the query sequence is in the 5′ to 3′ direction while the subject sequence is in the 3′ to 5′ direction. It should be understood that with respect to the primers of the present invention, sequence identity is properly determined when the alignment is designated as Plus/Plus. Sequence identity may also encompass alternate or “modified” nucleobases that perform in a functionally similar manner to the regular nucleobases adenine, thymine, guanine and cytosine with respect to hybridization and primer extension in amplification reactions. 
In a non-limiting example, if the 5-propynyl pyrimidines propyne C and/or propyne T replace one or more C or T residues in one primer which is otherwise identical to another primer in sequence and length, the two primers will have 100% sequence identity with each other. In another non-limiting example, Inosine (I) may be used as a replacement for G or T and effectively hybridize to C, A or U (uracil). Thus, if inosine replaces one or more G or T residues in one primer which is otherwise identical to another primer in sequence and length, the two primers will have 100% sequence identity with each other. Other such modified or universal bases may exist which would perform in a functionally similar manner for hybridization and amplification reactions and will be understood to fall within this definition of sequence identity.
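The two worked sequence identity examples above (18/20 and 15/20), together with the inosine example, can be reproduced with a short sketch. It assumes an ungapped comparison with both sequences written 5′ to 3′ (the Plus/Plus orientation), divides the number of identical residues by the length of the longer primer, and treats inosine (I) as identical to G or T; the function names are hypothetical and this is not a substitute for an alignment tool such as BLAST.

```python
def residues_match(x: str, y: str) -> bool:
    """Treat inosine (I) as identical to G or T, per the example above."""
    if x == y:
        return True
    return (x == "I" and y in "GT") or (y == "I" and x in "GT")


def percent_identity(query: str, subject: str) -> float:
    """Best ungapped percent identity, both sequences written 5' to 3'."""
    shorter, longer = sorted((query, subject), key=len)
    best = 0
    # Slide the shorter primer along the longer one and keep the best overlap.
    for offset in range(len(longer) - len(shorter) + 1):
        window = longer[offset : offset + len(shorter)]
        matches = sum(residues_match(a, b) for a, b in zip(shorter, window))
        best = max(best, matches)
    return 100.0 * best / len(longer)
```

Under these assumptions, a 20-mer differing from another 20-mer at two positions scores 90%, and a 15-mer identical to a segment of a 20-mer scores 75%.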
As used herein, “triangulation identification” means the employment of more than one primer pair, two or more primer pairs, three or more primer pairs, or a plurality of primer pairs to generate amplicons necessary for the identification or typing of a nucleic acid or individual. The more than one primer pair can be used in individual wells or in a multiplex PCR assay. In a “multiplex” assay, the methods provided herein are performed with two or more primer pairs simultaneously. Alternatively, a PCR reaction may be carried out in single wells comprising a different primer pair in each well. Following amplification the amplicons are pooled into a single well or container which is then subjected to molecular mass analysis. The combination of pooled amplicons can be chosen such that the expected ranges of molecular masses of individual amplicons are not overlapping and thus will not complicate identification of signals. Triangulation works as a process of elimination, wherein a first primer pair identifies that an unknown allele may be one of a group of alleles. Subsequent primer pairs are used in triangulation identification to further refine the identity of the allele amongst the subset of possibilities generated with the earlier primer pair. Triangulation identification is complete when the identity of the allele is determined. The triangulation identification process is also used to reduce false negative and false positive signals. Alternatively, if more than one primer pair are used in a multiplex assay, the combination of amplicons are generated simultaneously and can be analyzed simultaneously, comparing the multiple resultant molecular masses or base compositions to multiple amplicons in a database that are indexed to the different primer pairs used in the multiplex assay.
Provided herein are methods and compositions directed to unbiased forensic analysis and identity testing, including STR typing of samples comprising nucleic acids, using amplicons and ESI-MS to determine mass and base composition. The methods herein provide substantial accuracy to yield an unambiguous base composition (i.e. the number of A's, G's, C's and T's) which in turn can be used to derive a DNA profile for an individual. Importantly, these base composition profiles can be referenced to existing forensics databases derived from STR or other forensic marker profiles and/or can be added to such databases. Because the methods use molecular mass and base compositions to derive specific alleles, the methods and compositions provided herein are capable of detecting SNPs within STR regions that go undetected by conventional electrophoretic STR-typing analyses. For example, not all instances of “allele type 13” for the DYS389II STR locus are equivalent. A particular individual may contain an A to G (A→G) SNP, which distinguishes this individual from individuals containing the normal allele type 13 (see for example, sample JT51471 in the first row of Table 9A). Such an example of a SNP within an STR locus would not be expected to be detected by standard STR-typing methods and kits that use electrophoretic size discrimination to resolve STR alleles.
In a preferred embodiment, the amplicons are STR-identifying amplicons or STR-identifying amplification products. In this embodiment, primers are selected to hybridize to conserved sequence regions of nucleic acids, which flank a variable nucleic acid sequence region, derived from the samples to yield an STR-typing amplicon that can be amplified and is amenable to molecular mass determination. A base composition is calculated from the molecular mass, which indicates the number of each nucleotide in the amplicon. The molecular mass or corresponding base composition or base composition signature of the amplicon is then compared to a database comprising molecular masses or base composition signatures that are indexed to alleles and/or individuals and the primer pair that was used to generate the amplicon. A match of the determined molecular mass or calculated base composition to a molecular mass or base composition in the database associates the nucleic acid from the sample with an allele or individual indexed in the database. In some cases, the nucleic acid from the sample or a particular allele associates with more than one individual or identity. In these cases, one or more additional primer pairs are used either subsequently or simultaneously to generate one or more additional amplicons. The mass and base composition of the one or more additional amplicons are determined/calculated and the methods provided herein are used to compare the results to a database and further characterize and preferably identify the sample. This type of analysis can be carried out as described herein using triangulation, or using multiplex assays. The present method provides rapid throughput analysis and does not require nucleic acid sequencing for identification of nucleic acids from samples.
In one embodiment, the method is carried out with two or more primer pairs in a multiplex reaction. In one aspect, when the method is carried out in a multiplex reaction, it may be advantageous to use PCR reagents with high magnesium concentrations, for example, 3 mM magnesium chloride. As is known in the art, such reagents favor adenylation of amplification products. In one embodiment, it is advantageous to minimize split-peak results that can occur when there is adenylation of only a fraction of the amplification products in the sample, for example, generation of a fraction of the amplification products with a slightly different length than other products. Thus, in a preferred aspect, it is desired to promote full or about full adenylation. In one aspect, the primer pairs are configured so as to promote full adenylation such that one or both of the forward and reverse primer comprises a C or a G nucleobase at the 5′ end. Temperatures in the cycle reaction may also be adjusted to promote full adenylation while retaining efficacy, for example, by using an annealing temperature of about 61 degrees C.
In some embodiments, amplicons amenable to molecular mass determination which are produced by the primers described herein are either of a length, size or mass compatible with the particular mode of molecular mass determination or compatible with a means of providing a predictable fragmentation pattern in order to obtain predictable fragments of a length compatible with the particular mode of molecular mass determination. Such means of providing a predictable fragmentation pattern of an amplicon include, but are not limited to, cleavage with restriction enzymes or cleavage primers, for example. Thus, in some embodiments, amplicons are larger than 200 nucleobases and are amenable to molecular mass determination following restriction digestion. Methods of using restriction enzymes and cleavage primers are well known to those with ordinary skill in the art.
In some embodiments, amplicons are obtained using the polymerase chain reaction (PCR) which is a routine method to those with ordinary skill in the molecular biology arts. In some embodiments, the PCR is catalyzed by a polymerase enzyme whose function is modified relative to a native polymerase. In some embodiments the modified polymerase enzyme is exo(−) Pfu polymerase which catalyzes the addition of nucleotide residues to staggered restriction digest products to convert the staggered digest products to blunt-ended digest products. Other amplification methods may be used such as ligase chain reaction (LCR), low-stringency single primer PCR, and multiple strand displacement amplification (MDA). These methods are also known to those with ordinary skill (Michael, S F., Biotechniques 1994, 16, 411-412 and Dean et al., Proc. Natl. Acad. Sci. U.S.A. 2002, 99, 5261-5266).
Mass spectrometry (MS)-based detection of PCR products provides a means for determination of BCS which has several advantages. MS is intrinsically a parallel detection scheme without the need for radioactive or fluorescent labels, since every amplification product is identified by its molecular mass. The current state of the art in mass spectrometry is such that less than femtomole quantities of material can be readily analyzed to afford information about the molecular contents of the sample. An accurate assessment of the molecular mass of the material can be quickly obtained, irrespective of whether the molecular weight of the sample is several hundred, or in excess of one hundred thousand atomic mass units (amu) or Daltons. Intact molecular ions can be generated from amplification products using one of a variety of ionization techniques to convert the sample to gas phase. These ionization methods include, but are not limited to, electrospray ionization (ESI), matrix-assisted laser desorption ionization (MALDI) and fast atom bombardment (FAB). For example, MALDI of nucleic acids, along with examples of matrices for use in MALDI of nucleic acids, are described in WO 98/54751. The accurate measurement of molecular mass for large DNAs is limited by the adduction of cations from the PCR reaction to each strand, resolution of the isotopic peaks from natural abundance ¹³C and ¹⁵N isotopes, and assignment of the charge state for any ion. The cations are removed by in-line dialysis using a flow-through chip that brings the solution containing the PCR products into contact with a solution containing ammonium acetate in the presence of an electric field gradient orthogonal to the flow. The latter two problems are addressed by operating with a resolving power of >100,000 and by incorporating isotopically depleted nucleotide triphosphates into the DNA. The resolving power of the instrument is also a consideration.
At a resolving power of 10,000, the modeled signal from the [M-14H+]¹⁴⁻ charge state of an 84-mer PCR product is poorly characterized and assignment of the charge state or exact mass is impossible. At a resolving power of 33,000, the peaks from the individual isotopic components are visible. At a resolving power of 100,000, the isotopic peaks are resolved to the baseline and assignment of the charge state for the ion is straightforward. The [¹³C, ¹⁵N]-depleted triphosphates are obtained, for example, by growing microorganisms on depleted media and harvesting the nucleotides (Batey et al., Nucl. Acids Res., 1992, 20, 4515-4523).
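The dependence on resolving power can be illustrated with a rough calculation. The sketch below assumes a neutral mass of about 26,000 Da for an 84-mer strand (a hypothetical round figure, not a value from this disclosure), takes the peak width as m/z divided by resolving power, and treats isotopic peaks as distinguishable when that width is smaller than the isotope spacing of roughly 1.00335/z. This is a simplified criterion rather than a full peak-shape model.

```python
ISOTOPE_SPACING_DA = 1.00335  # approximate 13C - 12C mass difference
PROTON_MASS_DA = 1.00728


def mz_of_anion(neutral_mass: float, charge: int) -> float:
    """m/z of an [M - zH]^z- ion."""
    return (neutral_mass - charge * PROTON_MASS_DA) / charge


def isotopes_resolved(neutral_mass: float, charge: int, resolving_power: float) -> bool:
    """True when the peak width (FWHM) is narrower than the isotopic peak spacing."""
    fwhm = mz_of_anion(neutral_mass, charge) / resolving_power
    spacing = ISOTOPE_SPACING_DA / charge
    return fwhm < spacing


ASSUMED_84MER_MASS = 26000.0  # hypothetical value for illustration only
for rp in (10_000, 33_000, 100_000):
    print(rp, isotopes_resolved(ASSUMED_84MER_MASS, 14, rp))
```

Under these assumptions the peak width exceeds the isotope spacing at a resolving power of 10,000 and falls below it at 33,000 and 100,000, which is in line with the qualitative behavior described above.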
While mass measurements of intact nucleic acid regions are believed to be adequate, tandem mass spectrometry (MSⁿ) techniques may provide more definitive information pertaining to molecular identity or sequence. Tandem MS involves the coupled use of two or more stages of mass analysis where both the separation and detection steps are based on mass spectrometry. The first stage is used to select an ion or component of a sample from which further structural information is to be obtained. The selected ion is then fragmented using, e.g., blackbody irradiation, infrared multiphoton dissociation, or collisional activation. For example, ions generated by electrospray ionization (ESI) can be fragmented using IR multiphoton dissociation. This activation leads to dissociation of glycosidic bonds and the phosphate backbone, producing two series of fragment ions, called the w-series (having an intact 3′ terminus and a 5′ phosphate following internal cleavage) and the a-Base series (having an intact 5′ terminus and a 3′ furan).
The second stage of mass analysis is then used to detect and measure the mass of these resulting fragments of product ions. Such ion selection followed by fragmentation routines can be performed multiple times so as to essentially completely dissect the molecular sequence of a sample.
If there are two or more targets of similar molecular mass, or if a single amplification reaction results in a product which has the same mass as two or more reference standards, they can be distinguished by using mass-modifying “tags.” Such an oligonucleotide is said to be mass-modified. In this embodiment, a nucleotide analog or “tag” is incorporated during amplification (e.g., a 5-(trifluoromethyl)deoxythymidine triphosphate) which has a different molecular weight than the unmodified base so as to improve distinction of masses. Such tags are described in, for example, WO 97/33000, which is incorporated herein by reference in its entirety. This further limits the number of possible base compositions consistent with any mass. For example, 5-(trifluoromethyl)deoxythymidine triphosphate can be used in place of dTTP in a separate nucleic acid amplification reaction. Measurement of the mass shift between a conventional amplification product and the tagged product is used to quantitate the number of thymidine nucleotides in each of the single strands. Because the strands are complementary, the number of adenosine nucleotides in each strand is also determined.
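The mass-shift counting described above amounts to a single division. The sketch below assumes that the tagged nucleotide differs from thymidine only by a CF3 group in place of the 5-methyl group, giving a shift of roughly 53.97 Da per substituted residue from standard atomic masses. That per-residue value, the function name and the example masses are illustrative assumptions and not figures from this disclosure.

```python
# Assumed per-residue shift: three F atoms replace three H atoms (CF3 vs CH3).
CF3_VS_CH3_SHIFT_DA = 3 * (18.9984 - 1.00794)


def count_tagged_residues(unmodified_mass: float, tagged_mass: float,
                          shift_per_residue: float = CF3_VS_CH3_SHIFT_DA) -> int:
    """Estimate how many residues carry the mass tag from the overall mass shift."""
    return round((tagged_mass - unmodified_mass) / shift_per_residue)


# Hypothetical strand masses: a tagged strand carrying 28 substituted residues.
untagged = 30000.0
tagged = untagged + 28 * 53.97
print(count_tagged_residues(untagged, tagged))
```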
In contrast to the mass tag approach, in a preferred embodiment mass-modified dNTPs are employed to further limit the number of base pair combinations and also to resolve SNPs that are not resolvable when using unmodified dNTPs.
In another amplification reaction, the number of G and C residues in each strand is determined using, for example, the cytidine analog 5-methylcytosine (5-meC) or 5-propynylcytosine (propyne C). The combination of the A/T reaction and G/C reaction, followed by molecular weight determination, provides a unique base composition. This method is summarized in Table 1.
In the example shown in Table 1, the mass tag phosphorothioate A (A*) was used to distinguish a Bacillus anthracis cluster. The B. anthracis (A14G9C14T9) had an average MW of 14072.26, the B. anthracis (A1A*13G9C14T9) had an average molecular weight of 14281.11, and the phosphorothioate A had an average molecular weight of +16.06, as determined by ESI-TOF MS.
In another example, assume the measured molecular masses of each strand are 30,000.115 Da and 31,000.115 Da respectively, and the measured number of dT and dA residues are (30, 28) and (28, 30). If the molecular mass is accurate to 100 ppm, there are 7 possible combinations of dG+dC possible for each strand. However, if the measured molecular mass is accurate to 10 ppm, there are only 2 combinations of dG+dC, and at 1 ppm accuracy there is only one possible base composition for each strand.
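The effect of mass accuracy on the number of admissible compositions can be explored with a brute-force enumeration. The sketch below is illustrative only: it uses approximate monoisotopic residue masses for the four deoxynucleotides, adds one water for the strand termini as a simplifying assumption, and fixes the dA and dT counts as if they had already been measured. It is not intended to reproduce the specific counts quoted above, and all names and the example composition are hypothetical.

```python
# Approximate monoisotopic masses (Da) of nucleotide residues within a DNA strand.
RESIDUE_MASS = {"A": 313.0576, "C": 289.0464, "G": 329.0525, "T": 304.0460}
TERMINI_MASS = 18.0106  # simplifying assumption: one water for the strand ends


def strand_mass(n_a: int, n_g: int, n_c: int, n_t: int) -> float:
    return (n_a * RESIDUE_MASS["A"] + n_g * RESIDUE_MASS["G"]
            + n_c * RESIDUE_MASS["C"] + n_t * RESIDUE_MASS["T"] + TERMINI_MASS)


def gc_candidates(measured_mass: float, n_a: int, n_t: int,
                  ppm: float, max_count: int = 200):
    """List (G, C) counts whose strand mass lies within +/- ppm of the measured mass."""
    tolerance = measured_mass * ppm * 1e-6
    hits = []
    for n_g in range(max_count + 1):
        for n_c in range(max_count + 1):
            if abs(strand_mass(n_a, n_g, n_c, n_t) - measured_mass) <= tolerance:
                hits.append((n_g, n_c))
    return hits


# Hypothetical strand: 30 dA, 28 dT, 20 dG, 20 dC.
measured = strand_mass(30, 20, 20, 28)
for ppm in (100.0, 10.0, 1.0):
    print(ppm, gc_candidates(measured, 30, 28, ppm))
```

Tightening the tolerance can only shrink the candidate list, which is the point made in the paragraph above. How many candidates survive at a given accuracy depends on the strand mass and on the range of compositions searched.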
Signals from the mass spectrometer may be input to a maximum-likelihood detection and classification algorithm such as is widely used in radar signal processing. Processing may end with a Bayesian classifier using log likelihood ratios developed from the observed signals and average background levels. Background signal strengths are estimated and used along with the matched filters to form signatures which are then subtracted. The maximum likelihood process is applied to this “cleaned up” data in a similar manner employing matched filters and a running-sum estimate of the noise-covariance for the cleaned up data.
In some embodiments, the DNA analyzed is human DNA obtained from forensic samples, for example, human saliva, hair, blood, or nail.
Embodiments provided herein comprise primer pairs which are designed to bind to highly conserved sequence regions of DNA. In some embodiments, the conserved sequence regions flank an intervening variable region such as the variable sections found within STR regions and yield amplification products which ideally provide enough variability to provide a forensic conclusion, and which are amenable to molecular mass analysis. By the term “highly conserved,” it is meant that the sequence regions exhibit from about 80 to 100%, or from about 90 to 100%, or from about 95 to 100% identity, or from about 80 to 99%, or from about 90 to 99%, or from about 95 to 99% identity. The molecular mass of a given amplification product provides a means of drawing a forensic conclusion due to the variability of the variable region. Thus, design of primers involves selection of a variable section with optimal variability in the DNA of different individuals.
The primer pairs are configured to produce an amplification product of an STR locus. The amplification product duplicates the sequence of the known STR allele or the previously unknown STR allele. Each member of the one or more primer pairs has at least 70%, at least 80%, at least 90%, at least 95% or at least 100% sequence identity with a corresponding member of one or more primer pairs selected from the group consisting of: SEQ ID NOs: 16:28, 51:17, 45:60, 10:27, 42:27, 10:35, 24:46, 23:15, 23:5, 24:47, 59:20, 21:49, 59:49, 39:68, 32:50, 19:13, 19:48, 70:57, 26:11, 53:29, 25:18, 69:18, 1:43, 63:54, 67:12, 62:64, 65:44, 36:14, 8:14, 38:61, 36:37, 7:56, 71:41, 22:6, 71:9, 3:58, 2:40, 4:52, 2:52, 62:55, 33:31, 34:30, 73:74, 42:66 and 72:67. In some embodiments, the STR locus is a Y-STR locus (located on a human Y chromosome).
In some embodiments, the conserved sequence region of DNA to which the primer pairs hybridize flank STR loci. Preferably, the STR loci are in a group of core “DYS” loci which include but are not limited to DYS393, DYS19, DYS391, DYS385a/b, DYS390, DYS392, DYS437, DYS438, DYS439, DYS389I, and DYS389II.
In one embodiment, the STR locus comprises DYS393. In one aspect each member of the primer pair has at least 70%, at least 80%, at least 90%, at least 95% or at least 100% sequence identity with the sequence of the corresponding member of a primer pair represented by any one or more of the following SEQ ID NOs: 1:43, 63:54, 67:12, 62:64, 62:55, 33:31 and 34:30.
In one embodiment, the STR locus comprises DYS19. In one aspect each member of the primer pair has at least 70%, at least 80%, at least 90%, at least 95% or at least 100% sequence identity with the sequence of the corresponding member of a primer pair represented by any one or more of the following SEQ ID NOs: 16:28, 51:17 and 45:60.
In one embodiment, the STR locus comprises DYS391. In one aspect each member of the primer pair has at least 70%, at least 80%, at least 90%, at least 95% or at least 100% sequence identity with the sequence of the corresponding member of a primer pair represented by any one or more of the following SEQ ID NOs: 32:50, 19:13, 19:48, and 70:57.
In one embodiment, the STR locus comprises DYS385a/b. In one aspect each member of the primer pair has at least 70%, at least 80%, at least 90%, at least 95% or at least 100% sequence identity with the sequence of the corresponding member of a primer pair represented by any one or more of the following SEQ ID NOs: 10:27, 42:27, 10:35, 42:66 and 72:67.
In one embodiment, the STR locus comprises DYS390. In one aspect each member of the primer pair has at least 70%, at least 80%, at least 90%, at least 95% or at least 100% sequence identity with the sequence of the corresponding member of a primer pair represented by any one or more of the following SEQ ID NOs: 59:20, 21:49, 59:49, 39:68 and 73:74.
In one embodiment, the STR locus comprises DYS392. In one aspect each member of the primer pair has at least 70%, at least 80%, at least 90%, at least 95% or at least 100% sequence identity with the sequence of the corresponding member of a primer pair represented by any one or more of the following SEQ ID NOs: 26:11, 53:29, 25:18, and 69:18.
In one embodiment, the STR locus comprises DYS437. In one aspect each member of the primer pair has at least 70%, at least 80%, at least 90%, at least 95% or at least 100% sequence identity with the sequence of the corresponding member of a primer pair represented by any one or more of the following SEQ ID NOs: 65:44, 36:14, 8:14, 38:61, and 36:37.
In one embodiment, the STR locus comprises DYS438. In one aspect each member of the primer pair has at least 70%, at least 80%, at least 90%, at least 95% or at least 100% sequence identity with the sequence of the corresponding member of a primer pair represented by any one or more of the following SEQ ID NOs: 7:56, 71:41, 22:6, and 71:9.
In one embodiment, the STR locus comprises DYS439. In one aspect each member of the primer pair has at least 70%, at least 80%, at least 90%, at least 95% or at least 100% sequence identity with the sequence of the corresponding member of a primer pair represented by any one or more of the following SEQ ID NOs: 3:58, 2:40, 4:52, and 2:52.
In one embodiment, the STR locus comprises DYS389I. In one aspect each member of the primer pair has at least 70%, at least 80%, at least 90%, at least 95% or at least 100% sequence identity with the sequence of the corresponding member of a primer pair represented by one or both of SEQ ID NOs: 23:15, and 23:5.
In one embodiment, the STR locus comprises DYS389II. In one aspect each member of the primer pair has at least 70%, at least 80%, at least 90%, at least 95% or at least 100% sequence identity with the sequence of the corresponding member of the primer pair represented by SEQ ID NOs: 24:47.
In another embodiment, the primer pairs are combined and used in one or more multiplex reactions to generate an allelic profile for a sample obtained from an individual with the objective of identifying the individual. One aspect of this multiplex embodiment is configured to analyze 11 loci in four separate reactions comprising a five-plex reaction, a four-plex reaction and two single-plex reactions.
One aspect of this embodiment is configured, for example, with primer pairs targeting DYS389I, DYS392, DYS391, DYS393 and DYS390 in a five-plex reaction; primer pairs targeting DYS389II, DYS438, DYS439 and DYS437 in a four-plex reaction; a primer pair targeting DYS19 in a first single-plex reaction; and a primer pair targeting DYS385a/b in a second single-plex reaction. In this embodiment, 22 samples may be analyzed on a single 96-well plate which also includes four positive and four negative PCR control wells (22 samples × 4 reactions = 88 wells, plus 8 control wells).
Ideally, primer hybridization sites are highly conserved in order to facilitate the hybridization of the primer. In cases where primer hybridization is less efficient due to lower levels of conservation of sequence, the primers provided herein can be chemically modified to improve the efficiency of hybridization. For example, because any variation (due to codon wobble in the 3rd position) in these conserved regions among species is likely to occur in the third position of a DNA triplet, oligonucleotide primers can be designed such that the nucleotide corresponding to this position is a base which can bind to more than one nucleotide, referred to herein as a “universal base.” For example, under this “wobble” pairing, inosine (I) binds to U, C or A; guanine (G) binds to U or C, and uridine (U) binds to U or C. Other examples of universal bases include nitroindoles such as 5-nitroindole or 3-nitropyrrole (Loakes et al., Nucleosides and Nucleotides, 1995, 14, 1001-1003), the degenerate nucleotides dP or dK (Hill et al.), an acyclic nucleoside analog containing 5-nitroindazole (Van Aerschot et al., Nucleosides and Nucleotides, 1995, 14, 1053-1056) or the purine analog 1-(2-deoxy-beta-D-ribofuranosyl)-imidazole-4-carboxamide (Sala et al., Nucl. Acids Res., 1996, 24, 3302-3306).
In another embodiment, to compensate for the somewhat weaker binding by the “wobble” base, the oligonucleotide primers are designed such that the first and second positions of each triplet are occupied by nucleotide analogs which bind with greater affinity than the unmodified nucleotide. Examples of these analogs include, but are not limited to, 2,6-diaminopurine which binds to thymine, propyne T (5-propynyluridine) which binds to adenine and propyne C (5-propynylcytidine) and phenoxazines, including G-clamp, which binds to G. Propynylated pyrimidines are described in U.S. Pat. Nos. 5,645,985, 5,830,653 and 5,484,908, each of which is commonly owned and incorporated herein by reference in its entirety. Propynylated primers are claimed in U.S. Ser. No. 10/294,203 which is also commonly owned and incorporated herein by reference in its entirety. Phenoxazines are described in U.S. Pat. Nos. 5,502,177, 5,763,588, and 6,005,096, each of which is incorporated herein by reference in its entirety. G-clamps are described in U.S. Pat. Nos. 6,007,992 and 6,028,183, each of which is incorporated herein by reference in its entirety. Thus, in other embodiments, the primer pair has at least one modified nucleobase such as 5-propynylcytidine or 5-propynyluridine.
Also provided herein are isolated DNA amplicons which are produced by the process of amplification of a sample of DNA with any of the above-mentioned primers.
While the methods, compounds and compositions provided herein have been described with specificity in accordance with certain of their embodiments, the following examples serve only to illustrate the invention and are not intended to limit the same. The examples provided are only examples, and one skilled in the art will understand that other techniques can be used and that such different techniques will not depart from the spirit of the invention (T. Maniatis et al., in Molecular Cloning: A Laboratory Manual, CSH Lab., N.Y. (2001)).
General Genomic DNA Sample Prep Protocol: Raw samples were filtered using Supor-200 0.2 μm membrane syringe filters (VWR International). Samples were transferred to 1.5 ml Eppendorf tubes pre-filled with 0.45 g of 0.7 mm Zirconia beads followed by the addition of 350 μl of ATL buffer (Qiagen, Valencia, Calif.). The samples were subjected to bead beating for 10 minutes at a frequency of 19 1/s in a Retsch Vibration Mill (Retsch). After centrifugation, samples were transferred to an S-block plate (Qiagen, Valencia, Calif.) and DNA isolation was completed with a BioRobot 8000 nucleic acid isolation robot (Qiagen, Valencia, Calif.).
Isolation of Blood DNA: Blood DNA was isolated using an MDx Biorobot according to the manufacturer's recommended procedure (Isolation of blood DNA on Qiagen QIAamp® DNA Blood BioRobot® MDx Kit, Qiagen, Valencia, Calif.). In some cases, DNA from blood punches was processed with a Qiagen QIAamp DNA mini kit using the manufacturer's suggested protocol for dried blood spots.
Isolation of Buccal Swab DNA: Since the manufacturer does not support a full robotic swab protocol, the blood DNA isolation protocol was employed after each swab was first suspended in 400 μl PBS+400 μl Qiagen AL buffer+20 μl Qiagen Protease solution in 14 ml round-bottom falcon tubes, which were then loaded into the tube holders on the MDx robot.
Isolation of DNA from Nails and Hairs: The following procedure employs a Qiagen DNeasy® tissue kit and represents a modification of the manufacturer's suggested procedure: hairs or nails were cut into small segments with sterile scissors or razorblades and placed in a centrifuge tube to which was added 1 ml of sonication wash buffer (10 mM TRIS-Cl, pH 8.0+10 mM EDTA+0.5% Tween-20). The solution was sonicated for 20 minutes to dislodge debris and then washed 2× with 1 ml ultrapure double deionized water before addition of 100 μl of Buffer X1 (10 mM TRIS-Cl, pH 8.0+10 mM EDTA+100 mM NaCl+40 mM DTT+2% SDS+250 μg/ml Qiagen proteinase K). The sample was then incubated at 55° C. for 1-2 hours, after which 200 μl of Qiagen AL buffer and 210 μl isopropanol were added and the solution was mixed by vortexing. The sample was then added to a Qiagen DNeasy mini spin column placed in a 2 ml collection tube and centrifuged for 1 min at 6000 g (8000 rpm). Collection tube and flow-through were discarded. The spin column was transferred to a new collection tube and 500 μl of buffer AW2 was added before centrifuging for 3 min. at 20,000 g (14,000 rpm) to dry the membrane. For elution, 50-100 μl of buffer AE was pipetted directly onto the DNeasy membrane and eluted by centrifugation (6000 g, 8000 rpm) after incubation at room temperature for 1 min.
Amplification by PCR: An exemplary PCR procedure for amplification of DNA is the following: A 50 μl total volume reaction mixture contained 1× GeneAmp® PCR buffer II (Applied Biosystems; 10 mM TRIS-Cl, pH 8.3 and 50 mM KCl), 1.5 mM MgCl2, 400 mM betaine, 200 μM of each dNTP (Stratagene 200415), 250 nM of each primer, 2.5-5 units of Pfu exo(−) polymerase Gold (Stratagene 600163) and at least 50 pg of template DNA. All PCR solution mixing was performed under a HEPA-filtered positive pressure PCR hood. An example of a programmable PCR cycling profile is as follows: 95° C. for 10 minutes, followed by 8 cycles of 95° C. for 20 sec, 62° C. for 20 sec, and 72° C. for 30 sec, wherein the 62° C. annealing step is decreased by 1° C. on each successive cycle of the 8 cycles, followed by 28 cycles of 95° C. for 20 sec, 55° C. for 20 sec, and 72° C. for 30 sec, followed by holding at 4° C. For multiplex reactions, in a preferred embodiment, PCR is carried out using the Qiagen Multiplex PCR kit and buffers therein (Qiagen, Valencia, Calif.), which comprises 3 mM MgCl2. 1 ng template DNA and 200 nM of each primer are used for a 40 μL reaction volume. The cycle conditions for an exemplary multiplex reaction are:
Development and optimization of PCR reactions is routine to one with ordinary skill in the art and can be accomplished without undue experimentation.
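The exemplary single-primer-pair cycling profile described above (an initial denaturation, eight touchdown cycles, 28 fixed-temperature cycles and a final hold) can be written out step by step. The sketch below only restates that profile as data. The function name and the tuple layout are hypothetical, and the annealing temperature is taken to start at 62° C. in the first touchdown cycle.

```python
def touchdown_profile():
    """Return (step, temperature in deg C, duration in seconds) for the exemplary profile."""
    steps = [("initial denaturation", 95.0, 600)]
    # Eight touchdown cycles: annealing starts at 62 C and drops 1 C per cycle.
    for cycle in range(8):
        steps += [("denature", 95.0, 20),
                  ("anneal", 62.0 - cycle, 20),
                  ("extend", 72.0, 30)]
    # 28 further cycles at a fixed 55 C annealing temperature.
    for _ in range(28):
        steps += [("denature", 95.0, 20),
                  ("anneal", 55.0, 20),
                  ("extend", 72.0, 30)]
    steps.append(("hold", 4.0, None))  # held until the plate is removed
    return steps


profile = touchdown_profile()
touchdown_anneals = [temp for name, temp, _ in profile if name == "anneal"][:8]
programmed_seconds = sum(d for _, _, d in profile if d is not None)
print(touchdown_anneals)
print(programmed_seconds / 60, "minutes, excluding ramp times and the final hold")
```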
Procedure for Semi-automated Purification of a PCR mixture using Commercially Available ZipTips®—As described by Jiang and Hofstadler (Y. Jiang and S. A. Hofstadler Anal. Biochem. 2003, 316, 50-57) an amplified nucleic acid mixture can be purified by commercially available pipette tips containing anion exchange resin. For pre-treatment of ZipTips® AX (Millipore Corp. Bedford, Mass.), the following steps were programmed to be performed by an Evolution™ P3 liquid handler (Perkin Elmer) with fluids being drawn from stock solutions in individual wells of a 96-well plate (Marshall Bioscience): loading of a rack of ZipTips®AX; washing of ZipTips®AX with 15 μl of 10% NH4OH/50% methanol; washing of ZipTips® AX with 15 μl of water 8 times; washing of ZipTips® AX with 15 μl of 100 mM NH4OAc.
For purification of a PCR mixture, 20 μl of crude PCR product was transferred to individual wells of a MJ Research plate using a BioHit (Helsinki, Finland) multichannel pipette. Individual wells of a 96-well plate were filled with 300 μl of 40 mM NH4HCO3. Individual wells of a 96-well plate were filled with 300 μl of 20% methanol. An MJ Research plate was filled with 10 μl of 4% NH4OH. Two reservoirs were filled with deionized water. All plates and reservoirs were placed on the deck of the Evolution P3 (EP3) (Perkin-Elmer, Boston, Mass.) pipetting station in pre-arranged order. The following steps were programmed to be performed by an Evolution P3 pipetting station: aspiration of 20 μl of air into the EP3 P50 head; loading of a pre-treated rack of ZipTips® AX into the EP3 P50 head; dispensation of the 20 μl NH4HCO3 from the ZipTips® AX; loading of the PCR product into the ZipTips® AX by aspiration/dispensation of the PCR solution 18 times; washing of the ZipTips® AX containing bound nucleic acids with 15 μl of 40 mM NH4HCO3 8 times; washing of the ZipTips® AX containing bound nucleic acids with 15 μl of 20% methanol 24 times; elution of the purified nucleic acids from the ZipTips® AX by aspiration/dispensation with 15 μl of 4% NH4OH 18 times. For final preparation for analysis by ESI-MS, each sample was diluted 1:1 by volume with 70% methanol containing 50 mM piperidine and 50 mM imidazole.
Solution Capture Purification of PCR products for Mass Spectrometry with Ion-Exchange Resin-Magnetic Beads—The following procedure is disclosed in published U.S. Patent application US2005-0130196, filed on Sep. 17, 2004, which is commonly owned and incorporated herein by reference. For solution capture of nucleic acids with ion exchange resin linked to magnetic beads, 25 microliters of a 2.5 mg/mL suspension of BioClone amine-terminated supraparamagnetic beads are added to 25 to 50 microliters of a PCR or RT-PCR reaction containing approximately 10 pM of a typical PCR amplification product. The suspension is mixed for approximately 5 minutes by vortexing, pipetting or shaking, after which the liquid is removed following use of a magnetic separator to separate magnetic beads. The magnetic beads containing the amplification product are then washed 3 times with 50 mM ammonium bicarbonate/50% methanol or 100 mM ammonium bicarbonate/50% methanol, followed by three additional washes with 50% methanol. The bound PCR amplicon is eluted with electrospray-compatible elution buffer comprising 25 mM piperidine, 25 mM imidazole, 35% methanol, which can also comprise calibration standards. Steps of this procedure can be performed in multi-well plates and using a liquid handler, for example the Evolution™ P3 liquid handler and/or under the control of a robotic arm. The eluted nucleic acids in this condition are amenable to analysis by ESI-MS. The time required for purification of samples in a single 96-well plate using a liquid handler is approximately five minutes.
The ESI-FTICR mass spectrometer used is a Bruker Daltonics (Billerica, Mass.) Apex II 70e electrospray ionization Fourier transform ion cyclotron resonance mass spectrometer (ESI-FTICR-MS) that employs an actively shielded 7 Tesla superconducting magnet. The active shielding constrains the majority of the fringing magnetic field from the superconducting magnet to a relatively small volume. Thus, components that might be adversely affected by stray magnetic fields, such as CRT monitors, robotic components, and other electronics, can operate in close proximity to the ESI-FTICR mass spectrometer. All aspects of pulse sequence control and data acquisition are performed on a 1.1 GHz Pentium II data station running Bruker's Xmass software. 20 μL sample aliquots are extracted directly from 96-well microtiter plates using a CTC HTS PAL autosampler (LEAP Technologies, Carrboro, N.C.) triggered by the data station. Samples are injected directly into the ESI source at a flow rate of 75 μL/hr. Ions are formed via electrospray ionization in a modified Analytica (Branford, Conn.) source employing an off-axis, grounded electrospray probe positioned ca. 1.5 cm from the metalized terminus of a glass desolvation capillary. The atmospheric pressure end of the glass capillary is biased at 6000 V relative to the ESI needle during data acquisition. A counter-current flow of dry N2/O2 is employed to assist in the desolvation process. Ions are accumulated in an external ion reservoir comprised of an rf-only hexapole, a skimmer cone, and an auxiliary gate electrode, prior to injection into the trapped ion cell where they are mass analyzed.
Spectral acquisition is performed in the continuous duty cycle mode whereby ions are accumulated in the hexapole ion reservoir simultaneously with ion detection in the trapped ion cell. Following a 1.2 ms transfer event, in which ions are transferred to the trapped ion cell, the ions are subjected to a 1.6 ms chirp excitation corresponding to 8000-500 m/z. Data are acquired over an m/z range of 500-5000 (1M data points over a 225 kHz bandwidth). Each spectrum is the result of co-adding 32 transients. Transients are zero-filled once prior to the magnitude mode Fourier transform and post calibration using the internal mass standard. The ICR-2LS software package (G. A. Anderson, J. E. Bruce, Pacific Northwest National Laboratory, Richland, Wash., 1995) is used to deconvolute the mass spectra and calculate the mass of the monoisotopic species using an “averaging” fitting routine (M. W. Senko, S. C. Beu, F. W. McLafferty, J. Am. Soc. Mass Spectrom. 1995, 6, 229) modified for DNA. Using this approach, monoisotopic molecular weights are calculated.
The ESI-TOF mass spectrometer used is based on a Bruker Daltonics MicroTOF™. Ions from the ESI source undergo orthogonal ion extraction and are focused in a reflectron prior to detection. The TOF is equipped with the same automated sample handling and fluidics as described for the FTICR above. Ions are formed in the standard MicroTOF™ ESI source that is equipped with the same off-axis sprayer and glass capillary as the FTICR ESI source. Consequently, source conditions are the same as those described above. External ion accumulation is also employed to improve ionization duty cycle during data acquisition. Each detection event on the TOF comprises 75,000 data points digitized over 75 μs.
The sample delivery scheme allows sample aliquots to be rapidly injected into the electrospray source at high flow rate and subsequently be electrosprayed at a much lower flow rate for improved ESI sensitivity. Prior to injecting a sample, a bolus of buffer is injected at a high flow rate to rinse the transfer line and spray needle to avoid sample contamination/carryover. Following the rinse step, the autosampler injects the next sample and the flow rate is switched to low flow. Following a brief equilibration delay, data acquisition begins. As spectra are co-added, the autosampler continues rinsing the syringe and picking up buffer to rinse the injector and sample transfer line. In general, two syringe rinses and one injector rinse are required to minimize sample carryover. During a routine screening protocol, a new sample mixture is injected every 106 seconds. A fast wash station for the syringe needle has also been implemented which, when combined with shorter acquisition times, facilitates the acquisition of mass spectra at a rate of just under one spectrum per minute.
Raw mass spectra are post-calibrated with an internal mass standard and deconvoluted to monoisotopic molecular masses. Unambiguous base compositions are derived from the exact mass measurement of the complementary single-stranded oligonucleotides. Quantitative results are obtained by comparing the peak heights with an internal PCR calibration standard present in every PCR well at 500 molecules per well. Calibration methods are commonly owned and disclosed in U.S. provisional patent Application Ser. No. 60/545,425, which is incorporated herein by reference in its entirety.
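By way of non-limiting illustration, the quantitation step can be sketched as a ratio of peak heights scaled by the known calibrant input of 500 molecules per well. The sketch assumes a linear detector response and equal amplification efficiency for analyte and calibrant, neither of which is stated above; the function name and the example peak heights are hypothetical.

```python
CALIBRANT_MOLECULES_PER_WELL = 500  # from the text: calibrant present at 500 molecules per well


def estimate_input_molecules(analyte_peak_height, calibrant_peak_height,
                             calibrant_molecules=CALIBRANT_MOLECULES_PER_WELL):
    """Estimate starting template copies from the ratio of peak heights.

    Assumption (not from the source): signal is proportional to input copies
    and analyte and calibrant amplify with equal efficiency.
    """
    if calibrant_peak_height <= 0:
        raise ValueError("calibrant peak not detected; cannot quantify")
    return calibrant_molecules * analyte_peak_height / calibrant_peak_height


# Hypothetical peak heights in arbitrary units.
print(estimate_input_molecules(3.0e6, 1.5e6))  # 1000.0
```

A well in which the calibrant peak is absent is rejected rather than quantified, since the ratio is then undefined.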
Because the molecular masses of the four natural nucleotides have a relatively narrow molecular mass range (A=313.058, G=329.052, C=289.046, T=304.046—See Table 2), a persistent source of ambiguity in assignment of base composition can occur as follows: two nucleic acid strands having different base composition may have a difference of about 1 Da when the base composition difference between the two strands is G⇄A (−15.994) combined with C⇄T (+15.000). For example, one 99-mer nucleic acid strand having a base composition of A27G30C21T21 has a theoretical molecular mass of 30779.058 while another 99-mer nucleic acid strand having a base composition of A26G31C22T20 has a theoretical molecular mass of 30780.052. A 1 Da difference in molecular mass may be within the experimental error of a molecular mass measurement and thus, the relatively narrow molecular mass range of the four natural nucleotides imposes an uncertainty factor.
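The arithmetic of this example can be reproduced with a short sketch that sums the nucleotide masses quoted above for each base composition. It is illustrative only; the function and variable names are hypothetical, and the calculation is a plain sum of the quoted per-nucleotide values with no further corrections.

```python
from decimal import Decimal

# Nucleotide masses as quoted in the text (Da).
NUCLEOTIDE_MASS = {
    "A": Decimal("313.058"),
    "G": Decimal("329.052"),
    "C": Decimal("289.046"),
    "T": Decimal("304.046"),
}


def strand_mass(composition, masses=NUCLEOTIDE_MASS):
    """Sum nucleotide masses for a base composition such as {"A": 27, ...}."""
    return sum((masses[base] * count for base, count in composition.items()),
               Decimal("0"))


strand_1 = {"A": 27, "G": 30, "C": 21, "T": 21}
strand_2 = {"A": 26, "G": 31, "C": 22, "T": 20}

print(strand_mass(strand_1))                          # 30779.058
print(strand_mass(strand_2))                          # 30780.052
print(strand_mass(strand_2) - strand_mass(strand_1))  # 0.994
```

The two 99-mers differ by 0.994 Da, which is the G⇄A shift (15.994) offset by the C⇄T shift (15.000).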
The present example provides for a means for removing this theoretical 1 Da uncertainty factor through amplification of a nucleic acid with one mass-tagged nucleotide and three natural nucleotides.
Addition of significant mass to one of the four nucleotides (dNTPs) in an amplification reaction, or to the primers themselves, results in a mass difference between the resulting amplification products that is significantly greater than 1 Da, which removes the ambiguity arising from the G⇄A combined with C⇄T event (Table 1). Thus, the same G⇄A (−15.994) event combined with a 5-Iodo-C⇄T (−110.900) event would result in a molecular mass difference of 126.894. If the molecular mass of the base composition A27 G30 5-Iodo-C21 T21 (33422.958) is compared with that of A26 G31 5-Iodo-C22 T20 (33549.852), the theoretical molecular mass difference is +126.894. The experimental error of a molecular mass measurement is not significant with regard to this molecular mass difference. Furthermore, the only base composition consistent with a measured molecular mass of the 99-mer nucleic acid is A27 G30 5-Iodo-C21 T21. In contrast, the analogous amplification without the mass tag has 18 possible base compositions.
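The effect of the mass tag can be illustrated with the following sketch, which computes the tagged masses and enumerates every 99-mer base composition falling within a tolerance of a measured mass. The 5-iodo-C mass of 414.946 is inferred here from the quoted 5-Iodo-C⇄T shift (304.046 + 110.900), and the ±1 Da tolerance is a hypothetical choice; the number of candidate compositions depends on the tolerance assumed, so no count is asserted.

```python
# Masses in integer milli-daltons to avoid floating-point rounding.
NATURAL = {"A": 313058, "G": 329052, "C": 289046, "T": 304046}
# Assumption: 5-iodo-C mass inferred from the quoted shift relative to T.
TAGGED = dict(NATURAL, C=414946)


def mass_mda(comp, masses):
    """Mass of a base composition in milli-daltons."""
    return sum(masses[base] * n for base, n in comp.items())


def candidates(measured_mda, length, masses, tol_mda):
    """All (A, G, C, T) counts of the given length within tol of the measured mass."""
    hits = []
    for a in range(length + 1):
        for g in range(length + 1 - a):
            for c in range(length + 1 - a - g):
                t = length - a - g - c
                m = (a * masses["A"] + g * masses["G"]
                     + c * masses["C"] + t * masses["T"])
                if abs(m - measured_mda) <= tol_mda:
                    hits.append((a, g, c, t))
    return hits


comp_1 = {"A": 27, "G": 30, "C": 21, "T": 21}
comp_2 = {"A": 26, "G": 31, "C": 22, "T": 20}

print(mass_mda(comp_1, TAGGED) / 1000)  # 33422.958
print(mass_mda(comp_2, TAGGED) / 1000)  # 33549.852
print((mass_mda(comp_2, TAGGED) - mass_mda(comp_1, TAGGED)) / 1000)  # 126.894

tol = 1000  # hypothetical +/-1 Da measurement tolerance
natural_hits = candidates(mass_mda(comp_1, NATURAL), 99, NATURAL, tol)
tagged_hits = candidates(mass_mda(comp_1, TAGGED), 99, TAGGED, tol)
print(len(natural_hits), len(tagged_hits))
```

With natural nucleotides the second composition falls inside the tolerance window of the first; with the tagged cytosine it lies more than 126 Da away and is excluded.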
Mass spectra of amplification products are analyzed independently using a maximum-likelihood processor, such as is widely used in radar signal processing, which is described in U.S. Patent Application 20040209260, which is incorporated herein by reference in its entirety. This processor, referred to as GenX, first makes maximum likelihood estimates of the input to the mass spectrometer for each primer by running matched filters for each base composition aggregate on the input data. This includes the GenX response to a calibrant for each primer.
The algorithm emphasizes performance predictions culminating in probability-of-detection versus probability-of-false-alarm plots for conditions involving complex backgrounds of naturally occurring organisms and environmental contaminants. Matched filters consist of a priori expectations of signal values given the set of primers used for each of the bioagents. A genomic sequence database is used to define the mass base count matched filters. The database contains the sequences of known bacterial bioagents and includes threat organisms as well as benign background organisms. The latter is used to estimate and subtract the spectral signature produced by the background organisms. A maximum likelihood detection of known background organisms is implemented using matched filters and a running-sum estimate of the noise covariance. Background signal strengths are estimated and used along with the matched filters to form signatures which are then subtracted. The maximum likelihood process is applied to this “cleaned up” data in a similar manner employing matched filters for the organisms and a running-sum estimate of the noise-covariance for the cleaned up data.
The amplitudes of all base compositions of bioagent identifying amplicons for each primer are calibrated and a final maximum likelihood amplitude estimate per organism is made based upon the multiple single primer estimates. Models of all system noise are factored into this two-stage maximum likelihood calculation. The processor reports the number of molecules of each base composition contained in the spectra. The quantity of amplification product corresponding to the appropriate primer set is reported as well as the quantities of primers remaining upon completion of the amplification reaction.
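The two-stage estimate described above can be illustrated in greatly simplified form. The sketch below is not the GenX processor: it replaces the running-sum noise covariance estimate with an assumed identity (white-noise) covariance, under which the maximum likelihood amplitude of a known template reduces to a least-squares projection. The six-bin spectra and template shapes are hypothetical and are chosen so that the background and target templates do not overlap.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def ml_amplitude(data, template):
    """ML amplitude of `template` in `data` under white Gaussian noise.

    Assumption (not from the source): identity noise covariance, so the
    estimate is the least-squares projection of the data onto the template.
    """
    return dot(template, data) / dot(template, template)


def subtract(data, template, amplitude):
    """Remove a scaled template signature from the data."""
    return [d - amplitude * t for d, t in zip(data, template)]


# Hypothetical matched-filter templates (expected signal shapes per bin).
background = [1.0, 2.0, 1.0, 0.0, 0.0, 0.0]
target = [0.0, 0.0, 0.0, 1.0, 3.0, 1.0]
observed = [5.0 * b + 2.0 * t for b, t in zip(background, target)]

bg_amp = ml_amplitude(observed, background)        # stage 1: estimate background
cleaned = subtract(observed, background, bg_amp)   # subtract its signature
tg_amp = ml_amplitude(cleaned, target)             # stage 2: estimate target
print(bg_amp, tg_amp)  # 5.0 2.0
```

When templates overlap or the noise is correlated, the projection would need to be weighted by the inverse noise covariance, which is the role the running-sum covariance estimate plays in the description above.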
One of ordinary skill in the art will recognize that the signal processing methodologies of this example can be used in the context of the methods of STR analysis described herein.
Due to the natural abundance of 13C and other heavy isotopes in biological macromolecules, exact mass measurements are more difficult at increasing molecular weight. Additionally, the width of the isotopic distribution is inherently broader at high molecular weight thus making accurate monoisotopic molecular weight measurements difficult. There is also an inherent sensitivity loss as signals from a single amplicon are spread over more and more isotope peaks. An analogous problem occurs with ESI-MS analysis of proteins.
Isotope-depleted dNTPs suitable for use in PCR reactions can be produced from bacteria grown in isotope-depleted media in which the primary carbon source is 13C-depleted glucose and the nitrogen source is 15N-depleted ammonium sulfate. Once the bacteria are grown to critical density, the isotope-depleted genomic DNA is extracted. DNA is then digested to mononucleotides from which deoxynucleotide triphosphates are enzymatically synthesized. In this manner, it should be possible to produce isotope-depleted reagents at modest cost. Proof-of-principle for this approach was recently published by Tang and coworkers (Tang et al., Anal. Chem., 2002, 74, 226-231). We expect that generating isotope-depleted PCR products will result in a 3-5 fold improvement in sensitivity (as the signal is spread over fewer isotope peaks). More importantly, this approach should relieve the spectral congestion observed in the mass spectra and reduce the extent to which species of similar mass or m/z produce overlapping MS peaks.
Primers were designed against each of the 11 core DYS loci according to the procedure outlined in this figure. Allele reference sequences were obtained for each STR locus from the STRbase database (Ruitberg, C. M.; Reeder, D. J.; Butler, J. M. Nucleic Acids Res. 2001, 29, 320-322). Multiple primers were designed for all but one STR locus. The multiple primers were designed to hybridize to conserved sequence regions adjacent or nearly adjacent (in close proximity) to the STR repeat. For example, Table 3 lists a series of named primers designed to hybridize within conserved regions flanking the core Y-STR loci. The sequences of these primers are provided in Table 5.
In cases where conventional priming strategies are in conflict with parameters dictated by measurement of amplification products by mass spectrometry, alternative priming schemes were investigated. For example, the conventional products of the DYS385a/b locus are appreciably longer than the amplification products of other loci (241-324 nucleobases for the shortest primer set listed in the STRbase; Ruitberg, C. M.; Reeder, D. J.; Butler, J. M. Nucleic Acids Res. 2001, 29, 320-322; Wu, F. C.; Pu, C. E. Forensic Sci. Int. 2001, 120, 213-222; Furedi, S., et al. Int. J. Legal Med. 1999, 113, 38-42; Schneider, P. M., et al. Forensic Sci. Int. 1998, 97, 61-70). There is substantial length contributed to the PCR product by an extended A/G region upstream of the ‘GAAA’ repeat. To take advantage of a distinct pattern of ‘A’ and ‘G’ present in this region, a primer binding site was chosen to reduce the product length range to 109-193 nucleobases. In another example, DYS389I/II is one of the conventional loci of the 12 core Y-STR loci. In conventional Y-STR typing methods, the primer pair produces two amplification products, a smaller product designated DYS389I and a larger product designated DYS389II. This occurs because there is a duplicated binding site in the locus for the forward primer. In the present work, this complexity is eliminated by amplification of two regions separately and thus, primer pairs have been designed for each of two sub-loci, DYS389I and DYS389II. This is accomplished using a 3′ end difference in the forward primer binding region to favor formation of the shorter DYS389I product. The same forward primer with the first region at the 3′ end is used along with a reverse primer extending upstream of the second forward primer site to favor formation of the first part of DYS389II which is designated in the primer pair name as DYS389II-1 (excluding the repeat region of DYS389I).
It was recognized that these two amplification products should not be produced in the same multiplex reaction.
A database was assembled which includes expected masses and base compositions of expected STR-identifying amplicons comprising the STR region and the flanking sequences to which the primers hybridize for each characterized allele. The base compositions and molecular masses were indexed to the primer pairs and alleles in the database.
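One possible organization of such a database is sketched below, with entries keyed by primer pair and allele and each entry holding an expected base composition and mass. The sketch is illustrative only: the primer pair number 9999, the flanking composition, the GATA repeat unit and the allele range are hypothetical placeholders rather than values from the tables, only a single strand is modeled, and the nucleotide masses are those quoted earlier in this description.

```python
from collections import Counter

# Nucleotide masses quoted earlier in the text, in milli-daltons.
MASS_MDA = {"A": 313058, "G": 329052, "C": 289046, "T": 304046}


def expected_entry(flank, repeat_unit, n_repeats):
    """Composition and mass of flanking sequence plus n copies of the repeat unit."""
    comp = Counter(flank)
    for base in repeat_unit:
        comp[base] += n_repeats
    mass = sum(MASS_MDA[base] * n for base, n in comp.items())
    return {"composition": dict(comp), "mass_mda": mass}


def build_database(primer_pair, flank, repeat_unit, alleles):
    """Index expected amplicons by (primer pair, allele)."""
    return {(primer_pair, n): expected_entry(flank, repeat_unit, n) for n in alleles}


def call_allele(db, primer_pair, measured_mda, tol_mda):
    """Alleles of a primer pair whose expected mass matches the measurement."""
    return [allele for (pp, allele), entry in db.items()
            if pp == primer_pair and abs(entry["mass_mda"] - measured_mda) <= tol_mda]


# Hypothetical flanking composition (primers plus non-repeat sequence).
flank = {"A": 10, "G": 8, "C": 9, "T": 13}
db = build_database(9999, flank, "GATA", range(8, 13))

print(db[(9999, 10)]["composition"])  # {'A': 30, 'G': 18, 'C': 9, 'T': 23}
measured = db[(9999, 10)]["mass_mda"] + 300  # simulated 0.3 Da measurement error
print(call_allele(db, 9999, measured, tol_mda=1000))  # [10]
```

Because adjacent alleles differ by the mass of a whole repeat unit, a tolerance of about 1 Da is sufficient to separate them in this simplified case.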
Table 4 displays the reference alleles used to design primers for each of the 11 core Y-STR loci, along with the corresponding GenBank Accession number. Minimum and maximum product lengths were calculated using all characterized alleles. Each of the primers includes a 5′ T residue for the purpose of minimizing non-templated adenylation produced by Taq polymerase.
Primer pairs designed to the 11 core Y-STR loci are listed in Table 5. The forward and reverse primer names in this table follow standard primer pair naming as described above.
Initial primer testing was carried out using standard PCR reactions similar to the methods described herein. Each 40 μl reaction contained 10 mM Tris-Cl, 75 mM KCl, 1.5 mM MgCl2, 400 mM betaine, 200 μM each of dATP, dCTP, and dTTP (BioLine), 200 μM 13C-enriched dGTP (Cambridge Isotope Laboratories), and 1.5 U/reaction of Immolase™ DNA polymerase (BioLine). All primers were tested in duplicate in single primer pair reactions using 1 ng of template DNA (male blood sample SC35495 from SeraCare, Inc.). The thermocycling steps included 96° C. for 10 min, 40 cycles of (96° C., 25 sec, 56° C., 1.5 min, 72° C., 40 sec), followed by 72° C. for 4 min, and a 4° C. hold. Amplification products were analyzed by mass spectrometry as described herein.
The first test of the Y-STR primer pairs suggested that there was at least one primer pair per locus that was likely to perform to a sufficient extent to carry forward to a final assay. The results of this test produced three groups of primer pairs: one group to carry forward as assay candidate primers, one group of backup primers to be further tested or redesigned, and one group to be discarded due to poor performance. Reasons for discarding primer pairs or relegating primer pairs to the backup group included any or all of the following: ineffective priming (poor signal representing an amplification product), high extent of adenylation, production of more than one product, production of a large product, and high baseline noise in mass spectra. Table 6 provides the results of this first round of testing of the original group of primer pairs.
Importantly, the strategy used to shorten the products from DYS385a/b to a maximum size of less than 200 nucleobases and to split the DYS389I/II locus appeared to be working effectively.
Interestingly, an additional allele was amplified with the primer pairs for DYS393 (see
Development of multiplexed reactions is a worthwhile endeavor because it enables more assays to be carried out within a single reaction vessel and therefore increases the efficiency of Y-STR typing processes. Multiplexing tests were initiated using the primer pairs and concentrations shown in Table 7. An aspect of multiplexing which must be considered is the possibility of overlapping signals due to DNA strands that have similar molecular masses. The primer pairs combined in multiplex reaction 1 and multiplex reaction 2 were thus chosen with respect to having sufficient separation in the sizes and masses of the amplification products that they would provide for the known alleles.
The same buffer and thermocycling conditions were used as described above for single-plex testing. Primer pairs in multiplexes were used at equal concentrations designed to total 800 nM for all primers combined (average of 200 nM per primer for the 4-plex reaction, or 160 nM per primer for the 5-plex). Blood sample SC35495 was tested in duplicate using 1 ng/reaction of DNA.
The mass spectrum of amplification products of the five-plex reaction containing primer pairs targeting DYS389I, DYS392, DYS391, DYS393 and DYS390 is shown in
The initial test indicated that the relative yields of the amplification products were not well balanced. Iteratively over a series of four experiments, a final set of the concentrations of the primer pairs in the two multiplex reactions was obtained to achieve more balanced yields of amplification products. Additionally, the original primer pair chosen for DYS385a/b (4582) was modified (4692) to obtain an increased product yield and to reduce the extent of adenylation (not shown). The thermocycling parameters were also modified to include a 99° C., 10 min. step at the end to reduce post-PCR non-templated adenylation of PCR products prior to analysis. The final reaction layout (four reactions per sample) allows 24 samples to be run on a single 96-well plate.
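The plate capacity stated above follows from dividing 96 wells by four reactions per sample. A minimal sketch of one way to assign wells is given below; the row-major ordering and the sample identifiers are hypothetical, as the actual well assignment is not specified here.

```python
ROWS = "ABCDEFGH"
COLS = range(1, 13)
REACTIONS_PER_SAMPLE = 4  # from the text: four reactions per sample


def plate_layout(sample_ids):
    """Assign each sample four consecutive wells in row-major order (hypothetical layout)."""
    wells = [f"{row}{col}" for row in ROWS for col in COLS]
    capacity = len(wells) // REACTIONS_PER_SAMPLE
    if len(sample_ids) > capacity:
        raise ValueError(f"plate holds at most {capacity} samples")
    return {
        sample: wells[i * REACTIONS_PER_SAMPLE:(i + 1) * REACTIONS_PER_SAMPLE]
        for i, sample in enumerate(sample_ids)
    }


layout = plate_layout([f"S{i:02d}" for i in range(1, 25)])
print(len(layout))    # 24
print(layout["S01"])  # ['A1', 'A2', 'A3', 'A4']
print(layout["S24"])  # ['H9', 'H10', 'H11', 'H12']
```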
Using the primer pair panel shown in Table 7 (with primer pair number 4692 in place of primer pair number 4582) 95 male population samples obtained from the National Institute of Standards and Technology (NIST) were tested using 1 ng/reaction of template. These samples included 31 Caucasians, 32 African Americans and 32 Hispanics.
An example of a mass spectrum of an amplification product of the four-plex reaction of sample NIST-WT5137 is shown in
Typing results are shown in Tables 9A and 9B. The additional column designated “deduced DYS389II” was derived by adding the allele numbers for DYS389I and DYS389II-1. The concordance of the allele for DYS389II-1 was deduced by the allele being equal to the truth data for DYS389II minus DYS389I.
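The derivation of the deduced DYS389II allele and the concordance check for DYS389II-1 amount to simple arithmetic, sketched below. The function names and the allele numbers used in the example are hypothetical and are not taken from Tables 9A and 9B.

```python
def deduced_dys389ii(dys389i, dys389ii_1):
    """Deduced DYS389II allele: sum of the DYS389I and DYS389II-1 allele numbers."""
    return dys389i + dys389ii_1


def dys389ii_1_concordant(dys389ii_1, truth_dys389i, truth_dys389ii):
    """DYS389II-1 is concordant when it equals truth DYS389II minus truth DYS389I."""
    return dys389ii_1 == truth_dys389ii - truth_dys389i


# Hypothetical allele calls and truth data.
print(deduced_dys389ii(13, 16))           # 29
print(dys389ii_1_concordant(16, 13, 29))  # True
print(dys389ii_1_concordant(17, 13, 29))  # False
```

Keeping the sum alongside the two separate calls is what allows comparison with existing genotypes recorded for the combined locus.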
All 95 samples produced full profiles with no apparent drop-outs. Base allele calls were consistent with truth data for the 92 samples for which truth data were available (truth data were not available for samples MT97172, UT57301 and WT51354, indicated by asterisks in Tables 9A and 9B). All 95 samples produced two alleles for locus DYS393. Unlike the control sample run for initial primer panel testing, however, the genotypes for DYS393 did not all consist of two same-length alleles. In fact, 78% of the samples had two different-length alleles at DYS393 as noted above in Example 8. Each sample had one allele at DYS393 that was consistent with a known allele with a T→C SNP and in every case the other allele was consistent with a non-polymorphic allele. For these 95 samples, the non-polymorphic allele was consistent with the truth data in all 92 cases where there was truth data. The initial interpretation of this result was that additional individual-differentiating information obtained with the second DYS393 allele could be exploited by inclusion of primer pair 4602 in our final primer panel. It appears, however, that the additional alleles are the result of amplifying the homolog of DYS393 from the X-chromosome (Dupuy, B. M. et al. Forensic Sci. Int. 2000, 112, 111-21; Mayntz-Press, K. A.; Ballantyne, J. J. Forensic Sci. 2007, 52, 1025-34).
In addition to being concordant with existing truth data, polymorphisms were revealed in four of the twelve loci listed in Tables 9A and 9B. The identification of these polymorphisms has resulted in the characterization of new alleles. Interestingly, the highest frequency of polymorphisms was seen in DYS389II. All of these were in the 5′ repeat region of the double locus (no polymorphisms were seen in DYS389I). For all 92 samples having truth data, the sum of the base allele numbers for DYS389I and DYS389II-1 was the same as the truth data allele number for DYS389I/II, suggesting that the strategy of splitting DYS389I/II into two separately analyzed products will still remain backwards-compatible with existing databases because the sum of the two alleles can be used to compare to existing genotypes for DYS389I/II.
To test the preliminary primer pairs from Table 5 for specificity to the Y-chromosome, all 39 candidate primer pairs were tested individually with 1 ng/reaction of female DNA sample N31774 (SeraCare, Inc.). All reactions were done in duplicate. Primer pairs 4600, 4602 and 4603 all produced two clear products (not shown), showing alleles from both X-chromosomes. The genotype was consistent with DYS393 (de Knijff et al. Int. J. Legal Med. 1997, 110, 141-149; Dupuy, B. M. et al. Forensic Sci. Int. 2000, 112, 111-21). Primer pair number 4601 did not produce an appreciable product, and the signal output from both replicates corresponded to unconsumed primer pairs (not shown). For this reason, future work will include switching primer pair 4601 in for primer pair 4602 in the panel shown in Table 7.
In addition to DYS393, one primer pair for locus DYS389I (primer pair 4586) produced a single product from the female DNA that was smaller than the smallest DYS389I allele in the database (allele 9 which has a base composition of A18 G5 C26 T39). The product appeared to have a base composition of [A19 G4 C26 T35]. This composition is not consistent with a simple difference in TCTA and/or TCTG repeats. The alternative primer pair for DYS389I (4585) did not produce a product with female DNA (not shown). However, the products produced for primer pair 4585 are considerably larger than for 4586. Testing of male DNA in the presence of excess female DNA will be required to characterize the extent to which the possibility of cross-reactivity of primer pair 4585 will be a problem. In addition, tests with excess female DNA (beyond 10 ng/reaction) should be performed to understand if homologous loci on the X-chromosome will interfere with correct typing results for all finalized primer pairs (Mayntz-Press, K. A.; Ballantyne, J. J. Forensic Sci. 2007, 52, 1025-34).
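The statement that the observed composition is not consistent with a simple difference in TCTA and/or TCTG repeats can be checked with the following sketch, which searches for whole-number gains or losses of each repeat unit that would convert the allele 9 composition into the observed composition. It assumes the quoted compositions refer to the strand carrying the TCTA/TCTG units, and the search limit of 20 units is an arbitrary illustrative bound.

```python
from itertools import product

BASES = "AGCT"


def delta(comp_a, comp_b):
    """Per-base difference comp_a minus comp_b, ordered A, G, C, T."""
    return tuple(comp_a[base] - comp_b[base] for base in BASES)


def unit_vector(unit):
    """Base counts contributed by one copy of a repeat unit."""
    return tuple(unit.count(base) for base in BASES)


def repeat_explanations(diff, units, max_change=20):
    """Integer gains/losses of each repeat unit that reproduce `diff` exactly."""
    vectors = [unit_vector(unit) for unit in units]
    solutions = []
    for counts in product(range(-max_change, max_change + 1), repeat=len(units)):
        total = tuple(sum(n * vec[i] for n, vec in zip(counts, vectors))
                      for i in range(len(BASES)))
        if total == diff:
            solutions.append(dict(zip(units, counts)))
    return solutions


allele_9 = {"A": 18, "G": 5, "C": 26, "T": 39}  # from the text
observed = {"A": 19, "G": 4, "C": 26, "T": 35}  # from the text

diff = delta(observed, allele_9)
print(diff)                                         # (1, -1, 0, -4)
print(repeat_explanations(diff, ["TCTA", "TCTG"]))  # []
```

The A and G differences would require gaining one TCTA and losing one TCTG, which leaves both C and T unchanged; the observed loss of four T residues therefore cannot be explained by repeat changes alone.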
The system used in measuring the molecular masses of the amplification products described herein includes a mass spectrometer in conjunction with a controller which is operably connected to the mass spectrometer. After the mass spectral data is acquired, the controller queries the database for primer pairs in each well and triggers an assessment of allelic mass ranges for each well. Data processing is automatically performed over a suitable mass range for each well in an assay plate. No manual interface is required for processing of amplification products.
The controller includes an integrated function to register and store STR and Y-STR profiles directly from the analysis interface. An additional interface is provided to query STR and Y-STR profiles that have been stored in the database by sample name, database and/or population. Profiles may be queried with polymorphisms or by base allele call only (for concordance comparisons or for backwards-compatibility). There is also a query option to show SNPs descriptively (e.g. A→G), or using a string that allows allele designations to be more easily analyzed in existing software packages.
The analysis interface is generalized to allow analysis of STRs, Y-STRs or autosomal SNPs or any other products that can be represented as labeled alleles. A sample status query has been added to allow tracking of the time points when profiles were run, the identifier of the source plate and the well in which each sample originates, as well as the identifier of the mass spectrometry plate(s).
A database-integrated repeat queue is implemented to improve the sample tracking efficiency. The controller includes a base composition browser enhanced for STR and Y-STR analyses (or analysis based upon named alleles) to allow browsing hypotheses by allele name as well as by base composition.
Signal processing functions are integrated to automatically assist in proper assignment of overlapping masses, such as the case of same-length heterozygous states where the alleles differ by an A⇄T SNP (in this case, masses would overlap because they differ by only 9 Da).
Various modifications of the invention, in addition to those described herein, will be apparent to those skilled in the art from the foregoing description. Such modifications are also intended to fall within the scope of the appended claims. Each reference cited in the present application is incorporated herein by reference in its entirety.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/US10/33898 | 5/6/2010 | WO | 00 | 9/23/2011 |
Number | Date | Country | |
---|---|---|---|
61176028 | May 2009 | US |