Methods for rapid identification and quantitation of nucleic acid variants

Information

  • Patent Grant
  • 8026084
  • Patent Number
    8,026,084
  • Date Filed
    Friday, July 21, 2006
  • Date Issued
    Tuesday, September 27, 2011
Abstract
There is a need for nucleic acid analysis which is both specific and rapid, and in which no nucleic acid sequencing is required. The present invention addresses this need, among others, by providing a method of nucleic acid amplification of overlapping sub-segments of a nucleic acid followed by molecular mass measurement of resulting amplification products by mass spectrometry, and determination of the base compositions of the amplification products.
Description
REFERENCE TO THE SEQUENCE LISTING

Reference is made to the sequence listing submitted via EFS-Web, which consists of a file named “DIBIS007.txt” (33,662 bytes), created on Feb. 3, 2010, the contents of which are incorporated herein by reference.


FIELD OF THE INVENTION

The present invention relates generally to the field of nucleic acid analysis and provides methods, compositions and kits useful for this purpose when combined with mass spectrometry.


BACKGROUND OF THE INVENTION

Characterization of nucleic acid variants is a problem of great importance in various fields of molecular biology such as, for example, genotyping and identification of strains of bacteria and viruses which are subject to evolutionary pressures via mechanisms including mutation, natural selection, genetic drift and recombination. Nucleic acid heterogeneity is a common feature of RNA viruses, for example. Populations of RNA viruses often exhibit high levels of heterogeneity due to mutations which enhance the ability of the viruses to adapt to growth conditions. Mixed populations of RNA virus quasispecies are known to exist in viral vaccines. It would be advantageous to have a method for monitoring the heterogeneity of viral vaccines. Likewise, new strains of bacterial species are also known to evolve rapidly.


Characterization and quantitation of newly-evolving bacteria and viruses such as the SARS coronavirus, for example, is typically the first step in containment of an epidemic or infectious disease outbreak. In addition to characterization of naturally occurring variants of bacteria and viruses, there is a need for characterization of genetically engineered bacterial or viral bio-weapons in forensic or bio-warfare investigations. Unfortunately, the process of sequencing entire bacterial or viral genomes or vaccine vector sequences is time consuming and is not effective at resolving mixtures of nucleic acid variants.


Mitochondrial DNA is found in eukaryotes and differs from nuclear DNA in its location, its sequence, its quantity in the cell, and its mode of inheritance. The nucleus of the human cell contains two sets of 23 chromosomes, one paternal set and one maternal set. However, cells may contain hundreds to thousands of mitochondria, each of which may contain several copies of mitochondrial DNA. Nuclear DNA has many more bases than mitochondrial DNA, but mitochondrial DNA is present in many more copies than nuclear DNA. This characteristic of mitochondrial DNA is useful in situations where the amount of DNA in a sample is very limited. Typical sources of DNA recovered from crime scenes include hair, bones, teeth, and body fluids such as saliva, semen, and blood.


In humans, mitochondrial DNA is inherited strictly from the mother (Case J. T. and Wallace, D. C., Somatic Cell Genetics, 1981, 7, 103-108; Giles, R. E. et al. Proc. Natl. Acad. Sci. 1980, 77, 6715-6719; Hutchison, C. A. et al. Nature, 1974, 251, 536-538). Thus, the mitochondrial DNA sequences obtained from maternally related individuals, such as a brother and a sister or a mother and a daughter, will exactly match each other in the absence of a mutation. This characteristic of mitochondrial DNA is advantageous in missing persons cases as reference mitochondrial DNA samples can be supplied by any maternal relative of the missing individual (Ginther, C. et al. Nature Genetics, 1992, 2, 135-138; Holland, M. M. et al. Journal of Forensic Sciences, 1993, 38, 542-553; Stoneking, M. et al. American Journal of Human Genetics, 1991, 48, 370-382).


The human mitochondrial DNA genome is approximately 16,569 bases in length and has two general regions: the coding region and the control region. The coding region is responsible for the production of various biological molecules involved in the process of energy production in the cell and includes about 37 genes (22 transfer RNAs, 2 ribosomal RNAs, and 13 peptides), with very little intergenic sequence and no introns. The control region is responsible for regulation of the mitochondrial DNA molecule. Two regions of mitochondrial DNA within the control region have been found to be highly polymorphic, or variable, within the human population (Greenberg, B. D. et al. Gene, 1983, 21, 33-49). These two regions are termed “hypervariable Region I” (HV1), which has an approximate length of 342 base pairs (bp), and “hypervariable Region II” (HV2), which has an approximate length of 268 bp. Forensic mitochondrial DNA examinations are performed using these two hypervariable regions because of the high degree of variability found among individuals.


There exists a need for rapid identification of humans wherein human remains and/or biological samples are analyzed. Such remains or samples may be associated with war-related casualties, aircraft crashes, and acts of terrorism, for example. Analysis of mitochondrial DNA enables a rule-in/rule-out identification process for persons for whom DNA profiles from a maternal relative are available. Human identification by analysis of mitochondrial DNA can also be applied to human remains and/or biological samples obtained from crime scenes.


The process of human identification is a common objective of forensics investigations. As used herein, “forensics” is the study of evidence discovered at a crime or accident scene and used in a court of law. “Forensic science” is any science used for the purposes of the law, in particular the criminal justice system, and therefore provides impartial scientific evidence for use in the courts of law, and in a criminal investigation and trial. Forensic science is a multidisciplinary subject, drawing principally from chemistry and biology, but also from physics, geology, psychology and social science, for example.


Forensic scientists generally use the two hypervariable regions of human mitochondrial DNA for analysis. These hypervariable regions, or portions thereof, provide only one non-limiting example of a region of mitochondrial DNA useful for identification analysis.


A typical mitochondrial DNA analysis begins when total genomic and mitochondrial DNA is extracted from biological material, such as a tooth, blood sample, or hair. The polymerase chain reaction (PCR) is then used to amplify, or create many copies of, the two hypervariable portions of the non-coding region of the mitochondrial DNA molecule, using flanking primers. When adequate amounts of PCR product are amplified to provide all the necessary information about the two hypervariable regions, sequencing reactions are performed. Where possible, the sequences of both hypervariable regions are determined on both strands of the double-stranded DNA molecule, with sufficient redundancy to confirm the nucleotide substitutions that characterize that particular sample. The entire process is then repeated with a known sample, such as blood or saliva collected from a known individual. The sequences from both samples are compared to determine if they match. Finally, in the event of an inclusion or match, the Scientific Working Group on DNA Analysis Methods (SWGDAM) mitochondrial DNA database, which is maintained by the FBI, is searched for the mitochondrial sequence that has been observed for the samples. The analysts can then report the number of observations of this type based on the nucleotide positions that have been read. A written report can be provided to the submitting agency. This process is described in more detail in M. M. Holland and T. J. Parsons 1999, Forensic Science Review, volume 11, pages 25-51.


Approximately 610 bp of mitochondrial DNA are currently sequenced in forensic mitochondrial DNA analysis. Recording and comparing mitochondrial DNA sequences would be difficult and potentially confusing if all of the bases were listed. Thus, mitochondrial DNA sequence information is recorded by listing only the differences with respect to a reference DNA sequence. By convention, human mitochondrial DNA sequences are described using the first complete published mitochondrial DNA sequence as a reference (Anderson, S. et al., Nature, 1981, 290, 457-465). This sequence is commonly referred to as the Anderson sequence. It is also called the Cambridge reference sequence or the Oxford sequence. Each base pair in this sequence is assigned a number. Deviations from this reference sequence are recorded as the number of the position demonstrating a difference and a letter designation of the different base. For example, a transition from A to G at position 263 would be recorded as 263 G. If deletions or insertions of bases are present in the mitochondrial DNA, these differences are denoted as well.
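The difference-only recording convention described above can be illustrated with a minimal sketch. It assumes the sample and reference sequences are already aligned and of equal length, so insertions and deletions are not handled. The function name `list_differences` and the short toy sequences are hypothetical and are not taken from the Anderson sequence.

```python
def list_differences(reference, sample, start=1):
    """Return differences as 'position base' strings, e.g. '263 G'.

    Assumes the two sequences are already aligned and of equal length;
    insertions and deletions are not handled in this sketch.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to equal length")
    differences = []
    for offset, (ref_base, sample_base) in enumerate(zip(reference, sample)):
        if ref_base != sample_base:
            differences.append(f"{start + offset} {sample_base}")
    return differences


# Toy stretch numbered from position 260; the bases are hypothetical.
reference = "GCAACAG"
sample = "GCAGCAG"
print(list_differences(reference, sample, start=260))  # ['263 G']
```

Only the positions that differ are reported, so two samples can be compared by comparing their short difference lists rather than the full sequences.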


In the United States, there are seven laboratories currently conducting forensic mitochondrial DNA examinations: the FBI Laboratory; Laboratory Corporation of America (LabCorp) in Research Triangle Park, N.C.; Mitotyping Technologies in State College, Pennsylvania; the Bode Technology Group (BTG) in Springfield, Va.; the Armed Forces DNA Identification Laboratory (AFDIL) in Rockville, Md.; BioSynthesis, Inc. in Lewisville, Tex.; and Reliagene in New Orleans, La.


Mitochondrial DNA analyses have been admitted in criminal proceedings from these laboratories in the following states as of April 1999: Alabama, Arkansas, Florida, Indiana, Illinois, Maryland, Michigan, New Mexico, North Carolina, Pennsylvania, South Carolina, Tennessee, Texas, and Washington. Mitochondrial DNA has also been admitted and used in criminal trials in Australia, the United Kingdom, and several other European countries.


Since 1996, the number of individuals performing mitochondrial DNA analysis at the FBI Laboratory has grown from 4 to 12, with more personnel expected in the near future. Over 150 mitochondrial DNA cases have been completed by the FBI Laboratory as of March 1999, and dozens more await analysis. Forensic courses are being taught by the FBI Laboratory personnel and other groups to educate forensic scientists in the procedures and interpretation of mitochondrial DNA sequencing. More and more individuals are learning about the value of mitochondrial DNA sequencing for obtaining useful information from evidentiary samples that are small, degraded, or both. Mitochondrial DNA sequencing is becoming known not only as an exclusionary tool but also as a complementary technique for use with other human identification procedures. Mitochondrial DNA analysis will continue to be a powerful tool for law enforcement officials in the years to come as other applications are developed, validated, and applied to forensic evidence.


Presently, the forensic analysis of mitochondrial DNA is rigorous and labor-intensive. Currently, only 1-2 cases per month per analyst can be performed. Several molecular biological techniques are combined to obtain a mitochondrial DNA sequence from a sample. The steps of the mitochondrial DNA analysis process include primary visual analysis, sample preparation, DNA extraction, polymerase chain reaction (PCR) amplification, post-amplification quantification of the DNA, automated DNA sequencing, and data analysis. Another complicating factor in the forensic analysis of mitochondrial DNA is the occurrence of heteroplasmy wherein the pool of mitochondrial DNAs in a given cell is heterogeneous due to mutations in individual mitochondrial DNAs. There are different forms of heteroplasmy found in mitochondrial DNA. For example, sequence heteroplasmy (also known as point heteroplasmy) is the occurrence of more than one base at a particular position or positions in the mitochondrial DNA sequence. Length heteroplasmy is the occurrence of more than one length of a stretch of the same base in a mitochondrial DNA sequence as a result of insertion of nucleotide residues.


Heteroplasmy is a problem for forensic investigators since a sample from a crime scene can differ from a sample from a suspect by one base pair and this difference may be interpreted as sufficient evidence to eliminate that individual as the suspect. Hair samples from a single individual can contain heteroplasmic mutations at vastly different concentrations and even the root and shaft of a single hair can differ. The detection methods currently available to molecular biologists cannot detect low levels of heteroplasmy. Furthermore, if present, length heteroplasmy will adversely affect sequencing runs by resulting in an out-of-frame sequence that cannot be interpreted.


Mass spectrometry provides detailed information about the molecules being analyzed, including high mass accuracy. It is also a process that can be easily automated.


There is a need for a mitochondrial DNA forensic analysis which is both specific and rapid, and in which no nucleic acid sequencing is required. There is also a need for a method of rapid characterization and quantitation of nucleic acids which have variant positions relative to a reference sequence. These needs, as well as others, are addressed herein below.


SUMMARY OF THE INVENTION

Described herein are compositions and methods for analyzing a nucleic acid by performing the steps of obtaining a sample of nucleic acid for base composition analysis; selecting at least two primer pairs that will generate overlapping amplification products of at least two sub-segments of the nucleic acid; amplifying at least two nucleic acid sequences of a region of the nucleic acid designated as a target for base composition analysis using the primer pairs, thereby generating at least two overlapping amplification products; obtaining base compositions of the amplification products by measuring molecular masses of one or more of the amplification products using a mass spectrometer; and converting one or more of the measured molecular masses to base compositions; comparing one or more of the base compositions with one or more base compositions of reference sub-segments of a reference sequence; and identifying the presence of a particular nucleic acid sequence or variant thereof.


The nucleic acid analyzed is obtained from a human, bacterium, virus, fungus, synthetic nucleic acid source, recombinant nucleic acid source, or encodes a biological product such as a vaccine, antibody or other biological product.


Further described herein are compositions and methods for identifying a human by obtaining a sample comprising mitochondrial DNA of the human for base composition analysis; selecting at least two primer pairs that will generate overlapping amplification products representing overlapping sub-segments of the mitochondrial DNA; amplifying at least two nucleic acid sequences of a region of the mitochondrial DNA designated as a target for base composition analysis using the at least two primer pairs, thereby generating at least two overlapping amplification products; obtaining base compositions of the amplification products by measuring molecular masses of one or more of the amplification products generated using a mass spectrometer and converting one or more of the measured molecular masses to base compositions; and comparing one or more of the base compositions with one or more base compositions of reference sub-segments of a reference sequence thereby identifying the human.


Also described herein are compositions and methods for characterizing heteroplasmy of mitochondrial DNA comprising the steps of obtaining a sample comprising mitochondrial DNA for base composition analysis; selecting at least two primer pairs that will generate overlapping amplification products representing sub-segments of the mitochondrial DNA; amplifying at least two nucleic acid sequences of a region of the mitochondrial DNA designated as a target for base composition analysis using the at least two primer pairs, thereby generating at least two overlapping amplification products; obtaining base compositions of the amplification products by measuring molecular masses of one or more of the amplification products using a mass spectrometer; and converting one or more of the measured molecular masses to base compositions; comparing one or more of the base compositions with one or more base compositions of reference sub-segments of a reference sequence; and identifying at least two distinct amplification products with distinct base compositions obtained by the same pair of primers, thereby characterizing the heteroplasmy.
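The final step of the heteroplasmy method, identifying two or more distinct base compositions obtained with the same primer pair, can be sketched as a simple grouping operation. This is an illustrative sketch only: the function name `find_heteroplasmy`, the tuple layout (A, G, C, T) and the composition values are hypothetical, and the sketch assumes the base compositions have already been determined.

```python
from collections import defaultdict


def find_heteroplasmy(observations):
    """Group observed base compositions by primer pair.

    observations: iterable of (primer_pair_id, composition) tuples, where
    composition is an (A, G, C, T) count tuple.
    Returns a dict holding only the primer pairs that yielded two or more
    distinct base compositions.
    """
    by_pair = defaultdict(set)
    for primer_pair, composition in observations:
        by_pair[primer_pair].add(composition)
    return {
        pair: sorted(comps)
        for pair, comps in by_pair.items()
        if len(comps) >= 2
    }


# Hypothetical compositions (A, G, C, T); the numbers are illustrative only.
observed = [
    (2904, (30, 20, 35, 25)),
    (2896, (28, 22, 30, 27)),
    (2896, (28, 22, 31, 26)),  # same primer pair, C and T counts differ by one
    (2913, (33, 18, 29, 31)),
]
print(find_heteroplasmy(observed))
# {2896: [(28, 22, 30, 27), (28, 22, 31, 26)]}
```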


Also disclosed are primer pair compositions and kits comprising the same which are useful for obtaining amplification products used in genotyping organisms.





BRIEF DESCRIPTION OF THE DRAWINGS


FIG. 1 is a schematic diagram of the definition of sub-segments of a reference sequence for amplification. Arrows indicate the position of primer hybridization for obtaining an amplification product corresponding to a sub-segment. For example, FWD-A indicates the hybridization position of the forward primer for obtaining an amplification product corresponding to sub-segment A, while REV-A indicates the hybridization position of the reverse primer for obtaining an amplification product corresponding to sub-segment A. The overlap of sub-segment A, which has a length of 120 nucleobases, with sub-segment B is shown on the left side.



FIG. 2 is a mass spectrum of three amplification products of a sample of mitochondrial DNA displaying six peaks corresponding to the individual strands of each of the three amplification products, each corresponding to sub-segments of the target mitochondrial DNA. Peaks labeled A and B are from a single amplification product of the HV1 region obtained with primer pair number 2892 (SEQ ID NOs: 4:29). Peaks labeled C and D are from a single amplification product of the HV1 region obtained with primer pair number 2901 (SEQ ID NOs: 12:37). Peaks labeled E and F are from a single amplification product of the HV2 region obtained with primer pair number 2906 (SEQ ID NOs: 17:42).



FIG. 3 represents a refinement of peaks from a mass spectrum of a sample of mitochondrial DNA displaying six peak lines corresponding to the individual strands of each of the three amplification products. Detection of heteroplasmy in one of the amplified regions is indicated. Peaks labeled A and B are from a single amplification product of the HV1 region obtained with primer pair number 2904 (SEQ ID NOs: 15:40). Peaks labeled C and D are from a single amplification product of the HV1 region obtained with primer pair number 2896 (SEQ ID NOs: 8:33). Peaks labeled C′ and D′ are from a single amplification product of the HV1 region obtained with primer pair number 2896 which represents one heteroplasmic variant of the amplification product represented by peaks C and D. Peaks labeled C″ and D″ are from a single amplification product of the HV1 region obtained with primer pair number 2896 which represents another heteroplasmic variant of the amplification product represented by peaks C and D. Peaks labeled E and F are from a single amplification product of the HV2 region obtained with primer pair number 2913 (SEQ ID NOs: 22:47).



FIG. 4 is an illustration of the names and chromosome locations for the CODIS 13 markers, as well as for the AMEL markers on the X and Y chromosomes. The CODIS 13 short tandem repeats are commonly used by law enforcement for determining the source identity for a given nucleic acid.





DEFINITIONS

A number of terms and phrases are defined below:


As described herein, nucleic acids are analyzed to generate a base composition profile. Nucleic acids include, but are not limited to, human mitochondrial DNA, human chromosomal DNA, bacterial genomic DNA, fungal DNA, viral DNA, viral RNA, and commercially available plasmids, vectors or vaccines. The nucleic acids are referred to as having regions, which are defined as portions of the nucleic acid that are known or suspected to comprise genetic sequence differences that allow for the characterization of the nucleic acid. By use of the term “characterization” it is meant that the source of the nucleic acid can be identified (e.g., genetic identification of a human, identification of a recombination event in a plasmid, diagnosis of a human genetic disposition towards a disease or trait, HN typing of influenza virus strains). Part or all of a region may form the target for analysis using the disclosed material and methods. Alternatively, an entire nucleic acid can be analyzed, which is typically more useful when there are no defined regions for characterization. In that case, the whole nucleic acid will be referred to herein as both a region and a target. Within a target there are sub-segments. Sub-segments are the portions of nucleic acid that are flanked by primers to generate individual amplified products or amplicons. These sub-segments preferably overlap.


As used herein, “Mitochondrial DNA” refers to a circular ring of DNA which is separate from chromosomal DNA and contained as multiple copies within mitochondria. Mitochondrial DNA is often abbreviated as “mtDNA” and will be recognized as such by one with ordinary skill in the arts of mitochondrial DNA analysis. In a preferred embodiment, the objective is to identify a human. Nucleic acid is obtained from a human cell, such as a blood cell, hair cell, skin cell or any other human cell appropriate for obtaining nucleic acid. In some embodiments, the nucleic acid is mitochondrial DNA. In some embodiments, certain portions of mitochondrial DNA are appropriate for base composition analysis such as, for example, HV1 and HV2.


As used herein, the term “HV1” refers to a region within mitochondrial DNA known as “hypervariable region 1.” With respect to the reference Anderson/Cambridge mitochondrial DNA sequence, the HV1 region is represented by coordinates 15924 . . . 16428. This region is useful for identification of humans because it has a high degree of variability among different human individuals. In some embodiments, a defined portion of the HV1 region is analyzed by base composition analysis of “sub-segments” of the defined portion. In this embodiment, the defined portion of HV1 represents the “target.” In preferred embodiments, the entire HV1 region (coordinates 15924 . . . 16428) is divided into overlapping sub-segments. In this embodiment, the entire HV1 region represents the “target.”


As used herein, the term “HV2” refers to a region within mitochondrial DNA known as “hypervariable region 2.” With respect to the reference Anderson/Cambridge mitochondrial DNA sequence, the HV2 region is represented by coordinates 31 . . . 576. As for HV1, the HV2 region is useful for identification of humans because it also has a high degree of variability among different human individuals. In some embodiments, a defined portion of the HV2 region is analyzed by base composition analysis of “sub-segments” of the defined portion. In this embodiment, the defined portion of HV2 represents the “target.” In preferred embodiments, the entire HV2 region (coordinates 31 . . . 576) is divided into overlapping sub-segments. In this embodiment, the entire HV2 region represents the “target.”
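Dividing a target into overlapping sub-segments can be sketched as a simple tiling of a coordinate range. The sketch below is illustrative only: the function name `tile_target` is hypothetical, the 20-position overlap is an assumed value, and the 120-position length is borrowed from the sub-segment shown in FIG. 1. Real primer placement would also depend on primer design constraints that this sketch ignores.

```python
def tile_target(start, end, length, overlap):
    """Split the inclusive coordinate range [start, end] into overlapping
    sub-segments of at most `length` positions, where consecutive
    sub-segments share `overlap` positions."""
    if not 0 <= overlap < length:
        raise ValueError("overlap must be non-negative and smaller than length")
    step = length - overlap
    segments = []
    seg_start = start
    while True:
        seg_end = min(seg_start + length - 1, end)
        segments.append((seg_start, seg_end))
        if seg_end == end:
            break
        seg_start += step
    return segments


# HV1 and HV2 coordinates as given in the definitions above.
for name, (first, last) in {"HV1": (15924, 16428), "HV2": (31, 576)}.items():
    print(name, tile_target(first, last, length=120, overlap=20))
```

With these assumed parameters the HV1 range is covered by five sub-segments and the HV2 range by six, with the final sub-segment of each shortened to end at the last coordinate of the target.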


In other embodiments, additional target regions within the mitochondrial DNA may be chosen for base composition analysis.


As used herein, the term “target” generally refers to a nucleic acid sequence to be detected or characterized. Thus, the “target” is sought to be sorted out from other nucleic acid sequences.


As used herein, “sub-segments” are portions of a given target which are of useful size for base composition analysis. In some embodiments, the sizes of sub-segments range between about 45 to about 150 nucleobases in length. In preferred embodiments, the “sub-segments” overlap with each other and cover the entire target as shown in FIG. 1. Amplification products representing the sub-segments are obtained by amplification methods, such as PCR that are well known to those with ordinary skill in molecular biology techniques. The amplification products representing the sub-segments are analyzed by mass spectrometry to determine their molecular masses and base compositions of the amplification products are calculated from the molecular masses. The experimentally-determined base compositions are then compared with base compositions of “reference sub-segments” of a “reference nucleic acid” whose sequence and/or base composition is known. In preferred embodiments a database containing base compositions of reference nucleic acids and sub-segments thereof is used for comparison with the experimentally-determined base compositions. A match of one or more experimentally-determined base compositions of one or more sub-segments with one or more base compositions of reference sub-segments will provide the identity of the human.
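The calculation of base compositions from molecular masses can be sketched as a search for nucleotide counts whose summed mass matches the measurement. This is a minimal sketch under stated assumptions, not the procedure used by the inventors. The residue masses are approximate average values and the end correction assumes a single strand with 5′-OH and 3′-OH termini; both are assumptions of this example, as are the function names and the default tolerance. The tolerance is assumed to be small relative to the mass of one residue.

```python
# Approximate average masses (Da) of nucleotide residues within a DNA
# strand; these values and the end correction are assumptions of this sketch.
RESIDUE_MASS = {"A": 313.21, "G": 329.21, "C": 289.18, "T": 304.20}
END_CORRECTION = -61.96  # assumed strand with 5'-OH and 3'-OH termini


def strand_mass(a, g, c, t):
    """Calculated mass of a single strand with the given base counts."""
    return (a * RESIDUE_MASS["A"] + g * RESIDUE_MASS["G"]
            + c * RESIDUE_MASS["C"] + t * RESIDUE_MASS["T"] + END_CORRECTION)


def candidate_compositions(measured_mass, tolerance=0.5, min_len=45, max_len=150):
    """Enumerate (A, G, C, T) counts whose calculated mass lies within
    `tolerance` Da of the measured mass."""
    candidates = []
    for a in range(max_len + 1):
        for g in range(max_len + 1 - a):
            for c in range(max_len + 1 - a - g):
                partial = strand_mass(a, g, c, 0)
                if partial > measured_mass + tolerance:
                    break  # adding more C only increases the mass
                # Pick the T count closest to the remaining mass.
                t = round((measured_mass - partial) / RESIDUE_MASS["T"])
                total = a + g + c + t
                if t < 0 or not min_len <= total <= max_len:
                    continue
                if abs(strand_mass(a, g, c, t) - measured_mass) <= tolerance:
                    candidates.append((a, g, c, t))
    return candidates


# Hypothetical strand: simulate a measurement, then search for compositions.
true_composition = (30, 20, 35, 25)
measured = strand_mass(*true_composition)
matches = candidate_compositions(measured)
print(true_composition in matches, len(matches))
```

The search may return more than one candidate for a given mass and tolerance. The sketch does not attempt to choose among them; the length limits of 45 to 150 nucleobases simply reflect the sub-segment sizes mentioned above.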


The same definitions of the terms “target,” “sub-segment,” “reference sub-segment” and “reference nucleic acid” are applicable to other preferred embodiments where base composition analysis is used to identify a human by analysis of specific human chromosomal target regions such as CODIS markers for example. FIG. 4 is an illustration of the names and chromosome locations for the CODIS 13 markers, as well as for the AMEL markers on the X and Y chromosomes.


The same definitions of the terms “target,” “sub-segment,” “reference sub-segment” and “reference nucleic acid” are applicable to other preferred embodiments where base composition analysis is used to identify or characterize a genotype of a microorganism such as a bacterium, virus, or fungus for example. Characterization of genotypes of microorganisms is useful in infectious disease diagnostics for example. In these embodiments, a given target may represent the entire genome of a microorganism or a portion thereof. The target is analyzed by characterization of amplification products representing sub-segments of the target.


The same definitions of the terms “target,” “sub-segment,” “reference sub-segment” and “reference nucleic acid” are applicable to other preferred embodiments where base composition analysis is used to validate a “test nucleic acid” with respect to a reference nucleic acid. Validation of test nucleic acids is desirable in quality control of pharmaceutical production such as in production of vectors carrying genes encoding therapeutic proteins such as vaccines for example. In this embodiment, the “test nucleic acid” is expected to be identical in sequence and base composition to the reference nucleic acid. Comparison of experimentally determined base compositions of amplification products representing sub-segments of the target with base compositions of reference sub-segments may either indicate that the base compositions are identical, thereby validating the test nucleic acid, or identify a variant of the reference nucleic acid.
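The validation comparison can be sketched as follows, assuming the reference sequence is known and the base compositions of the test nucleic acid's amplification products have already been determined. The function names, the toy reference sequence and the sub-segment coordinates are hypothetical. In this example the measured compositions are simulated by counting bases in a test sequence rather than derived from mass measurements.

```python
from collections import Counter


def base_composition(sequence):
    """Count A, G, C and T in a sequence."""
    counts = Counter(sequence.upper())
    return {base: counts.get(base, 0) for base in "AGCT"}


def validate(reference, sub_segments, measured):
    """Compare measured base compositions with reference sub-segments.

    sub_segments: dict of name -> (start, end), 1-based inclusive.
    measured: dict of name -> composition dict for the test nucleic acid.
    Returns the names of sub-segments whose compositions differ.
    """
    mismatches = []
    for name, (start, end) in sub_segments.items():
        expected = base_composition(reference[start - 1:end])
        if measured.get(name) != expected:
            mismatches.append(name)
    return mismatches


# Hypothetical reference and a test sequence with G -> A at position 3.
reference = "ATGGCGTACGTTAGCCTAGGCTAACGT"
test_seq = "ATAGCGTACGTTAGCCTAGGCTAACGT"
sub_segments = {"A": (1, 12), "B": (8, 20)}
measured = {
    name: base_composition(test_seq[start - 1:end])
    for name, (start, end) in sub_segments.items()
}
print(validate(reference, sub_segments, measured))  # ['A']
```

An empty result corresponds to a validated test nucleic acid, while a non-empty result localizes a variant to the listed sub-segments.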


“Amplification” is a special case of nucleic acid replication involving template specificity. It is to be contrasted with non-specific template replication (i.e., replication that is template-dependent but not dependent on a specific template). Template specificity is here distinguished from fidelity of replication (i.e., synthesis of the proper polynucleotide sequence) and nucleotide (ribo- or deoxyribo-) specificity. Template specificity is frequently described in terms of “target” specificity.


Template or target specificity is achieved in most amplification techniques by the choice of enzyme. Amplification enzymes are enzymes that, under the conditions in which they are used, will process only specific sequences of nucleic acid in a heterogeneous mixture of nucleic acid. For example, in the case of Qβ replicase, MDV-1 RNA is the specific template for the replicase (D. L. Kacian et al., Proc. Natl. Acad. Sci. USA 69:3038 [1972]). Other nucleic acid will not be replicated by this amplification enzyme. Similarly, in the case of T7 RNA polymerase, this amplification enzyme has a stringent specificity for its own promoters (Chamberlin et al., Nature 228:227 [1970]). In the case of T4 DNA ligase, the enzyme will not ligate the two oligonucleotides or polynucleotides, where there is a mismatch between the oligonucleotide or polynucleotide substrate and the template at the ligation junction (D. Y. Wu and R. B. Wallace, Genomics 4:560 [1989]). Finally, Taq and Pfu polymerases, by virtue of their ability to function at high temperature, are found to display high specificity for the sequences bounded and thus defined by the primers; the high temperature results in thermodynamic conditions that favor primer hybridization with the target sequences and not hybridization with non-target sequences (H. A. Erlich (ed.), PCR Technology, Stockton Press [1989]).


As used herein, the term “sample template” refers to nucleic acid originating from a sample that is analyzed for the presence of “target” (defined below). In contrast, “background template” is used in reference to nucleic acid other than sample template that may or may not be present in a sample. Background template is most often inadvertent. It may be the result of carryover, or it may be due to the presence of nucleic acid contaminants sought to be purified away from the sample. For example, nucleic acids from organisms other than those to be detected may be present as background in a test sample.


As used herein, the term “primer” refers to an oligonucleotide, whether occurring naturally, such as a purified fragment from a restriction digest, or produced synthetically, which is capable of acting as a point of initiation of synthesis when placed under conditions in which synthesis of a primer extension product which is complementary to a nucleic acid strand is induced, (i.e., in the presence of nucleotides and an inducing agent such as DNA polymerase and at a suitable temperature and pH). The primer is preferably single stranded for maximum efficiency in amplification, but may alternatively be double stranded. If double stranded, the primer is first treated to separate its strands before being used to prepare extension products. Preferably, the primer is an oligodeoxyribonucleotide. Preferably, the primer is sufficiently long to prime the synthesis of extension products in the presence of the inducing agent. The exact lengths of the primers will depend on many factors, including temperature, source of primer and the use of the method. The primers can be any useful length. Lengths of about 13 to about 35 nucleobases are preferred. One with ordinary skill in the art of molecular biology can design primers appropriate for amplification methods.


As used herein, a “pair of primers” or “a primer pair” is used for amplification of a nucleic acid sequence. A pair of primers comprises a forward primer and a reverse primer. The forward primer hybridizes to a sense strand of a target gene sequence to be amplified and primes synthesis of an antisense strand (complementary to the sense strand) using the target sequence as a template. A reverse primer hybridizes to the antisense strand of a target gene sequence to be amplified and primes synthesis of a sense strand (complementary to the antisense strand) using the target sequence as a template.


As used herein, the term “polymerase chain reaction” (“PCR”) refers to the method of K. B. Mullis U.S. Pat. Nos. 4,683,195, 4,683,202, and 4,965,188, hereby incorporated by reference, that describe a method for increasing the concentration of a segment of a target sequence in a mixture of genomic DNA without cloning or purification. This process for amplifying the target sequence consists of introducing a large excess of two oligonucleotide primers to the DNA mixture containing the desired target sequence, followed by a precise sequence of thermal cycling in the presence of a DNA polymerase. The two primers are complementary to their respective strands of the double stranded target sequence. To effect amplification, the mixture is denatured and the primers then annealed to their complementary sequences within the target molecule. Following annealing, the primers are extended with a polymerase so as to form a new pair of complementary strands. The steps of denaturation, primer annealing, and polymerase extension can be repeated many times (i.e., denaturation, annealing and extension constitute one “cycle”; there can be numerous “cycles”) to obtain a high concentration of an amplified segment of the desired target sequence. The length of the amplified segment of the desired target sequence is determined by the relative positions of the primers with respect to each other, and therefore, this length is a controllable parameter. By virtue of the repeating aspect of the process, the method is referred to as the “polymerase chain reaction” (hereinafter “PCR”). Because the desired amplified segments of the target sequence become the predominant sequences (in terms of concentration) in the mixture, they are said to be “PCR amplified.”
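The effect of repeated cycling on the concentration of the amplified segment can be illustrated with an idealized calculation. This sketch assumes that each cycle copies a fixed fraction of the templates present and ignores reagent depletion and other plateau effects. The function name and the `efficiency` parameter are hypothetical.

```python
def copies_after_cycles(initial_copies, cycles, efficiency=1.0):
    """Idealized copy number after PCR cycling.

    efficiency is the fraction of templates copied per cycle (1.0 means
    perfect doubling); it is a hypothetical parameter of this sketch.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


print(copies_after_cycles(1, 30))  # 1073741824.0
```

Under these idealized assumptions a single starting copy yields roughly a billion copies after 30 cycles of perfect doubling.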


With PCR, it is possible to amplify a single copy of a specific target sequence in genomic DNA to a level detectable by several different methodologies (e.g., hybridization with a labeled probe; incorporation of biotinylated primers followed by avidin-enzyme conjugate detection; incorporation of 32P-labeled deoxynucleotide triphosphates, such as dCTP or dATP, into the amplified segment). In addition to genomic DNA, any oligonucleotide or polynucleotide sequence can be amplified with the appropriate set of primer molecules. In particular, the amplified segments created by the PCR process itself are, themselves, efficient templates for subsequent PCR amplifications.


As used herein, the terms “PCR product,” “PCR fragment,” and “amplification product” refer to the nucleic acid product obtained after two or more cycles of the PCR steps of denaturation, annealing and extension are complete. These terms encompass the case where there has been amplification of one or more segments of one or more target sequences.


As used herein, the term “amplification reagents” refers to those reagents (deoxyribonucleotide triphosphates, buffer, etc.), needed for amplification except for primers, nucleic acid template, and the amplification enzyme. Typically, amplification reagents along with other reaction components are placed and contained in a reaction vessel (test tube, microwell, etc.).


As used herein, the terms “complementary” or “complementarity” are used in reference to polynucleotides (i.e., a sequence of nucleotides such as an oligonucleotide or a target nucleic acid) related by the base-pairing rules. For example, the sequence “5′-A-G-T-3′” is complementary to the sequence “3′-T-C-A-5′.” Complementarity may be “partial,” in which only some of the nucleic acids' bases are matched according to the base pairing rules. Or, there may be “complete” or “total” complementarity between the nucleic acids. The degree of complementarity between nucleic acid strands has significant effects on the efficiency and strength of hybridization between nucleic acid strands. This is of particular importance in amplification reactions, as well as detection methods which depend upon binding between nucleic acids. Either term may also be used in reference to individual nucleotides, especially within the context of polynucleotides. For example, a particular nucleotide within an oligonucleotide may be noted for its complementarity, or lack thereof, to a nucleotide within another nucleic acid strand, in contrast or comparison to the complementarity between the rest of the oligonucleotide and the nucleic acid strand.
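By way of illustration only, the base-pairing rules referred to above can be sketched in Python. The function names are hypothetical and are not part of the disclosure, and only the four standard DNA bases are handled.

```python
# Watson-Crick pairing for the four standard DNA bases (illustrative only).
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement_3_to_5(seq_5_to_3):
    """Pair each base in place; the result is the complementary strand read 3'->5'."""
    return "".join(PAIRS[base] for base in seq_5_to_3)

def reverse_complement(seq_5_to_3):
    """Return the complementary strand written in the conventional 5'->3' direction."""
    return complement_3_to_5(seq_5_to_3)[::-1]

print(complement_3_to_5("AGT"))   # TCA (read 3'->5')
print(reverse_complement("AGT"))  # ACT (read 5'->3')
```

The first call reproduces the example in the text, in which 5′-A-G-T-3′ pairs with 3′-T-C-A-5′.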


The terms “homology,” “homologous” and “sequence identity” refer to a degree of identity. There may be partial homology or complete homology. A partially homologous sequence is one that is less than 100% identical to another sequence. Determination of sequence identity is described in the following example: a primer 20 nucleobases in length which is otherwise identical to another 20 nucleobase primer but having two non-identical residues has 18 of 20 identical residues (18/20=0.9 or 90% sequence identity). In another example, a primer 15 nucleobases in length having all residues identical to a 15 nucleobase segment of a primer 20 nucleobases in length would have 15/20=0.75 or 75% sequence identity with the 20 nucleobase primer. In the context of the present invention, sequence identity is meant to be properly determined when the query sequence and the subject sequence are both described in the 5′ to 3′ direction.
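The two worked examples above can be reproduced with a short Python sketch. It assumes, for simplicity, that the query and subject are both written 5′ to 3′ and are already aligned at their first position without gaps. The function name and the primer sequences are hypothetical.

```python
def percent_identity(query, subject):
    """Identical aligned positions divided by the subject length, as a percentage.

    Assumes both sequences are written 5'->3' and aligned at position 0
    with no gaps (a simplification for illustration).
    """
    matches = sum(1 for q, s in zip(query, subject) if q == s)
    return 100.0 * matches / len(subject)

subject = "ACGTACGTACGTACGTACGT"         # hypothetical 20-nucleobase primer
two_mismatches = "ACGTACGTACGTACGTACAA"  # differs at the last two positions
fifteen_mer = subject[:15]               # identical to a 15-nucleobase segment

print(percent_identity(two_mismatches, subject))  # 90.0
print(percent_identity(fifteen_mer, subject))     # 75.0
```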


As used herein, the term “hybridization” is used in reference to the pairing of complementary nucleic acids. Hybridization and the strength of hybridization (i.e., the strength of the association between the nucleic acids) are influenced by such factors as the degree of complementarity between the nucleic acids, stringency of the conditions involved, and the Tm of the formed hybrid. “Hybridization” methods involve the annealing of one nucleic acid to another, complementary nucleic acid, i.e., a nucleic acid having a complementary nucleotide sequence. The ability of two polymers of nucleic acid containing complementary sequences to find each other and anneal through base pairing interaction is a well-recognized phenomenon. The initial observations of the “hybridization” process by Marmur and Lane, Proc. Natl. Acad. Sci. USA 46:453 (1960) and Doty et al., Proc. Natl. Acad. Sci. USA 46:461 (1960) have been followed by the refinement of this process into an essential tool of modern biology.


The complement of a nucleic acid sequence as used herein refers to an oligonucleotide which, when aligned with the nucleic acid sequence such that the 5′ end of one sequence is paired with the 3′ end of the other, is in “antiparallel association.” Certain bases not commonly found in natural nucleic acids may be included in the nucleic acids of the present invention and include, for example, inosine and 7-deazaguanine. Complementarity need not be perfect; stable duplexes may contain mismatched base pairs or unmatched bases. Those skilled in the art of nucleic acid technology can determine duplex stability empirically considering a number of variables including, for example, the length of the oligonucleotide, base composition and sequence of the oligonucleotide, ionic strength and incidence of mismatched base pairs.


As used herein, the term “Tm” is used in reference to the “melting temperature.” The melting temperature is the temperature at which a population of double-stranded nucleic acid molecules becomes half dissociated into single strands. Several equations for calculating the Tm of nucleic acids are well known in the art. As indicated by standard references, a simple estimate of the Tm value may be calculated by the equation: Tm=81.5+0.41 (% G+C), when a nucleic acid is in aqueous solution at 1 M NaCl (see e.g., Anderson and Young, Quantitative Filter Hybridization, in Nucleic Acid Hybridization (1985)). Other references (e.g., Allawi, H. T. & SantaLucia, J., Jr. Thermodynamics and NMR of internal G.T mismatches in DNA. Biochemistry 36, 10581-94 (1997)) include more sophisticated computations which take structural and environmental, as well as sequence characteristics into account for the calculation of Tm.
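As a minimal sketch, the simple estimate Tm=81.5+0.41 (% G+C) quoted above can be evaluated as follows. The function name and the example sequence are hypothetical. Like the equation itself, the result is only a rough estimate for a nucleic acid in aqueous solution at 1 M NaCl.

```python
def estimate_tm(seq):
    """Simple Tm estimate from percent G+C: Tm = 81.5 + 0.41 * (% G+C)."""
    gc_count = sum(1 for base in seq.upper() if base in "GC")
    percent_gc = 100.0 * gc_count / len(seq)
    return 81.5 + 0.41 * percent_gc

# A hypothetical sequence with 50% G+C content.
print(round(estimate_tm("ACGT" * 5), 1))
```

More refined estimates, such as those in the nearest-neighbor literature cited above, take sequence context and solution conditions into account and are not attempted here.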


The term “gene” refers to a DNA sequence that comprises control and coding sequences necessary for the production of an RNA having a non-coding function (e.g., a ribosomal or transfer RNA), a polypeptide or a precursor. The RNA or polypeptide can be encoded by a full length coding sequence or by any portion of the coding sequence so long as the desired activity or function is retained.


The term “wild-type” refers to a gene or a gene product that has the characteristics of that gene or gene product when isolated from a naturally occurring source. A wild-type gene is that which is most frequently observed in a population and is thus arbitrarily designated the “normal” or “wild-type” form of the gene. In contrast, the term “modified”, “mutant” or “polymorphic” refers to a gene or gene product which displays modifications in sequence and/or functional properties (i.e., altered characteristics) when compared to the wild-type gene or gene product. It is noted that naturally-occurring mutants can be isolated; these are identified by the fact that they have altered characteristics when compared to the wild-type gene or gene product.


The term “oligonucleotide” as used herein is defined as a molecule comprising two or more deoxyribonucleotides or ribonucleotides, preferably at least 5 nucleotides, more preferably at least about 13 to 35 nucleotides. The exact size will depend on many factors, which in turn depend on the ultimate function or use of the oligonucleotide. The oligonucleotide may be generated in any manner, including chemical synthesis, DNA replication, reverse transcription, PCR, or a combination thereof.


Because mononucleotides are reacted to make oligonucleotides in a manner such that the 5′ phosphate of one mononucleotide pentose ring is attached to the 3′ oxygen of its neighbor in one direction via a phosphodiester linkage, an end of an oligonucleotide is referred to as the “5′-end” if its 5′ phosphate is not linked to the 3′ oxygen of a mononucleotide pentose ring and as the “3′-end” if its 3′ oxygen is not linked to a 5′ phosphate of a subsequent mononucleotide pentose ring. As used herein, a nucleic acid sequence, even if internal to a larger oligonucleotide, also may be said to have 5′ and 3′ ends. A first region along a nucleic acid strand is said to be upstream of another region if the 3′ end of the first region is before the 5′ end of the second region when moving along a strand of nucleic acid in a 5′ to 3′ direction. All oligonucleotide primers disclosed herein are understood to be presented in the 5′ to 3′ direction when reading left to right.


When two different, non-overlapping oligonucleotides anneal to different regions of the same linear complementary nucleic acid sequence, and the 3′ end of one oligonucleotide points towards the 5′ end of the other, the former may be called the “upstream” oligonucleotide and the latter the “downstream” oligonucleotide. Similarly, when two overlapping oligonucleotides are hybridized to the same linear complementary nucleic acid sequence, with the first oligonucleotide positioned such that its 5′ end is upstream of the 5′ end of the second oligonucleotide, and the 3′ end of the first oligonucleotide is upstream of the 3′ end of the second oligonucleotide, the first oligonucleotide may be called the “upstream” oligonucleotide and the second oligonucleotide may be called the “downstream” oligonucleotide.


The term “primer” refers to an oligonucleotide that is capable of acting as a point of initiation of synthesis when placed under conditions in which primer extension is initiated. An oligonucleotide “primer” may occur naturally, as in a purified restriction digest or may be produced synthetically. A primer is selected to be “substantially” complementary to a strand of specific sequence of the template. A primer must be sufficiently complementary to hybridize with a template strand for primer elongation to occur. A primer sequence need not reflect the exact sequence of the template. For example, a non-complementary nucleotide fragment may be attached to the 5′ end of the primer, with the remainder of the primer sequence being substantially complementary to the strand. Non-complementary bases or longer sequences can be interspersed into the primer, provided that the primer sequence has sufficient complementarity with the sequence of the template to hybridize and thereby form a template primer complex for synthesis of the extension product of the primer.


The term “target nucleic acid” refers to a nucleic acid molecule containing a sequence that has at least partial complementarity with an oligonucleotide primer. The target nucleic acid may comprise single- or double-stranded DNA or RNA.


The term “variable sequence” as used herein refers to differences in nucleic acid sequence between two nucleic acids. For example, the same gene of two different bacterial species may vary in sequence by the presence of single base substitutions and/or deletions or insertions of one or more nucleotides. These two forms of the structural gene are said to vary in sequence from one another.


The term “nucleotide analog” as used herein refers to modified or non-naturally occurring nucleotides such as 5-propynyl pyrimidines (i.e., 5-propynyl-dTTP and 5-propynyl-dCTP) and 7-deaza purines (i.e., 7-deaza-dATP and 7-deaza-dGTP). Nucleotide analogs include base analogs and comprise modified forms of deoxyribonucleotides as well as ribonucleotides.


The term “microorganism” as used herein means an organism too small to be observed with the unaided eye and includes, but is not limited to, bacteria, viruses, protozoans, fungi, and ciliates.


The term “microbial gene sequences” refers to gene sequences derived from a microorganism.


The term “bacteria” or “bacterium” refers to any member of the groups of eubacteria and archaebacteria.


The term “virus” refers to obligate, ultramicroscopic, intracellular parasites incapable of autonomous replication (i.e., replication requires the use of the host cell's machinery).


The term “sample” in the present specification and claims is used in its broadest sense. On the one hand it is meant to include a specimen or culture (e.g., microbiological cultures). On the other hand, it is meant to include both biological and environmental samples. A sample may include a specimen of synthetic origin.


Biological samples may be animal, including human, fluid, solid (e.g., stool) or tissue, as well as liquid and solid food and feed products and ingredients such as dairy items, vegetables, meat and meat by-products, and waste. Biological samples may be obtained from all of the various families of domestic animals, as well as feral or wild animals, including, but not limited to, such animals as ungulates, bear, fish, lagomorphs, rodents, etc.


Environmental samples include environmental material such as surface matter, soil, water and industrial samples, as well as samples obtained from food and dairy processing instruments, apparatus, equipment, utensils, disposable and non-disposable items. These examples are not to be construed as limiting the sample types applicable to the present invention.


The term “source of target nucleic acid” refers to any sample that contains nucleic acids (RNA or DNA). Particularly preferred sources of target nucleic acids are biological samples including, but not limited to blood, saliva, cerebral spinal fluid, pleural fluid, milk, lymph, sputum and semen. The source of nucleic acid may also be an organism such as a human, animal, bacterium, virus or fungus for example.


The term “polymerization means” or “polymerization agent” refers to any agent capable of facilitating the addition of nucleoside triphosphates to an oligonucleotide. Preferred polymerization means comprise DNA and RNA polymerases.


The term “adduct” is used herein in its broadest sense to indicate any compound or element that can be added to an oligonucleotide. An adduct may be charged (positively or negatively) or may be charge-neutral. An adduct may be added to the oligonucleotide via covalent or non-covalent linkages. Examples of adducts include, but are not limited to, indodicarbocyanine dye amidites, amino-substituted nucleotides, ethidium bromide, ethidium homodimer, (1,3-propanediamino)propidium, (diethylenetriamino)propidium, thiazole orange, (N-N′-tetramethyl-1,3-propanediamino)propyl thiazole orange, (N-N′-tetramethyl-1,2-ethanediamino)propyl thiazole orange, thiazole orange-thiazole orange homodimer (TOTO), thiazole orange-thiazole blue heterodimer (TOTAB), thiazole orange-ethidium heterodimer 1 (TOED1), thiazole orange-ethidium heterodimer 2 (TOED2) and fluorescein-ethidium heterodimer (FED), psoralens, biotin, streptavidin, avidin, etc.


Where a first oligonucleotide is complementary to a region of a target nucleic acid and a second oligonucleotide has complementarity to the same region (or a portion of this region), a “region of overlap” exists along the target nucleic acid. The degree of overlap will vary depending upon the nature of the complementarity.


As used herein, the term “purified” or “to purify” refers to the removal of contaminants from a sample.


As used herein the term “portion” when in reference to a protein (as in “a portion of a given protein”) refers to fragments of that protein. The fragments may range in size from four amino acid residues to the entire amino acid sequence minus one amino acid (e.g., 4, 5, 6, . . . , n−1).


The term “nucleic acid” or “nucleic acid sequence” as used herein refers to an oligonucleotide, nucleotide or polynucleotide, and fragments or portions thereof, and to DNA or RNA of genomic or synthetic origin which may be single or double stranded, and represent the sense or antisense strand. Similarly, “amino acid sequence” as used herein refers to peptide or protein sequence.


The term “peptide nucleic acid” (“PNA”) as used herein refers to a molecule comprising bases or base analogs such as would be found in natural nucleic acid, but attached to a peptide backbone rather than the sugar-phosphate backbone typical of nucleic acids. The attachment of the bases to the peptide is such as to allow the bases to base pair with complementary bases of nucleic acid in a manner similar to that of an oligonucleotide. These small molecules, also designated antigene agents, stop transcript elongation by binding to their complementary strand of nucleic acid (Nielsen, et al. Anticancer Drug Des. 8:53-63 [1993]).


The term “locked nucleic acid” (“LNA”) as used herein refers to a conformationally restricted nucleic acid analogue, in which the ribose ring is locked into a rigid C3′-endo (or Northern-type) conformation by a simple 2′-O, 4′-C methylene bridge. Duplexes involving LNA (hybridized to either DNA or RNA) display a large increase in melting temperatures of between +3.0 and +9.3° C. per LNA modification, in comparison to corresponding unmodified reference duplexes. LNA recognizes both DNA and RNA with remarkable affinities and selectivities. Incorporation of a given number of LNA monomers into oligonucleotides is a very convenient way of vastly improving the stability and specificity of duplexes toward complementary RNA or DNA such as, for example, primer binding regions.


As used herein, the terms “purified” or “substantially purified” refer to molecules, either nucleic or amino acid sequences, that are removed from their natural environment, isolated or separated, and are at least 60% free, preferably 75% free, and most preferably 90% free from other components with which they are naturally associated. An “isolated polynucleotide” or “isolated oligonucleotide” is therefore a substantially purified polynucleotide.


The term “duplex” refers to the state of nucleic acids in which the base portions of the nucleotides on one strand are bound through hydrogen bonding to their complementary bases arrayed on a second strand. The condition of being in a duplex form reflects on the state of the bases of a nucleic acid. By virtue of base pairing, the strands of nucleic acid also generally assume the tertiary structure of a double helix, having a major and a minor groove. The assumption of the helical form is implicit in the act of becoming duplexed.


The term “template” refers to a strand of nucleic acid on which a complementary copy is built from nucleoside triphosphates through the activity of a template-dependent nucleic acid polymerase. Within a duplex the template strand is, by convention, depicted and described as the “bottom” strand. Similarly, the non-template strand is often depicted and described as the “top” strand.


The term “template-dependent RNA polymerase” refers to a nucleic acid polymerase that creates new RNA strands through the copying of a template strand as described above and which does not synthesize RNA in the absence of a template. This is in contrast to the activity of the template-independent nucleic acid polymerases that synthesize or extend nucleic acids without reference to a template, such as terminal deoxynucleotidyl transferase, or Poly A polymerase.


The term “in silico” when used in relation to a process indicates that the process is simulated on or embedded in a computer.


The term “priming region” refers to a region on a target nucleic acid sequence to which a primer hybridizes for the purpose of extension of the complementary strand of the target nucleic acid sequence.


The term “non-templated T residue” as used herein refers to a thymidine (T) residue added to the 5′ end of a primer which does not necessarily hybridize to the target nucleic acid being amplified.


The term “genotype” as used herein refers to at least a portion of the genetic makeup of an individual. A portion of a genome can be sufficient for assignment of a genotype to an individual provided that the portion of the genome contains a representative sequence or base composition to distinguish the genotype from other genotypes.


The term “nucleobase” as used herein is synonymous with other terms in use in the art including “nucleotide,” “deoxynucleotide,” “nucleotide residue,” “deoxynucleotide residue,” “nucleotide triphosphate (NTP),” or “deoxynucleotide triphosphate (dNTP).”


As defined herein, “base composition” refers to the numbers of each of the four standard nucleobases that are present within a given standard sequence or corresponding amplification product of a standard, test or variant sequence. Methods including steps of measuring base compositions are disclosed and claimed in commonly owned published U.S. Patent Application Nos: 20030124556, 20030082539, 20040209260, 20040219517, and 20040180328 and U.S. Ser. Nos. 10/728,486, 10/829,826, 10/660,998, 10/853,660, 60/604,329, 60/632,862, 60/639,068, 60/648,188, 11/060,135, 11/073,362, and 60/658,248, each of which is incorporated herein by reference in entirety.


As used herein, the term “base composition analysis” refers to determination of the base composition of an amplification product representing a sub-segment of a target nucleic acid sequence from the molecular mass of the amplification product determined by mass spectrometry. In embodiments of the present invention, base composition analysis may include determination of base compositions of two or more amplification products representing overlapping sub-segments of a nucleic acid sequence which are to be compared with the defined base compositions of the corresponding overlapping sub-segments of one or more reference nucleic acids.


As used herein, the term “reference nucleic acid” or “reference nucleic acid segment” is a characterized nucleic acid of known sequence and/or known base composition. A reference nucleic acid segment is compared with uncharacterized sequences in various embodiments of the present invention. For example, a characterized vector or portion thereof can be used as a reference nucleic acid segment. A characterized portion of human nucleic acid may also be used as a reference nucleic acid provided the genotype, identity or race of the human from which the reference nucleic acid is obtained is known. A genome or a portion thereof of a bacterium, virus or fungus may also be employed as a reference nucleic acid provided that the species or genotype of the bacterium, virus or fungus is known.


As used herein, the term “reference base composition” refers to a characterized base composition. For example, a sub-segment of a reference nucleic acid having the defined sequence AAAAATTTTCCCGG (SEQ ID NO: 52) has a standard base composition of A5 T4 C3 G2.
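The example above can be checked with a brief Python sketch that counts the four standard nucleobases in a defined sequence. The function names and the “A5 T4 C3 G2” output format are illustrative choices only.

```python
from collections import Counter

def base_composition(seq):
    """Count each of the four standard nucleobases in a sequence."""
    counts = Counter(seq.upper())
    return {base: counts.get(base, 0) for base in "ATCG"}

def format_composition(comp):
    """Render a composition in the 'A5 T4 C3 G2' style used in the text."""
    return " ".join(f"{base}{comp[base]}" for base in "ATCG")

print(format_composition(base_composition("AAAAATTTTCCCGG")))  # A5 T4 C3 G2
```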


As used herein, the term “test nucleic acid sequence” refers to an uncharacterized nucleic acid sequence whose base composition is to be characterized and compared with one or more standard nucleic acid segments.


As used herein, the term “overlap” or “overlapping sub-segments” refers to sub-segments of a standard nucleic acid segment which have overlap as illustrated by the following example which employs a standard nucleic acid segment of length of 300 nucleobases. A first sub-segment may, for example, extend from position 1 to position 100. A second sub-segment may, for example, extend from position 60 to position 160, having overlap from position 60 to position 100. A third sub-segment may, for example, extend from position 120 to position 220, having overlap from position 120 to position 160. A fourth sub-segment may, for example, extend from position 180 to position 280, having overlap from position 180 to position 220. Producing sub-segments with overlap is useful because it provides redundancy and reduces the likelihood that sub-segments containing variants relative to a given standard sub-segment will be mischaracterized. If a primer used to amplify a given sub-segment hybridizes to a position with a mutation relative to the reference sequence, the amplification product will not contain the mutation because the primer extension product is used as a template in subsequent amplification cycles. Thus, having overlap of two sub-segments wherein overlap of the second sub-segment over the first sub-segment extends past the reverse primer hybridization site of the first sub-segment eliminates the possibility that the reverse primer for the first sub-segment will mask a given mutation within the first sub-segment reverse primer hybridization site. The extent of minimal overlap should be determined by the length of the primer hybridization site of a given sub-segment. Generally, overlap of sub-segments by several nucleobases is appropriate but shorter overlap lengths may also be appropriate provided the primer hybridization sites are correspondingly shorter. The avoidance of overlap of primer hybridization sites on overlapping sub-segments is preferred.
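The positional example above can be sketched in Python as follows. The sketch takes the four sub-segments as inclusive (start, end) positions and reports the region shared by each consecutive pair. The function name is hypothetical, and primer hybridization sites are not modeled.

```python
def consecutive_overlaps(sub_segments):
    """Return the shared (start, end) region of each consecutive pair of
    sub-segments, or None where two neighbors do not overlap.
    Positions are 1-based and inclusive, as in the text."""
    result = []
    for (s1, e1), (s2, e2) in zip(sub_segments, sub_segments[1:]):
        start, end = max(s1, s2), min(e1, e2)
        result.append((start, end) if start <= end else None)
    return result

# The four sub-segments of the 300-nucleobase example.
segments = [(1, 100), (60, 160), (120, 220), (180, 280)]
print(consecutive_overlaps(segments))  # [(60, 100), (120, 160), (180, 220)]
```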


As used herein, the term “co-amplification” or “co-amplified” refers to the process of obtaining more than one amplification product in the same amplification reaction mixture using the same pair of primers.


As used herein, the term “vector” refers to a nucleic acid adapted for transfection into a host cell. Examples of vectors include, but are not limited to, plasmids, cosmids, bacteriophages and the like.


As used herein, the term “therapeutic protein” refers to any protein product produced by biotechnological methods for use as a therapeutic product. Examples of therapeutic proteins include, but are not limited to protein products such as vaccines, antibodies, structural proteins, hormones, and cell signaling proteins such as receptors, cytokines and the like.


As used herein, the term “recombinant” refers to having been created by genetic engineering. For example, a “recombinant insert” refers to a nucleic acid segment inserted into another nucleic acid sequence using techniques well known to those with ordinary skill in the arts of genetic engineering and molecular biology.


A “nucleic acid variant” is herein defined as a nucleic acid having substantial similarity or sequence identity with a “standard” nucleic acid sequence, for example, from about 70% up to but not including 100% sequence identity.


As used herein, a “triplex combination of primer pairs” refers to three primer pairs which are to be included in an amplification mixture for the purpose of obtaining three distinct amplification products from a given target nucleic acid.


DESCRIPTION OF EMBODIMENTS

Provided herein are compositions and methods for determining the presence of a nucleic acid variant or a genotype relative to a known and defined “reference” nucleic acid sequence. Identification of a distinct genotype in certain embodiments is satisfied by identification of a distinct base composition of a given sub-segment of a target nucleic acid.


In the methods described herein where the genotype, and in turn the identity, of a nucleic acid sample is determined, the nucleic acid is measured to deliver a base composition profile. That measured base composition profile is then compared to a reference base composition profile that is further associated with an identity. The reference base composition can be obtained from a head-to-head comparison or from a standard reference database. In both the head-to-head comparison and the standard reference database comparison, the unknown sample is analyzed using the disclosed compositions and methods to generate a measured base composition profile. For the head-to-head comparison, the reference base composition profile is generated by similarly analyzing samples from a selected suspect population using the disclosed compositions and methods. The measured base composition is then compared to the reference base compositions and if a match occurs between the unknown and a suspect, then the identity is determined. In the standard reference database comparison the measured base composition is compared to a pre-existing database of reference base compositions. This database can be populated using standard reference nucleic acids, previously measured base compositions, and converted data used to generate base compositions. For example, and without limitation, a standard reference nucleic acid can include commercially available vectors like pUC, the certified values for CODIS 13 loci (SRM 2391b available from the National Institute of Standards and Technology) and the Anderson mitochondrial DNA sequence. Converted data can include, but is not limited to, previously obtained sequence data, such as the reference data stored in the SWGDAM database, that is bioinformatically converted to base composition data.
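The standard reference database comparison described above can be sketched, under simplifying assumptions, as an exact-match lookup. Here a base composition profile is represented as a tuple of (A, T, C, G) counts, one per sub-segment. The database entries, names and counts are invented for illustration and do not correspond to any real reference data.

```python
def identify(measured_profile, reference_db):
    """Return the names of all reference entries whose base composition
    profile matches the measured profile exactly."""
    return [name for name, profile in reference_db.items()
            if profile == measured_profile]

# Hypothetical database: one (A, T, C, G) tuple per overlapping sub-segment.
reference_db = {
    "reference_1": ((25, 30, 22, 23), (28, 27, 20, 25)),
    "reference_2": ((25, 30, 22, 23), (27, 27, 21, 25)),
}

measured = ((25, 30, 22, 23), (27, 27, 21, 25))
print(identify(measured, reference_db))  # ['reference_2']
```

In this invented example the two references share the first sub-segment composition and are distinguished only by the second, which is why a profile over several sub-segments is compared rather than a single composition.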


Also provided herein are compositions and methods for identifying a human by comparison of base compositions of amplification products representing overlapping sub-segments of a target nucleic acid with base compositions of reference sub-segments of one or more reference nucleic acids.


Amplification products of portions of the target nucleic acid which correspond to the sub-segments are produced and their molecular masses are measured by mass spectrometry. Base compositions of the amplification products are calculated from their molecular masses and the base compositions are compared with the base compositions of the corresponding sub-segments of the reference nucleic acid. A given target region can have any length depending upon the type of analysis to be conducted and in recognition of the numbers of primer pairs required to obtain amplification products representing overlapping sub-segments of the target. If a bacterium with a large genome is to be analyzed, and the target is the entire genome, a target nucleic acid may have a length of several kilobases. Alternatively, a target region may be about 300 to about 1000 nucleobases in length.
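One simple way to picture the calculation of a base composition from a molecular mass is an exhaustive search over the compositions of a strand of known length. The sketch below is an illustrative assumption and is not the procedure of the present disclosure. The residue masses are approximate average values commonly used for oligonucleotide molecular-weight estimates, the terminal correction assumes a strand without a 5′ phosphate, and the strand length is assumed to be known.

```python
# Approximate average residue masses in Da (assumed values for illustration).
RESIDUE_MASS = {"A": 313.21, "T": 304.20, "C": 289.18, "G": 329.21}
END_CORRECTION = -61.96  # assumed terminal-group correction, no 5' phosphate

def strand_mass(comp):
    """Approximate mass of a single DNA strand with the given base counts."""
    return sum(RESIDUE_MASS[b] * n for b, n in comp.items()) + END_CORRECTION

def candidate_compositions(measured_mass, length, tolerance):
    """List every composition of the given length whose calculated mass lies
    within the tolerance of the measured mass (brute-force search)."""
    hits = []
    for a in range(length + 1):
        for t in range(length + 1 - a):
            for c in range(length + 1 - a - t):
                comp = {"A": a, "T": t, "C": c, "G": length - a - t - c}
                if abs(strand_mass(comp) - measured_mass) <= tolerance:
                    hits.append(comp)
    return hits

true_comp = {"A": 5, "T": 4, "C": 3, "G": 2}
hits = candidate_compositions(strand_mass(true_comp), length=14, tolerance=0.5)
print(true_comp in hits, len(hits))
```

More than one composition may fall within a given mass tolerance, so a search of this kind generally returns a list of candidates rather than a single answer. A tighter tolerance or additional constraints narrow the list.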


In some embodiments, the nucleic acid variant has a sequence identical to the standard sequence with the exception of having one or more single nucleotide polymorphisms, insertions or deletions.


In some embodiments, the reference nucleic acid and variant nucleic acid are either single stranded or double stranded DNA or RNA. In some embodiments, the standard and variant nucleic acids originate from the genome of a bacterium or a virus or are synthesized nucleic acids such as a PCR product, for example.


A set of sub-segments within the reference nucleic acid sequence is defined. In some embodiments, the members of the set of standard sub-segments are from about 45 to about 150 nucleobases in length. One will recognize that this includes standard sub-segments of lengths of 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, or 150 nucleobases in length.


In some embodiments, the molecular masses of the test amplification products are determined by mass spectrometry such as electrospray Fourier transform ion cyclotron resonance (FTICR) mass spectrometry or electrospray time-of-flight mass spectrometry. The use of electrospray mass spectrometry permits the measurement of large amplification products, as large as 500 nucleobases in length, whereas amplification products analyzed by matrix-assisted laser desorption ionization mass spectrometry are typically much smaller in length (approximately 15 nucleobases in length).


If desired, the length of the standard segments can be chosen such that some members of the set have calculated molecular masses that are dissimilar from other members of the set. Having standard segments of dissimilar molecular masses allows for multiplexing or pooling of amplification products corresponding to the standard segments prior to molecular mass determination, by mass spectrometry for example. As is illustrated in FIGS. 2 and 3, the resultant amplification products from a reaction using the at least two primer pairs are sufficiently separated along the charge axis of the mass spectrometry plot. This separation is preferred, but not necessary, because the individually measured amplicon strands can be easily visualized.


In some embodiments, the compositions and methods are used for genotyping of a suspected variant of a known species of bacterium or virus. The base compositions of the test amplification products, if different from the base composition of the standard segments, provide the means for identification of a previously known variant, or for characterization of a previously unobserved variant.


In some embodiments, the compositions and methods are used for identification and characterization of genetically engineered bacteria or viruses. Genetically engineered organisms are produced by insertion or deletion of genes. These modifications are readily detectable by the methods of the present invention.


In some embodiments, the compositions and methods can be used for validation of reference nucleic acid sequences such as those encoding therapeutic proteins including but not limited to vaccines and biological drugs such as monoclonal antibodies for example. A nucleic acid is “validated” by base composition analysis according to the method of the present invention, wherein the result indicates that the analyzed nucleic acid and/or sub-segments thereof have the same base compositions as the reference nucleic acid. The process of “validation” confirms that polymorphisms have not been introduced into the target sequence relative to the reference sequence.


In some embodiments, a known quantity of the standard sequence is included in the sample (as an internal calibration standard) containing the suspected variant and the quantity of the variant is determined from the abundance data obtained from mass spectrometry for example. Methods of using internal calibration standards in base composition analyses are described in commonly owned U.S. application Ser. No. 11/059,776 which is incorporated herein by reference in entirety.
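The arithmetic behind quantitation against an internal calibration standard can be sketched as below. This is a minimal illustration that assumes signal abundance is proportional to template quantity and that the standard and the variant amplify with equal efficiency; the function name and arguments are hypothetical and are not drawn from the incorporated application.

```python
def variant_quantity(standard_quantity, standard_abundance, variant_abundance):
    """Estimate variant quantity from its abundance relative to a calibrant.

    Assumes, for illustration, that mass spectral abundance scales linearly
    with template quantity and that both templates amplify equally well.
    """
    if standard_abundance <= 0:
        raise ValueError("calibrant abundance must be positive")
    return standard_quantity * variant_abundance / standard_abundance
```

For instance, if 100 copies of the standard sequence give an abundance of 2000 and the variant gives an abundance of 500, the estimate is 25 copies of the variant.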


In some embodiments, the compositions and methods are used for characterization of heterogeneity of a standard nucleic acid test sample. For example, the standard nucleic acid test sample can be a vaccine vector having a standard sequence. The present invention can be used to identify a variant of said standard sequence and also determine the quantity of the variant relative to the standard sequence. Such an analysis is advantageous, for example, in situations requiring rapid throughput analysis for quality control. The methods described herein will be able to determine if the quantity of a variant sub-population increases to the point wherein quality of the product is compromised.


In some embodiments, the compositions and methods are used for identification of a genotype of a given organism. This can be accomplished by first, selecting a series of primer pairs for amplification of consecutive or overlapping segments of a standard nucleic acid region found across known genotypes of a given organism. The process continues by amplifying a test nucleic acid of an organism of unknown genotype with the series of primer pairs to obtain a corresponding series of amplification products, at least some of which are then measured by mass spectrometry. Base compositions of the amplification products are then calculated from the molecular masses. These base compositions are compared with measured or calculated amplification product base compositions representing amplification products of known genotypes of a given organism obtained with the same series of primers. One or more matches of known and unknown base compositions provide the genotype of the organism.
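The comparison step can be pictured with the following sketch, which ranks known genotypes by the number of amplification products whose base compositions agree with those of the unknown. The data layout, the genotype labels and the function name are assumptions made for illustration and do not describe the database actually used.

```python
def match_genotypes(observed, database):
    """Rank known genotypes by how many amplicon base compositions agree.

    observed: {primer_pair: (A, G, C, T)} measured for the unknown sample.
    database: {genotype: {primer_pair: (A, G, C, T)}} for known genotypes.
    Returns a list of (genotype, matches, compared), best match first.
    """
    results = []
    for genotype, profile in database.items():
        shared = [pp for pp in observed if pp in profile]
        matches = sum(1 for pp in shared if observed[pp] == profile[pp])
        results.append((genotype, matches, len(shared)))
    # Most matches first; ties are broken alphabetically by genotype label.
    results.sort(key=lambda r: (-r[1], r[0]))
    return results
```

A genotype for which `matches` equals `compared` across the full series of primer pairs is a complete match. Partial matches narrow the candidates and point to the sub-segments in which the unknown differs.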


Preferably, at least some or all of the amplification products have lengths from about 45 to about 150 nucleobases. However, and depending on the mass spectrometer instrument used, the amplification products analyzed by mass spectrometry can be as large as about 500 nucleobases. Moreover, very large amplification products can be digested into smaller fragments that are compatible with the mass spectrometer used. Methods of base composition analysis are described in commonly owned U.S. patent application Ser. Nos. 10/660,998, 10/853,660, and 11/209,439, each of which is incorporated herein by reference in entirety.


In some embodiments, the amplification is effected using the polymerase chain reaction (PCR). In some embodiments, the PCR reaction is performed with an extension cycle having a length of one second. The one second extension cycle is shorter than an ordinary extension cycle and is employed for the purpose of minimization of artifact amplification products arising from target site crossover.


In some embodiments, the organism of unknown genotype is a human individual. In some embodiments, obtaining a genotypic result for a human individual provides the means to draw a forensic conclusion with regard to the individual, for example, to conclude with a very high probability that the individual has had contact with another individual or was present at a particular location.


In some embodiments with applications in human forensics, a given forensic nucleic acid sample may be characterized by base composition analysis that includes comparison with members of a database of tens, hundreds or even thousands of reference nucleic acid segments obtained from individuals of known identity or racial profile, or with standard references like the Anderson mitochondrial DNA sequence. Such a database can be stored on or embedded in a computer-readable medium and accessed over a network such as the internet for example. Preferably the database comprises base compositions of individual sub-segments of the reference nucleic acids.


In some embodiments, the nucleic acid being amplified for a genotyping analysis is mitochondrial DNA. In other embodiments, the nucleic acid is chromosomal DNA.


In some embodiments, the mitochondrial DNA being amplified for a genotyping analysis is from one or both of the highly variable regions HV1 or HV2.


In some embodiments, the DNA region being analyzed is 300 to 700 nucleobases in length. In other embodiments, the DNA region being analyzed is 400 to 600 nucleobases in length or any length therewithin.


In some embodiments, the amplifying step of the method is carried out in the presence of a dNTP containing a molecular mass-modifying tag. In some embodiments, only one of the four canonical dNTPs has the molecular mass-modifying tag. In some embodiments, the dNTP containing the molecular mass-modifying tag is 2′-deoxy-guanosine-5′-triphosphate, which has the greatest mass of the four canonical dNTPs. In other embodiments, any of the other three canonical dNTPs can contain the molecular mass-modifying tag. In some embodiments, the tag comprises a minor isotope of carbon or nitrogen. In some embodiments, the isotope of the molecular mass-modifying tag is 13C or 15N. The advantage of employing the latter mass-modifying tags is that the dNTP structure is not altered and thus the efficiency of the amplification process should be retained.


In some embodiments, the 3′ end residue of each primer hybridizes to a conserved nucleic acid residue of the target nucleic acid wherein the conserved nucleic acid residue is conserved among different genotypes. In other embodiments, the final two 3′ end residues of each primer hybridize to conserved nucleic acid residues of the target nucleic acid wherein the conserved nucleic acid residues are conserved among different genotypes. In other embodiments, the final three 3′ end residues of each primer hybridize to conserved nucleic acid residues of the target nucleic acid wherein the conserved nucleic acid residues are conserved among different genotypes.


In some embodiments, multiplexing amplification reactions are carried out with at least two primer pairs. In other embodiments, multiplexing reactions are carried out with three primer pairs, also known as triplex combinations.


In some embodiments, the compositions and methods are used for characterization of length or base composition heteroplasmy in mitochondrial DNA and also for determination of the quantity of a given heteroplasmic variant relative to a “standard” mitochondrial DNA region. In some embodiments, characterization of length heteroplasmy is used to diagnose and/or evaluate the progression of a mitochondrial DNA-related genetic disease such as one or more of the following mitochondrial diseases: Alpers Disease, Barth syndrome, Beta-oxidation Defects, Carnitine-Acyl-Carnitine Deficiency, Carnitine Deficiency, Co-Enzyme Q10 Deficiency, Complex I Deficiency, Complex II Deficiency, Complex III Deficiency, Complex IV Deficiency, Complex V Deficiency, COX Deficiency, CPEO, CPT I Deficiency, CPT II Deficiency, Glutaric Aciduria Type II, KSS, Lactic Acidosis, LCAD, LCHAD, Leigh Disease or Syndrome, LHON, Lethal Infantile Cardiomyopathy, Luft Disease, MAD, MCA, MELAS, MERRF, Mitochondrial Cytopathy, Mitochondrial DNA Depletion, Mitochondrial Encephalopathy, Mitochondrial Myopathy, MNGIE, NARP, Pearson Syndrome, Pyruvate Carboxylase Deficiency, Pyruvate Dehydrogenase Deficiency, Respiratory Chain, SCAD, SCHAD, or VLCAD.


Determination of sequence identity is described in the following example: a nucleic acid 20 nucleobases in length which is otherwise identical to another 20 nucleobase nucleic acid but has two non-identical residues has 18 of 20 identical residues (18/20=0.9 or 90% sequence identity). In another example, a nucleic acid 15 nucleobases in length having all residues identical to a 15 nucleobase segment of a nucleic acid 20 nucleobases in length would have 15/20=0.75 or 75% sequence identity with the 20 nucleobase nucleic acid. In another example, a nucleic acid 17 nucleobases in length having all residues identical to a 15 nucleobase segment of a nucleic acid 20 nucleobases in length would have 15/17=0.882 or 88.2% sequence identity. In some embodiments, a nucleic acid variant has between about 70% and 99% sequence identity with a standard nucleic acid sequence. In other embodiments, the nucleic acid variant has between about 75% to about 99% sequence identity. In other embodiments, the nucleic acid has between about 80% to about 99% sequence identity. In other embodiments, the nucleic acid has between about 85% to about 99% sequence identity. In other embodiments, the nucleic acid has between about 90% to about 99% sequence identity. In other embodiments, the nucleic acid has between about 95% to about 99% sequence identity. One will recognize that these embodiments provide for nucleic acid variants having sequence identity with a standard nucleic acid sequence ranging from about 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, or 98%, to about 99%, as well as fractions thereof.


EXAMPLES
Example 1
Selection of Primers for Analysis of Mitochondrial DNA

An alignment of 5615 mitochondrial DNA sequences was constructed and analyzed for regions of conservation which are useful as primer binding sites for tiling coverage of the mitochondrial DNA regions HV1 and HV2. A total of 24 primer binding sites were chosen according to the criterion that the 5′-end of the primer binding sites remain conserved across the alignment of mitochondrial DNA sequences. In some cases, only the 5′-terminal nucleobase itself is conserved. In other cases, as many as two or three consecutive nucleobases at the 5′ end of the primer binding sites are conserved.


In cases where primer coverage at a particular region is desired but complete conservation is absent, backup primer pairs can be chosen to ensure that target sequences will be amplified. For example, because the 5′ end of the primer binding site for the forward primer of primer pair number 2893 is only 99.7% conserved among the 5615 mitochondrial DNA sequences of the alignment, a backup primer pair was designed. Primer pair number 2894 has a G residue instead of an A residue because A is 0.3% conserved at the 5′ end of the primer binding site.


Table 1 shows the panel of 25 primer pairs designed to tile the informative HV1 (coordinates 15924 . . . 16428) and HV2 (coordinates 31-576) mitochondrial DNA regions for complete and partially redundant coverage with partially overlapping amplification products according to the general scheme shown in FIG. 1. The extent of overlap may vary but generally overlapping regions relative to two amplification products should range from about ten nucleobases to about 50 nucleobases of overlap. The sizes of amplification products produced with the primer pairs of Table 1 range in length from 85 to 140 nucleobase pairs. With the exception of three amplification products, all are less than 130 nucleobase pairs. The coordinates of the primer binding sites are given in the forward and reverse primer names with reference to the standard Anderson mitochondrial DNA sequence (SEQ ID NO: 51). For example, the forward primer of primer pair number 2889 (SEQ ID NO: 1) hybridizes to coordinates 16357-16376 of the standard Anderson mitochondrial DNA sequence (SEQ ID NO: 51). The primer pair name designation “HUMMTDNA” refers to human mitochondrial DNA. Primer pair numbers 2901 and 2925 are designed to produce an amplification product corresponding to the same sub-segment defined by Anderson mitochondrial DNA coordinates 15924 . . . 15985 (see Table 2). This extent of redundancy is sometimes beneficial in cases where high variability occurs at chosen primer binding sites such that a given primer of a primer pair does not effectively hybridize to the mitochondrial DNA of certain individuals. For this reason, 25 primer pairs are used to obtain amplification products of 24 sub-segments.
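Because the primer names encode the binding-site coordinates, the amplification product length and the sub-segment lying between the two primers can be derived from the names alone. The following sketch shows one way to do so. The regular expression is an assumption based on the naming pattern of the primers described here (for example HUMMTDNA_ASN_16357_16376_F), and the function names are illustrative.

```python
import re

# Assumed name pattern: "..._<start>_<end>_F" or "..._<start>_<end>_R", with an
# optional variant index (e.g. "_2") before the strand letter.
NAME_RE = re.compile(r"_(\d+)_(\d+)(?:_\d+)?_([FR])$")


def primer_coords(name):
    """Return (start, end, strand) parsed from a primer name."""
    m = NAME_RE.search(name)
    if m is None:
        raise ValueError(f"unrecognized primer name: {name}")
    return int(m.group(1)), int(m.group(2)), m.group(3)


def amplicon_summary(forward_name, reverse_name):
    """Amplicon span and length, plus the sub-segment between the primers."""
    f_start, f_end, f_strand = primer_coords(forward_name)
    r_start, r_end, r_strand = primer_coords(reverse_name)
    if (f_strand, r_strand) != ("F", "R"):
        raise ValueError("expected a forward primer followed by a reverse primer")
    return {
        "amplicon": (f_start, r_end),
        "length": r_end - f_start + 1,
        "sub_segment": (f_end + 1, r_start - 1),
    }
```

For primer pair number 2889, whose reverse primer is named HUMMTDNA_ASN_16429_16451_R, this gives an amplification product spanning 16357 . . . 16451 (95 nucleobases) and an inter-primer sub-segment of 16377 . . . 16428. The sketch does not handle amplification products that cross the origin of the circular mitochondrial genome.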









TABLE 1

Primer Pairs Used for Amplifying HV1 and HV2 Regions of Mitochondrial DNA

| Primer pair number | Forward primer name | Forward sequence | Forward SEQ ID NO: | Reverse primer name | Reverse sequence | Reverse SEQ ID NO: |
| --- | --- | --- | --- | --- | --- | --- |
| 2889 | HUMMTDNA_ASN_16357_16376_F | TCTCGTCCCCATGGATGACC | 1 | HUMMTDNA_ASN_16429_16451_R | TCGAGGAGAGTAGCACTCTTGTG | 26 |
| 2890 | HUMMTDNA_ASN_16318_16341_F | TGCCATTTACCGTACATAGCACAT | 2 | HUMMTDNA_ASN_16382_16402_R | TGGTCAAGGGACCCCTATCTG | 27 |
| 2891 | HUMMTDNA_ASN_16256_16282_F | TCACCCCTCACCCACTAGGATACCAAC | 3 | HUMMTDNA_ASN_16345_16366_R | TGGGACGAGAAGGGATTTGACT | 28 |
| 2892 | HUMMTDNA_ASN_16231_16253_F | TCACACATCAACTGCAACTCCAA | 4 | HUMMTDNA_ASN_16306_16338_R | TGCTATGTACGGTAAATGGCTTTATGTACTATG | 29 |
| 2893 | HUMMTDNA_ASN_16154_16181_F | TAGTACATAAAAACCCAATCCACATCAA | 5 | HUMMTDNA_ASN_16251_16268_R | TGGTGAGGGGTGGCTTTG | 30 |
| 2894 | HUMMTDNA_ASN_16154_16181_2_F | TAGTACATAAAAACCCAATCCACATCAG | 6 | HUMMTDNA_ASN_16251_16268_R | TGGTGAGGGGTGGCTTTG | 31 |
| 2895 | HUMMTDNA_ASN_16130_16156_F | TTTCCATAAATACTTGACCACCTGTAG | 7 | HUMMTDNA_ASN_16202_16224_R | TGGGTTGATTGCTGTACTTGCTT | 32 |
| 2896 | HUMMTDNA_ASN_16102_16123_F | TACTGCCAGCCACCATGAATAT | 8 | HUMMTDNA_ASN_16202_16224_R | TGGGTTGATTGCTGTACTTGCTT | 33 |
| 2897 | HUMMTDNA_ASN_16055_16077_F | TCCAAGTATTGACTCACCCATCA | 9 | HUMMTDNA_ASN_16130_16155_R | TACAGGTGGTCAAGTATTTATGGTAC | 34 |
| 2898 | HUMMTDNA_ASN_16025_16047_F | TCTTTCATGGGGAAGCAGATTTG | 10 | HUMMTDNA_ASN_16099_16119_R | TCATGGTGGCTGGCAGTAATG | 35 |
| 2899 | HUMMTDNA_ASN_15985_16014_F | TGCACCCAAAGCTAAGATTCTAATTTAAAC | 11 | HUMMTDNA_ASN_16052_16073_R | TGGTGAGTCAATACTTGGGTGG | 36 |
| 2901 | HUMMTDNA_ASN_15893_15923_F | TGGGGTATAAACTAATACACCAGTCTTGTAA | 12 | HUMMTDNA_ASN_15986_16012_R | TTAAATTAGAATCTTAGCTTTGGGTGC | 37 |
| 2902 | HUMMTDNA_ASN_5_30_F | TCAGGTCTATCACCCTATTAACCACT | 13 | HUMMTDNA_ASN_77_97_R | TGTCTCGCAATGCTATCGCGT | 38 |
| 2903 | HUMMTDNA_ASN_20_40_F | TATTAACCACTCACGGGAGCT | 14 | HUMMTDNA_ASN_115_139_R | TTTCAAAGACAGATACTGCGACATA | 39 |
| 2904 | HUMMTDNA_ASN_83_102_F | TAGCATTGCGAGACGCTGGA | 15 | HUMMTDNA_ASN_163_187_R | TGCCTGTAATATTGAACGTAGGTGC | 40 |
| 2905 | HUMMTDNA_ASN_113_137_F | TCTATGTCGCAGTATCTGTCTTTGA | 16 | HUMMTDNA_ASN_218_245_R | TGGGTTATTATTATGTCCTACAAGCATT | 41 |
| 2906 | HUMMTDNA_ASN_154_177_F | TCCTTTATCGCACCTACGTTCAAT | 17 | HUMMTDNA_ASN_268_290_R | TGGTTGTTATGATGTCTGTGTGG | 42 |
| 2907 | HUMMTDNA_ASN_239_262_F | TAACAATTGAATGTCTGCACAGCC | 18 | HUMMTDNA_ASN_341_363_R | TGTTTTTGGGGTTTGGCAGAGAT | 43 |
| 2908 | HUMMTDNA_ASM_204_233_F | TGTGTTAATTAATTAATGCTTGTAGGACAT | 19 | HUMMTDNA_ASN_314_330_R | TCTGTGGCCAGAAGCGG | 44 |
| 2910 | HUMMTDNA_ASN_331_354_F | TCTTAAACACATCTCTGCCAAACC | 20 | HUMMTDNA_ASN_402_425_R | TAAAAGTGCATACCGCCAAAAGAT | 45 |
| 2912 | HUMMTDNA_ASN_409_430_F | TGCGGTATGCACTTTTAACAGT | 21 | HUMMTDNA_ASN_502_521_R | TGTGTGTGCTGGGTAGGATG | 46 |
| 2913 | HUMMTDNA_ASN_464_492_F | TCTCCCATACTACTAATCTCATCAATACA | 22 | HUMMTDNA_ASN_577_603_R | TGCTTTGAGGAGGTAAGCTACATAAAC | 47 |
| 2916 | HUMMTDNA_ASN_367_388_F | TACCCTAACACCAGCCTAACCA | 23 | HUMMTDNA_ASN_438_463_R | TGGAGGGGAAAATAATGTGTTAGTTG | 48 |
| 2923 | HUMMTDNA_ASN_262_288_F | TGCTTTCCACACAGACATCATAACAAA | 24 | HUMMTDNA_ASN_368_390_R | TCTGGTTAGGCTGGTGTTAGGGT | 49 |
| 2925 | HUMMTDNA_ASN_15937_15962_F | TCCTTTTTCCAAGGACAAATCAGAGA | 25 | HUMMTDNA_ASN_16018_16041_R | TGCTTCCCCATGAAAGAACAGAGA | 50 |


TABLE 2

Amplification Coordinates of Mitochondrial DNA for the Primer Pairs of Table 1

| Primer pair number | Amplification Coordinates | mtDNA Region |
| --- | --- | --- |
| 2889 | 16377 . . . 16428 | HV1 |
| 2890 | 16342 . . . 16381 | HV1 |
| 2891 | 16283 . . . 16344 | HV1 |
| 2892 | 16254 . . . 16305 | HV1 |
| 2893 | 16182 . . . 16250 | HV1 |
| 2894 | 16182 . . . 16250 | HV1 |
| 2895 | 16157 . . . 16201 | HV1 |
| 2896 | 16124 . . . 16201 | HV1 |
| 2897 | 16078 . . . 16129 | HV1 |
| 2898 | 16048 . . . 16098 | HV1 |
| 2899 | 16015 . . . 16051 | HV1 |
| 2901 | 15924 . . . 15985 | HV1 |
| 2902 | 31 . . . 76 | HV2 |
| 2903 | 41 . . . 114 | HV2 |
| 2904 | 103 . . . 162 | HV2 |
| 2905 | 138 . . . 217 | HV2 |
| 2906 | 178 . . . 267 | HV2 |
| 2907 | 263 . . . 340 | HV2 |
| 2908 | 234 . . . 314 | HV2 |
| 2910 | 355 . . . 402 | HV2 |
| 2912 | 431 . . . 501 | HV2 |
| 2913 | 493 . . . 576 | HV2 |
| 2916 | 389 . . . 437 | HV2 |
| 2923 | 289 . . . 371 | HV2 |
| 2925 | 15924 . . . 15985 | HV1 |


Example 2
Validation of Triplex Tiling Mitochondrial DNA Assay

The 25 primer pairs of Table 1 were divided into triplex combinations of three primer pairs such that the amplification products of three primer pairs within a triplex combination have sense and antisense strands which are significantly different in molecular mass from the other sense and antisense strands of other amplification products within the triplex combinations. The triplex combinations are shown in Table 3 with reference to primer pair combinations.









TABLE 3

Triplex Combinations of Primer Pairs for Simultaneous Analysis of Mitochondrial DNA Regions

| Triplex Combination No. | Primer Pair Number | Primer Pair Number | Primer Pair Number |
| --- | --- | --- | --- |
| 1 | 2892 | 2901 | 2906 |
| 2 | 2891 | 2908 | 2925 |
| 3 | 2890 | 2899 | 2907 |
| 4 | 2898 | 2889 | 2923 |
| 5 | 2902 | 2910 | 2893/2894 |
| 6 | 2916 | 2897 | 2893 |
| 7 | 2904 | 2896 | 2913 |
| 8 | 2895 | 2912 | 2905 |


PCR cycle conditions used for obtaining amplification products for this assay are as follows: 10 minutes at 96° C. followed by six cycles of steps (a) to (c) wherein (a) is 20 seconds at 96° C., (b) is 1.5 minutes at 55° C., and (c) is 1 second at 72° C., followed by 36 cycles of steps (d) to (f) wherein (d) is 20 seconds at 96° C., (e) is 1.5 minutes at 50° C., and (f) is 1 second at 72° C., followed by a retention at 4° C. All PCR reactions were carried out with an Eppendorf thermal cycler with 40 μl reaction volumes in a 96-well microtiter plate format. Liquid manipulations were performed using a Packard MPII liquid handling robotic platform. The PCR reaction mixture consisted of 4 units of Amplitaq Gold, 1× buffer II (Applied Biosystems, Foster City, Calif.), 1.5 mM MgCl2, 800 μM dNTP mixture and 250 nM of each primer. The dNTP mixture contained carbon-13 enriched deoxyguanosine triphosphate, a chemically invisible molecular mass-modifying tag which adds 10 Da to each G residue incorporated into a given amplification product, so that the number of possible base compositions consistent with a measured molecular mass is reduced and the probability of assignment of an incorrect base composition to a given amplification product is greatly decreased.
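The effect of the mass-modifying tag on base composition assignment can be illustrated with a brute-force sketch. It assumes average residue masses for unmodified DNA, 5′-hydroxyl and 3′-hydroxyl strand termini, a single strand of known length, and a hypothetical mass tolerance. Only the 10 Da shift per G residue is taken from the text above. The sketch is not the processor used in the assay.

```python
# Assumed average residue masses (Da); textbook values, not from this disclosure.
RESIDUE_MASS = {"A": 313.21, "G": 329.21, "C": 289.18, "T": 304.20}
END_CORRECTION = -61.96  # assumed 5'-OH / 3'-OH termini
G_TAG_SHIFT = 10.0       # shift per tagged G residue, as stated in the text


def composition_mass(a, g, c, t, tagged_g=False):
    """Approximate mass of one strand with the given base composition."""
    g_mass = RESIDUE_MASS["G"] + (G_TAG_SHIFT if tagged_g else 0.0)
    return (a * RESIDUE_MASS["A"] + g * g_mass + c * RESIDUE_MASS["C"]
            + t * RESIDUE_MASS["T"] + END_CORRECTION)


def candidate_compositions(measured_mass, length, tol_da, tagged_g=False):
    """All (A, G, C, T) of the given length whose mass is within tol_da."""
    hits = []
    for a in range(length + 1):
        for g in range(length + 1 - a):
            for c in range(length + 1 - a - g):
                t = length - a - g - c
                mass = composition_mass(a, g, c, t, tagged_g)
                if abs(mass - measured_mass) <= tol_da:
                    hits.append((a, g, c, t))
    return hits
```

Under these assumed masses, replacing one G with A and one C with T changes an untagged strand by only about 1 Da, so the two compositions are difficult to tell apart at a tolerance of 1 Da. With the tagged G the same exchange changes the mass by about 11 Da, and the alternative composition is excluded.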


Eleven saliva samples were obtained from in-house laboratory personnel and subjected to PCR reactions as described above with the 8 triplex primer pair sets shown in Table 3. The PCR amplification products were purified according to the primary amine-terminated magnetic bead separation method, a technique that is well known in the art and that is described in US patent publication 20050130196 which is incorporated herein by reference in entirety. All amplification products were analyzed using a Bruker Daltonics MicroTOF™ mass spectrometer. Ions from the ESI source undergo orthogonal ion extraction and are focused in a reflectron prior to detection. The TOF and FTICR are equipped with the same automated sample handling and fluidics described above. Ions are formed in the standard MicroTOF™ ESI source that is equipped with the same off-axis sprayer and glass capillary as the FTICR ESI source. Consequently, source conditions were the same as those described above. External ion accumulation was also employed to improve ionization duty cycle during data acquisition. Each detection event on the TOF comprised 75,000 data points digitized over 75 μs.


Mass spectra of the amplification products were analyzed independently using a maximum-likelihood processor, such as is widely used in radar signal processing. This processor, referred to as GenX, first makes maximum likelihood estimates of the input to the mass spectrometer for each primer by running matched filters for each base composition aggregate on the input data. This processor is described in U.S. Patent Application Publication No. 20040209260 which is incorporated herein by reference in entirety.


All duplicate reactions were analyzed independently and duplicate results were identical in all cases. An example of a mass spectrum of triplex primer combination 1 (primer pair nos. 2892, 2901 and 2906) is shown in FIG. 2 wherein each of the peaks labeled A-F represents a single strand of DNA of an amplification product. The strands are clearly separated, which facilitates efficient analysis of the molecular masses.


The applicability of the present invention for resolution of mitochondrial DNA heteroplasmy is indicated in FIG. 3. Strands C′, D′, C″ and D″ represent two amplification products having length heteroplasmy of the amplification product of strands C and D. Each of the strands of the heteroplasmic variants is visible in the mass spectrum because they vary in molecular mass.


Example 3
Rapid Typing of Human Mitochondrial DNA

Mitochondrial DNA (mtDNA) analysis of forensic samples is performed when the quantity and/or quality of DNA are insufficient for nuclear DNA analysis, or when DNA analysis through a maternal lineage is otherwise desired. Forensic mtDNA analysis is performed by sequencing portions of the mtDNA genome, which is a lengthy and labor intensive technique. We present a mass spectrometry-based multiplexed PCR assay suitable for automated analysis of mtDNA control region segments. The assay has been internally validated with 20 DNA samples with known sequence profiles and 50 blinded samples contributed by external collaborators. Correct profiles were obtained in all cases when compared to sequencing data. Two samples containing mixed templates were observed and the relative contribution of each template was quantified directly from the mass spectra of PCR products.


The primer pairs of Table 1 were designed to amplify 1051 bases of human mitochondrial DNA in the hypervariable regions HV1 and HV2. The primer pairs were combined in multiplex reactions in groups which were chosen such that the target segments of the three primer pairs being combined were maximally separated and such that the three amplification product masses in a triplex mixture were resolvable from each other by mass spectrometry. The triplex groups are shown in Table 3. The lengths of the amplification products were 85 to 140 base pairs. All except for three amplification products were less than 130 base pairs in length. The relative primer pair concentrations in the triplex mixtures were adjusted in order to favor simultaneous amplification of all three target segments.


Mass spectra were measured by electrospray time-of-flight (TOF) mass spectrometry.


A standard reference human mitochondrial DNA database was used to obtain the base composition profiles corresponding to the series of amplification products produced by the overlapping primer pairs. As described above, the database was populated with base composition data from the Anderson reference mitochondrial DNA, from base composition measurements earlier obtained, and by conversions from databases of earlier obtained sequencing data. These base composition profiles represent the “truth data.”


Fifty blinded test samples, including 25 blood samples and 25 cheek swab samples, were tested and compared to the pre-existing truth data. Mitochondrial DNA was purified from the samples by the Qiagen blood punch protocol or by the Qiagen buccal swab protocol and quantified using the Quantifiler qPCR kit prior to analysis. Two or more independent assays were performed with the overlapping primers of Table 1 using between 100 and 500 pg of mitochondrial DNA in each reaction.


The purified mitochondrial DNA was subjected to triplex PCR amplification with the eight triplex primer groups of Table 3 according to the procedure indicated in Example 2. Amplified mixtures were purified by solution capture of nucleic acids with ion exchange resin linked to magnetic beads as follows: 25 μl of a 2.5 mg/mL suspension of BioClone amine terminated superparamagnetic beads were added to 25 to 50 μl of a PCR (or RT-PCR) reaction containing approximately 10 pM of a typical PCR amplification product. The above suspension was mixed for approximately 5 minutes by vortexing or pipetting, after which the liquid was removed after using a magnetic separator. The beads containing bound PCR amplification product were then washed three times with 50 mM ammonium bicarbonate/50% MeOH or 100 mM ammonium bicarbonate/50% MeOH, followed by three more washes with 50% MeOH. The bound PCR amplicon was eluted with a solution of 25 mM piperidine, 25 mM imidazole, 35% MeOH which included peptide calibration standards.


Each mass spectrum obtained by ESI-TOF mass spectrometry was independently calibrated by internal peptide calibrants and noise-reduced prior to calculation of base composition. Base compositions were obtained from molecular masses and compared to a database developed from over 110,000 mitochondrial DNA sequences. The base composition of each amplification product was associated with mitochondrial DNA coordinates as shown, for example, in Table 4, which provides the base compositions for sample AF-12 from the set of 50 blinded samples.









TABLE 4

Mitochondrial DNA Base Composition Profile for Sample AF-12

| Anderson/Cambridge Sequence Coordinates (SEQ ID NO: 51) | Base Composition |
| --- | --- |
| 15893 . . . 16012 | A47 G18 C25 T30 |
| 15937 . . . 16041 | A35 G14 C24 T32 |
| 15985 . . . 16073 | A26 G15 C21 T27 |
| 16025 . . . 16119 | A26 G17 C26 T26 |
| 16055 . . . 16155 | A31 G13 C30 T27 |
| 16102 . . . 16224 | A45 G13 C42 T23 |
| 16130 . . . 16224 | A36 G7 C33 T19 |
| 16154 . . . 16268 | A44 G7 C46 T18 |
| 16231 . . . 16338 | A40 G9 C40 T19 |
| 16256 . . . 16366 | A37 G9 C41 T24 |
| 16318 . . . 16402 | A20 G14 C30 T21 |
| 16357 . . . 16451 | A21 G17 C36 T21 |
| 5 . . . 97 | A19 G24 C24 T26 |
| 20 . . . 139 | A24 G34 C29 T33 |
| 83 . . . 187 | A23 G21 C29 T32 |
| 113 . . . 245 | A39 G18 C28 T48 |
| 154 . . . 290 | A49 G17 C31 T40 |
| 204 . . . 330 | A42 G16 C35 T32 |
| 204 . . . 330 | A42 G16 C36 T32 |
| 204 . . . 330 | A42 G16 C37 T32 |
| 239 . . . 363 | A43 G11 C46 T23 |
| 239 . . . 363 | A43 G11 C47 T23 |
| 239 . . . 363 | A43 G11 C48 T23 |
| 239 . . . 363 | A43 G11 C49 T23 |
| 262 . . . 390 | A47 G10 C50 T20 |
| 262 . . . 390 | A47 G10 C51 T20 |
| 262 . . . 390 | A47 G10 C52 T20 |
| 262 . . . 390 | A47 G10 C53 T20 |
| 331 . . . 425 | A33 G9 C27 T26 |
| 367 . . . 463 | A27 G8 C32 T30 |
| 409 . . . 521 | A32 G7 C48 T26 |
| 464 . . . 603 | A44 G10 C63 T23 |


Heteroplasmy was detected in several of the samples. For example, sample AF-4 has C⇄T heteroplasmy at position 16176. Two distinct amplification products having base compositions of A45 G13 C41 T24 and A45 G13 C40 T25 were obtained for this sample using primer pair number 2896 which amplifies positions 16102 . . . 16224. If conventional sequencing analyses were used to analyze the amplification reaction mixture, heteroplasmy would not have been detected. Table 5 indicates additional examples of heteroplasmy detected in various samples.
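The way two co-amplified products reveal the nature of a heteroplasmy can be sketched as follows. The function names are illustrative, and the percentage calculation assumes that abundance is proportional to the amount of each product. A base composition difference indicates which residues changed but not, by itself, the position at which the change occurred.

```python
BASES = "AGCT"


def describe_variant(major, minor):
    """Classify the difference between two base compositions.

    major, minor: dicts mapping each of A, G, C, T to a residue count.
    """
    delta = {b: minor[b] - major[b] for b in BASES}
    gained = "".join(b * d for b, d in delta.items() if d > 0)
    lost = "".join(b * -d for b, d in delta.items() if d < 0)
    if not gained and not lost:
        return "identical"
    if gained and lost and len(gained) == len(lost):
        return f"substitution {lost} -> {gained}"
    if gained and not lost:
        return f"insertion {gained}"
    if lost and not gained:
        return f"deletion {lost}"
    return f"complex change (lost {lost}, gained {gained})"


def minor_percent(major_abundance, minor_abundance):
    """Share of the minor product, in percent of the two products combined."""
    total = major_abundance + minor_abundance
    if total <= 0:
        raise ValueError("abundances must sum to a positive value")
    return 100.0 * minor_abundance / total
```

Applied to the two products reported for sample AF-4 (A45 G13 C41 T24 and A45 G13 C40 T25), `describe_variant` returns a C to T substitution, in agreement with the heteroplasmy described above.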









TABLE 5

Summary of Heteroplasmy Detection in Selected Samples

| Blinded Sample | Region | Heteroplasmy | Approximate % of Minor Product |
| --- | --- | --- | --- |
| AF-2 | 16231 . . . 16338; 16256 . . . 16366 | C → T | 32.4 |
| AF-4 | 16102 . . . 16224; 16130 . . . 16224 | C → T | 49.2 |
| AF-7 | 16318 . . . 16402 | T → C | 10.2 |
| AF-9 | 464 . . . 603 | AC insertion | 17.3 |
| AF-19 | 15985 . . . 16073; 16025 . . . 16119 | A → G | 44.9 |
| AF-22 | 16102 . . . 16224; 16130 . . . 16224 | C → A | 36.2 |
| AF-24 | 464 . . . 603 | AC deletion | 13.5 |
| FBI-22 | 16055 . . . 16155 | A → C | 7.0 |
| FBI-37 | 16231 . . . 16338; 16256 . . . 16366 | C → T | 20.0 |
| FBI-48 | 16055 . . . 16155 | T → G | 6.0 |
| FBI-49 | 154 . . . 290 | A → C | 10.6 |
| FBI-51 | 5 . . . 97; 20 . . . 139 | C → T | 43.0 |
| FBI-57 | 16357 . . . 16451 | T → C | 6.0 |
| FBI-61 | 464 . . . 603 | AC insertion | 17.0 |
| FBI-66 | 113 . . . 245; 154 . . . 290 | C → T | 50.0 |
| FBI-72 | 113 . . . 245; 154 . . . 290 | C → T | 34.0 |


The results of the investigation of the 50 blinded samples indicated that 47 of 47 pure samples were directly concordant with the sequence data available. One negative (no mitochondrial DNA present) was confirmed as negative and two buccal swab samples were confirmed as mixtures of existing buccal swab samples. Deduction of contributors to mixtures was confirmed as accurate. Multiple examples of length heteroplasmy and single nucleotide polymorphism heteroplasmy were observed. These results indicate that the method is useful for rapid typing of human mitochondrial DNA.


Example 4
Demonstration of the Feasibility of Rapid Detection of a Genetic Engineering Event

To detect a genetic engineering event indicated by the presence of foreign DNA sequences inserted into a parent virus, a strategy of overlapping PCR primers to tile large sections of viral genomes is employed. Primer binding sites were chosen such that the PCR amplicon length (standard segments) will be approximately 150 nucleobases in length with overlapping segments defined by primer hybridization regions every 50-100 nucleobases across the entire target region (in a manner exemplified by FIG. 1).
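A tiling layout of this kind can be generated with a short routine such as the one sketched here. The default amplicon length of 150 and step of 75 are assumptions chosen from within the ranges given above, the function name is illustrative, and actual primer placement would also depend on sequence conservation at each candidate binding site.

```python
def tile_region(start, end, amplicon_len=150, step=75):
    """Return (start, end) amplicon coordinates covering [start, end] inclusive.

    Consecutive amplicons begin `step` positions apart, so neighbours overlap
    by amplicon_len - step positions. The final amplicon is placed so that it
    ends exactly at `end`.
    """
    if step <= 0 or step > amplicon_len:
        raise ValueError("step must be between 1 and amplicon_len")
    if end - start + 1 <= amplicon_len:
        return [(start, end)]
    tiles = []
    pos = start
    while pos + amplicon_len - 1 < end:
        tiles.append((pos, pos + amplicon_len - 1))
        pos += step
    tiles.append((end - amplicon_len + 1, end))
    return tiles
```

For a hypothetical target spanning coordinates 1 to 400, the defaults yield five overlapping standard segments, the last of which is shifted so that the end of the target is covered.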


Target regions are chosen according to expectation of identification of a genetic engineering event at a particular region. For example, if “region X” of a genome of a given virus is known to be a common insertion point for a gene encoding a toxin used as a biowarfare agent, it would be advantageous to simplify the base composition analysis by choosing only the genomic coordinates of region X as the target (a portion of the genome chosen as the target). The target region is then divided into sub-segments and primer pairs are chosen to obtain amplification products which represent the sub-segments for base composition analysis. On the other hand, if it is known that any point in an entire genome is appropriate for insertion of a gene, it would be advantageous to define the entire genome as the target in order to ensure that the insertion is detected. One with ordinary skill will recognize that defining an entire genome as a target will require design of many more primer pairs and significantly more analysis resources.


A database of molecular masses and base compositions for each standard segment for the standard target virus species will be used to assemble a base composition map of each sampled region from the mass spectrum derived from each amplification reaction. The identification of at least one amplification product whose base composition differs from the base composition of its corresponding standard segment in one or more overlapping tiled regions will indicate that a variant exists and the sample will be flagged for further analysis. SNP variants are readily recognized and can be directly analyzed by the methods described herein. As an example of the proposed method, 10 Kb nucleobase regions of orthopoxvirus species genetically engineered with a green-fluorescent protein (GFP) construct are inserted into analogous regions in five different orthopoxviruses which will serve as benign surrogates to represent a potentially deadly engineered virus.


In the following proof-of-concept example using the recombinant GFP-containing camelpoxvirus (CMPV-GFP), simulated processed mass spectrometry data was used to reconstruct a standard segment base composition map, associate it unambiguously to CMPV, and identify presence of a foreign insert in the virus by flagging an unexpected/unmatched hole in two of the amplified regions. Overlapping primer pairs were selected to span the CMPV-GFP sequence. A theoretical prediction of the expected standard amplification products using these primers was used to populate a database that serves as an expected mass set for all poxvirus species. Processed mass spectrometry data of the amplified regions of CMPV-GFP were simulated and matched against the database of 16 poxvirus sequences (which did not include the GFP-engineered sequence) to construct a base composition profile of each region. The base composition profile is generated using the full set of potential fragments from all database sequences, which helps increase profile coverage in the case of strain-to-strain SNP variations. If any SNP-generated fragments appear that do not occur in any database sequence, the base composition of the double-stranded fragment can be deduced directly from the masses. The final base composition profile for each region can then be compared to the compositions for all database sequences to confirm/refine the identity of the parent virus. The presence of an unmatched “hole” in the assembled profile that cannot be matched to the expected viral sequence indicates the potential presence of an engineered insert. This region may then be sequenced and compared to the full sequence database via BLAST. The ability to rapidly identify the presence of the insert, the location of the insertion, and the flanking regions of the viral genome where the unexpected genetic modification was done will serve as a powerful tool to flag potential bioengineering events. 
It further reduces the burden of sequencing to specific, targeted regions of the viral genome instead of the entire virus from every sample.
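The matching step described in this example can be sketched as follows, purely as an illustration. The data layout (a dictionary keyed by primer pair), the function names `build_profile`, `find_holes` and `count_matches`, and the requirement of exact composition equality are assumptions of this sketch and are not taken from the description above.

```python
from typing import Dict, List, Optional, Tuple

# Base composition expressed as counts of (A, G, C, T).
Composition = Tuple[int, int, int, int]


def build_profile(observed: Dict[str, Optional[Composition]],
                  database: Dict[str, Dict[str, Composition]]) -> Dict[str, List[str]]:
    """For each primer pair, list the database entries whose expected
    composition for that primer pair equals the observed composition."""
    profile = {}
    for pair_id, comp in observed.items():
        profile[pair_id] = [
            name for name, expected in database.items()
            if comp is not None and expected.get(pair_id) == comp
        ]
    return profile


def find_holes(profile: Dict[str, List[str]]) -> List[str]:
    """Primer pairs whose observed composition matches no database entry.
    These are the unmatched "holes" that would be flagged for sequencing."""
    return [pair_id for pair_id, matches in profile.items() if not matches]


def count_matches(profile: Dict[str, List[str]]) -> Dict[str, int]:
    """Number of primer pairs matched by each database entry, as a simple
    way of ranking candidate parent viruses."""
    counts: Dict[str, int] = {}
    for matches in profile.values():
        for name in matches:
            counts[name] = counts.get(name, 0) + 1
    return counts
```

In this sketch, the database entry with the highest match count would be taken as the likely parent virus, and the primer pairs returned by `find_holes` mark the regions that would be sent for sequencing. A fuller treatment would need to tolerate strain-to-strain SNP variation rather than requiring exact equality.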


Example 5
Vector Validation and Characterization of Vector Heterogeneity

This example illustrates a scenario where the method of the present invention could be used to validate and/or characterize heterogeneity of standard nucleic acid sequences encoding biological products. The process of production of biological therapeutic proteins such as vaccines and monoclonal antibodies requires storage and manipulation of the nucleic acid sequences encoding the therapeutic proteins. Mutations may occasionally arise within a given nucleic acid sequence encoding the protein and compromise its therapeutic effect. It is desirable to have a method for rapid validation of such nucleic acid sequences and characterization of heterogeneity of the sequences, if present.


Vector X contains a nucleic acid sequence encoding vaccine Y which is used to vaccinate individuals against infection of virus Z. Vector X is used to transfect a suitable host for production of vaccine Y. Vaccine Y is suspected of being compromised by a mutation that has arisen in the nucleic acid sequence encoding vaccine Y and is being propagated via routine laboratory manipulations of vector X.


The method of the present invention is used to analyze the nucleic acid of vector X by base composition analysis of sub-segments of the vector which encode vaccine Y. The nucleic acid sequence encoding vaccine Y is 300 nucleobases in length. This sequence is divided into four sub-segments as follows: sub-segment 1 represents coordinates 1 . . . 100 of the nucleic acid sequence encoding vaccine Y; sub-segment 2 represents coordinates 61 . . . 160 of the nucleic acid sequence encoding vaccine Y; sub-segment 3 represents coordinates 141 . . . 240 of the nucleic acid sequence encoding vaccine Y; and sub-segment 4 represents coordinates 221 . . . 300 of the nucleic acid sequence encoding vaccine Y. The base compositions of each of the four sub-segments are known because the sequence of vaccine Y is known. Sub-segment 1 of the nucleic acid of vaccine Y has a base composition of A25 T20 C30 G25; sub-segment 2 of the nucleic acid of vaccine Y has a base composition of A15 T20 C35 G30; sub-segment 3 of the nucleic acid of vaccine Y has a base composition of A20 T25 C30 G25; and sub-segment 4 of the nucleic acid of vaccine Y has a base composition of A25 T15 C15 G20. Primer pair 1 is used to obtain an amplification product of vector X wherein the amplification product corresponds to sub-segment 1. Primer pair 2 is used to obtain an amplification product of vector X wherein the amplification product corresponds to sub-segment 2. Primer pair 3 is used to obtain an amplification product of vector X wherein the amplification product corresponds to sub-segment 3. Primer pair 4 is used to obtain an amplification product of vector X wherein the amplification product corresponds to sub-segment 4. The amplification products corresponding to sub-segments 1-4 are analyzed by mass spectrometry to determine their molecular masses. 
The base compositions of one or more of the amplification products are calculated from the molecular masses and compared with the base compositions of the sub-segments of vaccine Y listed above.
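The relationship between a base composition and a molecular mass can be illustrated with the following simplified sketch. It is not the conversion procedure of the invention. It assumes a single linear DNA strand with 5′-hydroxyl and 3′-hydroxyl ends, uses commonly tabulated average residue masses with the usual terminal correction for such a strand, and assumes the strand length is already known from the primer pair. The function names and the tolerance value are hypothetical.

```python
from typing import Dict, List

# Average masses (Da) of the deoxynucleotide monophosphate residues in a DNA
# strand, as commonly tabulated for oligonucleotide mass calculations.
RESIDUE_MASS = {"A": 313.21, "G": 329.21, "C": 289.18, "T": 304.20}

# Assumed correction for a linear strand with 5'-OH and 3'-OH termini.
TERMINAL_CORRECTION = -61.96


def strand_mass(comp: Dict[str, int]) -> float:
    """Approximate average mass of one strand with the given base counts."""
    return sum(RESIDUE_MASS[b] * n for b, n in comp.items()) + TERMINAL_CORRECTION


def candidate_compositions(measured_mass: float, length: int,
                           tol: float = 0.5) -> List[Dict[str, int]]:
    """Enumerate all compositions of the given length whose calculated mass
    lies within `tol` Da of the measured mass (brute-force search)."""
    hits = []
    for a in range(length + 1):
        for g in range(length + 1 - a):
            for c in range(length + 1 - a - g):
                t = length - a - g - c
                comp = {"A": a, "G": g, "C": c, "T": t}
                if abs(strand_mass(comp) - measured_mass) <= tol:
                    hits.append(comp)
    return hits


if __name__ == "__main__":
    # Sub-segment 1 of the example above: A25 T20 C30 G25.
    reference = {"A": 25, "T": 20, "C": 30, "G": 25}
    mass = strand_mass(reference)
    print(round(mass, 2), len(candidate_compositions(mass, 100)))
```

More than one composition may fall within a given mass tolerance, so a search of this kind generally returns a list of candidates rather than a single answer. Additional constraints, such as the mass of the complementary strand, would be needed to narrow the list.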


In one example, production lot A-1 of vector X is analyzed according to the method described above. The results of the base composition calculations indicate that each of the experimentally determined base compositions of the amplification products matches the base composition of its corresponding sub-segment. The conclusion of this exercise is that vector X and the nucleic acid encoding vaccine Y contained thereon do not contain mutations and that the vaccine vector is validated, indicating that future vaccine production will not be affected.


In another example, production lot B-2 of vector X is analyzed according to the method described above. The results of the base composition calculations indicate that each of the experimentally determined base compositions of the amplification products matches the base composition of its corresponding sub-segment. However, an additional amplification product is observed in the mass spectrum of the amplification reaction of primer pair 3. The additional amplification product, which corresponds to sub-segment 3, has a base composition of A20 T25 C31 G24. This indicates that the additional amplification product has a G→C substitution relative to the standard base composition of sub-segment 3 (A20 T25 C30 G25). The conclusion of this exercise is that vector X and the nucleic acid encoding vaccine Y are heterogeneous and that production of vaccine Y from production lot B-2 of vector X may be compromised. The mass spectrum indicating signals from two amplification products corresponding to sub-segment 3 may also be used to estimate the relative amounts of the two amplification products, thereby further characterizing the extent of heterogeneity of the nucleic acid sequence encoding vaccine Y. If the relative quantity of nucleic acid containing the mutation is low, it may be decided that heterogeneity is negligible. On the other hand, if the relative quantity of nucleic acid containing the mutation is high, it may be decided that vector X lot B-2 is severely compromised and should be destroyed instead of being used to produce vaccine Y.
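The comparison made for production lot B-2 can be sketched as follows, as an illustration only. The function names are hypothetical, and the estimate of relative amounts assumes that mass spectral peak intensity is proportional to the amount of each amplification product, which is a simplification introduced for this sketch.

```python
from typing import Dict, Optional

BASES = ("A", "G", "C", "T")


def composition_delta(reference: Dict[str, int],
                      observed: Dict[str, int]) -> Dict[str, int]:
    """Per-base count differences (observed minus reference), nonzero only."""
    return {b: observed[b] - reference[b]
            for b in BASES if observed[b] != reference[b]}


def single_substitution(reference: Dict[str, int],
                        observed: Dict[str, int]) -> Optional[str]:
    """Return e.g. 'G->C' when the two compositions differ by exactly one
    base lost and one base gained; otherwise return None."""
    delta = composition_delta(reference, observed)
    lost = [b for b, d in delta.items() if d == -1]
    gained = [b for b, d in delta.items() if d == 1]
    if len(delta) == 2 and len(lost) == 1 and len(gained) == 1:
        return f"{lost[0]}->{gained[0]}"
    return None


def relative_amounts(intensities: Dict[str, float]) -> Dict[str, float]:
    """Fraction of total signal per product, assuming intensity is
    proportional to amount (an assumption of this sketch)."""
    total = sum(intensities.values())
    if total <= 0:
        raise ValueError("total intensity must be positive")
    return {name: value / total for name, value in intensities.items()}


if __name__ == "__main__":
    standard = {"A": 20, "T": 25, "C": 30, "G": 25}  # sub-segment 3
    variant = {"A": 20, "T": 25, "C": 31, "G": 24}   # additional product
    print(single_substitution(standard, variant))
```

A base composition difference of this kind identifies the type of substitution but not its position within the sub-segment. Overlapping sub-segments can narrow the location, since a variant that lies in an overlap would be expected to appear in both of the corresponding amplification products.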


Various modifications of the invention, in addition to those described herein, will be apparent to those skilled in the art from the foregoing description. Such modifications are also intended to fall within the scope of the appended claims. Each reference (including, but not limited to, journal articles, U.S. and non-U.S. patents, patent application publications, international patent application publications, gene bank accession numbers, internet web sites, and the like) cited in the present application is incorporated herein by reference in its entirety. Those skilled in the art will appreciate that numerous changes and modifications may be made to the embodiments of the invention and that such changes and modifications may be made without departing from the spirit of the invention. It is therefore intended that the appended claims cover all such equivalent variations as fall within the true spirit and scope of the invention.

Claims
  • 1. A method for analyzing a nucleic acid comprising the steps of: (a) obtaining a sample comprising nucleic acid for base composition analysis;(b) selecting at least two primer pairs that will generate overlapping amplification products of at least two sub-segments of the nucleic acid;(c) amplifying at least two nucleic acid sequences of a region of the nucleic acid designated as a target for base composition analysis using the at least two primer pairs, thereby generating at least two overlapping amplification products;(d) determining base compositions of the amplification products by;(i) measuring molecular masses of one or more of the amplification products generated in step (c) using a mass spectrometer; and(ii) converting one or more of the measured molecular masses to base compositions;(e) comparing one or more of the base compositions with reference base composition data for the nucleic acid sequence; and(f) identifying the presence of a particular nucleic acid sequence or variant thereof.
  • 2. The method of claim 1 wherein the step of selecting at least two primer pairs is selecting at least one triplex combination of primer pairs.
  • 3. The method of claim 1 wherein the amplification product is from about 40 nucleobases in length to about 150 nucleobases in length.
  • 4. The method of claim 3 wherein the amplification product is from about 46 nucleobases in length to about 140 nucleobases in length.
  • 5. The method of claim 1 wherein the amplification product is less than 130 nucleobases.
  • 6. The method of claim 1 wherein the amplification product is from about 85 nucleobases in length to about 140 nucleobases in length.
  • 7. The method of claim 1 wherein the sample comprising nucleic acid for base composition analysis is selected from the group consisting of a human chromosomal nucleic acid, a human mitochondrial nucleic acid, a bacterial nucleic acid, a viral nucleic acid, a fungal nucleic acid, a synthetic nucleic acid, a recombinant nucleic acid and a combination thereof.
  • 8. The method of claim 1 wherein the comparing step identifies at least one variant whose base composition differs from the base composition of its corresponding reference sub-segment, thereby characterizing a genotype of a human, bacterium, virus or fungus.
  • 9. The method of claim 1 wherein the identifying step identifies at least one amplification product whose base composition differs from the base composition of its corresponding reference sub-segment, thereby identifying a genetically-engineered bacterium, virus or fungus.
  • 10. The method of claim 7 wherein the sample for base composition analysis is at least a portion of the hypervariable Region I segment of a mitochondrial DNA.
  • 11. The method of claim 10 wherein the hypervariable Region I segment is nucleotide positions 15924 to 16451 of SEQ ID NO: 51.
  • 12. The method of claim 10 wherein the sample for base composition analysis comprises a heteroplasmy variant.
  • 13. The method of claim 7 wherein the sample for base composition analysis is at least a portion of the hypervariable Region II segment of a mitochondrial DNA.
  • 14. The method of claim 13 wherein the hypervariable Region II segment is nucleotide positions 31 to 576 of SEQ ID NO: 51.
  • 15. The method of claim 13 wherein the sample for base composition comprises a heteroplasmy variant.
  • 16. The method of claim 1 wherein the amplifying step is carried out in the presence of a dNTP comprising a mass modifying tag.
  • 17. The method of claim 16 wherein the dNTP is 2′-deoxy-guanosine-5′-triphosphate.
  • 18. The method of claim 16 wherein the dNTP is ¹³C.
  • 19. The method of claim 7 wherein the sample for base composition analysis is human mitochondrial DNA and the obtained base compositions are compared with corresponding reference sub-segments of a reference sequence comprising the Anderson/Cambridge sequence (SEQ ID NO: 51).
  • 20. The method of claim 7 wherein the comparing of the measured base composition to the reference sub-segments provides genotype information comprising human identification, SNP identification, VNTR identification, recombinant insert identification, heteroplasmy identification, genetic disease disposition and combinations thereof.
  • 21. The method of claim 1 wherein the amplification step is quantitative and further comprises the step of adding to the sample a known quantity of a calibrant that will co-amplify with the sample for sequence identity analysis.
  • 22. The method of claim 1 wherein the mass spectrometry is electrospray Fourier transform ion cyclotron resonance mass spectrometry or electrospray time-of-flight mass spectrometry.
  • 23. The method of claim 1 wherein the target region for base composition analysis has a length falling in the range comprising an upper limit of about 700 nucleobases and comprising a lower limit of about 300 nucleobases.
  • 24. The method of claim 23 wherein the range comprises an upper limit of about 600 nucleobases and comprises a lower limit of about 400 nucleobases.
  • 25. The method of claim 1 wherein the nucleic acid is a vector.
  • 26. The method of claim 25 wherein the vector comprises a gene encoding a therapeutic protein.
  • 27. A method for identifying a human comprising the steps of: (a) obtaining a sample comprising mitochondrial DNA of the human for base composition analysis;(b) selecting at least two primer pairs that will generate overlapping amplification products representing sub-segments of the mitochondrial DNA;(c) amplifying at least two nucleic acid sequences of a region of the mitochondrial DNA designated as a target for base composition analysis using the at least two primer pairs, thereby generating at least two overlapping amplification products;(d) determining base compositions of the amplification products by;(i) measuring molecular masses of one or more of the amplification products generated in step (c) using a mass spectrometer; and(ii) converting one or more of the measured molecular masses to base compositions;(e) comparing one or more of the base compositions with reference base composition data for the nucleic acid sequence thereby identifying the human.
  • 28. The method of claim 27 wherein the sample for base composition analysis is at least a portion of the hypervariable Region I segment of a mitochondrial DNA.
  • 29. The method of claim 28 wherein the hypervariable Region I segment is nucleotide positions 15924 to 16451 of SEQ ID NO: 51.
  • 30. The method of claim 27 wherein the sample for base composition analysis is at least a portion of the hypervariable Region II segment of a mitochondrial DNA.
  • 31. The method of claim 30 wherein the hypervariable Region II segment is nucleotide positions 31 to 576 of SEQ ID NO: 51.
  • 32. The method of claim 27 wherein the comparing comprises a head-to-head comparison between said one or more of the base compositions and the reference base composition data.
  • 33. The method of claim 27 wherein the comparing comprises a comparison between said one or more of the base compositions with a standard reference database comprising the reference base composition data.
  • 34. A method for characterizing heteroplasmy in mitochondrial DNA comprising the steps of: (a) obtaining a sample comprising mitochondrial DNA for base composition analysis;(b) selecting at least two primer pairs that will generate overlapping amplification products representing sub-segments of the mitochondrial DNA;(c) amplifying at least two nucleic acid sequences of a region of the mitochondrial DNA designated as a target for base composition analysis using the at least two primer pairs, thereby generating at least two overlapping amplification products;(d) determining base compositions of the amplification products by;(i) measuring molecular masses of one or more of the amplification products generated in step (c) using a mass spectrometer; and(ii) converting one or more of the measured molecular masses to base compositions;(e) comparing one or more of the base compositions with one or more base compositions of reference sub-segments of a reference sequence; and(f) identifying at least two distinct amplification products with distinct base compositions obtained by the same pair of primers, thereby characterizing the heteroplasmy.
RELATED APPLICATIONS

The present application claims the benefit of priority to U.S. Provisional Application Ser. No. 60/701,404, filed Jul. 21, 2005; to U.S. Provisional Application Ser. No. 60/771,101, filed Feb. 6, 2006; and to U.S. Provisional Application Ser. No. 60/747,607, filed May 18, 2006. Each of the above listed U.S. Provisional Applications is incorporated herein by reference in its entirety. Methods disclosed in U.S. application Ser. Nos. 10/156,608, 09/891,793, 10/418,514, 10/660,997, 10/660,122, 10/660,996, 10/660,998, 10/728,486, 10/405,756, 10/853,660, 11/060,135, 11/073,362 and 11/209,439 are commonly owned and incorporated herein by reference in their entirety for any purpose.

Foreign Referenced Citations (181)
Number Date Country
1202204 Sep 1996 CN
19732086 Jan 1999 DE
19802905 Jul 1999 DE
19824280 Dec 1999 DE
19852167 May 2000 DE
19943374 Mar 2001 DE
10132147 Feb 2003 DE
281390 Sep 1988 EP
633321 Jan 1995 EP
0620862 Apr 1998 EP
1035219 Sep 2000 EP
1138782 Oct 2001 EP
1234888 Aug 2002 EP
02709785 Sep 2002 EP
1138782 Feb 2003 EP
1308506 May 2003 EP
1310571 May 2003 EP
1333101 Aug 2003 EP
1364064 Nov 2003 EP
1365031 Nov 2003 EP
1234888 Jan 2004 EP
1234888 Jan 2004 EP
1748072 Jan 2007 EP
2811321 Jan 2002 FR
2325002 Nov 1998 GB
2339905 Feb 2000 GB
01136KOLNP2003 Feb 2003 IN
5276999 Oct 1993 JP
11137259 May 1999 JP
24024206 Jan 2004 JP
2004000200 Jan 2004 JP
24201641 Jul 2004 JP
24201679 Jul 2004 JP
WO8803957 Jun 1988 WO
WO9015157 Dec 1990 WO
WO9205182 Apr 1992 WO
WO9208117 May 1992 WO
WO9209703 Jun 1992 WO
WO9219774 Nov 1992 WO
WO 9303186 Feb 1993 WO
WO9305182 Mar 1993 WO
WO 9308297 Apr 1993 WO
WO 9416101 Jul 1994 WO
WO 9421822 Sep 1994 WO
WO9419490 Sep 1994 WO
WO 9504161 Feb 1995 WO
WO 9513396 May 1995 WO
WO9511996 May 1995 WO
WO9513395 May 1995 WO
WO9531997 Nov 1995 WO
WO9606187 Feb 1996 WO
WO9616186 May 1996 WO
WO 9629431 Sep 1996 WO
WO 9632504 Oct 1996 WO
WO 9637630 Nov 1996 WO
WO9635450 Nov 1996 WO
WO 9733000 Sep 1997 WO
WO9734909 Sep 1997 WO
WO 9737041 Oct 1997 WO
WO9747766 Dec 1997 WO
WO9803684 Jan 1998 WO
WO9812355 Mar 1998 WO
WO9814616 Apr 1998 WO
WO9815652 Apr 1998 WO
WO9820020 May 1998 WO
WO9820157 May 1998 WO
WO9820166 May 1998 WO
WO9826095 Jun 1998 WO
WO9831830 Jul 1998 WO
WO9835057 Aug 1998 WO
WO9840520 Sep 1998 WO
WO9854751 Dec 1998 WO
WO9854571 Dec 1998 WO
WO9905319 Feb 1999 WO
WO9912040 Mar 1999 WO
WO9914375 Mar 1999 WO
WO9913104 Mar 1999 WO
WO9929898 Jun 1999 WO
WO9931278 Jun 1999 WO
WO9957318 Nov 1999 WO
WO9958713 Nov 1999 WO
WO9960183 Nov 1999 WO
WO0032750 Jun 2000 WO
WO0038636 Jul 2000 WO
WO0063362 Oct 2000 WO
WO0066762 Nov 2000 WO
WO0066789 Nov 2000 WO
WO0077260 Dec 2000 WO
WO0100828 Jan 2001 WO
WO0107648 Feb 2001 WO
WO0112853 Feb 2001 WO
WO0120018 Mar 2001 WO
WO0123604 Apr 2001 WO
WO0123608 Apr 2001 WO
WO0132930 May 2001 WO
WO0140497 Jun 2001 WO
WO0146404 Jun 2001 WO
WO0151661 Jul 2001 WO
WO0151662 Jul 2001 WO
WO0157263 Aug 2001 WO
WO0157518 Aug 2001 WO
WO0173199 Oct 2001 WO
WO0173119 Oct 2001 WO
WO0177392 Oct 2001 WO
WO0196388 Dec 2001 WO
WO0202811 Jan 2002 WO
WO0210186 Feb 2002 WO
WO0210444 Feb 2002 WO
WO0218641 Mar 2002 WO
WO0221108 Mar 2002 WO
WO0222873 Mar 2002 WO
WO0224876 Mar 2002 WO
WO0250307 Jun 2002 WO
WO02057491 Jul 2002 WO
WO02070664 Sep 2002 WO
WO02070728 Sep 2002 WO
WO02070737 Sep 2002 WO
WO02077278 Oct 2002 WO
WO02099034 Dec 2002 WO
WO02099095 Dec 2002 WO
WO02099129 Dec 2002 WO
WO02099130 Dec 2002 WO
WO03002750 Jan 2003 WO
WO03008636 Jan 2003 WO
WO2003001976 Jan 2003 WO
WO03016546 Feb 2003 WO
WO200314382 Feb 2003 WO
WO2003012058 Feb 2003 WO
WO2003012074 Feb 2003 WO
WO2003018636 Mar 2003 WO
WO2003020890 Mar 2003 WO
WO200303373 Apr 2003 WO
WO03060163 Jul 2003 WO
WO2003054162 Jul 2003 WO
WO2003054755 Jul 2003 WO
WO2003075955 Sep 2003 WO
WO03088979 Oct 2003 WO
WO03093506 Nov 2003 WO
WO03097869 Nov 2003 WO
WO2003100035 Dec 2003 WO
WO2003100068 Dec 2003 WO
WO2003102191 Dec 2003 WO
WO2003104410 Dec 2003 WO
WO2003106635 Dec 2003 WO
WO2004003511 Jan 2004 WO
WO2004009849 Jan 2004 WO
WO2004011651 Feb 2004 WO
WO2004013357 Feb 2004 WO
WO2004040013 May 2004 WO
WO2004044123 May 2004 WO
WO2004044247 May 2004 WO
WO2004052175 Jun 2004 WO
WO2004052175 Jun 2004 WO
WO2004053076 Jun 2004 WO
WO2004053141 Jun 2004 WO
WO2004053164 Jun 2004 WO
WO2004060278 Jul 2004 WO
WO2004070001 Aug 2004 WO
WO2004072230 Aug 2004 WO
WO2004072231 Aug 2004 WO
WO2004101809 Nov 2004 WO
WO2005003384 Jan 2005 WO
WO2005009202 Feb 2005 WO
WO2005012572 Feb 2005 WO
WO2005024046 Mar 2005 WO
WO2005036369 Apr 2005 WO
WO2005053141 Jun 2005 WO
WO2005054454 Jun 2005 WO
WO2005075686 Aug 2005 WO
WO2005086634 Sep 2005 WO
WO2005091971 Oct 2005 WO
WO2005098047 Oct 2005 WO
WO2005116263 Dec 2005 WO
WO2006089762 Aug 2006 WO
WO2006094238 Sep 2006 WO
WO2006116127 Nov 2006 WO
WO2006135400 Dec 2006 WO
WO2007014045 Feb 2007 WO
WO2007086904 Aug 2007 WO
WO2008104002 Aug 2008 WO
WO2008118809 Oct 2008 WO
Related Publications (1)
Number Date Country
20070218467 A1 Sep 2007 US
Provisional Applications (3)
Number Date Country
60701404 Jul 2005 US
60771101 Feb 2006 US
60747607 May 2006 US