The invention is related to polymorphisms associated with coronary artery disease.
Coronary artery disease (CAD) occurs when the arteries that supply blood to the heart muscle (coronary arteries) become hardened and narrowed. The arteries harden and become narrow due to the buildup of plaque on the inner walls or lining of the arteries (atherosclerosis). Blood flow to the heart is reduced as plaque narrows the coronary arteries. This decreases the oxygen supply to the heart muscle. CAD is the most common type of heart disease. It is the leading cause of death in the U.S. in both men and women. When blood flow and oxygen supply to the heart are reduced or cut off, you can develop angina or heart attack. Angina is chest pain or discomfort that occurs when your heart is not getting enough blood. A heart attack happens when a blood clot suddenly cuts off most or all blood supply to part of the heart. Cells in the heart muscle that do not receive enough oxygen-carrying blood begin to die. This can cause permanent damage to the heart muscle. Over time, CAD can weaken your heart muscle and contribute to heart failure and arrhythmias. In heart failure, the heart is not able to pump blood to the rest of the body effectively. Heart failure does not mean that your heart has stopped or is about to stop working. But it does mean that your heart is failing to pump blood the way that it should. Arrhythmias are changes in the normal rhythm of the heartbeats. Some can be quite serious. (National Institutes of Health website, 2003)
Despite the clear role of lifestyle in CAD, family history has also long been recognized as a risk factor, particularly in disease with onset before age 70 (Hunt, S. C., R. R. Williams, and G. K. Barlow, A comparison of positive family history definitions for defining risk of future disease. J Chronic Dis, 1986. 39(10): 809-821). Heritability of the trait is estimated to be approximately 0.34 (Williams, F. M., et al., A common genetic factor underlies hypertension and other cardiovascular disorders. BMC Cardiovasc Disord, 2004. 4(1): 20).
The invention provides methods and compositions for determining susceptibility for coronary artery disease, or related conditions, in an individual. In one aspect, the invention provides nucleic acid sequences that may be used to determine the presence or absence of nucleotides at polymorphic sites in an individual's RNA or genomic DNA that are associated with susceptibility for coronary artery disease, or related conditions. In another aspect, the invention provides a method, and kits, for identifying a patient having a susceptibility to coronary artery disease, the method having the following steps: (i) obtaining a sample of DNA or RNA from a patient; and (ii) detecting in the sample one or more at-risk polymorphisms located in a sequence of one or more genes selected from the group consisting of those listed in Table 1. In one aspect, the one or more genes are selected from the group consisting of PARD3 and CDC42 (ENSG00000070831).
As used herein, an “at-risk polymorphism” is a polymorphism having an association with the presence of coronary artery disease in an individual. In another aspect, an “at-risk polymorphism” is a polymorphism in linkage disequilibrium with a polymorphism listed in Table 1. In still another aspect, an “at-risk polymorphism” is a polymorphism in PARD3 or CDC42. In still another aspect, an “at-risk polymorphism” is a polymorphism in PARD3 or CDC42 listed in Table 1. In preferred aspects one allele of a biallelic polymorphism is identified as being associated with either an increased risk of CAD, called herein an “at-risk allele” or a decreased risk of CAD (conferring a protective effect). The “at-risk allele” may be the major or the minor allele.
In one aspect, SNPs identified in ABCA1 that showed significant allelic association with CAD were rs3758294 and rs4149265. The minor alleles of both appear to be protective and are in high LD with each other.
In another aspect of the invention, it has been discovered that CDC42 and PARD3 act together to increase risk of CAD. Individuals who carry at-risk polymorphisms at both rs16826506 (CDC42) and rs1545214 (PARD3) have about six times the risk of developing CAD of someone who has wild-type alleles at both loci, with a two-sided Fisher exact test p-value of 0.006. The rare minor allele of rs16826506 is far more common in cases than in controls.
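As an illustration of the kind of case/control comparison described above, the following minimal sketch computes a two-sided Fisher exact p-value and a sample odds ratio for a 2×2 table of carriers versus non-carriers. The counts are hypothetical and are not data from the study; the function names are illustrative only, and the odds ratio is shown as one common effect measure rather than as the measure used by the inventors.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one
    # (a small relative tolerance guards against floating-point ties)
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))

def odds_ratio(a, b, c, d):
    """Sample odds ratio; infinity is returned by convention when b * c is zero."""
    if b * c == 0:
        return float("inf")
    return (a * d) / (b * c)

# Hypothetical counts (NOT from the study): rows are cases and controls,
# columns are carriers of both at-risk alleles and all other individuals.
cases_carriers, cases_other = 12, 88
controls_carriers, controls_other = 2, 98

print(odds_ratio(cases_carriers, cases_other, controls_carriers, controls_other))
print(fisher_exact_two_sided(cases_carriers, cases_other,
                             controls_carriers, controls_other))
```

The exact test is preferred over a chi-square approximation when some cells are small, as is typical for a rare minor allele.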
Terms and symbols of nucleic acid chemistry, biochemistry, genetics, and molecular biology used herein follow those of standard treatises and texts in the field, e.g. Kornberg and Baker, DNA Replication, Second Edition (W.H. Freeman, New York, 1992); Lehninger, Biochemistry, Second Edition (Worth Publishers, New York, 1975); Strachan and Read, Human Molecular Genetics, Second Edition (Wiley-Liss, New York, 1999); Eckstein, editor, Oligonucleotides and Analogs: A Practical Approach (Oxford University Press, New York, 1991); Gait, editor, Oligonucleotide Synthesis: A Practical Approach (IRL Press, Oxford, 1984); and the like.
“Amplicon” means the product of a polynucleotide amplification reaction. That is, it is a population of polynucleotides, usually double stranded, that are replicated from one or more starting sequences. The one or more starting sequences may be one or more copies of the same sequence, or they may be a mixture of different sequences. Amplicons may be produced by a variety of amplification reactions whose products are multiple replicates of one or more target nucleic acids. Generally, amplification reactions producing amplicons are “template-driven” in that the reactants, either nucleotides or oligonucleotides, base pair with complements in a template polynucleotide, and this base pairing is required for the creation of reaction products. In one aspect, template-driven reactions are primer extensions with a nucleic acid polymerase or oligonucleotide ligations with a nucleic acid ligase. Such reactions include, but are not limited to, polymerase chain reactions (PCRs), linear polymerase reactions, nucleic acid sequence-based amplifications (NASBAs), rolling circle amplifications, and the like, disclosed in the following references that are incorporated herein by reference: Mullis et al, U.S. Pat. Nos. 4,683,195; 4,965,188; 4,683,202; 4,800,159 (PCR); Gelfand et al, U.S. Pat. No. 5,210,015 (real-time PCR with “taqman” probes); Wittwer et al, U.S. Pat. No. 6,174,670; Kacian et al, U.S. Pat. No. 5,399,491 (“NASBA”); Lizardi, U.S. Pat. No. 5,854,033; Aono et al, Japanese patent publ. JP 4-262799 (rolling circle amplification); and the like. In one aspect, amplicons of the invention are produced by PCRs. An amplification reaction may be a “real-time” amplification if a detection chemistry is available that permits a reaction product to be measured as the amplification reaction progresses, e.g. “real-time PCR” described below, or “real-time NASBA” as described in Leone et al, Nucleic Acids Research, 26: 2150-2155 (1998), and like references.
As used herein, the term “amplifying” means performing an amplification reaction. A “reaction mixture” means a solution containing all the necessary reactants for performing a reaction, which may include, but not be limited to, buffering agents to maintain pH at a selected level during a reaction, salts, co-factors, scavengers, and the like.
“Complementary or substantially complementary” refers to the hybridization or base pairing or the formation of a duplex between nucleotides or nucleic acids, such as, for instance, between the two strands of a double stranded DNA molecule or between an oligonucleotide primer and a primer binding site on a single stranded nucleic acid. Complementary nucleotides are, generally, A and T (or A and U), or C and G. Two single stranded RNA or DNA molecules are said to be substantially complementary when the nucleotides of one strand, optimally aligned and compared and with appropriate nucleotide insertions or deletions, pair with at least about 80% of the nucleotides of the other strand, usually at least about 90% to 95%, and more preferably from about 98 to 100%. Alternatively, substantial complementarity exists when an RNA or DNA strand will hybridize under selective hybridization conditions to its complement. Typically, selective hybridization will occur when there is at least about 65% complementarity over a stretch of at least 14 to 25 nucleotides, preferably at least about 75%, more preferably at least about 90% complementarity. See, M. Kanehisa Nucleic Acids Res. 12:203 (1984), incorporated herein by reference.
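The percentage thresholds above can be made concrete with a short sketch. It is a simplified illustration only: it assumes two DNA strands of equal length written 5′→3′, pairs them in antiparallel orientation without gaps, and therefore ignores the insertions and deletions that an optimal alignment would allow. The function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))

def percent_complementarity(strand_a, strand_b):
    """Percent of positions at which strand_a pairs with strand_b.

    Both strands are given 5'->3' and must have equal length; the
    comparison is an ungapped antiparallel alignment (an assumption of
    this sketch, not a requirement of the definition above).
    """
    if len(strand_a) != len(strand_b):
        raise ValueError("this sketch only handles equal-length strands")
    target = reverse_complement(strand_b)
    matches = sum(x == y for x, y in zip(strand_a.upper(), target))
    return 100.0 * matches / len(strand_a)

probe = "ATGCCTGAAC"
perfect = reverse_complement(probe)      # fully complementary strand
one_mismatch = perfect[:-1] + "A"        # last base changed from T to A

print(percent_complementarity(probe, perfect))       # 100.0
print(percent_complementarity(probe, one_mismatch))  # 90.0
```

Under the definition above, the second pair (90%) would still count as substantially complementary, since it exceeds the 80% floor.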
“Coronary artery disease” refers to a disease of the heart and the coronary arteries that is characterized by atherosclerotic arterial deposits that block blood flow to the heart.
“Duplex” means that at least two oligonucleotides and/or polynucleotides that are fully or partially complementary undergo Watson-Crick type base pairing among all or most of their nucleotides so that a stable complex is formed. The terms “annealing” and “hybridization” are used interchangeably to mean the formation of a stable duplex. In one aspect, stable duplex means that a duplex structure is not destroyed by a stringent wash, e.g. conditions including a temperature of about 5° C. less than the Tm of a strand of the duplex and low monovalent salt concentration, e.g. less than 0.2 M, or less than 0.1 M. “Perfectly matched” in reference to a duplex means that the poly- or oligonucleotide strands making up the duplex form a double stranded structure with one another such that every nucleotide in each strand undergoes Watson-Crick basepairing with a nucleotide in the other strand. The term “duplex” comprehends the pairing of nucleoside analogs, such as deoxyinosine, nucleosides with 2-aminopurine bases, PNAs, and the like, that may be employed. A “mismatch” in a duplex between two oligonucleotides or polynucleotides means that a pair of nucleotides in the duplex fails to undergo Watson-Crick bonding.
“Genetic locus,” or “locus” in reference to a genome or target polynucleotide, means a contiguous subregion or segment of the genome or target polynucleotide. As used herein, genetic locus, or locus, may refer to the position of a nucleotide, a gene, or a portion of a gene in a genome, including mitochondrial DNA, or it may refer to any contiguous portion of genomic sequence whether or not it is within, or associated with, a gene. In one aspect, a genetic locus refers to any portion of genomic sequence, including mitochondrial DNA, from a single nucleotide to a segment of a few hundred nucleotides, e.g. 100-300, in length. Usually, a particular genetic locus may be identified by its nucleotide sequence, or the nucleotide sequence, or sequences, of one or both adjacent or flanking regions.
“Hybridization” refers to the process in which two single-stranded polynucleotides bind non-covalently to form a stable double-stranded polynucleotide. The term “hybridization” may also refer to triple-stranded hybridization. The resulting (usually) double-stranded polynucleotide is a “hybrid” or “duplex.” “Hybridization conditions” will typically include salt concentrations of less than about 1 M, more usually less than about 500 mM and less than about 200 mM. Hybridization temperatures can be as low as 5° C., but are typically greater than 22° C., more typically greater than about 30° C., and preferably in excess of about 37° C. Hybridizations are usually performed under stringent conditions, i.e. conditions under which a probe will hybridize to its target subsequence. Stringent conditions are sequence-dependent and are different in different circumstances. Longer fragments may require higher hybridization temperatures for specific hybridization. As other factors may affect the stringency of hybridization, including base composition and length of the complementary strands, presence of organic solvents and extent of base mismatching, the combination of parameters is more important than the absolute measure of any one alone. Generally, stringent conditions are selected to be about 5° C. lower than the Tm for the specific sequence at a defined ionic strength and pH. Exemplary stringent conditions include salt concentration of at least 0.01 M to no more than 1 M Na ion concentration (or other salts) at a pH 7.0 to 8.3 and a temperature of at least 25° C. For example, conditions of 5×SSPE (750 mM NaCl, 50 mM NaPhosphate, 5 mM EDTA, pH 7.4) and a temperature of 25-30° C. are suitable for allele-specific probe hybridizations. For stringent conditions, see for example, Sambrook, Fritsch and Maniatis, “Molecular Cloning: A Laboratory Manual” 2nd Ed., Cold Spring Harbor Press (1989) and Anderson, “Nucleic Acid Hybridization” 1st Ed., BIOS Scientific Publishers Limited (1999), which are hereby incorporated by reference in their entirety for all purposes above.
“Hybridizing specifically to” or “specifically hybridizing to” or like expressions refer to the binding, duplexing, or hybridizing of a molecule substantially to or only to a particular nucleotide sequence or sequences under stringent conditions when that sequence is present in a complex mixture (e.g., total cellular DNA or RNA).
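The rule that stringent conditions sit about 5° C. below the Tm can be sketched numerically. The text does not say how Tm is obtained, so the example below substitutes the Wallace rule of thumb for short oligonucleotides (2° C. per A or T plus 4° C. per G or C); this choice, the function names, and the example probe are assumptions made purely for illustration, and measured or nearest-neighbor Tm values would normally be preferred.

```python
def wallace_tm(oligo):
    """Rough Tm estimate in degrees C for a short DNA oligonucleotide.

    Uses the Wallace rule of thumb, Tm = 2*(A+T) + 4*(G+C). This is an
    approximation assumed for this sketch, not a method from the text.
    """
    seq = oligo.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2.0 * at + 4.0 * gc

def stringent_temperature(oligo, offset_c=5.0):
    """Temperature about `offset_c` degrees below the estimated Tm."""
    return wallace_tm(oligo) - offset_c

probe = "ATGCCTGAACGTTAGC"  # hypothetical 16-mer with 8 A/T and 8 G/C bases
print(wallace_tm(probe))             # 48.0
print(stringent_temperature(probe))  # 43.0
```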
“Hybridization-based assay” means any assay that relies on the formation of a stable duplex or triplex between a probe and a target nucleotide sequence for detecting or measuring such a sequence. In one aspect, probes of such assays anneal to (or form duplexes with) regions of target sequences in the range of from 8 to 100 nucleotides; or in other aspects, they anneal to target sequences in the range of from 8 to 40 nucleotides, or more usually, in the range of from 8 to 20 nucleotides. A “probe” in reference to a hybridization-based assay means a polynucleotide that has a sequence that is capable of forming a stable hybrid (or triplex) with its complement in a target nucleic acid and that is capable of being detected, either directly or indirectly. Hybridization-based assays include, without limitation, assays based on use of oligonucleotides, such as polymerase chain reactions, NASBA reactions, oligonucleotide ligation reactions, single-base extensions of primers, circularizable probe reactions, and allele-specific oligonucleotide hybridizations, either in solution phase or bound to solid phase supports, such as microarrays or microbeads. There is extensive guidance in the literature on hybridization-based assays, e.g. Hames et al, editors, Nucleic Acid Hybridization a Practical Approach (IRL Press, Oxford, 1985); Tijssen, Hybridization with Nucleic Acid Probes, Parts I & II (Elsevier Publishing Company, 1993); Hardiman, Microarray Methods and Applications (DNA Press, 2003); Schena, editor, DNA Microarrays a Practical Approach (IRL Press, Oxford, 1999); and the like. In one aspect, hybridization-based assays are solution phase assays; that is, both probes and target sequences hybridize under conditions that are substantially free of surface effects or influences on reaction rate. A solution phase assay may include circumstances where either probes or target sequences are attached to microbeads.
“Linkage” describes the tendency of genes, alleles, loci or genetic markers to be inherited together as a result of their location on the same chromosome, and can be measured by percent recombination between the genes, alleles, loci or genetic markers that are physically linked on the same chromosome. Loci occurring within 50 centimorgans of each other are linked. Some linked markers occur within the same gene or gene cluster.
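Percent recombination, as used in this definition, is simply the share of offspring that carry a recombinant combination of the two markers. The sketch below uses hypothetical counts and hypothetical function names. It also treats an observed recombination percentage below 50% as an indication of linkage, which is a simplification: for short intervals 1% recombination corresponds to roughly 1 centimorgan, but the two measures diverge at larger distances.

```python
def percent_recombination(recombinants, total_offspring):
    """Percent of scored offspring that are recombinant for two markers."""
    if total_offspring <= 0:
        raise ValueError("total_offspring must be positive")
    return 100.0 * recombinants / total_offspring

def appear_linked(recombinants, total_offspring, threshold_percent=50.0):
    """True when observed recombination falls below the threshold.

    Simplifying assumption of this sketch: no statistical test is applied,
    and percent recombination is compared directly with the 50 figure.
    """
    return percent_recombination(recombinants, total_offspring) < threshold_percent

# Hypothetical cross: 18 recombinant offspring out of 200 scored
print(percent_recombination(18, 200))  # 9.0
print(appear_linked(18, 200))          # True
```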
“Kit” refers to any delivery system for delivering materials or reagents for carrying out a method of the invention. In the context of assays, such delivery systems include systems that allow for the storage, transport, or delivery of reaction reagents (e.g., probes, enzymes, etc. in the appropriate containers) and/or supporting materials (e.g., buffers, written instructions for performing the assay etc.) from one location to another. For example, kits include one or more enclosures (e.g., boxes) containing the relevant reaction reagents and/or supporting materials for assays of the invention. In one aspect, kits of the invention comprise probes specific for interfering polymorphic loci. In another aspect, kits comprise nucleic acid standards for validating the performance of probes specific for interfering polymorphic loci. Such contents may be delivered to the intended recipient together or separately. For example, a first container may contain an enzyme for use in an assay, while a second container contains probes.
“Ligation” means to form a covalent bond or linkage between the termini of two or more nucleic acids, e.g. oligonucleotides and/or polynucleotides, in a template-driven reaction. The nature of the bond or linkage may vary widely and the ligation may be carried out enzymatically or chemically. As used herein, ligations are usually carried out enzymatically to form a phosphodiester linkage between a 5′ carbon of a terminal nucleotide of one oligonucleotide and a 3′ carbon of another oligonucleotide. A variety of template-driven ligation reactions are described in the following references, which are incorporated by reference: Whitely et al, U.S. Pat. No. 4,883,750; Letsinger et al, U.S. Pat. No. 5,476,930; Fung et al, U.S. Pat. No. 5,593,826; Kool, U.S. Pat. No. 5,426,180; Landegren et al, U.S. Pat. No. 5,871,921; Xu and Kool, Nucleic Acids Research, 27: 875-881 (1999); Higgins et al, Methods in Enzymology, 68: 50-71 (1979); Engler et al, The Enzymes, 15: 3-29 (1982); and Namsaraev, U.S. patent publication 2004/0110213.
“Microarray” refers to a solid phase support having a planar surface, which carries an array of nucleic acids, each member of the array comprising identical copies of an oligonucleotide or polynucleotide immobilized to a spatially defined region or site, which does not overlap with those of other members of the array; that is, the regions or sites are spatially discrete. Spatially defined hybridization sites may additionally be “addressable” in that the location of each site and the identity of its immobilized oligonucleotide are known or predetermined, for example, prior to its use. Typically, the oligonucleotides or polynucleotides are single stranded and are covalently attached to the solid phase support, usually by a 5′-end or a 3′-end. The density of non-overlapping regions containing nucleic acids in a microarray is typically greater than 100 per cm², and more preferably, greater than 1000 per cm². Microarray technology is reviewed in the following references: Schena, Editor, Microarrays: A Practical Approach (IRL Press, Oxford, 2000); Southern, Current Opin. Chem. Biol., 2: 404-410 (1998); Nature Genetics Supplement, 21: 1-60 (1999). As used herein, “random microarray” refers to a microarray whose spatially discrete regions of oligonucleotides or polynucleotides are not spatially addressed. That is, the identity of the attached oligonucleotides or polynucleotides is not discernible, at least initially, from their location. In one aspect, random microarrays are planar arrays of microbeads wherein each microbead has attached a single kind of hybridization tag complement, such as from a minimally cross-hybridizing set of oligonucleotides. Arrays of microbeads may be formed in a variety of ways, e.g. Brenner et al, Nature Biotechnology, 18: 630-634 (2000); Tulley et al, U.S. Pat. No. 6,133,043; Stuelpnagel et al, U.S. Pat. No. 6,396,995; Chee et al, U.S. Pat. No. 6,544,732; and the like. 
Likewise, after formation, microbeads, or oligonucleotides thereof, in a random array may be identified in a variety of ways, including by optical labels, e.g. fluorescent dye ratios or quantum dots, shape, sequence analysis, or the like.
“Nucleoside” as used herein includes the natural nucleosides, including 2′-deoxy and 2′-hydroxyl forms, e.g. as described in Kornberg and Baker, DNA Replication, 2nd Ed. (Freeman, San Francisco, 1992). “Analogs” in reference to nucleosides includes synthetic nucleosides having modified base moieties and/or modified sugar moieties, e.g. described by Scheit, Nucleotide Analogs (John Wiley, New York, 1980); Uhlman and Peyman, Chemical Reviews, 90: 543-584 (1990), or the like, with the proviso that they are capable of specific hybridization. Such analogs include synthetic nucleosides designed to enhance binding properties, reduce complexity, increase specificity, and the like. Polynucleotides comprising analogs with enhanced hybridization or nuclease resistance properties are described in Uhlman and Peyman (cited above); Crooke et al, Exp. Opin. Ther. Patents, 6: 855-870 (1996); Mesmaeker et al, Current Opinion in Structural Biology, 5: 343-355 (1995); and the like. Exemplary types of polynucleotides that are capable of enhancing duplex stability include oligonucleotide N3′→P5′ phosphoramidates (referred to herein as “amidates”), peptide nucleic acids (referred to herein as “PNAs”), oligo-2′-O-alkylribonucleotides, polynucleotides containing C-5 propynylpyrimidines, locked nucleic acids (LNAs), and like compounds. Such oligonucleotides are either available commercially or may be synthesized using methods described in the literature.
“Polymerase chain reaction,” or “PCR,” means a reaction for the in vitro amplification of specific DNA sequences by the simultaneous primer extension of complementary strands of DNA. In other words, PCR is a reaction for making multiple copies or replicates of a target nucleic acid flanked by primer binding sites, such reaction comprising one or more repetitions of the following steps: (i) denaturing the target nucleic acid, (ii) annealing primers to the primer binding sites, and (iii) extending the primers by a nucleic acid polymerase in the presence of nucleoside triphosphates. Usually, the reaction is cycled through different temperatures optimized for each step in a thermal cycler instrument. Particular temperatures, durations at each step, and rates of change between steps depend on many factors well-known to those of ordinary skill in the art, e.g. exemplified by the references: McPherson et al, editors, PCR: A Practical Approach and PCR2: A Practical Approach (IRL Press, Oxford, 1991 and 1995, respectively). For example, in a conventional PCR using Taq DNA polymerase, a double stranded target nucleic acid may be denatured at a temperature >90° C., primers annealed at a temperature in the range 50-75° C., and primers extended at a temperature in the range 72-78° C. The term “PCR” encompasses derivative forms of the reaction, including but not limited to, RT-PCR, real-time PCR, nested PCR, quantitative PCR, multiplexed PCR, and the like. Reaction volumes range from a few hundred nanoliters, e.g. 200 nL, to a few hundred μL, e.g. 200 μL. “Reverse transcription PCR,” or “RT-PCR,” means a PCR that is preceded by a reverse transcription reaction that converts a target RNA to a complementary single stranded DNA, which is then amplified, e.g. Tecott et al, U.S. Pat. No. 5,168,038, which patent is incorporated herein by reference. “Real-time PCR” means a PCR for which the amount of reaction product, i.e. amplicon, is monitored as the reaction proceeds. 
There are many forms of real-time PCR that differ mainly in the detection chemistries used for monitoring the reaction product, e.g. Gelfand et al, U.S. Pat. No. 5,210,015 (“taqman”); Wittwer et al, U.S. Pat. Nos. 6,174,670 and 6,569,627 (intercalating dyes); Tyagi et al, U.S. Pat. No. 5,925,517 (molecular beacons); which patents are incorporated herein by reference. Detection chemistries for real-time PCR are reviewed in Mackay et al, Nucleic Acids Research, 30: 1292-1305 (2002), which is also incorporated herein by reference. “Nested PCR” means a two-stage PCR wherein the amplicon of a first PCR becomes the sample for a second PCR using a new set of primers, at least one of which binds to an interior location of the first amplicon. As used herein, “initial primers” in reference to a nested amplification reaction mean the primers used to generate a first amplicon, and “secondary primers” mean the one or more primers used to generate a second, or nested, amplicon. “Multiplexed PCR” means a PCR wherein multiple target sequences (or a single target sequence and one or more reference sequences) are simultaneously amplified in the same reaction mixture, e.g. Bernard et al, Anal. Biochem., 273: 221-228 (1999)(two-color real-time PCR). Usually, distinct sets of primers are employed for each sequence being amplified. “Quantitative PCR” means a PCR designed to measure the abundance of one or more specific target sequences in a sample or specimen. Quantitative PCR includes both absolute quantitation and relative quantitation of such target sequences. Quantitative measurements are made using one or more reference sequences that may be assayed separately or together with a target sequence. The reference sequence may be endogenous or exogenous to a sample or specimen, and in the latter case, may comprise one or more competitor templates. 
Typical endogenous reference sequences include segments of transcripts of the following genes: β-actin, GAPDH, β2-microglobulin, ribosomal RNA, and the like. Techniques for quantitative PCR are well-known to those of ordinary skill in the art, as exemplified in the following references that are incorporated by reference: Freeman et al, Biotechniques, 26: 112-126 (1999); Becker-Andre et al, Nucleic Acids Research, 17: 9437-9446 (1989); Zimmerman et al, Biotechniques, 21: 268-279 (1996); Diviacco et al, Gene, 122: 3013-3020 (1992); and the like.
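Because each PCR cycle of denaturing, annealing, and extending can at most double the target, the amount of amplicon grows roughly exponentially with cycle number. The following sketch shows the idealized arithmetic; the per-cycle efficiency parameter and the function name are assumptions introduced for illustration, and real reactions plateau as reagents are consumed.

```python
def expected_copies(initial_copies, cycles, efficiency=1.0):
    """Idealized amplicon count after a number of PCR cycles.

    `efficiency` is the fraction of templates copied per cycle
    (1.0 means perfect doubling). Plateau effects are ignored.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return initial_copies * (1.0 + efficiency) ** cycles

print(expected_copies(100, 10))       # 102400.0 with perfect doubling
print(expected_copies(100, 10, 0.9))  # fewer copies at 90% efficiency
```

This relationship is also why real-time monitoring is informative: the cycle at which signal becomes detectable depends on the starting quantity.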
“Polymorphism” refers to the occurrence of two or more genetically determined alternative sequences or alleles in a population. A polymorphic marker or site is the locus at which such polymorphism occurs. Preferred markers have at least two alleles, each occurring at frequency of greater than 1%, and more preferably greater than 5% or 10% of a selected population. A polymorphic locus may be as small as one base pair. Polymorphic markers include restriction fragment length polymorphisms, variable number of tandem repeats (VNTR's), hypervariable regions, minisatellites, dinucleotide repeats, trinucleotide repeats, tetranucleotide repeats, deletions, simple sequence repeats, and insertion elements, such as Alu. The first identified allelic form is arbitrarily designated as the reference form and other allelic forms are designated as alternative or variant alleles. The allelic form occurring most frequently in a selected population is sometimes referred to as the wildtype form. Diploid organisms may be homozygous or heterozygous for allelic forms. A diallelic polymorphism has two forms. A triallelic polymorphism has three forms.
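The frequency criterion for preferred markers can be checked directly from genotype counts at a diallelic site. The sketch below is illustrative only: the genotype counts are hypothetical, the function names are not from the text, and the 1% default mirrors the threshold stated above.

```python
def allele_frequencies(n_hom_ref, n_het, n_hom_alt):
    """Return (reference, alternative) allele frequencies for a diallelic site.

    Each diploid individual contributes two alleles, so homozygotes count
    twice toward their allele and heterozygotes once toward each.
    """
    total_alleles = 2 * (n_hom_ref + n_het + n_hom_alt)
    if total_alleles == 0:
        raise ValueError("no genotypes supplied")
    ref = (2 * n_hom_ref + n_het) / total_alleles
    return ref, 1.0 - ref

def is_preferred_marker(ref_freq, alt_freq, min_frequency=0.01):
    """True when both alleles occur at a frequency above the threshold."""
    return min(ref_freq, alt_freq) > min_frequency

# Hypothetical sample of 100 individuals: 81 AA, 18 Aa, 1 aa
ref, alt = allele_frequencies(81, 18, 1)
print(ref, alt)
print(is_preferred_marker(ref, alt))        # True at the 1% threshold
print(is_preferred_marker(ref, alt, 0.10))  # stricter 10% threshold
```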
“Polymorphism” or “genetic variant” means a substitution, inversion, insertion, or deletion of one or more nucleotides at a genetic locus, or a translocation of DNA from one genetic locus to another genetic locus. In one aspect, polymorphism means one of multiple alternative nucleotide sequences that may be present at a genetic locus of an individual and that may comprise a nucleotide substitution, insertion, or deletion with respect to other sequences at the same locus in the same individual, or other individuals within a population. An individual may be homozygous or heterozygous at a genetic locus; that is, an individual may have the same nucleotide sequence in both alleles, or have a different nucleotide sequence in each allele, respectively. In one aspect, insertions or deletions at a genetic locus comprise the addition or the absence of from 1 to 10 nucleotides at such locus, in comparison with the same locus in another individual of a population (or another allele in the same individual). Usually, insertions or deletions are with respect to a major allele at a locus within a population, e.g. an allele present in a population at a frequency of fifty percent or greater.
“Single nucleotide polymorphism” or “SNP” occurs at a polymorphic site occupied by a single nucleotide, which is the site of variation between allelic sequences. The site is usually preceded by and followed by highly conserved sequences of the allele (e.g., sequences that vary in less than 1/100 or 1/1000 members of the populations). A single nucleotide polymorphism usually arises due to substitution of one nucleotide for another at the polymorphic site. A transition is the replacement of one purine by another purine or one pyrimidine by another pyrimidine. A transversion is the replacement of a purine by a pyrimidine or vice versa. Single nucleotide polymorphisms can also arise from a deletion of a nucleotide or an insertion of a nucleotide relative to a reference allele.
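The distinction between transitions and transversions follows mechanically from the purine and pyrimidine classes, as the short sketch below shows. It assumes DNA bases only, and the function name is hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def classify_substitution(ref, alt):
    """Classify a single-base substitution as a transition or transversion."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError("bases must be one of A, C, G, T")
    if ref == alt:
        raise ValueError("ref and alt are identical; no substitution")
    # Same chemical class on both sides means a transition
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"

print(classify_substitution("A", "G"))  # transition (purine to purine)
print(classify_substitution("C", "T"))  # transition (pyrimidine to pyrimidine)
print(classify_substitution("A", "C"))  # transversion (purine to pyrimidine)
```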
“Isolated nucleic acid” means an object species that is the predominant species present (i.e., on a molar basis it is more abundant than any other individual species in the composition). Preferably, an isolated nucleic acid comprises at least about 50 percent (on a molar basis) of all macromolecular species present; and more preferably, an isolated nucleic acid comprises at least about 90 percent (on a molar basis) of all macromolecular species present. Most preferably, the object species is purified to essential homogeneity (contaminant species cannot be detected in the composition by conventional detection methods).
“Linkage disequilibrium” or “LD” or “allelic association” or “association” means the preferential association of a particular allele or genetic marker with a specific allele, or genetic marker at another chromosomal location more frequently than expected by chance given the particular allele frequencies in the population. For example, if locus X has alleles a and b, which occur equally frequently, and another locus Y has alleles c and d, which occur equally frequently, one would expect the haplotype ac to occur with a frequency of 0.25 in a population of individuals. If ac occurs more frequently, then alleles a and c are considered in linkage disequilibrium. Linkage disequilibrium may result from natural selection of certain combinations of alleles, through the admixture of two or more genetically different populations or because an allele has been introduced into a population too recently to have reached equilibrium (random association) between linked alleles.
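The worked example in this definition compares an observed haplotype frequency with the product of the allele frequencies. A minimal sketch of that comparison follows. The coefficient D is the difference between the two, and r² is a commonly used normalization that is not defined in the text and is included here as an assumption; the observed haplotype frequency of 0.40 is hypothetical.

```python
def ld_coefficient(p_ac, p_a, p_c):
    """D = observed haplotype frequency minus the frequency expected by chance."""
    return p_ac - p_a * p_c

def ld_r_squared(p_ac, p_a, p_c):
    """Squared correlation between alleles a and c (a common LD summary)."""
    denom = p_a * (1.0 - p_a) * p_c * (1.0 - p_c)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    d = ld_coefficient(p_ac, p_a, p_c)
    return d * d / denom

# Alleles a and c each at frequency 0.5, as in the example above
print(ld_coefficient(0.25, 0.5, 0.5))  # 0.0: haplotype ac at its chance frequency
print(ld_coefficient(0.40, 0.5, 0.5))  # positive: ac is over-represented
print(ld_r_squared(0.40, 0.5, 0.5))
```

A value of D near zero corresponds to the equilibrium (random association) case; larger positive values indicate that a and c travel together more often than chance predicts.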
A “marker” or “biomarker” in linkage disequilibrium with disease predisposing variants can be particularly useful in detecting susceptibility to disease (or association with sub-clinical phenotypes) notwithstanding that the marker does not cause the disease. For example, a marker (X) that is not itself a causative element of a disease, but which is in linkage disequilibrium with a gene (including regulatory sequences) (Y) that is a causative element of a phenotype, can be used to detect or indicate susceptibility to the disease in circumstances in which the gene Y may not have been identified or may not be readily detectable. Younger alleles (i.e., those arising from mutation relatively recently) are expected to have a larger genomic segment in linkage disequilibrium. The age of an allele can be determined from whether the allele is shared among different human ethnic groups and/or between humans and related species.
A “study population” can consist of any number of individuals, subjects or biological samples that may come from human or non-human organisms.
“Polynucleotide” or “oligonucleotide” are used interchangeably and each means a linear polymer of nucleotide monomers. Monomers making up polynucleotides and oligonucleotides are capable of specifically binding to a natural polynucleotide by way of a regular pattern of monomer-to-monomer interactions, such as Watson-Crick type of base pairing, base stacking, Hoogsteen or reverse Hoogsteen types of base pairing, or the like. Such monomers and their internucleosidic linkages may be naturally occurring or may be analogs thereof, e.g. naturally occurring or non-naturally occurring analogs. Non-naturally occurring analogs may include PNAs, phosphorothioate internucleosidic linkages, bases containing linking groups permitting the attachment of labels, such as fluorophores, or haptens, and the like. Whenever the use of an oligonucleotide or polynucleotide requires enzymatic processing, such as extension by a polymerase, ligation by a ligase, or the like, one of ordinary skill would understand that oligonucleotides or polynucleotides in those instances would not contain certain analogs of internucleosidic linkages, sugar moieties, or bases at any or some positions. Polynucleotides typically range in size from a few monomeric units, e.g. 5-40, when they are usually referred to as “oligonucleotides,” to several thousand monomeric units. Whenever a polynucleotide or oligonucleotide is represented by a sequence of letters (upper or lower case), such as “ATGCCTG,” it will be understood that the nucleotides are in 5′→3′ order from left to right and that “A” denotes deoxyadenosine, “C” denotes deoxycytidine, “G” denotes deoxyguanosine, “T” denotes thymidine, “I” denotes deoxyinosine, and “U” denotes uridine, unless otherwise indicated or obvious from context. Unless otherwise noted the terminology and atom numbering conventions will follow those disclosed in Strachan and Read, Human Molecular Genetics 2 (Wiley-Liss, New York, 1999). 
Usually polynucleotides comprise the four natural nucleosides (e.g. deoxyadenosine, deoxycytidine, deoxyguanosine, deoxythymidine for DNA or their ribose counterparts for RNA) linked by phosphodiester linkages; however, they may also comprise non-natural nucleotide analogs, e.g. including modified bases, sugars, or internucleosidic linkages. It is clear to those skilled in the art that where an enzyme has specific oligonucleotide or polynucleotide substrate requirements for activity, e.g. single stranded DNA, RNA/DNA duplex, or the like, then selection of appropriate composition for the oligonucleotide or polynucleotide substrates is well within the knowledge of one of ordinary skill, especially with guidance from treatises, such as Sambrook et al, Molecular Cloning, Second Edition (Cold Spring Harbor Laboratory, New York, 1989), and like references.
“Primer” means an oligonucleotide, either natural or synthetic, that is capable, upon forming a duplex with a polynucleotide template, of acting as a point of initiation of nucleic acid synthesis and being extended from its 3′ end along the template so that an extended duplex is formed. The sequence of nucleotides added during the extension process is determined by the sequence of the template polynucleotide. Usually primers are extended by a DNA polymerase. Primers usually have a length in the range of from 14 to 36 nucleotides.
“Readout” means a parameter, or parameters, which are measured and/or detected that can be converted to a number or value. In some contexts, readout may refer to an actual numerical representation of such collected or recorded data. For example, a readout of fluorescent intensity signals from a microarray is the address and fluorescence intensity of a signal being generated at each hybridization site of the microarray; thus, such a readout may be registered or stored in various ways, for example, as an image of the microarray, as a table of numbers, or the like.
“Solid support”, “support”, and “solid phase support” are used interchangeably and refer to a material or group of materials having a rigid or semi-rigid surface or surfaces. In many embodiments, at least one surface of the solid support will be substantially flat, although in some embodiments it may be desirable to physically separate synthesis regions for different compounds with, for example, wells, raised regions, pins, etched trenches, or the like. According to other embodiments, the solid support(s) will take the form of beads, resins, gels, microspheres, or other geometric configurations. Microarrays usually comprise at least one planar solid phase support, such as a glass microscope slide.
“Specific” or “specificity” in reference to the binding of one molecule to another molecule, such as a labeled target sequence for a probe, means the recognition, contact, and formation of a stable complex between the two molecules, together with substantially less recognition, contact, or complex formation of that molecule with other molecules. In one aspect, “specific” in reference to the binding of a first molecule to a second molecule means that to the extent the first molecule recognizes and forms a complex with another molecule in a reaction or sample, it forms the largest number of the complexes with the second molecule. Preferably, this largest number is at least fifty percent. Generally, molecules involved in a specific binding event have areas on their surfaces or in cavities giving rise to specific recognition between the molecules binding to each other. Examples of specific binding include antibody-antigen interactions, enzyme-substrate interactions, formation of duplexes or triplexes among polynucleotides and/or oligonucleotides, receptor-ligand interactions, and the like. As used herein, “contact” in reference to specificity or specific binding means two molecules are close enough that weak non-covalent chemical interactions, such as van der Waals forces, hydrogen bonding, base-stacking interactions, ionic and hydrophobic interactions, and the like, dominate the interaction of the molecules.
“Tm” is used in reference to “melting temperature.” Melting temperature is the temperature at which a population of double-stranded nucleic acid molecules becomes half dissociated into single strands. Several equations for calculating the Tm of nucleic acids are well known in the art. As indicated by standard references, a simple estimate of the Tm value may be calculated by the equation Tm = 81.5 + 0.41 (% G+C), when a nucleic acid is in aqueous solution at 1 M NaCl (see e.g., Anderson and Young, Quantitative Filter Hybridization, in Nucleic Acid Hybridization (1985)). Other references (e.g., Allawi, H. T. & SantaLucia, J., Jr., Biochemistry 36, 10581-94 (1997)) include alternative methods of computation which take structural and environmental, as well as sequence characteristics into account for the calculation of Tm.
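As an illustration only, the simple estimate above can be written as a short Python sketch. The function names are hypothetical, the result is taken to be in degrees Celsius (the usual convention for this formula, although the text does not state the unit), and the sketch does not implement the alternative methods of the other cited references.

```python
def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in a DNA sequence (case-insensitive)."""
    seq = "".join(seq.split()).upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def simple_tm(seq: str) -> float:
    """Simple estimate Tm = 81.5 + 0.41 * (% G+C), as given in the text.

    Assumption: the value is in degrees Celsius and applies to a nucleic
    acid in aqueous solution at 1 M NaCl.
    """
    return 81.5 + 0.41 * gc_percent(seq)


if __name__ == "__main__":
    # "ATGCCTG" is the example letter sequence used in the definitions above.
    print(round(simple_tm("ATGCCTG"), 1))
```

Because the estimate depends only on base composition, two sequences with the same G+C content receive the same value regardless of length or base order.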
“Sample” means a quantity of material from a biological, environmental, medical, or patient source in which detection or measurement of target nucleic acids is sought. On the one hand it is meant to include a specimen or culture (e.g., microbiological cultures). On the other hand, it is meant to include both biological and environmental samples. A sample may include a specimen of synthetic origin. Biological samples may be animal, including human, fluid, solid (e.g., stool) or tissue, as well as liquid and solid food and feed products and ingredients such as dairy items, vegetables, meat and meat by-products, and waste. Biological samples may include materials taken from a patient including, but not limited to cultures, blood, saliva, cerebral spinal fluid, pleural fluid, milk, lymph, sputum, semen, needle aspirates, and the like. Biological samples may be obtained from all of the various families of domestic animals, as well as feral or wild animals, including, but not limited to, such animals as ungulates, bear, fish, rodents, etc. Environmental samples include environmental material such as surface matter, soil, water and industrial samples, as well as samples obtained from food and dairy processing instruments, apparatus, equipment, utensils, disposable and non-disposable items. These examples are not to be construed as limiting the sample types applicable to the present invention.
The invention provides a collection of novel polymorphisms in genes encoding products related to susceptibility and/or course or outcome of coronary artery disease. Detection of polymorphisms in such genes is useful in designing and performing diagnostic assays for evaluation of genetic risks for coronary artery disease and other related conditions. Analysis of such polymorphisms is also useful in designing prophylactic and therapeutic regimes customized to underlying abnormalities. Detection of such polymorphisms is also useful for conducting clinical trials of drugs for treatment of these diseases and the underlying biological abnormalities. The polymorphisms of the invention also have more general applications, such as forensics, paternity testing, linkage analysis and positional cloning.
A study was designed to assess common alleles in 237 candidate genes for association with CAD and to assess rare alleles in a subset of such candidate genes for association with CAD. Common alleles were measured using molecular inversion probes, which are disclosed in the following references that are incorporated by reference: Willis et al, U.S. Pat. No. 6,858,412; Hardenbol et al, Nature Biotechnology, 21: 673-678 (2003); and Hardenbol et al, Genome Research, 15: 269-275 (2005). Rare alleles were identified using mismatch repair detection, a polymorphism discovery technique disclosed in the following references that are incorporated by reference: Faham et al, U.S. patent publication 2003/0003472; Faham et al, Human Molecular Genetics, 10: 1657-1664 (2001); and Fakhrai-Rad et al, Genome Research, 14: 1404-1412 (2004). After discovery, rare single nucleotide polymorphisms were measured using molecular inversion probe technology.
Analyzed samples included DNA obtained from 314 cases and 314 controls. The enrollment criteria for cases were as follows: US White Caucasians; CAD with 70% occlusion or greater in at least one artery. Cases were excluded if any of the following applied: diagnosed with type 2 diabetes; MI (heart attack); extreme hypertension (systolic bp>180 or diastolic bp>100 or with end-organ damage); hypotension (systolic bp<90 and diastolic bp<50); heavy smokers (current users of greater than one pack per day). Controls were drawn from a group of healthy US White Caucasians. Exclusion criteria were as follows: first-degree relatives with type 2 diabetes; hypertension (systolic bp>140 or diastolic bp>90); fasting glucose>126 mg/dl; total cholesterol≥300 mg/dl. Controls were matched to cases with the following criteria: exact match for gender; control age within −3/+6 years of the case; body mass index (BMI) within ±3 units of the case.
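The matching criteria can be expressed as a simple predicate. The following Python sketch is illustrative only: the Subject record and function name are hypothetical, and it assumes that the age window means the control's age lies between three years below and six years above the case's age, with both limits inclusive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subject:
    """Hypothetical record holding only the fields used for matching."""
    gender: str
    age: float  # years
    bmi: float  # body mass index units


def is_matched_control(case: Subject, control: Subject) -> bool:
    """Return True if the control satisfies all three matching criteria."""
    same_gender = control.gender == case.gender
    # Assumed reading of "-3/+6 years": case.age - 3 <= control.age <= case.age + 6.
    age_ok = case.age - 3 <= control.age <= case.age + 6
    bmi_ok = abs(control.bmi - case.bmi) <= 3
    return same_gender and age_ok and bmi_ok


if __name__ == "__main__":
    case = Subject("F", 60, 27.0)
    print(is_matched_control(case, Subject("F", 65, 29.5)))
```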
Measures of the association of polymorphisms in cases with CAD were obtained by applying chi-squared tests and Fisher's exact tests to the data as follows: (i) p-value of chi-square of 2-by-2 allele count table (designated “P-Value Allele Chiˆ2”), (ii) Fisher Exact test p-value (2-sided) of 2-by-2 allele count table (designated “P-Value Allele Fisher”), (iii) p-value of chi-square of 3-by-2 genotype count table (designated “P-Value Genotype Chiˆ2”), (iv) Fisher Exact test p-value (2-sided) of 2-by-2 genotype count table assuming recessive model (minor/minor vs all others) (designated “P-Value Recessive Fisher”), and (v) Fisher Exact test p-value (2-sided) of 2-by-2 genotype count table assuming dominant model (carriers of minor allele vs all others) (designated “P-Value Dominant Fisher”). If any of the resulting measures was equal to or less than 0.05, the polymorphism was deemed to have a significant association with CAD. Such polymorphisms are listed in Table 1. Column designations (in addition to those described above) are as follows: “SNP” is a reference SNP (or “refSNP”) cluster identifier or an MRD ID (which are further identified by sequence in Table 2); “Gene” is a HUGO or Ensembl ID (as of May 2005); “Chrom” is chromosome number; “Posn” is the base position on the indicated chromosome (NCBI build 35); “Major Allele” is self-explanatory; “Minor Allele” is self-explanatory; “1 Case” is the count of major allele in cases; “2 Case” is the count of minor allele in cases; “1 Cntr” is the count of major allele in controls; “2 Cntr” is the count of minor allele in controls; “1/1 Case” is the count of major/major genotype in cases; “1/2 Case” is the count of major/minor genotype in cases; “2/2 Case” is the count of minor/minor genotype in cases; “1/1 Cntr” is the count of major/major genotype in controls; “1/2 Cntr” is the count of major/minor genotype in controls; and “2/2 Cntr” is the count of minor/minor genotype in controls.
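By way of illustration, measures (i), (ii), (iv) and (v) can be computed from genotype counts with the Python sketch below. It is not the software used in the study. All function names are hypothetical, the chi-square test is assumed to be the uncorrected Pearson test with one degree of freedom, and the two-sided Fisher test is assumed to sum the probabilities of all tables no more probable than the observed one. The 3-by-2 genotype chi-square of measure (iii) is omitted for brevity.

```python
from math import comb, erfc, sqrt


def allele_counts(n11, n12, n22):
    """(major, minor) allele counts from 1/1, 1/2 and 2/2 genotype counts."""
    return 2 * n11 + n12, n12 + 2 * n22


def chi2_2x2_p(a, b, c, d):
    """Pearson chi-square p-value (1 d.f., no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:  # an empty row or column: no evidence of association
        return 1.0
    chi2 = n * (a * d - b * c) ** 2 / denom
    return erfc(sqrt(chi2 / 2.0))  # survival function of chi-square with 1 d.f.


def fisher_2x2_p(a, b, c, d):
    """Two-sided Fisher exact p-value for [[a, b], [c, d]]."""
    r1, r2, c1 = a + b, c + d, a + c
    total = comb(r1 + r2, c1)

    def prob(x):  # hypergeometric probability of x in the top-left cell
        return comb(r1, x) * comb(r2, c1 - x) / total

    lo, hi = max(0, c1 - r2), min(r1, c1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    cutoff = prob(a) * (1 + 1e-7)  # small tolerance for floating-point ties
    return min(1.0, sum(p for p in probs if p <= cutoff))


def association_p_values(case_gt, ctrl_gt):
    """case_gt and ctrl_gt are (1/1, 1/2, 2/2) genotype counts."""
    alleles = (*allele_counts(*case_gt), *allele_counts(*ctrl_gt))
    # Recessive model: minor/minor vs all others.
    rec = (case_gt[2], case_gt[0] + case_gt[1], ctrl_gt[2], ctrl_gt[0] + ctrl_gt[1])
    # Dominant model: carriers of the minor allele vs all others.
    dom = (case_gt[1] + case_gt[2], case_gt[0], ctrl_gt[1] + ctrl_gt[2], ctrl_gt[0])
    return {
        "P-Value Allele Chi^2": chi2_2x2_p(*alleles),
        "P-Value Allele Fisher": fisher_2x2_p(*alleles),
        "P-Value Recessive Fisher": fisher_2x2_p(*rec),
        "P-Value Dominant Fisher": fisher_2x2_p(*dom),
    }


def is_significant(p_values, alpha=0.05):
    """True if any measure is equal to or less than alpha."""
    return any(p <= alpha for p in p_values.values())


if __name__ == "__main__":
    # Made-up genotype counts for illustration only; not data from Table 1.
    print(association_p_values((180, 110, 24), (210, 95, 9)))
```

The sketch uses only the standard library so that each step is visible. In practice an established statistics package would normally be preferred.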
SNPs in Table 2 were identified in the study. The SNP alleles are as follows: M is A or C, Y is C or T, W is A or T and K is G or T.
In one aspect, SNPs in CDC42 and PARD3 were found to act together to increase risk of CAD. Individuals who carry at-risk polymorphisms at both rs16826506 (CDC42) and rs1545214 (PARD3) have about six times the risk of developing CAD compared with individuals who have wild-type alleles at both loci, as determined by a 2-sided Fisher Exact Test p-value of 0.006. SNP rs16826506 is found in the following sequence: SEQ ID NO 1: 5′-aagtaatggt atattaaatt tggaatatag Mgaaaacaat gacccataat gtcatgataa a-3′ where the major allele is A and the minor allele is C. SNP rs1545214 is found in the following sequence: SEQ ID NO 2: 5′-tcaattttag aatgtcaggg ctgtctatgg Matattccaa acctcgacat tcaaagtggc a-3′ with major allele C and minor allele A. In some aspects the patient may be identified as being at increased risk for CAD if the patient is heterozygous at both SNPs rs1545214 and rs16826506, but the patient may also be identified as being at increased risk if the patient is homozygous for the minor allele of one or both SNPs.
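As a small illustration of how the ambiguity codes listed for Table 2 (M, Y, W, K) are read, the following Python sketch expands a sequence containing a single ambiguity code into one explicit sequence per allele. The function name is hypothetical, and the sketch assumes exactly one coded position per sequence, as in SEQ ID NO 1 and SEQ ID NO 2.

```python
# Ambiguity codes as defined in the text: M is A or C, Y is C or T,
# W is A or T and K is G or T.
AMBIGUITY_CODES = {"M": "AC", "Y": "CT", "W": "AT", "K": "GT"}


def expand_snp(seq: str) -> dict:
    """Map each allele to the explicit sequence carrying that allele.

    Whitespace (the 10-base grouping used in sequence listings) is removed
    and the sequence is upper-cased before the coded position is located.
    """
    s = "".join(seq.split()).upper()
    sites = [i for i, ch in enumerate(s) if ch in AMBIGUITY_CODES]
    if len(sites) != 1:
        raise ValueError("expected exactly one ambiguity code")
    i = sites[0]
    return {allele: s[:i] + allele + s[i + 1:] for allele in AMBIGUITY_CODES[s[i]]}


if __name__ == "__main__":
    seq_id_no_1 = ("aagtaatggt atattaaatt tggaatatag "
                   "Mgaaaacaat gacccataat gtcatgataa a")
    for allele, sequence in expand_snp(seq_id_no_1).items():
        print(allele, sequence)
```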
A. Preparation of Samples. Polymorphisms are detected in a target nucleic acid from an individual being analyzed. For assay of genomic DNA, virtually any biological sample (other than pure red blood cells) is suitable. For example, convenient tissue samples include whole blood, semen, saliva, tears, urine, fecal material, sweat, buccal epithelium, skin and hair. For assay of cDNA or mRNA, the tissue sample must be obtained from an organ in which the target nucleic acid is expressed.
Many of the methods described below require amplification of DNA from target samples. This can be accomplished by e.g., PCR. See generally PCR Technology: Principles and Applications for DNA Amplification (ed. H. A. Erlich, Freeman Press, NY, NY, 1992); PCR Protocols: A Guide to Methods and Applications (eds. Innis, et al., Academic Press, San Diego, Calif., 1990); Mattila et al., Nucleic Acids Res. 19, 4967 (1991); Eckert et al., PCR Methods and Applications 1, 17 (1991); PCR (eds. McPherson et al., IRL Press, Oxford); and U.S. Pat. No. 4,683,202 (each of which is incorporated by reference for all purposes).
Other suitable amplification methods include the ligase chain reaction (LCR) (see Wu and Wallace, Genomics 4, 560 (1989), Landegren et al., Science 241, 1077 (1988)), transcription amplification (Kwoh et al., Proc. Natl. Acad. Sci. USA 86, 1173 (1989)), self-sustained sequence replication (Guatelli et al., Proc. Nat. Acad. Sci. USA, 87, 1874 (1990)), and nucleic acid based sequence amplification (NASBA). The latter two amplification methods involve isothermal reactions based on isothermal transcription, which produce both single stranded RNA (ssRNA) and double stranded DNA (dsDNA) as the amplification products in a ratio of about 30 or 100 to 1, respectively.
B. Single Base Extension Methods for Detecting Polymorphisms. Single base extension methods are described by e.g., U.S. Pat. No. 5,846,710, U.S. Pat. No. 6,004,744, U.S. Pat. No. 5,888,819 and U.S. Pat. No. 5,856,092. In brief, the methods work by hybridizing a primer that is complementary to a target sequence such that the 3′ end of the primer is immediately adjacent to but does not span a site of potential variation in the target sequence. That is, the primer comprises a subsequence from the complement of a target polynucleotide terminating at the base that is immediately adjacent and 5′ to the polymorphic site. The hybridization is performed in the presence of one or more labeled nucleotides complementary to base(s) that may occupy the site of potential variation. For example, for a biallelic polymorphism two differentially labeled nucleotides can be used. For a tetraallelic polymorphism four differentially labeled nucleotides can be used. In some methods, particularly methods employing multiple differentially labeled nucleotides, the nucleotides are dideoxynucleotides. Hybridization is performed under conditions permitting primer extension if a nucleotide complementary to a base occupying the site of variation in the target sequence is present. Extension incorporates a labeled nucleotide thereby generating a labeled extended primer. If multiple differentially labeled nucleotides are used and the target is heterozygous then multiple differentially labeled extended primers can be obtained. Extended primers are detected providing an indication of which base(s) occupy the site of variation in the target polynucleotide.
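The primer geometry described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names, the toy sequence and the default primer length of 20 are hypothetical (the definitions give 14 to 36 nucleotides as a usual primer range), and the input strand is taken to be the complement of the target, written 5′→3′, so that the primer is simply the stretch ending just 5′ of the polymorphic site.

```python
def sbe_primer(strand: str, site: int, length: int = 20) -> str:
    """Primer ending at the base immediately adjacent and 5' to the site.

    `strand` is the complement of the target, written 5'->3', and `site`
    is the 0-based index of the polymorphic base on that strand. The
    primer does not span the site itself.
    """
    if not 0 < length <= site < len(strand):
        raise ValueError("site must lie inside the strand with enough 5' flank")
    return strand[site - length:site]


def extended_primers(primer: str, alleles: str) -> dict:
    """Extended primer expected for each allele after single base extension.

    The incorporated nucleotide is complementary to the target base, so on
    the primer strand it equals the allele written on that strand.
    """
    return {allele: primer + allele for allele in alleles}


if __name__ == "__main__":
    toy_strand = "TTGACCATGCAGT"  # hypothetical sequence, site at index 8
    primer = sbe_primer(toy_strand, site=8, length=5)
    print(primer, extended_primers(primer, "GA"))
```

For a heterozygous target, both extended products would be expected, each carrying a different label.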
C. Allele-Specific Probes. The design and use of allele-specific probes for analyzing polymorphisms is described by e.g., Saiki et al., Nature 324, 163-166 (1986); Dattagupta, EP 235,726, Saiki, WO 89/11548. Allele-specific probes can be designed that hybridize to a segment of target DNA from one individual but do not hybridize to the corresponding segment from another individual due to the presence of different polymorphic forms in the respective segments from the two individuals. Hybridization conditions should be sufficiently stringent that there is a significant difference in hybridization intensity between alleles, and preferably an essentially binary response, whereby a probe hybridizes to only one of the alleles. Some probes are designed to hybridize to a segment of target DNA such that the polymorphic site aligns with a central position (e.g., in a 15 mer at the 7 position; in a 16 mer, at either the 8 or 9 position) of the probe. This design of probe achieves good discrimination in hybridization between different allelic forms. Allele-specific probes are often used in pairs, one member of a pair showing a perfect match to a reference form of a target sequence and the other member showing a perfect match to a variant form. Several pairs of probes can then be immobilized on the same support for simultaneous analysis of multiple polymorphisms within the same target sequence. The polymorphisms can also be identified by hybridization to nucleic acid arrays, some examples of which are described by WO 95/11995 (incorporated by reference in its entirety for all purposes).
D. Allele-Specific Amplification Methods. An allele-specific primer hybridizes to a site on target DNA overlapping a polymorphism and only primes amplification of an allelic form to which the primer exhibits perfect complementarity. See Gibbs, Nucleic Acid Res. 17, 2427-2448 (1989). This primer is used in conjunction with a second primer that hybridizes at a distal site. Amplification proceeds from the two primers leading to a detectable product signifying the particular allelic form is present. A control is usually performed with a second pair of primers, one of which shows a single base mismatch at the polymorphic site and the other of which exhibits perfect complementarity to a distal site. The single-base mismatch prevents amplification and no detectable product is formed. In some methods, the mismatch is included in the 3′-most position of the oligonucleotide aligned with the polymorphism because this position is most destabilizing to elongation from the primer. See, for example, WO 93/22456.
E. Direct-Sequencing. The direct analysis of the sequence of polymorphisms of the present invention can be accomplished using either the dideoxy-chain termination method or the Maxam-Gilbert method (see Sambrook et al., Molecular Cloning, A Laboratory Manual (2nd Ed., CSHP, New York 1989); Zyskind et al., Recombinant DNA Laboratory Manual, (Acad. Press, 1988)).
F. Denaturing Gradient Gel Electrophoresis. Amplification products generated using the polymerase chain reaction can be analyzed by the use of denaturing gradient gel electrophoresis. Different alleles can be identified based on the different sequence-dependent melting properties and electrophoretic migration of DNA in solution. Erlich, ed., PCR Technology, Principles and Applications for DNA Amplification, (W. H. Freeman and Co, New York, 1992), Chapter 7.
G. Single-Strand Conformation Polymorphism Analysis. Alleles of target sequences can be differentiated using single-strand conformation polymorphism analysis, which identifies base differences by alteration in electrophoretic migration of single stranded PCR products, as described in Orita et al., Proc. Nat. Acad. Sci. 86, 2766-2770 (1989). Amplified PCR products can be generated as described above, and heated or otherwise denatured, to form single stranded amplification products. Single-stranded nucleic acids may refold or form secondary structures that are partially dependent on the base sequence. The different electrophoretic mobilities of single-stranded amplification products can be related to base-sequence difference between alleles of target sequences.
After determining polymorphic form(s) present in an individual at one or more polymorphic sites, this information can be used in a number of methods.
A. Association Studies with CAD
The polymorphisms of the invention may contribute to the phenotype of an organism in different ways. Some polymorphisms occur within a protein coding sequence and contribute to phenotype by affecting protein structure. The effect may be neutral, beneficial or detrimental, or both beneficial and detrimental, depending on the circumstances. By analogy, a heterozygous sickle cell mutation confers resistance to malaria, but a homozygous sickle cell mutation causes severe disease. Other polymorphisms occur in non-coding regions but may exert phenotypic effects indirectly via influence on replication, transcription, and translation. A single polymorphism may affect more than one phenotypic trait. Likewise, a single phenotypic trait may be affected by polymorphisms in different genes.
Correlation is performed for a population of individuals who have been tested for the presence or absence of CAD or an intermediate phenotype and for one or more polymorphic markers or polymorphic sites. To perform such analysis, the presence or absence of a set of polymorphic forms (i.e. a polymorphic set) is determined for a set of the individuals, some of whom exhibit a particular trait, and some of whom may exhibit lack of the trait. The alleles of each polymorphism of the set are then reviewed to determine whether the presence or absence of a particular allele is associated with the trait of interest. Correlation can be performed by standard statistical methods including, but not limited to, chi-squared test, Fisher Exact Test, Analysis of Variance, contingency table tests, logistic regression, parametric linkage analysis, non-parametric linkage analysis, etc., and statistically significant correlations between polymorphic form(s) and phenotypic characteristics are noted. For example, it might be found that the presence of allele A1 at polymorphism A correlates with CAD, measured either as a categorical or continuous trait. As a further example, it might be found that the combined presence of allele A1 at polymorphism A and allele B1 at polymorphism B correlates with CAD or a sub-phenotype.
B. Diagnosis of CAD
Polymorphic forms that correlate with CAD or sub-phenotypes are useful in diagnosing CAD or susceptibility thereto or predicting disease prognosis. Combined detection of several such polymorphic forms typically increases the probability of an accurate diagnosis. For example, the presence of a single polymorphic form known to correlate with CAD might indicate a probability of 20% that an individual has or is susceptible to CAD, whereas detection of five polymorphic forms, each of which correlates with less than 20% probability, might indicate a probability up to 80% that an individual has or is susceptible to CAD. Analysis of the polymorphisms of the invention can be combined with that of other polymorphisms or other risk factors for CAD, such as family history. Polymorphisms could be used to diagnose CAD or sub-phenotypes of CAD at the pre-symptomatic stage, as a method of post-symptomatic diagnosis, as a method of confirmation of diagnosis or as a post-mortem diagnosis.
Patients diagnosed with CAD can be treated with conventional therapies and/or can be counseled to avoid environmental factors that exacerbate the condition. Patients diagnosed with CAD may also be counseled about the risk of genetically transmitting the disease to offspring, or counseled about the risk of family members sharing genetic variation(s) relevant to CAD.
C. Drug Screening
The polymorphism(s) showing the strongest correlation with CAD within a given gene are likely either to have a causative role in the manifestation of the phenotype or to be in linkage disequilibrium with the causative variants. Such a role can be confirmed by in vitro gene expression of the variant gene or by producing a transgenic animal expressing a human gene bearing such a polymorphism and determining whether the animal develops the phenotype. Polymorphisms in coding regions that result in amino acid changes usually cause CAD by decreasing, increasing or otherwise altering the activity of the protein encoded by the gene in which the polymorphism occurs. Polymorphisms in coding regions that introduce stop codons usually cause CAD by reducing (heterozygote) or eliminating (homozygote) functional protein produced by the gene. Occasionally, stop codons result in production of a truncated peptide with aberrant activities relative to the full-length protein. Polymorphisms in regulatory regions typically cause CAD or related phenotypes by causing increased or decreased expression of the protein encoded by the gene in which the polymorphism occurs. Polymorphisms in exonic or untranslated sequences can cause CAD or related phenotypes either through the same mechanism as polymorphisms in regulatory sequences or by causing altered splicing patterns resulting in an altered protein.
Having identified certain polymorphisms as having causative roles in CAD, and having elucidated at least in general terms whether such polymorphisms increase or decrease the activity or expression level of associated proteins, customized therapies can be devised for classes of patients with different genetic subtypes of diseases. For example, if a polymorphism in a given protein causes CAD by increasing the expression level or activity of the protein, the diseases associated with the polymorphism can be treated by administering an antagonist of the protein. If a polymorphism in a given protein causes CAD by decreasing the expression level or activity of a protein, the form of CAD associated with the polymorphism can be treated by administering the protein itself, a nucleic acid encoding the protein that can be expressed in a patient, or an analog or agonist of the protein.
The polymorphisms of the invention are also useful for conducting clinical trials of drug candidates for CAD. Such trials are performed on treated or control populations having similar or identical polymorphic profiles at a defined collection of polymorphic sites. Use of genetically matched populations eliminates or reduces variation in treatment outcome due to genetic factors, leading to a more accurate assessment of the efficacy of a potential drug.
Furthermore, the polymorphisms of the invention may be used after the completion of a clinical trial to elucidate differences in response to a given treatment. For example, the set of polymorphisms may be used to stratify the enrolled patients into disease sub-types or classes. It may further be possible to use the polymorphisms to identify subsets of patients with similar polymorphic profiles who have unusual (high or low) response to treatment or who do not respond at all (non-responders). In this way, information about the underlying genetic factors influencing response to treatment can be used in many aspects of the development of treatment (these range from the identification of new targets, through the design of new trials to product labeling and patient targeting). Additionally, the polymorphisms may be used to identify the genetic factors involved in adverse response to treatment (also called adverse events or adverse drug reactions). For example, patients who show adverse response may have more similar polymorphic profiles than would be expected by chance. This would allow the early identification and exclusion of such individuals from treatment. It would also provide information that might be used to understand the biological causes of adverse events and to modify the treatment to avoid such outcomes.
Even if a polymorphism is not causative but is instead in linkage disequilibrium with a causative variant, it may still be useful for genetic tests, including diagnosis, prognosis, and drug screening.
D. Forensics
Determination of which polymorphic forms occupy a set of polymorphic sites in an individual identifies a set of polymorphic forms that distinguishes the individual. See generally National Research Council, The Evaluation of Forensic DNA Evidence (Eds. Pollard et al., National Academy Press, DC, 1996). The more sites that are analyzed the lower the probability that the set of polymorphic forms in one individual is the same as that in an unrelated individual. Preferably, if multiple sites are analyzed, the sites are unlinked. Thus, polymorphisms of the invention are often used in conjunction with polymorphisms in distal genes. Preferred polymorphisms for use in forensics are diallelic because the population frequencies of two polymorphic forms can usually be determined with greater accuracy than those of multiple polymorphic forms at multi-allelic loci.
The capacity to identify a distinguishing or unique set of forensic markers in an individual is useful for forensic analysis. For example, one can determine whether a blood sample from a suspect matches a blood or other tissue sample from a crime scene by determining whether the set of polymorphic forms occupying selected polymorphic sites is the same in the suspect and the sample. If the set of polymorphic markers does not match between a suspect and a sample, it can be concluded (barring experimental error) that the suspect was not the source of the sample. If the set of markers does match, one can conclude that the DNA from the suspect is consistent with that found at the crime scene. If frequencies of the polymorphic forms at the loci tested have been determined (e.g., by analysis of a suitable population of individuals), one can perform a statistical analysis to determine the probability that a match of suspect and crime scene sample would occur by chance.
p(ID) is the probability that two random individuals have the same polymorphic or allelic form at a given polymorphic site. In diallelic loci, four genotypes are possible: AA, AB, BA, and BB. If alleles A and B occur in a haploid genome of the organism with frequencies x and y, the probabilities of the genotypes in a diploid organism are (see WO 95/12607):
Homozygote: p(AA) = x^2
Homozygote: p(BB) = y^2 = (1−x)^2
Single Heterozygote: p(AB) = p(BA) = xy = x(1−x)
Both Heterozygotes: p(AB+BA) = 2xy = 2x(1−x)
The probability of identity at one locus (i.e., the probability that two individuals, picked at random from a population will have identical polymorphic forms at a given locus) is given by the equation:
p(ID) = (x^2)^2 + (2xy)^2 + (y^2)^2.
These calculations can be extended for any number of polymorphic forms at a given locus. For example, the probability of identity p(ID) for a 3-allele system where the alleles have the frequencies in the population of x, y and z, respectively, is equal to the sum of the squares of the genotype frequencies:
p(ID) = x^4 + (2xy)^2 + (2yz)^2 + (2xz)^2 + z^4 + y^4
In a locus of n alleles, the appropriate binomial expansion is used to calculate p(ID) and p(exc).
The cumulative probability of identity (cum p(ID)) for each of multiple unlinked loci is determined by multiplying the probabilities provided by each locus.
cum p(ID) = p(ID_1) p(ID_2) p(ID_3) . . . p(ID_n)
The cumulative probability of non-identity for n loci (i.e. the probability that two random individuals will be different at 1 or more loci) is given by the equation:
cum p(nonID)=1−cum p(ID).
If several polymorphic loci are tested, the cumulative probability of non-identity for random individuals becomes very high (e.g., one billion to one). Such probabilities can be taken into account together with other evidence in determining the guilt or innocence of the suspect.
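The identity calculations above can be illustrated with a short Python sketch. The function names and the example allele frequencies are hypothetical. The sketch assumes unlinked loci and sums the squares of the genotype frequencies, with x_i^2 for each homozygote and 2·x_i·x_j for each combined heterozygote class, for any number of alleles.

```python
from itertools import combinations
from math import prod


def p_identity(freqs):
    """Probability of identity at one locus, given its allele frequencies."""
    if any(f < 0 for f in freqs) or abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must be non-negative and sum to 1")
    homozygotes = sum(f ** 4 for f in freqs)                # (x_i^2)^2
    heterozygotes = sum((2 * a * b) ** 2                    # (2 x_i x_j)^2
                        for a, b in combinations(freqs, 2))
    return homozygotes + heterozygotes


def cumulative_p_identity(loci):
    """cum p(ID): product of p(ID) over unlinked loci."""
    return prod(p_identity(freqs) for freqs in loci)


def cumulative_p_nonidentity(loci):
    """cum p(nonID) = 1 - cum p(ID)."""
    return 1.0 - cumulative_p_identity(loci)


if __name__ == "__main__":
    # Hypothetical allele frequencies for three unlinked diallelic loci.
    loci = [(0.5, 0.5), (0.7, 0.3), (0.6, 0.4)]
    print(cumulative_p_identity(loci), cumulative_p_nonidentity(loci))
```

For a diallelic locus, p(ID) is smallest when both alleles have frequency 0.5, where it equals 0.375, so markers with balanced allele frequencies are the most informative per locus under this measure.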
F. Paternity Testing
The object of paternity testing is usually to determine whether a male is the father of a child. In most cases, the mother of the child is known and thus, the mother's contribution to the child's genotype can be traced. Paternity testing investigates whether the part of the child's genotype not attributable to the mother is consistent with that of the putative father. Paternity testing can be performed by analyzing sets of polymorphisms in the putative father and the child.
If the set of polymorphisms in the child attributable to the father does not match the putative father, it can be concluded, barring experimental error, that the putative father is not the real biological father. If the set of polymorphisms in the child attributable to the father does match the set of polymorphisms of the putative father, a statistical calculation can be performed to determine the probability of coincidental match.
The probability of parentage exclusion (representing the probability that a random male will have a polymorphic form at a given polymorphic site that makes him incompatible as the father) is given by the equation (see WO 95/12607):
p(exc)=xy(1−xy)
where x and y are the population frequencies of alleles A and B of a diallelic polymorphic site.
(At a triallelic site, p(exc)=xy(1−xy)+yz(1−yz)+xz(1−xz)+3xyz(1−xyz), where x, y and z are the respective population frequencies of alleles A, B and C.)
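The two exclusion formulas can be evaluated directly, as in the following Python sketch. The function names are hypothetical, and the third term of the triallelic formula is read as xz(1−xz).

```python
def p_exclusion_diallelic(x, y):
    """p(exc) = xy(1 - xy) for allele frequencies x and y."""
    return x * y * (1 - x * y)


def p_exclusion_triallelic(x, y, z):
    """p(exc) for allele frequencies x, y and z at a triallelic site."""
    return (
        x * y * (1 - x * y)
        + y * z * (1 - y * z)
        + x * z * (1 - x * z)
        + 3 * x * y * z * (1 - x * y * z)
    )
```

With x = y = 0.5, the diallelic formula gives 0.25 × 0.75 = 0.1875.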
The probability of non-exclusion is
p(non-exc)=1−p(exc)
The cumulative probability of non-exclusion (representing the value obtained when n loci are used) is thus:
cum p(non-exc) = p(non-exc_1) p(non-exc_2) p(non-exc_3) . . . p(non-exc_n)
The cumulative probability of exclusion for n loci (representing the probability that a random male will be excluded) is:
cum p(exc)=1−cum p(non-exc).
If several polymorphic loci are included in the analysis, the cumulative probability of exclusion of a random male is very high. This probability can be taken into account in assessing the liability of a putative father whose polymorphic marker set matches the child's polymorphic marker set attributable to his/her father.
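The cumulative exclusion calculation can be sketched in Python as follows. The function name is hypothetical, and the loci are assumed to be independent so that the per-locus non-exclusion probabilities can be multiplied.

```python
import math


def cumulative_exclusion(per_locus_p_exc):
    """cum p(exc) = 1 - product over loci of (1 - p(exc))."""
    cum_non_exc = math.prod(1.0 - p for p in per_locus_p_exc)
    return 1.0 - cum_non_exc
```

For instance, two loci that each exclude a random male with probability 0.5 give a cumulative probability of exclusion of 1 − 0.5 × 0.5 = 0.75, and adding further loci drives the value toward 1.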
G. Genetic Mapping of Phenotypic Traits
The polymorphisms shown in Table 2 can also be used to establish physical linkage between a genetic locus associated with a trait of interest and polymorphic markers that are not associated with the trait, but are in physical proximity with the genetic locus responsible for the trait and co-segregate with it. Such analysis is useful for mapping a genetic locus associated with a phenotypic trait to a chromosomal position, and thereby cloning gene(s) responsible for the trait. See Lander et al., Proc. Natl. Acad. Sci. (USA) 83, 7353-7357 (1986); Lander et al., Proc. Natl. Acad. Sci. (USA) 84, 2363-2367 (1987); Donis-Keller et al., Cell 51, 319-337 (1987); Lander et al., Genetics 121, 185-199 (1989). Genes localized by linkage can be cloned by a process known as directional cloning. See Wainwright, Med. J. Australia 159, 170-174 (1993); Collins, Nature Genetics 1, 3-6 (1992) (each of which is incorporated by reference in its entirety for all purposes).
Linkage studies are typically performed on members of a family. Available members of the family are characterized for the presence or absence of a phenotypic trait and for a set of polymorphic markers. The distribution of polymorphic markers in an informative meiosis is then analyzed to determine which polymorphic markers co-segregate with a phenotypic trait. See, e.g., Kerem et al., Science 245, 1073-1080 (1989); Monaco et al., Nature 316, 842 (1985); Yamoka et al., Neurology 40, 222-226 (1990); Rossiter et al., FASEB Journal 5, 21-27 (1991).
Linkage is analyzed by calculation of LOD (log of the odds) values. A lod value is the relative likelihood of obtaining observed segregation data for a marker and a genetic locus when the two are located at a recombination fraction θ, versus the situation in which the two are not linked, and thus segregating independently (Thompson & Thompson, Genetics in Medicine (5th ed, W.B. Saunders Company, Philadelphia, 1991); Strachan, “Mapping the human genome” in The Human Genome (BIOS Scientific Publishers Ltd, Oxford), Chapter 4). A series of likelihood ratios is calculated at various recombination fractions (θ), ranging from θ=0.0 (coincident loci) to θ=0.50 (unlinked). Thus, the likelihood ratio at a given value of θ is the probability of the data if the loci are linked at θ divided by the probability of the data if the loci are unlinked. The result is usually expressed as the log (base 10) of this ratio (i.e., a lod score). For example, a lod score of 3 indicates 1000:1 odds against an apparent observed linkage being a coincidence. The use of logarithms allows data collected from different families to be combined by simple addition. Computer programs are available for the calculation of lod scores for differing values of θ (e.g., LIPED, MLINK (Lathrop, Proc. Nat. Acad. Sci. (USA) 81, 3443-3446 (1984))). For any particular lod score, a recombination fraction may be determined from mathematical tables. See Smith et al., Mathematical tables for research workers in human genetics (Churchill, London, 1961); Smith, Ann. Hum. Genet. 32, 127-150 (1968). The value of θ at which the lod score is the highest is considered to be the best estimate of the recombination fraction. Positive lod score values suggest that the two loci are linked, whereas negative values suggest that linkage is less likely (at that value of θ) than the possibility that the two loci are unlinked.
By convention, a combined lod score of +3 or greater (equivalent to greater than 1000:1 odds in favor of linkage) is considered definitive evidence that two loci are linked. Similarly, by convention, a negative lod score of −2 or less is taken as definitive evidence against linkage of the two loci being compared. Negative linkage data are useful in excluding a chromosome or a segment thereof from consideration. The search focuses on the remaining non-excluded chromosomal locations.
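The lod calculation can be illustrated with a minimal Python sketch. It assumes the simplest textbook case, which is not specified in the text above: phase-known, fully informative meioses, each scored as recombinant or non-recombinant, so that the likelihood at θ is θ^r (1−θ)^(n−r) and the likelihood under no linkage is 0.5^n. All function names are hypothetical, and programs such as LIPED or MLINK handle far more general pedigrees.

```python
import math


def lod_score(recombinants, meioses, theta):
    """log10 of L(theta) / L(0.5) for phase-known meioses (assumed model)."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0.0 and 0.5")
    non_recombinants = meioses - recombinants
    if theta == 0.0 and recombinants > 0:
        return float("-inf")  # any recombinant rules out theta = 0
    log_linked = non_recombinants * math.log10(1.0 - theta)
    if recombinants:
        log_linked += recombinants * math.log10(theta)
    log_unlinked = meioses * math.log10(0.5)
    return log_linked - log_unlinked


def combined_lod(families, theta):
    """Add lod scores across families; families is a list of (r, n) pairs."""
    return sum(lod_score(r, n, theta) for r, n in families)


def best_theta(recombinants, meioses, steps=50):
    """Grid search for the theta in [0, 0.5] that maximizes the lod score."""
    grid = [0.5 * i / steps for i in range(steps + 1)]
    return max(grid, key=lambda t: lod_score(recombinants, meioses, t))


def interpret(lod):
    """Apply the conventional thresholds of +3 and -2."""
    if lod >= 3.0:
        return "linked"
    if lod <= -2.0:
        return "linkage excluded"
    return "inconclusive"
```

Under these assumptions, ten meioses with no recombinants give a lod score of about 3.01 at θ=0, which just reaches the conventional threshold for linkage, and the same total is obtained by adding the scores of two families with five such meioses each.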
V. Kits
The invention further provides kits comprising assay components to implement detection assays specific for polymorphisms of the invention. Thus, for hybridization-based assays, such components include probes capable of forming stable duplexes with predetermined regions adjacent to or encompassing polymorphisms of the invention. For assays that depend on a polymerase-based primer extension reaction for detection, such as a PCR or single-base extension reaction, assay components include at least one primer having a sequence complementary to a region adjacent to or encompassing such polymorphism. Likewise, for ligation-based detection assays, such as OLA, circularizable probe reactions, and the like, oligonucleotides are provided that are complementary to adjacent regions near, or encompassing, a polymorphism of the invention. In another embodiment, at least one allele-specific oligonucleotide is provided as described above. Often, the kits contain one or more pairs of allele-specific oligonucleotides hybridizing to different forms of a polymorphism. In some kits, the allele-specific oligonucleotides are provided immobilized to a substrate. For example, the same substrate can comprise allele-specific oligonucleotide probes for detecting at least one, or at least 5, or at least 10, or all of the polymorphisms shown in Table 1. Optional additional components of the kit include, for example, restriction enzymes, reverse transcriptase or polymerase, the substrate nucleoside triphosphates, means used to label (for example, an avidin-enzyme conjugate and enzyme substrate and chromogen if the label is biotin), and the appropriate buffers for reverse transcription, PCR, or hybridization reactions. Usually, the kit also contains instructions for carrying out the methods.
From the foregoing, it is apparent that the invention includes a number of general uses that can be expressed concisely as follows. The invention provides for the use of any of the nucleic acid segments described above in the diagnosis or monitoring of diseases, particularly CAD and related conditions. The invention further provides for the use of any of the nucleic acid segments in the manufacture of medicines for the treatment or prophylaxis of such diseases. The invention further provides for the use of any of the DNA segments as a pharmaceutical.
All publications and patent applications cited above are incorporated by reference in their entirety for all purposes to the same extent as if each individual publication or patent application were specifically and individually indicated to be so incorporated by reference. Although the present invention has been described in some detail by way of illustration and example for purposes of clarity and understanding, it will be apparent that certain changes and modifications may be practiced within the scope of the appended claims.
Mccgagaaggagcgggcggcggccggggcagcggttacagttgtgcggcc
Ygcccgtctccagggaagcagcttttggtccccatctggggcaagcctcc
Wcgctccagccgctcccagatttctgggatctaggagagagaagtggaga
Ygggtgatttgcgggggcagggtggtgtgcaggcctaagaagacagaggt
Kggaatattgactgtctttcactctccttggaccctagggctacagaact
Mtaaaactacactgaacatgtgaatagcatattgtggtggacaagagcaa
This application is related to U.S. Provisional Patent Application No. 60/715,038, filed Sep. 8, 2005, the entire disclosure of which is incorporated herein by reference in its entirety.
Number | Date | Country
---|---|---
60715038 | Sep 2005 | US