Nucleic acids containing single nucleotide polymorphisms and methods of use thereof

BACKGROUND OF THE INVENTION

Sequence polymorphism-based analysis of nucleic acid sequences has led to novel approaches for determining the identity and relatedness of individuals. The approach is generally based on alterations in nucleic acid sequences between related individuals. This analysis has been widely used in a variety of genetic, diagnostic, and forensic applications. For example, polymorphism analyses are used in identity and paternity analysis, and in genetic mapping studies.

Several different types of polymorphisms in nucleic acids have been described. One such type of variation is a restriction fragment length polymorphism (RFLP). RFLPs can create or delete a recognition sequence for a restriction endonuclease in one nucleic acid relative to a second nucleic acid. The result of the variation is an alteration in the relative length of restriction enzyme-generated DNA fragments in the two nucleic acids.

Other polymorphisms take the form of short tandem repeat (STR) sequences, which are also referred to as variable number of tandem repeat (VNTR) sequences. STR sequences typically include tandem repeats of 2, 3, or 4 nucleotide sequences that are present in a nucleic acid from one individual but absent from a second, related individual at the corresponding genomic location.

Other polymorphisms take the form of single nucleotide variations, termed single nucleotide polymorphisms (SNPs), between individuals. A SNP can, in some instances, be referred to as a “cSNP” to denote that the nucleotide sequence containing the SNP originates as a cDNA.

SNPs can arise in several ways. A single nucleotide polymorphism may arise due to a substitution of one nucleotide for another at the polymorphic site. Substitutions can be transitions or transversions. A transition is the replacement of one purine nucleotide by another purine nucleotide, or one pyrimidine by another pyrimidine. A transversion is the replacement of a purine by a pyrimidine, or the converse.
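The distinction between transitions and transversions described above can be illustrated with a minimal Python sketch. The function name `classify_substitution` and the decision to accept U alongside T are illustrative assumptions and are not part of the disclosure.

```python
# Illustrative sketch only: names and input conventions are assumptions.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T", "U"}


def classify_substitution(ref: str, alt: str) -> str:
    """Classify a single-base substitution as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError(f"unrecognized nucleotide in {ref!r}/{alt!r}")
    if ref == alt:
        raise ValueError("identical nucleotides are not a substitution")
    # Purine<->purine or pyrimidine<->pyrimidine is a transition;
    # any change between the two classes is a transversion.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


if __name__ == "__main__":
    for pair in [("A", "G"), ("C", "T"), ("A", "C"), ("G", "T")]:
        print(pair, classify_substitution(*pair))
```

Insertions and deletions are deliberately outside the scope of this sketch, since they are not substitutions.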

Single nucleotide polymorphisms can also arise from a deletion of a nucleotide or an insertion of a nucleotide relative to a reference allele. Thus, the polymorphic site is a site at which one allele bears a gap with respect to a single nucleotide in another allele. Some SNPs occur within or near genes. One such class includes SNPs falling within regions of genes encoding a polypeptide product. These SNPs may result in an alteration of the amino acid sequence of the polypeptide product and give rise to the expression of a defective or other variant protein. Such variant products can, in some cases, result in a pathological condition, e.g., genetic disease. Examples of genetic diseases in which a polymorphism within a coding sequence gives rise to the disease include sickle cell anemia and cystic fibrosis. Other SNPs do not result in alteration of the polypeptide product. Of course, SNPs can also occur in noncoding regions of genes.

SNPs tend to occur with great frequency and are spaced uniformly throughout the genome. The frequency and uniformity of SNPs mean that there is a greater probability that such a polymorphism will be found in close proximity to a genetic locus of interest.

SUMMARY OF THE INVENTION

The invention is based in part on the discovery of novel single nucleotide polymorphisms (SNPs) in regions of human DNA.

Accordingly, in one aspect, the invention provides an isolated polynucleotide which includes one or more of the SNPs described herein. The polynucleotide can be, e.g., a nucleotide sequence which includes one or more of the polymorphic sequences shown in Table 1 (SEQ ID NOS: 1-1192) and which includes a polymorphic sequence, or a fragment of the polymorphic sequence, as long as it includes the polymorphic site. The polynucleotide may alternatively contain a nucleotide sequence which includes a sequence complementary to one or more of the sequences (SEQ ID NOS: 1-1192), or a fragment of the complementary nucleotide sequence, provided that the fragment includes a polymorphic site in the polymorphic sequence.

The polynucleotide can be, e.g., DNA or RNA, and can be between about 10 and about 100 nucleotides, e.g., 10-90, 10-75, 10-51, 10-40, or 10-30 nucleotides in length.

In some embodiments, the polymorphic site in the polymorphic sequence includes a nucleotide other than the nucleotide listed in Table 1, column 5 for the polymorphic sequence, e.g., the polymorphic site includes the nucleotide listed in Table 1, column 6 for the polymorphic sequence.

In other embodiments, the complement of the polymorphic site includes a nucleotide other than the complement of the nucleotide listed in Table 1, column 5 for the complement of the polymorphic sequence, e.g., the complement of the nucleotide listed in Table 1, column 6 for the polymorphic sequence.
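The relationship between a polymorphic site and its complement can be sketched in Python as follows. This is an illustrative sketch only; the helper names `complement_base`, `reverse_complement`, and `site_on_complement`, and the use of 0-based indices, are assumptions introduced for the example.

```python
# Illustrative sketch only: helper names and 0-based indexing are assumptions.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def complement_base(base: str) -> str:
    """Return the Watson-Crick complement of a single DNA nucleotide."""
    if len(base) != 1 or base.upper() not in "ACGT":
        raise ValueError(f"not a DNA nucleotide: {base!r}")
    return base.translate(_COMPLEMENT)


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def site_on_complement(length: int, site_index: int) -> int:
    """Map a 0-based site index to its index on the reverse complement."""
    if not 0 <= site_index < length:
        raise ValueError("site index outside the sequence")
    return length - 1 - site_index


if __name__ == "__main__":
    seq = "AACGT"
    rc = reverse_complement(seq)
    idx = site_on_complement(len(seq), 1)
    # The base found at the mapped index is the complement of the original base.
    print(rc, idx, rc[idx] == complement_base(seq[1]))
```

Under these assumptions, a polymorphic site occupied by a given nucleotide on one strand corresponds to a site occupied by the complementary nucleotide on the other strand.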

In some embodiments, the polymorphic sequence is associated with a polypeptide related to one of the protein families disclosed herein. For example, the nucleic acid may be associated with a polypeptide related to angiopoietin, 4-hydroxybutyrate dehydrogenase, or any of the other proteins identified in Table 1, column 10.

In another aspect, the invention provides an isolated allele-specific oligonucleotide that hybridizes to a first polynucleotide containing a polymorphic site. The first polynucleotide can be, e.g., a nucleotide sequence comprising one or more polymorphic sequences (SEQ ID NOS:1-1192), provided that the polymorphic sequence includes a nucleotide other than the nucleotide recited in Table 1, column 5 for the polymorphic sequence. Alternatively, the first polynucleotide can be a nucleotide sequence that is a fragment of the polymorphic sequence, provided that the fragment includes a polymorphic site in the polymorphic sequence, or a complementary nucleotide sequence which includes a sequence complementary to one or more polymorphic sequences (SEQ ID NOS:1-1192), provided that the complementary nucleotide sequence includes a nucleotide other than the complement of the nucleotide recited in Table 1, column 5. The first polynucleotide may in addition include a nucleotide sequence that is a fragment of the complementary sequence, provided that the fragment includes a polymorphic site in the polymorphic sequence.

In some embodiments, the oligonucleotide does not hybridize under stringent conditions to a second polynucleotide. The second polynucleotide can be, e.g., (a) a nucleotide sequence comprising one or more polymorphic sequences (SEQ ID NOS:1-1192), wherein the polymorphic sequence includes the nucleotide listed in Table 1, column 5 for the polymorphic sequence; (b) a nucleotide sequence that is a fragment of any of the polymorphic sequences; (c) a complementary nucleotide sequence including a sequence complementary to one or more polymorphic sequences (SEQ ID NOS:1-1192), wherein the polymorphic sequence includes the complement of the nucleotide listed in Table 1, column 5; and (d) a nucleotide sequence that is a fragment of the complementary sequence, provided that the fragment includes a polymorphic site in the polymorphic sequence.

The oligonucleotide can be, e.g., between about 10 and about 100 bases in length. In some embodiments, the oligonucleotide is between about 10 and 75 bases, 10 and 51 bases, 10 and about 40 bases, or about 15 and 30 bases in length.

The invention also provides a method of detecting a polymorphic site in a nucleic acid. The method includes contacting the nucleic acid with an oligonucleotide that hybridizes to a polymorphic sequence selected from the group consisting of SEQ ID NOS: 1-1192, or its complement, provided that the polymorphic sequence includes a nucleotide other than the nucleotide recited in Table 1, column 5 for the polymorphic sequence, or the complement includes a nucleotide other than the complement of the nucleotide recited in Table 1, column 5. The method also includes determining whether the nucleic acid and the oligonucleotide hybridize. Hybridization of the oligonucleotide to the nucleic acid sequence indicates the presence of the polymorphic site in the nucleic acid.

In preferred embodiments, the oligonucleotide does not hybridize to the polymorphic sequence when the polymorphic sequence includes the nucleotide recited in Table 1, column 5 for the polymorphic sequence, or when the complement of the polymorphic sequence includes the complement of the nucleotide recited in Table 1, column 5 for the polymorphic sequence.

The oligonucleotide can be, e.g., between about 10 and about 100 bases in length. In some embodiments, the oligonucleotide is between about 10 and 75 bases, 10 and 51 bases, 10 and about 40 bases, or about 15 and 30 bases in length.

In some embodiments, the polymorphic sequence identified by the oligonucleotide is associated with a nucleic acid encoding a polypeptide related to one of the protein families disclosed herein. For example, the nucleic acid may be associated with a polypeptide related to angiopoietin, 4-hydroxybutyrate dehydrogenase, or any of the other proteins identified in Table 1, column 10.

In a further aspect, the invention provides a method of determining the relatedness of a first and second nucleic acid. The method includes providing a first nucleic acid and a second nucleic acid and contacting the first nucleic acid and the second nucleic acid with an oligonucleotide that hybridizes to a polymorphic sequence selected from the group consisting of SEQ ID NOS: 1-1192, or its complement, provided that the polymorphic sequence includes a nucleotide other than the nucleotide recited in Table 1, column 5 for the polymorphic sequence, or the complement includes a nucleotide other than the complement of the nucleotide recited in Table 1, column 5. The method also includes determining whether the first nucleic acid and the second nucleic acid hybridize to the oligonucleotide, and comparing hybridization of the first and second nucleic acids to the oligonucleotide. Hybridization of the first and second nucleic acids to the oligonucleotide indicates that the subjects from which the first and second nucleic acids were obtained are related.

In preferred embodiments, the oligonucleotide does not hybridize to the polymorphic sequence when the polymorphic sequence includes the nucleotide recited in Table 1, column 5 for the polymorphic sequence, or when the complement of the polymorphic sequence includes the complement of the nucleotide recited in Table 1, column 5 for the polymorphic sequence.

The oligonucleotide can be, e.g., between about 10 and about 100 bases in length. In some embodiments, the oligonucleotide is between about 10 and 75 bases, 10 and 51 bases, 10 and about 40 bases, or about 15 and 30 bases in length.

The method can be used in a variety of applications. For example, the first nucleic acid may be isolated from physical evidence gathered at a crime scene, and the second nucleic acid may be obtained from a person suspected of having committed the crime. Matching the two nucleic acids using the method can establish whether the physical evidence originated from the person.

In another example, the first sample may be from a human male suspected of being the father of a child and the second sample may be from the child. Establishing a match using the described method can establish whether the male is the father of the child.

In another aspect, the method includes determining if a sequence polymorphism is present in a subject, such as a human. The method includes providing a nucleic acid from the subject and contacting the nucleic acid with an oligonucleotide that hybridizes to a polymorphic sequence selected from the group consisting of SEQ ID NOS: 1-1192, or its complement, provided that the polymorphic sequence includes a nucleotide other than the nucleotide recited in Table 1, column 5 for said polymorphic sequence, or the complement includes a nucleotide other than the complement of the nucleotide recited in Table 1, column 5. Hybridization between the nucleic acid and the oligonucleotide is then determined. Hybridization of the oligonucleotide to the nucleic acid sequence indicates the presence of the polymorphism in said subject.

In another aspect, the invention provides an isolated polypeptide comprising a polymorphic site at one or more amino acid residues, and wherein the protein is encoded by a polynucleotide including one of the polymorphic sequences SEQ ID NOS:1-1192, or their complement, provided that the polymorphic sequence includes a nucleotide other than the nucleotide recited in Table 1, column 5 for the polymorphic sequence, or the complement includes a nucleotide other than the complement of the nucleotide recited in Table 1, column 5.

The polypeptide can be, e.g., related to one of the protein families disclosed herein. For example, the polypeptide can be related to angiopoietin, 4-hydroxybutyrate dehydrogenase, ATP-dependent RNA helicase, MHC Class I histocompatibility antigen, or phosphoglycerate kinase.

In some embodiments, the polypeptide is translated in the same open reading frame as is a wild type protein whose amino acid sequence is identical to the amino acid sequence of the polymorphic protein except at the site of the polymorphism.

In some embodiments, the polypeptide is encoded by a polymorphic sequence, or its complement, wherein the polymorphic sequence includes the nucleotide listed in Table 1, column 6 for the polymorphic sequence, or the complement includes the complement of the nucleotide listed in Table 1, column 6.

The invention also provides an antibody that binds specifically to a polypeptide encoded by a polynucleotide comprising a nucleotide sequence encoded by a polynucleotide selected from the group consisting of polymorphic sequences SEQ ID NOS:1-1192, or its complement. The polymorphic sequence includes a nucleotide other than the nucleotide recited in Table 1, column 5 for the polymorphic sequence, or the complement includes a nucleotide other than the complement of the nucleotide recited in Table 1, column 5.

In some embodiments, the antibody binds specifically to a polypeptide encoded by a polymorphic sequence which includes the nucleotide listed in Table 1, column 6 for the polymorphic sequence.

Preferably, the antibody does not bind specifically to a polypeptide encoded by a polymorphic sequence which includes the nucleotide listed in Table 1, column 5 for the polymorphic sequence.

The invention further provides a method of detecting the presence of a polypeptide having one or more amino acid residue polymorphisms in a subject. The method includes providing a protein sample from the subject and contacting the sample with the above-described antibody under conditions that allow for the formation of antibody-antigen complexes. The antibody-antigen complexes are then detected. The presence of the complexes indicates the presence of the polypeptide.

The invention also provides a method of treating a subject suffering from, at risk for, or suspected of suffering from, a pathology ascribed to the presence of a sequence polymorphism in the subject, e.g., a human, non-human primate, cat, dog, rat, mouse, cow, pig, goat, or rabbit. The method includes providing a subject suffering from a pathology associated with aberrant expression of a first nucleic acid comprising a polymorphic sequence selected from the group consisting of SEQ ID NOS:1-1192, or its complement, and treating the subject by administering to the subject an effective dose of a therapeutic agent. Aberrant expression can include qualitative alterations in expression of a gene, e.g., expression of a gene encoding a polypeptide having an altered amino acid sequence with respect to its wild-type counterpart. Qualitatively different polypeptides can include shorter, longer, or altered polypeptides relative to the amino acid sequence of the wild-type polypeptide. Aberrant expression can also include quantitative alterations in expression of a gene. Examples of quantitative alterations in gene expression include lower or higher levels of expression of the gene relative to its wild-type counterpart, or alterations in the temporal or tissue-specific expression pattern of a gene. Finally, aberrant expression may also include a combination of qualitative and quantitative alterations in gene expression.

The therapeutic agent can include, e.g., a second nucleic acid comprising the polymorphic sequence, provided that the second nucleic acid comprises the nucleotide present in the wild type allele. In some embodiments, the second nucleic acid sequence comprises a polymorphic sequence which includes the nucleotide listed in Table 1, column 5 for the polymorphic sequence.

Alternatively, the therapeutic agent can be a polypeptide encoded by a polynucleotide comprising a polymorphic sequence selected from the group consisting of SEQ ID NOS:1-1192, or by a polynucleotide comprising a nucleotide sequence that is complementary to any one of polymorphic sequences SEQ ID NOS:1-1192, provided that the polymorphic sequence includes the nucleotide listed in Table 1, column 6 for the polymorphic sequence.

The therapeutic agent may further include an antibody as herein described, or an oligonucleotide comprising a polymorphic sequence selected from the group consisting of SEQ ID NOS:1-1192, or a nucleotide sequence that is complementary to any one of polymorphic sequences SEQ ID NOS:1-1192, provided that the polymorphic sequence includes the nucleotide listed in Table 1, column 5 or Table 1, column 6 for the polymorphic sequence.

In another aspect, the invention provides an oligonucleotide array comprising one or more oligonucleotides hybridizing to a first polynucleotide at a polymorphic site encompassed therein. The first polynucleotide can be, e.g., a nucleotide sequence comprising one or more polymorphic sequences (SEQ ID NOS:1-1192); a nucleotide sequence that is a fragment of any of the nucleotide sequences, provided that the fragment includes a polymorphic site in the polymorphic sequence; a complementary nucleotide sequence comprising a sequence complementary to one or more polymorphic sequences (SEQ ID NOS:1-1192); or a nucleotide sequence that is a fragment of the complementary sequence, provided that the fragment includes a polymorphic site in the polymorphic sequence.

In preferred embodiments, the array comprises 10; 100; 1,000; 10,000; 100,000 or more oligonucleotides.

The invention also provides a kit comprising one or more of the herein-described nucleic acids. The kit can include, e.g., a polynucleotide which includes one or more of the SNPs described herein. The polynucleotide can be, e.g., a nucleotide sequence which includes one or more of the polymorphic sequences shown in Table 1 (SEQ ID NOS: 1-1192) and which includes a polymorphic sequence, or a fragment of the polymorphic sequence, as long as it includes the polymorphic site. The polynucleotide may alternatively contain a nucleotide sequence which includes a sequence complementary to one or more of the sequences (SEQ ID NOS:1-1192), or a fragment of the complementary nucleotide sequence, provided that the fragment includes a polymorphic site in the polymorphic sequence. Alternatively, or in addition, the kit can include an isolated allele-specific oligonucleotide that hybridizes to a first polynucleotide containing a polymorphic site. The first polynucleotide can be, e.g., a nucleotide sequence comprising one or more polymorphic sequences (SEQ ID NOS:1-1192), provided that the polymorphic sequence includes a nucleotide other than the nucleotide recited in Table 1, column 5 for the polymorphic sequence. Alternatively, the first polynucleotide can be a nucleotide sequence that is a fragment of the polymorphic sequence, provided that the fragment includes a polymorphic site in the polymorphic sequence, or a complementary nucleotide sequence which includes a sequence complementary to one or more polymorphic sequences (SEQ ID NOS:1-1192), provided that the complementary nucleotide sequence includes a nucleotide other than the complement of the nucleotide recited in Table 1, column 5. The first polynucleotide may in addition include a nucleotide sequence that is a fragment of the complementary sequence, provided that the fragment includes a polymorphic site in the polymorphic sequence.

Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, suitable methods and materials are described below. All publications, patent applications, patents, and other references mentioned herein are incorporated by reference in their entirety. In the case of conflict, the present specification, including definitions, will control. In addition, the materials, methods, and examples are illustrative only and not intended to be limiting.

Other features and advantages of the invention will be apparent from the following detailed description and claims.

DETAILED DESCRIPTION OF THE INVENTION

The invention provides human SNPs in sequences which are transcribed, i.e., are cSNPs, and which have not been previously described. As is explained in more detail below, many SNPs have been identified in genes related to polypeptides of known function. If desired, SNPs associated with various polypeptides can be used together. For example, SNPs can be grouped according to whether they are derived from a nucleic acid encoding a polypeptide related to a particular protein family or involved in a particular function. Thus, SNPs related to ATPase associated protein may be used together, as may SNPs associated with cadherin, or ephrin (EPH), or any of the other proteins recited in Table 1, column 10. Similarly, SNPs can be grouped according to the functions played by their gene products. Such gene products include structural proteins, proteases, and proteins associated with metabolic pathways such as fatty acid metabolism, glycolysis, intermediary metabolism, calcium metabolism, and amino acid metabolism.

The SNPs are shown in Table 1. Table 1 provides a summary of the polymorphic sequences disclosed herein. In the Table, a “SNP” is a polymorphic site embedded in a polymorphic sequence. The polymorphic site is occupied by a single nucleotide, which is the position of nucleotide variation between the wild type and polymorphic allelic sequences. The site is usually preceded by and followed by relatively highly conserved sequences of the allele (e.g., sequences that vary in less than 1/100 or 1/1000 members of the population). Thus, a polymorphic sequence can include one or more of the following sequences: (1) a sequence having the nucleotide denoted in Table 1, column 5 at the polymorphic site in the polymorphic sequence; and (2) a sequence having a nucleotide other than the nucleotide denoted in Table 1, column 5 at the polymorphic site in the polymorphic sequence. An example of the latter sequence is a polymorphic sequence having the nucleotide denoted in Table 1, column 6 at the polymorphic site in the polymorphic sequence.

Nucleotide sequences for a reference-polymorphic pair are presented in Table 1. The choice of designating one sequence of the cognate pair as a “reference” sequence and the second cognate of the pair as a “polymorphic” sequence is arbitrary. Each cSNP entry provides information concerning both the reference nucleotide sequence as well as the cognate polymorphic sequence occurring at a given polymorphic site. Each row of the Table provides this information for a given reference-polymorphism cognate pair. A reference to the sequence identifier number providing the sequences of both alleles is also provided. In addition, references to the SEQ ID NOS: giving the translated amino acid sequences are also given if appropriate.

Table 1 includes thirteen columns that provide descriptive information for each cSNP, each of which occupies one row in the Table. The column headings, and an explanation for each, are given below.

“SEQ ID” provides the cross-references to the two nucleotide SEQ ID NOS: for the cognate pair, which are numbered consecutively, and, as explained below, amino acid SEQ ID NOS: as well, in the Sequence Listing of the application. Conversely, each sequence entry in the Sequence Listing also includes a cross-reference to the CuraGen sequence ID, under the label “Accession number”. The first pair of SEQ ID NOS: given in the first column of each row of the Table identifies the nucleic acid sequences for the polymorphism. If a polymorphism carries an entry for the amino acid portion of the row, a third SEQ ID NO: appears in parentheses in the column “Amino acid before” (see below) for the reference amino acid sequence, and a fourth SEQ ID NO: appears in parentheses in the column “Amino acid after” (see below) for the polymorphic amino acid sequence. The latter SEQ ID NOS: refer to amino acid sequences giving the cognate reference and polymorphic amino acid sequences that are the translation of the nucleotide polymorphism. If a polymorphism carries no entry for the protein portion of the row, only one pair of SEQ ID NOS: is provided, in the first column.

“CuraGen sequence ID” provides CuraGen Corporation's accession number.

“Base pos. of SNP” gives the numerical position of the nucleotide in the nucleic acid at which the cSNP is found, as identified in this invention.

“Polymorphic sequence” provides a 51-base sequence with the polymorphic site at the 26th base in the sequence, as well as 25 bases from the reference sequence on the 5′ side and the 3′ side of the polymorphic site. The designation at the polymorphic site is enclosed in square brackets, and provides first, the reference nucleotide; second, a “slash (/)”; and third, the polymorphic nucleotide. In certain cases the polymorphism is an insertion or a deletion. In that case, the position that is “unfilled” (i.e., the reference or the polymorphic position) is indicated by the word “gap”.
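The bracket notation described for the “Polymorphic sequence” column can be illustrated with a minimal Python sketch. The exact textual layout of Table 1 entries is not reproduced here, so the pattern below (flanking bases, then `[reference/polymorphic]`, with the literal word `gap` for an unfilled position) is an assumption, as are the names `PolymorphicSequence`, `parse_polymorphic_sequence`, and `allele_sequences`. The example entry is synthetic and is not taken from the Sequence Listing.

```python
# Illustrative sketch only: the entry layout and all names are assumptions.
import re
from typing import NamedTuple, Optional, Tuple


class PolymorphicSequence(NamedTuple):
    five_prime: str             # bases 5' of the polymorphic site
    reference: Optional[str]    # reference nucleotide, or None for "gap"
    polymorphic: Optional[str]  # polymorphic nucleotide, or None for "gap"
    three_prime: str            # bases 3' of the polymorphic site


_ENTRY = re.compile(
    r"^([ACGT]*)\[([ACGT]|gap)/([ACGT]|gap)\]([ACGT]*)$", re.IGNORECASE
)


def parse_polymorphic_sequence(text: str) -> PolymorphicSequence:
    """Split an entry such as 'ACG[G/T]TCA' into flanks and the two alleles."""
    match = _ENTRY.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized polymorphic sequence: {text!r}")
    five, ref, alt, three = match.groups()

    def allele(token: str) -> Optional[str]:
        return None if token.lower() == "gap" else token.upper()

    return PolymorphicSequence(five.upper(), allele(ref), allele(alt), three.upper())


def allele_sequences(entry: PolymorphicSequence) -> Tuple[str, str]:
    """Return the (reference, polymorphic) sequences; a gap contributes no base."""
    ref = entry.five_prime + (entry.reference or "") + entry.three_prime
    alt = entry.five_prime + (entry.polymorphic or "") + entry.three_prime
    return ref, alt


if __name__ == "__main__":
    synthetic = "A" * 25 + "[G/T]" + "C" * 25  # synthetic example entry
    parsed = parse_polymorphic_sequence(synthetic)
    ref_seq, alt_seq = allele_sequences(parsed)
    # 1-based position of the polymorphic site and length of the reference allele
    print(len(parsed.five_prime) + 1, len(ref_seq))
```

For an entry with 25 flanking bases on each side, the polymorphic site falls at the 26th base of a 51-base sequence, matching the column description. An entry containing `gap` yields one allele that is a single base shorter than the other.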

“Base before” provides the nucleotide present in the reference sequence at the position at which the polymorphism is found.

“Base after” provides the altered nucleotide at the position of the polymorphism.

“Amino acid before” provides the amino acid in the reference protein, if the polymorphism occurs in a coding region. This column also includes the SEQ ID NO: in parentheses for the translated reference amino acid sequence if the polymorphism occurs in a coding region.

“Amino acid after” provides the amino acid in the polymorphic protein, if the polymorphism occurs in a coding region. This column also includes the SEQ ID NO: in parentheses for the translated polymorphic amino acid sequence if the polymorphism occurs in a coding region.

“Type of change” provides information on the nature of the polymorphism. “SILENT-NONCODING” is used if the polymorphism occurs in a noncoding region of a nucleic acid. “SILENT-CODING” is used if the polymorphism occurs in a coding region of a nucleic acid and results in no change of amino acid in the translated polymorphic protein. “CONSERVATIVE” is used if the polymorphism occurs in a coding region of a nucleic acid and provides a change in which the altered amino acid falls in the same class as the reference amino acid. The classes are: 1) Aliphatic: Gly, Ala, Val, Leu, Ile; 2) Aromatic: Phe, Tyr, Trp; 3) Sulfur-containing: Cys, Met; 4) Aliphatic OH: Ser, Thr; 5) Basic: Lys, Arg, His; 6) Acidic: Asp, Glu, Asn, Gln; 7) Pro falls in none of the other classes; and 8) End defines a termination codon.

“NONCONSERVATIVE” is used if the polymorphism occurs in a coding region of a nucleic acid and provides a change in which the altered amino acid falls in a different class than the reference amino acid.

“FRAMESHIFT” relates to an insertion or a deletion. If the frameshift occurs in a coding region, the Table provides the translation of the frameshifted codons 3′ to the polymorphic site.
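The “Type of change” labels can be illustrated with a minimal Python sketch that applies the eight amino acid classes listed above. The function name `type_of_change`, the use of `None` to mark a noncoding polymorphism, and the `is_indel` flag are assumptions introduced for the example; treating every insertion or deletion as “FRAMESHIFT” is one reading of the column description.

```python
# Illustrative sketch only: names, the None convention, and is_indel are assumptions.
from typing import Optional

AMINO_ACID_CLASSES = {
    "Aliphatic": {"Gly", "Ala", "Val", "Leu", "Ile"},
    "Aromatic": {"Phe", "Tyr", "Trp"},
    "Sulfur-containing": {"Cys", "Met"},
    "Aliphatic OH": {"Ser", "Thr"},
    "Basic": {"Lys", "Arg", "His"},
    "Acidic": {"Asp", "Glu", "Asn", "Gln"},  # grouping as given in the text
    "Pro": {"Pro"},                          # Pro falls in none of the other classes
    "End": {"End"},                          # termination codon
}

# Reverse lookup: amino acid -> class name
CLASS_OF = {aa: name for name, members in AMINO_ACID_CLASSES.items() for aa in members}


def type_of_change(
    aa_before: Optional[str], aa_after: Optional[str], is_indel: bool = False
) -> str:
    """Return the 'Type of change' label for one polymorphism."""
    if is_indel:
        return "FRAMESHIFT"
    if aa_before is None or aa_after is None:
        return "SILENT-NONCODING"  # no amino acid entry: noncoding region
    if aa_before == aa_after:
        return "SILENT-CODING"
    if CLASS_OF[aa_before] == CLASS_OF[aa_after]:
        return "CONSERVATIVE"
    return "NONCONSERVATIVE"


if __name__ == "__main__":
    print(type_of_change("Val", "Leu"))
    print(type_of_change("Gly", "Arg"))
```

Because Pro and End each form a class of one, any change to or from either of them is labeled “NONCONSERVATIVE” under this sketch.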

“Protein classification of CuraGen gene” provides a generic class into which the protein is classified. Multiple classes of proteins were identified. The classes include the following:

Examples of possible disease correlations between the claimed SNPs and genes of each classification are listed below for representative protein families.

Amylases

Amylase is responsible for endohydrolysis of 1,4-alpha-glucosidic linkages in oligosaccharides and polysaccharides. Variations in the amylase gene may be indicative of delayed maturation and of various amylase-producing neoplasms and carcinomas.

Amyloid

The serum amyloid A (SAA) proteins comprise a family of vertebrate proteins that associate predominantly with high-density lipoproteins (HDL). The synthesis of certain members of the family is greatly increased in inflammation. Prolonged elevation of plasma SAA levels, as in chronic inflammation, results in a pathological condition, called amyloidosis, which affects the liver, kidney and spleen and which is characterized by the highly insoluble accumulation of SAA in these tissues. Amyloid selectively inhibits insulin-stimulated glucose utilization and glycogen deposition in muscle, while not affecting adipocyte glucose metabolism. Deposition of fibrillar amyloid proteins intraneuronally, as neurofibrillary tangles, extracellularly, as plaques and in blood vessels, is characteristic of both Alzheimer's disease and aged Down's syndrome. Amyloid deposition is also associated with type II diabetes mellitus.

Angiopoietin

Members of the angiopoietin/fibrinogen family have been shown to stimulate the generation of new blood vessels, inhibit the generation of new blood vessels, and perform several roles in blood clotting. The generation of new blood vessels, called angiogenesis, is also an essential step in tumor growth in order for the tumor to get the blood supply that it needs to expand. Variation in these genes may be predictive of any form of heart disease, numerous blood clotting disorders, stroke, hypertension and predisposition to tumor formation and metastasis. In particular, these variants may be predictive of the response to various antihypertensive drugs and chemotherapeutic and anti-tumor agents.

Apoptosis-Related Proteins

Active cell suicide (apoptosis) is induced by events such as growth factor withdrawal and toxins. It is controlled by regulators, which have either an inhibitory effect on programmed cell death (anti-apoptotic) or block the protective effect of inhibitors (pro-apoptotic). Many viruses have found a way of countering defensive apoptosis by encoding their own anti-apoptosis genes, preventing their target cells from dying too soon. Variants of apoptosis-related genes may be useful in the formulation of anti-aging drugs.

Cadherin, Cyclin, Polymerase, Oncogenes, Histones, Kinases

Members of the cell division/cell cycle pathways such as cyclins, many transcription factors and kinases, DNA polymerases, histones, helicases and other oncogenes play a critical role in carcinogenesis where the uncontrolled proliferation of cells leads to tumor formation and eventually metastasis. Variation in these genes may be predictive of predisposition to any form of cancer, from increased risk of tumor formation to increased rate of metastasis. In particular, these variants may be predictive of the response to various chemotherapeutic and anti-tumor agents.

Colony-Stimulating Factor-Related Proteins

Granulocyte/macrophage colony-stimulating factors are cytokines that act in hematopoiesis by controlling the production, differentiation, and function of two related white cell populations of the blood, the granulocytes and the monocytes-macrophages.

Complement-Related Proteins

Complement proteins are immune associated cytotoxic agents, acting in a chain reaction to exterminate target cells that were opsonized (primed) with antibodies, by forming a membrane attack complex (MAC). The mechanism of killing is by opening pores in the target cell membrane. Variations in 20 complement genes or their inhibitors are associated with many autoimmune disorders. Modified serum levels of complement products cause edemas of various tissues, lupus (SLE), vasculitis, glomerulonephritis, renal failure, hemolytic anemia, thrombocytopenia, and arthritis. They interfere with mechanisms of ADCC (antibody dependent cell cytotoxicity), severely impair immune competence and reduce phagocytic ability. Variants of complement genes may also be indicative of type I diabetes mellitus, meningitis, neurological disorders such as Nemaline myopathy and Neonatal hypotonia, muscular disorders such as congenital myopathy, and other diseases.

Cytochrome

The respiratory chain is a key biochemical pathway which is essential to all aerobic cells. There are five different cytochromes involved in the chain. These are heme bound proteins which serve as electron carriers. Modifications in these genes may be predictive of ataxia areflexia, dementia and myopathic and neuropathic changes in muscles. They may also be associated with various types of solid tumors.

Kinesins

Kinesins are tubulin molecular motors that function to transport organelles within cells and to move chromosomes along microtubules during cell division. Modifications of these genes may be indicative of neurological disorders such as Pick disease of the brain and tuberous sclerosis.

Cytokines, Interferon, Interleukin

Members of the cytokine families are known for their potent ability to stimulate cell growth and division even at low concentrations. Cytokines such as erythropoietin are cell-specific in their growth stimulation; erythropoietin is useful for the stimulation of the proliferation of erythroblasts. Variants in cytokines may be predictive for a wide variety of diseases, including cancer predisposition.

G-protein Coupled Receptors

G-protein coupled receptors (also called R7G) are an extensive group of receptors for hormones, neurotransmitters, odorants and light which transduce extracellular signals by interaction with guanine nucleotide-binding (G) proteins. Alterations in genes coding for G-protein coupled receptors may be involved in and indicative of a vast number of physiological conditions. These include blood pressure regulation, renal dysfunctions, male infertility, dopamine associated cognitive, emotional, and endocrine functions, hypercalcemia, chondrodysplasia and osteoporosis, pseudohypoparathyroidism, growth retardation and dwarfism.

Thioesterases

Eukaryotic thiol proteases are a family of proteolytic enzymes which contain an active site cysteine. Catalysis proceeds through a thioester intermediate and is facilitated by a nearby histidine side chain; an asparagine completes the essential catalytic triad. Variants of thioester associated genes may be predictive of neuronal disorders and mental illnesses such as Ceroid Lipofuscinosis, Neuronal 1, Infantile, Santavuori disease and more.

“Name of protein identified following a BLASTX analysis of the CuraGen sequence” provides the database reference for the protein found to resemble the novel reference-polymorphism cognate pair most closely.

“Similarity (pvalue) following a BLASTX analysis” provides the pvalue, a statistical measure from the BLASTX analysis that the polymorphic sequence is similar to, and therefore an allele of, the reference, or wild-type, sequence. In the present application, a cutoff of pvalue >1×10⁻⁵⁰ (entered, for example, as 1.0E-50 in the Table) is used to establish that the reference-polymorphic cognate pairs are novel. A pvalue <1×10⁻⁵⁰ defines proteins considered to be already known.
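The cutoff rule above can be illustrated with a minimal sketch. The function name and the "undetermined" label are illustrative assumptions; the text does not say how a pvalue exactly equal to the cutoff is treated.

```python
# Cutoff stated in the text: pvalue > 1e-50 -> novel, pvalue < 1e-50 -> already known.
NOVELTY_CUTOFF = 1.0e-50


def classify_by_pvalue(pvalue: float) -> str:
    """Classify a BLASTX pvalue against the 1e-50 cutoff (hypothetical helper)."""
    if pvalue < 0:
        raise ValueError("pvalue must be non-negative")
    if pvalue > NOVELTY_CUTOFF:
        return "novel"
    if pvalue < NOVELTY_CUTOFF:
        return "known"
    # The text is silent on equality, so it is reported separately (assumption).
    return "undetermined"


# Table entries such as "1.0E-50" parse directly with float().
print(classify_by_pvalue(float("3.2E-12")))
```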

“Map location” provides any information available at the time of filing related to localization of a gene on a chromosome.

The polymorphisms are arranged in Table 1 in the following order:

SEQ ID NOS: 1 to 1112, in consecutive pairs, are SNPs that are silent;

SEQ ID NOS: 1113-1128, in consecutive pairs, are SNPs that lead to conservative amino acid changes;

SEQ ID NOS: 1129-1186, in consecutive pairs, are SNPs that lead to nonconservative amino acid changes; and

SEQ ID NOS: 1187-1192, in consecutive pairs, are SNPs that involve a gap.

With respect to the reference or wild-type sequence at the position of the polymorphism, the allelic cSNP introduces an additional nucleotide (an insertion) or deletes a nucleotide (a deletion). A SNP that involves a gap generates a frame shift.
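As a minimal sketch, the ordering of Table 1 can be expressed as a lookup from a nucleotide SEQ ID NO to its SNP category. The helper names are hypothetical, and the pairing rule (odd number first, even number second) is an assumption inferred from the fact that every listed range starts on an odd number and ends on an even one.

```python
# Ranges of nucleotide SEQ ID NOS and their categories, as listed in the text.
SNP_CATEGORIES = [
    (1, 1112, "silent"),
    (1113, 1128, "conservative amino acid change"),
    (1129, 1186, "nonconservative amino acid change"),
    (1187, 1192, "gap (insertion or deletion)"),
]


def snp_category(seq_id: int) -> str:
    """Return the category of a nucleotide SEQ ID NO (hypothetical helper)."""
    for first, last, label in SNP_CATEGORIES:
        if first <= seq_id <= last:
            return label
    raise ValueError(f"SEQ ID NO {seq_id} is outside the nucleotide ranges 1-1192")


def cognate_pair(seq_id: int) -> tuple:
    """Return the consecutive pair containing seq_id, assuming pairs start on odd numbers."""
    first = seq_id if seq_id % 2 == 1 else seq_id - 1
    return (first, first + 1)


print(snp_category(1130), cognate_pair(1130))
```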

Also presented in the sequence listing filed herewith are predicted amino acid sequences encoded by the polymorphic sequences shown in Table 1. SEQ ID NOS: 1193-1208, in consecutive pairs, are the amino acid sequences centered at the polymorphic amino acid residue for the protein products provided by SNPs that lead to conservative amino acid changes between the reference and the polymorphic sequences. 7 or 8 amino acids on either side of the polymorphic site are shown. The order in which these sequences appear mirrors the order of presentation of the cognate nucleotide sequences, and is set forth in Table 1.

SEQ ID NOS: 1209-1266, in consecutive pairs, are the amino acid sequences centered at the polymorphic amino acid residue for the protein products provided by SNPs that lead to nonconservative amino acid changes between the reference and the polymorphic sequences. 7 or 8 amino acids on either side of the polymorphic site are shown. The order in which these sequences appear mirrors the order of presentation of the cognate nucleotide sequences, and is set forth in Table 1.

SEQ ID NOS: 1267-1272, in consecutive pairs, are the amino acid sequences centered at the polymorphic amino acid residue for the protein products provided by SNPs that lead to frameshift-induced amino acid changes between the reference and the polymorphic sequences. 7 or 8 amino acids on either side of the polymorphic site are shown. The order in which these sequences appear mirrors the order of presentation of the cognate nucleotide sequences, and is set forth in Table 1.
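The amino acid windows described above (7 or 8 residues on either side of the polymorphic residue) could be extracted as in the following sketch. The function name, the 0-based indexing, and the example protein sequence are illustrative assumptions, not sequences from the listing.

```python
def residue_window(protein: str, site: int, flank: int = 7) -> str:
    """Return the residues within `flank` positions of a 0-based polymorphic site."""
    if not 0 <= site < len(protein):
        raise ValueError("site is outside the protein sequence")
    start = max(0, site - flank)               # truncate at the N-terminus
    end = min(len(protein), site + flank + 1)  # truncate at the C-terminus
    return protein[start:end]


# Hypothetical protein; the polymorphic residue is at index 10.
protein = "MKTAYIAKQRQISFVKSHFSRQ"
window = residue_window(protein, 10)
print(window)
```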

Provided herein are compositions which include, or are capable of detecting, nucleic acid sequences having these polymorphisms, as well as methods of using nucleic acids.

Identification of Individuals Carrying SNPs

Individuals carrying polymorphic alleles of the invention may be detected at either the DNA, the RNA, or the protein level using a variety of techniques that are well known in the art. Strategies for identification and detection are described in e.g., EP 730,663, EP 717,113, and PCT US97/02102. The present methods usually employ pre-characterized polymorphisms. That is, the genotyping location and nature of polymorphic forms present at a site have already been determined. The availability of this information allows sets of probes to be designed for specific identification of the known polymorphic forms.

Many of the methods described below require amplification of DNA from target samples. This can be accomplished by, e.g., PCR. See generally PCR Technology: Principles and Applications for DNA Amplification (ed. H. A. Erlich, Freeman Press, NY, N.Y., 1992); PCR Protocols: A Guide to Methods and Applications (eds. Innis, et al., Academic Press, San Diego, Calif., 1990); Mattila et al., Nucleic Acids Res. 19, 4967 (1991); Eckert et al., PCR Methods and Applications 1, 17 (1991); PCR (eds. McPherson et al., IRL Press, Oxford); and U.S. Pat. No. 4,683,202.

The phrase “recombinant protein” or “recombinantly produced protein” refers to a peptide or protein produced using non-native cells that do not have an endogenous copy of DNA able to express the protein. In particular, as used herein, a recombinantly produced protein relates to the gene product of a polymorphic allele, i.e., a “polymorphic protein” containing an altered amino acid at the site of translation of the nucleotide polymorphism. The cells produce the protein because they have been genetically altered by the introduction of the appropriate nucleic acid sequence. The recombinant protein will not be found in association with proteins and other subcellular components normally associated with the cells producing the protein. The terms “protein” and “polypeptide” are used interchangeably herein.

The phrase “substantially purified” or “isolated” when referring to a nucleic acid, peptide or protein, means that the chemical composition is in a milieu containing fewer, or preferably, essentially none, of other cellular components with which it is naturally associated. Thus, the phrase “isolated” or “substantially pure” refers to nucleic acid preparations that lack at least one protein or nucleic acid normally associated with the nucleic acid in a host cell. It is preferably in a homogeneous state although it can be in either a dry or aqueous solution. Purity and homogeneity are typically determined using analytical chemistry techniques such as gel electrophoresis or high performance liquid chromatography. Generally, a substantially purified or isolated nucleic acid or protein will comprise more than 80% of all macromolecular species present in the preparation. Preferably, the nucleic acid or protein is purified to represent greater than 90% of all macromolecular species present. More preferably the nucleic acid or protein is purified to greater than 95%, and most preferably the nucleic acid or protein is purified to essential homogeneity, wherein other macromolecular species are not detected by conventional analytical procedures.

The genomic DNA used for the diagnosis may be obtained from any nucleated cells of the body, such as those present in peripheral blood, urine, saliva, buccal samples, surgical specimens, and autopsy specimens. The DNA may be used directly or may be amplified enzymatically in vitro through use of PCR (Saiki et al. Science 239:487-491 (1988)) or other in vitro amplification methods such as the ligase chain reaction (LCR) (Wu and Wallace Genomics 4:560-569 (1989)), strand displacement amplification (SDA) (Walker et al. Proc. Natl. Acad. Sci. U.S.A. 89:392-396 (1992)), or self-sustained sequence replication (3SR) (Fahy et al. PCR Methods Appl. 1:25-33 (1992)), prior to mutation analysis.

Methods for preparing nucleic acids in a form that is suitable for mutation detection are well known in the art. A “nucleic acid” is a deoxyribonucleotide or ribonucleotide polymer in either single- or double-stranded form, including known analogs of natural nucleotides unless otherwise indicated. The term “nucleic acids”, as used herein, refers to either DNA or RNA. “Nucleic acid sequence” or “polynucleotide sequence” refers to a single-stranded sequence of deoxyribonucleotide or ribonucleotide bases read from the 5′ end to the 3′ end. The direction of 5′ to 3′ addition of nascent RNA transcripts is referred to as the transcription direction; sequence regions on the DNA strand having the same sequence as the RNA and which are beyond the 5′ end of the RNA transcript in the 5′ direction are referred to as “upstream sequences”; sequence regions on the DNA strand having the same sequence as the RNA and which are beyond the 3′ end of the RNA transcript in the 3′ direction are referred to as “downstream sequences”. The term includes self-replicating plasmids, infectious polymers of DNA or RNA and nonfunctional DNA or RNA. The complement of any nucleic acid sequence of the invention is understood to be included in the definition of that sequence. “Nucleic acid probes” may be DNA or RNA fragments.

The detection of polymorphisms in specific DNA sequences can be accomplished by a variety of methods including, but not limited to, restriction-fragment-length-polymorphism detection based on allele-specific restriction-endonuclease cleavage (Kan and Dozy Lancet ii:910-912 (1978)), hybridization with allele-specific oligonucleotide probes (Wallace et al. Nucl. Acids Res. 6:3543-3557 (1978)), including immobilized oligonucleotides (Saiki et al. Proc. Natl. Acad. Sci. U.S.A. 86:6230-6234 (1989)) or oligonucleotide arrays (Maskos and Southern Nucl. Acids Res. 21:2269-2270 (1993)), allele-specific PCR (Newton et al. Nucl. Acids Res. 17:2503-2516 (1989)), mismatch-repair detection (MRD) (Faham and Cox Genome Res. 5:474-482 (1995)), binding of MutS protein (Wagner et al. Nucl. Acids Res. 23:3944-3948 (1995)), denaturing-gradient gel electrophoresis (DGGE) (Fisher and Lerman Proc. Natl. Acad. Sci. U.S.A. 80:1579-1583 (1983)), single-strand-conformation-polymorphism detection (Orita et al. Genomics 5:874-879 (1989)), RNAase cleavage at mismatched base-pairs (Myers et al. Science 230:1242 (1985)), chemical (Cotton et al. Proc. Natl. Acad. Sci. U.S.A. 85:4397-4401 (1988)) or enzymatic (Youil et al. Proc. Natl. Acad. Sci. U.S.A. 92:87-91 (1995)) cleavage of heteroduplex DNA, methods based on allele-specific primer extension (Syvanen et al. Genomics 8:684-692 (1990)), genetic bit analysis (GBA) (Nikiforov et al. Nucl. Acids Res. 22:4167-4175 (1994)), the oligonucleotide-ligation assay (OLA) (Landegren et al. Science 241:1077 (1988)), the allele-specific ligation chain reaction (LCR) (Barany Proc. Natl. Acad. Sci. U.S.A. 88:189-193 (1991)), gap-LCR (Abravaya et al. Nucl. Acids Res. 23:675-682 (1995)), radioactive and/or fluorescent DNA sequencing using standard procedures well known in the art, and peptide nucleic acid (PNA) assays (Orum et al., Nucl. Acids Res. 21:5332-5356 (1993); Thiede et al., Nucl. Acids Res. 24:983-984 (1996)).

“Specific hybridization” or “selective hybridization” refers to the binding, or duplexing, of a nucleic acid molecule only to a second particular nucleotide sequence to which the nucleic acid is complementary, under suitably stringent conditions when that sequence is present in a complex mixture (e.g., total cellular DNA or RNA). “Stringent conditions” are conditions under which a probe will hybridize to its target subsequence, but to no other sequences. Stringent conditions are sequence-dependent and are different in different circumstances. Longer sequences hybridize specifically at higher temperatures than shorter ones. Generally, stringent conditions are selected such that the temperature is about 5° C. lower than the thermal melting point (Tm) for the specific sequence to which hybridization is intended to occur at a defined ionic strength and pH. The Tm is the temperature (under defined ionic strength, pH, and nucleic acid concentration) at which 50% of the target sequence hybridizes to the complementary probe at equilibrium. Typically, stringent conditions include a salt concentration of at least about 0.01 to about 1.0 M Na ion concentration (or other salts), at pH 7.0 to 8.3. The temperature is at least about 30° C. for short probes (e.g., 10 to 50 nucleotides). Stringent conditions can also be achieved with the addition of destabilizing agents such as formamide. For example, conditions of 5× SSPE (750 mM NaCl, 50 mM NaPhosphate, 5 mM EDTA, pH 7.4) and a temperature of 25-30° C. are suitable for allele-specific probe hybridizations.
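The relationship between Tm and the stringent hybridization temperature (about 5° C. below Tm) can be sketched as follows. The text does not say how Tm is obtained; the sketch substitutes the Wallace rule of thumb for short oligonucleotides (2° C. per A or T, 4° C. per G or C), which is an assumption introduced here for illustration only. The function names and the example probe are likewise hypothetical.

```python
def wallace_tm(probe: str) -> float:
    """Estimate Tm in degrees C with the Wallace rule (a rough rule of thumb, not from the text)."""
    probe = probe.upper()
    at = probe.count("A") + probe.count("T")
    gc = probe.count("G") + probe.count("C")
    if not probe or at + gc != len(probe):
        raise ValueError("probe must be a non-empty string of A, C, G, T")
    return 2.0 * at + 4.0 * gc


def stringent_temperature(probe: str, offset: float = 5.0) -> float:
    """Temperature about `offset` degrees below the estimated Tm, as described in the text."""
    return wallace_tm(probe) - offset


# Hypothetical 15-mer probe.
print(stringent_temperature("ACGTACGTACGTACG"))
```

In practice the Tm also depends on ionic strength, pH, and nucleic acid concentration, as the definition above notes, so an estimate of this kind is only a starting point for empirical optimization.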

“Complementary” or “target” nucleic acid sequences refer to those nucleic acid sequences which selectively hybridize to a nucleic acid probe. Proper annealing conditions depend, for example, upon a probe's length, base composition, and the number of mismatches and their position on the probe, and must often be determined empirically. For discussions of nucleic acid probe design and annealing conditions, see, for example, Sambrook et al., or Current Protocols in Molecular Biology, F. Ausubel et al., ed., Greene Publishing and Wiley-Interscience, New York (1987).

A perfectly matched probe has a sequence perfectly complementary to a particular target sequence. The test probe is typically perfectly complementary to a portion of the target sequence. A “polymorphic” marker or site is the locus at which a sequence difference occurs with respect to a reference sequence. Polymorphic markers include restriction fragment length polymorphisms, variable number of tandem repeats (VNTR's), hypervariable regions, minisatellites, dinucleotide repeats, trinucleotide repeats, tetranucleotide repeats, simple sequence repeats, and insertion elements such as Alu. The reference allelic form may be, for example, the most abundant form in a population, or the first allelic form to be identified, and other allelic forms are designated as alternative, variant or polymorphic alleles. The allelic form occurring most frequently in a selected population is sometimes referred to as the “wild type” form, and herein may also be referred to as the “reference” form. Diploid organisms may be homozygous or heterozygous for allelic forms. A diallelic polymorphism has two distinguishable forms (i.e., base sequences), and a triallelic polymorphism has three such forms.

As used herein an “oligonucleotide” is a single-stranded nucleic acid ranging in length from 2 to about 60 bases. Oligonucleotides are often synthetic but can also be produced from naturally occurring polynucleotides. A probe is an oligonucleotide capable of binding to a target nucleic acid of complementary sequence through one or more types of chemical bonds, usually through complementary base pairing via hydrogen bond formation. Oligonucleotide probes are often between 5 and 60 bases, and, in specific embodiments, may be between 10-40, or 15-30 bases long. An oligonucleotide probe may include natural (i.e. A, G, C, or T) or modified bases (7-deazaguanosine, inosine, etc.). In addition, the bases in an oligonucleotide probe may be joined by a linkage other than a phosphodiester bond, such as a phosphoramidate linkage or a phosphorothioate linkage, or they may be peptide nucleic acids in which the constituent bases are joined by peptide bonds rather than by phosphodiester bonds, so long as the linkage does not interfere with hybridization.

As used herein, the term “primer” refers to a single-stranded oligonucleotide which acts as a point of initiation of template-directed DNA synthesis under appropriate conditions (e.g., in the presence of four different nucleoside triphosphates and a polymerization agent, such as DNA polymerase, RNA polymerase or reverse transcriptase) in an appropriate buffer and at a suitable temperature. The appropriate length of a primer depends on the intended use of the primer, but typically ranges from 15 to 30 nucleotides. Short primer molecules generally require cooler temperatures to form sufficiently stable hybrid complexes with the template. A primer need not be perfectly complementary to the exact sequence of the template, but should be sufficiently complementary to hybridize with it. The term “primer site” refers to the sequence of the target DNA to which a primer hybridizes. The term “primer pair” refers to a set of primers including a 5′ (upstream) primer that hybridizes with the 5′ end of the DNA sequence to be amplified and a 3′ (downstream) primer that hybridizes with the complement of the 3′ end of the sequence to be amplified.

DNA fragments can be prepared, for example, by digesting plasmid DNA, or by use of PCR. Oligonucleotides for use as primers or probes are chemically synthesized by methods known in the field of the chemical synthesis of polynucleotides, including by way of non-limiting example the phosphoramidite method described by Beaucage and Caruthers, Tetrahedron Lett. 22:1859-1862 (1981) and the triester method provided by Matteucci, et al., J. Am. Chem. Soc., 103:3185 (1981), both incorporated herein by reference. These syntheses may employ an automated synthesizer, as described in Needham-VanDevanter, D. R., et al., Nucleic Acids Res. 12:6159-6168 (1984). Purification of oligonucleotides may be carried out by either native acrylamide gel electrophoresis or by anion-exchange HPLC as described in Pearson, J. D. and Regnier, F. E., J. Chrom., 255:137-149 (1983). A double stranded fragment may then be obtained, if desired, by annealing appropriate complementary single strands together under suitable conditions or by synthesizing the complementary strand using a DNA polymerase with an appropriate primer sequence. Where a specific sequence for a nucleic acid probe is given, it is understood that the complementary strand is also identified and included. The complementary strand will work equally well in situations where the target is a double-stranded nucleic acid.
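Because a given probe sequence is understood to identify its complementary strand as well, the complement can be derived mechanically. The following is a minimal sketch for unmodified DNA bases only; the function names are illustrative.

```python
# Watson-Crick complements for the four unmodified DNA bases.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def are_complementary(strand_a: str, strand_b: str) -> bool:
    """True if two 5'-to-3' strands are perfect complements and could anneal into a duplex."""
    return reverse_complement(strand_a.upper()) == strand_b.upper()


print(reverse_complement("ATGC"))
```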

The sequence of the synthetic oligonucleotide or of any nucleic acid fragment can be obtained using either the dideoxy chain termination method or the Maxam-Gilbert method (see Sambrook et al., Molecular Cloning: A Laboratory Manual (2nd Ed.), Vols. 1-3, Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y. (1989), which is incorporated herein by reference and is hereinafter referred to as “Sambrook et al.”; and Zyskind et al., Recombinant DNA Laboratory Manual (Acad. Press, New York, 1988)). Oligonucleotides useful in diagnostic assays are typically at least 8 consecutive nucleotides in length, and may range upwards of 18 nucleotides in length to greater than 100 or more consecutive nucleotides.

Another aspect of the invention pertains to isolated antisense nucleic acid molecules that are hybridizable to or complementary to the nucleic acid molecule comprising the SNP-containing nucleotide sequences of the invention, or fragments, analogs or derivatives thereof. An “antisense” nucleic acid comprises a nucleotide sequence that is complementary to a “sense” nucleic acid encoding a protein, e.g., complementary to the coding strand of a double-stranded cDNA molecule or complementary to an mRNA sequence. In specific aspects, antisense nucleic acid molecules are provided that comprise a sequence complementary to at least about 10, about 25, about 50, or about 60 nucleotides or an entire SNP coding strand, or to only a portion thereof.

In one embodiment, an antisense nucleic acid molecule is antisense to a “coding region” of the coding strand of a polymorphic nucleotide sequence of the invention. The term “coding region” refers to the region of the nucleotide sequence comprising codons which are translated into amino acids. In another embodiment, the antisense nucleic acid molecule is antisense to a “noncoding region” of the coding strand of a nucleotide sequence of the invention. The term “noncoding region” refers to 5′ and 3′ sequences which flank the coding region and are not translated into amino acids (i.e., also referred to as 5′ and 3′ untranslated regions).

Given the coding strand sequences disclosed herein, antisense nucleic acids of the invention can be designed according to the rules of Watson and Crick or Hoogsteen base pairing. For example, the antisense nucleic acid molecule can generally be complementary to the entire coding region of an mRNA, but more preferably as embodied herein, it is an oligonucleotide that is antisense to only a portion of the coding or noncoding region of the mRNA. An antisense oligonucleotide can range in length between about 5 and about 60 nucleotides, preferably between about 10 and about 45 nucleotides, more preferably between about 15 and 40 nucleotides, and still more preferably between about 15 and 30 in length. An antisense nucleic acid of the invention can be constructed using chemical synthesis or enzymatic ligation reactions using procedures known in the art. For example, an antisense nucleic acid (e.g., an antisense oligonucleotide) can be chemically synthesized using naturally occurring nucleotides or variously modified nucleotides designed to increase the biological stability of the molecules or to increase the physical stability of the duplex formed between the antisense and sense nucleic acids, e.g., phosphorothioate derivatives and acridine substituted nucleotides can be used.
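Designing an antisense oligonucleotide against a portion of a coding strand by Watson-Crick pairing can be sketched as below. The function name, the default length of 20, and the example coding sequence are illustrative assumptions; only the overall length bounds of about 5 to about 60 nucleotides come from the text.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def antisense_oligo(coding_strand: str, start: int, length: int = 20) -> str:
    """Return a DNA oligo (5' to 3') antisense to coding_strand[start:start + length]."""
    if not 5 <= length <= 60:
        raise ValueError("length should be between about 5 and about 60 nucleotides")
    target = coding_strand.upper()[start:start + length]
    if start < 0 or len(target) < length:
        raise ValueError("requested window runs off the coding strand")
    # The antisense strand is the reverse complement of the targeted portion.
    return target.translate(_COMPLEMENT)[::-1]


# Hypothetical 30-nucleotide coding strand; target its first 15 nucleotides.
coding = "ATGGCCAAGTTGACCAGTGCCGTTCCGGTG"
print(antisense_oligo(coding, 0, 15))
```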

Examples of modified nucleotides that can be used to generate the antisense nucleic acid include: 5-fluorouracil, 5-bromouracil, 5-chlorouracil, 5-iodouracil, hypoxanthine, xanthine, 4-acetylcytosine, 5-(carboxyhydroxylmethyl) uracil, 5-carboxymethylaminomethyl-2-thiouridine, 5-carboxymethylaminomethyluracil, dihydrouracil, beta-D-galactosylqueosine, inosine, N6-isopentenyladenine, 1-methylguanine, 1-methylinosine, 2,2-dimethylguanine, 2-methyladenine, 2-methylguanine, 3-methylcytosine, 5-methylcytosine, N6-adenine, 7-methylguanine, 5-methylaminomethyluracil, 5-methoxyaminomethyl-2-thiouracil, beta-D-mannosylqueosine, 5′-methoxycarboxymethyluracil, 5-methoxyuracil, 2-methylthio-N6-isopentenyladenine, uracil-5-oxyacetic acid (v), wybutoxosine, pseudouracil, queosine, 2-thiocytosine, 5-methyl-2-thiouracil, 2-thiouracil, 4-thiouracil, 5-methyluracil, uracil-5-oxyacetic acid methylester, uracil-5-oxyacetic acid (v), 5-methyl-2-thiouracil, 3-(3-amino-3-N-2-carboxypropyl) uracil, (acp3)w, and 2,6-diaminopurine. Alternatively, the antisense nucleic acid can be produced biologically using an expression vector into which a nucleic acid has been subcloned in an antisense orientation (i.e., RNA transcribed from the inserted nucleic acid will be of an antisense orientation to a target nucleic acid of interest, described further in the following subsection).

The antisense nucleic acid molecules of the invention are typically administered to a subject or generated in situ such that they hybridize with or bind to cellular mRNA and/or genomic DNA encoding a polymorphic protein to thereby inhibit expression of the protein, e.g., by inhibiting transcription and/or translation. The hybridization can be by conventional nucleotide complementarity to form a stable duplex, or, for example, in the case of an antisense nucleic acid molecule that binds to DNA duplexes, through specific interactions in the major groove of the double helix. An example of a route of administration of antisense nucleic acid molecules of the invention includes direct injection at a tissue site. Alternatively, antisense nucleic acid molecules can be modified to target selected cells and then administered systemically. For example, for systemic administration, antisense molecules can be modified such that they specifically bind to receptors or antigens expressed on a selected cell surface, e.g., by linking the antisense nucleic acid molecules to peptides or antibodies that bind to cell surface receptors or antigens. The antisense nucleic acid molecules can also be delivered to cells using the vectors described herein. To achieve sufficient intracellular concentrations of antisense molecules, vector constructs in which the antisense nucleic acid molecule is placed under the control of a strong pol II or pol III promoter are preferred.

In yet another embodiment, the antisense nucleic acid molecule of the invention is an α-anomeric nucleic acid molecule. An α-anomeric nucleic acid molecule forms specific double-stranded hybrids with complementary RNA in which, contrary to the usual β-units, the strands run parallel to each other (Gaultier et al. (1987) Nucleic Acids Res. 15:6625-6641). The antisense nucleic acid molecule can also comprise a 2′-O-methylribonucleotide (Inoue et al. (1987) Nucleic Acids Res. 15:6131-6148) or a chimeric RNA-DNA analogue (Inoue et al. (1987) FEBS Lett. 215:327-330).

The following terms are used to describe the sequence relationships between two or more nucleic acids or polynucleotides: “reference sequence”, “comparison window”, “sequence identity”, “percentage of sequence identity”, and “substantial identity”. A “reference sequence” is a defined sequence used as a basis for a sequence comparison; a reference sequence may be a subset of a larger sequence, for example, as a segment of a full-length cDNA or gene sequence given in a sequence listing, or may comprise a complete cDNA or gene sequence. Optimal alignment of sequences for aligning a comparison window may, for example, be conducted by the local homology algorithm of Smith and Waterman Adv. Appl. Math. 2:482 (1981), by the homology alignment algorithm of Needleman and Wunsch J. Mol. Biol. 48:443 (1970), by the search for similarity method of Pearson and Lipman Proc. Natl. Acad. Sci. U.S.A. 85:2444 (1988), or by computerized implementations of these algorithms (for example, GAP, BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software Package Release 7.0, Genetics Computer Group, 575 Science Dr., Madison, Wis.).
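For orientation, the local homology approach of Smith and Waterman can be sketched as a small dynamic program that returns the best local alignment score between two sequences. The scoring values (match +2, mismatch -1, linear gap penalty -2) are illustrative assumptions rather than parameters taken from the cited packages, and the sketch reports only the score, not the alignment itself.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of a and b with a linear gap penalty."""
    prev = [0] * (len(b) + 1)  # previous row of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# Hypothetical sequences sharing the local segment "ACGT".
print(smith_waterman_score("GGACGTCC", "TTACGTAA"))
```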

Techniques for nucleic acid manipulation of the nucleic acid sequences harboring the cSNP's of the invention, such as subcloning nucleic acid sequences encoding polypeptides into expression vectors, labeling probes, DNA hybridization, and the like, are described generally in Sambrook et al. The phrase “nucleic acid sequence encoding” refers to a nucleic acid which directs the expression of a specific protein, peptide or amino acid sequence. The nucleic acid sequences include both the DNA strand sequence that is transcribed into RNA and the RNA sequence that is translated into protein, peptide or amino acid sequence. The nucleic acid sequences include both the full length nucleic acid sequences disclosed herein as well as non-full length sequences derived from the full length protein. It is further understood that the sequence includes the degenerate codons of the native sequence or sequences which may be introduced to provide codon preference in a specific host cell. Consequently, the principles of probe selection and array design can readily be extended to analyze more complex polymorphisms (see EP 730,663). For example, to characterize a triallelic SNP polymorphism, three groups of probes can be designed, tiled on the three polymorphic forms as described above. As a further example, to analyze a diallelic polymorphism involving a deletion of a nucleotide, one can tile a first group of probes based on the undeleted polymorphic form as the reference sequence and a second group of probes based on the deleted form as the reference sequence.
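The idea of tiling one group of probes per polymorphic form can be sketched as follows. The helper names, the probe length, and the example sequences are hypothetical; the sketch returns the overlapping target windows for each form, and each actual probe would be complementary to one such window.

```python
def tile(reference: str, probe_len: int, step: int = 1) -> list:
    """Return overlapping windows of probe_len across a reference sequence."""
    if probe_len > len(reference):
        return []
    return [reference[i:i + probe_len]
            for i in range(0, len(reference) - probe_len + 1, step)]


def probe_groups(forms: dict, probe_len: int) -> dict:
    """Tile one group of windows per polymorphic form (form name -> list of windows)."""
    return {name: tile(seq, probe_len) for name, seq in forms.items()}


# Hypothetical diallelic deletion polymorphism: one T is absent in the deleted form.
forms = {"undeleted": "ACGTTGCAAC", "deleted": "ACGTGCAAC"}
groups = probe_groups(forms, probe_len=8)
for name, windows in groups.items():
    print(name, windows)
```

A triallelic SNP would be handled the same way by passing three forms instead of two.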

For assay of genomic DNA, virtually any biological sample can be used. Convenient tissue samples include whole blood, semen, saliva, tears, urine, fecal material, sweat, buccal cells, skin and hair. Genomic DNA is typically amplified before analysis. Amplification is usually effected by PCR using primers flanking a suitable fragment, e.g., of 50-500 nucleotides, containing the locus of the polymorphism to be analyzed. Target is usually labeled in the course of amplification. The amplification product can be RNA or DNA, single stranded or double stranded. If double stranded, the amplification product is typically denatured before application to an array. If genomic DNA is analyzed without amplification, it may be desirable to remove RNA from the sample before applying it to the array. This can be accomplished by digestion with DNase-free RNAase.

Detection of Polymorphisms in a Nucleic Acid Sample

The SNPs disclosed herein can be used to determine which forms of a characterized polymorphism are present in individuals under analysis.

The design and use of allele-specific probes for analyzing polymorphisms is described by e.g., Saiki et al., Nature 324, 163-166 (1986); Dattagupta, EP 235,726, Saiki, WO 89/11548. Allele-specific probes can be designed that hybridize to a segment of target DNA from one individual but do not hybridize to the corresponding segment from another individual due to the presence of different polymorphic forms in the respective segments from the two individuals. Hybridization conditions should be sufficiently stringent that there is a significant difference in hybridization intensity between alleles, and preferably an essentially binary response, whereby a probe hybridizes to only one of the alleles. Some probes are designed to hybridize to a segment of target DNA such that the polymorphic site aligns with a central position (e.g., in a 15-mer at the 7 position; in a 16-mer, at either the 7, 8 or 9 position) of the probe. This design of probe achieves good discrimination in hybridization between different allelic forms.
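A minimal sketch of this probe design follows: for each allele, a 15-mer is built so that the polymorphic site falls at the 7 position of the targeted segment, and the probe is the complement of that segment. The function names, the 1-based reading of "the 7 position", and the example target are assumptions made for illustration.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def allele_specific_probes(target: str, snp_index: int, alleles: list,
                           probe_len: int = 15, site_in_probe: int = 7) -> dict:
    """Build one probe per allele with the SNP at a fixed (1-based) position of the segment."""
    start = snp_index - (site_in_probe - 1)
    end = start + probe_len
    if start < 0 or end > len(target):
        raise ValueError("probe window runs off the target sequence")
    probes = {}
    for allele in alleles:
        # Target segment carrying this allele at the polymorphic site.
        segment = target[start:snp_index] + allele + target[snp_index + 1:end]
        probes[allele] = reverse_complement(segment)
    return probes


# Hypothetical target with an A/G polymorphism at 0-based index 13.
target = "GATTACAGATTACAGATTACAGATTACA"
for allele, probe in allele_specific_probes(target, 13, ["A", "G"]).items():
    print(allele, probe)
```

The two probes of a pair differ at a single internal base, which is what allows a mismatch to destabilize hybridization to the non-matching allele.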

Allele-specific probes are often used in pairs, one member of a pair showing a perfect match to a reference form of a target sequence and the other member showing a perfect match to a variant form. Several pairs of probes can then be immobilized on the same support for simultaneous analysis of multiple polymorphisms within the same target sequence.
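The probe-pair design described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `allele_specific_probes`, its parameters, and the example flanking sequences are hypothetical and are not taken from the disclosure. It assumes the flanking sequence on each side of the polymorphic site is known, and it places the polymorphic base at position 7 of a 15-mer by default, as in the example given above.

```python
def allele_specific_probes(flank5, ref_allele, var_allele, flank3,
                           length=15, site_pos=7):
    """Build a (reference, variant) probe pair of the given length.

    flank5 / flank3 are the known sequences immediately 5' and 3' of the
    polymorphic site (hypothetical inputs). site_pos is the 1-based
    position of the polymorphic base within the probe.
    """
    left = site_pos - 1          # bases taken from the 5' flank
    right = length - site_pos    # bases taken from the 3' flank
    if len(flank5) < left or len(flank3) < right:
        raise ValueError("flanking sequence too short for requested probe")
    five = flank5[len(flank5) - left:]   # last `left` bases of the 5' flank
    three = flank3[:right]               # first `right` bases of the 3' flank
    return five + ref_allele + three, five + var_allele + three


# Hypothetical example: an A/G polymorphism with made-up flanking sequence.
ref_probe, var_probe = allele_specific_probes(
    "ACGTACGTAC", "A", "G", "TTGGCCAATT")
print(ref_probe)
print(var_probe)
```

The two probes differ only at the central position, so under sufficiently stringent conditions each would be expected to give a perfect match to only one allelic form. The probes are written here in the same orientation as the input sequence; in practice the strand to be probed determines whether the sequence or its reverse complement is synthesized.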

The polymorphisms can also be identified by hybridization to nucleic acid arrays, some examples of which are described in published PCT application WO 95/11995. WO 95/11995 also describes subarrays that are optimized for detection of a variant form of a precharacterized polymorphism. Such a subarray contains probes designed to be complementary to a second reference sequence, which is an allelic variant of the first reference sequence. The second group of probes is designed by the same principles, except that the probes exhibit complementarity to the second reference sequence. The inclusion of a second group (or further groups) can be particularly useful for analyzing short subsequences of the primary reference sequence in which multiple mutations are expected to occur within a short distance commensurate with the length of the probes (e.g., two or more mutations within 9 to 21 bases).

An allele-specific primer hybridizes to a site on target DNA overlapping a polymorphism and only primes amplification of an allelic form to which the primer exhibits perfect complementarity. See Gibbs, Nucleic Acids Res. 17, 2427-2448 (1989). This primer is used in conjunction with a second primer which hybridizes at a distal site. Amplification proceeds from the two primers, resulting in a detectable product which indicates that the particular allelic form is present. A control is usually performed with a second pair of primers, one of which shows a single-base mismatch at the polymorphic site and the other of which exhibits perfect complementarity to a distal site. The single-base mismatch prevents amplification and no detectable product is formed. The method works best when the mismatch is included in the 3′-most position of the oligonucleotide aligned with the polymorphism because this position is most destabilizing to elongation from the primer (see, e.g., WO 93/22456).
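As a rough illustration of the 3′-most placement described above, the following Python sketch builds a matched primer and a single-base-mismatch control primer whose last (3′) base falls on the polymorphic site. The function name, the primer length, and the example sequence are hypothetical; the sketch does not model annealing temperature or polymerase behavior.

```python
def allele_specific_primer(sequence, snp_index, allele, length=20):
    """Return a primer whose 3'-most base is `allele` at the polymorphic site.

    sequence  : 5'->3' sequence containing the site (hypothetical input)
    snp_index : 0-based index of the polymorphic base in `sequence`
    length    : total primer length, including the allele-specific 3' base
    """
    start = snp_index - length + 1
    if start < 0:
        raise ValueError("not enough sequence 5' of the polymorphic site")
    # Shared 5' portion taken from the template, then the allele-specific base.
    return sequence[start:snp_index] + allele


# Hypothetical target with an A/G polymorphism at index 20.
target = "GATTACAGATTACAGATTACAGGC"
matched = allele_specific_primer(target, 20, "A", length=10)
control = allele_specific_primer(target, 20, "G", length=10)
print(matched, control)
```

The two primers share every base except the 3′ terminal one, which is the arrangement the text identifies as most destabilizing to elongation when mismatched.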

Amplification products generated using the polymerase chain reaction can be analyzed by the use of denaturing gradient gel electrophoresis. Different alleles can be identified based on the different sequence-dependent melting properties and electrophoretic migration of DNA in solution. See Erlich, ed., PCR Technology, Principles and Applications for DNA Amplification (W. H. Freeman and Co., New York, 1992), Chapter 7.

Alleles of target sequences can be differentiated using single-strand conformation polymorphism analysis, which identifies base differences by alteration in electrophoretic migration of single stranded PCR products, as described in Orita et al., Proc. Nat. Acad. Sci. 86, 2766-2770 (1989). Amplified PCR products can be generated and heated or otherwise denatured, to form single stranded amplification products. Single-stranded nucleic acids may refold or form secondary structures which are partially dependent on the base sequence. The different electrophoretic mobilities of single-stranded amplification products can be related to base-sequence differences between alleles of target sequences.

The genotype of an individual with respect to a pathology suspected of being caused by a genetic polymorphism may be assessed by association analysis. Phenotypic traits suitable for association analysis include diseases that have known but hitherto unmapped genetic components (e.g., agammaglobulinemia, diabetes insipidus, Lesch-Nyhan syndrome, muscular dystrophy, Wiskott-Aldrich syndrome, Fabry's disease, familial hypercholesterolemia, polycystic kidney disease, hereditary spherocytosis, von Willebrand's disease, tuberous sclerosis, hereditary hemorrhagic telangiectasia, familial colonic polyposis, Ehlers-Danlos syndrome, osteogenesis imperfecta, and acute intermittent porphyria).

Phenotypic traits also include symptoms of, or susceptibility to, multifactorial diseases of which a component is or may be genetic, such as autoimmune diseases, inflammation, cancer, diseases of the nervous system, and infection by pathogenic microorganisms. Some examples of autoimmune diseases include rheumatoid arthritis, multiple sclerosis, diabetes (insulin-dependent and non-insulin-dependent), systemic lupus erythematosus and Graves disease. Some examples of cancers include cancers of the bladder, brain, breast, colon, esophagus, kidney, oral cavity, ovary, pancreas, prostate, skin, stomach, leukemia, liver, lung, and uterus. Phenotypic traits also include characteristics such as longevity, appearance (e.g., baldness, obesity), strength, speed, endurance, fertility, and susceptibility or receptivity to particular drugs or therapeutic treatments.

Such correlations can be exploited in several ways. In the case of a strong correlation between a polymorphic form and a disease for which treatment is available, detection of the polymorphic form set in a human or animal patient may justify immediate administration of treatment, or at least the institution of regular monitoring of the patient. Detection of a polymorphic form correlated with serious disease in a couple contemplating a family may also be valuable to the couple in their reproductive decisions. For example, the female partner might elect to undergo in vitro fertilization to avoid the possibility of transmitting such a polymorphism from her husband to her offspring. In the case of a weaker, but still statistically significant correlation between a polymorphic set and human disease, immediate therapeutic intervention or monitoring may not be justified. Nevertheless, the patient can be motivated to begin simple life-style changes (e.g., diet, exercise) that can be accomplished at little cost to the patient but confer potential benefits in reducing the risk of conditions to which the patient may have increased susceptibility by virtue of variant alleles. After determining polymorphic form(s) present in an individual at one or more polymorphic sites, this information can be used in a number of methods.

Determination of which polymorphic forms occupy a set of polymorphic sites in an individual identifies a set of polymorphic forms that distinguishes the individual. See generally National Research Council, The Evaluation of Forensic DNA Evidence (Eds. Pollard et al., National Academy Press, Washington, DC, 1996). Since the polymorphic sites are within a 50,000 bp region in the human genome, the probability of recombination between these polymorphic sites is low. That low probability means the haplotype (the set of all 10 polymorphic sites) set forth in this application should be inherited without change for at least several generations. The more sites that are analyzed, the lower the probability that the set of polymorphic forms in one individual is the same as that in an unrelated individual. Preferably, if multiple sites are analyzed, the sites are unlinked. Thus, polymorphisms of the invention are often used in conjunction with polymorphisms in distal genes. Preferred polymorphisms for use in forensics are diallelic because the population frequencies of two polymorphic forms can usually be determined with greater accuracy than those of multiple polymorphic forms at multi-allelic loci.

The capacity to identify a distinguishing or unique set of forensic markers in an individual is useful for forensic analysis. For example, one can determine whether a blood sample from a suspect matches a blood or other tissue sample from a crime scene by determining whether the set of polymorphic forms occupying selected polymorphic sites is the same in the suspect and the sample. If the set of polymorphic markers does not match between a suspect and a sample, it can be concluded (barring experimental error) that the suspect was not the source of the sample. If the set of markers does match, one can conclude that the DNA from the suspect is consistent with that found at the crime scene. If frequencies of the polymorphic forms at the loci tested have been determined (e.g., by analysis of a suitable population of individuals), one can perform a statistical analysis to determine the probability that a match of suspect and crime scene sample would occur by chance.

p(ID) is the probability that two random individuals have the same polymorphic or allelic form at a given polymorphic site. In diallelic loci, four genotypes are possible: AA, AB, BA, and BB. If alleles A and B occur in a haploid genome of the organism with frequencies x and y, the probabilities of the genotypes in a diploid organism are (see WO 95/12607):

Homozygote: $p(AA) = x^{2}$

Homozygote: $p(BB) = y^{2} = (1 - x)^{2}$

Single Heterozygote: $p(AB) = p(BA) = xy = x(1 - x)$

Both Heterozygotes: $p(AB + BA) = 2xy = 2x(1 - x)$

The probability of identity at one locus (i.e., the probability that two individuals picked at random from a population will have identical polymorphic forms at a given locus) is given by the equation:

$p(ID) = (x^{2})^{2} + (2xy)^{2} + (y^{2})^{2}$

These calculations can be extended for any number of polymorphic forms at a given locus. For example, the probability of identity p(ID) for a 3-allele system where the alleles have the frequencies in the population of x, y and z, respectively, is equal to the sum of the squares of the genotype frequencies:

$p(ID) = x^{4} + (2xy)^{2} + (2yz)^{2} + (2xz)^{2} + z^{4} + y^{4}$

In a locus of n alleles, the appropriate binomial expansion is used to calculate p(ID) and p(exc).
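The per-locus calculation can be sketched in Python for any number of alleles. This is a minimal sketch, not the procedure of WO 95/12607: the function names are hypothetical, and it assumes, as the formulas above do, that genotype frequencies follow from allele frequencies as squares (homozygotes) and doubled products (heterozygotes).

```python
from itertools import combinations


def genotype_frequencies(allele_freqs):
    """Genotype frequencies for one locus from its allele frequencies.

    Homozygotes contribute f*f; each unordered heterozygote contributes
    2*f_i*f_j (i.e. AB and BA counted together).
    """
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    homozygotes = [f * f for f in allele_freqs]
    heterozygotes = [2 * a * b for a, b in combinations(allele_freqs, 2)]
    return homozygotes + heterozygotes


def probability_of_identity(allele_freqs):
    """p(ID): sum of the squares of the genotype frequencies."""
    return sum(g * g for g in genotype_frequencies(allele_freqs))


# Diallelic locus with x = y = 0.5: (0.25)^2 + (0.5)^2 + (0.25)^2
print(probability_of_identity([0.5, 0.5]))  # 0.375
```

For a three-allele locus the same function reproduces the six-term sum given above, since `combinations` yields the three heterozygote classes xy, xz and yz.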

The cumulative probability of identity (cum p(ID)) for each of multiple unlinked loci is determined by multiplying the probabilities provided by each locus:

$\text{cum } p(ID) = p(ID_{1})\, p(ID_{2})\, p(ID_{3}) \cdots p(ID_{n})$

The cumulative probability of non-identity for n loci (i.e. the probability that two random individuals will be different at 1 or more loci) is given by the equation:

$\text{cum } p(nonID) = 1 - \text{cum } p(ID)$

If several polymorphic loci are tested, the cumulative probability of non-identity for random individuals becomes very high (e.g., one billion to one). Such probabilities can be taken into account together with other evidence in determining the guilt or innocence of the suspect.
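To give a feel for how quickly the cumulative values grow, the following Python sketch multiplies per-locus p(ID) values across unlinked loci and estimates how many equally informative loci would be needed to reach a target cumulative p(ID). The function names and the example value 0.375 (a diallelic locus with both allele frequencies equal to 0.5) are illustrative assumptions, not data from the disclosure.

```python
from math import ceil, log, prod


def cumulative_p_id(per_locus_p_id):
    """cum p(ID): product of the per-locus probabilities of identity."""
    return prod(per_locus_p_id)


def cumulative_p_non_id(per_locus_p_id):
    """cum p(nonID) = 1 - cum p(ID)."""
    return 1.0 - cumulative_p_id(per_locus_p_id)


def loci_needed(p_id, target):
    """Smallest n with p_id**n <= target, assuming n identical unlinked loci."""
    if not (0.0 < p_id < 1.0 and 0.0 < target < 1.0):
        raise ValueError("p_id and target must lie strictly between 0 and 1")
    return ceil(log(target) / log(p_id))


# Ten hypothetical diallelic loci, each with p(ID) = 0.375.
print(cumulative_p_id([0.375] * 10))
print(cumulative_p_non_id([0.375] * 10))
# Loci needed for a chance match of about one in a billion.
print(loci_needed(0.375, 1e-9))  # 22
```

The multiplication is only valid for unlinked loci, which is why the text prefers that the sites analyzed be unlinked.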

The object of paternity testing is usually to determine whether a male is the father of a child. In most cases, the mother of the child is known and thus, the mother's contribution to the child's genotype can be traced. Paternity testing investigates whether the part of the child's genotype not attributable to the mother is consistent with that of the putative father. Paternity testing can be performed by analyzing sets of polymorphisms in the putative father and the child.

If the set of polymorphisms in the child attributable to the father does not match the putative father, it can be concluded, barring experimental error, that the putative father is not the real father. If the set of polymorphisms in the child attributable to the father does match the set of polymorphisms of the putative father, a statistical calculation can be performed to determine the probability of coincidental match.

The probability of parentage exclusion (representing the probability that a random male will have a polymorphic form at a given polymorphic site that makes him incompatible as the father) is given by the equation (see WO 95/12607):

$p(exc) = xy(1 - xy)$

where x and y are the population frequencies of alleles A and B of a diallelic polymorphic site. (At a triallelic site, p(exc)=xy(1−xy)+yz(1−yz)+xz(1−xz)+3xyz(1−xyz), where x, y and z are the respective population frequencies of alleles A, B and C.) The probability of non-exclusion is:

$p(non\text{-}exc) = 1 - p(exc)$

The cumulative probability of non-exclusion (representing the value obtained when n loci are used) is thus:

$\text{cum } p(non\text{-}exc) = p(non\text{-}exc_{1})\, p(non\text{-}exc_{2})\, p(non\text{-}exc_{3}) \cdots p(non\text{-}exc_{n})$

The cumulative probability of exclusion for n loci (representing the probability that a random male will be excluded) is:

$\text{cum } p(exc) = 1 - \text{cum } p(non\text{-}exc)$

If several polymorphic loci are included in the analysis, the cumulative probability of exclusion of a random male is very high. This probability can be taken into account in assessing the liability of a putative father whose polymorphic marker set matches the child's polymorphic marker set attributable to his/her father.
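The exclusion calculation for diallelic sites can be sketched as follows. The function names and the example allele frequencies are hypothetical; the per-locus formula is the diallelic expression xy(1−xy) given above, with y taken as 1−x.

```python
from math import prod


def p_exclusion_diallelic(x):
    """p(exc) = xy(1 - xy) for a diallelic site with allele frequencies x and y = 1 - x."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    xy = x * (1.0 - x)
    return xy * (1.0 - xy)


def cumulative_p_exclusion(allele_a_freqs):
    """cum p(exc) = 1 - product over loci of (1 - p(exc)) for unlinked diallelic loci."""
    non_exclusion = prod(1.0 - p_exclusion_diallelic(x) for x in allele_a_freqs)
    return 1.0 - non_exclusion


# Single locus with x = 0.5: 0.25 * 0.75
print(p_exclusion_diallelic(0.5))  # 0.1875
# Twenty hypothetical unlinked loci, each with x = 0.5.
print(cumulative_p_exclusion([0.5] * 20))
```

Because a single diallelic site excludes a random male with probability of at most 0.1875 under this formula, many loci are needed before the cumulative probability of exclusion approaches 1.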

The polymorphisms of the invention may contribute to the phenotype of an organism in different ways. Some polymorphisms occur within a protein coding sequence and contribute to phenotype by affecting protein structure. The effect may be neutral, beneficial or detrimental, or both beneficial and detrimental, depending on the circumstances. For example, a heterozygous sickle cell mutation confers resistance to malaria, but a homozygous sickle cell mutation is usually lethal. Other polymorphisms occur in noncoding regions but may exert phenotypic effects indirectly via influence on replication, transcription, and translation. A single polymorphism may affect more than one phenotypic trait. Likewise, a single phenotypic trait may be affected by polymorphisms in different genes. Further, some polymorphisms predispose an individual to a distinct mutation that is causally related to a certain phenotype.

Phenotypic traits include diseases that have known but hitherto unmapped genetic components. Phenotypic traits also include symptoms of, or susceptibility to, multifactorial diseases of which a component is or may be genetic, such as autoimmune diseases, inflammation, cancer, diseases of the nervous system, and infection by pathogenic microorganisms. Some examples of autoimmune diseases include rheumatoid arthritis, multiple sclerosis, diabetes (insulin-dependent and non-insulin-dependent), systemic lupus erythematosus and Graves disease. Some examples of cancers include cancers of the bladder, brain, breast, colon, esophagus, kidney, leukemia, liver, lung, oral cavity, ovary, pancreas, prostate, skin, stomach and uterus. Phenotypic traits also include characteristics such as longevity, appearance (e.g., baldness, obesity), strength, speed, endurance, fertility, and susceptibility or receptivity to particular drugs or therapeutic treatments.

Correlation is performed for a population of individuals who have been tested for the presence or absence of a phenotypic trait of interest and for polymorphic marker sets. To perform such analysis, the presence or absence of a set of polymorphisms (i.e., a polymorphic set) is determined for a set of the individuals, some of whom exhibit a particular trait and some of whom lack the trait. The alleles of each polymorphism of the set are then reviewed to determine whether the presence or absence of a particular allele is associated with the trait of interest. Correlation can be performed by standard statistical methods such as a chi-squared test, and statistically significant correlations between polymorphic form(s) and phenotypic characteristics are noted. For example, it might be found that the presence of allele A1 at polymorphism A correlates with heart disease. As a further example, it might be found that the combined presence of allele A1 at polymorphism A and allele B1 at polymorphism B correlates with increased milk production of a farm animal.
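A chi-squared test of association between one allele and one trait can be sketched for a 2×2 table of counts. The sketch below uses the Pearson statistic without continuity correction; the function name and all counts are hypothetical and serve only to show the arithmetic. The threshold 3.841 is the standard critical value for one degree of freedom at the 0.05 significance level.

```python
def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared statistic for a 2x2 contingency table.

    Rows: allele present / allele absent.
    Columns: trait present / trait absent.
        a = allele present, trait present   b = allele present, trait absent
        c = allele absent,  trait present   d = allele absent,  trait absent
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be non-zero")
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


CRITICAL_VALUE_DF1_P05 = 3.841  # chi-squared critical value, 1 d.f., alpha = 0.05

# Hypothetical counts: 50 carriers of allele A1 and 50 non-carriers.
statistic = chi_squared_2x2(30, 20, 10, 40)
print(round(statistic, 3))                 # 16.667
print(statistic > CRITICAL_VALUE_DF1_P05)  # True
```

When many polymorphisms are tested against the same trait, some correction for multiple comparisons would normally be applied before a correlation is treated as significant; that step is outside this sketch.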

Such correlations can be exploited in several ways. In the case of a strong correlation between a set of one or more polymorphic forms and a disease for which treatment is available, detection of the polymorphic form set in a human or animal patient may justify immediate administration of treatment, or at least the institution of regular monitoring of the patient. Detection of a polymorphic form correlated with serious disease in a couple contemplating a family may also be valuable to the couple in their reproductive decisions. For example, the female partner might elect to undergo in vitro fertilization to avoid the possibility of transmitting such a polymorphism from her husband to her offspring. In the case of a weaker, but still statistically significant correlation between a polymorphic set and human disease, immediate therapeutic intervention or monitoring may not be justified. Nevertheless, the patient can be motivated to begin simple life-style changes (e.g., diet, exercise) that can be accomplished at little cost to the patient but confer potential benefits in reducing the risk of conditions to which the patient may have increased susceptibility by virtue of variant alleles. Identification of a polymorphic set in a patient correlated with enhanced receptiveness to one of several treatment regimes for a disease indicates that this treatment regime should be followed.

For animals and plants, correlations between characteristics and phenotype are useful for breeding for desired characteristics. For example, Beitz et al., U.S. Pat. No. 5,292,639 discuss use of bovine mitochondrial polymorphisms in a breeding program to improve milk production in cows. To evaluate the effect of mtDNA D-loop sequence polymorphism on milk production, each cow was assigned a value of 1 if variant or 0 if wild type with respect to a prototypical mitochondrial DNA sequence at each of 17 locations considered.

The previous section concerns identifying correlations between phenotypic traits and polymorphisms that directly or indirectly contribute to those traits. The present section describes identification of a physical linkage between a genetic locus associated with a trait of interest and polymorphic markers that are not associated with the trait, but are in physical proximity with the genetic locus responsible for the trait and co-segregate with it. Such analysis is useful for mapping a genetic locus associated with a phenotypic trait to a chromosomal position, and thereby cloning gene(s) responsible for the trait. See Lander et al., Proc. Natl. Acad. Sci. (USA) 83, 7353-7357 (1986); Lander et al., Proc. Natl. Acad. Sci. (USA) 84, 2363-2367 (1987); Donis-Keller et al., Cell 51, 319-337 (1987); Lander et al., Genetics 121, 185-199 (1989). Genes localized by linkage can be cloned by a process known as directional cloning. See Wainwright, Med. J. Australia 159, 170-174 (1993); Collins, Nature Genetics 1, 3-6 (1992) (each of which is incorporated by reference in its entirety for all purposes).

Linkage studies are typically performed on members of a family. Available members of the family are characterized for the presence or absence of a phenotypic trait and for a set of polymorphic markers. The distribution of polymorphic markers in an informative meiosis is then analyzed to determine which polymorphic markers co-segregate with a phenotypic trait. See, e.g., Kerem et al., Science 245, 1073-1080 (1989); Monaco et al., Nature 316, 842 (1985); Yamoka et al., Neurology 40, 222-226 (1990); Rossiter et al., FASEB Journal 5, 21-27 (1991).

Linkage is analyzed by calculation of LOD (log of the odds) values. A lod value is the relative likelihood of obtaining observed segregation data for a marker and a genetic locus when the two are located at a recombination fraction θ, versus the situation in which the two are not linked, and thus segregating independently (Thompson & Thompson, Genetics in Medicine (5th ed., W. B. Saunders Company, Philadelphia, 1991); Strachan, “Mapping the human genome” in The Human Genome (BIOS Scientific Publishers Ltd, Oxford), Chapter 4). A series of likelihood ratios are calculated at various recombination fractions (θ), ranging from θ=0.0 (coincident loci) to θ=0.50 (unlinked). Thus, the likelihood ratio at a given value of θ is the probability of the data if the loci are linked at θ divided by the probability of the data if the loci are unlinked. The computed likelihood ratio is usually expressed as its log10 (i.e., a lod score). For example, a lod score of 3 indicates 1000:1 odds against an apparent observed linkage being a coincidence. The use of logarithms allows data collected from different families to be combined by simple addition. Computer programs are available for the calculation of lod scores for differing values of θ (e.g., LIPED, MLINK (Lathrop, Proc. Nat. Acad. Sci. (USA) 81, 3443-3446 (1984))). For any particular lod score, a recombination fraction may be determined from mathematical tables. See Smith et al., Mathematical tables for research workers in human genetics (Churchill, London, 1961); Smith, Ann. Hum. Genet. 32, 127-150 (1968). The value of θ at which the lod score is the highest is considered to be the best estimate of the recombination fraction.

Positive lod score values suggest that the two loci are linked, whereas negative values suggest that linkage is less likely (at that value of θ) than the possibility that the two loci are unlinked. By convention, a combined lod score of +3 or greater (equivalent to greater than 1000:1 odds in favor of linkage) is considered definitive evidence that two loci are linked. Similarly, by convention, a negative lod score of −2 or less is taken as definitive evidence against linkage of the two loci being compared. Negative linkage data are useful in excluding a chromosome or a segment thereof from consideration. The search focuses on the remaining non-excluded chromosomal locations.
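For the simplest case, the lod score can be computed directly. The sketch below assumes phase-known, fully informative meioses, in which case the likelihood of observing r recombinants among n meioses at recombination fraction θ is θ^r (1−θ)^(n−r), compared with 0.5^n for unlinked loci. This simplification, the function names, and the example counts are assumptions for illustration; real pedigrees generally require dedicated linkage programs.

```python
from math import inf, log10


def lod_score(recombinants, meioses, theta):
    """log10 of [likelihood at theta] / [likelihood if unlinked (theta = 0.5)].

    Assumes phase-known, fully informative meioses (a simplifying assumption).
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0.0 and 0.5")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie between 0 and meioses")
    if theta == 0.0 and recombinants > 0:
        return -inf  # any recombinant rules out coincident loci
    non_recombinants = meioses - recombinants
    linked = (theta ** recombinants) * ((1.0 - theta) ** non_recombinants)
    unlinked = 0.5 ** meioses
    return log10(linked / unlinked)


def best_theta(recombinants, meioses, step=0.01):
    """Grid search for the theta in [0, 0.5] giving the highest lod score."""
    grid = [i * step for i in range(int(round(0.5 / step)) + 1)]
    return max(grid, key=lambda t: lod_score(recombinants, meioses, t))


# Hypothetical family: 2 recombinants observed in 20 informative meioses.
theta_hat = best_theta(2, 20)
print(theta_hat, lod_score(2, 20, theta_hat))

# Lod scores from independent families at the same theta are combined by addition.
combined = lod_score(2, 20, 0.1) + lod_score(1, 12, 0.1)
print(combined)
```

Under these assumptions, ten meioses with no recombinants give a lod score of about 3.01 at θ=0, just above the conventional threshold of +3.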

The invention further provides transgenic nonhuman animals capable of expressing an exogenous variant gene and/or having one or both alleles of an endogenous variant gene inactivated. Expression of an exogenous variant gene is usually achieved by operably linking the gene to a promoter and optionally an enhancer, and microinjecting the construct into a zygote. See Hogan et al., “Manipulating the Mouse Embryo, A Laboratory Manual,” Cold Spring Harbor Laboratory (1989). Inactivation of endogenous variant genes can be achieved by forming a transgene in which a cloned variant gene is inactivated by insertion of a positive selection marker. See Capecchi, Science 244, 1288-1292 (1989). The transgene is then introduced into an embryonic stem cell, where it undergoes homologous recombination with an endogenous variant gene. Mice and other rodents are preferred animals. Such animals provide useful drug screening systems.

The invention further provides methods for assessing the pharmacogenomic susceptibility of a subject harboring a single nucleotide polymorphism to a particular pharmaceutical compound, or to a class of such compounds. Genetic polymorphisms in drug-metabolizing enzymes, drug transporters, receptors for pharmaceutical agents, and other drug targets have been correlated with individual differences in the efficacy and toxicity of the pharmaceutical agent administered to a subject. Pharmacogenomic characterization of a subject's susceptibility to a drug enhances the ability to tailor a dosing regimen to the particular genetic constitution of the subject, thereby enhancing and optimizing the effectiveness of the therapy.

In cases in which a cSNP leads to a polymorphic protein that is ascribed to be the cause of a pathological condition, a method of treating such a condition includes administering to a subject experiencing the pathology the wild type cognate of the polymorphic protein. Once administered in an effective dosing regimen, the wild type cognate provides complementation or remediation of the defect due to the polymorphic protein. The subject's condition is ameliorated by this protein therapy.

A subject suspected of suffering from a pathology ascribable to a polymorphic protein that arises from a cSNP is to be diagnosed using any of a variety of diagnostic methods capable of identifying the presence of the cSNP in the nucleic acid, or of the cognate polymorphic protein, in a suitable clinical sample taken from the subject. Once the presence of the cSNP has been ascertained, and the pathology is correctable by administering a normal or wild-type gene, the subject is treated with a pharmaceutical composition that includes a nucleic acid that harbors the correcting wild-type gene, or a fragment containing a correcting sequence of the wild-type gene. Non-limiting examples of ways in which such a nucleic acid may be administered include incorporating the wild-type gene in a viral vector, such as an adenovirus or adeno associated virus, and administration of a naked DNA in a pharmaceutical composition that promotes intracellular uptake of the administered nucleic acid. Once the nucleic acid that includes the gene coding for the wild-type allele of the polymorphism is incorporated within a cell of the subject, it will initiate de novo biosynthesis of the wild-type gene product. If the nucleic acid is further incorporated into the genome of the subject, the treatment will have long-term effects, providing de novo synthesis of the wild-type protein for a prolonged duration. The synthesis of the wild-type protein in the cells of the subject will contribute to a therapeutic enhancement of the clinical condition of the subject.

A subject suffering from a pathology ascribed to a SNP may be treated so as to correct the genetic defect. (See Kren et al., Proc. Natl. Acad. Sci. USA 96:10349-10354 (1999)). Such a subject is identified by any method that can detect the polymorphism in a sample drawn from the subject. Such a genetic defect may be permanently corrected by administering to such a subject a nucleic acid fragment incorporating a repair sequence that supplies the wild-type nucleotide at the position of the SNP. This site-specific repair sequence encompasses an RNA/DNA oligonucleotide which operates to promote endogenous repair of a subject's genomic DNA. Upon administration in an appropriate vehicle, such as a complex with polyethylenimine or encapsulated in anionic liposomes, a genetic defect leading to an inborn pathology may be overcome, as the chimeric oligonucleotide induces incorporation of the wild-type sequence into the subject's genome. Upon incorporation, the wild-type gene product is expressed, and the replacement is propagated, thereby engendering a permanent repair.

The invention further provides kits comprising at least one allele-specific oligonucleotide as described above. Often, the kits contain one or more pairs of allele-specific oligonucleotides hybridizing to different forms of a polymorphism. In some kits, the allele-specific oligonucleotides are provided immobilized to a substrate. For example, the same substrate can comprise allele-specific oligonucleotide probes for detecting at least 10, 100, 1000 or all of the polymorphisms shown in the Table. Optional additional components of the kit include, for example, restriction enzymes, reverse-transcriptase or polymerase, the substrate nucleoside triphosphates, means used to label (for example, an avidin-enzyme conjugate and enzyme substrate and chromogen if the label is biotin), and the appropriate buffers for reverse transcription, PCR, or hybridization reactions. Usually, the kit also contains instructions for carrying out the hybridizing methods.

Several aspects of the present invention rely on having available the polymorphic proteins encoded by the nucleic acids comprising a SNP of the invention. There are various methods of isolating these nucleic acid sequences. For example, DNA is isolated from a genomic or cDNA library using labeled oligonucleotide probes having sequences complementary to the sequences disclosed herein.

Such probes can be used directly in hybridization assays. Alternatively probes can be designed for use in amplification techniques such as PCR.

To prepare a cDNA library, mRNA is isolated from tissue such as heart or pancreas, preferably a tissue wherein expression of the gene or gene family is likely to occur. cDNA is prepared from the mRNA and ligated into a recombinant vector. The vector is transfected into a recombinant host for propagation, screening and cloning. Methods for making and screening cDNA libraries are well known. See Gubler, U. and Hoffman, B. J., Gene 25:263-269 (1983) and Sambrook et al.

For a genomic library, for example, the DNA is extracted from tissue and either mechanically sheared or enzymatically digested to yield fragments of about 12-20 kb. The fragments are then separated by gradient centrifugation from undesired sizes and are constructed in bacteriophage lambda vectors. These vectors and phage are packaged in vitro, as described in Sambrook et al. Recombinant phage are analyzed by plaque hybridization as described in Benton and Davis, Science 196:180-182 (1977). Colony hybridization is carried out as generally described in M. Grunstein et al., Proc. Natl. Acad. Sci. USA 72:3961-3965 (1975). DNA of interest is identified in either cDNA or genomic libraries by its ability to hybridize with nucleic acid probes, for example on Southern blots, and these DNA regions are isolated by standard methods familiar to those of skill in the art. See Sambrook et al.

In PCR techniques, oligonucleotide primers complementary to the two 3′ borders of the DNA region to be amplified are synthesized. The polymerase chain reaction is then carried out using the two primers. See PCR Protocols: a Guide to Methods and Applications (Innis, M., Gelfand, D., Sninsky, J. and White, T., eds.), Academic Press, San Diego (1990). Primers can be selected to amplify the entire region encoding a full-length sequence of interest or to amplify smaller DNA segments as desired. PCR can be used in a variety of protocols to isolate cDNAs encoding a sequence of interest. In these protocols, appropriate primers and probes for amplifying DNA encoding a sequence of interest are generated from analysis of the DNA sequences listed herein. Once such regions are PCR-amplified, they can be sequenced and oligonucleotide probes can be prepared from the sequence.
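The selection of a primer pair at the two borders of a region can be illustrated with a short Python sketch. The function names, the primer length, and the template sequence are hypothetical, and the sketch ignores practical criteria such as melting temperature, GC content and primer-dimer formation. The forward primer is taken from the start of the region on the given strand, and the reverse primer is the reverse complement of the end of the region, so that each primer anneals at the 3′ border of one of the two strands.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def flanking_primers(template, start, end, primer_len=20):
    """Return (forward, reverse) primers bounding template[start:end].

    template   : one strand of the region, written 5'->3' (hypothetical input)
    start, end : 0-based, half-open coordinates of the region to amplify
    """
    if not 0 <= start < end <= len(template):
        raise ValueError("region lies outside the template")
    if end - start < 2 * primer_len:
        raise ValueError("region too short for two non-overlapping primers")
    forward = template[start:start + primer_len].upper()
    reverse = reverse_complement(template[end - primer_len:end])
    return forward, reverse


# Hypothetical 39-base template, amplified end to end with 10-base primers.
template = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
fwd, rev = flanking_primers(template, 0, len(template), primer_len=10)
print(fwd)
print(rev)
```

Both primers are reported 5′ to 3′, which is the orientation in which oligonucleotides are conventionally ordered and synthesized.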

Once DNA encoding a sequence comprising a cSNP is isolated and cloned, one can express the encoded polymorphic proteins in a variety of recombinantly engineered cells. It is expected that those of skill in the art are knowledgeable in the numerous expression systems available for expression of DNA encoding a sequence of interest. No attempt to describe in detail the various methods known for the expression of proteins in prokaryotes or eukaryotes is made here.

In brief summary, the expression of natural or synthetic nucleic acids encoding a sequence of interest will typically be achieved by operably linking the DNA or cDNA to a promoter (which is either constitutive or inducible), followed by incorporation into an expression vector. The vectors can be suitable for replication and integration in either prokaryotes or eukaryotes. Typical expression vectors contain initiation sequences, transcription and translation terminators, and promoters useful for regulation of the expression of a polynucleotide sequence of interest. To obtain high level expression of a cloned gene, it is desirable to construct expression plasmids which contain, at the minimum, a strong promoter to direct transcription, a ribosome binding site for translational initiation, and a transcription/translation terminator. The expression vectors may also comprise generic expression cassettes containing at least one independent terminator sequence, sequences permitting replication of the plasmid in both eukaryotes and prokaryotes (i.e., shuttle vectors), and selection markers for both prokaryotic and eukaryotic systems. See Sambrook et al.

A variety of prokaryotic expression systems may be used to express the polymorphic proteins of the invention. Examples include E. coli, Bacillus, Streptomyces, and the like.

It is preferred to construct expression plasmids which contain, at the minimum, a strong promoter to direct transcription, a ribosome binding site for translational initiation, and a transcription/translation terminator. Examples of regulatory regions suitable for this purpose in E. coli are the promoter and operator region of the E. coli tryptophan biosynthetic pathway as described by Yanofsky, C., J. Bacteriol. 158:1018-1024 (1984) and the leftward promoter of phage lambda (PL) as described by Herskowitz, I. and Hagen, D., Ann. Rev. Genet. 14:399-445 (1980). The inclusion of selection markers in DNA vectors transformed in E. coli is also useful. Examples of such markers include genes specifying resistance to ampicillin, tetracycline, or chloramphenicol. See Sambrook et al. for details concerning selection markers for use in E. coli.

To enhance proper folding of the expressed recombinant protein during purification from E. coli, the expressed protein may first be denatured and then renatured. This can be accomplished by solubilizing the bacterially produced proteins in a chaotropic agent such as guanidine HCl and reducing all the cysteine residues with a reducing agent such as beta-mercaptoethanol. The protein is then renatured, either by slow dialysis or by gel filtration. See U.S. Pat. No. 4,511,503. Detection of the expressed antigen is achieved by methods known in the art, such as radioimmunoassay, Western blotting techniques or immunoprecipitation. Purification from E. coli can be achieved following procedures such as those described in U.S. Pat. No. 4,511,503.

Any of a variety of eukaryotic expression systems, such as yeast, insect cell lines, and bird, fish, and mammalian cells, may also be used to express a polymorphic protein of the invention. As explained briefly below, a nucleotide sequence harboring a cSNP may be expressed in these eukaryotic systems. Synthesis of heterologous proteins in yeast is well known. Methods in Yeast Genetics, Sherman, F., et al., Cold Spring Harbor Laboratory (1982), is a well recognized work describing the various methods available to produce the protein in yeast. Suitable vectors usually have expression control sequences, such as promoters, including 3-phosphoglycerate kinase or other glycolytic enzymes, and an origin of replication, termination sequences and the like as desired. For instance, suitable vectors are described in the literature (Botstein, et al., Gene 8:17-24 (1979); Broach, et al., Gene 8:121-133 (1979)).

Two procedures are used in transforming yeast cells. In one case, yeast cells are first converted into protoplasts using zymolyase, lyticase or glusulase, followed by addition of DNA and polyethylene glycol (PEG). The PEG-treated protoplasts are then regenerated in a 3% agar medium under selective conditions. Details of this procedure are given in the papers by J. D. Beggs, Nature (London) 275:104-109 (1978); and Hinnen, A., et al., Proc. Natl. Acad. Sci. USA 75:1929-1933 (1978). The second procedure does not involve removal of the cell wall. Instead, the cells are treated with lithium chloride or acetate and PEG and put on selective plates (Ito, H., et al., J. Bact. 153:163-168 (1983)). The expressed proteins can then be isolated from yeast by lysing the cells and applying standard protein isolation techniques to the lysates.

The purification process can be monitored by using Western blot techniques or radioimmunoassay or other standard techniques. The sequences encoding the proteins of the invention can also be ligated to various immunoassay expression vectors for use in transforming cell cultures of, for instance, mammalian, insect, bird or fish origin. Illustrative of cell cultures useful for the production of the polypeptides are mammalian cells. Mammalian cell systems often will be in the form of monolayers of cells although mammalian cell suspensions may also be used. A number of suitable host cell lines capable of expressing intact proteins have been developed in the art, and include the HEK293, BHK21, and CHO cell lines, and various human cells such as COS cell lines, HeLa cells, myeloma cell lines, Jurkat cells, etc. Expression vectors for these cells can include expression control sequences, such as an origin of replication, a promoter (e.g., the CMV promoter, a HSV tk promoter or pgk (phosphoglycerate kinase) promoter), an enhancer (Queen et al., Immunol. Rev. 89:49 (1986)) and necessary processing information sites, such as ribosome binding sites, RNA splice sites, polyadenylation sites (e.g., an SV40 large T Ag poly A addition site), and transcriptional terminator sequences.

Other animal cells are available, for instance, from the American Type Culture Collection Catalogue of Cell Lines and Hybridomas (7th edition, (1992)). Appropriate vectors for expressing the proteins of the invention in insect cells are usually derived from baculovirus. Insect cell lines include mosquito larvae, silkworm, armyworm, moth and Drosophila cell lines such as a Schneider cell line (see Schneider, J. Embryol. Exp. Morphol. 27:353-365 (1987)). As indicated above, the vector, e.g., a plasmid, which is used to transform the host cell, preferably contains DNA sequences to initiate transcription and sequences to control the translation of the protein. These sequences are referred to as expression control sequences. As with yeast, when higher animal host cells are employed, polyadenylation or transcription terminator sequences from known mammalian genes need to be incorporated into the vector. An example of a terminator sequence is the polyadenylation sequence from the bovine growth hormone gene. Sequences for accurate splicing of the transcript may also be included. An example of a splicing sequence is the VP1 intron from SV40 (Sprague, J. et al., J. Virol. 45:773-781 (1983)). Additionally, gene sequences to control replication in the host cell may be incorporated into the vector (see Saveria-Campo, M., 1985, “Bovine Papilloma virus DNA a Eukaryotic Cloning Vector” in DNA Cloning Vol. II a Practical Approach, Ed. D. M. Glover, IRL Press, Arlington, Va., pp. 213-238). The host cells are competent or rendered competent for transformation by various means. There are several well-known methods of introducing DNA into animal cells. These include: calcium phosphate precipitation, fusion of the recipient cells with bacterial protoplasts containing the DNA, treatment of the recipient cells with liposomes containing the DNA, DEAE dextran, electroporation and micro-injection of the DNA directly into the cells.

The transformed cells are cultured by means well known in the art (Biochemical Methods in Cell Culture and Virology, Kuchler, R. J., Dowden, Hutchinson and Ross, Inc., (1977)). The expressed polypeptides are isolated from cells grown as suspensions or as monolayers. The latter are recovered by well known mechanical, chemical or enzymatic means.

General methods of expressing recombinant proteins are also known and are exemplified in R. Kaufman, Methods in Enzymology 185, 537-566 (1990). As defined herein, “operably linked” refers to linkage of a promoter upstream from a DNA sequence such that the promoter mediates transcription of the DNA sequence. Specifically, “operably linked” means that the isolated polynucleotide of the invention and an expression control sequence are situated within a vector or cell in such a way that the gene encoding the protein is expressed by a host cell which has been transformed (transfected) with the ligated polynucleotide/expression sequence. The term “vector” refers to viral expression systems and autonomous self-replicating circular DNA (plasmids), and includes both expression and nonexpression plasmids.

The term “gene” as used herein is intended to refer to a nucleic acid sequence which encodes a polypeptide. This definition includes various sequence polymorphisms, mutations, and/or sequence variants wherein such alterations do not affect the function of the gene product. The term “gene” is intended to include not only coding sequences but also regulatory regions such as promoters, enhancers, termination regions and similar untranslated nucleotide sequences. The term further includes all introns and other DNA sequences spliced from the mRNA transcript, along with variants resulting from alternative splice sites.

A number of types of cells may act as suitable host cells for expression of the protein. Mammalian host cells include, for example, monkey COS cells, Chinese Hamster Ovary (CHO) cells, human kidney 293 cells, human epidermal A431 cells, human Colo205 cells, 3T3 cells, CV-1 cells, other transformed primate cell lines, normal diploid cells, cell strains derived from in vitro culture of primary tissue, primary explants, HeLa cells, mouse L cells, BHK, HL-60, U937, HaK or Jurkat cells. Alternatively, it may be possible to produce the protein in lower eukaryotes such as yeast or in prokaryotes such as bacteria. Potentially suitable yeast strains include Saccharomyces cerevisiae, Schizosaccharomyces pombe, Kluyveromyces strains, Candida or any yeast strain capable of expressing heterologous proteins. Potentially suitable bacterial strains include Escherichia coli, Bacillus subtilis, Salmonella typhimurium, or any bacterial strain capable of expressing heterologous proteins. If the protein is made in yeast or bacteria, it may be necessary to modify the protein produced therein, for example by phosphorylation or glycosylation of the appropriate sites, in order to obtain the functional protein.

The protein may also be produced by operably linking the isolated polynucleotide of the invention to suitable control sequences in one or more insect expression vectors, and employing an insect expression system. Materials and methods for baculovirus/insect cell expression systems are commercially available in kit form from, e.g., Invitrogen, San Diego, Calif., U.S.A. (the MaxBac© kit), and such methods are well known in the art, as described in Summers and Smith, Texas Agricultural Experiment Station Bulletin No. 1555 (1987), incorporated herein by reference. As used herein, an insect cell capable of expressing a polynucleotide of the present invention is “transformed.” The protein of the invention may be prepared by culturing transformed host cells under culture conditions suitable to express the recombinant protein.

The polymorphic protein of the invention may also be expressed as a product of transgenic animals, e.g., as a component of the milk of transgenic cows, goats, pigs, or sheep which are characterized by somatic or germ cells containing a nucleotide sequence encoding the protein. The protein may also be produced by known conventional chemical synthesis. Methods for constructing the proteins of the present invention by synthetic means are known to those skilled in the art.

The polymorphic proteins produced by recombinant DNA technology may be purified by techniques commonly employed to isolate or purify recombinant proteins. Recombinantly produced proteins can be directly expressed or expressed as a fusion protein. The protein is then purified by a combination of cell lysis (e.g., sonication) and affinity chromatography. For fusion products, subsequent digestion of the fusion protein with an appropriate proteolytic enzyme releases the desired polypeptide. The polypeptides of this invention may be purified to substantial purity by standard techniques well known in the art, including selective precipitation with such substances as ammonium sulfate, column chromatography, immunopurification methods, and others. See, for instance, R. Scopes, Protein Purification: Principles and Practice, Springer-Verlag: New York (1982), incorporated herein by reference. For example, in an embodiment, antibodies may be raised to the proteins of the invention as described herein. Cell membranes are isolated from a cell line expressing the recombinant protein, the protein is extracted from the membranes and immunoprecipitated. The proteins may then be further purified by standard protein chemistry techniques as described above.

The resulting expressed protein may then be purified from such culture (i.e., from culture medium or cell extracts) using known purification processes, such as gel filtration and ion exchange chromatography. The purification of the protein may also include an affinity column containing agents which will bind to the protein; one or more column steps over such affinity resins as concanavalin A-agarose, heparin-Toyopearl® or Cibacron blue 3GA Sepharose®; one or more steps involving hydrophobic interaction chromatography using such resins as phenyl ether, butyl ether, or propyl ether; or immunoaffinity chromatography. Alternatively, the protein of the invention may also be expressed in a form which will facilitate purification. For example, it may be expressed as a fusion protein, such as those of maltose binding protein (MBP), glutathione-S-transferase (GST) or thioredoxin (TRX). Kits for expression and purification of such fusion proteins are commercially available from New England BioLabs (Beverly, Mass.), Pharmacia (Piscataway, N.J.) and Invitrogen, respectively. The protein can also be tagged with an epitope and subsequently purified by using a specific antibody directed to such epitope. One such epitope (“Flag”) is commercially available from Kodak (New Haven, Conn.). Finally, one or more reverse-phase high performance liquid chromatography (RP-HPLC) steps employing hydrophobic RP-HPLC media, e.g., silica gel having pendant methyl or other aliphatic groups, can be employed to further purify the protein. Some or all of the foregoing purification steps, in various combinations, can also be employed to provide a substantially homogeneous isolated recombinant protein. The protein thus purified is substantially free of other mammalian proteins and is defined in accordance with the present invention as an “isolated protein.”

The term “antibody” as used herein refers to immunoglobulin molecules and immunologically active portions of immunoglobulin molecules, i.e., molecules that contain an antigen binding site that specifically binds (immunoreacts with) an antigen, such as a polymorphic protein. Such antibodies include, but are not limited to, polyclonal, monoclonal, chimeric, single chain, Fab and F(ab′)2 fragments, and an Fab expression library. In a specific embodiment, antibodies to human polymorphic proteins are disclosed.

The phrases “specifically binds to”, “immunospecifically binds to” and “specifically immunoreactive with” an antibody, when referring to a protein or peptide, refer to a binding reaction which is determinative of the presence of the protein in the presence of a heterogeneous population of proteins and other biological materials. Thus, for example, under designated immunoassay conditions, the specified antibodies bind to a particular protein and do not bind in a significant amount to other proteins present in the sample. Specific binding to an antibody under such conditions may require an antibody that is selected for its specificity for a particular protein. Of particular interest in the present invention is an antibody that binds immunospecifically to a polymorphic protein but not to its cognate wild type allelic protein, or vice versa. A variety of immunoassay formats may be used to select antibodies specifically immunoreactive with a particular protein. For example, solid-phase ELISA immunoassays are routinely used to select monoclonal antibodies specifically immunoreactive with a protein. See Harlow and Lane (1988) Antibodies, a Laboratory Manual, Cold Spring Harbor Publications, New York, for a description of immunoassay formats and conditions that can be used to determine specific immunoreactivity.

Polyclonal and/or monoclonal antibodies that immunospecifically bind to polymorphic gene products but not to the corresponding prototypical or “wild-type” gene products are also provided. Antibodies can be made by injecting mice or other animals with the variant gene product or synthetic peptide. Monoclonal antibodies are screened as described, for example, in Harlow & Lane, Antibodies, A Laboratory Manual, Cold Spring Harbor Press, New York (1988); Goding, Monoclonal antibodies, Principles and Practice (2d ed.) Academic Press, New York (1986). Monoclonal antibodies are tested for specific immunoreactivity with a variant gene product and lack of immunoreactivity to the corresponding prototypical gene product.
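The two-sided test described above (reactivity with the variant gene product, no reactivity with the prototypical one) can be illustrated with a minimal Python sketch. The class name, the function name, the signal values and both thresholds are hypothetical and chosen only for illustration; in practice the cut-offs would depend on the immunoassay format and its controls.

```python
from dataclasses import dataclass


@dataclass
class CloneReading:
    """One hybridoma clone with two assay signals (hypothetical units)."""
    clone_id: str
    variant_signal: float    # signal against the variant gene product
    wildtype_signal: float   # signal against the prototypical gene product


def select_variant_specific(readings, min_variant=1.0, max_wildtype=0.2):
    """Keep clones that react with the variant but not the prototypical product.

    Both thresholds are illustrative assumptions, not values from the text.
    """
    return [
        r.clone_id
        for r in readings
        if r.variant_signal >= min_variant and r.wildtype_signal <= max_wildtype
    ]


readings = [
    CloneReading("clone-A", 1.8, 0.1),  # variant-specific
    CloneReading("clone-B", 1.6, 1.5),  # binds both forms
    CloneReading("clone-C", 0.1, 0.1),  # binds neither form
]
print(select_variant_specific(readings))  # ['clone-A']
```

Swapping the two conditions would select the converse case, an antibody that recognizes the prototypical product but not the variant.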

An isolated polymorphic protein, or a portion or fragment thereof, can be used as an immunogen to generate antibodies that bind the polymorphic protein using standard techniques for polyclonal and monoclonal antibody preparation. The full-length polymorphic protein can be used or, alternatively, the invention provides antigenic peptide fragments of polymorphic proteins for use as immunogens. The antigenic peptide of a polymorphic protein of the invention comprises at least 8 amino acid residues of the amino acid sequence encompassing the polymorphic amino acid and encompasses an epitope of the polymorphic protein such that an antibody raised against the peptide forms a specific immune complex with the polymorphic protein. Preferably, the antigenic peptide comprises at least 10 amino acid residues, more preferably at least 15 amino acid residues, even more preferably at least 20 amino acid residues, and most preferably at least 30 amino acid residues. Preferred epitopes encompassed by the antigenic peptide are regions of the polymorphic protein that are located on the surface of the protein, e.g., hydrophilic regions.
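As an illustration of the peptide criteria above, the following Python sketch enumerates every window of a chosen length that encompasses the polymorphic residue and orders the windows from most to least hydrophilic. The protein sequence and the function names are hypothetical. The Kyte-Doolittle hydropathy scale is used here as one common proxy for hydrophilicity; the text does not prescribe any particular scale.

```python
# Kyte-Doolittle hydropathy values (lower = more hydrophilic).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def candidate_peptides(protein, site, length=8):
    """Return all windows of `length` residues containing 0-based index `site`."""
    if not 0 <= site < len(protein):
        raise ValueError("site lies outside the protein sequence")
    first = max(0, site - length + 1)          # leftmost start that still covers the site
    last = min(site, len(protein) - length)    # rightmost start that fits in the protein
    return [protein[s:s + length] for s in range(first, last + 1)]


def mean_hydropathy(peptide):
    """Average Kyte-Doolittle value over the peptide."""
    return sum(KD[aa] for aa in peptide) / len(peptide)


def rank_by_hydrophilicity(peptides):
    """Sort peptides so the most hydrophilic (lowest mean hydropathy) comes first."""
    return sorted(peptides, key=mean_hydropathy)


# Hypothetical 15-residue protein; the polymorphic residue is the G at index 7.
protein = "MKTLLVAGDEKRSPQ"
peptides = candidate_peptides(protein, site=7, length=8)
ranked = rank_by_hydrophilicity(peptides)
print(len(peptides))  # 8
print(ranked[0])      # GDEKRSPQ
```

Raising `length` to 10, 15, 20 or 30 yields the longer peptides named as preferred above, provided the protein is at least that long.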

For the production of polyclonal antibodies, various suitable host animals (e.g., rabbit, goat, mouse or other mammal) may be immunized by injection with the polymorphic protein. An appropriate immunogenic preparation can contain, for example, recombinantly expressed polymorphic protein or a chemically synthesized polymorphic polypeptide. The preparation can further include an adjuvant. Various adjuvants used to increase the immunological response include, but are not limited to, Freund's (complete and incomplete), mineral gels (e.g., aluminum hydroxide), surface active substances (e.g., lysolecithin, pluronic polyols, polyanions, peptides, oil emulsions, dinitrophenol, etc.), human adjuvants such as Bacille Calmette-Guerin and Corynebacterium parvum, or similar immunostimulatory agents. If desired, the antibody molecules directed against polymorphic proteins can be isolated from the mammal (e.g., from the blood) and further purified by well known techniques, such as protein A chromatography, to obtain the IgG fraction.

The term “monoclonal antibody” or “monoclonal antibody composition”, as used herein, refers to a population of antibody molecules that originates from the clone of a single hybridoma cell, and that contains only one type of antigen binding site capable of immunoreacting with a particular epitope of a polymorphic protein. A monoclonal antibody composition thus typically displays a single binding affinity for a particular polymorphic protein with which it immunoreacts. For preparation of monoclonal antibodies directed towards a particular polymorphic protein, or derivatives, fragments, analogs or homologs thereof, any technique that provides for the production of antibody molecules by continuous cell line culture may be utilized. Such techniques include, but are not limited to, the hybridoma technique (see Kohler & Milstein, 1975 Nature 256: 495-497); the trioma technique; the human B-cell hybridoma technique (see Kozbor, et al., 1983 Immunol Today 4: 72) and the EBV hybridoma technique to produce human monoclonal antibodies (see Cole, et al., 1985 In: Monoclonal Antibodies and Cancer Therapy, Alan R. Liss, Inc., pp. 77-96). Human monoclonal antibodies may be utilized in the practice of the present invention and may be produced by using human hybridomas (see Cote, et al., 1983 Proc Natl Acad Sci USA 80: 2026-2030) or by transforming human B-cells with Epstein Barr Virus in vitro (see Cole, et al., 1985 In: Monoclonal Antibodies and Cancer Therapy, Alan R. Liss, Inc., pp. 77-96).

According to the invention, techniques can be adapted for the production of single-chain antibodies specific to a polymorphic protein (see e.g., U.S. Pat. No. 4,946,778). In addition, methodologies can be adapted for the construction of Fab expression libraries (see e.g., Huse, et al., 1989 Science 246: 1275-1281) to allow rapid and effective identification of monoclonal Fab fragments with the desired specificity for a polymorphic protein or derivatives, fragments, analogs or homologs thereof. Non-human antibodies can be “humanized” by techniques well known in the art. See e.g., U.S. Pat. No. 5,225,539. Antibody fragments that contain the idiotypes to a polymorphic protein may be produced by techniques known in the art including, but not limited to: (i) an F(ab′)2 fragment produced by pepsin digestion of an antibody molecule; (ii) an Fab fragment generated by reducing the disulfide bridges of an F(ab′)2 fragment; (iii) an Fab fragment generated by the treatment of the antibody molecule with papain and a reducing agent; and (iv) Fv fragments.

Additionally, recombinant anti-polymorphic protein antibodies, such as chimeric and humanized monoclonal antibodies, comprising both human and non-human portions, which can be made using standard recombinant DNA techniques, are within the scope of the invention. Such chimeric and humanized monoclonal antibodies can be produced by recombinant DNA techniques known in the art, for example using methods described in PCT International Application No. PCT/US86/02269; European Patent Application No. 184,187; European Patent Application No. 171,496; European Patent Application No. 173,494; PCT International Publication No. WO 86/01533; U.S. Pat. No. 4,816,567; European Patent Application No. 125,023; Better et al. (1988) Science 240:1041-1043; Liu et al. (1987) PNAS 84:3439-3443; Liu et al. (1987) J Immunol. 139:3521-3526; Sun et al. (1987) PNAS 84:214-218; Nishimura et al. (1987) Cancer Res 47:999-1005; Wood et al. (1985) Nature 314:446-449; Shaw et al. (1988) J Natl Cancer Inst 80:1553-1559; Morrison (1985) Science 229:1202-1207; Oi et al. (1986) BioTechniques 4:214; U.S. Pat. No. 5,225,539; Jones et al. (1986) Nature 321:522-525; Verhoeyen et al. (1988) Science 239:1534; and Beidler et al. (1988) J Immunol 141:4053-4060.

In one embodiment, methodologies for the screening of antibodies that possess the desired specificity include, but are not limited to, enzyme-linked immunosorbent assay (ELISA) and other immunologically-mediated techniques known within the art.

Anti-polymorphic protein antibodies may be used in methods known within the art relating to the detection, quantitation and/or cellular or tissue localization of a polymorphic protein (e.g., for use in measuring levels of the polymorphic protein within appropriate physiological samples, for use in diagnostic methods, for use in imaging the protein, and the like). In a given embodiment, antibodies for polymorphic proteins, or derivatives, fragments, analogs or homologs thereof, that contain the antibody-derived CDR, are utilized as pharmacologically-active compounds in therapeutic applications intended to treat a pathology in a subject that arises from the presence of the cSNP allele in the subject.

An anti-polymorphic protein antibody (e.g., monoclonal antibody) can be used to isolate polymorphic proteins by a variety of immunochemical techniques, such as immunoaffinity chromatography or immunoprecipitation. An anti-polymorphic protein antibody can facilitate the purification of natural polymorphic protein from cells and of recombinantly produced polymorphic proteins expressed in host cells. Moreover, an anti-polymorphic protein antibody can be used to detect polymorphic protein (e.g., in a cellular lysate or cell supernatant) in order to evaluate the abundance and pattern of expression of the polymorphic protein. Anti-polymorphic protein antibodies can be used diagnostically to monitor protein levels in tissue as part of a clinical testing procedure, e.g., to determine the efficacy of a given treatment regimen. Detection can be facilitated by coupling (i.e., physically linking) the antibody to a detectable substance. Examples of detectable substances include various enzymes, prosthetic groups, fluorescent materials, luminescent materials, bioluminescent materials, and radioactive materials. Examples of suitable enzymes include horseradish peroxidase, alkaline phosphatase, β-galactosidase, or acetylcholinesterase; examples of suitable prosthetic group complexes include streptavidin/biotin and avidin/biotin; examples of suitable fluorescent materials include umbelliferone, fluorescein, fluorescein isothiocyanate, rhodamine, dichlorotriazinylamine fluorescein, dansyl chloride or phycoerythrin; an example of a luminescent material includes luminol; examples of bioluminescent materials include luciferase, luciferin, and aequorin; and examples of suitable radioactive material include ¹²⁵I, ¹³¹I, ³⁵S or ³H.
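Table 1 writes each polymorphic site in bracket notation, for example CAGCT[G/A]CC, where the base before the slash is the reference base, the base after it is the polymorphic base, and “gap” marks an inserted or deleted nucleotide. The following Python sketch shows one way to expand such an entry into its two allele sequences and to classify the change as a transition, a transversion, or an insertion/deletion. The function names and the regular expression are illustrative assumptions, not part of the disclosure.

```python
import re

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

# Matches a bracketed site such as [G/A], [G/gap] or [gap/C].
SITE = re.compile(r"\[([ACGT]|gap)/([ACGT]|gap)\]")


def expand_alleles(entry):
    """Return (before_sequence, after_sequence) for a bracket-notation entry."""
    match = SITE.search(entry)
    if match is None:
        raise ValueError("no bracketed polymorphic site found")
    left, right = entry[:match.start()], entry[match.end():]

    def fill(base):
        # A gap contributes no nucleotide to the allele sequence.
        return left + ("" if base == "gap" else base) + right

    return fill(match.group(1)), fill(match.group(2))


def classify_change(before, after):
    """Classify a single-site change by nucleotide class."""
    if "gap" in (before, after):
        return "insertion/deletion"
    if before == after:
        return "no change"
    pair = {before, after}
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"


print(expand_alleles("CAGCT[G/A]CC"))   # ('CAGCTGCC', 'CAGCTACC')
print(classify_change("G", "A"))        # transition
print(classify_change("A", "T"))        # transversion
print(classify_change("G", "gap"))      # insertion/deletion
```

A transition here is a purine-for-purine or pyrimidine-for-pyrimidine replacement, and a transversion is a purine-for-pyrimidine replacement or the converse, matching the definitions given in the Background.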

TABLE 1

Protein

Similarity (pValue)

CuraGen

Base pos.

Polymorphic

Base

base

Amino acid

Amino acid

classification of

Name of protein identified following a

following a

Map

Seq ID

sequence ID

of SNP

sequence

before

after

before

after

Type of change

BLASTX analysis of the CuraGen sequence

BLASTX analysis

location

1-2

cg34158766

254

TCGGGCTCGT

G

A

SILENT-

angiopoietin

Human Gene Similar to SPTREMBL-ID: O08538

6.00E−34

4

CCAGCCAAAG

NONCODING

ANGIOPOIETIN-1 -

MUS MUSCULUS

(MOUSE),

CAGCT[G/A]CC

498 aa.

ACTCAAAAGAA

AGTAGAAAGAA

A

3-4

cg43299481

218

GTCCCGGTCA

G

gap

SILENT-

dehydrogenase

Human Gene Similar to SPTREMBL-ID: Q44020

9.30E−33

GGGTGGGCTG

NONCODING

4-HYDROXYBUTYRATE DEHYDROGENASE

CGGGA[G/gap]C

(GBD), ORF 2 AND 4-10 GENES, COMPLETE

CTCAGGCCAC

CDS, AND ORF3 AND 11, 3′ END -

GGGGAAGTAG

ALCALIGENES EUTROPHUS

, 173 aa.

TGGG

5-6

cg43933536

1628

ACACCGGGGG

C

T

SILENT-

eph

Human Gene Similar to TREMBLNEW-ID:

7.70E−37

3

GGCGCGGGGG

NONCODING

G2735762 HEAT SHOCK PROTEIN DNAJ -

TCTCC[C/T]TGG

LEPTOSPIRA INTERROGANS, 369 aa.

TCCGCAGAGA

CAGCTAGCTAG

C

7-8

cg44021557

391

TGATATCCAGG

T

C

Leu

Leu

SILENT-

helicase

Human Gene Similar to SWISSPROT-ID: Q11039

1.50E−42

2

CTGAATCTATC

CODING

ATP-DEPENDENT RNA HELICASE DEAD

CCA[T/C]TGATC

HOMOLOG -

MYCOBACTERIUM

TTAGGAGGAG

TUBERCULOSIS

, 563 aa.

GTGATGTACT

9-10

cg44021557

516

GAAAGACCAAC

gap

A

SILENT-

helicase

Human Gene Similar to SWISSPROT-ID: Q11039

1.50E−42

2

AGGAAGGCAA

NONCODING

ATP-DEPENDENT RNA HELICASE DEAD

AAAA[gap/A]GG

HOMOLOG -

MYCOBACTERIUM

AAAAACAACAA

TUBERCULOSIS

, 563 aa.

TTAAAACTGGT

G

11-12

cg43305492

323

TCACCATCACC

A

G

Lys

Lys

SILENT-

immunoglob

Human Gene Similar to TREMBLNEW-ID:

740E−49

AAGGACACCTC

CODING

G2734101 IMMUNOGLOBULIN HEAVY CHAIN,

CAA[A/G]AACC

VD(5)J(4) LIKE GENE PRODUCT -

HOMO

AGGTGGTCCTT

SAPIENS

(HUMAN), 151 aa.

ACAATGACCA

13-14

cg43941918

896

GATTAGAACCC

C

T

Gln

Gln

SILENT-

immunoglob

Human Gene Similar to TREMBLNEW-ID:

2.80E−42

AAATAGATCAG

CODING

G240581 IMMUNOGLOBULIN G2B VARIABLE

GAG[C/T]TGTG

REGION LIGHT CHAIN, AUTOANTOBODY

GAGACTGCCCT

BV04-01 VARIABLE REGION LIGHT CHAIN -

GGCTTCTGCA

MUSSP, 113 aa.

15-16

cg42312510

235

CTACTTTGGAA

A

G

Ser

Ser

SILENT-

immunoglob

Human Gene Similar to TREMBLNEW-ID:

1.20E−39

AGTGGGGTCC

CODING

G300839 IMMUNOGLOBULIN LIGHT CHAIN

CATC[A/G]AGAT

VARIABLE REGION {CLONE ALPHA FOG1-

TCAGCGGCAG

A4} -

HOMO SAPIENS

, 107 aa.

TGGATCTGGGA

17-18

cg43147217

2161

TAACCTATTTAT

A

T

SILENT-

interleukin

Human Gene Similar to SWISSPROT-ID: P10145

8.30E−48

4

TATTTATGTATT

NONCODING

INTERLEUKIN-8 PRECURSOR (IL-8)

(4q12)

T[A/T]TTTAAGC

(MONOCYTE-DERIVED NEUTROPHIL

ATCAAATATTT

CHEMOTACTIC FACTOR) (MDNCF) (T-CELL

GTGCAAG

CHEMOTACTIC FACTOR) (NEUTROPHIL-

ACTIVATING PROTEIN 1) (NAP-1)

(LYMPHOCYTE-DERIVED NEUTROPHIL-

ACTIVATING FACTOR) (LYNAP) (PROTEIN

3-10C) (NEUTROPHIL-ACTIVATING FACTOR)

(NAF) (GRANULOCYTE CHEMOTACTIC

PROTEIN 1) (GCP-1) (EMOCTAKIN) -

HOMO

SAPIENS

19-20

cg20725546

402

AGGCCGCTGA

T

C

Gly

Gly

SILENT-

kinase

Human Gene Similar to SWISSPROT-ID: O06821

9.40E−42

CACTCTCGTTA

CODING

PHOSPHOGLYCERATE KINASE (EC 2.7.2.3) -

TTGG[T/C]GGC

MYCOBACTERIUM TUBERCULOSIS

, 412 aa.

GGTATGGCGTA

CACCTTCCTCA

21-22

cg32160481

218

GGGAAATGGC

C

T

SILENT-

MHC

Human Gene Similar to SWISSPROT-ID: P30508

5.60E−37

CTCTGTGGGG

NONCODING

HLA CLASS HISTOCOMPATIBILITY

AGGAG[C/T]GA

ANTIGEN, CW*1201 ALPHA CHAIN PRE-

GGGGCCCGCC

CURSOR (HLA-CX52) -

HOMO SAPIENS

CGGCGGGGGC

(HUMAN), 366 aa.

GCA

23-24

cg29691725

109

CTCGATGTCGC

T

C

SILENT-

nuclease

Human Gene Similar to TREMBLNEW-ID:

1.90E−34

CCGAGACATG

NONCODING

E1264534 ENDONUCLEASE III -

GAGA[T/C]CGT

MYCOBACTERIUM TUBERCULOSIS

, 226 aa.

CGGCCCCTCG

CCAAGGCATTG

C

25-26

cg29691725

122

GAGACATGGA

G

A

SILENT-

nuclease

Human Gene Similar to TREMBLNEW-ID:

1.90E−34

GATCGTCGGC

NONCODING

E1264534 ENDONUCLEASE III -

CCCTC[G/A]CC

MYCOBACTERIUM TUBERCULOSIS

, 226 aa.

AAGGCATTGCA

AGCCGTGATC

GG

27-28

cg29691725

146

CGCCAAGGCA

G

A

SILENT-

nuclease

Human Gene Similar to TREMBLNEW-ID:

1.90E−34

TTGCAAGCCGT

NONCODING

E1264534 ENDONUCLEASE III -

GATC[G/A]GGC

MYCOBACTERIUM TUBERCULOSIS

, 226 aa.

ATGGGCCGGC

CCTCTGTGGC

GT

29-30

cg29691725

169

CGGGCATGGG

gap

C

SILENT-

nuclease

Human Gene Similar to TREMBLNEW-ID:

1.90E−34

CCGGCCCTCT

NONCODING

E1264534 ENDONUCLEASE III -

GTGGC[gap/C]G

MYCOBACTERIUM TUBERCULOSIS

, 226 aa.

TCCCGGAACTT

TTCGCAATCGG

CC

31-32

cg29691725

204

CTTTTCGCAAT

A

G

SILENT-

nuclease

Human Gene Similar to TREMBLNEW-ID:

1.90E−34

CGGCCCCGAC

NONCODING

E1264534 ENDONUCLEASE III -

GGCA[A/G]ATG

MYCOBACTERIUM TUBERCULOSIS

, 226 aa.

CGAGCACCCC

GGTATTCCGGC

A

33-34

cg29691725

231

TGCGAGCACC

T

C

SILENT-

nuclease

Human Gene Similar to TREMBLNEW-ID:

1.90E−34

CCGGTATTCCG

NONCODING

E1264534 ENDONUCLEASE III -

GCAT[T/C]TTCA

MYCOBACTERIUM TUBERCULOSIS

, 226 aa.

CGCTGATGAG

CTATGCCCGCA

35-36

cg29691725

232

GCGAGCACCC

T

C

SILENT-

nuclease

Human Gene Similar to TREMBLNEW-ID:

1.90E−34

CGGTATTCCGG

NONCODING

E1264534 ENDONUCLEASE III -

CATT[T/C]TCAC

MYCOBACTERIUM TUBERCULOSIS

, 226 aa.

GCTGATGAGCT

ATGCCCGCAT

37-38

cg29691725

247

TTCCGGCATTT

T

C

SILENT-

nuclease

Human Gene Similar to TREMBLNEW-ID:

1.90E−34

TCACGCTGATG

NONCODING

E1264534 ENDONUCLEASE III -

AGC[T/C]ATGC

MYCOBACTERIUM TUBERCULOSIS

, 226 aa.

CCGCATCTCCC

CCGCAGCCAG

39-40

cg29691725

300

AACATCAAGGC

C

T

SILENT-

nuclease

Human Gene Similar to TREMBLNEW-ID:

1.90E−34

GGCCATCACC

NONCODING

E1264534 ENDONUCLEASE III -

GCGC[C/T]GAA

MYCOBACTERIUM TUBERCULOSIS

, 226 aa.

AACGTTCATCC

CCCTCATCGAC

41-42

cg29691725

80

ACGCCCAGATC

G

A

SILENT-

nuclease

Human Gene Similar to TREMBLNEW-ID:

1.90E−34

GTCACGACGG

NONCODING

E1264534 ENDONUCLEASE III -

TCAC[G/A]CCC

MYCOBACTERIUM TUBERCULOSIS

, 226 aa.

CTCGATGTCGC

CCGAGACATG

G

43-44

cg29691725

89

TCGTCACGACG

T

C

SILENT-

nuclease

Human Gene Similar to TREMBLNEW-ID:

1.90E−34

GTCACGCCCCT

NONCODING

E1264534 ENDONUCLEASE III -

CGA[T/C]GTCG

MYCOBACTERIUM TUBERCULOSIS

, 226 aa.

CCCGAGACAT

GGAGATCGTC

G

45-46

cg43986329

1496

GGGTTTCCCAT

C

T

SILENT-

protease

Human Gene Similar to SWISSNEW-ID: P55032

2.20E-48

20

CAGCATTGCCG

NONCODING

MATRILYSIN PRECURSOR (EC 3.4.24.23)

(16q13)

TCC[C/T]GGGT

(PUMP 1 PROTEASE) (UTERINE METALLO-

GTAGAGTCTCT

PROTEINASE) (MATRIX METALLO-

CGCTGGGGCA

PROTEINASE-7) (MMP-7) (MATRIN) -

FELIS

SILVESTRIS CATUS

(CAT), 262 aa (fragment).|

pcls: SWISSPROT-ID: P55032 MATRILYSIN

PRECURSOR (EC 3.4.24.23) (PUMP 1

PROTEASE) (UTERINE METALLOPRO-

TEINASE) (MATRIX METALLOPROTEINASE-7)

(MMP-7) (MATRIN) -

FELIS SILVESTRIS CATUS

(CAT), 262 aa (fragment).

47-48

cg20438082

137

CTTCAACTGCT

A

T

Thr

Thr

SILENT-

ribosomalprot

Human Gene Similar to SWISSPROT-ID: P04447

2.80E−40

TTAGCAACATC

CODING

50S RIBOSOMAL PROTEIN L1 -

BACILLUS

CAT[A/T]GTTAC

STEAROTHERMOPHILUS

, 232 aa.

AGTACCAGTTT

TAGGGTTTG

49-50

cg20438082

206

CAAGGACACGT

A

T

Leu

Leu

SILENT-

ribosomalprot

Human Gene Similar to SWISSPROT-ID: P04447

2.80E−40

CCAAGACGTCC

CODING

50S RIBOSOMAL PROTEIN L1 -

BACILLUS

AAC[A/T]AGAGC

STEAROTHERMOPHILUS

, 232 aa.

CATCATATCAG

GTGTAGCGA

51-52

cg39547655

545

GCTGGATCCAC

A

G

Gly

Gly

SILENT-

ribosomalprot

Human Gene Similar to SWISSPROT-ID: P94977

2.30E−38

AGCGAGCGGA

CODING

50S RIBOSOMAL PROTEIN L20 -

AGTC[A/G]CCC

MYCOBACTERIUM TUBERCULOSIS

, 129 aa.|

TTCTTAGCACG

pcls: SPTREMBL-ID: P94977 50S RIBOSOMAL

ACGGTCACGG

PROTEIN L20 - MYCOBACTERIUM

A

53-54

cg29358731

181

TCTCGCCTAGG

G

T

Arg

Arg

SILENT-

struct

Human Gene Similar to SWISSPROT-ID: P93087

2.90E−47

TTGGTCATGAC

CODING

CALMODULIN -

CAPSICUM ANNUUM

(BELL

ATG[G/T]CGAA

PEPPER), 148 aa.

GCTCAGCAGC

GGAGATGAAG

C

55-56

cg29358731

196

TCATGACATGG

G

A

Ser

Ser

SILENT-

struct

Human Gene Similar to SWISSPROT-ID: P93087

2.90E−47

CGAAGCTCAG

CODING

CALMODULIN -

CAPSICUM ANNUUM

(BELL

CAGC[G/A]GAG

PEPPER), 148 aa.

ATGAAGCCGTT

CTGGTCCTTGT

57-58

cg29358731

229

AGCCGTTCTGG

A

G

Arg

Arg

SILENT-

struct

Human Gene Similar to SWISSPROT-ID: P93087

2.90E−47

TCCTTGTCGAA

CODING

CALMODULIN -

CAPSICUM ANNUUM

(BELL

CAC[A/G]CGGA

PEPPER), 148 aa.

AGGCCTCCTTG

AGCTCCTCCT

59-60

cg29358731

301

TGCGTGCCATC

A

G

Pro

Pro

SILENT-

struct

Human Gene Similar to SWISSPROT-ID: P93087

2.90E−47

AGGTTGAGAAA

CODING

CALMODULIN -

CAPSICUM ANNUUM

(BELL

CTC[A/G]GGAA

PEPPER), 148 aa.

AGTCGATGGTT

CCATTGCCAT

61-62

cg29358731

304

GTGCCATCAG

A

G

Phe

Phe

SILENT-

struct

Human Gene Similar to SWISSPROT-ID: P93087

2.90E−47

GTTGAGAAACT

CODING

CALMODULIN -

CAPSICUM ANNUUM

(BELL

CAGG[A/G]AAG

PEPPER), 148 aa.

TCGATGGTTCC

ATTGCCATCAG

63-64

cg44127439

554

TGCTGCGAAGA

C

T

Glu

Glu

SILENT-

synthase

Human Gene Similar to SWISSPROT-ID: P19206

7.90E−45

ATCTCGAGGG

CODING

BIOTIN SYNTHASE (EC 2.8.1.6) (BIOTIN

CCTC[C/T]TCGC

SYNTHETASE) -

BACILLUS SPHAERICUS

,

GGGTAATGGA

332 aa.

CTCCCCCCTCA

65-66

cg44127439

704

AGTCCATGAGG

A

G

SILENT-

synthase

Human Gene Similar to SWISSPROT-ID: P19206

7.90E−45

CAGCACCAATG

NONCODING

BIOTIN SYNTHASE (EC 2.8.1.6) (BIOTIN

TGG[A/G]CTAC

SYNTHETASE) -

BACILLUS SPHAERICUS

,

CCCCGGAAAT

332 aa.

GGTGTCGTCGT

67-68

cg25321479

101

AGAAAGTGTTG

G

A

Ile

Ile

SILENT-

transport

Human Gene Similar to TREMBLNEW-ID:

9.80E−47

GCTCCCAGGG

CODING

E1245760 PUTATIVE COBALT TRANSPORT

TGGA[G/A]ATTC

PROTEIN -

STREPTOMYCES COELICOLOR

,

CCCCGTGAGC

257 aa.

CAGGAGCAGG

G

69-70

cg25321479

113

CTCCCAGGGT

A

G

Ala

Ala

SILENT-

transport

Human Gene Similar to TREMBLNEW-ID:

9.80E−47

GGAGATTCCCC

CODING

E1245760 PUTATIVE COBALT TRANSPORT

CGTG[A/G]GCC

PROTEIN -

STREPTOMYCES COELICOLOR

,

AGGAGCAGGG

257 aa.

CCTGGAAGATG

A

71-72

cg25321479

197

GTTTGAACAGT

C

T

Thr

Thr

SILENT-

transport

Human Gene Similar to TREMBLNEW-ID:

9.80E−47

GCCGCCCCAA

CODING

E1245760 PUTATIVE COBALT TRANSPORT

CTCC[C/T]GTTC

PROTEIN -

STREPTOMYCES COELICOLOR

,

CGGTGGGGTG

257 aa.

CGAGGACGAT

C

73-74

cg25321479

245

ATCCCGTAACG

G

A

Ala

Ala

SILENT-

transport

Human Gene Similar to TREMBLNEW-ID:

9.80E−47

CTGGGCAGTTT

CODING

E1245760 PUTATIVE COBALT TRANSPORT

GAT[G/A]GCCG

PROTEIN -

STREPTOMYCES COELICOLOR

,

AGAGCACAAAG

257 aa.

GTGAAAGCGC

75-76

cg25321479

35

GCCCACGTAGT

A

G

Tyr

Tyr

SILENT-

transport

Human Gene Similar to TREMBLNEW-ID:

9.80E−47

TTCTTGGTGAG

CODING

E1245760 PUTATIVE COBALT TRANSPORT

CTT[A/G]TAAAA

PROTEIN -

STREPTOMYCES COELICOLOR

,

GGCGTACCCA

257 aa.

GCCCACGGTC

77-78

cg25321479

38

CACGTAGTTTC

A

G

Phe

Phe

SILENT-

transport

Human Gene Similar to TREMBLNEW-ID:

9.80E−47

TTGGTGAGCTT

CODING

E1245760 PUTATIVE COBALT TRANSPORT

ATA[A/G]AAGG

PROTEIN -

STREPTOMYCES COELICOLOR

,

CGTACCCAGC

257 aa.

CCACGGTCCC

G

79-80

cg25321479

83

GTCCCGCGATA

G

A

Asn

Asn

SILENT-

transport

Human Gene Similar to TREMBLNEW-ID:

9.80E−47

GCCATCGAGAA

CODING

E1245760 PUTATIVE COBALT TRANSPORT

AGT[G/A]TTGG

PROTEIN -

STREPTOMYCES COELICOLOR

,

CTCCCAGGGT

257 aa.

GGAGATTCCCC

81-82

cg25321479

14

ACGCGTCAGG

A

G

SILENT-

transport

Human Gene Similar to TREMBLNEW-ID:

9.80E−47

CCC[A/G]CGTA

NONCODING

E1245760 PUTATIVE COBALT TRANSPORT

GTTTCTTGGTG

PROTEIN -

STREPTOMYCES COELICOLOR

,

AGCTTATAAA

257 aa.

83-84

cg25321479

17

ACGCGTCAGG

T

A

SILENT-

transport

Human Gene Similar to TREMBLNEW-ID:

9.80E−47

CCCACG[T/A]A

NONCODING

E1245760 PUTATIVE COBALT TRANSPORT

GTTTCTTGGTG

PROTEIN -

STREPTOMYCES COELICOLOR

,

AGCTTATAAAA

257 aa.

GG

85-86

cg39548335

298

CTTCTGTATTC

G

A

Ala

Ala

SILENT-

UNCLASSI-

Human Gene Similar to SWISSPROT-ACC: P43618

2.90E−48

TCGTTGTTACC

CODING

FIED

HYPOTHETICAL 41.3 KD PROTEIN IN SAP155-

GGC[G/A]GCCT

YMR31 INTERGENIC REGION -

Saccharomyces

TCCTGGGAGT

cerevisiae

(Baker's yeast), 361 aa.

GCTCATTATTC

87-88

cg43335190

341

GTAGTTCATAT

C

T

SILENT-

UNCLASSI-

Human Gene Similar to SWISSNEW-ACC: P32803

7.60E−48

CTATTTACTTTT

NONCODING

FIED

ENDOSOMAL P24B PROTEIN PRECURSOR

GC[C/T]TACATA

(24 KD ENDOMEMBRANE PROTEIN) (BASIC

CGATTACATAC

24 KD LATE ENDOCYTIC INTERMEDIATE

ACGATTGG

COMPONENT) -

Saccharomyces cerevisiae

(Baker's yeast), 203 aa.

89-90

cg43335190

411

AATTCTCGGTT

T

C

SILENT-

UNCLASSI-

Human Gene Similar to SWISSNEW-ACC: P32803

7.60E−48

TCATACTTTTTA

NONCODING

FIED

ENDOSOMAL P24B PROTEIN PRECURSOR

CC[T/C]TGATCC

(24 KD ENDOMEMBRANE PROTEIN) (BASIC

TTCCACTGTTT

24 KD LATE ENDOCYTIC INTERMEDIATE

TTCCCTGT

COMPONENT) -

Saccharomyces cerevisiae

(Baker's yeast), 203 aa.

91-92

cg43298420

625

GGTGACCGTG

G

T

Pro

Pro

SILENT-

UNCLASSI-

Human Gene Similar to TREMBLNEW-ACC:

5.30E−47

GTCTGAAAGAA

CODING

FIED

MD27784 PTD001 -

HOMO SAPIENS

(HUMAN),

GGCT[G/T]GGT

218 aa.

TGAACTGGTAC

AGCTTCAGGAC

93-94

cg44928880

161

ACTGAGGTTGG

C

T

SILENT-

UNCLASSI-

Human Gene Similar to SPTREMBL-ACC: O17549

5.30E−47

GTTTCAGACCA

NONCODING

FIED

M18.8 PROTEIN -

CAENORHABDITIS ELEGANS

,

AGA[C/T]ACTG

447 aa.

GATTCTCCTAG

TTAAGATAAA

95-96

cg39410689

99

ATCACCACCAC

C

T

His

His

SILENT-

UNCLASSI-

Human Gene Similar to SWISSPROT-ACC: P43638

5.50E−46

CGCCACCACC

CODING

FIED

MAP-HOMOLOGOUS PROTEIN 1 -

ATCA[C/T]ACGG

Saccharomyces cerevisiae

(Baker's yeast), 1398 aa.

AAGATGCTCCT

GCACCTAAGA

97-98

cg20297086

176

TGCACGACAAG

T

C

Asn

Asn

SILENT-

UNCLASSI-

Human Gene Similar to SPTREMBL-ACC: P95543

9.00E−45

TACCCGGAGCT

CODING

FIED

ELONGATION FACTOR TU1 -

PLANOBISPORA

GAA[T/C]GAGG

ROSEA

, 397 aa.

AGTCGCCGTTC

GACCAGATCG

99-100

cg20297086

239

AGGAGCGTCA

C

T

Ile

Ile

SILENT-

UNCLASSI-

Human Gene Similar to SPTREMBL-ACC: P95543

9.00E−45

GCGCGGCATC

CODING

FIED

ELONGATION FACTOR TU1 -

PLANOBISPORA

ACCAT[C/T]CG

ROSEA

, 397 aa.

ATCGCCCACAT

CGAGTACCAGA

101-102

cg20297086

248

AGCGCGGCAT

C

T

Ala

Ala

SILENT-

UNCLASSI-

Human Gene Similar to SPTREMBL-ACC: P95543

9.00E−45

CACCATCTCGA

CODING

FIED

ELONGATION FACTOR TU1 -

PLANOBISPORA

TCGC[C/T]CACA

ROSEA

, 397 aa.

TCGAGTACCAG

ACCGAGAAGC

103-104

cg39386301

1177

TATGTTAATGG

T

C

SILENT-

UNCLASSI-

Human Gene Similar to SWISSPROT-ACC: P32612

1.00E−43

TGAAGAAATTC

NONCODING

FIED

PAU2 PROTEIN -

Saccharomyces cerevisiae

ACC[T/C]CCGA

(Baker's yeast), 120 aa.

CCGTGGTATGT

CAATGTGAGA

105-106

cg39386301

958

ATAATGATAAA

T

C

SILENT-

UNCLASSI-

Human Gene Similar to SWISSPROT-ACC: P32612

1.00E−43

ATAGTTCGTTC

NONCODING

FIED

PAU2 PROTEIN -

Saccharomyces cerevisiae

ATA[T/C]ACTCC

(Baker's yeast), 120 aa.

GGTGGGATCAT

TGCAGAAAT

107-108

cg39515238

192

TCAAGATCTAT

C

A

SILENT-

UNCLASSI-

Human Gene Similar to SWISSPROT-ACC: P53845

3.60E−40

ATTGCACACCA

NONCODING

FIED

HYPOTHETICAL 35.5 KD PROTEIN IN PIK1-

GAG[C/A]TGTT

POL2 INTERGENIC REGION -

Saccharomyces

GTTTTATACTA

cerevisiae

(Baker's yeast), 314 aa.

CAACTCATCT

109-110

cg29693502

181

CAGCGGGGTA

G

A

Tyr

Tyr

SILENT-

UNCLASSI-

Human Gene Similar to SWISSPROT-ACC:

3.00E−39

GCGCACCTGAT

CODING

FIED

Q10379 PROBABLE GLUTAMATE-AMMONIA-

CGAC[G/A]TATT

LIGASE ADENYLYLTRANSFERASE

CCAACAGCTCA

(EC 2.7.7.42) (GLUTAMINE-SYNTHETASE

TTCGTCAACT

ADENYLYLTRANSFERASE) (ATASE) -

Mycobacterium

111-112

cg29693502

226

TCAACTCCCGG

A

G

Ser

Ser

SILENT-

UNCLASSI-

Human Gene Similar to SWISSPROT-ACC:

3.00E−39

TCGCCGGCAC

CODING

FIED

Q10379 PROBABLE GLUTAMATE-AMMONIA-

CGTG[A/G]CTA

LIGASE ADENYLYLTRANSFERASE

GCCCGCAGCA

(EC 2.7.7.42) (GLUTAMINE-SYNTHETASE

GGGCCTGGGA

ADENYLYLTRANSFERASE) (ATASE) -

TT

Mycobacterium

113-114

cg29693502

319

CCAGGCTCCG

T

C

Arg

Arg

SILENT-

UNCLASSI-

Human Gene Similar to SWISSPROT-ACC:

3.00E−39

TACCATCGGAC

CODING

FIED

Q10379 PROBABLE GLUTAMATE-AMMONIA-

CGGA[T/C]CGG

LIGASE ADENYLYLTRANSFERASE

CCTTCCGGGC

(EC 2.7.7.42) (GLUTAMINE-SYNTHETASE

GCAGATCGGC

ADENYLYLTRANSFERASE) (ATASE) -

AT

Mycobacterium

115-116

cg29693502

385

GGTCCGCGCC

G

A

Asn

Asn

SILENT-

UNCLASSI-

Human Gene Similar to SWISSPROT-ACC:

3.00E−39

GTGTTTGCCGA

CODING

FIED

Q10379 PROBABLE GLUTAMATE-AMMONIA-

GCAG[G/A]TTG

LIGASE ADENYLYLTRANSFERASE

CGTAGCTTAGT

(EC 2.7.7.42) (GLUTAMINE-SYNTHETASE

GACGATTTTCA

ADENYLYLTRANSFERASE) (ATASE) -

Mycobacterium

117-118

cg29693502

388

CCGCGCCGTG

G

A

Arg

Arg

SILENT-

UNCLASSI-

Human Gene Similar to SWISSPROT-ACC:

3.00E−39

TTTGCCGAGCA

CODING

FIED

Q10379 PROBABLE GLUTAMATE-AMMONIA-

GGTT[G/A]CGT

LIGASE ADENYLYLTRANSFERASE

AGCTTAGTGAC

(EC 2.7.7.42) (GLUTAMINE-SYNTHETASE

GATTTTCAGCG

ADENYLYLTRANSFERASE) (ATASE) -

Mycobacterium

119-120

cg29693502

397

GTTTGCCGAGC

A

G

Thr

Thr

SILENT-

UNCLASSI-

Human Gene Similar to SWISSPROT-ACC:

3.00E−39

AGGTTGCGTAG

CODING

FIED

Q10379 PROBABLE GLUTAMATE-AMMONIA-

CTT[A/G]GTGAC

LIGASE ADENYLYLTRANSFERASE

GATTTTCAGCG

(EC 2.7.7.42) (GLUTAMINE-SYNTHETASE

CCTTCTCCC

ADENYLYLTRANSFERASE) (ATASE) -

Mycobacterium

121-122

cg29693502

403

CGAGCAGGTT

G

A

Ile

Ile

SILENT-

UNCLASSI-

Human Gene Similar to SWISSPROT-ACC:

3.00E−39

GCGTAGCTTAG

CODING

FIED

Q10379 PROBABLE GLUTAMATE-AMMONIA-

TGAC[G/A]ATTT

LIGASE ADENYLYLTRANSFERASE

TCAGCGCCTTC

(EC 2.7.7.42) (GLUTAMINE-SYNTHETASE

TCCCCGACTC

ADENYLYLTRANSFERASE) (ATASE) -

Mycobacterium

123-124

cg43935044

511

GAGTCTGGGG

G

T

SILENT-

UNCLASSI-

Human Gene Similar to SPTREMBL-ACC: P70429

3.50E−39

CTGGCTGGGC

NONCODING

FIED

ENA-VASODILATOR STIMULATED PHOSPHO-

TTCTG[G/T]CTG

PROTEIN (ENA-VASP LIKE PROTEIN) -

MUS

TCCTCTGTCGC

MUSCULUS

(MOUSE), 393 aa.

CGGATGGGCT

C

125-126

cg27850036

116

TGAATGGTGGC

G

A

Asp

Asp

SILENT-

UNCLASSI-

Human Gene Similar to SWISSNEW-ACC: P11653

1.00E−38

AGACCGGCGT

CODING

FIED

METHYLMALONYL-COA MUTASE ALPHA-

AGGT[G/A]TCTA

SUBUNIT (EC 5.4.99.2) (MCM-ALPHA) -

GCCAGTCCATG

Propionibacterium freudenreichii

TCACCGTAGA

shermanii

, 727 aa.

127-128

cg27850036

14

ACGCGTTGGA

T

C

Lys

Lys

SILENT-

UNCLASSI-

Human Gene Similar to SWISSNEW-ACC: P11653

1.00E−38

CTC[T/C]TTAGC

CODING

FIED

METHYLMALONYL-COA MUTASE ALPHA-

GGTGGAGAAT

SUBUNIT (EC 5.4.99.2) (MCM-ALPHA) -

CCGGCGTACT

Propionibacterium freudenreichii shermanii

, 727 aa.

129-130

cg27850036

41

TAGCGGTGGA

G

A

Arg

Arg

SILENT-

UNCLASSI-

Human Gene Similar to SWISSNEW-ACC: P11653

1.00E−38

GAATCCGGCG

CODING

FIED

METHYLMALONYL-COA MUTASE ALPHA-

TACTG[G/A]CG

SUBUNIT (EC 5.4.99.2) (MCM-ALPHA) -

AATCGTCCACG

Propionibacterium freudenreichii shermanii

, 727 aa.

GCCGGAAGGC

AT

131-132

cg42331882

466

TGTGCGTAGG

G

A

SILENT-

UNCLASSI-

Human Gene Similar to SPTREMBL-ACC: Q15407

1.10E−37

18

GAAAGTCAGTG

NONCODING

FIED

RTSBETA -

HOMO SAPIENS

(HUMAN), 416 aa.

TCGT[G/A]CAG

CTCCCAGGAG

CCTCCTGAGC

GT

133-134

cg43945926

203

ACAACGCGAG

A

G

SILENT-

UNCLASSI-

Human Gene Similar to SWISSPROT-ACC:

1.10E−37

CCAGGAGTACT

NONCODING

FIED

Q06003 GOLIATH PROTEIN (G1 PROTEIN) -

ACAC[A/G]GCG

Drosophila melanogaster

(Fruit fly), 284 aa.

CTCATCAACGT

GACGGTGCAG

G

135-136

cg39575634

12

AGATCTGGGAC

A

G

Thr

Thr

SILENT-

UNCLASSI-

Human Gene Similar to SPTREMBL-ACC: Q63965

3.00E−37

[A/G]ATGTCTG

CODING

FIED

TRICARBOXYLATE CARRIER -

RATTUS

GGGAAGTGCC

NORVEGICUS

(RAT), 357 aa (fragment).

ACCCAACA

137-138

cg39575634

120

TCTTCACGGTT

C

T

Asn

Asn

SILENT-

UNCLASSI-

Human Gene Similar to SPTREMBL-ACC: Q63965

3.00E−37

ACTGATCCCAG

CODING

FIED

TRICARBOXYLATE CARRIER -

RATTUS

AAA[C/T]ATCCT

NORVEGICUS

(RAT), 357 aa (fragment).

TTTAACGAACG

AACAGCTAG

139-140

cg39575634

165

AGCTAGAGAAT

A

G

Val

Val

SILENT-

UNCLASSI-

Human Gene Similar to SPTREMBL-ACC: Q63965

3.00E−37

GCGAGGAAAG

CODING

FIED

TRICARBOXYLATE CARRIER -

RATTUS

TGGT[A/G]CAC

NORVEGICUS

(RAT), 357 aa (fragment).

GATTACAGGCA

AGGAATCGTTC

141-142

cg39575634

214

TCCTGCCGGC

T

C

Leu

Leu

SILENT-

UNCLASSI-

Human Gene Similar to SPTREMBL-ACC: Q63965

3.00E−37

CTCACGGAAAA

CODING

FIED

TRICARBOXYLATE CARRIER -

RATTUS

TGAG[T/C]TATG

NORVEGICUS

(RAT), 357 aa (fragment).

GAGAGCGAAG

TACGCGT

143-144

cg27826036

118

GTAGGGCGAC

C

T

Glu

Glu

SILENT-

UNCLASSI-

Human Gene Similar to SWISSPROT-ACC:

3.40E−36

GGCGTATTTAT

CODING

FIED

Q10776 PUTATIVE LONG-CHAIN-FATTY-

GTCC[C/T]TCG

ACID--COA LIGASE (EC 6.2.1.3) (LONG-CHAIN

CCAAGCACGA

ACYLCOA SYNTHETASE) (LACS) -

CAGCGTTAGAC

A

145-146

cg27826036

163

TAGACAGGTAC

G

C

Leu

Leu

SILENT-

UNCLASSI-

Human Gene Similar to SWISSPROT-ACC:

3.40E−36

GGGCAGTTGG

CODING

FIED

Q10776 PUTATIVE LONG-CHAIN-FATTY-

CCAT[G/C]AGA

ACID--COA LIGASE (EC 6.2.1.3) (LONG-CHAIN

GTGGCCTCGA

ACYLCOA SYNTHETASE) (LACS) -

CCTTCTGTGGG

G

147-148

cg42538578

207

AGAGACATTGC

G

A

Cys

Cys

SILENT-

UNCLASSI-

Human Gene Similar to SWISSPROT-ACC:

7.40E−34

8

CCTCCACTGCT

CODING

FIED

Q09753 BETA-DEFENSIN 1 PRECURSOR

GAC[G/A]CAATT

(HBD-1) (DEFENSIN, BETA 1) -

Homo sapiens

GTAATGATCAG

(Human), 68 aa.

ATCTGTGGC

149-150

cg42538578

365

GGATTTCAGGA

G

A

SILENT-

UNCLASSI-

Human Gene Similar to SWISSPROT-ACC:

7.40E−34

8

ACTGGGGAGA

NONCODING

FIED

Q09753 BETA-DEFENSIN 1 PRECURSOR

GGCT[G/A]GCT

(HBD-1) (DEFENSIN, BETA 1) -

Homo sapiens

CCTTTGGAGGC

(Human), 68 aa.

TGAGCTGACAG

151-152

cg44002673

1167

AGCAAGCTCTT

A

G

SILENT-

UNCLASSI-

Human Gene Similar to SWISSPROT-ACC: P32740

7.40E−34

13

TGAAACCTGAG

NONCODING

FIED

HYPOTHETICAL 31.0 KD PROTEIN R107.2 IN

CCC[A/G]CGCA

CHROMOSOME III -

Caenorhabditis elegans

,

GACCAGAAGTA

285 aa.

AACAGGCACC

153-154

cg39710199

535

AGCCGGTGCG

C

T

SILENT-

UNCLASSI-

Human Gene Similar to TREMBLNEW-ACC:

1.20E−33

GCCTGAGGTG

NONCODING

FIED

CAB50754 PUTATIVE INTEGRAL MEMBRANE

CGGGG[C/T]GG

TRANSPORT PROTEIN -

STREPTOMYCES

AGATCGAGTGT

COELICOLOR

, 269 aa.

CGTCATGTCAA

T

155-156

cg39710199

736

TGTGGGTAGTG

C

A

SILENT-

UNCLASSI-

Human Gene Similar to TREMBLNEW-ACC:

1.20E−33

AGCACGACGG

NONCODING

FIED

CAB50754 PUTATIVE INTEGRAL MEMBRANE

AGAC[C/A]CCG

TRANSPORT PROTEIN -

STREPTOMYCES

TCATGACGCAT

COELICOLOR

, 269 aa.

TTGCTCAACGA

157-158

cg38821538

155

ATCACCTGAGG

T

C

Asp

Asp

SILENT-

UNCLASSI-

Human Gene Similar to SWISSPROT-ACC: P39195

3.50E−33

TCCGGAGTTCA

CODING

FIED

!!!! ALU SUBFAMILY SX WARNING ENTRY -

AGA[T/C]CAGC

Homo sapiens

(Human), 591 aa.

CTGGCCAACAT

GATGAAACCC

159-160

cg21632104

261

CGTCGCAGTAC

A

G

Leu

Leu

SILENT-

UNCLASSI-

Human Gene Similar to TREMBLNEW-ACC:

3.00E−32

GTTCTGGCCTG

CODING

FIED

CAB38487 PUTATIVE HELICASE -

TCA[A/G]CGTTT

STREPTOMYCES COELICOLOR

, 815 aa.

TGCATATCCCG

GCAAAGGCC

161-162

cg20370177

370

CCCAAGCCAAT

A

C

Val

Val

SILENT-

UNCLASSI-

Human Gene Similar to TREMBLNEW-ACC:

1.50E−31

ACCAAGATGAT

CODING

FIED

CAB49211 HYPOTHETICAL 57.7 KD PROTEIN -

CGC[A/C]ACTG

PYROCOCCUS ABYSSI

, 533 aa.

GCATCATGTCT

CCCATGCCTT

163-164

cg14810223

103

CGAGCAGCCC

A

G

SILENT-

GCCAGGACTCT

NONCODING

GGCT[A/G]CTG

GAGATGGGCG

CCCGGCTATC

GC

165-166

cg19882105

125

GAGATCAGTGT

C

T

SILENT-

GATGATGCACA

NONCODING

GGA[C/T]GGAT

GCGGGAATCC

CAGCTCTTCAT

167-168

cg19882105

131

AGTGTGATGAT

C

T

SILENT-

GCACAGGACG

NONCODING

GATG[C/T]GGG

AATCCCAGCTC

TTCATATGGCT

169-170

cg19885950

67

TCCTCAAGTCT

G

T

SILENT-

TTGTTCAAATAT

NONCODING

CA[G/T]CTTTTC

AGCAAGACCTT

CATTAACT

171-172

cg20452710

118

TGGCCTGGAG

A

G

SILENT-

AGAGGCGGGA

NONCODING

GGGAC[A/G]CT

GGCCTGGAGA

GAGGCGGGAG

GGA

173-174

cg20452710

64

GACAGGGGAG

A

G

SILENT-

AGAGGCGGGA

NONCODING

GGGAC[A/G]CT

GGCCTGGAGA

GAGGCGGGAG

GGA

175-176

cg20454325

152

ACATCCCTGCA

A

G

SILENT-

CTGTCACCAGC

NONCODING

CCG[A/G]CCCC

TTGTACCATGG

CAGGGTTGGG

177-178

cg20454325

159

TGCACTGTCAC

G

A

SILENT-

CAGCCCGACC

NONCODING

CCTT[G/A]TACC

ATGGCAGGGTT

GGGCTGACTG

179-180

cg20595730

225

CTCCAAAGACT

A

G

SILENT-

TGATTCCAAGA

NONCODING

AAC[A/G]TCTGT

GAAATTCACTA

AGTTTAAGA

181-182

cg20595730

247

AACATCTGTGA

A

C

SILENT-

AATTCACTAAG

NONCODING

TTT[A/C]AGATA

TGAAGAGACAG

ACTAGTTAT

183-184

cg20610793

156

GGGCCGACCC

C

T

SILENT-

GAGCAGATGT

NONCODING

GTCGT[C/T]ATC

GAGGACTCCG

CTTTCGGATTG

C

185-186

cg20610793

165

CGAGCAGATGT

C

T

SILENT-

GTCGTCATCGA

NONCODING

GGA[C/T]TCCG

CTTTCGGATTG

CGTGCCGGAC

187-188

cg20610793

198

TCGGATTGCGT

C

T

SILENT-

GCCGGACGGG

NONCODING

CTGC[C/T]GGA

GCGTGGGTTCT

CACGGTCGGA

C

189-190

cg20610793

228

CGTGGGTTCTC

G

A

SILENT-

ACGGTCGGAC

NONCODING

GCAG[G/A]CTC

AAGGGCCAGG

GGGACATGTG

GG

191-192

cg20610793

237

TCACGGTCGG

C

A

SILENT-

ACGCAGGCTC

NONCODING

AAGGG[C/A]CA

GGGGGACATG

TGGGTTCCCG

GGC

193-194

cg20610793

267

GGGACATGTG

T

C

SILENT-

GGTTCCCGGG

NONCODING

CTGGA[T/C]GAT

GAGCGGGTGA

CCTTCTGGGAA

C

195-196

cg20610793

284

GGGCTGGATG

T

C

SILENT-

ATGAGCGGGT

NONCODING

GACCT[T/C]CTG

GGAACCCCATC

GATGAGGGCG

T

197-198

cg20610793

296

GAGCGGGTGA

A

G

SILENT-

CCTTCTGGGAA

NONCODING

CCCC[A/G]TCG

ATGAGGGCGT

GCGAGCTGAC

AC

199-200

cg20610793

43

CGGATTGGCCT

A

G

SILENT-

CACAAGGCTG

NONCODING

GCTG[A/G]AAC

TGTTCGACACC

GTCCTTGGGGT

201-202

cg20711459

261

CCCGACTGAA

A

G

SILENT-

GGCACGGATG

NONCODING

AGTTC[A/G]CC

GATCCCATATT

TGGAGTGGAG

AG

203-204

cg20723460

148

TTCCCCGGCG

C

T

SILENT-

AAGAAAAAGGC

NONCODING

GTCG[C/T]CCAT

TCCTCTTCCAA

AACGCTACAA

205-206

cg20723460

193

CTACAACAAAA

C

T

SILENT-

ACCACCACGCT

NONCODING

TCC[C/T]TTCCT

TCTTCCTTGCC

CCTTTCCCT

207-208

cg20724182

184

GATCATTGTAG

G

A

SILENT-

GCTATTTCAAA

NONCODING

ACC[G/A]CCAA

ACAAGCCATGA

ACGCAGCAAA

209-210

cg20724182

197

TATTTCAAAAC

T

C

SILENT-

CGCCAAACAAG

NONCODING

CCA[T/C]GAAC

GCAGCAAAACA

ATTCCACTGG

211-212

cg20724182

95

TGCTGGGGGC

T

A

SILENT-

GCTTCACAGAC

NONCODING

AACA[T/A]CAAA

TACGCTGTAGC

TGCCCAATAT

213-214

cg20724182

99

GGGGGCGCTT

A

G

SILENT-

CACAGACAACA

NONCODING

TCAA[A/G]TACG

CTGTAGCTGCC

CAATATTGGA

215-216

cg20724478

247

ACTATCTGGGA

C

T

SILENT-

GTTGGGGCCC

NONCODING

TGCA[C/T]GGC

ACTGGAACCAA

ACCTGAGGCT

G

217-218

cg20724478

250

ATCTGGGAGTT

C

T

SILENT-

GGGGCCCTGC

NONCODING

ACGG[C/T]ACT

GGAACCAAACC

TGAGGCTGGG

G

219-220

cg20724478

298

GGGAGCTCGG

C

T

SILENT-

CCTGGCTGGG

NONCODING

ATACG[C/T]GAT

GTCGTCAACGC

CAGCCCGTGG

C

221-222

cg20724478

90

GTTGCCGAAAT

C

T

SILENT-

TGGGGCCGAT

NONCODING

GGTG[C/T]CCA

TGTTGGGCAGT

CTGACATGCCG

223-224

cg20726641

101

TCAAGAAATTT

C

T

SILENT-

GCCATTCTTGA

NONCODING

CCA[C/T]GACCT

GACCGAGGATT

CTCACTCAG

225-226

cg20726641

119

TTGACCACGAC

T

A

SILENT-

CTGACCGAGG

NONCODING

ATTC[T/A]CACT

CAGTGACGAC

CAGTCTCAAGG

227-228

cg20726641

224

CTGACGAAAAC

T

C

SILENT-

GATCAACCGG

NONCODING

GCGC[T/C]TCA

AAGGGAAGCG

ACGCTTCATGA

C

229-230

cg20726641

280

CTGTCGTCGTC

C

T

SILENT-

GATATTCCACT

NONCODING

GCG[C/T]TGGT

CCGATATGGAT

GCGCAGGGAC

231-232

cg20726641

307

GGTCCGATATG

T

C

SILENT-

GATGCGCAGG

NONCODING

GACA[T/C]GTTA

ATAACGTTCGT

ATTAGCGAGC

233-234

cg20726641

310

CCGATATGGAT

T

C

SILENT-

GCGCAGGGAC

NONCODING

ATGT[T/C]AATA

ACGTTCGTATT

AGCGAGCTCG

235-236

cg20726641

316

TGGATGCGCA

C

T

SILENT-

GGGACATGTTA

NONCODING

ATAA[C/T]GTTC

GTATTAGCGAG

CTCGAACA

237-238

cg20728487

53

GGAAACTCATC

T

C

SILENT-

GGCAATATCGT

NONCODING

TGC[T/C]GCTTG

GGAGACTGGC

TTCATGCTGG

239-240

cg20730743

23

ACGCGTACTG

A

G

SILENT-

GCGGATCTCA

NONCODING

GT[A/G]CGATAA

CCCACCAGATT

GCCGGTGA

241-242

cg20730743

26

ACGCGTACTG

A

G

SILENT-

GCGGATCTCA

NONCODING

GTACG[A/G]TAA

CCCACCAGATT

GCCGGTGAAC

T

243-244

cg20730743

95

GCTGTCCGGC

A

G

SILENT-

CCCACCGGCG

NONCODING

AGTTT[A/G]TCG

AGCTGGGAGG

GATCGATTTTC

C

245-246

cg20744814

84

GGCACCCGGG

C

gap

SILENT-

TGCTGCTGGC

NONCODING

CATGG[C/gap]C

ACCCACGAAG

CTCTCCCTGCC

CCC

247-248

cg21147609

219

GTTCCATGCCT

C

T

SILENT-

TTCTAGACCCC

NONCODING

AGG[C/T]CCTTT

CCTGCATGATT

TTATCAGCA

249-250

cg21147791

282

AATAAAGTGTT

G

A

SILENT-

TCCTTGAGTCC

NONCODING

TGT[G/A]AGTTG

CTCTAGCAAAT

TTATCAATC

251-252

cg21148047

203

ATGGTGATTCC

A

G

SILENT-

TCAAGAAATTA

NONCODING

GAA[A/G]CAGA

ATTACCCTATG

ATCCAGCATT

253-254

cg21148203

236

TATGTTTGCTG

T

C

SILENT-

GGGGAGTGGG

NONCODING

TGGG[T/C]TGC

AGAACTTAAGA

CCAGGACAATT

255-256

cg21150589

135

CCTTTGAAATT

A

G

SILENT-

CGATTTCCTTC

NONCODING

CCC[A/G]GGTG

AAAGAGGAGAA

CAGATTCTAC

257-258

cg21395558

63

ATTGACAGAGT

G

T

SILENT-

GACATTTGGGC

NONCODING

AAC[G/T]CGTG

AAGGAAGTGG

GTGGAGGAGG

T

259-260

cg21395558

65

TGACAGAGTGA

G

T

SILENT-

CATTTGGGCAA

NONCODING

CGC[G/T]TGAA

GGAAGTGGGT

GGAGGAGGTG

G

261-262

cg21395558

72

GTGACATTTGG

A

G

SILENT-

GCAACGCGTG

NONCODING

AAGG[A/G]AGT

GGGTGGAGGA

GGTGGCAGCC

AG

263-264

cg21415668

138

AGGACTGGTCA

C

G

SILENT-

GGGAGGAGTT

NONCODING

AGGG[C/G]AGG

AGGACTGGTCA

GGGAGGAGTT

A

265-266

cg21417734

43

CAACGGGTTAC

C

G

SILENT-

CCCGGCGCAC

NONCODING

CTGG[C/G]TTT

GCCCGATCACA

GCGGCACGCA

T

267-268

cg21428517

142

CCATGCCCATC

A

G

SILENT-

CCGGTGCCGC

NONCODING

AGAA[A/G]AAG

ATTCCTCGATC

GGCTTTTCCGT

269-270

cg21428762

113

TGGTCGTGGTC

A

G

SILENT-

TCATCAGAGGT

NONCODING

GAA[A/G]ACGA

TGAGCGGGGT

GCTCGGACGC

A

271-272

cg21428762

149

GGGTGCTCGG

A

G

SILENT-

ACGCAGACGA

NONCODING

GCGAT[A/G]CG

ACGGGCGGTG

TCACCGGACTT

GG

273-274

cg21428762

65

ACACCGGGGT

G

A

SILENT-

AACGACGGCG

NONCODING

TGAGC[G/A]CC

CCAGACCCAG

GCGAGGGTCT

TGG

275-276

cg21429119

356

TTGGTAGGCCA

C

T

SILENT-

AGGCAGGACG

NONCODING

ACCA[C/T]TTGA

GCCTGGGAATT

TGAAACCAGC

277-278

cg21429119

393

AATTTGAAACC

A

G

SILENT-

AGCCTGGGCA

NONCODING

ACAT[A/G]GTGA

GTCTTTGTTTC

TACAAGAAAT

279-280

cg21429119

408

TGGGCAACATA

C

T

SILENT-

GTGAGTCTTTG

NONCODING

TTT[C/T]TACAA

GAAATTTAAAA

AAAAAATTA

281-282

cg21429803

373

TGCACCCGGC

C

T

SILENT-

GTGCCCTGAAA

NONCODING

CACA[C/T]GCG

TGTGCCCCGAA

ATACCTGCATT

283-284

cg21433543

205

GGTGGATCTG

C

T

SILENT-

GTCGGGATCG

NONCODING

GTGAC[C/T]ACT

CTGGTCATCGT

CGATTATGCGA

285-286

cg21433543

239

CATCGTCGATT

C

T

SILENT-

ATGCGACGAC

NONCODING

CTTC[C/T]TACC

ACTGAAGTTAT

GGCGTCGCTG

287-288

cg21433543

269

ACTGAAGTTAT

G

gap

SILENT-

GGCGTCGCTG

NONCODING

CGTA[G/gap]CC

GAGGCTGGGG

TAGCGCTCCTG

GG

289-290

cg21433543

293

AGCCGAGGCT

G

A

SILENT-

GGGGTAGCGC

NONCODING

TCCTG[G/A]GC

GGAATCGTCCT

GACGCGGCCG

CC

291-292

cg21435199

96

AAATTTAATAAAA

G

A

SILENT-

TAAATTATAAA

NONCODING

GA[G/A]CTCCT

CTTACCTAGAA

ATAATTATT

293-294

cg21637172

37

CGGTTGGCCA

C

T

SILENT-

AGCCTGGCACT

NONCODING

CAAA[C/T]GTCC

GCCTAACCTGG

GGTCTTTATT

295-296

cg21643872

325

GGTTGAGTGG

A

G

SILENT-

GACGCCTTCTA

NONCODING

CGAG[A/G]AGC

ACCCTGAGCTT

GACCTGGAAA

G

297-298

cg21657573

102

AAAAAGGTTAA

G

C

SILENT-

AGATCAGACAG

NONCODING

ACA[G/C]CTGA

CCTTACTGCCC

TCAATGGCCA

299-300

cg21657879

270

CCAGGGAAAG

C

G

SILENT-

GCAGTCCCCCT

NONCODING

CCCC[C/G]ACA

GCAGTCACGAA

CCTCAGAAGCC

301-302

cg21659205

482

CCAAACAATCC

G

A

SILENT-

AGCTTGCTCCC

NONCODING

CTC[G/A]ACCA

CTCAGAACAAA

CGCCCTAAGT

303-304

cg21660634

198

CCACGTGACG

T

C

SILENT-

ACCGGAACATC

NONCODING

ACTG[T/C]GAC

GCTTCACTCGG

GCAACCGGTC

G

305-306

cg21660634

92

GCCACGGCTC

C

T

SILENT-

GGTGAATCCGA

NONCODING

CTCG[C/T]GGG

GCCAACACAAC

GGCCTCACCC

A

307-308

cg21661814

164

CGCCGAAATC

C

G

SILENT-

GGTGACGATG

NONCODING

GCCTT[C/G]GC

GTGGCCAATGT

GGAGGTAGCC

GT

309-310

cg21661814

239

GGACGCGCCC

T

G

SILENT-

GCCGTAGGTG

NONCODING

TCCTG[T/G]TG

GATGTCCGCG

CGMCAACCTG

AT

311-312

cg21661814

29

CGTGGACAAC

G

A

SILENT-

GTGGGCCGGG

NONCODING

GAGTA[G/A]CC

TAACCACTCAA

TGTCTGCAATG

A

313-314

cg21661814

52

TAGCCTAACCA

T

C

SILENT-

CTCAATGTCTG

NONCODING

CAA[T/C]GATCG

ACTCGACATAC

TCGGTTTCC

315-316

cg21661814

98

TTTCCTCGGTG

A

G

SILENT-

CCTGGATTAGT

NONCODING

ATC[A/G]TCAAG

TCTCAGGTTGC

AGGTGCCGC

317-318

cg24113982

145

GCTCGGCTGC

C

T

SILENT-

TGCAGAAGTCT

NONCODING

CCTT[C/T]CCTC

CTTTGTGGCTG

GTATATAGAA

319-320

cg25268133

209

GGCCGTCATC

G

A

SILENT-

GCGGTCACGA

NONCODING

CTCCC[G/A]TG

ATCACCATGAT

CGTGGGCATG

AC

321-322

cg25309388

332

TGAGAGGGTAA

C

T

SILENT-

AGTGCCAGTCT

NONCODING

GTG[C/T]TAAAA

GAACGTGAAAA

GGAAACCTA

323-324

cg25339094

215

TTGCCCAGACC

T

C

SILENT-

AATGCGATGG

NONCODING

GTCG[T/C]CTC

CGCCACCATC

GAGAAACGAG

AA

325-326

cg25339094

345

TGCGAGCCTG

C

A

SILENT-

CACACCAACAA

NONCODING

CCCC[C/A]AGA

TCGGCGAGTC

GACCTCTCATC

G

327-328

cg25339094

351

CCTGCACACCA

G

A

SILENT-

ACAACCCCCAG

NONCODING

ATC[G/A]GCGA

GTCGACCTCTC

ATCGTGCCAG

329-330

cg27778388

118

GTCCCAGGGT

A

G

SILENT-

GACGCGAGGT

NONCODING

TGGGG[A/G]CT

GAGCAACCAG

GAATAGACCTT

CA

331-332

cg27802892

239

ATCTAACCGGT

A

G

SILENT-

TCTAGACAGCT

NONCODING

TAA[A/G]CAAAC

AGATACAGTGC

CCTTTTCTC

333-334

cg27802892

241

CTAACCGGTTC

A

C

SILENT-

TAGACAGCTTA

NONCODING

AAC[A/C]AACA

GATACAGTGCC

CTTTTCTCAG

335-336

cg27805688

354

AATCCCGTTGC

A

G

SILENT-

TGTCGTGATGT

NONCODING

GAA[A/G]CCAG

CACCAGTTCTG

CTGGCCACGC

337-338

cg27825173

254

GGCCACCGCG

C

T

SILENT-

GGCACCGCAC

NONCODING

GGACA[C/T]CC

CGACACACGA

GCACCCACAC

CCC

339-340

cg27827050

71

CCGATGGCAA

G

A

SILENT-

GTGGGACAGC

NONCODING

CTGGA[G/A]GG

CTTGCTCACCT

GCGAGCCCGG

CC

341-342

cg27828294

111

GTACAAAAACT

T

C

SILENT-

AGTAGATGTGT

NONCODING

GAA[T/C]GCAAT

AAAAGTGCTCA

GAAACACAC

343-344

cg27845127

25

ACGCGTCCTGA

A

G

SILENT-

AGCCGCCGAC

NONCODING

GCG[A/G]CGAG

AACAGCAGGC

CAGCAGCTCG

A

345-346

cg27845127

78

TCAGTGGCAGA

T

C

SILENT-

TAGCCAGCGG

NONCODING

CGAC[T/C]GAG

CGTGCGCCAT

GATGCCGCGA

CT

347-348

cg27845127

95

GCGGCGACTG

G

A

SILENT-

AGCGTGCGCC

NONCODING

ATGAT[G/A]CC

GCGACTGACA

CCACCTGCGG

TCC

349-350

cg27922064

325

CAAAAATGCTC

A

G

SILENT-

ATTTAGTTTTCCT

NONCODING

CA[A/G]CACCC

CCAGACTGACC

TTCAAAACT

351-352

cg27928117

11

GCTAGCAGCT

C

G

SILENT-

[C/G]TGGCCCTG

NONCODING

CAGCTGAGCA

CAGGCCA

353-354

cg27928117

160

TATTCAGTAGG

C

T

SILENT-

GAAAAGGGCA

NONCODING

AGGA[C/T]CTG

AAAAAAGTGTA

TTAAGAATCGT

355-356

cg27929704

206

GTCACTGGGC

G

gap

SILENT-

CTGATGCCACC

NONCODING

GGAG[G/gap]CT

GAGCTACTGG

GCACCTTCGG

CCA

357-358

cg27956615

73

TTCTCCATGCT

A

G

SILENT-

CCTAGATGGAA

NONCODING

AAC[A/G]CAGT

CATTCTGATCA

CTTTCTCTCT

359-360

cg27957329

105

GGAGCTATGGT

C

T

SILENT-

TTTCGCCAAGT

NONCODING

CAA[C/T]TCACT

GATTGTGGGAC

GGGTGGTGG

361-362

cg27962799

121

TGCTCCTCCCG

C

T

SILENT-

CGTGCTTCCGC

NONCODING

CGC[C/T]GGTG

GCTTGGACCC

GTCGGGGCTG

G

363-364

cg28315794

188

GGTTTAGGAAT

C

T

SILENT-

GCTAGCTTTTG

NONCODING

AAA[C/T]TTCAT

TCAAAATGTCT

TTGAAGCCA

365-366

cg28389525

109

TCGTGTTAGAA

A

G

SILENT-

AACTTTCGACC

NONCODING

TGG[A/G]GTCA

CGAAGCGTTTG

GGAGTGGATG

367-368

cg28389525

145

GTTTGGGAGTG

A

G

SILENT-

GATGCGGAAA

NONCODING

GTGT[A/G]CATA

AAACCAATCCG

CGAATAATAT

369-370

cg28389525

187

GAATAATATAC

T

C

SILENT-

GCCAGCATTTC

NONCODING

GGG[T/C]TTCG

GTCAAGAGGG

GCCGTTCCGAA

371-372

cg28389525

91

TCGTTGAGCAT

A

G

SILENT-

GCAGACGTCG

NONCODING

TGTT[A/G]GAAA

ACTTTCGACCT

GGAGTCACGA

373-374

cg28397602

82

CCTCTCTGCAC

A

G

SILENT-

GGCTGTGTGTG

NONCODING

TGC[A/G]TGTC

CATGCCTGTCC

AGGTCAGGAC

375-376

cg28459036

106

GCGACGAGGG

C

T

SILENT-

TACGACCGTCG

NONCODING

GTAG[C/T]CGT

GTAGATCATAC

GTCGGGGCCG

G

377-378

cg28459036

142

ATACGTCGGG

A

G

SILENT-

GCCGGGTGAC

NONCODING

GCGCC[A/G]GA

GGGCTTGCTGT

TCGGTGGCGG

TC

379-380

cg28459036

266

ATCCCGATCCA

G

A

SILENT-

AATCCAGCTAG

NONCODING

ACC[G/A]ACCAT

AATCGTCAATG

CGATCACCA

381-382

cg28459036

80

AGGCGGCCAA

C

G

SILENT-

GTCAGCGCAG

NONCODING

GAGGC[C/G]GC

GACGAGGGTA

CGACCGTCGG

TAG

383-384

cg28473092

278

CACCCTCGAG

G

A

SILENT-

CATCGTCACCT

NONCODING

CGAT[G/A]CTAA

TTAGAGCCATG

TGCCGATGAG

385-386

cg28473092

376

GGGGTGCGCC

A

G

SILENT-

ATACCAACTCC

NONCODING

CGAC[A/G]CAG

GACACCCTCG

CGGAAGTCGAT

C

387-388

cg28486260

209

TATTATTTGCTA

A

G

SILENT-

TTACCCAAGCT

NONCODING

GT[A/G]GGGGC

TGTCCATTTTTA

TGCGAAGT

389-390

cg29195033

109

GTGCAGGCTAA

T

C

SILENT-

TCCACGACATG

NONCODING

TAT[T/C]GACTT

CCGTCGCGGA

TCTTGCCGCC

391-392

cg29195033

333

CAAGCCATTCA

A

G

SILENT-

TCGCCGTGCG

NONCODING

GACC[A/G]TAG

TAACCGACCGC

CGAACCATTGA

393-394

cg29195033

339

ATTCATCGCCG

A

G

SILENT-

TGCGGACCATA

NONCODING

GTA[A/G]CCGA

CCGCCGAACC

ATTGAGGAAGA

395-396

cg29195033

348

CGTGCGGACC

C

T

SILENT-

ATAGTAACCGA

NONCODING

CCGC[C/T]GAA

CCATTGAGGAA

GATCCTGCAGC

397-398

cg29195033

368

ACCGCCGAAC

T

G

SILENT-

CATTGAGGAAG

NONCODING

ATCC[T/G]GCA

GCGCGGCGAG

GATGCTAAGGC

G

399-400

cg29195033

44

ACGAGGTTCAC

T

C

SILENT-

CGGCCCGCTC

NONCODING

ATAG[T/C]GTCG

TCAGTCAGAAT

CTTCATCATT

401-402

cg29195033

72

CGTCAGTCAGA

C

T

SILENT-

ATCTTCATCATT

NONCODING

GC[C/T]GATAC

GTGATCGTGCA

GGCTAATCC

403-404

cg29204207

203

GACACTCCCCT

G

T

SILENT-

14

CGACGCAGCC

NONCODING

(14q22)

TCCG[G/T]AGC

GGCGCGCACT

CTCCAGAGGC

CA

405-406

cg29207528

213

CGGGCCCGCC

C

T

SILENT-

CTAGCCCTCCT

NONCODING

CGAT[C/T]CAG

CGTGGGGACG

CCAGATCCACG

T

407-408

cg29207528

245

GGGGACGCCA

G

A

SILENT-

GATCCACGTG

NONCODING

GAGAC[G/A]AC

AGGGTGCCCC

AGCGCCGTGG

TCT

409-410

cg29207528

254

AGATCCACGTG

C

T

SILENT-

GAGACGACAG

NONCODING

GGTG[C/T]CCC

AGCGCCGTGG

TCTGGAATCCA

C

411-412

cg29207528

260

ACGTGGAGAC

C

T

SILENT-

GACAGGGTGC

NONCODING

CCCAG[C/T]GC

CGTGGTCTGG

AATCCACGCTC

CT

413-414

cg29207528

413

GCGACCTCAC

C

T

SILENT-

CATGTCCACAC

NONCODING

GGAT[C/T]AGC

GTCGAAACGTT

GTGATCGCTGC

415-416

cg29214234

79

GACGCGTACCT

C

T

SILENT-

GCCATCAGGAT

NONCODING

CCT[C/T]GTTTG

TTTCTGAAGCA

ACCCCCTTC

417-418

cg29216983

91

CCGTTGGGCC

T

C

SILENT-

ATACCCGTCTC

NONCODING

GTGA[T/C]CGA

GGAAGGCTCA

ACGGAATGCAT

T

419-420

cg29234854

193

GACGCTGTGC

A

G

SILENT-

CGTGGGATTTC

NONCODING

CTCA[A/G]CGA

GGCTCAAGAG

AGGCACGCGT

CG

421-422

cg29235319

75

CACACACACAC

G

A

SILENT-

ACACACACACA

NONCODING

CAC[G/A]CACG

CACGCACGCA

CGCACGCAAT

G

423-424

cg29242513

250

TAACGGTTGAG

A

C

SILENT-

TAACACATCAA

NONCODING

AAC[A/C]CCGTT

CGAGGTCAAG

CCTGGCGTGT

425-426

cg29254804

134

TACTTCATTTTT

A

G

SILENT-

TTTCCTATTTG

NONCODING

CA[A/G]CAACCT

GTAATGAGTAA

CTGTATTA

427-428

cg29345947

298

TAGTGACAGGC

A

G

SILENT-

GCAATGCACAC

NONCODING

CGA[A/G]CGGG

CGCCAACAGA

GCAGCCACGC

T

429-430

cg29352964

401

CGTCTTGCCCA

C

T

SILENT-

TATTGACGCCC

NONCODING

CGA[C/T]GCTG

CTGTCGGTGTG

GGGGAGTGAC

431-432

cg29357657

392

CATGTGAGGA

C

T

SILENT-

GGCCGGCCAT

NONCODING

GAGGT[C/T]GT

CGTGCTGGAC

GACCTATCCGC

GG

433-434

cg29360558

158

CGTTTATTTAT

T

C

SILENT-

GCTTTTGGTTG

NONCODING

GTT[T/C]TCCTT

TGATAAATGCG

GCCCTTGCT

435-436

cg29522548

137

CCATAATCATT

A

G

SILENT-

TCCAACTCTTT

NONCODING

CAA[A/G]GTTTT

TTTAAATTTCAG

CTCAAAAT

437-438

cg30177683

471

GGCTCTATATC

C

T

SILENT-

ATTGAAAACGA

NONCODING

ATT[C/T]TCCAC

GCAAAACCCAC

TTCACACCA

439-440

cg30377599

144

TGAGGCATCTA

G

T

SILENT-

AATTTTCACAT

NONCODING

CCT[G/T]CCTGT

GGAGCAGCAA

GCTGAAGAAA

441-442

cg30790712

273

CAGCACCGTG

T

C

SILENT-

GGGTCCAGGG

NONCODING

TCCAC[T/C]GTC

CACCAGGACCT

ACTGCGTGGG

G

443-444

cg32119538

286

TTCCCTAAATC

A

C

SILENT-

CAAAGCGAGC

NONCODING

AAAC[A/C]GGA

GAGAGAAACC

CTGAAAATGGG

C

445-446

cg32120712

608

CTCAACAGGTG

G

gap

SILENT-

GTGCACTGGG

NONCODING

ACCG[G/gap]CA

GCCGCCCGGG

GTCCCGCACG

ACC

447-448

cg32128189

81

TCAGCTTTATT

C

T

SILENT-

ATAATCTTATG

NONCODING

GGA[C/T]CATCA

TCATGTATGTG

GTCCACTGG

449-450

cg32153241

106

CGGTGCGGCA

C

T

SILENT-

CCATTCCACCG

NONCODING

CGAT[C/T]GAC

CCGGCTCCGG

TCCCGAGGTC

CC

451-452

cg32153241

121

CCACCGCGAT

C

A

SILENT-

CGACCCGGCT

NONCODING

CCGGT[C/A]CC

GAGGTCCCAC

AGCAGTTGACC

AG

453-454

cg32153241

175

TGGGCCGCAG

A

G

SILENT-

GGCTGCCAGC

NONCODING

GCGAC[A/G]GC

TCGTACCGCGT

GCTTGGTGATA

A

455-456

cg32153241

348

AACGCGGGGT

T

C

SILENT-

GGGGGAGCCG

NONCODING

AAGCC[T/C]GT

GTGACACAATC

AAGGGGACTC

GC

457-458

cg32153241

44

GGTACGTGCG

A

G

SILENT-

TTGGCGGCCC

NONCODING

GCTCA[A/G]CC

TTGCGTTCTAG

CCCGATCGCC

CG

459-460

cg32153241

55

TGGCGGCCCG

T

C

SILENT-

CTCAACCTTGC

NONCODING

GTTC[T/C]AGCC

CGATCGCCCG

ACAGGTCGGG

T

461-462

cg32153241

76

GTTCTAGCCCG

C

T

SILENT-

ATCGCCCGACA

NONCODING

GGT[C/T]GGGT

CGGTGCGGCA

CCATTCCACCG

463-464

cg32153241

82

GCCCGATCGC

G

A

SILENT-

CCGACAGGTC

NONCODING

GGGTC[G/A]GT

GCGGCACCATT

CCACCGCGAT

CG

465-466

cg32168828

367

TAGGGCAACG

T

C

SILENT-

CCAAGTTCGAA

NONCODING

GACG[T/C]CCC

CCTGTGCTTTT

CCGCCTCACTC

467-468

cg32177197

596

TGACGAGAGTC

G

A

SILENT-

CCCTGGGACG

NONCODING

AGGG[G/A]AAG

GAATGGAAAGC

GGTGGGGTCG

T

469-470

cg32177197

716

ATGTACCGTCC

A

G

SILENT-

GTCCCCTTCCA

NONCODING

CAT[A/G]ATCTG

GAGGCCGTAC

CAATCGGGTG

471-472

cg32180618

404

TGTGTGTGAAA

C

T

SILENT-

ATCAGCACGGT

NONCODING

GCG[C/T]GTGA

GGGGCGGGCG

CGCTTCTCACA

473-474

cg33206207

74

CAGGGATGAC

G

T

SILENT-

GCCGCCATGA

NONCODING

GTTGG[G/T]TG

ACGTGGGCCT

GCCGACTGTCT

CC

475-476

cg34715517

291

TGATGCCTCAG

A

gap

SILENT-

CAAAAAATTGT

NONCODING

GCT[A/gap]AAA

AAAAAAACTGG

AAGAAAAGTAC

477-478

cg35050153

456

CGACTTGTTCA

C

T

SILENT-

GCACCAGGAG

NONCODING

GAGG[C/T]GCT

GGCTGCTGTCA

CTGGGGCTCT

G

479-480

cg35066497

449

TGTCGATCTGA

G

A

SILENT-

TCGGAGAACTT

NONCODING

GCC[G/A]CCGG

TCTTGTCGTCG

ACAGTGTTGC

481-482

cg35066497

469

TTGCCGCCGG

T

C

SILENT-

TCTTGTCGTCG

NONCODING

ACAG[T/C]GTTG

CCAGCCTTGTC

GATGCCCTGC

483-484

cg35068462

289

GAGGACTCGG

C

T

SILENT-

CCTGACGACG

NONCODING

GTCAC[C/T]GTC

ATTCATGACCT

CGACTTGGCTG

485-486

cg35341776

413

TCAATTGCTTTT

A

G

SILENT-

TGTCCGACATC

NONCODING

TC[A/G]GACACT

CTCTTCACCAT

GACTCAGT

487-488

cg36508718

517

TGGAGCCCGG

C

T

SILENT-

CGACAACCTTG

NONCODING

ACAT[C/T]ACCG

TGCATAGCGCC

CTCAACGATG

489-490

cg36517624

234

GAACTGGCCC

G

A

SILENT-

AGCCAAACTCT

NONCODING

TCAA[G/A]CTGC

TGCCTAAAGCC

TGGGTTGGGG

491-492

cg36517624

307

TGATGGCTTCA

G

A

SILENT-

AGCACGTCCC

NONCODING

GCCA[G/A]CCT

AGCCCCGTCA

CAGTCATCACA

T

493-494

cg36517624

453

CATTCTTTGAA

G

A

SILENT-

GTGCTTTTTGA

NONCODING

TGG[G/A]TACCT

CAGGGGTATCA

GCGACCGGG

495-496

cg36517624

475

TGGGTACCTCA

C

T

SILENT-

GGGGTATCAG

NONCODING

CGAC[C/T]GGG

ATGCGAAGGTA

GGTGATATCCT

497-498

cg36618790

360

GGCGAAGCAC

C

T

SILENT-

CTGAGGTCAG

NONCODING

GAGTT[C/T]GA

GACCAGCCTG

GCCAACATGAC

AA

499-500

cg37003369

306

TCAGTTGATTT

gap

A

SILENT-

AAGGAATAAAA

NONCODING

AAA[gap/A]GAC

CATTTTGCTAA

ACACTATTAAA

501-502

cg37003369

404

TGTGACCTGTG

G

A

SILENT-

TTCATAGCTAA

NONCODING

CAT[G/A]AGCTC

TGACCTCCCTA

CGCCGGGCG

503-504

cg37026709

344

GGCCGTGTGC

C

G

SILENT-

TGGTACCAGG

NONCODING

GATAC[C/G]CA

GGAGCTCAGC

AGATTTTGGCC

TC

505-506

cg38206730

335

AGGCGTACAC

G

A

SILENT-

GTGCAGGTGT

NONCODING

GTTAC[G/A]TGT

TCATTTTCGGC

TCAAGGCGTAC

507-508

cg38278821

212

CGACGGTACT

A

G

SILENT-

GGTTGCTCCTC

NONCODING

GTAT[A/G]GAAA

GCGGTGAATG

CGAATGCAATG

509-510

cg38403377

1024

CGACGATCTCC

A

G

SILENT-

CCGCGGTGTC

NONCODING

GTCC[A/G]TAG

GCGATACCGC

GAGCATTGACG

A

511-512

cg38403377

1084

TCCCCGAACC

A

G

SILENT-

GTTGACGCCG

NONCODING

AGAAG[A/G]AC

ATTGTTGTTGT

AAATCTTCTTG

A

513-514

cg38403377

1171

TCGTTGCGTGC

A

G

SILENT-

CGTGAGCGAT

NONCODING

CCGG[A/G]CGT

TGCACCGGGT

CATCCTGCGGT

G

515-516

cg38403377

934

CCTCGGTGTGA

C

T

SILENT-

GTGGCGTTCGT

NONCODING

TAG[C/T]AGTGA

TGCGATGGCG

GTCGTGCGAT

517-518

cg38446357

292

TTTCCTCGCTG

C

T

SILENT-

GGTATCTCCGC

NONCODING

GAC[C/T]CCTC

GGGCCCGTAG

ACCGTCCTCGA

519-520

cg38446677

312

CACCTCCCATC

G

C

SILENT-

GGAATGACGTA

NONCODING

CGT[G/C]GTCA

GAGACGATCC

GACCGTCGTG

C

521-522

cg38446677

439

TCGTCGCCGAA

G

C

SILENT-

AGCAGAATGG

NONCODING

CCAT[G/C]ACTT

CGACGGCGGT

GGCCGAGTCG

A

523-524

cg38446677

447

GAAAGCAGAAT

C

T

SILENT-

GGCCATGACTT

NONCODING

CGA[C/T]GGCG

GTGGCCGAGT

CGAGGGCTCC

G

525-526

cg38869031

345

ACATTGATTGT

gap

T

SILENT-

TCACATTTTTTT

NONCODING

TT[gap/T]CTCTT

CTCAATTTCCC

TTGATTATA

527-528

cg38925867

196

GCAACGGAGA

T

C

SILENT-

TGACTCACAAG

NONCODING

CTCG[T/C]TACG

AGGCGGTAAG

GCTCACCCGC

G

529-530

cg38925867

306

AAGGCCAGTA

G

C

SILENT-

GCACTCGTGCA

NONCODING

GTTG[G/C]GAC

GATGAGACGTT

GGCCTCGCGG

C

531-532

cg39373569

271

AACATCTTGAA

G

A

SILENT-

AATACACAAGT

NONCODING

GGT[G/A]CAAA

GATGTGTCACG

TTCTGGACCT

533-534

cg39380084

120

CCTCGTTGCCT

T

C

SILENT-

TATCTCCAGAT

NONCODING

TCC[T/C]CAATT

TCTGTGAAACG

TAAACATTA

535-536

cg39380084

130

TTATCTCCAGA

T

G

SILENT-

TTCCTCAATTT

NONCODING

CTG[T/G]GAAA

CGTAAACATTA

TGGGAATAGT

537-538

cg39402442

172

TTCTTCTCTGC

T

C

SILENT-

CATTCCTGGAG

NONCODING

ATT[T/C]TGAAA

AGAGTTGGTAA

TGTGTTTCA

539-540

cg39404391

113

TTAGCCCAACA

A

G

SILENT-

GCCTGGCACA

NONCODING

AAGG[A/G]CAA

CAATGAGGAGA

GGAAAGGGGA

G

541-542

cg39404391

48

GATTCCTACAC

C

T

SILENT-

TATCCCCAAAA

NONCODING

TGG[C/T]AGAG

CTGGGCTCTCC

CTGCAGTGGC

543-544

cg39435025

186

AAACTTTAACG

A

T

SILENT-

TGTTATATCATT

NONCODING

CA[A/T]GGCGT

AACTTATACGC

GCGGGGTAC

545-546

cg39485034

346

CCGCCCAGGA

A

G

SILENT-

ATTCGTGAGTT

NONCODING

TCCA[A/G]GTTG

CTGAGCCATTG

CCCGGATTCC

547-548

cg39485034

400

AGTACAGGCG

A

G

SILENT-

ATCGCTGCCAC

NONCODING

CGTT[A/G]GAC

CATGGGAGCA

GGTAGGACCC

GC

549-550

cg39485034

406

GGCGATCGCT

T

G

SILENT-

GCCACCGTTAG

NONCODING

ACCAT[T/G]GGG

AGCAGGTAGG

ACCCGCCTTCC

T

551-552

cg39485034

508

GACCCGAGAT

A

G

SILENT-

GGACTTCTTCG

NONCODING

ACGA[A/G]GTT

GCCACGGCTC

CGTTAGCGAAA

G

553-554

cg39485034

519

GACTTCTTCGA

C

T

SILENT-

CGAAGTTGCCA

NONCODING

CGG[C/T]TCCG

TTAGCGAAAGC

GATACGGCCG

555-556

cg39515553

205

CGCACTAAACC

G

gap

SILENT-

TTAAAAATGCG

NONCODING

AGA[G/gap]CGC

ATGCACGGCG

GACGTCGTGG

AA

557-558

cg39515553

88

TGTGTCCGGA

A

G

SILENT-

GAAACCTCGTG

NONCODING

CGGA[A/G]AAC

AGCGCAAACC

GCAAAAACCCC

G

559-560

cg39516001

168

CGAAATGGAAA

A

G

SILENT-

ATCGTCAATGA

NONCODING

AGA[A/G]CCGA

AATACGATGCT

AAAGTTATTC

561-562

cg39517070

164

GCCGCCCAAA

C

T

SILENT-

CAACGCTCACC

NONCODING

GTTA[C/T]ACGC

CGCAATCGAG

GCGTCTCAACC

563-564

cg39517070

293

TGTTTGCGCTC

C

T

SILENT-

CACAGGGGAC

NONCODING

CGCA[C/T]CGC

CACTCCATCCA

CAGCGCAAACA

565-566

cg39517875

406

GCTGACGGAT

A

G

SILENT-

GAACGTCTCG

NONCODING

GACAC[A/G]GC

GGACACCTCA

GTGAATCTCCC

TT

567-568

cg39517875

426

GACACAGCGG

T

G

SILENT-

ACACCTCAGTG

NONCODING

AATC[T/G]CCCT

TTGAGTAGATA

CTGGACGAGA

569-570

cg39523703

182

GCACAGGAGA

C

T

SILENT-

GCGGCCGAGA

NONCODING

CCGGG[C/T]GC

AGCCCCTCCG

ACATCATGCGC

AC

571-572

cg39524728

343

ATATGCATATG

C

T

SILENT-

TATGCACTCAT

NONCODING

ACA[C/T]TCATA

CATATGTGCCC

CCTCAGAGA

573-574

cg39527111

101

GGCGACATGG

C

T

SILENT-

GCGTCCCCAC

NONCODING

GGCCC[C/T]GG

AGGCCGGCAG

CTGGCGCTGG

GGA

575-576

cg39530245

307

GACATTCTCAT

T

A

SILENT-

TCAGAGCGGA

NONCODING

GAAA[T/A]TTCA

GCAGATTTCAA

GGAGCCGCCA

577-578

cg39530249

729

ACAAGCCTTCA

C

gap

SILENT-

CTTTCTTTTCTT

NONCODING

TT[C/gap]TTTTT

TTTTTATCTAAC

AACTGAAG

579-580

cg39535150

137

CGGCTCTCCTG

A

G

SILENT-

GATGTGCCCC

NONCODING

CGCA[A/G]CAA

TGCCAGGTAAG

CCTTGGTCACG

581-582

cg39536028

735

CTGGCACACA

G

A

SILENT-

GTATCCCAACC

NONCODING

ATGG[G/A]TTTA

GTGTCCACCAG

ACTTAAAGGA

583-584

cg39543172

580

ATGTGTCTCCC

A

T

SILENT-

ACACTGGCCG

NONCODING

CTGC[A/T]CAAG

CTGAGAAGCTG

GGACGGCCCG

585-586

cg39545648

532

GCAATTTTACT

C

T

SILENT-

CTACAGCTGAG

NONCODING

ACA[C/T]TGCCA

AAGAGTCCAGA

ATTGTGAAG

587-588

cg39547799

846

AAAGTTGGAGA

C

T

SILENT-

AACAGAAACCA

NONCODING

AGG[C/T]GAGG

TGGTCCTTGGT

TAAGTCTGCA

589-590

cg39550340

579

TTTTTTTAAATA

A

G

SILENT-

ACATCGTTGAT

NONCODING

TA[A/G]AACAAT

CCTATTCACTG

CAGTCACA

591-592

cg39568672

164

GCTGGGTGGC

C

T

SILENT-

TGACCAGCGCT

NONCODING

TTGG[C/T]CAGT

CAGGGGTTCG

GTGGAATGTTC

593-594

cg39568672

248

TACCACGGCAT

C

T

SILENT-

CATGGTCGCTT

NONCODING

TCG[C/T]GCTC

GTTGGGTACG

GATGGCTTGC

G

595-596

cg39568672

398

TCGGGTATCAG

A

G

SILENT-

GCCGTTGACAT

NONCODING

GGC[A/G]CCCG

CTTGTTATCGA

TTCTCTCATC

597-598

cg39570661

415

ACGAGGGGAG

G

A

SILENT-

CAAGCACGAG

NONCODING

CCGGG[G/A]AG

AGAGCTCTGC

GCTCGCACAC

GGG

599-600

cg39575840

320

CCCATGGTCCT

C

T

SILENT-

CCCCATGTAAA

NONCODING

GAG[C/T]TCTG

GCCAATCAACA

AGGAGTGGAC

601-602

cg39575840

361

AGGAGTGGAC

C

G

SILENT-

AGCTCATACAA

NONCODING

GGAC[C/G]ACC

AAGTGGCCAAC

AAACATAAAGC

603-604

cg39602316

178

TAGCAGGAGG

A

G

SILENT-

AAGCTGATGAA

NONCODING

TTGA[A/G]GTCC

GATATAGGCAG

TTTGTGCTCC

605-606

cg39704218

348

CAAACTCCTGG

C

T

SILENT-

GCTCAAGCGAT

NONCODING

CCT[C/T]CAACC

CCGGCCTCCC

AAAGTGCTGG

607-608

cg41085637

479

AAGACGGCCAT

A

G

SILENT-

GAGGAGGCGA

NONCODING

TGGA[A/G]ACG

GAGGCCAGCA

CATCAGGGGA

GG

609-610

cg41591473

190

AAATTGACATT

G

A

SILENT-

TAAGTGGACCT

NONCODING

GCC[G/A]TATTT

GTATTTGCTAA

ATCTGGCCA

611-612

cg41592212

65

ACGCGTGAGC

G

T

SILENT-

CACCATGCCC

NONCODING

GACGT[G/T]AA

GACAGGAATTT

ATACCCATGGA

G

613-614

cg41618657

555

CAATTGGCTGT

C

T

SILENT-

CCTATTTACAC

NONCODING

TTA[C/T]GTGTC

ATGTTAAAATA

ATCATTTCT

615-616

cg42267484

619

GAACATCCCAT

A

G

SILENT-

GCAAAAGACTT

NONCODING

TTC[A/G]AAGG

GAAGGGCCTG

GTTTGAGAATG

617-618

cg42312996

675

TCACCTCCTTC

T

G

SILENT-

CCTTTATTCTA

NONCODING

CCG[T/G]CCCA

AGGGCCTGAG

ATTGGGCGACT

619-620

cg42322469

385

TCCTAAAACCA

A

T

SILENT-

TTAGTATCTAC

NONCODING

TAA[A/T]TTGAC

GCTGAAATTTT

GTATTTTTG

621-622

cg42327033

94

GGCTTCTCAGG

G

T

SILENT-

GGTCAGGTGC

NONCODING

ATTT[G/T]GGCA

GATGCGCTTGA

GTGGGGGGGC

623-624

cg42462046

142

GGCCACAGCG

C

T

SILENT-

CCCTGCCCCA

NONCODING

GAGAA[C/T]GG

CGGGTGGGCT

GGGTCCGGCT

GCG

625-626

cg42462046

98

CCTGCCGCGC

C

T

SILENT-

GCGGCGGGGC

NONCODING

TCCTC[C/T]TCG

CTGTGGGAAAA

GTGGGGCCAC

A

627-628

cg42468895

296

GCCTGGGCAA

G

gap

SILENT-

CAGAGCAGCA

NONCODING

AGACT[G/gap]T

CTTTACACTCG

GGGTGAGTAG

TCG

629-630

cg42518152

110

GACACACATGC

C

G

SILENT-

ACACGGTTTCA

NONCODING

CCA[C/G]CACG

GCTTCTCTCCA

GCCTTCTCTT

631-632

cg42534385

391

GCCGCCTACC

G

gap

SILENT-

ACAAGTCGGTG

NONCODING

TGGC[G/gap]GG

GGGTGGAGGC

CAAGCTGCACC

TG

633-634

cg42534385

396

TACCACAAGTC

gap

G

SILENT-

GGTGTGGCGG

NONCODING

GGGG[gap/G]TG

GAGGCCAAGC

TGCACCTGCG

CCG

635-636

cg42560726

616

CGCCTGTAATC

A

G

SILENT-

TCAGCACTCTG

NONCODING

GGA[A/G]GTCA

AGGCAGGTGG

ATCACTTGAGC

637-638

cg42656733

325

GGGACCACGA

T

C

SILENT-

TGGACTGAGC

NONCODING

CAGCT[T/C]TGC

CCGCCCGCCC

CCGCGCCCAG

GG

639-640

cg42658258

510

GGCCTGCAGA

C

gap

SILENT-

CTCCGGGCCC

NONCODING

AGGGC[C/gap]A

CCGGCCTCTC

CTACCTGCTCC

TGC

641-642

cg42673467

656

ATAGACCAACA

C

T

SILENT-

ATCATGTATCC

NONCODING

TGC[C/T]ACTTG

GGATGCCAGC

ACCCATGCCA

643-644

cg42691712

697

CCGAGCAGTG

C

A

SILENT-

GCCGCGTGCA

NONCODING

GGAGT[C/A]CA

GAGTGGAGCC

GTGACTCACAA

TT

645-646

cg42705180

89

CTTTGGCAAAT

G

T

SILENT-

TGGGGACTGA

NONCODING

AGAC[G/T]GGA

AGGGTGGAGA

GTAGGCGGAA

CC

647-648

cg42705180

96

AAATTGGGGAC

G

A

SILENT-

TGAAGACGGG

NONCODING

AAGG[G/A]TGG

AGAGTAGGCG

GAACCAGGTG

GT

649-650

cg42718789

107

GAATTCAACAA

A

G

SILENT-

CACTATAGAGT

NONCODING

CAA[A/G]AGGA

AACGAGTCGA

GTGAAACCAGT

651-652

cg42719781

228

GATTCTGATTT

G

A

SILENT-

TAGTGATGATG

NONCODING

AAC[G/A]CTGT

GGAGAATCCA

GCAAAAGGAAA

653-654

cg42848362

34

CATGCAGGCG

C

T

SILENT-

CGCCTGTAGTC

NONCODING

CCAG[C/T]TACT

CGGGAGGCTG

ACGCAGGAGA

A

655-656

cg42848627

156

TCCTCCATCAC

A

G

SILENT-

CAGGCTGTCTA

NONCODING

ACG[A/G]GGCT

GAAGAAGTACC

ATCCATGAGT

657-658

cg42895723

781

CAGGGTGCCA

T

G

SILENT-

GGCACTTCTTT

NONCODING

AATG[T/G]GTTC

TTTCTTTATGTG

ATTATTTGA

659-660

cg42910590

365

GCCCGGGACG

T

C

SILENT-

AGGGAGAATCT

NONCODING

GCAG[T/C]AGC

TGAGGACCCC

ACATGGGGTG

AG

661-662

cg42913480

419

ATGTGGGGAAA

A

G

SILENT-

AGCAAGAGAG

NONCODING

ATCA[A/G]ATTG

TTACTGTGTCT

GTGTAGAAAG

663-664

cg42919036

1141

GAGGCAGATG

C

T

SILENT-

ATTCCCAAGAG

NONCODING

AACT[C/T]ACCA

AATCAAGACAA

ATGTCCTAGA

665-666

cg42919304

313

GAGGAAGGCA

G

A

SILENT-

AACAGAAAGGC

NONCODING

AAGG[G/A]CAG

CAAACCTTTAA

TGCCTACCTCC

667-668

cg42922781

404

GAGGCGGGTT

G

A

SILENT-

9

AGTGCCCATG

NONCODING

GATCC[G/A]GT

GTCTGGGAAG

GGGCCCACAG

AAG

669-670

cg42924993

364

TTTTAGGCCAG

A

G

SILENT-

GTGCAGTGGC

NONCODING

TCAC[A/G]CCCT

TAATCCCAGCA

CTTTGGGAGG

671-672

cg42924993

450

AGACCAGCCT

A

G

SILENT-

GGCCAACATG

NONCODING

GTGAA[A/G]CC

CCGTCTCTACT

AAAAATACAAA

A

673-674

cg42925336

402

TCATGAGGAAG

G

T

SILENT-

GCCAGGACAA

NONCODING

GTGT[G/T]GCA

GAGCGGCTTA

CCCCCATGGC

AC

675-676

cg42943021

16

GCGCGCCAGG

C

T

SILENT-

ACGCC[C/T]GG

NONCODING

CTGTTTTGTATT

TTTAGTAGAGA

677-678

cg42943021

23

GCGCGCCAGG

T

C

SILENT-

ACGCCCGGCT

NONCODING

GT[T/C]TTGTAT

TTTTAGTAGAG

ACAGGGTT

679-680

cg42943021

27

CGCGCCAGGA

T

C

SILENT-

CGCCCGGCTG

NONCODING

TTTTG[T/C]ATT

TTTAGTAGAGA

CAGGGTTTTGA

681-682

cg42943021

68

AGGGTTTTGAG

T

C

SILENT-

TGATCTGTCCA

NONCODING

CCT[T/C]AGCCT

CCCAAAGTGCT

GGGATTACA

683-684

cg42943021

90

CCTTAGCCTCC

T

C

SILENT-

CAAAGTGCTGG

NONCODING

GAT[T/C]ACAG

GCATGAGCCA

CTGTGCCCCG

C

685-686

cg43008177

233

ACATCAAATTA

C

A

SILENT-

GCAATTACCAT

NONCODING

AGA[C/A]ATGTA

TTTCATTGAATA

AATAGCTT

687-688

cg43008177

280

CTTTTGTTTGTT

gap

G

SILENT-

TGTTTGTTTGTT

NONCODING

T[gap/G]CAGGG

AAATTTAGAAC

AATTATTAG

689-690

cg43008177

313

ATTTAGAACAA

A

T

SILENT-

TTATTAGATGTT

NONCODING

AT[A/T]GTGCCT

CTTCTCGTGTT

GATACGTG

691-692

cg43040173

591

GGGAAACCCC

C

gap

SILENT-

AGGGGAGGCG

NONCODING

GAGGC[C/gap]A

GCGGGGATTT

CTGAAGCCAAG

TGG

693-694

cg43054295

281

GAGGCTGGCG

C

G

SILENT-

X

GGCTAGGGCT

NONCODING

GAGTG[C/G]AG

CGCCTGCTTAG

AGACCTTCGG

GA

695-696

cg43054295

317

TAGAGACCTTC

C

T

SILENT-

X

GGGAGAACTTC

NONCODING

TGC[C/T]GGAA

CCCCGACGGC

TCAGAGGCGC

C

697-698

cg43054295

355

GCTCAGAGGC

A

C

SILENT-

X

GCCCTGGTGC

NONCODING

TTCAC[A/C]CTG

CGGCCCGGCA

CGCGCGTGGG

CT

699-700

cg43060167

673

TTGCCCCCCG

C

T

SILENT-

4

CCAACCTACTC

NONCODING

AACC[C/T]CTTC

CAGATAAAGAC

AGTGGGCACT

701-702

cg43076634

27

GCGCGCCTAC

T

G

SILENT-

CACGTCAAGCT

NONCODING

AATT[T/G]TTTG

TATTTTAGTAG

AGGTGGGGTT

703-704

cg43089031

202

AAATGGGCCA

A

G

SILENT-

GGCGCGGTGA

NONCODING

CTCAC[A/G]CCT

ATAATCCCAGC

ACTTTGGGAGG

705-706

cg43089031

244

TTTGGGAGGC

A

G

SILENT-

CGAGGAGGGT

NONCODING

GGATC[A/G]CC

TGAGGTCAGG

AATTCCAGACC

AG

707-708

cg43149120

317

CCAAGAAGAC

T

C

SILENT-

17

GCTGGAGGGA

NONCODING

GGCTG[T/C]TA

GGAGGGACTC

TGAGCTTCACA

CC

709-710

cg43256880

128

GGGAATCTTCT

A

G

SILENT-

CTTTGACGTAT

NONCODING

GGG[A/G]AGCC

TCAGAAAGACA

TTTTCCTAAT

711-712

cg43256880

191

GAATGGCCCTA

A

G

SILENT-

TGATGTTTCCT

NONCODING

TCG[A/G]AACT

GGTACTGCTCA

GCCCTGATCA

713-714

cg43261262

805

GTGTTTGGATT

G

A

SILENT-

16

TGATCATGGAT

NONCODING

GTA[G/A]CATAC

ACCAAAATCCA

CCGAGACCT

715-716

cg43276309

1051

TTGGCTTGGGG

C

T

SILENT-

19

GGTCCACAGT

NONCODING

(19q13.3)

GAGG[C/T]AGA

TGCTGGGCGT

GAAGAATCTGC

T

717-718

cg43276309

1149

GGCCTGGAGG

G

T

SILENT-

19

GGCCACCAAG

NONCODING

(19q13.3)

ATGCA[G/T]GA

GCTGGGCCTG

GAGAGGCTGC

AAA

719-720

cg43276380

686

GTCCACTGTGA

G

C

SILENT-

GGCAGAGGCT

NONCODING

GGGC[G/C]TGA

AGAATCTGCTG

TGAGGCAGAT

G

721-722

cg43304080

347

GCCGGGCAGA

G

A

SILENT-

GTGGAGCAGC

NONCODING

TTGGG[G/A]CC

GTGCCCAGGG

CGGTGGCTGT

GAG

723-724

cg43323676

303

GGTGTGGTGT

G

A

SILENT-

GGATTGTAGCT

NONCODING

TCCC[G/A]AAAC

TCATGGCGCCT

CCCCTCGGAC

725-726

cg43326623

188

GGATAACCAGA

C

T

SILENT-

ATTATCACAGC

NONCODING

ACC[C/T]TCTCA

TTCCCAGCGC

GTCCTTCTGA

727-728

cg43328092

168

TTGCTGTGTAA

A

G

SILENT-

8

GAATAGGTCCC

NONCODING

TCA[A/G]CATGA

AGATGTGTCTG

CGTCTGAGC

729-730

cg43328259

499

AGCCTTCAGG

C

T

SILENT-

GAGCGTGGAG

NONCODING

ACAGG[C/T]TTT

GAAGACAAGAT

TCCCAAAAGGA

731-732

cg43328259

771

ACGGCCATGG

T

C

SILENT-

AGACTGCAATT

NONCODING

CCAT[T/C]CAGA

TCACGTTTCTT

CAATCTGAAG

733-734

cg43336005

591

GATGTTTGGCT

T

G

SILENT-

22

GCACGCGAGC

NONCODING

CCAC[T/G]CGG

GACAGACCCAA

GAACACGAATT

735-736

cg43916688

3263

GCGGCGAGAG

G

gap

SILENT-

11

CGGCTCCTCTG

NONCODING

CGCA[G/gap]CC

GGCGCCGGCT

CCGCTTCCCCT

TC

737-738

cg43917746

251

TGGATAGAAGT

T

C

SILENT-

20

GCTTCACTAAT

NONCODING

TGC[T/C]TTATT

TAAGCATACAA

GAAAAAAAG

739-740

cg43917746

471

CAAGCATGGC

A

G

SILENT-

20

CCAGCCAGCA

NONCODING

CCACC[A/G]CC

CCCAAAACGAA

CAAAACAAGAG

A

741-742

cg43918370

648

TCACCAGGAAA

A

G

SILENT-

GCCTGCCTCCT

NONCODING

CCG[A/G]GGAC

CTGCCCGCCT

CCGGGAGCAG

C

743-744

cg43919788

1419

AGGGACATAGA

C

G

SILENT-

ACCAAGCCCCA

NONCODING

GGG[C/G]TGCC

CAGCTACACGA

CCGCCGCTGG

745-746

cg43919798

780

ACTCACCTTAT

C

G

SILENT-

TCTTCATTTCC

NONCODING

CCT[C/G]GTGA

ATCCTCCAGGC

CTTTCTCTAC

747-748

cg43921083

318

CAGGCCGGTG

T

C

SILENT-

17

ACCCCCCATG

NONCODING

GAGCC[T/C]CA

CATGGCGAAG

AGGATGAGGA

AGG

749-750

cg43921103

471

CAGGGAGACG

gap

A

SILENT-

CCAGCATTAAA

NONCODING

AAAA[gap/A]GA

GAGATGTGTTT

ATTCCATGATC

A

751-752

cg43921107

386

GGGTGCTGGC

C

T

SILENT-

CTTCCTGGCCT

NONCODING

CGGC[C/T]TTCT

TCTTGGTGGTC

GACGCGTATT

753-754

cg43921619

466

AGAGCTGGAG

G

T

SILENT-

17

GAGGTATTTGT

NONCODING

GAAG[G/T]AGC

AGGGAGAAGA

GGAGCTGCTG

AG

755-756

cg43925523

173

GGATTCTTGGC

G

A

SILENT-

TGAAGAATTCC

NONCODING

TCT[G/A]GAAAT

ATTTCCCAGGC

CAAAGGTGG

757-758

cg43926586

75

ACACCATGCTG

G

gap

SILENT-

GCCGTGGTGC

NONCODING

GAAA[G/gap]CC

TCCCGAGTCCA

TACAGATACCA

C

759-760

cg43926872

697

TTCTGCGGCC

T

G

SILENT-

19

GCATCCTGAGC

NONCODING

ATGG[T/G]GAA

CACAGATGATG

TCAACGCCATC

761-762

cg43927587

4712

GCGGGATGTTT

G

A

SILENT-

2

CTTGGGGGCA

NONCODING

GCTC[G/A]GGT

GGAGACACGA

CACTTTCTACT

G

763-764

cg43927587

4793

CTTCCTCCAGC

C

T

SILENT-

2

GGCACAGGGT

NONCODING

ATTG[C/T]AGAT

GGGTCACCAG

CCCCATGTTTT

765-766

cg43927733

274

CTACACAGGAA

A

G

SILENT-

GCTGGAAAGAT

NONCODING

GAC[A/G]AGAT

GAATGGTTTTG

GAAGACTTGA

767-768

cg43927893

1071

AGCATTGGTGC

T

C

SILENT-

CCTCATGGCTC

NONCODING

ATG[T/C]GGAC

GCAGTAATCCG

CCACTGTGCA

769-770

cg43927893

1329

GCCAACATCGA

G

A

SILENT-

GCCACTCTTTG

NONCODING

ACC[G/A]GTTG

CTCATTTTCTG

GTCTGACCGG

771-772

cg43929036

299

TGGGCAGCCG

G

gap

SILENT-

ACCCCTTGCAG

NONCODING

CGCT[G/gap]GC

CAGCGGCCGC

CACCACCACAC

CG

773-774

cg43929036

300

GGGCAGCCGA

G

gap

SILENT-

CCCCTTGCAGC

NONCODING

GCTG[G/gap]CC

AGCGGCCGCC

ACCACCACACC

GC

775-776

cg43929139

2789

TAATCTGTTCA

A

G

SILENT-

ACCCGAGGTCT

NONCODING

TTG[A/G]AAACG

AAGATCAAAAC

AATAATGAA

777-778

cg43929139

2828

AACAATAATGA

C

T

SILENT-

ACCAGAGAGC

NONCODING

GAAT[C/T]TTGG

GATGATTTCTA

TCATTGCACA

779-780

cg43929139

2834

AATGAACCAGA

A

G

SILENT-

GAGCGAATCTT

NONCODING

GGG[A/G]TGAT

TTCTATCATTG

CACATACAAA

781-782

cg43929139

2861

GATTTCTATCA

A

G

SILENT-

TTGCACATACA

NONCODING

AAT[A/G]ATGG

GAATTTTAGTA

TGTTTTATCA

783-784

cg43929652

984

CACTTGTTAGG

G

A

SILENT-

17

CTCTTGTCAGC

NONCODING

ATT[G/A]ATAAC

TGGCATGTTTT

ATTGCAGCC

785-786

cg43929990

1931

GTGTTAACTTG

C

T

SILENT-

10

AAGAGATTCAA

NONCODING

CTT[C/T]TTCTT

TGACTGGTACT

TCCGCCCTA

787-788

cg43931352

111

GCGTCGTTCCT

T

C

SILENT-

CCGGTCCATCT

NONCODING

CGC[T/C]CATG

CTCAGGGCGG

TGGCAAAGGG

G

789-790

cg43931759

2018

AGCTGGGATGT

G

C

SILENT-

6

ACCTGGAGAG

NONCODING

ATAG[G/C]GGG

TAGTTCTCCCT

ACTGCCCAGG

C

791-792

cg43931795

1456

AGAATGGCAG

G

gap

SILENT-

7

GCGGACCGTG

NONCODING

GCGAA[G/gap]G

CTCTGCCCTGG

TTGAACATTTC

TG

793-794

cg43933469

689

TGTGCCAGGT

C

A

SILENT-

GCCCGTCTGA

NONCODING

GCTGG[C/A]TC

CATCATGACGC

GTCACTTTGTC

C

795-796

cg43934707

651

CGGTGGGACC

G

A

SILENT-

AGCGCCATCAC

NONCODING

CTCC[G/A]TAC

GGATGTTCTCC

CTCCGGAAGC

C

797-798

cg43934839

1383

CCAACCATGCA

A

G

SILENT-

13

TTAAGTTTAAC

NONCODING

CAA[A/G]AGCT

GCAATATTCCA

GATTCTTAAA

799-800

cg43934839

1547

GGTGCCTGTTT

C

T

SILENT-

13

ACTTCTGGTCT

NONCODING

GCG[C/T]GGGC

TCAGGTTTCAA

AGAGCTTGCT

801-802

cg43934938

4214

TGAGGAAACCT

C

T

SILENT-

16

AGGAAATCTCG

NONCODING

GTG[C/T]ACTAG

GAAGTGAATCC

CGCAGGACA

803-804

cg43934938

4231

TCTCGGTGCAC

C

T

SILENT-

16

TAGGAAGTGAA

NONCODING

TCC[C/T]GCAG

GACAGCTGCA

CTCAGGGATAC

805-806

cg43934938

4351

GATTGTCTTTC

A

T

SILENT-

16

TGCCACAGAAC

NONCODING

AGC[A/T]GCAG

ACGTGTCGGG

AGGTTAGCTGC

807-808

cg43934938

4391

AGGTTAGCTGC

A

C

SILENT-

16

GGAAAGAAATC

NONCODING

GGG[A/C]TGCC

GCGGAGCACA

GAGTGATTTGG

809-810

cg43934938

4395

TAGCTGCGGAA

C

T

SILENT-

16

AGAAATCGGGA

NONCODING

TGC[C/T]GCGG

AGCACAGAGT

GATTTGGAACT

811-812

cg43935748

1437

ATGCTATGGGT

C

T

SILENT-

14

ATCTGTTTCAG

NONCODING

AAG[C/T]TCTGT

TGGTATCTTGT

GGTGTCTGC

813-814

cg43935826

266

TCATTACGGTC

A

G

SILENT-

ACAATGACGAT

NONCODING

GTC[A/G]GAAA

CCATGCAATGA

AACCAATAAA

815-816

cg43936051

705

TCAGAAAAAAA

C

T

SILENT-

4

GCTATCCAGCT

NONCODING

TTT[C/T]GTGGA

ATCTGGTGAAG

TTTACACTT

817-818

cg43939976

1317

CCGTCGGTCCT

C

gap

SILENT-

6

GGCGTAGCGC

NONCODING

CTCC[C/gap]GT

GTCCGGGGTA

GATCTTGTACC

CG

819-820

cg43941070

1254

ATGTTCCCCGG

G

A

SILENT-

17

CCTGCGACCAA

NONCODING

GAC[G/A]CTTTT

TCCTGACTACT

TCTTCAACT

821-822

cg43941070

1278

CGCTTTTTCCT

C

T

SILENT-

17

GACTACTTCTT

NONCODING

CAA[C/T]TCTGA

CATAGGTTTTG

CTGATATAA

823-824

cg43941070

1284

TTCCTGACTAC

C

T

SILENT-

17

TTCTTCAACTC

NONCODING

TGA[C/T]ATAGG

TTTTGCTGATA

TAAACGCAA

825-826

cg43941070

1305

CTGACATAGGT

C

T

SILENT-

17

TTTGCTGATAT

NONCODING

AAA[C/T]GCAAA

CCCGGCTCTAT

ACCTACCAA

827-828

cg43942215

5435

AGCCAATATAG

C

T

SILENT-

12

GGCCTCGTCTC

NONCODING

ACT[C/T]AGGTG

TCAGTGCTGCA

GTATGGAAG

829-830

cg43942920

1632

TGCGTGGTGA

T

G

SILENT-

X

CGGGCAGTGA

NONCODING

GGACA[T/G]GT

GCGTGCACTTC

TTTTGATGTGGA

G

831-832

cg43942990

826

GGCTGTAGAAT

G

A

SILENT-

1

ACCTTCTCCTT

NONCODING

(1p11)

GAC[G/A]GGGT

ACAGCAGCTCC

ACATCCCTCT

833-834

cg43943163

881

CTGAGCTATAA

T

C

SILENT-

1

ACTTGTCATAG

NONCODING

ATT[T/C]GCTGT

GTCATCGCAG

GCTGCAACTG

835-836

cg43944408

1076

AGGAGGCAAT

gap

G

SILENT-

17

GAGATGATGG

NONCODING

GGTGA[gap/G]G

GAAACATGAAA

GTAACACTTGA

TT

837-838

cg43944446

906

ACTCTGATCGG

T

gap

SILENT-

16

TTATTATCCCC

NONCODING

TCA[T/gap]TTTT

TGTAGGAAATA

AGTTTGCTTG

839-840

cg43946473

880

CACAGAAAAGA

C

T

SILENT-

11

CAACAGATGTG

NONCODING

TTT[C/T]TAAGG

CACGATTTACA

TACTAAATT

841-842

cg43946992

136

ACATCTGGGTT

A

G

SILENT-

GACCAGGAGC

NONCODING

CACA[A/G]AAGT

TCCCATCATGA

GAAAGGGGGC

843-844

cg43947759

876

TCAACTTCTGT

gap

G

SILENT-

22

AAGATGGGGG

NONCODING

GGGG[gap/G]A

CAAAAAGAGAA

GTAAAGTTAAG

AA

845-846

cg43948617

1024

AGAGGTTATCA

A

G

SILENT-

AGGACATTTAA

NONCODING

GGA[A/G]TCCT

GATCCTCAGAA

CTTCTCTGGG

847-848

cg43949585

1172

GTAATGTTAAA

A

G

SILENT-

8

ACTAAATACAG

NONCODING

ATG[A/G]TAATA

ATTGCTATTTC

ACAGTGATG

849-850

cg43949806

460

TTACCAACCCT

C

T

SILENT-

16

GGGGCTTTATA

NONCODING

(16p13.1)

CTC[C/T]CTCTC

CACCAATCCCT

GATGACCCC

851-852

cg43950348

1308

GGCATTGCAG

A

C

SILENT-

9

CGGCTCGGGG

NONCODING

TCCAA[A/C]GC

CTCACTACCAG

TCTGGGTCCG

GC

853-854

cg43950620

262

AGCAAGGAATG

A

G

SILENT-

8

TTCTTATTCTTT

NONCODING

GT[A/G]GGAGC

TCCTTCTTTAC

ACTGTCAGG

855-856

cg43951104

679

TAGTGAAAACC

C

T

SILENT-

17

AAGTGACAAAC

NONCODING

ACA[C/T]TCCTC

GACCCCAAGTT

CTTCCACAT

857-858

cg43951505

603

AAAGGTGGAAA

A

C

SILENT-

ATGAGGTTGAT

NONCODING

CGC[A/C]GCAT

TCAGAAAGTGT

ATAAGACCTA

859-860

cg43952456

394

AATGCTAAACT

T

C

SILENT-

GCTTTCATGCT

NONCODING

AAT[T/C]TTCTG

ACTGTTTACTT

ACCGGGTAA

861-862

cg43952456

401

AACTGCTTTCA

C

A

SILENT-

TGCTAATTTTCT

NONCODING

GA[C/A]TGTTTA

CTTACCGGGTA

AGAGCGAT

863-864

cg43952456

416

AATTTTCTGAC

G

C

SILENT-

TGTTTACTTAC

NONCODING

CGG[G/C]TAAG

AGCGATGGGA

CTGTTTTCATT

865-866

cg43953844

1867

CTCAACCTGCC

T

C

SILENT-

7

TTAGCTGCACT

NONCODING

CTC[T/C]TACCT

ACAGCTGGACA

GTACCTGTC

867-868

cg43955665

184

CTGCCAGCTGA

G

A

SILENT-

16

CAGGATCTTTT

NONCODING

(16q22)

GCT[G/A]GGCC

CCCTTCTCTGT

GCTGAGTGGA

869-870

cg43955829

593

ATAAGGCCATT

C

T

SILENT-

CAGCGAGGGA

NONCODING

CCAT[C/T]AAGT

GCAACTTTGCG

GGGGTTGCCT

871-872

cg43955863

290

AGGACACCAA

T

C

SILENT-

7

GGTACCCAATG

NONCODING

CCTG[T/C]TTAT

TCACCATCAAC

AAAGAAGACC

873-874

cg43955871

1011

AGCGCTCCTCC

A

G

SILENT-

7

AGCAGGGACA

NONCODING

GCTC[A/G]CTG

ATGAGGTCGGT

GATGGCGTTG

G

875-876

cg43957194

591

CCATTTTTAGC

G

A

SILENT-

—

ATCAGAAACAC

NONCODING

AAG[G/A]AAATA

AAATTCGTGGT

TAGATTGAT

877-878

cg43957205

782

ATACCTGAGGT

T

C

SILENT-

13

TTCATGTCTTTA

NONCODING

GT[T/C]GCCTTA

TCATAATCCCA

AATATACA

879-880

cg43958108

777

CAGCAAACCTG

A

G

SILENT-

5

AATGGCACAAT

NONCODING

GGA[A/G]CACA

GACTTAAAAGA

TGCTTCAGTG

881-882

cg43959150

709

ACTTCCACGCG

C

gap

SILENT-

22

GTGAACGTGG

NONCODING

CGCA[C/gap]CC

GTTCGCTTCAG

CAGTTTCCTAG

G

883-884

cg43959363

614

CTTCATTTCTTT

G

A

SILENT-

17

GGTTTCTTGGG

NONCODING

TA[G/A]TGGGC

GCCGGAACAG

CAAGATGTGA

885-886

cg43960242

16

ACTGCAACGC

A

G

SILENT-

19

GGAGG[A/G]GC

NONCODING

AGGATGGAGAT

CCCTGTGCCTG

T

887-888

cg43960953

2143

AACAGTGGGC

C

A

SILENT-

ATGTCTTCTCG

NONCODING

CGGT[C/A]GAT

CGGTTTCTCTG

GCTCCTTCTTA

889-890

cg43962392

910

AAAGAAGGTAG

C

T

SILENT-

1

AGGAACTTGG

NONCODING

GAGA[C/T]TGA

GGGAAAGATA

GGAGAGAGGA

AG

891-892

cg43964611

319

GCTGGGCTTC

G

A

SILENT-

CCCGAGCTGG

NONCODING

AGAGC[G/A]GG

GAGGACCAGC

CCTTCTCCAGG

CT

893-894

cg43965993

2007

GTCGTTTTCTC

T

C

SILENT-

4

AAAAAAATATC

NONCODING

GTA[T/C]AAGTG

ACTCATCCTGT

CTGCTAACT

895-896

cg43966536

659

CCCAGTATGTA

G

A

SILENT-

7

CCACCCCGTTT

NONCODING

CTC[G/A]TAAAT

GAAGGCAGCA

GCTCCAGCCA

897-898

cg43967276

314

CACGTGCGTG

T

C

SILENT-

9

GGGTGTTGGC

NONCODING

ATTCT[T/C]GTT

ATTTAACACGG

GAAGGAGGTG

A

899-900

cg43967511

276

TAGTCCTGTTG

G

C

SILENT-

12

ACCTGGAAATG

NONCODING

GTG[G/C]CAGG

TGAAGTCTCTC

CACAGCATGC

901-902

cg43968814

3546

GCCAGAGCTTC

C

T

SILENT-

17

CGCGCCCTCG

NONCODING

(17q22)

CCTG[C/T]CCA

GGTGTCCTGCT

CGCCTCCATCT

903-904

cg43969044

734

CCTCTGATGTT

A

G

SILENT-

5

CAGTGAAGAG

NONCODING

GACC[A/G]GAA

AAGTCTGCTAG

AGCAGTACCAT

905-906

cg43970408

949

CTCTTTGGCTT

G

gap

SILENT-

17

GTTTTGGCGCT

NONCODING

GGC[G/gap]CTG

GCAGAGGCTG

AGACACGGCG

AG

907-908

cg43970722

769

CCACCATCTCC

C

T

SILENT-

GTTGTTTCTGG

NONCODING

AAG[C/T]ACCCT

CCAGGCAGGC

CAGCCAGCAT

909-910

cg43971702

705

CTCAGCTCCTC

C

T

SILENT-

20

CAAATGGTTGT

NONCODING

CCA[C/T]CCCA

GACGACTGGG

GGGTTGGTGG

A

911-912

cg43971764

4293

CCTCAGCCTCC

T

C

SILENT-

15

CAAAGTGCTGG

NONCODING

GAT[T/C]ACAG

GCATGAGCCA

CCACGCCCGG

C

913-914

cg43972482

412

TTAGTAGAGAC

T

C

SILENT-

8

GGGGTTTCACC

NONCODING

ATG[T/C]TGGTC

AGGCTGGTCTC

GAACTCCTG

915-916

cg43973408

205

ACCGAGGAGC

A

G

SILENT-

19

AGGAATATGAG

NONCODING

(19q13.4)

GAGG[A/G]GCA

GCCGGAAGAG

GAGGCTGCGG

AG

917-918

cg43974489

666

GGCTAGCCCA

T

C

SILENT-

1

CCTGCCATGGT

NONCODING

TGCC[T/C]TTCT

GCTTGGGGAT

GCCCTGTCT

G

919-920

cg43974987

1386

CATCCTCAGAG

C

gap

SILENT-

22

TCTGAGCGGC

NONCODING

ACCG[C/gap]AG

ACCTTCTTTTTC

AAGTTCACTAA

921-922

cg43977577

590

CGTAGTGTAAA

G

gap

SILENT-

GAACGTAAATT

NONCODING

GAA[G/gap]GCC

CCGGGCCAATT

CTGGGAAGA

G

923-924

cg43977577

591

GTAGTGTAAAG

G

gap

SILENT-

AACGTAAATTG

NONCODING

AAG[G/gap]CCC

CGGGCCAATTC

TGGGAAGAGG

A

925-926

cg43977954

2556

TGTGAACGGC

T

A

SILENT-

CCGGAGAGAG

NONCODING

CTGGG[T/A]GG

TGTATGGGGTG

ACCTCCTGGG

GG

927-928

cg43979039

674

CGCCCCTCACA

C

G

SILENT-

2

GTGAAGAATCA

NONCODING

GGA[C/G]AGCC

ACTCTCTGGTT

TTCTCACAAC

929-930

cg43979039

769

GTCCTGGTCCC

A

G

SILENT-

2

ACACAGAGAGA

NONCODING

GGA[A/G]GCGC

CACAACCCACT

CTGCCAACCT

931-932

cg43980463

970

AGAAGCTGGCT

G

gap

SILENT-

16

GGTAGGACCC

NONCODING

GCAG[G/gap]GA

CCAGCTGACCA

GGCTTGTGCTC

A

933-934

cg43980508

349

TTCTTGCAATT

C

T

SILENT-

5

ATCCAGGCAG

NONCODING

(5q13.3)

GTGA[C/T]GAC

AACTTGATGCA

GGAAATCAACC

935-936

cg43980653

1001

CCCCCTCAGC

C

gap

SILENT-

18

CATTGCCCATG

NONCODING

AGGG[C/gap]CT

CCACGTTGTCT

GATGGTCGCT

GG

937-938

cg43981661

577

TCATGAGCCCG

C

T

SILENT-

17

CTGACCCGGT

NONCODING

GGGC[C/T]GAG

GGCAGGTAGG

GAGACTTCCTC

G

939-940

cg43982025

488

TGGGAAGGCG

G

gap

SILENT-

19

CATATCCTGGC

NONCODING

GGCA[G/gap]CA

GCACGTGGCA

CCAGGTGCCA

GGC

941-942

cg43982782

1142

GGTCCCTGCC

G

C

SILENT-

4

ACAGCCGTGG

NONCODING

(4p16.3)

AGGGC[G/C]GA

CGTGACCTACG

CGGCCATGGT

GG

943-944

cg43983035

1104

CAGGCGGACC

T

C

SILENT-

5

TGGAGGTGTCA

NONCODING

(5q31.3)

GCCA[T/C]GGC

CCTGACCACTT

CAGAAGCCAAC

945-946

cg43983194

196

GGCCAACCAT

T

C

SILENT-

17

GAGGAGTGCA

NONCODING

AAGGG[T/C]GG

CACCGGCCAG

TGCCCCTGGA

CAC

947-948

cg43983194

451

GCAGCCATGC

G

A

SILENT-

17

GCACCTCCAC

NONCODING

GCACG[G/A]CC

GAGCTCAACCC

GAAGACCACG

CG

949-950

cg43983314

730

CAAATGGCAAC

T

C

SILENT-

ATAATGAAATC

NONCODING

ACT[T/C]TCTGC

ATCCAGAATTA

CACCCCCAA

951-952

cg43983314

733

ATGGCAACATA

T

A

SILENT-

ATGAAATCACT

NONCODING

TTC[T/A]GCATC

CAGAATTACAC

CCCCAAGGT

953-954

cg43983314

856

CCGCGAGGTG

C

T

SILENT-

CCCTATGCCTA

NONCODING

CATC[C/T]GTGA

GGGCCATGAG

AAGCAGGCCG

A

955-956

cg43984242

307

AGCTCATCCAG

C

gap

SILENT-

CTGGACCAGG

NONCODING

CGAA[C/gap]CC

CTGGCCGCTG

TGCTGAAGGA

GGT

957-958

cg43986540

1009

GACACCGCCT

G

A

SILENT-

GGCCTGGTGC

NONCODING

TCCAG[G/A]GG

TGAAGCAGGC

CAGAATCCTGG

GG

959-960

cg43987682

1309

AGCCCTTCTCC

A

G

SILENT-

17

AGCACCTTGGC

NONCODING

AAA[A/G]ATGTC

CGTCAGCACCT

CTTTGATGG

961-962

cg43991793

915

CCCCTGTGCCA

C

T

SILENT-

9

TCATTTGGGCC

NONCODING

CCC[C/T]AGAC

ACTGGAGGAC

AGCGTGAGATA

963-964

cg43991835

194

AAAAGCGGCG

A

G

SILENT-

19

AGGAACGCTTG

NONCODING

(19q13.1)

AAGG[A/G]AAT

GGAGGCGGAG

ATGGCCCTGTT

T

965-966

cg43994222

230

CACAGAGATTT

G

A

SILENT-

18

TACATCACCTT

NONCODING

TCA[G/A]AACG

CAACAGGGTC

CGGGACAGGG

A

967-968

cg43995297

288

AGGATCCCTG

C

T

SILENT-

GCTCGCGTCC

NONCODING

CCAAC[C/T]GG

TTCCGTGTCTC

ACCTGGGTCCT

G

969-970

cg43995517

460

TGGCCGTAGG

A

G

SILENT-

12

AGAAAATACAA

NONCODING

CCCT[A/G]CCC

GGAACAGCAAT

AGCTCCCGCC

A

971-972

cg43995517

516

TTACCTTGGAA

C

G

SILENT-

12

CCCAGCCCTAC

NONCODING

AGC[C/G]CGAG

CAGCTGTCCCT

CTGCCTCCCC

973-974

cg43995517

532

CCCTACAGCCC

C

T

SILENT-

12

GAGCAGCTGT

NONCODING

CCCT[C/T]TGCC

TCCCCGGGCC

CGCCCTGGCC

G

975-976

cg43997174

735

GGTCATCACAC

A

G

SILENT-

1

ACAAGTGGACC

NONCODING

(1p13.1)

ACC[A/G]GCCT

GAGTGCAAAAT

TCAAGTGCAC

977-978

cg43997490

511

AACATGAGGAC

G

A

SILENT-

1

CCTCTGGATTA

NONCODING

ATC[G/A]AATTA

CAGCTGCTAGC

CAGGAACAT

979-980

cg43997710

242

ATGTGAGGTGT

G

A

SILENT-

GACCTCACGAA

NONCODING

GAA[G/A]CAAAT

TTAATATTATAA

TGGGAAGC

981-982

cg43997768

261

AGATCCCAAAG

A

G

SILENT-

8

CCCAGTACACA

NONCODING

AGT[A/G]TCTAC

GGAGCCCTCA

AGAAAATCAT

983-984

cg43999766

367

CATGTGAAGAG

gap

T

SILENT-

17

ACCCAGCCTCT

NONCODING

TCA[gap/T]AGG

GTATCCAAGAT

AAACTTCCGTT

985-986

cg44000102

879

CTTATTCTTCTT

T

gap

SILENT-

X

TGAATACAATG

NONCODING

AC[T/gap]TCTG

GCACTGATCG

GGTCAGTTTCT

987-988

cg44000102

880

TTATTCTTCTTT

T

gap

SILENT-

X

GAATACAATGA

NONCODING

CT[T/gap]CTGG

CACTGATCGG

GTCAGTTTCTT

989-990

cg44000102

882

ATTCTTCTTTGA

T

gap

SILENT-

X

ATACAATGACT

NONCODING

TC[T/gap]GGCA

CTGATCGGGTC

AGTTTCTTCC

991-992

cg44000241

412

CTCCCTCACGG

gap

T

SILENT-

AGCCAGCGGC

NONCODING

CGGG[gap/T]AA

TGCAGACATCA

GAACGTGAGG

GG

993-994

cg44001933

566

CTCCGCGCAC

C

G

SILENT-

AGTGGTGGCC

NONCODING

ACCGC[C/G]AC

TGGTGCTGAAG

TGTCGGCGTGT

995-996

cg44002491

17

GCGCGCCCTT

T

C

SILENT-

CTTCCC[T/C]TA

NONCODING

CTGCGAGGAG

CCACCGCCTCT

TT

997-998

cg44003839

280

CCCCATCACCA

A

G

SILENT-

GGCGGTTCTC

NONCODING

CCCG[A/G]TCT

CCAGCGACAG

CCCCAGGGCT

CC

999-1000

cg44003987

1374

CACACGCACAC

A

C

SILENT-

GTACATTCACT

NONCODING

ACA[A/C]ACGT

GCAGCCTCCT

GCACACGTGC

A

1001-1002

cg44005542

627

TCCTAATCCCA

G

gap

SILENT-

TGCCAGAACC

NONCODING

GAAG[G/gap]CT

AATGGCCACAT

TCTTCTTTTAAA

1003-1004

cg44007769

1257

GGGTCTGCTG

A

T

SILENT-

AGTTGGAGGA

NONCODING

GTGCA[A/T]TGT

CGCCCTGGGA

GCCCTCCTGG

AG

1005-1006

cg44009153

928

AGCCATGGGTT

A

G

SILENT-

TGGGTAATAAG

NONCODING

AAG[A/G]GAGA

GCATTTGGGGT

TCAAGAGAGG

1007-1008

cg44009645

5337

CTTTTCCATGT

T

C

SILENT-

GGCTCAATATC

NONCODING

AAC[T/C]TTTCC

CGTCTAATGAT

GACAAATCT

1009-1010

cg44012500

579

GGCATCTCATC

G

gap

SILENT-

19

TTGCTGGGGCT

NONCODING

(19q13.2)

GTT[G/gap]GGT

CCCCTGGACCT

CAAATCCCAAT

1011-1012

cg44014720

482

CAGAGCCCTGT

T

C

SILENT-

CGTGGCCCTG

NONCODING

TCCA[T/C]CTCC

TGCGCCAGGA

AACACAGGTTC

1013-1014

cg44020161

854

CTCCAAGAGTT

G

gap

SILENT-

17

CTGGTCTCCCG

NONCODING

CGA[G/gap]GG

GCGGAGTTCC

CTCCCCAGTCC

CG

1015-1016

cg44021014

561

GAGGCACACA

G

A

SILENT-

GACAGATGATG

NONCODING

AGCA[G/A]CTCT

TCTCCTTAAAG

AAGTCTGTGT

1017-1018

cg44024536

1149

AGGGTGTGTGT

G

A

SILENT-

9

GTGTGTGTGTG

NONCODING

TGT[G/A]TGTGT

GTGTGTGTGCG

TGTGCG

1019-1020

cg44024536

1153

TGTGTGTGTGT

G

A

SILENT-

9

GTGTGTGTGTG

NONCODING

TGT[G/A]TGTGT

GTGTGCGTGTG

CG

1021-1022

cg44026832

299

TGGCTGCAGA

G

A

SILENT-

X

GGTTGAGCCTC

NONCODING

CTGA[G/A]CCC

CTGCTTGGTGA

CAAGGGACCT

G

1023-1024

cg44026832

446

CTTCTGGGCTG

C

T

SILENT-

X

TGGCCCTGCTC

NONCODING

TTG[C/T]TGGCT

ACTCTCATGGA

GCAGGGCTT

1025-1026

cg44029982

1149

GGCTGTAGGG

G

A

SILENT-

GATGTTGGTCT

NONCODING

CCTG[G/A]AAAA

AGGCGCTGAG

GGCTGTCTCGA

1027-1028

cg44030164

245

GGGTGTAGAC

G

T

SILENT-

14

GCTGCTGGCC

NONCODING

AGCCC[G/T]CC

GCAGCCGAGG

TTCTCGGCACC

GC

1029-1030

cg44031677

249

CAGACCCGAG

C

T

SILENT-

GTGCCCAGGG

NONCODING

CATTC[C/T]GGA

GGCAGCCGAG

GGCAGCAGCT

CC

1031-1032

cg44031863

401

GGACTTCTGCT

C

T

SILENT-

1

GCGTCTTCGG

NONCODING

CCAC[C/T]TCTC

CTCTTGCCTTT

TGGTGGACCC

1033-1034

cg44033624

1409

GAAAAGGGATA

G

gap

SILENT-

10

CTTTGATAATTA

NONCODING

(7q31)

AG[G/gap]CCAG

AGGCCCATTAG

TTGAGAAAGT

1035-1036

cg44033878

1552

AAGGAGGGAT

T

G

SILENT-

1

ATGTTCCACGT

NONCODING

AACT[T/G]GCTG

GGACTGTACCC

AAGAATTAAA

1037-1038

cg44034830

701

GCAGGAGCCT

G

A

SILENT-

5

GCAGGAGGCT

NONCODING

GGAAA[G/A]TC

AGGCTAGGGA

TATAGCAGGGA

TG

1039-1040

cg44127556

521

TGGCGACGAC

A

G

SILENT-

TCTGGAGTGG

NONCODING

CGGAT[A/G]CG

GGGGAGGCGG

ATGTCCCTGGG

TC

1041-1042

cg44128344

137

CCCCAGGATTC

T

C

SILENT-

19

TGGCCTGCTTC

NONCODING

ACC[T/C]CTGG

AGCACCAGGC

CAGGCGGTGT

C

1043-1044

cg44131644

894

TGGCCCCCGA

G

C

SILENT-

21

CGAGCTGTACA

NONCODING

CGCC[G/C]CAC

AGCTGGCTGCT

CCGCGTGGTAT

1045-1046

cg44911913

967

AGGGCGGCTG

C

T

SILENT-

15

CGGGGCTGCC

NONCODING

CCTGG[C/T]CC

CCCGGCCCCT

CCTGGGCGCC

CTC

1047-1048

cg44911913

992

CCCCCCGGCC

C

G

SILENT-

15

CCTCCTGGGC

NONCODING

GCCCT[C/G]GT

CCCGCTCCTG

GCCCTGCTCC

CTG

1049-1050

cg44912347

853

TCTTTACATTA

A

G

SILENT-

11

GAGCCAATTTA

NONCODING

AGA[A/G]GGCG

CTGGTATGTAT

GAGCTTTTTG

1051-1052

cg44913333

159

CTCCCCAGAAT

A

G

SILENT-

12

TCCTAGACTGG

NONCODING

GTT[A/G]ATAGG

GTCATATTGTG

AATGTCTCA

1053-1054

cg44914547

175

GGTCCCCCTG

C

T

SILENT-

CTTCTTCCCTG

NONCODING

CAGA[C/T]ATG

GTGGAGCTGC

TGCTGCTGCAG

A

1055-1056

cg44914547

86

GGACCTGCAC

G

C

SILENT-

AGTGTACAGAC

NONCODING

ACAC[G/C]TGTT

CTCTGGTCCTA

TAATGCTCTA

1057-1058

cg44914864

641

GCTCCGATGC

T

C

SILENT-

GCGATGCATTC

NONCODING

ATAG[T/C]GTCG

CCTTTCAGGAA

AGTTCGGTGT

1059-1060

cg44915149

857

GGGTCATCGA

C

T

SILENT-

CCCCATTGATG

NONCODING

GCAC[C/T]AAG

AACTTCGTGCA

CGGGTCTGTTG

1061-1062

cg44916019

445

CCCCTACCTGG

A

G

SILENT-

22

CCTGGCTGGC

NONCODING

CTTC[A/G]CGA

CCACACTCAAC

TACTGCGTATG

1063-1064

cg44916367

200

GGTGCCACCA

A

G

SILENT-

GGCTCTTTTTA

NONCODING

ACAA[A/G]CAGT

TCTCACAGAAA

CTAATCAAGT

1065-1066

cg44916367

248

AGTGAGAACTC

A

G

SILENT-

ACTTGTTACCA

NONCODING

CAA[A/G]GATG

GCACCAGGCT

ATTCATGAAGG

1067-1068

cg44919480

413

GCCACGAAGT

A

G

SILENT-

GATTGTGTCTG

NONCODING

CAGC[A/G]TGT

GGGCGGAACC

ACACCTTGGCC

T

1069-1070

cg44919480

504

GATGGGGCAG

C

T

SILENT-

CTGGGCCTTG

NONCODING

GCAAC[C/T]AG

ACAGACGCTGT

TCCCAGCCCC

GC

1071-1072

cg44919623

302

GCCCGCTGCG

G

A

SILENT-

ATATGTCGTCC

NONCODING

TTTG[G/A]GAAC

CTCAGCAGCCA

GCCGTAAGTC

1073-1074

cg44919623

307

CTGCGATATGT

C

T

SILENT-

CGTCCTTTGGG

NONCODING

AAC[C/T]TCAGC

AGCCAGCCGT

AAGTCCTCAC

1075-1076

cg44920877

23

ACGCGTTCTGC

G

A

SILENT-

GAGGCCATGC

NONCODING

G[G/A]GTCTAT

GCCCCGCGGC

CGTTGGCCT

1077-1078

cg44920877

27

CGCGTTCTGC

T

C

SILENT-

GAGGCCATGC

NONCODING

GGGTC[T/C]AT

GCCCCGCGGC

CGTTGGCCTC

GCC

1079-1080

cg44920877

45

GCGGGTCTAT

G

A

SILENT-

GCCCCGCGGC

NONCODING

CGTTG[G/A]CC

TCGCCCACACC

CCCGGCCCCA

CT

1081-1082

cg44920877

58

CCGCGGCCGT

C

T

SILENT-

TGGCCTCGCC

NONCODING

CACAC[C/T]CC

CGGCCCCACT

GCGGGTGGAG

AGA

1083-1084

cg44920877

68

TGGCCTCGCC

A

G

SILENT-

CACACCCCCG

NONCODING

GCCCC[A/G]CT

GCGGGTGGAG

AGACGTCGGG

CCC

1085-1086

cg44923068

71

CGGACACCCG

G

gap

SILENT-

1

GCGGGAGCTG

NONCODING

GCGGA[G/gap]C

TCGTGAAGCG

GAAGCAGGAG

CTGG

1087-1088

cg44928274

452

AGCAAGGGGA

G

gap

SILENT-

AGATCATCAGC

NONCODING

GGCA[G/gap]CA

GCGGCAGCCT

GCTGTCTTCAG

GT

1089-1090

cg44928732

299

GTGTACCAGC

T

G

SILENT-

GCCTGATCCG

NONCODING

GGACA[T/G]TC

CCTGCCGCAC

GGTCACGCCT

GAC

1091-1092

cg44932136

964

ACAGACGCGC

gap

C

SILENT-

ACACACACGC

NONCODING

GCACA[gap/C]G

ACGCACACACA

GACGCACACA

CGC

1093-1094

cg44932136

976

ACACACGCGC

gap

A

SILENT-

ACAGACGCACA

NONCODING

CACA[gap/A]GA

CGCACACACG

CACAGACACAC

AC

1095-1096

cg44932184

644

TGGGCACGAG

G

gap

SILENT-

CGTGGCTTCG

NONCODING

GCGGA[G/gap]C

AGGATGAACTG

TCTCAGAGACT

GG

1097-1098

cg44938456

2706

TGACCTTGACC

C

T

SILENT-

2

AGTTTGATCAG

NONCODING

TTA[C/T]TGCCC

ACGCTGGAGA

AGGCAGCACA

1099-1100

cg44938456

2729

TACTGCCCACG

A

G

SILENT-

2

CTGGAGAAGG

NONCODING

CAGC[A/G]CAG

TTGCCAGGCTT

ATGTGAGACAG

1101-1102

cg44938456

2738

CGCTGGAGAA

A

G

SILENT-

2

GGCAGCACAG

NONCODING

TTGCC[A/G]GG

CTTATGTGAGA

CAGACAGGAT

GG

1103-1104

cg44938456

2744

AGAAGGCAGC

A

G

SILENT-

2

ACAGTTGCCAG

NONCODING

GCTT[A/G]TGTG

AGACAGACAG

GATGGATGGT

1105-1106

cg44938456

2775

GACAGACAGG

A

C

SILENT-

2

ATGGATGGTGC

NONCODING

GGTC[A/C]CCA

GTGTAACCATC

AAATCGGAGAT

1107-1108

cg44938456

2792

GTGCGGTCAC

A

G

SILENT-

2

CAGTGTAACCA

NONCODING

TCAA[A/G]TCG

GAGATCCTGCC

AGCTTCACTTC

1109-1110

cg44938456

2807

TAACCATCAAA

A

C

SILENT-

2

TCGGAGATCCT

NONCODING

GCC[A/C]GCTT

CACTTCAGTCC

GCCACTGCCA

1111-1112

cg5633308

186

ACAAGGGGAA

A

T

SILENT-

CATCAACACAA

NONCODING

GAGA[A/T]GCC

TACAAAGGGC

GAGCGTGCAA

GC

1113-1114

cg32160481

124

GAGATGGGGG

C

G

Thr

Ser

CONSERVA-

MHC

Human Gene Similar to SWISSPROT-ID: P30508

5.60E−37

TCATGGCGCC

(1193)

(1194)

TIVE

HLA CLASS 1 HISTOCOMPATIBILITY

CCGAA[C/G]CC

ANTIGEN, CW*1201 ALPHA CHAIN

TCCTCCTGCTG

PRECURSOR (HLA-CX52) -

HOMO SAPIENS

CTCTCAGGGG

(HUMAN), 366 aa.

CC

1115-1116

cg32160481

164

TCTCAGGGGC

G

T

Glu

Asp

CONSERVA-

MHC

Human Gene Similar to SWISSPROT-ID: P30508

5.60E−37

CCTGGCCCTG

(1195)

(1196)

TIVE

HLA CLASS 1 HISTOCOMPATIBILITY

ACCGA[G/T]AC

ANTIGEN, CW*1201 ALPHA CHAIN

CTGGGCGGGT

PRECURSOR (HLA-CX52) -

HOMO SAPIENS

GAGTGCGGGG

(HUMAN), 366 aa.

TCG

1117-1118

cg27790564

65

AGCTTGAGCAG

A

G

Val

Ala

CONSERVA-

protease

Human Gene Similar to SWISSNEW-ID: P53616

9.70E−31

AGCCTTGACCT

(1197)

(1198)

TIVE

PROTEASOME COMPONENT SUN4 -

TCT[A/G]CACAC

SACCHAROMYCES CEREVISIAE

AAACTGTTAGT

(BAKER'S YEAST), 420 aa.|

GTCGGTGTT

pcls: SWISSPROT-ID: P53616

PROTEASOME COMPONENT SUN4 -

SACCHAROMYCES CEREVISIAE

(BAKER'S YEAST), 420 aa.

1119-1120

cg43967452

418

GCGTTACCGG

G

A

Asp

Asn

CONSERVA-

ribosomalprot

Human Gene Similar to SWISSPROT-ID: P32899

2.80E−49

15

CTGCAGCGGC

(1199)

(1200)

TIVE

PUTATIVE 40S RIBOSOMAL PROTEIN

GGGAG[G/A]AC

YHR148W -

SACCHAROMYCES CEREVISIAE

TACACGCGCTA

(BAKER'S YEAST), 183 aa.

CAACCAGCTGA

1121-1122

cg39548335

302

TGTATTCTCGT

T

C

Lys

Arg

CONSERVA-

UNCLASSI-

Human Gene Similar to SWISSPROT-ACC:

2.90E−48

TGTTACCGGCG

(1201)

(1202)

TIVE

FIED

P43618 HYPOTHETICAL 41.3 KD PROTEIN

GCC[T/C]TCCT

IN SAP155-YMR31 INTERGENIC REGION -

GGGAGTGCTC

Saccharomyces cerevisiae

(Baker's yeast), 361 aa.

ATTATTCATTT

1123-1124

cg38435145

132

TGCTGGAGAAT

C

T

Ala

Val

CONSERVA-

UNCLASSI-

Human Gene Similar to TREMBLNEW-ACC:

4.10E−46

TTGGCCACAAA

(1203)

(1204)

TIVE

FIED

BM74914 KIAA0891 PROTEIN -

HOMO SAPIENS

GAG[C/T]TGCC

(HUMAN), 1371 aa (fragment).

AAGATAGCTGG

GCCAGGAAGA

1125-1126

cg42894694

1008

TTTTGTATTTTT

T

C

Val

Ala

CONSERVA-

UNCLASSI-

Human Gene Similar to SWISSPROT-ACC:

8.30E−34

AGTAGAGACG

(1205)

(1206)

TIVE

FIED

P39194 !!!! ALU SUBFAMILY SQ WARNING

GGG[T/C]TTCA

ENTRY -

Homo sapiens

(Human), 593 aa.

CCATGTTGGCC

AGGCTGGTCT

1127-1128

cg27847601

281

TGAGGTGGGA

A

G

Lys

Arg

CONSERVA-

UNCLASSI-

Human Gene Similar to SWISSPROT-ACC:

1.00E−31

GGACTGCCTG

(1207)

(1208)

TIVE

FIED

P39191 !!!! ALU SUBFAMILY SB2 WARNING

AACCA[A/G]GG

ENTRY -

Homo sapiens

(Human), 603 aa.

AGGTGGAGGC

TGCAGTGAGC

CAA

1129-1130

cg43976973

1025

GTGAGCATCAT

C

G

Lys

Asn

NON-

ATPase_asso-

Human Gene Similar to TREMBLNEW-ID:

3.50E−33

GATGCTGCTGT

(1209)

(1210)

CONSERVA-

ciated

G263099 TAT BINDING PROTEIN 7, TBP-

CGG[C/G]TTCG

TIVE

7 = TRANSCRIPTIONAL ACTIVATOR -

GGGGGCAGCA

HOMO SAPIENS

, 458 aa.

CGTCCACCAGT

1131-1132

cg43925450

848

GCAACTGGTG

G

T

Arg

Leu

NON-

glycoprotein

Human Gene Similar to TREMBLNEW-ID:

3.80E−37

17

CTCCAGGCCG

(1211)

(1212)

CONSERVA-

E1249608 MEMBRANE-BOUND SMALL GTP-

AAACC[G/T]AG

TIVE

BINDING - LIKE PROTEIN -

ARABIDOPSIS

GTGTGGACCTC

THALIANA (MOUSE-EAR CRESS), 217 aa.

CAGGAGAACAA

C

1133-1134

cg43305492

342

CTCCAAAAACC

A

C

Met

Leu

NON-

immunoglob

Human Gene Similar to TREMBLNEW-ID:

7.40E−49

AGGTGGTCCTT

(1213)

(1214)

CONSERVA-

G2734101 IMMUNOGLOBULIN HEAVY CHAIN,

ACA[A/C]TGACC

TIVE

VD(5)J(4) LIKE GENE PRODUCT -

HOMO

AACATGGACCC

SAPIENS (HUMAN), 151 aa.

TGTGGACAC

1135-1136

cg34758710

463

GGGCATCATTG

C

T

Leu

Phe

NON-

MHC

Human Gene Similar to SPTREMBL-ID: Q30916

2.20E−39

CTGGCCTGGTT

(1215)

(1216)

CONSERVA-

MHC CLASS A -

PAN TROGLODYTES

GTC[C/T]TTGCA

TIVE

(CHIMPANZEE), 357 aa (fragment).

GCTGTAGTCAC

TGGAGCTGC

1137-1138

cg43923640

511

TCCGCTGAGG

T

C

Phe

Leu

NON-

oncogene

Human Gene Similar to SWISSPROT-ID: P24407

1.30E−34

1

AAATTCAAGCT

(1217)

(1218)

CONSERVA-

RAS-RELATED PROTEIN RAB-8 (ONCOGENE

GGTG[T/C]TCCT

TIVE

C-MEL) -

HOMO SAPIENS

(HUMAN), AND

GGGGGAGCAA

CANIS FAMILIARIS

(DOG), 207 aa.

AGCGTTGGAAA

1139-1140

cg27790564

15

GGTACCACAGA

C

T

Ala

Thr

NON-

protease

Human Gene Similar to SWISSNEW-ID: P53616

9.70E−31

TAG[C/T]AATGG

(1219)

(1220)

CONSERVA-

PROTEASOME COMPONENT SUN4 -

AGCCGGAGAC

TIVE

SACCHAROMYCES CEREVISIAE

(BAKER'S

CTTGTTAACA

YEAST), 420 aa.|pcls: SWISSPROT-ID: P53616

PROTEASOME COMPONENT SUN4 -

SACCHAROMYCES CEREVISIAE

(BAKER'S

YEAST), 420 aa.

1141-1142

cg39517733

241

CCCTCATCAAA

G

A

Pro

Ser

NON-

struct

Human Gene Similar to SPTREMBL-

3.90E−36

GATGGGGGCT

(1221)

(1222)

CONSERVA-

ID: Q14425 GASTRIC MUCIN -

HOMO SAPIENS

GTTG[G/A]TGG

TIVE

(HUMAN), 850 aa (fragment).

GCACTTGGGG

TAGCAGCCTTC

C

1143-1144

cg39517733

88

CCAGCATCAT

T

C

Ser

Gly

NON-

struct

Human Gene Similar to SPTREMBL-ID: Q14425

3.90E−36

GGGTACAGTT

(1223)

(1224)

CONSERVA-

GASTRIC MUCIN -

HOMO SAPIENS

(HUMAN),

CACAC[T/C]A

TIVE

850 aa (fragment).

CTCTCCGTAC

AAACGCAGG

AATAA

1145-1146

cg39565684

240

TTCTGGGATAA

C

T

Arg

Gln

NON-

synthase

Human Gene Similar to SWISSNEW-ID: P53167

1.40E−33

GGTATTGGTAC

(1225)

(1226)

CONSERVA-

PSEUDOURIDYLATE SYNTHASE 2

CAT[C/T]GGGA

TIVE

(EC 4.2.1.70) (PSEUDOURIDINE SYNTHASE 2) -

ATCGCACGCG

SACCHAROMYCES CEREVISIAE

(BAKER'S

GATCTAGCATT

YEAST), 370 aa.|pcls: SWISSPROT-ID: P53167

PSEUDOURIDYLATE SYNTHASE 2

(EC 4.2.1.70) (PSEUDOURIDINE SYNTHASE 2) -

SACCHAROMYCES CEREVISIAE

(BAKER'S

YEAST), 370 aa.|pcls: SPTREMBL-ID: Q06713

PSEUDOURIDINE SYNTHASE 2 -

SACCHAROMYCES CEREVISIAE

(BAKER'S

YEAST), 370 aa.

1147-1148

cg32152874

143

CAAGGTGTACG

A

C

Met

Leu

NON-

transcriptfactor

Human Gene Similar to SPTREMBL-ID: Q24140

1.50E−39

TGTCCATGCCG

(1227)

(1228)

CONSERVA-

NEURON SPECIFIC ZINC FINGER TRANSCRIP-

GCC[A/C]TGGC

TIVE

TION FACTOR -

DROSOPHILA

CATGCACCTGC

MELANOGASTER (FRUIT FLY), 664 aa.

TCACGCACGA

1149-1150

cg43025141

686

CTACGCCAAGC

G

A

Ala

Thr

NON-

transferase

Human Gene Similar to SPTREMBL-ID: O08832

1.10E−49

GCAACGCCCT

(1229)

(1230)

CONSERVA-

POLYPEPTIDE GALNAC TRANSFERASE-T4 -

GCGC[G/A]CCG

TIVE

MUS MUSCULUS

(MOUSE), 578 aa.

CCGAGGTGTG

GATGGATGACT

T

1151-1152

cg44028935

80

GGCAGGATGA

T

A

Leu

Gln

NON-

ubiquitin

Human Gene Similar to SWISSPROT-ID: P52491

3.00E−36

TCAAGCTGTTC

(1231)

(1232)

CONSERVA-

UBIQUITIN-CONJUGATING ENZYME E2-21.2

TCGC[T/A]GAA

TIVE

KD (EC 6.3.2.19) (UBIQUITIN-PROTEIN

GCAGCAGAAG

LIGASE) (UBIQUITIN CARRIER PROTEIN) -

AAGGAGGAGG

SACCHAROMYCES CEREVISIAE

(BAKER'S

AG

YEAST), 188 aa.

1153-1154

cg44028935

82

CAGGATGATCA

A

C

Lys

Gln

NON-

ubiquitin

Human Gene Similar to SWISSPROT-ID: P52491

3.00E−36

8

AGCTGTTCTCG

(1233)

(1234)

CONSERVA-

UBIQUITIN-CONJUGATING ENZYME E2-21.2

CTG[A/C]AGCA

TIVE

KD (EC 6.3.2.19) (UBIQUITIN-PROTEIN

GCAGAAGAAG

LIGASE) (UBIQUITIN CARRIER PROTEIN) -

GAGGAGGAGT

SACCHAROMYCES CEREVISIAE

(BAKER'S

C

YEAST), 188 aa.

1155-1156

cg44028935

83

AGGATGATCAA

A

T

Lys

Met

NON-

ubiquitin

Human Gene Similar to SWISSPROT-ID: P52491

3.00E−36

8

GCTGTTCTCGC

(1235)

(1236)

CONSERVA-

UBIQUITIN-CONJUGATING ENZYME E2-21.2

TGA[A/T]GCAG

TIVE

KD (EC 6.3.2.19) (UBIQUITIN-PROTEIN

CAGAAGAAGG

LIGASE) (UBIQUITIN CARRIER PROTEIN) -

AGGAGGAGTC

SACCHAROMYCES CEREVISIAE

(BAKER'S

YEAST), 188 aa.

1157-1158

cg44028935

86

ATGATCAAGCT

A

G

Gln

Arg

NON-

ubiquitin

Human Gene Similar to SWISSPROT-ID: P52491

3.00E−36

8

GTTCTCGCTGA

(1237)

(1238)

CONSERVA-

UBIQUITIN-CONJUGATING ENZYME E2-21.2

AGC[A/G]GCAG

TIVE

KD (EC 6.3.2.19) (UBIQUITIN-PROTEIN

AAGAAGGAGG

LIGASE) (UBIQUITIN CARRIER PROTEIN) -

AGGAGTCGGC

SACCHAROMYCES CEREVISIAE

(BAKER'S

YEAST), 188 aa.

1159-1160

cg39548335

228

ATCAGTACCTC

G

A

Gln

End

NON-

UNCLASSI-

Human Gene Similar to SWISSPROT-ACC: P43618

2.90E−48

ATCCTTGAGAC

(1239)

(1240)

CONSERVA-

FIED

HYPOTHETICAL 41.3 KD PROTEIN IN SAP155-

GTT[G/A]TTCTT

TIVE

YMR31 INTERGENIC REGION -

Saccharomyces

CAAGGGCTCTC

cerevisiae

(Baker's yeast), 361 aa.

TCCCGAAAG

1161-1162

cg39548335

48

TCATCTATTAG

T

C

Thr

Ala

NON-

UNCLASSI-

Human Gene Similar to SWISSPROT-ACC: P43618

2.90E−48

ATTCCGTGCTT

(1241)

(1242)

CONSERVA-

FIED

HYPOTHETICAL 41.3 KD PROTEIN IN SAP155-

GAG[T/C]TTTAT

TIVE

YMR31 INTERGENIC REGION -

Saccharomyces

TAGTAGTTGTA

cerevisiae

(Baker's yeast), 361 aa.

TCGTTGCCT

1163-1164

cg39510144

109

ATTGGGTTGTA

C

T

Pro

Leu

NON-

UNCLASSI-

Human Gene Similar to SWISSPROT-ACC:

2.00E−47

TGCAGAACTCT

(1243)

(1244)

CONSERVA-

FIED

Q12284 ERV2 PROTEIN PRECURSOR -

ATC[C/T]ATGCG

TIVE

Saccharomyces cerevisiae

(Baker's yeast), 196 aa.

GGGAATGTTCA

TATCACTTT

1165-1166

cg39565075

28

CGGAGAAGAG

T

G

Phe

Val

NON-

UNCLASSI-

Human Gene Similar to SWISSPROT-ACC: P36121

1.40E−46

GAGTCACTGCA

(1245)

(1246)

CONSERVA-

FIED

HYPOTHETICAL 32.1 KD PROTEIN IN DBP7-

CGTG[T/G]TTCA

TIVE

GCN3 INTERGENIC REGION -

Saccharomyces

GTATGCTAATA

cerevisiae (Baker's yeast), 282 aa.

GACCAAGGCT

1167-1168

cg21427396

193

TCAAACCCATG

C

T

Arg

Cys

NON-

UNCLASSI-

Human Gene Similar to SWISSPROT-ACC: P30870

7.60E−44

GTCGAGAGTG

(1247)

(1248)

CONSERVA-

FIED

GLUTAMATE-AMMONIA-LIGASE ADENYLYL-

GGAG[C/T]GTT

TIVE

TRANSFERASE (EC 2.7.7.42) (GLUTAMINE-

ATGCCATGGTT

SYNTHETASE ADENYLYLTRANSFERASE)

AAAGCCCGTGT

(ATASE) -

Escherichia coli

, 946 aa.

1169-1170

cg34394308

431

GTTGATGCTGC

G

A

Gly

Asp

NON-

UNCLASSI-

Human Gene Similar to SWISSPROT-ACC: P50442

1.50E−42

TCAGCCGGCG

(1249)

(1250)

CONSERVA-

FIED

GLYCINE AMIDINOTRANSFERASE PRE-

CACG[G/A]CAC

TIVE

CURSOR (EC 2.1.4.1) (L-ARGININE:GLYCINE

GCCTCCAGAAT

AMIDINOTRANSFERASE) (TRANSAMIDINASE)

TGTGCTACTCG

(AT) -

Rattus norvegicus

(Rat), 423 aa.

1171-1172

cg43129081

415

CCTCGATCTCC

G

A

Ala

Thr

NON-

UNCLASSI-

Human Gene Similar to SWISSPROT-ACC:

3.70E−41

TGACCTCGTGA

(1251)

(1252)

CONSERVA-

FIED

P39189 !!!! ALU SUBFAMILY SB WARNING

TCC[G/A]CCCA

TIVE

ENTRY -

Homo sapiens

(Human), 587 aa.

CCTTGGCCTCC

CAAAGTGCTG

1173-1174

cg29693502

197

CTGATCGACGT

T

G

Asn

Thr

NON-

UNCLASSI-

Human Gene Similar to SWISSPROT-ACC:

3.00E−39

ATTCCAACAGC

(1253)

(1254)

CONSERVA-

FIED

Q10379 PROBABLE GLUTAMATE-AMMONIA-

TCA[T/G]TCGTC

TIVE

LIGASE ADENYLYLTRANSFERASE

AACTCCCGGTC

(EC 2.7.7.42) (GLUTAMINE-SYNTHETASE

GCCGGCACC

ADENYLYLTRANSFERASE) (ATASE) -

Mycobacterium

1175-1176

cg27850036

136

TAGGTGTCTAG

C

T

Gly

Ser

NON-

UNCLASSI-

Human Gene Similar to SWISSNEW-ACC: P11653

1.00E−38

CCAGTCCATGT

(1255)

(1256)

CONSERVA-

FIED

METHYLMALONYL COA MUTASE ALPHA-

CAC[C/T]GTAGA

TIVE

SUBUNIT (EC 5.4.99.2) (MCM-ALPHA) -

CGTCCTCGGA

Propionibacterium freudenreichii shermanii

, 727 aa.

GTAGAGGTGA

1177-1178

cg43936560

1406

TTGGGCAGCTT

G

C

Ala

Pro

NON-

UNCLASSI-

Human Gene Similar to SPTREMBL-ACC: Q69566

4.70E−36

ATCTGTGTGCC

(1257)

(1258)

CONSERVA-

FIED

(HHV-6) U1102, VARIANT A DNA, COMPLETE

CAG[G/C]CGGC

TIVE

VIRION GENOME - HUMAN HERPESVIRUS-6,

ATATCTGTGCA

413 aa.

TGTGCGTGTG

1179-1180

cg42538578

142

GCAGCACTTG

C

T

Cys

Tyr

NON-

UNCLASSI-

Human Gene Similar to SWISSPROT-ACC:

7.40E−34

8

GCCTTCCCTCT

(1259)

(1260)

CONSERVA-

FIED

Q09753 BETA-DEFENSIN 1 PRECURSOR

GTAA[C/T]AGGT

TIVE

(HBD-1) (DEFENSIN, BETA 1) -

Homo sapiens

GCCTTGAATTT

(Human), 68 aa.

TGGTAAAGAT

1181-1182

cg42475469

376

CTGACCTCAAG

C

T

Glu

Lys

NON-

UNCLASSI-

Human Gene Similar to SWISSPROT-ACC:

5.70E−32

TGATCCACCTG

(1261)

(1262)

CONSERVA-

FIED

P39194 !!!! ALU SUBFAMILY SQ WARNING

CCT[C/T]AGCCT

TIVE

ENTRY -

Homo sapiens

(Human), 593 aa.

CCCAAAGTGCT

GGGATTACA

1183-1184

cg27847601

280

CTGAGGTGGG

A

G

Lys

Glu

NON-

UNCLASSI-

Human Gene Similar to SWISSPROT-ACC:

1.00E−31

AGGACTGCCT

(1263)

(1264)

CONSERVA-

FIED

P39191 !!!! ALU SUBFAMILY SB2 WARNING

GAACC[A/G]AG

TIVE

ENTRY -

Homo sapiens

(Human), 603 aa.

GAGGTGGAGG

CTGCAGTGAG

CCA

1185-1186

cg27847601

298

CTGAACCAAGG

G

A

Val

Met

NON-

UNCLASSI-

Human Gene Similar to SWISSPROT-ACC:

1.00E−31

AGGTGGAGGC

(1265)

(1266)

CONSERVA-

FIED

P39191 !!!! ALU SUBFAMILY SB2 WARNING

TGCA[G/A]TGA

TIVE

ENTRY -

Homo sapiens

(Human), 603 aa.

GCCAAGATCAC

AGCACTATGCT

1187-1188

cg44931270

1970

CGGCGCCGGG

C

gap

Gln

His

FRAMESHIFT

collagen

Human Gene Similar to SPTREMBL-ID: Q28396

2.10E−37

AGGCTGTGGG

(1267)

(1268)

TYPE II COLLAGEN -

EQUUS CABALLUS

TCTGG[C/gap]T

(HORSE), 1418 aa.

GCGCACGGTC

TCGGTCAGCA

GAGC

1189-1190

cg43925450

333

GATCTGTCAAT

gap

T

Leu

Ser

FRAMESHIFT

glycoprotein

Human Gene Similar to TREMBLNEW-ID:

3.80E−37

17

TTAAGCTGGTT

(1269)

(1270)

E1249608 MEMBRANE-BOUND SMALL GTP-

CTG[gap/T]CTG

BINDING - LIKE PROTEIN -

ARABIDOPSIS

GGGGAGTCTG

THALIANA

(MOUSE-EAR CRESS), 217 aa.

CGGTAGGCAA

AT

1191-1192

cg43941918

724

CAGCCACAGTT

gap

T

Val

Ser

FRAMESHIFT

immunoglob

Human Gene Similar to TREMBLNEW-ID:

2.80E−42

CGTTTGATCTC

(1271)

(1272)

G240581 IMMUNOGLOBULIN G2B VARIABLE

CAC[gap/T]CTT

REGION LIGHT CHAIN, AUTOANTIBODY

GGTCCCTTGG

BV04-01 VARIABLE REGION LIGHT CHAIN -

CCGAAAGTGC

MUSSP, 113 aa.

GC

SEQUENCE LISTING

The patent contains a lengthy “Sequence Listing” section. A copy of the “Sequence Listing” is available in electronic form from the USPTO web site (http://seqdata.uspto.gov/sequence.html?DocID=06670464B1). An electronic copy of the “Sequence Listing” will also be available from the USPTO upon request and payment of the fee set forth in 37 CFR 1.19(b)(3).

Number	Date	Country
717113	Jun 1996	EP
0 785 280	Jul 1997	EP
WO 9322456	Nov 1993	WO
WO 9511995	May 1995	WO
WO 9729212	Aug 1997	WO
WO 9814466	Apr 1998	WO
WO 9814470	Apr 1998	WO
WO 9818967	May 1998	WO
WO 9820165	May 1998	WO
WO 9821316	May 1998	WO
WO 9830717	Jul 1998	WO
WO 9838846	Sep 1998	WO
WO 9856954	Dec 1998	WO
