Methods and products related to genotyping and DNA analysis

FIELD OF THE INVENTION

The present invention relates to methods and products associated with genotyping. In particular, the invention relates to methods of detecting single nucleotide polymorphisms and reduced complexity genomes for use in genotyping methods as well as to various methods of genotyping, fingerprinting, and genomic analysis. The invention also relates to products and kits, such as panels of single nucleotide polymorphism allele specific oligonucleotides, reduced complexity genomes, and databases for use in the methods of the invention.

BACKGROUND OF THE INVENTION

Genomic DNA varies significantly from individual to individual, except in identical twins. Many human diseases arise from genomic variations. The genetic diversity amongst humans and other life forms explains the heritable variations observed in disease susceptibility. Diseases arising from such genetic variations include Huntington's disease, cystic fibrosis, Duchenne muscular dystrophy, and certain forms of breast cancer, each of which is associated with a single gene mutation. Diseases such as multiple sclerosis, diabetes, Parkinson's disease, Alzheimer's disease, and hypertension are much more complex; they may have polygenic (multiple gene influences) or multifactorial (multiple gene and environmental influences) causes. Many of the variations in the genome do not result in a disease trait, although, as described above, a single mutation can result in a disease trait. The ability to scan the human genome to identify the locations of genes that underlie or are associated with the pathology of such diseases is an enormously powerful tool in medicine and human biology.

Several types of sequence variations, including insertions and deletions, differences in the number of repeated sequences, and single base pair differences, result in genomic diversity. Single base pair differences, referred to as single nucleotide polymorphisms (SNPs), are the most frequent type of variation in the human genome, occurring at approximately 1 in 10^3 bases. A SNP is a genomic position at which two or more alternative nucleotide alleles occur at a relatively high frequency (greater than 1%) in a population. SNPs are well suited for studying sequence variation because they are relatively stable (i.e., exhibit low mutation rates) and because single nucleotide variations can be responsible for inherited traits.

Polymorphisms identified using microsatellite-based analysis, for example, have been used for a variety of purposes. Use of genetic linkage strategies to identify the locations of single Mendelian factors has been successful in many cases (Benomar et al. (1995), Nat. Genet., 10:84-8; Blanton et al. (1991), Genomics, 11:857-69). Identification of chromosomal locations of tumor suppressor genes has generally been accomplished by studying loss of heterozygosity in human tumors (Cavenee et al. (1983), Nature, 305:779-784; Collins et al. (1996), Proc. Natl. Acad. Sci. USA, 93:14771-14775; Koufos et al. (1984), Nature, 309:170-172; and Legius et al. (1993), Nat. Genet., 3:122-126). Additionally, use of genetic markers to infer the chromosomal locations of genes contributing to complex traits, such as type I diabetes (Davis et al. (1994), Nature, 371:130-136; Todd et al. (1995), Proc. Natl. Acad. Sci. USA, 92:8560-8565), has become a focus of research in human genetics.

Although substantial progress has been made in identifying the genetic basis of many human diseases, current methodologies used to develop this information are limited by prohibitive costs and the extensive amount of work required to obtain genotype information from large sample populations. These limitations make identification of complex gene mutations contributing to disorders such as diabetes extremely difficult. Techniques for scanning the human genome to identify the locations of genes involved in disease processes began in the early 1980s with the use of restriction fragment length polymorphism (RFLP) analysis (Botstein et al. (1980), Am. J. Hum. Genet., 32:314-31; Nakamura et al. (1987), Science, 235:1616-22). RFLP analysis involves Southern blotting and other techniques. Southern blotting is both expensive and time-consuming when performed on large numbers of samples, such as those required to identify a complex genotype associated with a particular phenotype. Some of these problems were avoided with the development of polymerase chain reaction (PCR)-based microsatellite marker analysis. Microsatellite markers are simple sequence length polymorphisms (SSLPs) consisting of di-, tri-, and tetranucleotide repeats.

Other types of genomic analysis are based on the use of markers that hybridize with hypervariable regions of DNA having multiallelic variation and high heterozygosity. The variable regions that are useful for fingerprinting genomic DNA are tandem repeats of a short sequence, referred to as minisatellites. The polymorphism is due to allelic differences in the number of repeats, which can arise from mitotic or meiotic unequal exchanges or from DNA slippage during replication.

The most commonly used method for genotyping involves Weber markers, which are abundant interspersed repetitive DNA sequences, generally of the form (dC-dA)_n·(dG-dT)_n. Weber markers exhibit length polymorphisms and are therefore useful for identifying individuals in paternity and forensic testing, as well as for mapping genes involved in genetic diseases. In the Weber method of genotyping, generally 400 Weber or microsatellite markers are used to scan each genome using PCR. Using these methods, scanning 5,000 individual genomes requires 2 million PCR reactions (5,000 genomes × 400 markers). The number of PCR reactions may be reduced by multiplexing, in which, for instance, four different primer sets are reacted simultaneously in a single PCR, reducing the total number of PCRs in this example to 500,000. The 500,000 PCR mixtures are then separated by polyacrylamide gel electrophoresis (PAGE). If the samples are run on 96-lane gels, approximately 5,200 gels must be run to analyze all 500,000 PCR reaction mixtures (500,000/96 ≈ 5,208). PCR products can be identified by their position on the gels, and the differences in length of the products can be determined by analyzing the gels. One problem with this type of analysis is that "stuttering" tends to occur, causing a smeared result and making the data difficult to interpret and score.
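The workload figures in the Weber example can be reproduced with a few lines of Python. This is only a sketch of the arithmetic stated above (5,000 genomes, 400 markers, four primer sets per multiplexed PCR, one reaction per lane of a 96-lane gel); the function name `weber_workload` is illustrative.

```python
import math

def weber_workload(genomes: int, markers: int, plex: int, lanes: int) -> tuple[int, int, int]:
    """Return (singleplex PCRs, multiplexed PCRs, gels needed) for a Weber-marker scan."""
    singleplex = genomes * markers
    # Each multiplexed PCR carries `plex` primer sets; round up for a partial last reaction.
    multiplexed = math.ceil(singleplex / plex)
    # One multiplexed reaction per lane; a partially filled gel still has to be run.
    gels = math.ceil(multiplexed / lanes)
    return singleplex, multiplexed, gels

print(weber_workload(5_000, 400, 4, 96))   # (2000000, 500000, 5209)
```

Rounding up gives 5,209 gels, which the text reports as approximately 5,200.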

More recent advances in genotyping are based on automated technologies utilizing DNA chips, such as the Affymetrix HuSNP Chip™ analysis system. The HuSNP Chip™ is a disposable array of DNA molecules on a chip (400,000 per half-inch-square slide). The single-stranded DNA molecules bound to the slide are present in an ordered array of molecules having known sequences, some of which are complementary to one allele of a SNP-containing portion of a genome. If the same 5,000-genome study described above were performed using the Affymetrix HuSNP Chip™ analysis system, approximately 5,000 gene chips having 1,000 or more SNPs per chip would be required. Prior to the chip scan, the genomic DNA samples would be amplified by PCR in a manner similar to conventional microsatellite genotyping. The gene chip method is also expensive and time-intensive.

SUMMARY OF THE INVENTION

The present invention relates to methods and products for identifying points of genetic diversity in the genomes of a broad spectrum of species. In particular, the invention relates to a high-throughput method for genotyping SNPs in a genome (e.g., a human genome) using reduced complexity genomes (RCGs) and, in some exemplary embodiments, using SNP allele specific oligonucleotides (SNP-ASOs) and specific hybridization reactions performed, for example, on a surface. In some aspects of the invention, genotyping is accomplished by scanning a RCG for the presence or absence of a SNP allele. Using this method, tens of thousands of genomes from one species may be assayed simultaneously for the presence or absence of each allele of a SNP. The methods can be automated, and the results can be recorded using a microarray scanner or other detection/recordation devices.

The invention encompasses several improvements over prior art methods. For instance, a genome-wide scan of thousands of individuals can be carried out at a fraction of the cost and time required by many prior art genotyping methods.

The invention, in one aspect, is a method for detecting the presence of a SNP allele in a genomic sample. The method, in one aspect, includes preparing a RCG from the genomic sample and analyzing the RCG for the presence of the SNP allele. In some aspects, the analysis is performed using a hybridization reaction between the RCG and a SNP allele specific oligonucleotide (SNP-ASO) that is complementary to a given allele of the SNP. If that allele of the SNP is present in the genomic sample, the SNP-ASO hybridizes with the RCG.

In some aspects, the method is a method for determining a genotype of a genome, whereby the genotype is identified by the presence or absence of alleles of the SNP in the RCG. In other aspects, the method is a method for characterizing a tumor, wherein the RCG is isolated from a genome obtained from a tumor of a subject and wherein the tumor is characterized by the presence or absence of an allele of the SNP in the RCG.

In other aspects, the method is a method for determining allelic frequency for a SNP, and further comprises determining the number of arbitrarily selected genomes from a population which include each allele of the SNP in order to determine the allelic frequency of the SNP in the population.

In some embodiments, the hybridization reaction is performed on a surface and the RCG or the SNP-ASO is immobilized on the surface. In yet other embodiments, the SNP-ASO is hybridized with a plurality of RCGs in individual reactions.

In other aspects, the method includes performing a hybridization reaction involving a RCG and a surface having a SNP-ASO immobilized thereon, repeating the hybridization with each of a plurality of RCGs prepared from a plurality of genomes, and determining the genotype based on whether the SNP-ASO hybridizes with at least some of the RCGs.

The RCG may be a PCR-derived RCG or a native RCG. In some embodiments, the RCG is prepared by performing degenerate oligonucleotide-primed PCR (DOP-PCR) using a degenerate oligonucleotide primer having a tag-(N)_x-TARGET nucleotide sequence, wherein the TARGET nucleotide sequence includes at least 7 TARGET nucleotides, wherein x is an integer from 0 to 9, and wherein N is any nucleotide. In various embodiments, the TARGET nucleotide sequence includes 8, 9, 10, 11, or 12 nucleotide residues. In other embodiments, x is an integer from 3 to 9 (e.g., 6, 7, 8, or 9). Preferably, the method of genotyping is performed to determine genotypes at more than one locus. In other embodiments, the RCG is prepared by performing DOP-PCR using a degenerate oligonucleotide primer having a tag-(N)_x-TARGET nucleotide sequence, wherein the TARGET nucleotide sequence includes fewer than 7 TARGET nucleotide residues, wherein x is an integer from 0 to 9, and wherein N is any nucleotide residue.
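Why a longer TARGET sequence reduces genome complexity can be illustrated with a back-of-the-envelope estimate. The sketch below assumes, as a simplification, a random genome with equal base frequencies, in which a specific k-nucleotide TARGET occurs on average once every 4^k positions per strand; real genomes are not random, and the sketch ignores mismatched priming. The helper name `expected_priming_sites` and the 3 × 10^9 bp genome size are illustrative assumptions.

```python
def expected_priming_sites(genome_length: int, target_length: int) -> float:
    """Expected exact matches to one TARGET sequence, counting both strands.

    Assumes (hypothetically) a random genome with equal base frequencies,
    so a given k-mer is expected once per 4**k positions on each strand.
    """
    per_strand = genome_length / 4 ** target_length
    return 2 * per_strand

HUMAN_HAPLOID_BP = 3_000_000_000  # approximate haploid human genome size (assumption)

for k in (7, 8, 9, 10, 11, 12):
    print(k, round(expected_priming_sites(HUMAN_HAPLOID_BP, k)))
```

Under this model each additional TARGET nucleotide cuts the expected number of exact priming sites fourfold. Because an amplicon additionally requires two suitably oriented sites within amplifiable distance, the fraction of the genome represented in the RCG is smaller still.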

The methods can be performed on a support. Preferably, the support is a solid support such as a glass slide, a membrane such as a nitrocellulose membrane, etc.

In yet other embodiments, the RCG is prepared by interspersed repeat sequence-PCR (IRS-PCR), arbitrarily primed-PCR (AP-PCR), adapter-PCR, or multiple primed DOP-PCR.

In a preferred embodiment, the methods are useful for determining a genotype associated with or linked to a specific phenotype, and the distinct isolated genomes or RCGs are associated with a common phenotype.

The SNP-ASOs used according to the methods of the invention are polynucleotides that include one of the two possible nucleotides at the polymorphic site. In one embodiment, the SNP-ASO is composed of from about 10 to 50 nucleotides. In a preferred embodiment, the SNP-ASO is composed of from about 10 to 25 nucleotides.

According to one embodiment, the SNP-ASO is labeled. The methods can, optionally, also include addition of an excess of non-labeled SNP-ASO in which the polymorphic nucleotide residue corresponds to a different allele of the SNP and which is added during the hybridization step. Additionally, a parallel reaction may be performed wherein the labeling of the two SNP-ASOs is reversed. The label on the SNP-ASO in one embodiment is a radioactive isotope. In this embodiment, the labeled hybridized products on the surface may be exposed to an X-ray film to produce a signal on the film which corresponds to the radioactively labeled hybridization products. In another embodiment, the SNP-ASO is labeled with a fluorescent molecule. In this embodiment, the labeled hybridized products on the surface may be exposed to an automated fluorescence reader to generate an output signal which corresponds to the fluorescently labeled hybridization products.
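When both allele-specific SNP-ASOs are read out, for example in the parallel reaction with reversed labels described above, the two signals can be turned into a genotype call. The sketch below is one simple possibility, not the method of the invention: it assumes background-subtracted intensities for each allele and uses hypothetical thresholds (`min_total`, `homozygous_fraction`) that would have to be calibrated on samples of known genotype.

```python
from typing import Optional

def call_genotype(signal_a: float, signal_b: float,
                  min_total: float = 100.0,
                  homozygous_fraction: float = 0.8) -> Optional[str]:
    """Call a biallelic SNP genotype ("AA", "AB", "BB") from two allele signals.

    signal_a / signal_b: background-subtracted hybridization intensities for
    the allele A and allele B SNP-ASOs. Thresholds are hypothetical defaults.
    Returns None (no-call) when total signal is too weak.
    """
    # Background subtraction can give negative values; treat them as zero.
    signal_a, signal_b = max(signal_a, 0.0), max(signal_b, 0.0)
    total = signal_a + signal_b
    if total < min_total:
        return None
    fraction_a = signal_a / total
    if fraction_a >= homozygous_fraction:
        return "AA"
    if fraction_a <= 1.0 - homozygous_fraction:
        return "BB"
    return "AB"

print(call_genotype(900, 50))    # AA
print(call_genotype(520, 480))   # AB
print(call_genotype(30, 40))     # None (below min_total)
```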

According to one embodiment, the RCG is labeled. The label on the RCG in one embodiment is a radioactive isotope. In this embodiment, the labeled hybridized products on the surface may be exposed to an X-ray film to produce a signal on the film which corresponds to the radioactively labeled hybridization products. In another embodiment, the RCG is labeled with a fluorescent molecule. In this embodiment, the labeled hybridized products on the surface may be exposed to an automated fluorescence reader to generate an output signal which corresponds to the fluorescently labeled hybridization products.

In one embodiment, a plurality of different SNP-ASOs are attached to the surface. In another embodiment, the plurality includes at least 500 different SNP-ASOs. In yet another embodiment, the plurality includes at least 1,000 different SNP-ASOs.

In another embodiment, a plurality of SNP-ASOs are labeled with fluorescent molecules, each SNP-ASO being labeled with a spectrally distinct fluorescent molecule. In various embodiments, the number of spectrally distinct fluorescent molecules is two, three, four, five, six, seven, or eight.

In yet another embodiment, the plurality of RCGs are labeled with fluorescent molecules, each RCG being labeled with a spectrally distinct fluorescent molecule. All of the RCGs having a spectrally distinct fluorescent molecule can be hybridized with a single support. In various embodiments the number of spectrally distinct fluorescent molecules is two, three, four, five, six, seven, or eight.

According to other aspects, the invention encompasses methods for characterizing a tumor by assessing the loss of heterozygosity, determining allelic frequency for a SNP, generating a genomic pattern for an individual genome, and generating a genomic classification code for a genome.

In one aspect, the method for characterizing a tumor includes isolating genomic DNA from tumor samples obtained from a plurality of subjects, preparing a plurality of RCGs from the genomic DNA, performing a hybridization reaction involving a SNP-ASO and the plurality of RCGs (e.g. immobilized on a surface), and identifying the presence of a SNP allele in the genomic DNA based on whether the SNP-ASO hybridizes with at least some of the RCGs in order to characterize the tumor. One or more of the RCGs or one or more of the SNP-ASOs can be immobilized on a surface.
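Loss of heterozygosity at a SNP is commonly scored by comparing tumor DNA with non-tumor DNA from the same subject. The sketch below assumes such matched genotype calls are available (for example, from hybridizing SNP-ASOs with RCGs prepared from both samples) and flags SNPs that are heterozygous in the normal sample but homozygous in the tumor. The dictionary layout and the SNP identifiers are hypothetical.

```python
def loh_snps(normal: dict[str, str], tumor: dict[str, str]) -> list[str]:
    """Return IDs of SNPs showing loss of heterozygosity in the tumor.

    normal, tumor: SNP ID -> genotype call ("AA", "AB" or "BB") for matched
    normal and tumor samples of one subject (assumed input format).
    Only SNPs heterozygous in the normal sample are informative.
    """
    lost = []
    for snp_id, normal_call in normal.items():
        tumor_call = tumor.get(snp_id)
        if normal_call == "AB" and tumor_call in ("AA", "BB"):
            lost.append(snp_id)
    return sorted(lost)

normal = {"snp1": "AB", "snp2": "AB", "snp3": "AA"}
tumor = {"snp1": "AA", "snp2": "AB", "snp3": "AA"}
print(loh_snps(normal, tumor))   # ['snp1']
```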

In another aspect, the invention is a method for generating a genomic pattern for an individual genome. The method, in one aspect, includes preparing a plurality of RCGs, analyzing the RCGs for the presence of one or more SNP alleles, and identifying a genomic pattern of SNPs for each RCG by determining the presence or absence therein of SNP alleles. In some embodiments, the analysis involves performing a hybridization reaction involving a panel of SNP-ASOs (e.g., SNP-ASOs that are each complementary to one allele of a SNP) and the plurality of RCGs. The genomic pattern can be identified by determining the presence or absence of a SNP allele for each RCG by detecting whether the SNP-ASOs hybridize with the RCGs. In one embodiment, a plurality of SNP-ASOs are hybridized with the support, and each SNP-ASO of the panel is hybridized with a different support than the other SNP-ASOs.

In some embodiments, the genomic pattern is a genomic classification code which is generated from the pattern of SNP alleles for each RCG. In other embodiments, the genomic classification code is also generated from the allelic frequency of the SNPs. In yet other embodiments, the genomic pattern is a visual pattern. The genomic pattern may be in physical or electronic form.

In another aspect, the invention is a method for generating a genomic pattern for an individual genome. The method includes identifying a genomic pattern of SNP alleles for each RCG by determining the presence or absence therein of selected SNP alleles.

A method for generating a genomic classification code for a genome is provided in another aspect of the invention. The method includes preparing a RCG, analyzing the RCG for the presence of one or more SNP alleles (e.g. ones of known allelic frequency), identifying a genomic pattern of SNP alleles for the RCG by determining the presence or absence therein of SNP alleles, and generating a genomic classification code for the RCG based on the presence or absence (and, optionally, the allelic frequency) of the SNP alleles. In some embodiments, the analysis involves performing a hybridization reaction involving the RCG and a panel of SNP-ASOs (e.g. corresponding to SNP alleles of known allelic frequency), each of which is complementary to one allele of a SNP. The genomic pattern is identified based on whether each SNP-ASO hybridizes with the RCG.
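One way to picture the step from a genomic pattern to a genomic classification code is to record, in a fixed panel order, whether each SNP-ASO hybridized with the RCG (1) or not (0), and then pack that bit string into a compact code. The encoding below (pattern length, a colon, then hexadecimal digits) is a hypothetical scheme chosen for illustration, and it does not use the optional allelic-frequency information.

```python
def genomic_pattern(panel: list[str], hybridized: set[str]) -> str:
    """Return '1' where a panel SNP-ASO hybridized with the RCG, else '0'."""
    return "".join("1" if probe in hybridized else "0" for probe in panel)

def classification_code(pattern: str) -> str:
    """Pack a 0/1 pattern into 'length:HEX' (hypothetical encoding).

    The length prefix preserves leading zeros, so distinct patterns of the
    same panel always receive distinct codes.
    """
    if not pattern or set(pattern) - {"0", "1"}:
        raise ValueError("pattern must be a non-empty string of 0s and 1s")
    width = (len(pattern) + 3) // 4          # number of hex digits needed
    return f"{len(pattern)}:{int(pattern, 2):0{width}X}"

# Hypothetical panel: one SNP-ASO for each allele of two SNPs.
panel = ["snp1-G", "snp1-A", "snp2-C", "snp2-T"]
pattern = genomic_pattern(panel, {"snp1-G", "snp2-C", "snp2-T"})
print(pattern, classification_code(pattern))   # 1011 4:B
```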

The method for determining allelic frequency for a SNP, in another aspect, includes preparing a plurality of RCGs from distinct isolated genomes, performing a hybridization reaction involving one RCG and a surface having a SNP-ASO immobilized thereon, repeating the hybridization with each of the plurality of RCGs, and determining the number of RCGs which include each allele of the SNP in order to determine the allelic frequency of the SNP. In other embodiments the RCGs are immobilized on the surface.
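For a biallelic SNP typed in diploid genomes, the counts described above can be converted into allele frequencies by treating a genome in which both allele-specific SNP-ASOs hybridize as a heterozygote and a genome in which only one hybridizes as a homozygote. The sketch below makes exactly that assumption and skips genomes in which neither allele is detected; the input format is hypothetical.

```python
def allele_frequencies(calls: list[tuple[bool, bool]]) -> tuple[float, float]:
    """Estimate the two allele frequencies of a biallelic SNP in a diploid population.

    calls: one (has_allele_1, has_allele_2) pair per genome, i.e. whether the
    SNP-ASO for each allele hybridized with that genome's RCG.
    """
    copies_1 = copies_2 = 0
    for has_1, has_2 in calls:
        if has_1 and has_2:        # heterozygote: one copy of each allele
            copies_1 += 1
            copies_2 += 1
        elif has_1:                # homozygote for allele 1
            copies_1 += 2
        elif has_2:                # homozygote for allele 2
            copies_2 += 2
        # neither allele detected: treated as a failed assay and skipped
    total = copies_1 + copies_2
    if total == 0:
        raise ValueError("no genomes could be typed")
    return copies_1 / total, copies_2 / total

calls = [(True, False)] * 3 + [(True, True)]
print(allele_frequencies(calls))   # (0.875, 0.125)
```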

In another aspect, the method for generating a genomic pattern for an individual genome includes preparing a plurality of RCGs, performing a hybridization reaction involving a RCG and a surface having a SNP-ASO immobilized thereon, repeating the hybridization step with each of the plurality of RCGs, and identifying a genomic pattern of SNPs for each RCG by determining the presence therein of SNPs based on whether each SNP-ASO hybridizes with each RCG.

The method for generating a genomic classification code for a genome, in another aspect, includes preparing a RCG, performing a hybridization reaction involving the RCG and a panel of SNP-ASOs (e.g. immobilized on a surface), identifying a genomic pattern of SNPs for the RCG by determining the presence therein of SNPs based on whether each SNP-ASO hybridizes with the RCG, and generating a genomic classification code for the RCG based on the identities of the SNPs which hybridize with the RCG, the identities of the SNPs which do not hybridize with the RCG, and, optionally, also based on the allelic frequency of the SNPs.

In one embodiment, each SNP-ASO of the panel is immobilized on a separate surface. In another embodiment, more than one SNP-ASO of the panel is immobilized on the same surface, each SNP-ASO being immobilized on a distinct area of the surface.

In an embodiment, the genomic classification code is encoded as one or more computer-readable signals on a computer-readable medium.

In other aspects of the invention, compositions are provided. According to one aspect, the composition is a plurality of RCGs immobilized on a surface, wherein the RCGs are prepared by a method including the step of performing DOP-PCR using a DOP primer having a tag-(N)_x-TARGET nucleotide sequence, wherein the TARGET nucleotide sequence includes at least 7 nucleotide residues, wherein x is an integer from 0 to 9, and wherein N is any nucleotide residue. In various embodiments, the TARGET nucleotide sequence includes 8, 9, 10, 11, or 12 nucleotide residues. In other embodiments, x is an integer from 3 to 9 (e.g., 6, 7, 8, or 9).

According to another aspect, the composition is a panel of SNP-ASOs immobilized on a surface, wherein the SNPs are identified by a method including preparing a set of primers from a RCG, performing PCR using the set of primers on a plurality of isolated genomes to yield DNA products, isolating and, optionally, sequencing the DNA products, and identifying a SNP based on the sequences of the PCR products. In one embodiment, the plurality of isolated genomes includes at least four isolated genomes.

According to another aspect of the invention, a kit is provided. The kit includes a container housing a set of PCR primers for reducing the complexity of a genome, and a container housing a set of SNP-ASOs. The SNPs which correspond to the SNP-ASOs of the kit are preferably present within a RCG made using the PCR primers of the kit with a frequency of at least 50%.

In one embodiment, the set of PCR primers are primers for DOP-PCR. Preferably, the degenerate oligonucleotide primer has a tag-(N)_x-TARGET nucleotide sequence, wherein the TARGET nucleotide sequence includes at least 7 nucleotide residues, wherein x is an integer from 0 to 9, and wherein N is any nucleotide residue. In various embodiments, the TARGET nucleotide sequence includes 8, 9, 10, 11, or 12 nucleotide residues. In other embodiments, x is an integer from 3 to 9 (e.g., 6, 7, 8, or 9).

In yet other embodiments, the RCG is prepared by IRS-PCR, AP-PCR, or adapter-PCR.

The SNP-ASOs of the invention are polynucleotides including one of the alternative nucleotides at a polymorphic nucleotide residue of a SNP. In one embodiment, the SNP-ASO is composed of from about 10 to 50 nucleotide residues. In a preferred embodiment the SNP-ASO is composed of from about 10 to 25 nucleotide residues. In another embodiment, the SNP-ASOs are labeled with a fluorescent molecule.

According to yet another aspect of the invention, a composition is provided. The composition includes a plurality of RCGs immobilized on a surface, wherein the RCGs are composed of a plurality of DNA fragments, each DNA fragment including a tag-(N)_x-TARGET nucleotide sequence, wherein the TARGET nucleotide sequence is identical in all of the DNA fragments of each RCG, wherein the TARGET nucleotide sequence includes at least 7 nucleotide residues, wherein x is an integer from 0 to 9, and wherein N is any nucleotide residue. In various embodiments, the TARGET nucleotide sequence includes 8, 9, 10, 11, or 12 nucleotide residues. In other embodiments, x is an integer from 3 to 9 (e.g., 6, 7, 8, or 9).

In one aspect, the invention is a method for identifying a SNP. The method includes preparing a set of primers from a RCG, wherein the RCG is composed of a first set of PCR products; PCR-amplifying a plurality of isolated genomes using the set of primers to yield a second set of PCR products; isolating and, optionally, sequencing the PCR products; and identifying a SNP based on the sequences of one or both sets of PCR products. In one embodiment, the plurality of isolated genomes is a pool of genomes. Preferably, the isolated genomes are RCGs. RCGs can be prepared in a variety of ways, but it is preferred, in some aspects, that the RCG is prepared by DOP-PCR.
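The final step, identifying a SNP from the sequences of the same PCR product obtained from several genomes (the text elsewhere mentions at least four), amounts to finding positions where those sequences disagree. The sketch below assumes the sequences are already aligned to equal length and ignores indels and sequencing errors; the function name and genome IDs are hypothetical.

```python
def find_snps(aligned: dict[str, str]) -> list[tuple[int, dict[str, int]]]:
    """Report variable positions among aligned sequences of one PCR product.

    aligned: genome ID -> sequence of the same product, already aligned to
    equal length (an assumption of this sketch).
    Returns (0-based position, base counts) for each position where two or
    more different bases are observed.
    """
    if not aligned:
        return []
    seqs = list(aligned.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    snps = []
    for pos in range(length):
        counts: dict[str, int] = {}
        for seq in seqs:
            base = seq[pos].upper()
            if base in "ACGT":           # ignore N and gap characters
                counts[base] = counts.get(base, 0) + 1
        if len(counts) >= 2:
            snps.append((pos, counts))
    return snps

aligned = {"g1": "ACGTAC", "g2": "ACGTAC", "g3": "ACTTAC", "g4": "ACGTAC"}
print(find_snps(aligned))   # [(2, {'G': 3, 'T': 1})]
```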

In one embodiment, the method of preparing the set of primers is performed by at least: preparing a RCG, separating the first set of PCR products into individual PCR products, determining the nucleotide sequence of each end of at least one of the PCR products, and generating primers for use in the subsequent PCR step based on the sequence of the ends of the PCR product(s).
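Generating primers from the end sequences of a cloned PCR product can be sketched as taking the first bases of the top strand as the forward primer and the reverse complement of the last bases as the reverse primer. The 20-nucleotide default primer length and the function names below are assumptions; a practical design would also check melting temperature, GC content, and primer-dimer potential.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (N is kept as N)."""
    return seq.translate(COMPLEMENT)[::-1]

def primers_from_ends(product: str, primer_length: int = 20) -> tuple[str, str]:
    """Derive a forward/reverse primer pair from the two ends of a sequenced product.

    Forward primer: first `primer_length` bases of the top strand, 5'->3'.
    Reverse primer: reverse complement of the last `primer_length` bases.
    """
    if len(product) < 2 * primer_length:
        raise ValueError("product too short for non-overlapping primers")
    forward = product[:primer_length]
    reverse = reverse_complement(product[-primer_length:])
    return forward, reverse

print(primers_from_ends("ACGTACGTTTTTGGGGCCCC", primer_length=4))   # ('ACGT', 'GGGG')
```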

The set of PCR products may be separated by any means known in the art for separating polynucleotides. In a preferred embodiment, the set of PCR products is separated by gel electrophoresis. Preferably, one or more libraries are prepared from segments of the gel containing several PCR products and clones are isolated from the library, each clone including a PCR product from the library. In other embodiments, the set of PCR products is separated by high pressure liquid chromatography or column chromatography.

The RCG used to generate primers or PCR products for identifying SNPs can be prepared by PCR methods. Preferably, the RCG is prepared by performing DOP-PCR using a degenerate oligonucleotide primer having a tag-(N)_x-TARGET nucleotide sequence, wherein the TARGET nucleotide sequence includes at least 7 TARGET nucleotide residues, wherein x is an integer from 0 to 9, and wherein N is any nucleotide residue. In various embodiments, the TARGET nucleotide sequence includes 8, 9, 10, 11, or 12 nucleotide residues. In other embodiments, x is an integer from 3 to 9 (e.g., 6, 7, 8, or 9). In other embodiments, the RCG is prepared by performing DOP-PCR using a degenerate oligonucleotide primer having a tag-(N)_x-TARGET nucleotide sequence, wherein the TARGET nucleotide sequence includes fewer than 7 TARGET nucleotide residues, wherein x is an integer from 0 to 9, and wherein N is any nucleotide residue.

In yet other embodiments, the RCG is prepared by IRS-PCR, AP-PCR, or adapter-PCR.

In a preferred embodiment of the invention, the set of primers is composed of a plurality of polynucleotides, each polynucleotide including a tag-(N)_x-TARGET nucleotide sequence, wherein TARGET is the same sequence in each polynucleotide in the set of primers. The sequence of (N)_x is different in each primer within a set of primers. In some embodiments, the set of primers includes at least 4^3, 4^4, 4^5, 4^6, 4^7, 4^8, or 4^9 different primers.
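Because each N position can be any of four nucleotides, a tag-(N)_x-TARGET primer corresponds to 4^x distinct sequences, which is where set sizes such as 4^3 to 4^9 come from; a degenerate primer synthesized with mixed bases at the N positions is physically a mixture of these sequences. The sketch below enumerates them. Reading SEQ ID NO:4 (CTCGAGNNNNNNAAGCGATG) as a 6-nucleotide tag, six N positions, and an 8-nucleotide TARGET is our interpretation for the example, and `primer_set` is a hypothetical helper.

```python
from itertools import product

def primer_set(tag: str, target: str, x: int) -> list[str]:
    """Expand a tag-(N)_x-TARGET degenerate primer into its 4**x distinct sequences."""
    return [tag + "".join(ns) + target for ns in product("ACGT", repeat=x)]

# Example split of SEQ ID NO:4 (interpretation): tag CTCGAG, x = 6, TARGET AAGCGATG.
primers = primer_set("CTCGAG", "AAGCGATG", 6)
print(len(primers))   # 4096, i.e. 4**6
print(primers[:2])
```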

In another aspect, the invention is a method for generating a RCG using DOP-PCR. The method includes the step of performing DOP-PCR using a degenerate oligonucleotide primer having an (N)_x-TARGET nucleotide sequence, wherein the TARGET nucleotide sequence includes at least 7 TARGET nucleotide residues, wherein x is an integer from 0 to 9, and wherein N is any nucleotide residue. In various embodiments, the TARGET nucleotide sequence includes 8, 9, 10, 11, or 12 nucleotide residues. In other embodiments, x is an integer from 3 to 9 (e.g., 6, 7, 8, or 9).
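The reduction in complexity produced by a long TARGET can be explored in silico by asking which stretches of a sequence could be amplified. The model below is a deliberate simplification, not a description of the actual reaction: it assumes priming only at exact TARGET matches, and it counts a product wherever a TARGET site on the top strand is followed, within a hypothetical maximum length `max_len`, by a site for the same primer on the bottom strand (visible on the top strand as the reverse complement of TARGET). It lists every such pair, whereas in a real PCR shorter products would tend to dominate.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

def find_sites(seq: str, motif: str) -> list[int]:
    """Start positions of every exact (possibly overlapping) match of motif."""
    hits = []
    start = seq.find(motif)
    while start != -1:
        hits.append(start)
        start = seq.find(motif, start + 1)
    return hits

def predicted_amplicons(sequence: str, target: str,
                        max_len: int = 2000) -> list[tuple[int, int]]:
    """Return (start, end) spans (end exclusive) a TARGET-primed PCR could copy."""
    sequence, target = sequence.upper(), target.upper()
    forward_sites = find_sites(sequence, target)       # priming on the top strand
    rc = reverse_complement(target)
    reverse_sites = find_sites(sequence, rc)           # priming on the bottom strand
    amplicons = []
    for start in forward_sites:
        for rev in reverse_sites:
            end = rev + len(rc)
            if rev > start and end - start <= max_len:
                amplicons.append((start, end))
    return amplicons

demo = "CC" + "AAGCGATG" + "T" * 10 + "CATCGCTT" + "GG"
print(predicted_amplicons(demo, "AAGCGATG"))   # [(2, 28)]
```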

According to one embodiment, the tag includes 6 nucleotide residues. Preferably, the RCG is used in a genotyping procedure. In other embodiments, the RCG is analyzed to detect a polymorphism. The analysis step may be performed using mass spectrometry.

In another aspect, the invention is a method for assessing whether a subject is at risk for developing a disease. The method includes the steps of using the methods of the invention to identify a plurality of SNPs that occur in at least, for example, 10% of genomes obtained from individuals afflicted with the disease, and determining whether one or more of those SNPs occurs in the subject. In the method, affected individuals are compared with unaffected individuals; the observation of a difference between affected and unaffected individuals can, by itself, provide important information.

In other aspects, the invention is a method for identifying a set of one or more SNPs associated with a disease or disease risk. The method includes the steps of preparing individual RCGs from subjects afflicted with a disease, using the same set of primers to prepare each RCG, and comparing the SNP allele frequencies identified in those RCGs with the frequencies of the same SNP alleles in normal (i.e., non-afflicted) subjects to identify SNPs associated with the disease. In other aspects, the invention is a method for identifying a set of SNPs randomly distributed throughout the genome. The set of SNPs is used as a panel of genetic markers to perform a genome-wide scan for linkage analysis.
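Comparing SNP allele frequencies between affected and unaffected subjects is commonly done with a test on a 2 × 2 table of allele counts. The sketch below uses a Pearson chi-square test with one degree of freedom; this statistic is our choice for illustration, since the text does not specify how the frequencies are compared. It applies no continuity correction and assumes every expected count is reasonably large (in particular, non-zero).

```python
import math

def allelic_chi_square(case_counts: tuple[int, int],
                       control_counts: tuple[int, int]) -> tuple[float, float]:
    """Pearson chi-square test (1 d.f.) on a 2x2 table of allele counts.

    case_counts / control_counts: (allele 1 count, allele 2 count), e.g. from
    RCGs of afflicted and non-afflicted subjects. Returns (chi2, p_value).
    """
    table = [list(case_counts), list(control_counts)]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

chi2, p = allelic_chi_square((60, 40), (40, 60))
print(chi2, round(p, 4))   # 8.0 0.0047
```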

In an embodiment, a computer-readable medium having computer-readable signals stored thereon is provided. The signals define a data structure that includes one or more data components. Each data component includes a first data element defining a genomic classification code that identifies a corresponding genome. Each genomic classification code classifies the corresponding genome based on one or more single nucleotide polymorphisms of the corresponding genome.

In an optional aspect of this embodiment, the genomic classification code is a unique identifier of the corresponding genome.

In an optional aspect of this embodiment, the genomic classification code is based on a pattern of the single nucleotide polymorphisms of the corresponding genome, where the pattern indicates the presence or absence of each single nucleotide polymorphism.

In another optional aspect of this embodiment, each data component also includes one or more data elements, each data element defining an attribute of the corresponding genome. Each of the embodiments of the invention can encompass various recitations made herein. It is, therefore, anticipated that each of the recitations of the invention involving any one element or combinations of elements can, optionally, be included in each aspect of the invention.
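A data component of the kind described above could be modeled as a small record holding the genomic classification code and optional attribute elements. The class and field names and the exact-match lookup below are hypothetical illustrations of how such records might be queried (for example, to ask whether a sample genome matches a stored genome); they are not a specification of the data structure.

```python
from dataclasses import dataclass, field

@dataclass
class GenomeRecord:
    """One data component: a genomic classification code plus optional attributes.

    Field names are hypothetical; the text only requires a code element and,
    optionally, further elements describing attributes of the genome.
    """
    classification_code: str                       # e.g. derived from a SNP presence/absence pattern
    attributes: dict[str, str] = field(default_factory=dict)

def find_matches(sample_code: str, records: list[GenomeRecord]) -> list[GenomeRecord]:
    """Return stored records whose classification code equals the sample's code."""
    return [record for record in records if record.classification_code == sample_code]

records = [GenomeRecord("4:B", {"subject": "S1"}), GenomeRecord("4:3")]
print(find_matches("4:B", records))
```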

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic flow chart depicting a method according to the invention for identifying SNPs.

FIG. 2 shows data depicting the process of identifying a SNP: (a) depicts a gel in which inter-Alu PCR genomic DNA products prepared from the 8C primer (which has the nucleotide sequence SEQ ID NO:3) were separated; (b) depicts a gel in which inserts from the library clones were separated; and (c) depicts a filter having two positive or matched clones.

FIG. 3 depicts the results of a genotyping and mapping experiment: (a) depicts hybridization results obtained using G allele ASO; (b) depicts hybridization results obtained using A allele ASO; (c) is a pedigree of CEPH family #884 with genotypes indicated from (a) and (b); and (d) is a map of chromosome 3q21-23.

FIG. 4 is a schematic flow chart depicting a method according to the invention for detecting SNPs.

FIG. 5 is a block diagram of a computer system for storing and manipulating genomic information.

FIG. 6A is an example of a record for storing information about a genome and/or genes or SNPs within the genome.

FIG. 6B is an example of a record for storing genomic information.

FIG. 7 is a flow chart of a method for determining whether genomic information of a sample genome, such as SNPs, matches that of another genome.

FIG. 8 depicts results obtained from a hybridization reaction involving RCGs prepared by DOP-PCR and SNP-ASOs immobilized on a surface in a microarray format.

BRIEF DESCRIPTION OF THE SEQUENCES

SEQ. ID. NO. 1 is CAGNNNCTG

SEQ. ID. NO. 2 is TTTTTTTTTTCAG

SEQ. ID. NO. 3 is CTT GCA GTG AGC CGA GATC

SEQ. ID. NO. 4 is CTCGAGNNNNNNAAGCGATG

SEQ. ID. NOS. 5-691 are nucleotide sequences containing SNPs.

DETAILED DESCRIPTION OF THE INVENTION

The invention relates in some aspects to genotyping methods involving detection of one or more single nucleotide polymorphisms (SNPs) in a reduced complexity genome (RCG) prepared from the genome of a subject. The invention includes methods of identifying SNPs associated with a disease or with pre-disposition to a disease. The invention further includes methods of screening RCGs prepared from one or more subjects in a population. Such screening can be used, for example, to determine whether the subject is afflicted with, or is likely to become afflicted with, a disorder, to determine allelic frequencies in the population, or to determine degrees of interrelation among subjects in the population. Additional aspects and details of the compositions, kits, and methods of the invention are described in the following sections.

The invention involves several discoveries which have led to new advances in the field of genotyping. The invention is based on the development of high throughput methods for analyzing genomic diversity. The methods combine use of SNPs, methods for reducing the complexity of genomes, and high throughput screening methods. As discussed in the background of the invention, many prior art methods for genotyping are based on use of hypervariable markers such as Weber markers, which predominantly detect differences in numbers of repeats. A high throughput SNP analysis method is advantageous compared with the Weber marker system for several reasons. For instance, the results of a Weber analysis are displayed in the form of a gel, which is difficult to read and must be scored by a professional. In contrast, the high throughput SNP analysis method of the invention provides a binary result which indicates the presence or absence of the SNP in the sample genome. Additionally, the method of the invention requires significantly less work and is considerably less expensive to perform. As described in the background of the invention, the Weber system requires 500,000 PCR reactions and 5,200 gels to analyze 5,000 genomes; the same study could be performed using the methods of the invention without any gels. Additionally, SNPs are not species-specific, and therefore the methods of the invention can be performed on diverse species and are not limited to humans.

It is more tedious to perform inter-species analysis using Weber markers than using the methods of the invention.

Some prior art methods do use SNPs for genotyping but the high throughput method of the invention has advantages over these methods as well. Affymetrix utilizes a HuSNP Chip™ system having an ordered array of SNPs immobilized on a surface for analyzing nucleic acids. This system is, however, prohibitively expensive for performing large studies such as the 5,000 genome study described above.

The invention is useful for identifying polymorphisms within a genome. Another use for the invention involves identification of polymorphisms associated with a plurality of distinct genomes. The distinct genomes may be isolated from populations which are related by some phenotypic characteristic, familial origin, physical proximity, race, class, etc. In other cases, the genomes are selected at random from populations such that they have no relation to one another other than being selected from the same population. In one preferred embodiment, the method is performed to determine the genotype (e.g. SNP content) of subjects having a specific phenotypic characteristic, such as a genetic disease or other trait.

Other uses for the methods of the invention involve identification or characterization of a subject, such as in paternity and maternity testing, immigration and inheritance disputes, breeding tests in animals, zygosity testing in twins, tests for inbreeding in humans and animals, evaluation of transplant suitability, such as with bone marrow transplants, identification of human and animal remains, quality control of cultured cells, and forensic testing such as forensic analysis of semen samples, blood stains, and other biological materials. The methods of the invention may also be used to characterize the genetic makeup of a tumor by testing for loss of heterozygosity or to determine the allelic frequency of a particular SNP. Additionally, the methods may be used to generate a genomic classification code for a genome by identifying the presence or absence of each of a panel of SNPs in the genome and to determine the allelic frequency of the SNPs. Each of these uses is discussed in more detail herein.

The genotyping methods of the invention are based on use of RCGs that can be reproducibly produced. These RCGs are used to identify SNPs, and can be screened individually for the presence or absence of the SNP alleles.

The invention, in some aspects, is based on the finding that the complexity of the genome can be reduced using various PCR and other genome complexity reduction methods and that RCGs made using such methods can be scanned for the presence of SNPs. One problem with using SNP-ASOs to screen a whole genome (i.e. a genome, the complexity of which has not been reduced) is that the signal to noise (S/N) ratio is low due to the high complexity of the genome and the low relative frequency of a particular SNP-specific sequence within the whole genome. When an entire genome of a complex organism is used as the target for allele-specific oligonucleotide hybridization, the target sequence to be detected (e.g. about 17 nucleotide residues for a SNP-ASO) represents only approximately 1 part in $10^{8}$ to $10^{9}$ of the DNA sample. It has been discovered, according to the invention, that the complexity of the genome can be reduced in a reproducible manner and that the resulting RCG is useful for identifying the presence of SNPs in the whole genome and for genotyping methods. Reduction in complexity allows genotyping of multiple SNPs following performance of a single PCR reaction, reducing the number of experimental manipulations that must be performed. The RCG is a reliable representation of a specific subfraction of the whole genome, and can be analyzed as though it were a genome of considerably lower complexity.
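As a rough illustration of the arithmetic behind the 17-nucleotide example, the sketch below computes what fraction of a sample one target site occupies. It assumes a human haploid genome of about 3,000 megabases and a hypothetical RCG that retains 1% of the genome while still containing the target site; the names `HUMAN_HAPLOID_BP`, `RCG_FRACTION` and `target_fraction` are illustrative only, and the calculation ignores hybridization kinetics and cross-hybridization.

```python
# Illustrative sketch (not the patent's method): how much of a DNA sample a
# single ~17-nt SNP-ASO target site represents, before and after complexity
# reduction. Genome size and RCG fraction are assumptions.

HUMAN_HAPLOID_BP = 3.0e9   # assumed human haploid genome size, in base pairs
RCG_FRACTION = 0.01        # hypothetical RCG retaining 1% of the genome


def target_fraction(target_len: int, sample_bp: float) -> float:
    """Fraction of the sample's nucleotides occupied by one copy of the target."""
    return target_len / sample_bp


whole = target_fraction(17, HUMAN_HAPLOID_BP)
rcg = target_fraction(17, RCG_FRACTION * HUMAN_HAPLOID_BP)

print(f"whole genome: about 1 part in {1 / whole:.1e}")
print(f"RCG:          about 1 part in {1 / rcg:.1e}")
print(f"relative enrichment of the target: {rcg / whole:.0f}-fold")
```

Under these assumptions the target is roughly 1 part in $1.8\times10^{8}$ of a whole-genome sample, and reducing the sample to 1% of the genome raises its relative abundance about 100-fold, provided the fragment carrying the target is actually retained in the RCG.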

RCGs are prepared from isolated genomes. An “isolated genome” as used herein is genomic DNA that is isolated from a subject and may include the entire genomic DNA. For instance, an isolated genome may be a RCG, or it may be an entire genomic DNA sample. Genomic DNA is a population of DNA that comprises the entire genetic component of a species excluding, where applicable, mitochondrial and chloroplast DNA. Of course, the methods of the invention can be used to analyze mitochondrial, chloroplast, etc., DNA as well. Depending on the particular species of the subject, the genomic DNA can vary in complexity. For instance, species which are relatively low on the evolutionary scale, such as bacteria, can have genomic DNA which is significantly less complex than species higher on the evolutionary scale. Bacteria such as E. coli have approximately $2.4\times10^{9}$ grams per mole of haploid genome, and bacterial genomes having a size of less than about 5 million base pairs (5 megabases) are known. Genomes of intermediate complexity, such as those of plants (for instance, rice), have a genome size of approximately 700-1,000 megabases. Genomes of highest complexity, such as maize or humans, have a genome size of approximately $10^{9}$ to $10^{11}$ base pairs. Humans have approximately $7.4\times10^{12}$ grams per mole of haploid genome.

A “subject” as used herein refers to any type of DNA-containing organism, and includes, for example, bacteria, viruses, fungi, animals, including vertebrates and invertebrates, and plants.

A “RCG” as used herein is a reproducible fraction of an isolated genome which is composed of a plurality of DNA fragments. The RCG can be composed of random or non-random segments, or of arbitrary or non-arbitrary segments. The term “reproducible fraction” refers to a portion of the genome which encompasses less than the entire native genome. If a reproducible fraction is produced twice or more using the same experimental conditions, the fractions produced in each repetition include at least 50% of the same sequences. In some embodiments the fractions include at least 70%, 80%, 90%, 95%, 97%, or 99% of the same sequences, depending on how the fractions are produced. For instance, if a RCG is produced by PCR, another RCG can be generated under identical experimental conditions that shares more than 90% of the sequences in the first RCG. Other methods for preparing a RCG, such as size selection, are still considered to be reproducible but often produce less than 99% of the same sequences.
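The reproducibility criterion can be expressed as a simple set comparison. This is a hypothetical sketch: it represents each fragment by an illustrative identifier, measures overlap relative to the first preparation (an assumption; the text does not say which preparation is the reference), and leaves open how fragments are actually identified as "the same" (for example by sequencing or hybridization).

```python
# Hypothetical sketch: check the "reproducible fraction" criterion by comparing
# the fragment sets recovered from two preparations made under identical
# conditions. Fragments are represented by illustrative identifiers.


def shared_fraction(first: set, second: set) -> float:
    """Fraction of fragments in the first preparation also found in the second."""
    if not first:
        raise ValueError("the first preparation contains no fragments")
    return len(first & second) / len(first)


def is_reproducible(first: set, second: set, threshold: float = 0.5) -> bool:
    """True if the preparations share at least `threshold` of their sequences."""
    return shared_fraction(first, second) >= threshold


replicate_1 = {"frag_a", "frag_b", "frag_c", "frag_d"}
replicate_2 = {"frag_a", "frag_b", "frag_c", "frag_e"}

print(shared_fraction(replicate_1, replicate_2))        # 0.75
print(is_reproducible(replicate_1, replicate_2))        # True (>= 50%)
print(is_reproducible(replicate_1, replicate_2, 0.9))   # False (< 90%)
```

The default threshold of 0.5 mirrors the minimum stated above; the stricter embodiments (70% to 99%) correspond to passing a higher `threshold`.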

A “plurality” of elements, as used throughout the application, refers to 2 or more of the element. A “DNA fragment” is a polynucleotide sequence obtained from a genome at any point along the genome and encompassing any sequence of nucleotides. The DNA fragments of the invention can be generated by either of two types of mechanisms, and thus there are two types of RCGs: PCR-generated RCGs and native RCGs.

PCR-generated RCGs are randomly primed. That is, the polynucleotide fragments in a PCR-generated RCG all have common sequences at or near their 5′ and 3′ ends, but the remaining nucleotides within the fragments do not have any sequence relation to one another. (When a tag is used in the primer, all of the 5′ and 3′ ends are identical. When a tag is not used, the 5′ and 3′ ends have a series of N's followed by the TARGET sequence, reading in the 5′ to 3′ direction. The TARGET sequence is identical in each primer, with the exception of multiple-primed DOP-PCR.) Thus, each polynucleotide fragment in a RCG includes a common 5′ and 3′ sequence which is determined by the constant region of the primer used to generate the RCG. For instance, if the RCG is generated using DOP-PCR (described in more detail below), each polynucleotide fragment would have, at or near its 5′ and 3′ ends, nucleotides that are determined by the “TARGET nucleotide sequence”. The TARGET nucleotide sequence is a sequence which is selected arbitrarily but which is constant within a set or subset of primers (e.g. in multiple-primed DOP-PCR). Thus, each polynucleotide fragment can have the same nucleotide sequence near its 5′ and 3′ ends arising from the same TARGET nucleotide sequence. In some cases more than one primer can be used to generate the RCG. When more than one primer is used, each member of the RCG would have a 5′ and 3′ end in common with at least one other member of the RCG and, more preferably, with at least 5% of the other members of the RCG. For example, if a RCG is prepared using DOP-PCR with 2 different primers having different TARGET nucleotide sequences, a population consisting of four sets of PCR products having common ends could be generated.
One set of PCR products could be generated having the TARGET nucleotide sequence of the first primer at or near both the 5′ and 3′ ends, and another set could be generated having the TARGET nucleotide sequence of the second primer at or near both the 5′ and 3′ ends. A third set could be generated having the TARGET nucleotide sequence of the second primer at or near the 5′ end and that of the first primer at or near the 3′ end. A fourth set could be generated having the TARGET nucleotide sequence of the first primer at or near the 5′ end and that of the second primer at or near the 3′ end. PCR-generated RCGs are thus composed of synthetic DNA fragments.
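The four end classes for two primers follow from counting ordered (5′ end, 3′ end) combinations, which the sketch below enumerates. The primer names are hypothetical placeholders, and the same counting applied to n primers gives n² classes; whether every class is actually represented in a given reaction depends on the genome and the reaction conditions, which this sketch does not model.

```python
from itertools import product


def end_classes(primers: list[str]) -> list[tuple[str, str]]:
    """Ordered (5' end, 3' end) combinations of primer-derived TARGET ends."""
    return list(product(primers, repeat=2))


classes = end_classes(["primer_1", "primer_2"])
for five_prime, three_prime in classes:
    print(f"5' end from {five_prime}, 3' end from {three_prime}")
print(len(classes))  # 4
```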

The DNA fragments of native RCGs have arbitrary sequences. That is, the polynucleotide fragments in a native RCG do not necessarily have any sequence relation to one another. These fragments are selected based on other properties, such as size or secondary characteristics (e.g. secondary structure). Such RCGs are referred to as native RCGs because they are prepared from native nucleic acid preparations rather than being synthesized; their fragments are native, non-synthetic DNA. The fragments of a native RCG may share some sequence relation to one another (e.g. if produced by restriction enzymes). In some embodiments they do not share any sequence relation to one another.

In some preferred embodiments, the RCG includes a plurality of DNA fragments ranging in size from approximately 200 to 2,000 nucleotide residues. In a preferred embodiment, a RCG includes from 0.05% to 95% of the intact native genome. The fraction of the isolated genome which is present in the RCG of the invention represents at most 90% of the isolated genome, and in preferred embodiments, contains less than 50%, 40%, 30%, 20%, 10%, 5%, or 1% of the genome. A RCG preferably includes between 0.05% and 1% of the intact native genome. In a preferred embodiment, the RCG encompasses 10% or less of an intact native genome of a complex organism.

Genomic DNA can be isolated from a tissue sample, a whole organism, or a sample of cells. Additionally, the isolated genomes of the invention are preferably substantially free of proteins that interfere with PCR or hybridization processes, and are also substantially free of proteins that damage DNA, such as nucleases. Preferably, the isolated genomes are also free of non-protein inhibitors of polymerase function (e.g. heavy metals) and non-protein inhibitors of hybridization when the PCR-generated RCGs are formed. Proteins may be removed from the isolated genomes by many methods known in the art. For instance, proteins may be removed using a protease, such as proteinase K or pronase, by using a strong detergent such as sodium dodecyl sulfate (SDS) or sodium lauryl sarcosinate (SLS) to lyse the cells from which the isolated genomes are obtained, or both. Lysed cells may be extracted with phenol and chloroform to produce an aqueous phase containing nucleic acid, including the isolated genomes, which can be precipitated with ethanol.

Several methods can be used to generate PCR-generated RCGs, including IRS-PCR, AP-PCR, DOP-PCR, multiple primed PCR, and adaptor-PCR. Hybridization conditions for particular PCR methods are selected in the context of the primer type and primer length to yield a set of DNA fragments which is a percentage of the genome, as defined above. PCR methods have been described in many references, see e.g., U.S. Pat. Nos. 5,104,792; 5,106,727; 5,043,272; 5,487,985; 5,597,694; 5,731,171; 5,599,674; and 5,789,168. Basic PCR methods have been described in e.g., Saiki et al., Science, 230: 1350 (1985) and U.S. Pat. Nos. 4,683,195, 4,683,202 (both issued Jul. 18, 1987) and U.S. Pat. No. 4,800,159 (issued Jan. 24, 1989).

The PCR methods described herein are performed according to PCR methods well-known in the art. For instance, U.S. Pat. No. 5,333,675, issued to Mullis et al., describes an apparatus and method for performing automated PCR. In general, performance of a PCR method results in amplification of a selected region of DNA by providing two DNA primers, each of which is complementary to a portion of one strand within the selected region of DNA. The double stranded template is denatured, and the primers are hybridized with the separated template strands in the presence of deoxyribonucleotide triphosphates (dATP, dCTP, dGTP, and dTTP) and a chain extender enzyme, such as DNA polymerase, forming DNA molecules that are single stranded except for the region hybridized with the primer, where they are double stranded. The double stranded regions are extended by the action of the chain extender enzyme (e.g. DNA polymerase) to form an extended double stranded molecule spanning the region between the two primers. The double stranded DNA molecules are separated to produce single strands which can then be re-hybridized with the primers. The process is repeated for a number of cycles to generate a series of DNA strands having the same nucleotide sequence between and including the primers.
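Because each cycle can at most double the number of template strands, the copy number grows roughly geometrically. The sketch below uses the standard idealized model $N_n = N_0(1+E)^n$, where $E$ is the per-cycle efficiency; the function name and example values are illustrative, and the model ignores the plateau that real reactions reach as reagents are depleted.

```python
def pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized copy number after `cycles` rounds: N0 * (1 + E) ** n.

    efficiency = 1.0 means every template strand is copied once per cycle.
    The model ignores the plateau phase seen late in real reactions.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return initial_copies * (1.0 + efficiency) ** cycles


print(f"{pcr_copies(100, 30):.3g}")        # perfect doubling
print(f"{pcr_copies(100, 30, 0.9):.3g}")   # 90% efficiency per cycle
```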

Chain extender enzymes are well known in the art and include, for example, E. coli DNA polymerase I, the Klenow fragment of E. coli DNA polymerase I, T4 DNA polymerase, T7 DNA polymerase, recombinant modified T7 DNA polymerase, reverse transcriptase, and other enzymes. Heat stable enzymes are particularly preferred as they are useful in automated thermal cycling equipment. Heat stable polymerases include, for example, DNA polymerases isolated from Bacillus stearothermophilus (Bio-Rad), Thermus thermophilus (Finnzymes, ATCC number 27634), Thermus species (ATCC number 31674), Thermus aquaticus strain TV 11518 (ATCC number 25105), Sulfolobus acidocaldarius, described by Bukhrashvili et al., Biochim. Biophys. Acta, 1008:102-07 (1989), and Thermus filiformis (ATCC number 43280); Taq DNA polymerase, commercially available from Perkin-Elmer-Cetus (Norwalk, Conn.), Promega (Madison, Wis.) and Stratagene (La Jolla, Calif.); and AmpliTaq™ DNA polymerase, a recombinant Thermus aquaticus Taq DNA polymerase, available from Perkin-Elmer-Cetus and described in U.S. Pat. No. 4,889,818.

Preferably, the PCR-based RCG generation methods performed according to the invention are automated and performed using thermal cyclers. Many types of thermal cyclers are well-known in the art. For instance, MJ Research (Watertown, Mass.) provides a thermal cycler having a Peltier heat pump to provide precise, uniform temperature control; DeltaCycler thermal cyclers from Ericomp (San Diego, Calif.) also are Peltier-based and include automatic ramping control, time/temperature extension programming, and a choice of tube or microplate configurations; the RoboCycler™ by Stratagene (La Jolla, Calif.) incorporates robotics to produce rapid temperature transitions during cycling and well-to-well uniformity between samples; and a particularly preferred cycler is the Perkin-Elmer Applied Biosystems (Foster City, Calif.) ABI Prism™ 877 Integrated Thermal Cycler, which is operated through a programmable interface that automates liquid handling and thermocycling processes for fluorescent DNA sequencing and PCR reactions. The Perkin-Elmer Applied Biosystems machine is designed specifically for high-throughput genotyping projects and fully automates genotyping steps, including PCR product pooling.

Degenerate oligonucleotide primed-PCR (DOP-PCR) involves use of a single primer set, wherein each primer of the set is typically composed of 3 parts. A DOP-PCR primer as used herein can have the following structure:

5′ tag-$(N)_x$-TARGET 3′

The “TARGET” nucleotide sequence includes at least 5 arbitrarily selected nucleotide residues that are the same for each primer of the set, x is an integer from 0 to 9, and N is any nucleotide residue. The value of x is preferably the same for each primer of a DOP-PCR primer set. In other embodiments, the TARGET nucleotide sequence includes at least 6 or 7, and preferably at least 8, 9, or 10, arbitrarily-selected nucleotides. The tag is optional.

A “TARGET nucleotide sequence” as used herein is selected arbitrarily. A set of primers is used to generate a particular RCG, and each primer in the set includes the same TARGET nucleotide sequence as the other primers. Of course, sets of primers having different TARGET sequences can be combined.

The “tag”, as used herein, is a sequence which is useful for processing the RCG but is not necessary. The tag, unlike the other sequences in the primer, does not necessarily hybridize with genomic DNA during the initial round of genomic PCR amplification. In later amplification rounds, the tag hybridizes with PCR-amplified DNA. Thus, the tag does not contribute to the sequence initially recognized by the primer. Since the tag does not participate in the initial hybridization reaction with genomic DNA, but is involved in the primer extension process, the PCR products that are formed (i.e., the reproducible DNA fragments) include the tag sequence. Thus, the end products are DNA fragments that have a sequence identical to a sequence found in the genome except for the tag sequence. The tag is useful because in later rounds of PCR it allows use of a higher annealing temperature than could otherwise be used with shorter oligonucleotides. The arbitrarily selected sequence is positioned at the 3′ end of the primer. This sequence, although arbitrarily selected, is the same for each primer in a set of DOP-PCR primers. From 0 to 9 nucleotide residues (“N” in the formula above) are located at the 5′ end of the TARGET sequence in the DOP-PCR primers of the invention. Each of these residues can be independently selected from naturally-occurring or artificial nucleotide residues. By way of example, each “N” residue can be an inosine or methylcytosine residue. In the formula, “x” is an integer that can be from 0 to 9, and is preferably from 3 to 9 (e.g. 3, 4, 5, 6, 7, 8, or 9). Each set of DOP-PCR primers of the invention can thus contain up to $4^x$ unique primers (i.e., 1, 4, 16, 64, . . . , 262144 primers for x = 0, 1, 2, 3, . . . , 9). Finally, a base pair tag can be positioned at the 5′ end of the primer. This tag can optionally include a restriction enzyme site. In general, inclusion of a tag sequence in the DOP-PCR primers of the invention is preferred, but not necessary.
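The $4^x$ count can be checked by expanding the degenerate positions explicitly. The sketch below substitutes only the four standard bases for each N (inosine or other artificial residues are not modeled), and `expand_dop_primers` is a hypothetical helper. As an example it uses SEQ. ID. NO. 4 (CTCGAGNNNNNNAAGCGATG), read here as a CTCGAG tag, six N residues and an AAGCGATG TARGET; that reading is an assumption based on the primer structure described above, not something the text states.

```python
from itertools import product


def expand_dop_primers(target: str, x: int, tag: str = "") -> list[str]:
    """Enumerate every primer of the form 5'-tag-(N)x-TARGET-3'.

    Only the four standard bases are substituted for N, so the set has 4**x
    members; inosine or other artificial residues are not modeled.
    """
    if not 0 <= x <= 9:
        raise ValueError("x must be an integer from 0 to 9")
    if len(target) < 5:
        raise ValueError("TARGET should contain at least 5 residues")
    return [tag + "".join(ns) + target for ns in product("ACGT", repeat=x)]


# Assumed reading of SEQ. ID. NO. 4: tag CTCGAG, six N residues, TARGET AAGCGATG.
primers = expand_dop_primers("AAGCGATG", 6, tag="CTCGAG")
print(len(primers))  # 4096 == 4**6
print(primers[0], primers[-1])
```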

The initial rounds of DOP-PCR are preferably performed at a low temperature given that the specificity of the reaction will be determined by only the 3′ TARGET nucleotide sequence. A slow ramp time during these cycles ensures that the primers do not detach from the template before being extended. Subsequent rounds are carried out at a higher annealing temperature because in the subsequent rounds the 5′ end of the DOP-PCR primer (the tag) is able to contribute to the primer annealing. A PCR cycle performed under low stringency hybridization conditions generally is from about 35° C. to about 55° C.
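One way to get a rough feel for why annealing can be made more stringent once the tag can pair is the Wallace rule, a common rule of thumb for short oligonucleotides (about 2 °C per A·T and 4 °C per G·C pair). The sketch below applies it to the 3′ TARGET alone and to the full primer, assuming each degenerate N contributes the average of the two (3 °C). It reuses an assumed tag/N6/TARGET reading of SEQ. ID. NO. 4 purely as an example; Wallace-rule estimates are crude and are not the annealing temperatures specified in the text.

```python
def wallace_tm(seq: str) -> float:
    """Rough melting temperature (deg C) by the Wallace rule: 2*(A+T) + 4*(G+C).

    Degenerate N positions are scored at 3 deg C, the average of the two
    (an assumption of this sketch). Intended only for short oligonucleotides.
    """
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    n = seq.count("N")
    if at + gc + n != len(seq):
        raise ValueError("sequence may contain only A, C, G, T and N")
    return 2.0 * at + 4.0 * gc + 3.0 * n


target_only = wallace_tm("AAGCGATG")               # 3' TARGET pairing alone
whole_primer = wallace_tm("CTCGAGNNNNNNAAGCGATG")  # tag + N6 + TARGET
print(target_only, whole_primer)                   # 24.0 62.0
```

The absolute numbers matter less than the gap between them: in the early cycles only the short 3′ portion can pair with genomic DNA, whereas in later cycles the whole primer pairs with products that already carry the tag.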

Because DOP-PCR involves a randomly chosen sequence, the resultant PCR products are generated from genome sequences arbitrarily distributed throughout the genome and will generally not be clustered within specific sites of the genome. Additionally, creation of new sets of DOP-PCR-amplified DNA fragments can be easily accomplished by changing the sequence, length, or both, of the primer. RCGs having greater or lesser complexity can be generated by selecting DOP-PCR primers having shorter or longer, respectively, TARGET and $(N)_x$ nucleotide sequences. This approach can also be used with multiple DOP-PCR primers, such as in the “multiple-primed DOP-PCR” method (described below). Finally, the use of arbitrarily chosen sequences in DOP-PCR is useful in many species because the arbitrarily-selected sequences are not species-specific, unlike some forms of PCR which require use of a specific known sequence.
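The dependence of complexity on TARGET length can be illustrated with the expected number of exact matches to a fixed k-mer. This sketch assumes random sequence with equal base composition (real genomes are not like this), counts exact matches only (low-stringency priming tolerates mismatches and so gives more sites), and counts sites rather than amplifiable pairs of sites; the 3,000-megabase genome size is an assumed example.

```python
def expected_exact_sites(genome_bp: float, k: int, strands: int = 2) -> float:
    """Expected exact matches to a fixed k-mer in random sequence of equal base
    composition: strands * genome_bp / 4**k."""
    return strands * genome_bp / 4 ** k


for k in (6, 7, 8, 9, 10):
    print(k, f"{expected_exact_sites(3.0e9, k):.3g}")
```

Under these assumptions each additional TARGET residue reduces the expected number of exact priming sites about fourfold, which is the direction of the effect described above.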

Another method for generating a PCR-generated RCG involves interspersed repeat sequence PCR (IRS-PCR). Mammalian chromosomes include both repeated and unique sequences. Some of the repeated sequences are short interspersed repeated sequences (IRS's) and others are long IRS's. One major family of short IRS's found in humans includes Alu repeat sequences. Amplification using a single Alu primer will occur whenever two Alu elements lie in inverted orientation to each other on opposite strands. There are believed to be approximately 900,000 Alu repeats in a human haploid genome. Another type of IRS sequence is the L1 element (most commonly L1Hs), which is present in $10^{4}$ to $10^{5}$ copies in a human genome. Because the L1 sequence is less abundant in the genome than the Alu sequence, fewer amplification products are produced upon amplification using an L1 primer. In IRS-PCR, a primer is used which has homology to a repetitive sequence present on opposite strands within the genome of the species to be analyzed. When two repeat elements having the primer sequence are present in a head-to-head fashion within a limited distance (approximately 2,000 nucleotide residues), the inter-repeat sequence can be amplified. The method has the advantage that the complexity of the resulting PCR products can be controlled by how homologous the chosen primer is with the repeat consensus (that is, the more homologous the primer is with the repeat consensus sequence, the more complex the PCR product will be).
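The amplification rule above (two repeat copies in inverted orientation within a limited distance) can be sketched over a list of annotated repeat positions. This is a simplified, hypothetical model: it only checks that neighbouring copies have opposite orientation, whereas whether head-to-head or tail-to-tail pairs actually amplify depends on where in the repeat the primer binds and which way it points. It also considers adjacent copies only, and the coordinates are made up; the 2,000 bp default comes from the text.

```python
from typing import NamedTuple


class Repeat(NamedTuple):
    position: int     # start coordinate on the reference strand, in bp
    orientation: str  # "+" or "-" relative to the reference strand


def amplifiable_intervals(repeats: list, max_gap: int = 2000) -> list[tuple[int, int]]:
    """Pairs of neighbouring repeat copies in inverted orientation that lie
    within `max_gap` bp of each other (a simplified IRS-PCR model)."""
    ordered = sorted(repeats)
    hits = []
    for left, right in zip(ordered, ordered[1:]):
        inverted = left.orientation != right.orientation
        if inverted and right.position - left.position <= max_gap:
            hits.append((left.position, right.position))
    return hits


copies = [Repeat(100, "+"), Repeat(1500, "-"), Repeat(2500, "-"),
          Repeat(9000, "+"), Repeat(9800, "-")]
print(amplifiable_intervals(copies))  # [(100, 1500), (9000, 9800)]
```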

In general, an IRS-PCR primer has a sequence wherein at least a portion of the primer is homologous with (e.g. 50%, 75%, 90%, 95% or more identical to) the consensus nucleotide sequence of an IRS of the subject.

In mammalian genomes, small interspersed repeat sequences (SINEs) are present in extremely high copy number and are often configured such that a single copy sequence of between 500 nucleotide residues and 1000 nucleotide residues is situated between two repeats which are oriented in a head-to-head or tail-to-tail manner. Genomic DNA sequences having this configuration are substrates for Alu PCR in human DNA and B1 and B2 PCR in the mouse. The precise number of products which are represented in a specific Alu, B1, or B2 PCR reaction depends on the choice of primer used for the reaction. This variation in product complexity is due to the variation in sequence among the large number of representative sequences of the IRS family in each species. A detailed study of this variation was described by Britten (Britten, R. J. (1994), Proc. Natl. Acad. Sci. USA, 91:5992-5996). In the Britten study, the sequence variation for each nucleotide residue of the Alu consensus sequence was analyzed for 1574 human Alu sequences. The complexity of Alu PCR products generated by amplification using a given Alu PCR primer can be predicted to a significant extent based on the degree to which the nucleotide sequence of the primer matches the consensus nucleotide sequence. As a general rule, Alu PCR products become progressively less complex as the primer sequence diverges from the Alu consensus. Because two hybridized primers are required at each site for which Alu PCR is to be accomplished, a linear variation in the number of genomic sites to which a primer may bind is predicted to be reflected as a larger variation in the complexity of the PCR products, which is roughly proportional to the square of primer binding efficiency. This prediction conforms to experimental results, permitting synthesis of Alu PCR products having a wide range of product complexity values. Therefore, when it is desirable to reduce the number of PCR products obtained using Alu PCR, the primer sequence should be designed to diverge by a predictable amount from the Alu consensus sequence.
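The square-law relationship can be made concrete under one simplifying assumption: that each of the two repeat copies flanking a candidate interval is bound independently with the same probability p, so that both are bound with probability p². The candidate-pair count below is a hypothetical number chosen only for illustration.

```python
def expected_products(candidate_pairs: int, binding_fraction: float) -> float:
    """Expected number of amplifiable inter-repeat intervals when each of the
    two flanking repeat copies is bound independently with probability
    `binding_fraction` (so both are bound with probability p**2)."""
    if not 0.0 <= binding_fraction <= 1.0:
        raise ValueError("binding_fraction must be between 0 and 1")
    return candidate_pairs * binding_fraction ** 2


for p in (1.0, 0.5, 0.25):
    print(p, expected_products(10_000, p))
```

Under this model, a primer that binds half as many repeat copies yields roughly a quarter as many products, which is why modest divergence from the consensus can reduce product complexity substantially.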

Another method for generating a RCG involves arbitrarily primed PCR (AP-PCR). AP-PCR utilizes short oligonucleotides as PCR primers to amplify a discrete subset of portions of a high complexity genome. For AP-PCR, the primer sequence is arbitrary and is selected without knowledge of the sequence of the target nucleic acids to be amplified. The arbitrary primer is generally 50-60% G+C. The AP-PCR method is similar to the DOP-PCR method described above, except that the AP-PCR primer consists of only the arbitrarily-selected nucleotides, without the 5′ flanking degenerate residues (the $(N)_x$ residues described for the DOP-PCR primers) or the tag. The genome may be primed using a single arbitrary primer or a combination of two or more arbitrary primers, each having a different, but optionally related, sequence.

AP-PCR is performed under low stringency hybridization conditions, allowing hybridization of the primer with targets with which the primer can exhibit a substantial degree of mismatching. A PCR cycle performed under low stringency hybridization conditions generally is from about 35° C. to about 55° C. Mismatches refer to non-complementary nucleotide bases in the primer, relative to the template with which it is hybridized.
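Why low stringency enlarges the set of priming sites can be illustrated with a mismatch-tolerant scan. This sketch counts mismatches position by position (no gaps) on one strand only, and treats every mismatch as equivalent, whereas in practice mismatches near the primer's 3′ end are far more disruptive to extension. The template and primer are toy sequences.

```python
def priming_sites(template: str, primer: str, max_mismatches: int) -> list[int]:
    """Start positions where `primer` aligns to `template` (same strand, no gaps)
    with at most `max_mismatches` mismatched bases."""
    template, primer = template.upper(), primer.upper()
    k = len(primer)
    hits = []
    for i in range(len(template) - k + 1):
        window = template[i:i + k]
        mismatches = sum(a != b for a, b in zip(window, primer))
        if mismatches <= max_mismatches:
            hits.append(i)
    return hits


toy_template = "ACGTACGTTCGTACCTACGT"
print(priming_sites(toy_template, "ACGT", 0))  # [0, 4, 16]
print(priming_sites(toy_template, "ACGT", 1))  # [0, 4, 8, 12, 16]
```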

AP-PCR methods have been used previously in combination with gel electrophoresis to determine genotypes. AP-PCR products are generally fractionated on a high resolution polyacrylamide gel, and the presence or absence of specific bands is used to genotype a specific locus. In general, the difference between the presence and absence of a band is a consequence of a single nucleotide DNA sequence difference in one of the primer binding sites for a given single copy sequence.

The product complexity obtained using a given primer or primer set can be determined by several methods. For instance, the product complexity can be determined using PCR amplification of a panel of human yeast artificial chromosome (YAC) DNA samples from a CEPH 1 library. These YACs each carry a human DNA segment approximately 300-400 kilobase pairs in length. Product complexity for each primer set can be inferred by comparing the number of bands produced per YAC when analyzed on agarose gel with an IRS-PCR product of known complexity. Additionally, for products of relatively low complexity, electrophoresis on polyacrylamide gels can establish the product complexity, compared to a standard. Alternatively, an effective way to estimate the complexity of the product is to carry out a reannealing reaction using resistance to S1 nuclease-catalyzed degradation to determine the rate of reannealing of internally labeled, denatured, double-stranded DNA product. Comparison with reannealing rates of standards of known complexity permits accurate estimation of product complexity. Each of these three methods may be used for IRS PCR. The second and third methods are best for AP-PCR and DOP-PCR which, unlike IRS-PCR, will not selectively amplify human DNA from a crude YAC DNA preparation.
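The comparison with standards in the reannealing method rests on the classical C0t relationship: under ideal second-order kinetics measured under identical conditions (salt, temperature, fragment length), the C0t at which half the DNA has reannealed is proportional to its kinetic complexity. The sketch below applies that proportionality; the standard's complexity and both C0t½ values are hypothetical numbers, and the S1 nuclease measurement itself is not modeled.

```python
def complexity_from_cot(standard_complexity_bp: float,
                        standard_cot_half: float,
                        sample_cot_half: float) -> float:
    """Estimate kinetic complexity from the ratio of half-reannealing C0t values.

    Under ideal second-order kinetics measured under the same conditions,
    C0t(1/2) is proportional to complexity, so
    complexity_sample = complexity_standard * C0t_sample / C0t_standard.
    """
    if standard_cot_half <= 0 or sample_cot_half <= 0:
        raise ValueError("C0t(1/2) values must be positive")
    return standard_complexity_bp * sample_cot_half / standard_cot_half


# Hypothetical numbers: a standard of 4.6e6 bp and a product reannealing 10x faster.
estimate = complexity_from_cot(4.6e6, standard_cot_half=9.0, sample_cot_half=0.9)
print(f"{estimate:.2g} bp")
```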

The complexity of PCR products generated by AP-PCR can be regulated by selecting the primer sequence length, the number of primers in a primer set, or some combination of these. By choosing the appropriate combination, AP-PCR may also be used to reduce the complexity of a genome for SNP identification and genotyping, as described herein. AP-PCR markers differ from Alu PCR markers, have a different genomic distribution, and can therefore complement an IRS-PCR genome complexity-reducing method. The methods can be used in combination to produce complementary information from genome scans.

One PCR method for preparing RCGs is an adapter-linker amplification PCR method (previously described in, e.g., Saunders et al., Nuc. Acids Res., 17:9027 (1990); Johnson, Genomics, 6:243 (1990); and PCT Application WO90/00434, published Aug. 9, 1990). In this method, genomic DNA is digested using a restriction enzyme, and a set of linkers is ligated onto the ends of the resulting DNA fragments. PCR amplification of genomic DNA is accomplished using a primer which can bind with the adapter linker sequence. Two possible variations of this procedure which can be used to limit genome complexity are (a) to use a restriction enzyme which produces a set of fragments which vary in length such that only a subset (e.g. those smaller than a PCR-amplifiable length) are amplified; and (b) to digest the genomic DNA using a restriction enzyme that produces an overhang of random nucleotide sequence (e.g., AlwNI, which recognizes CAGNNNCTG (SEQ ID NO: 1) and cleaves between NNN and CTG). Adapters are constructed to anneal with only a subset of the products. For example, in the case of AlwNI, adapters having a specific 3 nucleotide residue overhang (corresponding to the random 3 base pair sequence produced by the restriction enzyme digestion) would be used to yield a $4^{3}$ = 64-fold reduction in complexity. Fragments which have an overhang sequence complementary to the adapter overhang are the only ones which are amplified.
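The selectivity of option (b) can be sketched in silico by locating AlwNI sites (CAGNNNCTG) and grouping them by their NNN spacer, which is the sequence the overhang is derived from. This is a simplified model: it scans one strand only and selects sites whose spacer equals a chosen triplet, whereas real adapter matching must account for complementarity and for the two fragment ends produced at each site, which carry complementary overhangs. The demonstration sequence is made up.

```python
import re
from collections import Counter

# AlwNI recognizes CAGNNNCTG; the lookahead lets overlapping sites be found.
ALWNI_SITE = re.compile(r"(?=CAG([ACGT]{3})CTG)")


def alwni_sites(seq: str) -> list[tuple[int, str]]:
    """(start position, NNN spacer) for each AlwNI site on the given strand."""
    return [(m.start(), m.group(1)) for m in ALWNI_SITE.finditer(seq.upper())]


def select_sites(seq: str, spacer: str) -> list[int]:
    """Positions of sites whose spacer matches the one an adapter targets."""
    return [pos for pos, nnn in alwni_sites(seq) if nnn == spacer.upper()]


demo = "TT" + "CAGAAACTG" + "GG" + "CAGTTTCTG" + "AA" + "CAGAAACTG" + "TT"
print(alwni_sites(demo))            # [(2, 'AAA'), (13, 'TTT'), (24, 'AAA')]
print(Counter(n for _, n in alwni_sites(demo)))
print(select_sites(demo, "aaa"))    # [2, 24]
print(4 ** 3)                       # 64 possible spacer classes
```

Because the spacer can be any of 64 triplets, an adapter recognizing a single triplet retains on average about 1/64 of the sites in random sequence, matching the 64-fold figure above.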

Another method for generating RCGs is based on the development of native RCGs. Several methods can be used to generate native RCGs, including DNA fragment size selection, isolating a fraction of DNA from a sample which has been denatured and reannealed, pH-separation, separation based on secondary structure, etc.

Size selection can be used to generate a RCG by separating polynucleotides in a genome into different fractions wherein each fraction contains polynucleotides of approximately equal size. One or more fractions can be selected and used as the RCG. The number of fractions selected will depend on the method used to fragment the genome and to fractionate the pieces of the genome, as well as the total number of fractions. In order to increase the complexity of the RCG, more fractions are selected. One method of generating a RCG involves fragmenting a genome into arbitrarily sized pieces and separating the pieces on a gel (or by HPLC or another size fractionation method). A portion of the gel is excised, and DNA fragments contained in the portion are isolated. Typically, restriction enzymes can be used to produce DNA fragments in a reproducible manner.
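A restriction digest followed by excision of a size window can be mimicked in silico. The sketch below uses EcoRI (G^AATTC) only as a familiar example enzyme, since the text does not name one; it cuts a single strand and ignores sticky ends. The 200 to 2,000 nucleotide window is borrowed from the preferred fragment size range given earlier, and the demonstration sequence is made up.

```python
import re


def digest(seq: str, site: str = "GAATTC", cut_offset: int = 1) -> list[str]:
    """Cut `seq` wherever `site` occurs, `cut_offset` bases into the site.

    Defaults model EcoRI (G^AATTC) on one strand; sticky ends are ignored.
    """
    cuts = [m.start() + cut_offset
            for m in re.finditer(f"(?={re.escape(site)})", seq.upper())]
    bounds = [0, *cuts, len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def size_select(fragments: list[str], min_len: int, max_len: int) -> list[str]:
    """Keep fragments whose length falls inside the excised size window."""
    return [f for f in fragments if min_len <= len(f) <= max_len]


demo = "A" * 10 + "GAATTC" + "C" * 300 + "GAATTC" + "T" * 50
pieces = digest(demo)
print([len(p) for p in pieces])                          # [11, 306, 55]
print([len(p) for p in size_select(pieces, 200, 2000)])  # [306]
```

Because the cut positions depend only on the sequence, repeating the digest and the same size window recovers the same fragments, which is what makes this kind of fraction reproducible.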

Separation based on secondary structure can be accomplished in a manner similar to size selection. Different fractions of a genome having secondary structure can be separated on a gel. One or more fractions are excised from the gel, and DNA fragments are isolated therefrom.

Another method for creating a native RCG involves isolating a fraction of DNA from a sample which has been denatured and reannealed. A genomic DNA sample is denatured, and the denatured nucleic acid molecules are allowed to reanneal under selected conditions. Some conditions allow more of the DNA to reanneal than others; these conditions are well known to those of ordinary skill in the art. Either the reannealed fraction or the remaining denatured fraction can be isolated. It is desirable to select the smaller of these two fractions in order to generate a RCG; the reannealing conditions used in the particular reaction determine which fraction is the smaller. Variations of this method can also be used to generate RCGs. For instance, once a portion of the DNA has been allowed to reanneal, the double-stranded DNA may be removed (e.g., using column chromatography), the remaining DNA can then be allowed to partially reanneal, and the reannealed fraction can be isolated and used. This variation is particularly useful for removing repetitive elements of the DNA, which reanneal rapidly.

The amount of isolated genome used in the method of preparing RCGs will vary, depending on the complexity of the initial isolated genome. Genomes of low complexity, such as bacterial genomes having a size of less than about 5 million base pairs (5 megabases), usually are used in an amount from approximately 10 picograms to about 250 nanograms. A more preferred range is from 30 picograms to about 7.5 nanograms, and even more preferably, about 1 nanogram. Genomes of intermediate complexity, such as plants (for instance, rice, having a genome size of approximately 700-1,000 megabases) can be used in a range of from approximately 0.5 nanograms to 250 nanograms. More preferably, the amount is between 1 nanogram and 50 nanograms. Genomes of highest complexity (such as maize or humans, having a genome size of approximately 3,000 megabases) can be used in an amount from approximately 1 nanogram to 250 nanograms (e.g. for PCR).
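To relate these input amounts to the number of genome copies actually present, the mass of one haploid genome can be estimated from its size. The sketch below assumes the common approximation of 650 g/mol per base pair, which is not stated in the text, so the results are order-of-magnitude estimates only. With this assumption, 1 ng of DNA from a 3,000-megabase genome corresponds to roughly 300 haploid copies, while 1 ng from a 5-megabase bacterial genome corresponds to roughly 185,000 copies. The function names are hypothetical.

```python
AVOGADRO = 6.02214076e23        # molecules per mole
MEAN_BP_MASS_G_PER_MOL = 650.0  # assumed average mass of one double-stranded base pair


def haploid_genome_mass_pg(genome_size_bp):
    """Approximate mass, in picograms, of one haploid copy of a genome."""
    grams = genome_size_bp * MEAN_BP_MASS_G_PER_MOL / AVOGADRO
    return grams * 1e12


def genome_copies(input_ng, genome_size_bp):
    """Approximate number of haploid genome copies in input_ng nanograms of DNA."""
    return (input_ng * 1e3) / haploid_genome_mass_pg(genome_size_bp)


if __name__ == "__main__":
    # ~3,000 Mb genome (e.g. human or maize) at 1 ng: roughly 300 copies.
    print(round(genome_copies(1.0, 3_000e6)))
    # ~5 Mb bacterial genome at 1 ng: roughly 185,000 copies.
    print(round(genome_copies(1.0, 5e6)))
```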

In addition to the DOP-PCR methods described above, PCR-generated RCGs can be prepared using DOP-PCR involving multiple primers, which is referred to herein as “multiple-primed-DOP-PCR”. Multiple-primed-DOP-PCR involves the use of at least two primers, which are arranged similarly to the single primers discussed above and are typically composed of 3 parts. A multiple-primed-DOP-PCR primer as used herein has the following structure:

$\text{tag-(N)}_x\text{-TARGET}_2$

The $\text{TARGET}_2$ nucleotide sequence includes at least 5, and preferably at least 6, TARGET nucleotide residues; x is an integer from 0 to 9; and N is any nucleotide residue.

The sequence chosen arbitrarily and positioned at the 3′ end of the primer can be manipulated in multiple-primed-DOP-PCR to produce a different end product than in DOP-PCR, because the use of two or more sets of primers adds another level of diversity, producing either a RCG or an amplified genome depending on the primers chosen. Each of the at least two sets of primers used in multiple-primed-DOP-PCR has a different TARGET sequence. As with the single primer of DOP-PCR, a set of primers is generated for each of the at least two primers, and every primer within a single set has the same TARGET sequence as the other primers of that set. This TARGET sequence is flanked at its 5′ end by 0 to 9 nucleotide residues (“N”s), and the N residues differ from primer to primer within a set. A set of primers may therefore include up to $4^x$ different primers, each having a unique $(\mathrm{N})_x$ sequence. Finally, a tag can be positioned at the 5′ end.
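The size of a primer set follows directly from the structure above: with x degenerate positions, each drawn from four bases, a set contains $4^x$ primers that share the tag and the TARGET. The Python sketch below enumerates such a set. The tag and TARGET sequences are hypothetical placeholders, the function name is illustrative, and the length check simply encodes the stated minimum of 5 TARGET residues.

```python
from itertools import product

BASES = "ACGT"


def multiple_primed_dop_set(tag, target, x):
    """Enumerate one primer set of the form tag-(N)x-TARGET.

    Every primer shares the same tag and TARGET; only the (N)x stretch
    varies, so a complete set contains 4**x primers.
    """
    if not 0 <= x <= 9:
        raise ValueError("x must be an integer from 0 to 9")
    if len(target) < 5:
        raise ValueError("TARGET should contain at least 5 residues")
    return [tag + "".join(n) + target for n in product(BASES, repeat=x)]


if __name__ == "__main__":
    # Hypothetical tag and TARGET sequences, chosen only for illustration;
    # the two sets differ in their TARGET, as the text requires.
    set_a = multiple_primed_dop_set("CCGACTCGAG", "ATGTGG", 2)
    set_b = multiple_primed_dop_set("CCGACTCGAG", "TTCAGC", 2)
    print(len(set_a), len(set_b))  # 16 16
    print(set_a[0])                # CCGACTCGAGAAATGTGG
```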

In other aspects of the invention, methods for identifying SNPs can be performed using RNA genomes rather than RCGs. RNA genomes differ from RCGs in that they are generated from RNA rather than from DNA. An RNA genome can be, for instance, a cDNA preparation made by reverse transcription of RNA obtained from cells of a subject (e.g. human ovarian carcinoma cells). Thus, an RNA genome can be composed of DNA sequences, as long as the DNA is derived from RNA. RNA can also be used directly.

The genotyping and other methods of the invention can also be performed using an RNA genotyping method, in which RNA, rather than DNA, is the source of nucleic acid for genotyping. In this embodiment, RNA is reverse transcribed (e.g., using a reverse transcriptase) to produce cDNA for use as an RNA genome. The RNA method has at least two advantages over DNA-based methods. First, SNPs in coding regions (cSNPs) are more likely to be directly involved in detectable phenotypes and are thus more likely to be informative with regard to how such phenotypes can be affected. Second, since this method can require only a reverse transcription step, it is amenable to high-throughput analysis. In a preferred embodiment, a reverse transcriptase primer which binds only a subset of RNA species (e.g., a dT primer having a 3-base anchor, such as TTTTTTTTTT CAG; SEQ ID NO: 2) is used to further reduce RNA genome complexity (48-fold using the dT primer with a 3-base anchor). In the RNA genotyping method of the invention, the RNA/cDNA sample can be attached to a surface and hybridized with a SNP-ASO.
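The 48-fold figure is consistent with one common reading of a 3-base anchor, sketched below as an assumption rather than as the stated design: the base immediately upstream of the poly(A) tail is, by definition, not A, so the first anchor base is restricted to A, C, or G (3 choices) while the remaining two are unrestricted (4 × 4), giving 3 × 4 × 4 = 48 anchored primers. Under a uniform-base assumption, each such primer would prime roughly 1/48 of the polyadenylated transcripts. The function name and the default run of ten T residues (matching SEQ ID NO: 2) are illustrative.

```python
from itertools import product


def anchored_oligo_dt_primers(n_t=10):
    """Enumerate oligo-dT primers with a 3-base 3' anchor of the form V-N-N.

    The mRNA base just upstream of the poly(A) tail is, by definition, not A,
    so the primer base that pairs with it (the first anchor base) is not T:
    V = A, C or G. The last two anchor bases (N) are unrestricted.
    """
    return ["T" * n_t + v + n1 + n2
            for v, n1, n2 in product("ACG", "ACGT", "ACGT")]


if __name__ == "__main__":
    primers = anchored_oligo_dt_primers()
    print(len(primers))                # 48
    print("TTTTTTTTTTCAG" in primers)  # True
```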

In another aspect, the invention includes a method for identifying a SNP. Genomic fragments which include SNPs can be prepared according to the invention by preparing a set of primers from a RCG (e.g., a RCG composed of a set of PCR products), performing PCR using the set of primers to amplify a plurality of isolated genomes to produce DNA products, and identifying SNPs included in the DNA products. The presence of a SNP in a DNA product can be identified using methods such as direct sequencing, i.e., dideoxy chain termination or Maxam-Gilbert sequencing (see, e.g., Sambrook et al., “Molecular Cloning: A Laboratory Manual,” Cold Spring Harbor Laboratory, 1989, New York; or Zyskind et al., Recombinant DNA Laboratory Manual, Acad. Press, 1988); denaturing gradient gel electrophoresis, which detects differences in the sequence-dependent melting properties and electrophoretic migration of SNP-containing DNA fragments (see, e.g., Erlich, ed., PCR Technology, Principles and Applications for DNA Amplification, Freeman and Co., NY, 1992); and single-strand conformation analysis, which differentiates sequences based on differences in the electrophoretic migration patterns of single-stranded DNA products (see, e.g., Orita et al., Proc. Nat. Acad. Sci. 86, 2766-2770, 1989). In preferred embodiments, the SNPs are identified from the sequences of the polymerase chain reaction products as determined by sequencing methods.

A “single nucleotide polymorphism” or “SNP” as used herein is a single base pair (i.e., a pair of complementary nucleotide residues on opposite genomic strands) within a DNA region wherein the identities of the paired nucleotide residues vary from individual to individual. At the variable base pair of a SNP, two or more alternative base pairings occur at a relatively high frequency (greater than 1%) in a population of subjects (e.g., a human population).

A “polymorphic region” is a region or segment of DNA whose nucleotide sequence varies from individual to individual. The alternative versions of the sequence, which are identical except at the variable position, are referred to as alleles. A polymorphism is allelic because some members of a species have one allele, other members have a variant allele, and some have both. When only one variant sequence exists, the polymorphism is referred to as a diallelic polymorphism. There are three possible genotypes at a diallelic polymorphism in a diploid organism, because a diploid individual's DNA may be homozygous for one allele, homozygous for the other allele, or heterozygous (i.e., having one copy of each allele). When additional variants are present, triallelic or higher-order polymorphisms are possible, and these produce more complicated genotypes.
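The genotype counts above follow from counting unordered pairs of alleles: a diploid locus with k alleles has k(k + 1)/2 possible genotypes, which gives 3 for a diallelic SNP and 6 for a triallelic one. A minimal Python illustration (the function name is hypothetical):

```python
from itertools import combinations_with_replacement


def diploid_genotypes(alleles):
    """List the unordered genotypes possible at one locus in a diploid organism."""
    return list(combinations_with_replacement(sorted(set(alleles)), 2))


if __name__ == "__main__":
    print(diploid_genotypes("AG"))        # [('A', 'A'), ('A', 'G'), ('G', 'G')]
    print(len(diploid_genotypes("ACG")))  # 6, i.e. k(k+1)/2 for k = 3 alleles
```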

SNPs are well-suited for studying sequence variation because they are relatively stable (i.e. they exhibit low mutation rates) and because it appears that SNPs can be responsible for inherited traits. These properties make SNPs particularly useful as genetic markers for identifying disease-associated genes. SNPs are also useful for such purposes as linkage studies in families, determining linkage disequilibrium in isolated populations, performing association analysis of patients and controls, and loss of heterozygosity studies in tumors.

An exemplary method for identifying SNPs is presented in the Examples below. Briefly, DOP-PCR is performed using genomic DNA obtained from an individual. The products are separated on an agarose gel. The products are separated by approximate length into approximately 8 segments having sizes of about 400-1000 base pairs, and libraries are made from each of the segments. This approach prevents domination of the library by one or two abundant products. Plasmid DNA is isolated from individual colonies containing portions of the library. Inserts are isolated and the ends of the inserts are sequenced using vector primers. A new set of primers is then synthesized based on these insert sequences to allow PCR to be performed using RCG obtained from one or more individuals or from a pool of individuals. The DNA products generated by the PCR are sequenced and inspected for the presence of two nucleotide residues at one location, an indication that a polymorphism exists at that position within one of the alleles.
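The final inspection step can be illustrated with a simplified Python sketch. It assumes, beyond what the text describes, that the sequenced products have already been aligned to a common length and base-called, and that a position showing two overlapping peaks is reported with a standard IUPAC two-base ambiguity code (for example, R for A/G). A position is flagged when two or more distinct bases are observed across the reads; the names and demo reads are hypothetical.

```python
# Standard IUPAC codes for positions where two bases are called.
IUPAC_TWO_BASE = {"R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC"}


def bases_at(call):
    """Expand one base call into the set of nucleotides it represents."""
    call = call.upper()
    if call in ("A", "C", "G", "T"):
        return {call}
    return set(IUPAC_TWO_BASE.get(call, ""))  # N, gaps, etc. contribute nothing


def polymorphic_positions(aligned_reads):
    """Return {position: set_of_bases} for every column showing two or more bases."""
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("reads must be aligned to a common length")
    result = {}
    for i in range(length):
        seen = set()
        for read in aligned_reads:
            seen |= bases_at(read[i])
        if len(seen) >= 2:
            result[i] = seen
    return result


if __name__ == "__main__":
    reads = ["ACGTA", "ACRTA", "ACGTA"]  # hypothetical aligned reads
    for pos, bases in polymorphic_positions(reads).items():
        print(pos, sorted(bases))  # 2 ['A', 'G']
```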

A “primer” as used herein is a polynucleotide which hybridizes with a target nucleic acid with which it is complementary and which is capable of acting as an initiator of nucleic acid synthesis under conditions for primer extension. Primer extension conditions include hybridization between the primer and template, the presence of free nucleotides, a chain extender enzyme, e.g., DNA polymerase, and appropriate temperature and pH.

In preferred embodiments, a set of primers is prepared by at least the following steps: preparing a RCG, composed of a set of PCR products, separating the set of PCR products into individual PCR products, determining the sequence of each end of at least one of the PCR products, and generating the set of primers for use in the subsequent PCR step based on the sequence of the ends of the insert(s).

A “set of PCR products”, as used herein, is a plurality of synthetic polynucleotide sequences, each different from the others except for a stretch of nucleotides in the 5′ and 3′ regions which is identical in every polynucleotide. These regions correspond to the primers used to generate the RCG, and their sequence varies depending on the primer used. When a DOP-PCR primer is used, the variable portion of each primer preferably has the sequence $\mathrm{N}_x$, wherein x is 5-12 and N is any nucleotide. A “set of DNA products”, by contrast, refers to DNA generated by PCR using specific primers which amplify a specific locus.

Once the sequence of a primer is known, the primer may be purified from a nucleic acid preparation which includes it, or it may be prepared synthetically. For instance, nucleic acid fragments may be isolated from nucleic acid sequences in genomes, plasmids, or other vectors by site-specific cleavage, etc. Alternatively, the primers may be prepared by de novo chemical synthesis, such as by phosphotriester or phosphodiester synthetic methods, for example those described in U.S. Pat. No. 4,356,270; Itakura et al. (1989), Ann. Rev. Biochem., 53:323-56; and Brown et al. (1979), Meth. Enzymol., 68:109. Primers may also be prepared using recombinant technology, such as that described in Sambrook, “Molecular Cloning: A Laboratory Manual,” Cold Spring Harbor Laboratory, p. 390-401 (1982).

The term “nucleotide residue” refers to a single monomeric unit of a nucleic acid such as DNA or RNA. The term “base pair” refers to two nucleotide residues which are complementary to one another and are capable of hydrogen bonding with one another. Traditional base pairs are between G:C and T:A. The letters G, C, T, U and A refer to (deoxy)guanosine, (deoxy)cytidine, (deoxy)thymidine, uridine, and (deoxy)adenosine, respectively. The term “nucleic acids” as used herein refers to a class of molecules including single stranded and double stranded deoxyribonucleic acid (DNA), ribonucleic acid (RNA), and polynucleotides. Nucleic acids within the scope of the invention include naturally occurring and synthetic nucleic acids, nucleic acid analogs, modified nucleic acids, nucleic acids containing modified nucleotides, modified nucleic acid analogs, and mixtures of any of these.

SNPs identified or detected in the genotyping methods described herein can also be identified by other methods known in the art. Many methods have been described for identifying SNPs (see, e.g., WO95/12607; Botstein et al., Am. J. Hum. Genet., 32:314-331 (1980); etc.). In some embodiments, it is preferred that SNPs be identified using the same method that will subsequently be used for genotype analysis.

As discussed briefly above, the SNPs and RCGs of the invention are useful for a variety of purposes. For instance, SNPs and RCGs are useful for performing genotyping analysis; for identification of a subject, such as in paternity or maternity testing, in immigration and inheritance disputes, in breeding tests in animals, in zygosity testing in twins, in tests for inbreeding in humans and animals; in evaluation of transplant suitability such as with bone marrow transplants; in identification of human and animal remains; in quality control of cultured cells; in forensic testing such as forensic analysis of semen samples, blood stains, and other biological materials; in characterization of the genetic makeup of a tumor by testing for loss of heterozygosity; in determining the allelic frequency of a particular SNP; and in generating a genomic classification code for a genome by identifying the presence or absence of each of a panel of SNPs in the genome of a subject and optionally determining the allelic frequency of the SNPs.

A preferred use of the invention is in a high-throughput method of genotyping. “Genotyping” is the process of identifying the presence or absence of specific genomic sequences within genomic DNA. Distinct genomes may be isolated from individuals of populations which are related by some phenotypic characteristic, by familial origin, by physical proximity, by race, by class, etc., in order to identify polymorphisms (e.g., ones associated with a plurality of distinct genomes) which are correlated with the phenotype, family, location, race, class, etc. Alternatively, distinct genomes may be isolated at random from populations such that they have no relation to one another other than their origin in the population. Identification of polymorphisms in such genomes indicates the presence or absence of the polymorphisms in the population as a whole, but not necessarily a correlation with a particular phenotype.

Although genotyping is often used to identify a polymorphism associated with a particular phenotypic trait, this correlation is not necessary. Genotyping only requires that a polymorphism, which may or may not reside in a coding region, is present. When genotyping is used to identify a phenotypic characteristic, it is presumed that the polymorphism affects the phenotypic trait being characterized. A phenotype may be desirable, detrimental, or, in some cases, neutral.

Polymorphisms identified according to the methods of the invention can contribute to a phenotype. Some polymorphisms occur within a protein coding sequence and thus can affect the protein structure, thereby causing or contributing to an observed phenotype. Other polymorphisms occur outside of the protein coding sequence but affect the expression of the gene. Still other polymorphisms merely occur near genes of interest and are useful as markers of that gene. A single polymorphism can cause or contribute to more than one phenotypic characteristic and, likewise, a single phenotypic characteristic may be due to more than one polymorphism. In general, multiple polymorphisms occurring within a gene correlate with the same phenotype. Additionally, whether an individual is heterozygous or homozygous for a particular polymorphism can affect the presence or absence of a particular phenotypic trait.

Phenotypic correlation is performed by identifying an experimental population of subjects exhibiting a phenotypic characteristic and a control population of subjects who do not exhibit that characteristic. Polymorphisms which occur within the experimental population and do not occur in the control population are said to be correlated with the phenotypic trait. Once a polymorphism has been identified as correlated with a phenotypic trait, the genomes of subjects who have the potential to develop that trait can be screened for the occurrence or non-occurrence of the polymorphism in order to establish whether those subjects are likely to eventually develop the phenotypic characteristic. These types of analyses are generally carried out on subjects at risk of developing a particular disorder, such as Huntington's disease or breast cancer.
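The presence/absence comparison described above is often made quantitative by comparing allele counts between the two populations. One common choice, not specified in the text and shown here only as an illustration, is a Pearson chi-square test on a 2 × 2 table of allele counts (1 degree of freedom, no continuity correction). The function name is hypothetical and the counts in the example are invented.

```python
import math


def allelic_chi_square(case_counts, control_counts):
    """Pearson chi-square test (1 df) on a 2 x 2 table of allele counts.

    case_counts and control_counts are (allele_1, allele_2) count pairs.
    Returns (chi2, p_value); no continuity correction is applied.
    """
    table = [list(case_counts), list(control_counts)]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column of the table needs a nonzero total")
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    chi2, p = allelic_chi_square((30, 70), (10, 90))  # invented counts
    print(round(chi2, 2), f"{p:.1e}")  # 12.5 4.1e-04
```

With these invented counts (30 and 70 copies of the two alleles in the experimental population, 10 and 90 in the controls), the statistic is 12.5, well above what chance alone would typically produce at 1 degree of freedom.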

A phenotypic trait encompasses any type of genetic disease, condition, or characteristic, the presence or absence of which can be positively determined in a subject. Phenotypic traits that are genetic diseases or conditions include multifactorial diseases of which a component may be genetic (e.g., owing to the occurrence of a SNP in the subject), as well as predisposition to such diseases. These diseases include, but are not limited to, asthma, cancer, autoimmune diseases, inflammation, blindness, ulcers, heart or cardiovascular diseases, nervous system disorders, and susceptibility to infection by pathogenic microorganisms or viruses. Autoimmune diseases include, but are not limited to, rheumatoid arthritis, multiple sclerosis, diabetes, systemic lupus erythematosus, and Graves' disease. Cancers include, but are not limited to, cancers of the bladder, brain, breast, colon, esophagus, kidney, hematopoietic system (e.g., leukemia), liver, lung, oral cavity, ovary, pancreas, prostate, skin, stomach, and uterus. A phenotypic characteristic includes any attribute of a subject other than a disease or disorder, the presence or absence of which can be detected. Such characteristics can, in some instances, be associated with the occurrence of a SNP in a subject which exhibits the characteristic. Examples of characteristics include, but are not limited to, susceptibility to drug or other therapeutic treatments, appearance, height, color (e.g., of flowering plants), strength, speed (e.g., of race horses), hair color, etc. Many examples of phenotypic traits associated with genetic variation have been described; see, e.g., U.S. Pat. No. 5,908,978 (which identifies an association between disease resistance and genetic variations in certain plant species) and U.S. Pat. No. 5,942,392 (which describes genetic markers associated with the development of Alzheimer's disease).

Identification of associations between genetic variations (e.g., occurrence of SNPs) and phenotypic traits is useful for many purposes. For example, identification of a correlation between the presence of a SNP allele in a subject and the ultimate development of a disease by the subject is particularly useful for administering early treatments, instituting lifestyle changes (e.g., reducing cholesterol or fatty foods in order to avoid cardiovascular disease in subjects having a greater-than-normal predisposition to such disease), or closely monitoring a patient for development of cancer or other disease. It may also be useful in prenatal screening to identify whether a fetus is afflicted with or is predisposed to develop a serious disease. Additionally, this type of information is useful for screening animals or plants bred for the purpose of enhancing or exhibiting desired characteristics.

One method for determining a genotype associated with a plurality of genomes is to screen for the presence or absence of a SNP in a plurality of RCGs. For example, such screening may be performed using a hybridization reaction including a SNP-ASO and the RCGs. Either the SNP-ASO or the RCGs can optionally be immobilized on a surface. The genotype is determined based on whether the SNP-ASO hybridizes with at least some of the RCGs. Other methods for determining a genotype are not based on hybridization; these include, but are not limited to, mass spectrometric methods. Methods for performing mass spectrometry on nucleic acid samples have been described; see, e.g., U.S. Pat. No. 5,885,775. The components of the RCG can be analyzed by mass spectrometry to identify the presence or absence of a SNP allele in the RCG.

A “SNP-ASO”, as used herein, is an oligonucleotide which includes one of two alternative nucleotides at a polymorphic site within its nucleotide sequence. In some embodiments, it is preferred that the oligonucleotide include only a single mismatched nucleotide residue, namely the polymorphic residue, relative to an allele of a SNP. In other cases, however, the oligonucleotide may contain additional nucleotide mismatches, such as neutral bases, or may include nucleotide analogs, as described in more detail below. In preferred embodiments, the SNP-ASO is composed of from about 10 to 50 nucleotide residues. In more preferred embodiments, it is composed of from about 10 to 25 nucleotide residues.
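A SNP-ASO pair can be derived mechanically from the sequence flanking a SNP. The Python sketch below is illustrative only: it takes the flanks on one strand and returns two oligonucleotides of odd length (17 by default, within the 17-21 residue range used elsewhere in this description) that differ only at the central, polymorphic residue. Each oligonucleotide hybridizes to the opposite strand; a probe for the other strand would be the reverse complement. Neutral bases and nucleotide analogs are not modeled, and the names and sequences are hypothetical.

```python
def design_aso_pair(left_flank, right_flank, allele_1, allele_2, length=17):
    """Return two allele-specific oligonucleotides centered on a SNP.

    left_flank and right_flank are the sequences immediately 5' and 3' of the
    polymorphic residue, on the same strand. The length must be odd so that
    the polymorphic residue sits exactly in the middle of each oligonucleotide.
    """
    if length < 3 or length % 2 == 0:
        raise ValueError("length must be an odd number of at least 3")
    half = length // 2
    if len(left_flank) < half or len(right_flank) < half:
        raise ValueError("flanking sequence is too short for the requested length")
    left = left_flank[len(left_flank) - half:].upper()
    right = right_flank[:half].upper()
    return left + allele_1.upper() + right, left + allele_2.upper() + right


if __name__ == "__main__":
    aso_a, aso_g = design_aso_pair("AAAACCCCGGGG", "TTTTGGGGCCCC", "A", "G")
    print(aso_a)  # CCCCGGGGATTTTGGGG
    print(aso_g)  # CCCCGGGGGTTTTGGGG
```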

Oligonucleotides may be purchased from commercial sources such as Genosys, Inc., Houston, Texas or, alternatively, may be synthesized de novo on an Applied Biosystems 381A DNA synthesizer or equivalent type of machine.

The oligonucleotides may be labeled by any method known in the art. One preferred method is end-labeling, which can be performed as described in Maniatis et al., “Molecular Cloning: A Laboratory Manual”, Cold Spring Harbor Laboratories, Cold Spring Harbor, N.Y. (1982).

In organisms having a relatively non-complex genome, it is possible that only a minimal complexity reduction step is necessary, and the genomic DNA may be analyzed directly or after only minimal reduction. This is particularly useful for screening tissue isolates to detect the presence of a bacterium or to identify the bacterium. Additionally, upon development of certain technical advances (e.g., more stringent hybridization or more sensitive detection equipment), even complex genomes may not need an extensive complexity reduction step.

Preferably, automated genotyping is performed. In general, genomic DNA of a well-characterized set of subjects, such as the CEPH families, is processed using PCR with appropriate primers to produce RCGs. The DNA is spotted onto one or more surfaces (e.g., multiple glass slides) for genotyping. This process can be performed using a microarray spotting apparatus which can spot more than 1,000 samples within a square centimeter area, or more than 10,000 samples on a typical microscope slide. Each slide is hybridized with a fluorescently tagged allele-specific SNP oligonucleotide under TMAC conditions analogous to those described below. The genotype of each individual can be determined by detecting the presence or absence of a signal for a selected set of SNP-ASOs. A schematic of the method is shown in FIG. 4.
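The spotting densities quoted above translate into a center-to-center spot spacing (pitch). Assuming a square grid, 1,000 spots per square centimeter requires a pitch of about 316 µm or less, and placing 10,000 spots on a slide with a printable area of 2.5 cm × 7.5 cm (an assumed value for a standard slide) requires about 533 spots per square centimeter. A small Python helper for this arithmetic, with hypothetical function names:

```python
import math

UM_PER_CM = 10_000


def spots_per_cm2(pitch_um):
    """Spots per square centimeter for a square grid with the given pitch (µm)."""
    return (UM_PER_CM / pitch_um) ** 2


def max_pitch_um(target_spots_per_cm2):
    """Largest square-grid pitch (µm) that still reaches the target density."""
    return UM_PER_CM / math.sqrt(target_spots_per_cm2)


if __name__ == "__main__":
    print(round(max_pitch_um(1_000)))  # 316: pitch needed for 1,000 spots/cm^2
    # Assumed printable area of 2.5 cm x 7.5 cm on a standard slide.
    area_cm2 = 2.5 * 7.5
    print(round(10_000 / area_cm2))    # 533 spots/cm^2 needed for 10,000 spots
```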

Once the complexity of genomic DNA obtained from an individual has been reduced, the resulting genomic DNA fragments can be attached to a solid support in order to be analyzed by hybridization. The RCG fragments may be attached to the slide by any method for attaching DNA to a surface. Methods for immobilizing nucleic acids have been described extensively, e.g., in U.S. Pat. Nos. 5,679,524; 5,610,287; 5,919,626; and 5,445,934. For instance, DNA fragments may be spotted onto poly-L-lysine-coated glass slides and then crosslinked by UV irradiation. A second, more preferred method involves including a 5′ amino group on each of the DNA fragments of the RCG. The DNA fragments are spotted onto silane-coated slides in the presence of NaOH in order to covalently attach the fragments to the slide. This method is advantageous because a covalent bond is formed between the fragments and the surface. Another method for immobilizing DNA fragments is to spot the RCG fragments onto a nylon membrane. Other methods of binding DNA to surfaces, such as attachment to amino-alkyl-coated slides, are possible and are well known to those of ordinary skill in the art. More detailed methods are described in the Examples below.

The surface to which the oligonucleotide arrays are conjugated is preferably a rigid or semi-rigid support which may, optionally, have appropriate light absorbing or transmitting characteristics for use with commercially available detection equipment. Substrates which are commonly used and which have appropriate light absorbing or transmitting characteristics include, but are not limited to, glass, Si, Ge, GaAs, GaP, SiO₂, SiN₄, modified silicon, and polymers such as (poly)tetrafluoroethylene, (poly)vinylidenedifluoride, polystyrene, polycarbonate, or combinations thereof. Additionally, the surface of the support may be non-coated or coated with a variety of materials. Coatings include, but are not limited to, polymers, plastics, resins, polysaccharides, silica or silica-based materials, carbon, metals, inorganic glasses, and membranes.

In one embodiment, the SNP-ASOs are hybridized under standard hybridization conditions with RCGs covalently conjugated to a surface. Briefly, SNP-ASOs are labeled at their 5′ ends. A hybridization mixture containing the SNP-ASOs and, optionally, an isostabilizing agent, denaturing agent, or renaturation accelerant is brought into contact with an array of RCGs immobilized on the surface, and the mixture and the surface are incubated under appropriate hybridization conditions. SNP-ASOs which do not hybridize are removed by washing the array with a wash mixture (such as a hybridization buffer), leaving only hybridized SNP-ASOs attached to the surface. After washing, the label (e.g., a fluorescent molecule) is detected. For example, an image of the surface can be captured (e.g., using a fluorescence microscope equipped with a CCD camera and automated stage capabilities, a phosphoimager, etc.). The label may also, or instead, be detected using a microarray scanner (e.g., one made by Genetic Microsystems). The image produced by a microarray scanner can be converted to a binary (i.e., +/−) signal for each sample using any of several available software applications (e.g., NIH Image, ScanAnalyze, etc.) and exported in a data format. The high signal-to-noise ratio of this analysis makes data determination in this mode straightforward and easily automated. Once exported, these data can be reformatted for direct analysis by human genetics software applications such as CRI-MAP and LINKAGE. Additionally, the methods may utilize two or more spectrally distinguishable fluorescent dyes to reduce the number of samples to be analyzed. For instance, if four spectrally distinct fluorescent dyes (e.g., ABI Prism dyes 6-FAM, HEX, NED, and ROX) are used, four hybridization reactions can be carried out under a single hybridization condition.
In other embodiments discussed in more detail below, the SNP-ASOs are conjugated to a surface and hybridized with RCGs.
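The conversion of scanned intensities into binary (+/−) calls can be sketched as a simple threshold on the ratio of spot signal to background. This Python sketch does not reproduce the behavior of NIH Image, ScanAnalyze, or any particular scanner software; the single background value, the minimum ratio of 3.0, the function name, and the spot identifiers are hypothetical choices that would have to be set empirically.

```python
def binary_calls(spot_signals, background, min_ratio=3.0):
    """Convert spot intensities into +/- calls relative to one background level.

    spot_signals maps a spot identifier to its measured intensity. The single
    background value and the default min_ratio are illustrative assumptions.
    """
    if background <= 0:
        raise ValueError("background must be positive")
    return {spot: ("+" if signal / background >= min_ratio else "-")
            for spot, signal in spot_signals.items()}


if __name__ == "__main__":
    calls = binary_calls({"A01": 1800.0, "A02": 150.0}, background=120.0)
    print(calls)  # {'A01': '+', 'A02': '-'}
```

A fixed ratio threshold keeps the call rule transparent; in practice the threshold would be tuned against samples of known genotype.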

Conditions for optimal hybridization are described below in the Examples. In general, the SNP-ASO is present in the hybridization mixture at a concentration of from about 0.005 nM (nanomoles per liter) to about 50 nM. More preferably, the concentration is from 0.5 nM to 1 nM. A preferred concentration for radioactively labeled SNP-ASOs is 0.66 nM. The mixture preferably also includes a hybridization optimizing agent to improve signal discrimination between genomic sequences which are perfectly complementary to the SNP-ASO and those which contain a single mismatched nucleotide (as well as any neutral base or similar substitutions). Isostabilizing agents are compounds, such as betaines and lower tetraalkyl ammonium salts, which reduce the sequence dependence of DNA thermal melting transitions. These compounds also increase discrimination between matched and mismatched SNPs/genomes. A denaturing agent may also be included in the hybridization mixture. A denaturing agent is a composition that lowers the melting temperature of double-stranded nucleic acid molecules, generally by reducing hydrogen bonding between bases or preventing hydration of nucleic acid molecules. Denaturing agents are well known in the art and include, for example, DMSO, formaldehyde, glycerol, urea, formamide, and chaotropic salts. The hybridization conditions are in general those commonly used in the art, such as those described in Sambrook et al., “Molecular Cloning: A Laboratory Manual”, 2nd Ed. (1989), Cold Spring Harbor, N.Y.; Berger and Kimmel, “Guide to Molecular Cloning Techniques”, Methods in Enzymology, Volume 152 (1987), Academic Press, Inc., San Diego, Calif.; and Young and Davis (1983), PNAS (USA) 80:1194.

In general, incubation temperatures for hybridization of nucleic acids range from about 20° C. to 75° C. For probes of about 17 nucleotide residues, a preferred hybridization temperature range is from about 50° C. to 54° C. The hybridization temperature for longer probes is preferably from about 55° C. to 65° C., and for shorter probes it is less than 52° C. Hybridization may be performed over a variety of time frames. Preferably, hybridization of SNP-ASOs and RCGs is performed for at least 30 minutes.
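As a rough starting point for choosing hybridization temperatures, the melting temperature of a short oligonucleotide is often estimated with the Wallace rule, Tm ≈ 2 °C × (A + T) + 4 °C × (G + C). This rule is not part of the described method and is only a rule of thumb for short oligonucleotides in high-salt buffer. It does not apply to TMAC-containing buffers, in which the melting temperature depends mainly on oligonucleotide length rather than base composition, so empirical determination remains necessary. The function name is hypothetical.

```python
def wallace_tm(oligo):
    """Wallace-rule Tm estimate in degrees C: 2 * (A + T) + 4 * (G + C).

    Rule of thumb only, for short oligonucleotides in high-salt buffer;
    it ignores TMAC effects, neutral bases, and nucleotide analogs.
    """
    seq = oligo.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    print(wallace_tm("ACGTACGTACGTACGTA"))  # 50 (17-mer: 9 A/T, 8 G/C)
```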

Preferably, either or both of the SNP-ASO and the RCG are labeled. The label may be added directly to the SNP-ASO or the RCG during synthesis of the oligonucleotide or during generation of RCG fragments. For instance, a PCR reaction performed using labeled primers or labeled nucleotides will produce a labeled product. Labeled nucleotides (e.g., fluorescein-labeled CTP) are commercially available. Methods for attaching labels to nucleic acids are well known to those of ordinary skill in the art and, in addition to the PCR method, include, for example, nick translation and end-labeling.

Labels suitable for use in the methods of the present invention include any type of label detectable by standard means, including spectroscopic, photochemical, biochemical, electrical, optical, or chemical methods. Preferred types of labels include fluorescent labels such as fluorescein. A fluorescent label is a compound comprising at least one fluorophore. Commercially available fluorescent labels include, for example, fluorescein phosphoramidites such as fluoreprime (Pharmacia, Piscataway, N.J.), fluoredite (Millipore, Bedford, Mass.), and FAM (ABI, Foster City, Calif.), as well as rhodamine, polymethine dye derivatives, phosphors, Texas red, green fluorescent protein, CY3, and CY5. Polynucleotides can be labeled with one or more spectrally distinct fluorescent labels. “Spectrally distinct” fluorescent labels are labels which can be distinguished from one another based on one or more of their characteristic absorption spectra, emission spectra, fluorescent lifetimes, or the like. Spectrally distinct fluorescent labels have the advantage that they may be used in combination (“multiplexed”).

Radionuclides such as ³H, ¹²⁵I, ³⁵S, ¹⁴C, or ³²P are also useful labels according to the methods of the invention. A plurality of radioactively distinguishable radionuclides can be used. Such radionuclides can be distinguished, for example, based on the type of radiation (e.g., α, β, or γ radiation) emitted by the radionuclides. The ³²P signal can be detected using a phosphoimager, which currently has a resolution of approximately 50 microns. Other known techniques, such as chemiluminescence or colorimetric detection (enzymatic color reaction), can also be used.

By using spectrally distinct fluorescent probes, it is possible to analyze more than one locus in a single hybridization mixture. The term “multiplexing” refers to the use of a set of distinct fluorescent labels in a single assay. Such fluorescent labels have been described extensively in the art, such as the fluorescent labels described in PCT Published Patent Application WO98/31834.

Fluorescent primers are a preferred means of labeling polynucleotides. The fluorescent tag is stable for more than a year, whereas radioactively labeled primers are stable for a shorter period. In addition, fluorescent primers may be used in combination if they are spectrally distinct, as discussed above. This allows multiple hybridizations to be detected in a single hybridization mixture, which reduces the total number of reactions needed for a genome-wide scan. For example, analysis of 1000 loci requires 2000 hybridizations (1000 loci × 2 alleles per locus). Using 4 spectrally distinct fluorescently labeled oligonucleotides per mixture cuts this number 4-fold, so only 500 hybridizations are needed.
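The arithmetic above can be written as a small helper. This is a sketch only; the function name `hybridizations_needed` is hypothetical, and rounding up for a partially filled mixture is an assumption rather than something stated in the text.

```python
import math


def hybridizations_needed(loci: int, alleles_per_locus: int = 2,
                          labels_per_mixture: int = 1) -> int:
    """Count hybridization mixtures needed for a scan of `loci` SNP loci.

    One labeled probe is needed per allele per locus; spectrally distinct
    labels let `labels_per_mixture` probes be read from one mixture.
    """
    probes = loci * alleles_per_locus
    # A partially filled mixture still costs one hybridization, so round up.
    return math.ceil(probes / labels_per_mixture)


print(hybridizations_needed(1000))                        # 2000
print(hybridizations_needed(1000, labels_per_mixture=4))  # 500
```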

In order to determine the genotype of an individual at a SNP locus, it is desirable to employ SNP allele-specific oligonucleotide hybridization. Preferably, two hybridization mixtures are prepared for each locus (or the two hybridizations can be performed together). The first hybridization mixture contains a labeled (e.g., radioactive or fluorescent) SNP-ASO, typically 17-21 nucleotide residues in length and centered on the polymorphic residue. To increase specificity, a 20- to 50-fold excess of a non-labeled oligonucleotide corresponding to the other allele (referred to herein as a “complementary SNP-ASO”) is included in the hybridization mixture. Use of the non-labeled complementary SNP-ASO can be avoided by using a SNP-ASO containing a neutral base, as described below. In the second hybridization mixture, the labeling is reversed: the SNP-ASO that was labeled in the first mixture is unlabeled, and the complementary SNP-ASO is labeled instead. Hybridization is performed in the presence of a hybridization buffer. The melting temperature of the oligonucleotides can be determined empirically for each experiment. The two oligonucleotides corresponding to different alleles of the same SNP (the SNP-ASO and the complementary SNP-ASO) are referred to herein as a pair of allele-specific oligonucleotides (ASOs). Further experimental details regarding selecting and making SNP-ASOs are provided in the Examples section below.
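A pair of ASOs centered on the polymorphic residue can be sketched as follows. The helper names and the flanking sequences are hypothetical, and the probes are written on the same strand as the flanks given. Since the text says melting temperature is best determined empirically, the Wallace rule (2 °C per A/T, 4 °C per G/C) is swapped in here only as a rough starting estimate.

```python
def wallace_tm(seq: str) -> int:
    """Rough Tm estimate in deg C by the Wallace rule: 2(A+T) + 4(G+C)."""
    s = seq.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2 * at + 4 * gc


def design_aso_pair(left: str, right: str, allele_1: str, allele_2: str,
                    length: int = 17):
    """Return two allele-specific oligonucleotides centered on a SNP.

    `left` and `right` are the sequences flanking the polymorphic residue;
    `length` must be odd so that the SNP sits exactly in the middle.
    """
    if length % 2 == 0:
        raise ValueError("length must be odd to center the polymorphic residue")
    arm = (length - 1) // 2
    if len(left) < arm or len(right) < arm:
        raise ValueError("flanking sequence too short for requested length")
    l_arm = left[-arm:]   # bases immediately 5' of the SNP
    r_arm = right[:arm]   # bases immediately 3' of the SNP
    return l_arm + allele_1 + r_arm, l_arm + allele_2 + r_arm


left = "GGATCCTTAGCAAGT"   # hypothetical 5' flank
right = "CCGTAAGCTTGAACG"  # hypothetical 3' flank
aso_a, aso_g = design_aso_pair(left, right, "A", "G", length=17)
print(aso_a, wallace_tm(aso_a))  # TAGCAAGTACCGTAAGC 50
print(aso_g, wallace_tm(aso_g))  # TAGCAAGTGCCGTAAGC 52
```

The two probes differ only at the central residue, which is what lets a single mismatch destabilize the duplex formed with the non-matching allele.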

In addition to the method described above, several other methods of allele specific hybridization may be used for hybridizing SNP-ASOs with RCGs. One method is to increase discrimination of SNPs in DNA hybridization by means of artificial mismatches. Artificial mismatches are inserted into oligonucleotide probes using a neutral base such as the base analog 3-nitropyrrole. A significant enhancement of discrimination is generally obtained, with a strong dependence of the enhancement on the spacing between mismatches.

In general, the methods described above are based on conjugation of genomic DNA fragments (i.e., a RCG) to a solid support. Hybridization analysis can also be performed with the SNP-ASO conjugated to the support (e.g., in an array). The oligonucleotide array is hybridized with one or more RCGs. Attachment of the SNP-ASOs or RCGs to the support may be performed by any method known in the art. Many methods for attaching oligonucleotides to surfaces in arrays have been described; see, e.g., PCT Published Patent Application WO97/29212 and U.S. Pat. Nos. 4,588,682; 5,667,976; and 5,760,130. Other methods include, for example, using arrays of metal pins. Additionally, RCGs may be attached to the surface by the methods disclosed in the Examples below.

An “array” as used herein is a set of molecules arranged in a specific order with respect to a surface. Preferably the array is composed of polynucleotides (e.g. either SNP-ASOs or RCGs) attached to the surface. Oligonucleotide arrays can be used to screen nucleic acid samples for a target nucleic acid, which can be labeled with a detectable marker. A fluorescent signal resulting from hybridization between a target nucleic acid and a substrate-bound oligonucleotide provides information relating to the identity of the target nucleic acid by reference to the location of the oligonucleotide in the array on the substrate. Such a hybridization assay can generate thousands of signals which exhibit different signal strengths. These signals correspond to particular oligonucleotides of the array. Different signal strengths will arise based on the amount of labeled target nucleic acid hybridized with an oligonucleotide of the array. This amount, in turn, can be influenced by the proportion of AT-rich regions and GC-rich regions within the oligonucleotide (which determines thermal stability). The relative amounts of hybridized target nucleic acid can also be influenced by, for example, the number of different probes arrayed on the substrate, the length of the target nucleic acid, and the degree of hybridization between mismatched residues. Oligonucleotide arrays, in some embodiments, have a density of at least 500 features per square centimeter, but in practice can have much lower densities. A feature, as used herein, is an area of a substrate on which oligonucleotides having a single sequence are immobilized.

The oligonucleotide arrays of the invention may be produced by any method known in the art. Many such arrays are commercially available, and many methods have been described for producing them. One preferred method for producing arrays includes spatially directed oligonucleotide synthesis. Spatially directed oligonucleotide synthesis may be performed using light-directed oligonucleotide synthesis, microlithography, application by ink jet, microchannel deposition to specific locations, and sequestration with physical barriers. Each of these methods is well known in the art and has been described extensively. For instance, the light-directed oligonucleotide synthesis method has been disclosed in U.S. Pat. Nos. 5,143,854; 5,489,678; and 5,571,639; and PCT applications having publication numbers WO90/15070; WO92/10092; and WO94/12305. This technique involves modifying the surface of the solid support with linkers and photolabile protecting groups and illuminating the surface through a photolithographic mask to produce reactive (e.g., hydroxyl) groups in the illuminated regions. A 3′-O-phosphoramidite-activated deoxynucleoside having a protected 5′-hydroxyl group is supplied to the surface such that coupling occurs at the sites that were exposed to light. The substrate is rinsed, the surface is illuminated through a second mask, and another activated deoxynucleoside is presented to the surface. The cycle is repeated until the desired set of products is obtained. After the cycle is finished, the nucleotides can be capped. Another method involves mechanically protecting portions of the surface and selectively deprotecting/coupling materials to the exposed portions of the surface, such as the method described in U.S. Pat. No. 5,384,261. The mechanical means is generally referred to as a mask. Other methods for array preparation are described in PCT Published Patent Applications WO97/39151, WO98/20967, and WO98/10858 (which describe an automated apparatus for the chemical synthesis of molecular arrays), U.S. Pat. No. 5,143,854, Fodor et al., Science 251:767-777 (1991), and Kozal et al., Nature Medicine 2:753-759 (1996).
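The mask-and-couple cycle described for light-directed synthesis can be sketched as a scheduling problem. This is an illustration under stated assumptions, not the method of the cited patents: bases are offered in a fixed A, C, G, T rotation (one simple strategy), features are addressed by hypothetical (row, column) tuples, and each target sequence is written in coupling order (3′ to 5′). Each cycle's "mask" exposes exactly the features whose next required base is the one being offered.

```python
def synthesis_schedule(targets, base_order="ACGT"):
    """Plan light-directed synthesis cycles for an oligonucleotide array.

    `targets` maps a feature position to the sequence to build there, in
    the order the bases are coupled. Returns a list of
    (base, exposed_features) tuples; cycles exposing nothing are skipped.
    """
    for pos, seq in targets.items():
        if any(b not in base_order for b in seq):
            raise ValueError(f"unexpected base in sequence at {pos}")
    progress = {pos: 0 for pos in targets}  # bases already coupled per feature
    schedule = []
    while any(progress[p] < len(s) for p, s in targets.items()):
        for base in base_order:
            # The "mask": illuminate only features that need this base next.
            exposed = sorted(p for p, s in targets.items()
                             if progress[p] < len(s) and s[progress[p]] == base)
            if exposed:
                for p in exposed:
                    progress[p] += 1  # deprotect and couple at lit sites
                schedule.append((base, exposed))
    return schedule


targets = {(0, 0): "AC", (0, 1): "CA", (1, 0): "GA"}  # hypothetical layout
for base, exposed in synthesis_schedule(targets):
    print(base, exposed)
```

For this toy layout four masked cycles suffice; in general the fixed rotation needs at most four cycles per residue of the longest probe.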

Hybridizing a SNP-ASO with an array of RCGs (or hybridizing a RCG with an array of SNP-ASOs) is followed by detection of hybridization. Part of the genotyping methods described herein is to determine whether a positive or negative signal exists for each hybridization for an individual and then, based on this information, to determine the genotype for the corresponding SNP locus. This step is relatively straightforward, but varies depending on the method of detection. Essentially, all of the detection methods described here (fluorescent, radioactive, etc.) can be reduced to a digital image file, e.g., using a microarray reader or phosphoimager. Presently, there are several software products which will overlay a grid on an image and determine the signal strength value for each element of the grid. These values can be imported into a computer program, such as the Microsoft Corporation spreadsheet program designated Microsoft Excel™, with which simple analysis can be performed to assign each signal a manipulable value (e.g., 1 or 0, or + or −). Once this is accomplished, an individual's genotype can be described in terms of the pattern of hybridization of RCG fragments obtained from the individual with selected SNP-ASOs corresponding to disease-associated SNPs.
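The spreadsheet step that turns grid signal strengths into 1/0 values can be sketched in Python. The cutoff rule here (a spot is positive when it reaches a fixed multiple of the mean of blank control spots) and all intensities are assumptions for illustration; any defensible threshold could be substituted.

```python
from statistics import mean


def call_signals(grid, background, ratio=3.0):
    """Turn grid signal strengths into 1 (hybridized) or 0 (not hybridized).

    A grid element is called positive when its intensity is at least
    `ratio` times `background` (assumed cutoff rule).
    """
    cutoff = ratio * background
    return [[1 if value >= cutoff else 0 for value in row] for row in grid]


# Hypothetical intensities for a 2 x 2 grid and three blank control spots.
grid = [[120, 2400],
        [95, 3100]]
background = mean([100, 110, 90])
print(call_signals(grid, background))  # [[0, 1], [0, 1]]
```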

The array having labeled SNP-ASOs (or labeled RCGs) hybridized thereto can be analyzed using automated equipment. Automated equipment for analyzing arrays can include an excitation radiation source which emits radiation at a first wavelength, an optical detector, and a stage for securing the surface supporting the array. The excitation source emits excitation radiation which is focused on at least one area of the array and which induces emission from fluorescent labels. The emitted signal is preferably radiation having a different wavelength than the excitation radiation. Emitted radiation is collected by a detector, which generates a signal proportional to the amount of radiation sensed thereon. The array may then be moved so that a different area can be exposed to the radiation source to produce a signal. Once each area of the array has been scanned, a two-dimensional image of the array is obtained. Preferably, the movement of the array is accomplished using automated equipment, such as a multi-axis translation stage that moves the array at a constant velocity. In alternative embodiments, the array may remain stationary, and devices may be employed to scan the light over the stationary array.

One type of detection system is a CCD imaging system, e.g., when the nucleic acids are labeled with fluorescent probes. Other detectors are well known to those of skill in the art and may also, or alternatively, be used. CCD imaging systems for use with array detection have been described. For instance, a photodiode detector may be placed on the opposite side of the array from the excitation source. Alternatively, a CCD camera may be used in place of the photodiode detector to image the array. One advantage of these systems is rapid read time. In general, an entire 50×50 centimeter array can be read in about 30 seconds or less using standard equipment. If more powerful equipment and more efficient dyes are used, the read time may be reduced to less than 5 seconds.

Once the data is obtained, e.g., as a two-dimensional image, a computer can be used to transform the data into a displayed image which varies in color depending on the intensity of light emission at a particular location. Any type of commercial software which can perform this type of data analysis can be used. In general, the data analysis involves the steps of determining the intensity of the fluorescence emitted as a function of the position on the substrate, removing the outliers, and calculating the relative binding affinity. One or more of the presence, absence, and intensity of signal corresponding to a label is used to assess the presence or absence of an SNP corresponding to the label in the RCG. The presence and absence of one or more SNPs in a RCG can be used to assign a genotype to the individual. For example, the following depicts the genotype analysis of 3 individuals at a given locus at which an A/G polymorphism occurs:

Individual    SNP 1 Allele “A”    SNP 1 Allele “G”    Genotype
Larry         +                   −                   A/A
Moe           −                   +                   G/G
Curly         +                   +                   A/G
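The rule behind the table can be sketched as a small function. The name `call_genotype` is hypothetical, and returning `None` when neither allele gives a signal (treating it as a failed assay) is an assumption, since the table does not cover that case.

```python
def call_genotype(allele_1, allele_2, signal_1, signal_2):
    """Assign a genotype at a biallelic SNP from two allele-specific calls."""
    if signal_1 and signal_2:
        return f"{allele_1}/{allele_2}"  # both ASOs hybridized: heterozygote
    if signal_1:
        return f"{allele_1}/{allele_1}"
    if signal_2:
        return f"{allele_2}/{allele_2}"
    return None  # no signal for either allele: treat as a failed assay


table = {"Larry": (True, False), "Moe": (False, True), "Curly": (True, True)}
for name, (a, g) in table.items():
    print(name, call_genotype("A", "G", a, g))
```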

As mentioned above, SNP analysis can be used to determine whether an individual has or will develop a particular phenotypic trait and whether the presence or absence of a specific allele correlates with a particular phenotypic trait. In order to determine which SNPs are related to a particular phenotypic trait, genomic samples are isolated from a group of individuals who exhibit the particular phenotypic trait, and the samples are analyzed for the presence of common SNPs. The genomic sample obtained from each individual is used to prepare a RCG. These RCGs are screened using panels of SNPs in a high throughput method of the invention to determine whether the presence or absence of a particular allele is associated with the phenotype. In some cases, it may be possible to predict the likelihood that a particular subject will exhibit the related phenotype. For example, if a particular polymorphic allele is present in 30% of individuals who develop Alzheimer's disease, an individual having that allele may have a higher likelihood of developing Alzheimer's disease. Whether, and by how much, that likelihood is increased depends on several factors, such as how often individuals not afflicted with Alzheimer's disease carry the allele and whether other factors are associated with the development of Alzheimer's disease. This type of analysis can be useful for determining a probability that a particular phenotype will be exhibited. In order to increase the predictive ability of this type of analysis, multiple SNPs associated with a particular phenotype can be analyzed. Although values can be calculated, it is enough to identify that a difference exists.
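The text does not prescribe a statistic for comparing affected and unaffected individuals. One common choice, swapped in here for illustration, is the odds ratio for carrying the allele with a 95% Wald confidence interval. The function name is hypothetical, and the control count below (15 of 100) is invented purely to pair with the 30% figure in the example.

```python
import math


def allele_odds_ratio(case_carriers, case_total, control_carriers, control_total):
    """Odds ratio (and 95% Wald CI) for carrying an allele, cases vs controls."""
    a = case_carriers
    b = case_total - case_carriers
    c = control_carriers
    d = control_total - control_carriers
    if 0 in (a, b, c, d):
        raise ValueError("zero cell: apply a continuity correction first")
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of ln(odds ratio)
    lo = math.exp(math.log(odds_ratio) - 1.96 * se)
    hi = math.exp(math.log(odds_ratio) + 1.96 * se)
    return odds_ratio, (lo, hi)


# Hypothetical counts: 30 of 100 cases and 15 of 100 controls carry the allele.
or_, (lo, hi) = allele_odds_ratio(30, 100, 15, 100)
print(round(or_, 2))  # 2.43
```

An interval lying entirely above 1 is one way of expressing that a difference between the groups exists.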

It is also possible to identify SNPs which segregate with a particular disease. Multiple polymorphic sites may be detected and examined to identify a physical linkage between them or between a marker (SNP) and a phenotype. Both of these are useful for mapping a genetic locus linked to or associated with a phenotypic trait to a chromosomal position and thereby revealing one or more genes associated with the phenotypic trait. If two polymorphic sites segregate randomly, then they are either on separate chromosomes or far enough apart on the same chromosome that they do not co-segregate. If two sites co-segregate with significant frequency, then they are linked to one another on the same chromosome. These types of linkage analyses are useful for developing genetic maps. See, e.g., Lander et al., PNAS (USA) 83, 7353-7357 (1986); Lander et al., Genetics 121, 185-199 (1989). The invention is also useful for identifying polymorphic sites which do not segregate, i.e., when one sibling has a chromosomal region that includes a polymorphic site and another sibling does not have that region.

Linkage analysis is often performed on family members who exhibit high rates of a particular phenotype or on patients suffering from a particular disease. Biological samples are isolated from each subject exhibiting a phenotypic trait, as well as from subjects who do not exhibit the phenotypic trait. These samples are each used to generate individual RCGs, and the presence or absence of polymorphic markers is determined using panels of SNPs. The data can be analyzed to determine whether the various SNPs are associated with the phenotypic trait and whether or not any SNPs segregate with the phenotypic trait.

Methods for analyzing linkage data have been described in many references, including Thompson & Thompson, Genetics in Medicine (5th edition), W.B. Saunders Co., Philadelphia, 1991; and Strachan, “Mapping the Human Genome” in The Human Genome (Bios Scientific Publishers Ltd., Oxford), chapter 4, and are summarized in PCT published patent application WO98/18967 by Affymetrix, Inc. Linkage analysis by calculation of log of the odds (LOD) values reveals the likelihood of linkage between a marker and a genetic locus at a given recombination fraction, compared with the likelihood when the marker and the genetic locus are not linked. The recombination fraction is the probability that a recombination event separates the two loci in a given meiosis; values below 0.5 indicate linkage, and smaller values indicate tighter linkage. Computer programs and mathematical tables have been developed for calculating LOD scores for different recombination fraction values and for determining the recombination fraction based on a particular LOD score, respectively. See, e.g., Lathrop, PNAS (USA) 81, 3443-3446 (1984); Smith et al., Mathematical Tables for Research Workers in Human Genetics (Churchill, London, 1961); Smith, Ann. Hum. Genet. 32, 127-150 (1968). Use of LOD values for genetic mapping of phenotypic traits is described in PCT published patent application WO98/18967 by Affymetrix, Inc. In general, a positive LOD score indicates that two genetic loci are linked, and a LOD score of +3 or greater is strong evidence that two loci are linked. A negative value indicates evidence against linkage.
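A minimal two-point LOD calculation is sketched below. It assumes the simplest textbook case, phase-known and fully informative meioses, where R of N meioses are recombinant: LOD(θ) = log10[θ^R (1 − θ)^(N − R) / 0.5^N], maximized at θ = R/N. Real pedigrees need the programs and tables cited above; the function names are hypothetical.

```python
import math


def lod(recombinants: int, meioses: int, theta: float) -> float:
    """Two-point LOD score for phase-known, fully informative meioses."""
    if not 0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if theta == 0 and recombinants > 0:
        return float("-inf")  # any recombinant rules out theta = 0
    nonrec = meioses - recombinants
    log_l = 0.0
    if recombinants:
        log_l += recombinants * math.log10(theta)
    if nonrec:
        log_l += nonrec * math.log10(1 - theta)
    # Compare with the unlinked hypothesis, theta = 0.5.
    return log_l - meioses * math.log10(0.5)


def max_lod(recombinants: int, meioses: int):
    """Maximum-likelihood recombination fraction and its LOD score."""
    if meioses <= 0:
        raise ValueError("need at least one informative meiosis")
    theta_hat = min(recombinants / meioses, 0.5)
    return theta_hat, lod(recombinants, meioses, theta_hat)


print(round(max_lod(0, 10)[1], 2))  # 3.01
theta, score = max_lod(2, 20)
print(f"{theta:.2f} {score:.2f}")   # 0.10 3.20
```

Under these assumptions, ten informative meioses with no recombinants are just enough to reach the conventional threshold of +3.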

The methods of the invention are also useful for assessing loss of heterozygosity in a tumor. Loss of heterozygosity in a tumor is useful for determining the status of the tumor, such as whether the tumor is an aggressive, metastatic tumor. The method is generally performed by isolating genomic DNA from tumor samples obtained from a plurality of subjects having tumors of the same type, as well as from normal (i.e., non-cancerous) tissue obtained from the same subjects. These genomic DNA samples are used to generate RCGs which can be hybridized with a SNP-ASO, for example using the surface array technology described herein. The absence, in the RCG generated from the tumor, of a SNP allele that is present in the RCG generated from normal tissue of the same subject indicates that loss of heterozygosity has occurred. If a SNP allele is associated with a metastatic state of a cancer, its absence can be compared with its presence or absence in a non-metastatic tumor sample or a normal tissue sample. A database of SNPs which occur in normal and tumor tissues can be generated, and the occurrence of SNPs in a patient's sample can be compared with the database for diagnostic or prognostic purposes.
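The tumor-versus-normal comparison can be sketched as a set difference over allele calls. Locus names and calls are hypothetical. Only loci that are heterozygous in normal tissue are informative. Treating a tumor locus with no allele detected as an assay failure, rather than as LOH, is an assumption of this sketch.

```python
def loh_loci(normal_calls, tumor_calls):
    """Find SNP loci showing loss of heterozygosity.

    Both arguments map a locus name to the set of alleles detected there.
    A locus is flagged when normal tissue is heterozygous and the tumor
    has lost at least one of those alleles. Returns {locus: lost alleles}.
    """
    flagged = {}
    for locus, normal in normal_calls.items():
        tumor = tumor_calls.get(locus)
        if tumor is None or len(normal) < 2:
            continue  # not tested in tumor, or homozygous (uninformative)
        lost = normal - tumor
        if lost and tumor:  # an allele remains, so the assay itself worked
            flagged[locus] = lost
    return flagged


normal = {"snp1": {"A", "G"}, "snp2": {"C", "T"}, "snp3": {"A"}}
tumor = {"snp1": {"A"}, "snp2": {"C", "T"}, "snp3": {"A"}}
print(loh_loci(normal, tumor))  # {'snp1': {'G'}}
```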

It is useful to be able to differentiate non-metastatic primary tumors from metastatic tumors, because metastasis is a major cause of treatment failure in cancer patients. If metastasis can be detected early, it can be treated aggressively in order to slow the progression of the disease. Metastasis is a complex process involving detachment of cells from a primary tumor, movement of the cells through the circulation, and eventual colonization of tumor cells at local or distant tissue sites. Additionally, it is desirable to be able to detect a predisposition for development of a particular cancer such that monitoring and early treatment may be initiated. Many cancers and tumors are associated with genetic alterations. For instance, an extensive cytogenetic analysis of hematologic malignancies such as lymphomas and leukemias has been described; see, e.g., Solomon et al., Science 254, 1153-1160, 1991. Many solid tumors have complex genetic abnormalities requiring more complex analysis.

Solid tumors progress from tumorigenesis through a metastatic stage and into a stage at which several genetic aberrations can occur. See, e.g., Smith et al., Breast Cancer Res. Treat., 18 Suppl. 1, S5-14, 1991. Genetic aberrations are believed to alter the tumor such that it can progress to the next stage, i.e., by conferring proliferative advantages, the ability to develop drug resistance or enhanced angiogenesis, proteolysis, or metastatic capacity. Many of these genetic aberrations involve loss of heterozygosity. Loss of heterozygosity can be caused by a deletion or recombination resulting in a genetic mutation which plays a role in tumor progression. Loss of heterozygosity for tumor suppressor genes is believed to play a role in tumor progression. For instance, it is believed that mutations in the retinoblastoma tumor suppressor gene located on chromosome 13q14 cause progression of retinoblastomas, osteosarcomas, small cell lung cancer, and breast cancer. Likewise, the short arm of chromosome 3 has been shown to be associated with cancers such as small cell lung cancer, renal cancer, and ovarian cancer. As another example, ulcerative colitis is a disease associated with an increased risk of cancer, presumably through a multistep progression involving accumulated genetic changes (U.S. Pat. No. 5,814,444). It has been shown that patients afflicted with long-duration ulcerative colitis exhibit an increased risk of cancer, and that one early marker is loss of heterozygosity of a region of the distal short arm of chromosome 8. This region is the site of a putative tumor suppressor gene that may also be implicated in prostate and breast cancer. Loss of heterozygosity can easily be detected by performing the methods of the invention routinely on patients afflicted with ulcerative colitis. Similar analyses can be performed using samples obtained from other tumors known or believed to be associated with loss of heterozygosity.

The methods of the invention are particularly advantageous for studying loss of heterozygosity because thousands of tumor samples can be screened at one time. Additionally, the methods can be used to identify new regions of loss that have not previously been identified in tumors.

The methods of the invention are useful for generating a genomic pattern for an individual genome of a subject. The genomic pattern of a genome indicates the presence or absence of polymorphisms, for example, SNPs, within a genome. Genomic DNA is unique to each individual subject (except identical twins). Accordingly, the more polymorphisms that are analyzed for a given genome of a subject, the higher the probability of generating a unique genomic pattern for the individual from which the sample was isolated. The genomic pattern can be used for a variety of purposes, such as identification in forensic analysis, population identification, or paternity or maternity testing. The genomic pattern may also be used for classification purposes as well as to identify patterns of polymorphisms within different populations of subjects.

Genomic patterns may be used for many purposes, including forensic analysis and paternity or maternity testing. The use of genomic information for forensic analysis has been described in many references; see, e.g., National Research Council, The Evaluation of Forensic DNA Evidence (Pollard et al., eds., National Academy Press, Washington, DC, 1996). Forensic analysis of DNA is based on determination of the presence or absence of alleles of polymorphic regions within a genomic sample. The more polymorphisms that are analyzed, the higher the probability of identifying the correct individual from which the sample was isolated.

In an embodiment of the invention, when a biological sample, such as blood or sperm, is found at a crime scene, DNA can be isolated and RCGs can be prepared. This RCG can then be screened with a panel of SNPs to generate a genomic pattern. The genomic pattern can be matched with a genomic pattern produced from a suspect or compared to a database of genomic patterns which has been compiled. Preferably, the SNPs used in the analysis are those for which the frequency of the polymorphic variation (allelic frequency) has been determined, such that a statistical analysis can be used to determine the probability that the sample genome matches the suspect's genome or a genome within the database. The probability that two individuals have the same polymorphic or allelic form at a given genetic site is described in detail in PCT published patent application WO98/18967, the entire contents of which are hereby incorporated by reference. Briefly, this probability, defined as P(ID), can be determined by the equation:

$$P(\mathrm{ID}) = (x^2)^2 + (2xy)^2 + (y^2)^2$$

In this equation, x and y are the frequencies with which allele A and allele B, respectively, occur in a haploid genome.

The calculation can be extended to more polymorphic forms at a given locus. The predictability increases with the number of polymorphic forms tested. For a locus with n alleles, a binomial expansion is used to calculate P(ID). The probabilities for each locus can be multiplied to provide the cumulative probability of identity, and from this the cumulative probability of non-identity for a particular number of loci can be calculated. This value indicates the likelihood that two random individuals share the same genotypes at all of the loci examined. The same type of quantitative analysis can be used to determine whether a subject is a parent of a particular child. This type of information is useful in paternity testing, animal breeding studies, and identification of babies or children whose identity has been confused, e.g., through adoption or inadequate record keeping in a hospital, or through separation of families by occurrences such as earthquake or war.
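The P(ID) formula and its extension can be sketched as follows. As assumptions for illustration, genotype frequencies are taken to follow Hardy-Weinberg proportions (p_i² for homozygotes, 2p_ip_j for heterozygotes), which reproduces the biallelic equation above, and loci are treated as independent so their P(ID) values multiply. Function names and the allele frequencies are hypothetical.

```python
import math
from itertools import combinations


def p_id(freqs):
    """Probability that two unrelated individuals share a genotype at one locus.

    `freqs` are allele frequencies at the locus (summing to 1). P(ID) is the
    sum of squared genotype frequencies under Hardy-Weinberg proportions.
    """
    if not math.isclose(sum(freqs), 1.0):
        raise ValueError("allele frequencies must sum to 1")
    homo = sum((p * p) ** 2 for p in freqs)
    hetero = sum((2 * p * q) ** 2 for p, q in combinations(freqs, 2))
    return homo + hetero


def cumulative_p_id(loci):
    """Product of per-locus P(ID) values, assuming independent loci."""
    return math.prod(p_id(freqs) for freqs in loci)


print(p_id([0.5, 0.5]))  # 0.375
ten_loci = [[0.5, 0.5]] * 10
p = cumulative_p_id(ten_loci)
print(f"{p:.2e}")        # 5.50e-05
print(1 - p)             # cumulative probability of non-identity
```

For x = y = 0.5 the formula gives 0.0625 + 0.25 + 0.0625 = 0.375, so ten such loci already reduce the chance of a coincidental match to about 1 in 18,000.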

The genomic pattern may be used to generate a genomic classification code (GNC). The GNC may be represented by one or more data signals and stored as part of a data structure on a computer-readable medium, for example, a database. The stored GNCs may be used to characterize, classify, or identify the subjects for which the GNCs were generated. Each GNC may be generated by representing the presence or absence of each polymorphism with a computer-readable signal. These signals may then be encoded, for example, by performing a function on the signals.
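One simple "function on the signals" is to pack the presence/absence calls into bits, in a fixed SNP panel order, and write the result in hexadecimal. This encoding is an assumption for illustration; the text leaves the function open. The panel order must be identical for every subject, and the number of calls must be known to decode.

```python
def encode_gnc(calls):
    """Pack presence (1/True) and absence (0/False) calls into a hex code."""
    if not calls:
        raise ValueError("need at least one SNP call")
    value = 0
    for bit in calls:
        value = (value << 1) | (1 if bit else 0)
    width = (len(calls) + 3) // 4  # hex digits, keeping leading zeros
    return format(value, f"0{width}X")


def decode_gnc(code, n_calls):
    """Recover the ordered presence/absence calls from a hex code."""
    value = int(code, 16)
    return [(value >> (n_calls - 1 - i)) & 1 for i in range(n_calls)]


calls = [1, 0, 1, 1, 0, 0, 1, 0]  # hypothetical 8-SNP panel
print(encode_gnc(calls))          # B2
```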

Accordingly, the GNCs may be used as part of a classification or identification system for subjects such as, for example, humans, plants, or animals. As discussed above, the more polymorphisms that are analyzed for a given genome of a subject, the higher the probability of generating a unique genomic pattern for the individual from which the sample was isolated, and consequently, the higher the probability that the GNC uniquely identifies an individual. In such a system, a data structure may include a plurality of entries, for example, data records or table entries, where each entry identifies an individual. Each entry may include the GNC generated for the individual as well as other information. The GNC or portions thereof may then be stored in an index data structure, for example, another table. A portion of a GNC may be indexed so that each GNC may be further classified by a portion of its genomic pattern as opposed to only the entire genomic pattern.
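The entries and index tables described above can be sketched with Python dictionaries. The record layout, the subject IDs, and the choice of a fixed-length prefix as "a portion of a GNC" are all hypothetical; other portions (e.g., a group of SNP positions) could be indexed in the same way.

```python
from collections import defaultdict


def build_indexes(records, prefix_len=4):
    """Index record IDs by full GNC and by the first `prefix_len` GNC characters."""
    by_gnc = defaultdict(list)
    by_prefix = defaultdict(list)
    for rec in records:
        by_gnc[rec["gnc"]].append(rec["id"])
        by_prefix[rec["gnc"][:prefix_len]].append(rec["id"])
    return dict(by_gnc), dict(by_prefix)


records = [  # hypothetical entries
    {"id": "S-001", "gnc": "B2F04A"},
    {"id": "S-002", "gnc": "B2F0C1"},
    {"id": "S-003", "gnc": "17AA03"},
]
by_gnc, by_prefix = build_indexes(records)
print(by_gnc.get("B2F04A", []))   # exact match: ['S-001']
print(by_prefix.get("B2F0", []))  # partial pattern: ['S-001', 'S-002']
```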

The data structures may then be searched to identify an individual who has committed a crime. For example, if a biological sample from the individual (such as blood) is recovered from the crime scene, the GNC of the individual may be generated by the methods described herein, and a database of records including GNCs searched until a match is found. Similarly, the GNCs may be used to classify individuals within a group, such as soldiers in the armed forces, cattle in a herd, or produce within a specific crop. For example, the armed forces may generate a database containing the GNC of each soldier, and the database could be used to identify a soldier if necessary. Likewise, a database could be generated in which records and indexes include the GNCs of individual animals within a herd of cattle, so that lost or stolen animals could later be identified and returned to the proper owner.

The code may optionally be converted into a bar code or other human- or machine-readable form. For example, each line of a bar code may indicate the presence of specific polymorphisms or groups of specific polymorphisms for a particular subject.

Additionally, it is useful to be able to identify the genus, species, or other taxonomic classification to which an organism belongs. The methods of the invention can accomplish this in a high throughput manner. Taxonomic identification is useful for determining the presence and identity of a pathogenic organism, such as a virus, bacterium, protozoan, or multicellular parasite, in a tissue sample. In most hospitals, bacteria and other pathogenic organisms are identified based on morphology, determination of nutritional requirements or fermentation patterns, determination of antibiotic resistance, comparison of isoenzyme patterns, or determination of sensitivity to bacteriophage strains. These types of methods generally require approximately 48 to 72 hours to identify the pathogenic organism. More recently, methods for identifying pathogenic organisms have focused on genotype analysis, for instance, using RFLPs. RFLP analysis has been performed using hybridization methods (such as Southern blots) and PCR assays.

The information generated according to the methods of the invention and in particular the GNCs, can be included in a data structure, for example, a database, on computer-readable medium, wherein the information is correlated with other information pertaining to the genomes or the subjects or types of subjects, from which the genomes are obtained.

FIG. 5 shows a computer system 100 for storing and manipulating genomic information. The computer system 100 includes a genomic database 102, which includes a plurality of records 104a-n storing information corresponding to a plurality of genomes. Each of the records 104a-n may store genetic information about a genome or about an RCG generated therefrom. The genomes for which information is stored in the genomic database 102 may be any kind of genomes from any type of subject. For example, the genomes may represent distinct genomes of individual members of a species, or of particular classes of individuals, e.g., members of the armed forces, prisoners, etc.

An example of the format of a record 200 in the genomic database 102 (i.e., one of the records 104a-n) is shown in FIG. 6A. As shown in FIG. 6A, the record 200 includes a genome identifier (Genome ID) 202 that identifies the genome corresponding to the record 200. If enough polymorphisms of the genome were analyzed to generate the genomic pattern (such that the probability that the GNC uniquely identifies the genome is high), or if a group to which the genome belongs has few enough members, then the GNC of the genome could serve as the Genome ID 202. The record 200 also may include genomic information fields 204a-n. The genomic information may be any information associated with the genome identified by the Genome ID 202 such as, for example, a GNC, a portion of a GNC, the presence or absence of a particular SNP, a genetic attribute (genotype), a physical attribute (phenotype), a name, a taxonomic identifier, a classification of the genome, a description of the individual from which the genome was taken, a disease of the individual, a mutation, a color, etc. Each information field 204a-n may be used as an entry in an index data structure that has a structure similar to record 200. For example, each entry of the index data structure may include an indexed information field as a first data element, and one or more Genome IDs 202 as additional elements, such that all elements that share a common attribute are stored in a common data structure. The format of the record 200 shown in FIG. 6A is merely an example of a format that may be used to represent genomes in the genomic database 102. The amount of information stored for each record 200, the number of records 200, and the number of fields indexed may vary.
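A record like record 200 and an index built from one of its information fields might look like this in Python. Representing the record as a dataclass with a dictionary of information fields is an assumption, as are all field names and values. Each index entry pairs an indexed field value with the Genome IDs that share it, mirroring the index structure described above.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class GenomeRecord:
    """One database record: a Genome ID plus named information fields."""
    genome_id: str
    info: dict = field(default_factory=dict)  # e.g. GNC, phenotype, name


def index_on(records, field_name):
    """Map each value of `field_name` to the Genome IDs that share it."""
    index = defaultdict(list)
    for rec in records:
        if field_name in rec.info:
            index[rec.info[field_name]].append(rec.genome_id)
    return dict(index)


records = [  # hypothetical records
    GenomeRecord("G1", {"gnc": "B2F04A", "phenotype": "affected"}),
    GenomeRecord("G2", {"gnc": "B2F0C1", "phenotype": "unaffected"}),
    GenomeRecord("G3", {"gnc": "17AA03", "phenotype": "affected"}),
]
print(index_on(records, "phenotype"))
```

Nested fields such as 206a-m could be modeled by letting an information field hold another dictionary, at the cost of making that field unusable as an index key until it is flattened.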

Further, each information field 204a-n may itself include one or more fields, and each of these fields may in turn include further fields, and so on. Referring to FIG. 6B, an embodiment of the information field 204a is shown. The information field 204a includes a plurality of fields 206a-m for storing further information about the information represented by information field 204a. Although the following description refers to the fields 206a-m of the information field 204a, the description applies equally to information fields 204b-n. For example, if information field 204a represented a GNC of the genome corresponding to the Genome ID 202, then each of the fields 206a-m could represent a portion of the GNC, a particular SNP of the genomic pattern from which the GNC was generated, a group of such SNPs, a description of the GNC, a description of one of the SNPs, etc.

The fields 206a-m of the information field 204a may store any kind of value that can be stored in a computer-readable medium such as, for example, a binary value, a hexadecimal value, a decimal integer value, or a floating-point value.
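The nesting of information field 204a and its fields 206a-m maps naturally onto a recursive structure. In the Python sketch below, the names `InfoField`, `find`, and `gnc_field` are hypothetical, and treating a GNC as an integer split into byte-sized portions is an assumption made only for illustration; the sketch also reads one stored integer back in hexadecimal and binary form.

```python
from dataclasses import dataclass, field
from typing import Optional, Union

# Values a field may hold: ints cover binary/hex/decimal views; floats and text are also allowed.
Value = Union[int, float, str]


@dataclass
class InfoField:
    """Hypothetical information field (e.g. 204a) holding a value and nested sub-fields (206a-m)."""
    name: str
    value: Optional[Value] = None
    subfields: list["InfoField"] = field(default_factory=list)

    def find(self, *path: str) -> "InfoField":
        """Follow a path of sub-field names; raise KeyError if a name is missing."""
        node = self
        for name in path:
            matches = [sub for sub in node.subfields if sub.name == name]
            if not matches:
                raise KeyError(name)
            node = matches[0]
        return node


def gnc_field(gnc: int, n_bytes: int) -> InfoField:
    """Store a GNC (assumed here to be an integer) plus its byte-sized portions as sub-fields."""
    portions = [
        InfoField(f"portion_{i}", (gnc >> (8 * (n_bytes - 1 - i))) & 0xFF)
        for i in range(n_bytes)
    ]
    portions.append(InfoField("description", "illustrative GNC"))
    return InfoField("GNC", gnc, portions)


gnc = gnc_field(0xA5F0, 2)
print(hex(gnc.find("portion_0").value))            # 0xa5
print(format(gnc.find("portion_1").value, "08b"))  # 11110000
```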

A user may perform a query on the genomic database 102 to search for genomic information of interest, for example, all genomes having a GNC that matches the GNC of a murder suspect. In another example, it may be known that a biological sample contains a particular sequence. That sequence can be compared with sequences in the database to identify information such as the individual from whom the sample was isolated, or whether the genetic sequence corresponds to a particular phenotypic trait. For example, the user may search the genomic database 102 for genetic matches that identify an individual, for genotypes that correlate with a particular phenotype, or for genotypes associated with various classes of individuals. Referring to FIG. 5, a user may provide user input 106, indicating the genomic information for which to search, to a query user interface 108. The user input 106 may, for example, indicate an SNP for which to search using a standard character-based notation. The query user interface 108 may, for example, provide a graphical user interface (GUI) that allows the user to select from a list of types of accessible genomic information using an input device such as a keyboard or a mouse.

The query user interface 108 generates a search query 110 based on the user input 106. A search engine 112 receives the search query 110 and generates a mask 114 based on the search query. Example formats of the mask 114, and ways in which the mask 114 may be used to determine whether the genomic information specified by the mask 114 matches genomic information of genomes in the genomic database 102, are described in more detail below with respect to FIG. 7. The search engine 112 determines whether the genomic information specified by the mask 114 matches genomic information of genomes stored in the genomic database 102. As a result of the search, the search engine 112 generates search results 116 indicating whether the genomic database 102 includes genomes having the genomic information specified by the mask 114. The search results 116 may also indicate which genomes in the genomic database 102 have the genomic information specified by the mask 114.
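How the query user interface 108 might turn character-based user input 106 into a search query 110 can be sketched as follows. The `locus:allele` notation, the `parse_query` name, and the dictionary form of the query are assumptions for illustration only; the document says just that a standard character-based notation may be used.

```python
import re

# Hypothetical input notation: comma-separated "locus:allele" terms, e.g. "12:B, 6:A".
TERM = re.compile(r"\s*(\d+)\s*:\s*([A-Za-z])\s*")


def parse_query(user_input: str) -> dict[int, str]:
    """Turn user input 106 into a search query 110 mapping locus number -> allele label."""
    query: dict[int, str] = {}
    for term in user_input.split(","):
        match = TERM.fullmatch(term)
        if match is None:
            raise ValueError(f"cannot parse {term!r}; expected 'locus:allele'")
        locus, allele = int(match.group(1)), match.group(2).upper()
        if query.get(locus, allele) != allele:
            raise ValueError(f"conflicting alleles requested for locus {locus}")
        query[locus] = allele
    return query


print(parse_query("12:B, 6:a"))  # {12: 'B', 6: 'A'}
```

Rejecting malformed or contradictory terms at this stage keeps the search engine 112 from ever receiving a query that no genome could satisfy by construction.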

If, for example, the user input 106 specified a sequence of a gene, a GNC, or an SNP, the search results 116 may indicate which genomes in the genomic database 102 include the specified sequence, GNC, or SNP. If the user input 106 specified particular genetic information concerning a genome (e.g., enough to identify an individual), the search results 116 may indicate which individual genome listed in the genomic database 102 matches the particular information, thus identifying the individual from whom the sample was taken. Similarly, if the user input 106 specified genetic sequences that are not adequate to identify the individual specifically, the search results 116 may still be adequate to identify a class of individuals that have genomes in the genomic database 102 that match the genetic sequence. For example, the search results may indicate that the genomic information of the genomes of all Caucasian males matches the specified genetic sequence.
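The class-level outcome described above can be illustrated with a small in-memory sketch in Python. It borrows allele calls for loci 1, 6, and 7 from the B1 5′ rev rows of the SNP table in this document and treats each inbred strain as one genome; that mapping, the dictionary layout, and the function names are assumptions for illustration, not the search engine 112 itself. A query that pins down only locus 1 returns a class of genomes, while adding locus 6 narrows it to one.

```python
# Allele calls copied from the B1 5' rev rows of the SNP table in this document:
# locus 1: A = 129, B = B6/DBA, C = Spre; locus 6: A = B6/129, B = DBA/Spre;
# locus 7: A = 129/DBA/Spre, B = B6. Each strain stands in for one genome here.
GENOTYPES: dict[str, dict[int, str]] = {
    "129": {1: "A", 6: "A", 7: "A"},
    "B6": {1: "B", 6: "A", 7: "B"},
    "DBA": {1: "B", 6: "B", 7: "A"},
    "Spre": {1: "C", 6: "B", 7: "A"},
}


def matching_genomes(query: dict[int, str]) -> list[str]:
    """Return every genome whose allele at each queried locus equals the requested allele."""
    return sorted(
        genome
        for genome, alleles in GENOTYPES.items()
        if all(alleles.get(locus) == allele for locus, allele in query.items())
    )


def describe(query: dict[int, str]) -> str:
    """Report a unique match, a class of matching genomes, or no match."""
    hits = matching_genomes(query)
    if not hits:
        return "no match"
    if len(hits) == 1:
        return f"unique match: {hits[0]}"
    return f"class of {len(hits)} genomes: {', '.join(hits)}"


print(describe({1: "B"}))          # class of 2 genomes: B6, DBA
print(describe({1: "B", 6: "B"}))  # unique match: DBA
```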

FIG. 7 illustrates a process 300 that may be used by the search engine 112 to generate the search results 116. The search engine 112 receives the search query 110 from the query user interface 108 (step 302). The search engine 112 generates the mask 114 based on the search query 110 (step 304). The search engine 112 performs a binary operation on one or more of the records 104a-n in the genomic database 102 using the mask 114 (step 306). The search engine 112 then generates the search results 116 based on the results of the binary operation performed in step 306 (step 308).
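Steps 302 to 308 can be sketched with an assumed mask format; the actual mask formats of FIG. 7 are not reproduced in this section. Here each record is packed into an integer with one bit per biallelic locus (1 for allele B), and the mask 114 is a pair of care bits and expected bits. The allele calls come from loci 6, 7, and 12 of the B1 5′ rev rows of the SNP table in this document, with each strain treated as one genome; the names and bit layout are hypothetical.

```python
# Assumed bit layout (not taken from FIG. 7): one bit per biallelic locus, 1 = allele B.
LOCUS_BIT = {6: 0, 7: 1, 12: 2}

# Records 104a-n packed as integers, using loci 6, 7 and 12 of the B1 5' rev SNP table.
RECORDS = {"129": 0b000, "B6": 0b110, "DBA": 0b101, "Spre": 0b001}


def generate_mask(query: dict[int, str]) -> tuple[int, int]:
    """Step 304: build a mask 114 as (care bits, expected bits) from a search query 110."""
    care = expected = 0
    for locus, allele in query.items():
        if allele not in ("A", "B"):
            raise ValueError(f"locus {locus} is biallelic; got allele {allele!r}")
        bit = 1 << LOCUS_BIT[locus]
        care |= bit
        if allele == "B":
            expected |= bit
    return care, expected


def search(query: dict[int, str]) -> list[str]:
    """Steps 302-308: receive a query, build the mask, apply it to each record, collect hits."""
    care, expected = generate_mask(query)  # step 304
    # Step 306: XOR exposes differing bits; AND with the care bits ignores unqueried loci.
    hits = [gid for gid, bits in RECORDS.items() if ((bits ^ expected) & care) == 0]
    return hits  # step 308: search results 116


print(search({12: "B"}))          # ['B6', 'DBA']
print(search({6: "B", 12: "A"}))  # ['Spre']
```

The XOR-then-AND test checks every queried locus of a record in a single word-sized operation, which is one reason a binary mask suits this kind of search.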

A computer system for implementing the system 100 of FIG. 5 as a computer program typically includes a main unit connected to both an output device, which displays information to a user, and an input device, which receives input from a user. The main unit generally includes a processor connected to a memory system via an interconnection mechanism. The input device and output device also are connected to the processor and memory system via the interconnection mechanism.

One or more output devices may be connected to the computer system. Example output devices include a cathode ray tube (CRT) display, liquid crystal displays (LCD), printers, communication devices such as a modem, and audio output. One or more input devices may be connected to the computer system. Example input devices include a keyboard, keypad, track ball, mouse, pen and tablet communication device, and data input devices such as sensors. The invention is not limited to the particular input or output devices used in combination with the computer system or to those described herein.

The computer system may be a general-purpose computer system that is programmable using a computer programming language such as, for example, C++, Java, or another language such as a scripting language or assembly language. The computer system may also include specially programmed, special-purpose hardware such as, for example, an application-specific integrated circuit (ASIC). In a general-purpose computer system, the processor is typically a commercially available processor; examples include the x86-series, Celeron, and Pentium processors available from Intel, similar devices from AMD and Cyrix, the 680X0-series microprocessors available from Motorola, the PowerPC microprocessor from IBM, and the Alpha-series processors from Digital Equipment Corporation.

Many other processors are available. Such a microprocessor executes a program called an operating system (examples include Windows NT, Linux, UNIX, DOS, VMS, and OS8), which controls the execution of other computer programs and provides scheduling, debugging, input/output control, accounting, compilation, storage assignment, data management, memory management, communication control, and related services. The processor and operating system define a computer platform for which application programs in high-level programming languages are written.

A memory system typically includes a computer-readable and writeable nonvolatile recording medium, of which a magnetic disk, a flash memory, and tape are examples. The disk may be removable, such as a floppy disk or a read/write CD, or permanent, known as a hard drive. A disk has a number of tracks in which signals are stored, typically in binary form, i.e., a form interpreted as a sequence of ones and zeros. Such signals may define an application program to be executed by the microprocessor, or information stored on the disk to be processed by the application program. Typically, in operation, the processor causes data to be read from the nonvolatile recording medium into an integrated circuit memory element, which is typically a volatile, random access memory such as a dynamic random access memory (DRAM) or static random access memory (SRAM). The integrated circuit memory element allows for faster access to the information by the processor than does the disk. The processor generally manipulates the data within the integrated circuit memory and then copies the data to the disk after processing is completed. A variety of mechanisms are known for managing data movement between the disk and the integrated circuit memory element, and the invention is not limited to any particular mechanism. It should also be understood that the invention is not limited to a particular memory system.

The invention is not limited to a particular computer platform, particular processor, or particular high-level programming language. Additionally, the computer system may be a multiprocessor computer system or may include multiple computers connected over a computer network. It should be understood that each module (e.g., 108, 112) in FIG. 5 may be a separate module of a computer program or may be a separate computer program. Such modules may be operable on separate computers. Data (e.g., 102, 106, 110, 114, and 116) may be stored in a memory system or transmitted between computer systems. The invention is not limited to any particular implementation using software, hardware, firmware, or any combination thereof. The various elements of the system, either individually or in combination, may be implemented as a computer program product tangibly embodied in a machine-readable storage device for execution by a computer processor. Various steps of the process, for example, steps 302, 304, 306, and 308 of FIG. 7, may be performed by a computer processor executing a program tangibly embodied on a computer-readable medium to perform functions by operating on input and generating output. Computer programming languages suitable for implementing such a system include procedural programming languages, object-oriented programming languages, and combinations of the two.

The invention also encompasses compositions. One composition of the invention is a plurality of RCGs immobilized on a surface, where the plurality of RCGs are prepared by DOP-PCR. Another composition is a panel of SNP-ASOs immobilized on a surface, wherein the SNPs are identified by using RCGs as described above.

The invention also includes kits having a container housing a set of PCR primers for reducing the complexity of a genome and a container housing a set of SNP-ASOs, particularly wherein the SNPs are present with a frequency of at least 50% or 55% in an RCG made using the primer set. In some kits, the set of PCR primers are primers for DOP-PCR, and preferably the DOP-PCR primer has the tag-(N)x-TARGET structure described herein, i.e., wherein the TARGET includes at least 7 arbitrarily selected nucleotide residues, x is an integer from 3 to 9, each N is any nucleotide residue, and tag is a polynucleotide as described above. In some embodiments the SNPs in the kit are attached to a surface such as a slide.
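The primer constraints stated above (x from 3 to 9, a TARGET of at least 7 residues) can be checked mechanically. This Python sketch assembles a primer string using N for each degenerate position; the tag and TARGET values in the example are hypothetical placeholders, since the tag is defined elsewhere in the document, and restricting the tag and TARGET to A, C, G, and T is an assumption of the sketch.

```python
ACGT = set("ACGT")


def build_dop_primer(tag: str, x: int, target: str) -> str:
    """Assemble a degenerate primer with the tag-(N)x-TARGET layout, checking the stated limits."""
    if not tag:
        raise ValueError("tag must be a non-empty polynucleotide")
    if not isinstance(x, int) or not 3 <= x <= 9:
        raise ValueError("x must be an integer from 3 to 9")
    if len(target) < 7:
        raise ValueError("TARGET must contain at least 7 nucleotide residues")
    for part in (tag, target):
        if not set(part.upper()) <= ACGT:
            raise ValueError(f"non-ACGT residue in {part!r}")
    return tag.upper() + "N" * x + target.upper()


def pool_size(x: int) -> int:
    """Number of distinct primer sequences: each N position can be any of the 4 bases."""
    return 4 ** x


primer = build_dop_primer("CCGACTCGAG", 6, "ATGTGGC")  # hypothetical tag and TARGET
print(primer, pool_size(6))  # CCGACTCGAGNNNNNNATGTGGC 4096
```

Each added N position multiplies the number of distinct sequences in the primer pool by four, so x directly controls how degenerate the primer mixture is.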

SNPs identified according to the methods of the invention using the B1 5′ rev primer include the following:

B1 5′ rev  ATTAAAGGCGTGCGCCACCATGCC  (SEQ ID #13)

Locus  ASO                    Allele  Strain                  SEQ ID #
1      tttatgAaggCataaaaa     A       129/                    14
       tttatgGaggCataaaaa     B       B6-DBA                  15
       tttatgAaggTataaaaa     C       Spre                    16
2      ctgggctgTattcattt      A       129-DBA                 17
       ctgggctgCattcattt      B       B6                      18
       tctGcctccTGagtgct      C       B6-129-DBA              19
       tctAcctccCAagtgct      D       Spre                    20
3      tagctagaAtcaagctt      A       B6                      21
       tagctagaGtcaagctt      B       DBA-Spre                22
4      gctgtgcAACaaatcac      A       129/                    23
       cagctgtgc---aaatcacc   B       B6                      24
5      tttcgtga-tgtttctat     A       129-Spre                25
       tttcgtgaAtgtttcta      B       B6-DBA                  26
6      cactgtctAcatcttta      A       B6-129                  27
       cactgtctCcatcttta      B       DBA-Spre                28
7      taacattcTtgaagcca      A       129-DBA-Spre            29
       taacattcCtgaagcca      B       B6                      30
8      gcttccaTttcctaagg      A       129-DBA                 31
       gcttccaCttcctaagg      B       B6                      32
9      aggaatgGcAataatcc      A       B6-129                  33
       aggaatgGcGataatcc      B       DBA                     34
       aggaatgAcAataatcc      C       Spre                    35
       ttaaattcGtaaatgga      D       B6-129-DBA              36
       ttaaattcAtaaatgga      E       Spre                    37
10     taacattcTtgaagcca      A       129-DBA-Spre            38
       taacattcCtgaagcca      B       B6                      39
11     ttcTGtgActccaCttg      A       129                     40
       ttcTGtgActccaTttg      B       B6-DBA                  41
       ttcCCtgTctccaTttg      C       Spre                    42
12     gtagtttgCcaggaacc      A       129-Spre                43
       gtagtttgTcaggaacc      B       B6-DBA                  44
13     tgctactcctctctactcg    A       129                     45
       tgctattcctctctgctcg    B       B6-DBA-Spre             46
       cttgatcaccctctgatga    C       B6-129-DBA              47
       cttggtcaccctctaatga    D       Spre                    48
14     gaggtggtgcagagtga      A       129-DBA                 49
       gaggtggcgcagagtga      B       B6                      50
       gaggtggcccagagtga      C       Spre                    51
15     cccactgaaccgcacag      A       129-DBA                 52
       cccactgagctgcacag      B       B6                      53
       cccactcagccgcacag      C       Spre                    54
16     tgaagacacagccagcc      A       129-DBA                 55
       tgaagacgcagccagcc      B       B6                      56
       tgaagacgaagccagcc      C       Spre                    57
17     agaagttggtaccaggg      A       129/FVB/F1/cast/spre    58
       agaagttgttaccaggg      B       B6                      59
18     tatgattacgtaatgtt      A       129/B6/F1               60
       tatgattatgtaatgtt      B       FVB/F1                  61
19     atgattccagtgagtta      A       129/B6                  62
       atgattcctgtgagtta      B       FVB/F1                  63
       catactattaacactggaa    C       Cast-129                64
       catattattaacacaggaa    D       Spre                    65
20     gtcaagaacaggcaata      A       129/b6/f1/FVB           66
       gtcaagaataggcaata      B       f1                      67
       cagactagggaaccttc      C       129                     68
       cagacgagggaaccttc      E       Spre                    69
       cagactagggagccttc      D       Cast                    70
21     tgtccagttgtttgcat      A       129/                    71
       tgtccagtcgtttgcat      B       b6/fvb/f1               72
       ggggtagccagtttggt      C       Cast-129                73
       ggggtagcaagtttggt      D       Spre                    74
22     caggaagctgtagctcc      A       129/f1                  75
       caggaagccgtagctcc      B       b6/fvb                  76
       cctgagcctgtctacct      C       Cast-129                77
       cctgagcccgtctacct      D       Spre                    78
23     taacattcttgaagcca      A       129/FVB/F1/cast/spre    79
       taacattcctgaagcca      B       B6                      80
24     ccaactgaaccgcacag      A       129/FVB                 81
       ccaactgagctgcacag      B       B6                      82
       gagctagctcacacattct    C       Cast-129                83
       gagttagctcacacgttct    D       Spre                    84
25     acgggggggtggcgtta      A       129/f1                  85
       acgggggg-tggcgttaa     B       b6/fvb/cast/spre        86
       tagacagccagcgcgtcac    C       Cast-129                87
       tagatagccagcgcatcac    D       Spre                    88
26     gcttttcttgagagtggc     A       129/b6                  89
       gcttttctttagagtggc     B       fvb                     90
       gcttttcgtgagagtggc     C       f1                      91
27     ctacagataaagttata      A       129/b6/fvb/f1           92
       ctacagatgaagttata      B       f1                      93
       tagacctgctgctatct      C       Cast-129                94
       tagacctgttgctatct      D       Spre                    95
28     tgttgttctggcctcca      A       129/F1                  96
       tgttgttttggcctcca      B       B6                      97
       ttctgagaatttgttag      C       129/B6                  98
       ttctgagagtttgttag      D       F1/spre                 99
29     caggaagcagtagctcc      A       129                     100
       caggaagccgtagctcc      B       B6/FVB/F1               101
       agagtcaggtaagttgc      C       Cast-129                102
       agagtcagataagttgc      D       Spre                    103
30     agatttcaaaaagtttt      A       129/b6                  104
       agattccaaaaggtttt      B       f1                      105
       agatttcaaaaagtttt      C       fvb                     106
       cctgaggggagcaatca      D       Cast-129                107
       cctgagggaagcaatca      E       Spre                    108
31     aaggtaagataactaag      A       129.f1                  109
       aaggtaaggtaactaag      B       b6/fvbn                 110
       ggactacacagagaaac      C       Cast-129                111
       ggactacatagagaaac      D       Spre                    112
32     cccaggctacacgaggg      A       129/fvb/f1              113
       cccaggctacatgaggg      B       b6                      114
       cttaccagttgtgagac      C       129                     115
       cttaccacttgtgagac      D       Spre                    116
       cttaccagtcgtgagac      E       Cast                    117
33     ctgccctcaggtcttta      A       129                     118
       ctgccctccggtcttta      B       b6/fvbn                 119
       gcaataaaattgtttta      C       Cast-129                120
       gcaatgagatcgtttta      D       Spre                    121
34     tgttctgtggagacccc      A       129/fvbn/f1/cast/spre   122
       tgttctgtagagacccc      B       b6                      123
35     cacattgaatcaaagcc      A       129/b6/fvbn/f1          124
       cacattgagtcaaagcc      B       f1                      125
       ggactacccacccgttc      C       129                     126
       gcgactgc--acccattct    E       Spre                    127
       gcgactgccccc--attct    D       Cast                    128
36     cctgggccagccaggaa      A       129/b6/cast             129
       cctgggcctgccaggaa      B       fvbn/f1/spre            130
37     ccccaggtaaccatctt      A       129/f1                  131
       ccccaggtgaccatctt      B       b6/fvbn/cast/spre       132
       ttctgtatattagctga      C       Cast-129                133
       tttctatattaa--ctgac    D       Spre                    134
38     ggacccggacggtcttc      A       129/b6                  135
       ggacccggtcggtcttc      B       bvb/f1                  136
       gtccctaatgttagcat      C       Cast-129                137
       gtccccaatgtcagcat      D       Spre                    138
39     acgggggggtggcgtta      A       129/f1                  139
       acgggggg-tggcgttaa     B       b6/fvbn/cast/spre       140
       tagacagccagcgcgtcac    C       Cast                    141
       tagatagccagcgcatcac    D       Spre                    142
40     gattcttcgtgttcctt      A       129-b6-F1               143
       gattcttcatgttcctt      B       FVBN-Cast-Spre          144
41     tgtaaaaacttagaata      A       129/b6/f1               145
       tgtaaaaatttagaata      B       fvbn/cast/spre          146
42     tgtgaaagcgctcccaa      A       129/fvbn/f1/cast/spre   147
       tgtgaaagtgctcccaa      B       b6                      148
43     caaaggctcagagaatc      A       129/b6/f1               149
       caaaggcttagagaatc      B       fvbn                    150
       ttaattctctccaaaca      C       129/b6/fvb/f1           151
       ttaaggctctccggaca      D       f1                      152
44     ctgccaccgtgcacaca      A       129/b6                  153
       ctgccaccatgcacaca      B       fvbn/f1                 154
       ccaaatattctgattcc      C       129-Spre                155
       ccaaatattcttttttt      D       Cast                    156
45     atgagctgaccctccct      A       129/B6/F1               157
       atgagctgcccctccct      B       FVB                     158
       acactaggtaaaagctc      C       129/B6/FVB/F1           159
       acactaggcaaaagctc      D       F1                      160
       agacaccacgaccgagg      E       129-Spre                161
       agacaccaagaccgagg      F       Cast                    162
46     gcagcgtccggttaagt      A       129/f1                  163
       gcagcgtctggttaagt      B       b6/fvbn/f1              164
       cagatactacaaggatg      C       129                     165
       tacagatac---aaggatgc   D       SPRE/Cast               166
47     tcagctagtgtatctgt      A       129/FVB/F1              167
       tcacctagtgtatttgt      B       B6/F1                   168
       ttttttatttttggatt      C       129-Cast                169
       tttt-aatttttggattt     D       Spre                    170
48     gatattgttttcattta      A       129/                    171
       gatattgtcttcattta      B       b6/fvbn/f1              172
49     agacccggtgctggtgt      A       129/b6                  173
       agacccggcgctggtgt      B       fvbn/f1/cast            174
50     cttctaagctttgtctt      A       129/fvb/f1/cast/spre    175
       cttctaagttttgtctt      B       b6/f1                   176
51     agttggcaaccagcatg      A       129/                    177
       agttggcatccagcatg      B       b6/fvbn/f1              178
       ggtgaaatggtaattac      C       129-Cast                179
       ggtgaaatagtaattac      D       Spre                    180
52     acgggatataacgagtt      A       129/FVB/F1              181
       acgggatacaacgagtt      B       B6/cast/spre            182
       gggatacaacgagtttc      C       129-Cast                183
       gggatacaccgagtttc      D       Spre                    184
53     gtatcttgggtgtcctg      A       129/FVB/F1              185
       gtaacttgggtgttctg      B       B6/F1/spre              186
       gggtgtcctgccccatc      C       129                     187
       gggtgttctgttttatc      D       Spre                    188
54     tgtccagttgttttgca      A       129                     189
       tgtccagtcgttttgca      B       B6/FVB/F1/spre          190
       aagacagccggaactct      C       129...                  191
       aagacagcaggaactct      D       Spre                    192
55     tgataggaccaaagaga      A       129/b6/f1               193
       cgataggactaaagaga      B       fvbn/f1                 194
       tccaaagccagggccca      C       129                     195
       tccaaattcagggccca      D       Spre                    196
56     cctgggccagccagaag      A       129/B6/cast             197
       cctgggcctgccagaag      B       FVB/F1/spre             198
57     gattctctgagcctttg      A       129/b6/f1               199
       gattctctaagcctttg      B       fvbn                    200
       taccattttttagatga      C       129...                  201
       taccatttcttagatga      D       Spre                    202
       ctggaagggcagtgaat      A       129                     203
       tctgga-cgagggtgaat     B       B6/FVB                  204
59     tagttgcagcacaaatg      A       129/B6                  205
       tagttgtagcacaaatg      B       FVB/F1                  206
60     acactaccgcacagagc      A       129/b6/fvbn/f1          207
       acactaccacacagagc      B       f1                      208
       aataataagtaaataag      C       129/                    209
       aataataaataaataag      D       cast                    210
61     tggcagtagttgttcat      A       129/b6                  211
       tggcagtaattgttcat      B       fvbn/f1                 212
       aggtatgacgtcataag      C       129-cast                213
       aggtatgatgtcataag      D       Spre                    214
62     gttgttgttgaagattt      A       129/fvbn/f1             215
       ttgttgttg---aagattta   B       b6/f1                   216
       gatagtacaggtgttgtca    C       129...                  217
       gatggtacaggtgtcgtca    D       Spre                    218
63     aatataatgtaacagga      A       129/F1                  219
       aatataatataacagga      B       B6/FVB/F1               220
64     ttaaccatttatctgat      A       129/FVB                 221
       ttaaccatatatctgat      B       B6/F1                   222
65     agagcccagcaaagttc      A       129/B6                  223
       agagcccaacaaagttc      B       FVB/F1                  224
       atcccgaaccggggaaaat    C       129-b6                  225
       atcccaaaccgggggaaat    D       cast-spre               226
66     atgacaccaccacaacc      A       129                     227
       atgacaccgccacaacc      B       B6/FVB/F1               228
67     aggcaaacagatataac      A       129/FVB/F1              229
       aggcaaacggatataac      B       B6/cast/spre            230
       tgtattcactaataaga      C       129-Cast                231
       tgtattcattaataaga      D       Spre                    232
68     ttggcgtatacttcata      A       129/B6/F1               233
       ttggcgtacacttcata      B       FVB                     234
       ctcaccacgctccatct      C       129                     235
       ctcaccaccctccatct      D       Cast-Spre               236
69     atatctaaa----ggcacag   A       129/FVB                 237
       tatctacataaaggcac      B       B6/F1/cast/spre         238
       gtgtctcctagtctccc      C       B6-Cast                 239
       gtgtctcccagtctccc      D       Spre                    240
70     atgagctgaccctccct      A       129/B6/F1               241
       atgagctgcccctccct      B       FVB/F1                  242
       ggacaacatttaattgg      C       129-Cast                243
       ggacaacacttaattgg      D       Spre                    244
71     gctttaaaatttttatt      A       129                     245
       gctttaaattttttatt      B       B6/FVB/F1               246
       aaatttgttcctaaatg      C       129                     247
       aaatttgtacctaaatg      D       Cast-Spre               248
72     gtgttgttctggcctcc      A       129/FVB/spre            249
       gtgttgttttggcctcc      B       B6/F1                   250
73     tgaatgacaaaaagaca      A       129/B6/FVB              251
       tgaatgacgaaaagaca      B       F1/cast                 252

B2 5′Rev  ACTGAGCCATCTCWCCAG  (W = A + T)  (SEQ ID #253)

101    acttaacttaagctggc      A       129/                    254
       gtacttaa-----gctggcctg B       b6/fvb/f1               255
102    actctaatatcccacag      A       129/fvbn/f1             256
       actctaatctcccacag      B       b6                      257
       cggatcggctctagttc      C       129/cast                258
       cggatcagctctagttc      D       spre                    259
103    tcaaaccaataaggagg      A       129/b6/fvb/f1           260
       tcaaaccagtaaggagg      B       f1                      261
104    gtgtgtgtgtggggggg      A       129/f1                  262
       gtgtgtgtg---gggggggt   B       b6/fvbn                 263
       cttaataataatttcat      C       129/cast                264
       cttaataacaatttcat      D       spre                    265
105    gtgtctccatatgtgtg      A       129/b6/f1               266
       gtgtctacacatgtgtg      B       fvbn                    267
106    aactcatcatgatggtt      A       129/                    268
       aactcataatgatggtt      B       b6/fvbn/f1              269
       aactcatcacgatggtt      C       cast                    270
       atcactcatagcccaga      D       129/                    271
       atcacttatagcccaga      F       spre                    272
       atcactcatatcccaga      E       cast                    273
107    catcttaccagcattga      A       129/cast/spre           274
       catcttactagcattga      B       b6/fvbn/f1              275
108    agtcagccggctctggc      A       129/b6/f1               276
       agtcagccagctctggc      B       fvbn/f1                 277
       gggtaggagtggggatgag    C       129/                    278
       gggcaggagtgggggtgag    E       spre                    279
       gggtaggagtgggggtgag    D       cast                    280
109    tcagtattgttcttctc      A       129/f1/spre             281
       tcagtatttttcttctc      B       b6/fvbn/f1/cast         282
110    agcagagactgagctcg      A       129/                    283
       agcagagaccgagctcg      B       b6/fvbn/f1              284
       acaggggtcgattcgtc      C       129/b6/fvbn/f1/cast     285
       acagggatcgattcgtc      E       spre                    286
       acaggggtcgtttcgtc      D       f1                      287
111    tcccaaagcattcaagg      A       129/b6/f1               288
       tcccaaagtattcaagg      B       fvbn/f1                 289
       gaccagggttaatgact      C       129/b6                  290
       gaccagggctaatgact      D       cast/spre               291
112    ctattaacagagtcgag      A       129/b6/f1               292
       ctattaacggagtcgag      B       fvbn                    293
       gtgatactggatgtctg      C       129/b6                  294
       gtgataccg-atgtctgg     D       cast/spre               295
113    ctctctcgatagtctaa      A       129/f1                  296
       ctctctcgctagtctaa      B       b6/fvbn/f1/cast         297
       tctctcgatagtctaat      C       129/                    298
       tctctcgctggtctaat      D       cast                    299
114    agatgcaaaattcttag      A       129/                    300
       agatgcacagttcttag      B       b6/fvbn/f1              301
115    ggaaaatgctcaggtag      A       129/f1/cast/spre        302
       ggaaaatgttcaggtag      B       b6/fvbn                 303
116    tctgggcagagtgcagg      A       129/                    304
       tctgggcagcgtgcagg      B       b6/fvb/f1               305
117    tatggaacggttgcttc      A       129/fvb                 306
       tatggaactgttgcttc      B       b6/f1                   307
       aagcctggtacccgctg      C       129/cast                308
       aagcctggcacccgctg      D       spre                    309
118    cattcttctttttctga      A       129/                    310
       cattcttcgttttctga      B       b6/fvbn/f1/cast/spre    311
       ctgcaggcttgtctgtg      C       129/CAST                312
       ctgcaggtttgtctgtg      D       spre                    313
119    tgccatttcctataaca      A       129/f1                  314
       tgccatttgctataaca      B       b6/fvbn                 315
120    ccgccacacccgctcct      A       129/b6                  316
       ccgccacagccgctcct      B       fvbn/f1                 317
121    caaataatgctagttat      A       129/b6/f1               318
       caaataatgttagttat      B       fvbn                    319
122    ggatgttgacacgctac      A       129/fvbn/f1             320
       ggatgttgtcacgctac      B       b6/f1                   321
       catgtgtc-caacgccat     C       129/                    322
       catgtgtcacaacgcca      D       cast/spre               323
123    aaaggggccttaaagga      A       129/fvbn/f1             324
       aaaggggctttaaagga      B       b6                      325
       tgaaaagttcttttcat      C       129/cast                326
       tgaaaagtacttttcat      D       spre                    327
124    cctctctatgtgtgagc      A       129/b6/f1               328
       cctctctacgtgtgagc      B       fvbn                    329
       gaagttttaggagattct-t   C       129/                    330
       gaagatttaggagagtctc    D       spre                    331
125    agggatgtattttgtta      A       129/fvbn/f1             332
       agggatgtgttttgtta      B       b6                      333
       acaattcaaatgtatat      C       129/cast                334
       acaattcatatgtatat      D       spre                    335
126    cttgcctaacctgcaca      A       129/b6/f1               336
       cttgcctagcctgcaca      B       fvbn                    337
       caacagc---acctcatatc   C       129/bt/cast             338
       acagcggtgcctcgtat      D       spre                    339
127    actcacagtgtcagggc      A       129/fvbn/f1/spre        340
       actcacagcgtcagggc      B       b6/cast                 341
128    ggctgctcctgtgtgtctg    A       129/fvbn/f1/cast        342
       ggctcttcctgtgtgtctg    B       b6                      343
       ggctgctcctgtgtttctg    C       spre                    344
129    aagatgcccttctga        A       129/f1                  345
       aatagatgccctcttga      B       b6/fvbn                 346
       aatcgatgcccttctga      C       spre                    347
130    ttggtctagcaggtagc      A       129/fvbn/f1             348
       ttggtctaccaggtagc      B       b6                      349
       agccttggctcttaaaa      C       129/cast                350
       agccttggttcttaaaa      D       spre                    351
131    agtctctggcgcctttg      A       129/fvbn/f1/cast/spre   352
       agtctctgccgcctttg      B       b6                      353
132    tagcaggaggcacagctta    A       129/                    354
       aagcaggaggcacaactta    B       b6                      355
       aagcaggaggcacagctta    C       fvb/f1/CAST             356
       tagcaggaggcacagcttg    D       spre                    357
133    aggagagaccggactcc      A       129/fvb/f1              358
       aggagagagcggactcc      B       b6                      359
134    tacaagtcatccttcct      A       129/b6/f1               360
       tacaagtcgtccttcct      B       fvbn/f1                 361
       atacctccctcagacaa      C       129/cast                362
       atacctcc-tcagacaag     D       spre                    363
135    aaacaaacaaacaaacc      A       129/b6/f1/cast/spre     364
       aaacaaaccaacaaacc      B       fvbn                    365
       gtgcgccaccatgacca      C       129/cast                366
       gtgcgccatcatgacca      D       spre                    367
136    ggctttcccattagtgg      A       129/                    368
       ggctttcctattagtgg      B       b6/fvbn/f1              369
       ccctcacctctctctca      C       129/cast                370
       cctcacccctctctca       D       spre                    371
137    aatctctcgcgttcatt      A       129/fvbn/f1             372
       aatctctcacgttcatt      B       b6                      373
138    aatgataccgatcctta      A       129/f1                  374
       aatgatacagatcctta      B       b6/fvbn                 375
       ataaaactgcaattcgtg     C       129/b6                  376
       ataaaactacattcgtg      D       cast/spre               377

B1  AGTTCCAGGACAGCCAGG  (SEQ ID #378)
Musch

201    atatctccgactttgaa      A       129/cast                379
       atatctccaactttgaa      B       b6/fvb/f1/spre          380
       tggccctgcagagtctg      C       129-Cast                381
       tggctctgcagag-ctgg     D       Spre                    382
202    caatggatc---aaagatgc   A       129-FVB-F1              383
       atggatcaacaaagatg      B       B6                      384
       gctgcctc---aaggtataa   C       129/b6                  385
       ctgcctcttaaggtata      D       cast/spre               386
203    acctatggctcctcatc      A       129/b6/f1               387
       acctatggttcctcatc      B       fvb                     388
       tcttctcccctgcttta      C       129-Cast                389
       tcttctcac-tgctttag     D       Spre                    390
204    ccgc-ataaaaagctgag     A       FVB-F1                  391
       ccgccataaaa-gctgag     B       B6-F1                   392
       agaatatagggtttttt      C       129/cast                393
       agaatacag--ttttttt     D       spre                    394
205    agagttgctgtgcaggg      A       129/b6/f1               395
       agagttgccgtgcaggg      B       fvb/cast                396
       agagttgcagtgcaggg      C       spre                    397
206    taagcagtgttcttggc      A       129-B6-F1               398
       taagcagtattcttggc      B       FVBN                    399
       ttctcccctgcttta        C       129/Cast                400
       tcttctcac-tgctttag     D       spre                    401
207    tttttttattattga        A       129/fvb/f1              402
       ttttttt-attattgaa      B       b6                      403
       tgtggtacgcacatctg      C       129-Cast                404
       tgtggtacacacatctg      D       Spre                    405
208    agactcttagacttctg      A       129/f1                  406
       agactcttaggcttctg      B       b6/fvb/f1               407
       agactcataagcttctg      C       spre                    408
       agactcttaggcttctg      D       cast                    419
209    cacgtacccgaacgtga      A       129-B6                  410
       cacgtacctgaacgtga      B       FVB-F1                  411
       attacggtttgtcgtca      C       129/CAST                412
       attacggttggtcgtca      D       spre                    413
210    ccaagatacgaaaccag      A       129/f1/cast/spre        414
       ccaagatatgaaaccag      B       b6                      415
211    tgcaatgaccagcaacc      A       29/b6                   416
       tgcaacgaccagcaacc      B       fvb/f1/cast             417
       tgtaacgaccaacaact      C       spre                    418
212    tctaaagggaaagatgg      A       129-FVB                 419
       tctaaagg-aaagatgga     B       B6-F1                   420
213    ctggactcatacataca      A       129-FVB-F1              421
       ctggactcgtacataca      B       B6-F1-Cast/SPRE         422
       agtttggtcccctggac      C       129/FVB/B6-F1-Cast      423
       agtttggtttcctggac      D       Spre                    424
214    tatagcttcatgtaaaa      A       129/fvb/f1/cast/spre    425
       tatagctttatgtaaaa      B       b6                      426
215    tttttt-attattgaa       A       129                     427
       tttttttttattattga      B       B6-FVB-F1               428
       actcattgccaatttaa      C       129                     429
       actcattcagaatttaa      D       spre/CAST               430
216    atgcgtaatgggggcta      A       129                     431
       atgcgtaacgggggcta      B       b6/fvb/f1/cast/SPRE     432
       attaattgctcttttaaa     C       129/b6/fvb/f1/cast      433
       gtaattgctcttttaaa      D       spre                    434
217    tctgattagtgatggat      A       129-F1                  435
       tctgatta-tgatggatt     B       B6                      436
       agcagagtgtctcgtaa      C       129                     437
       agcagagtatctcgtaa      D       spre/CAST               438
218    gctggcagatatcggta      A       129/b6/f1               439
       gctggcaggtatcggta      B       fvb/cast                440
219    aactgcaatgaccagca      A       129-B6                  441
       aactgcaacgaccagca      B       FVB-F1                  442
       gctggtcattgcagttt      C       129                     443
       gttggtcgttacagttt      D       spre                    444
       gctggtcgttgcagttt      F       cast                    445
220    gctggcagatatcggta      A       129-B6-F1               446
       gctggcaggtatcggta      B       FVB                     447
       atagaaagtccaccgtc      C       129/cast                448
       atagaaagcccaccgtc      D       spre                    449
221    ttagtgaccgtgtaaac      A       129/b6/f1               450
       ttagtgactgtgtaaac      B       fvb                     451
       ggggaggagctttgttc      C       129-Cast                452
       ggggaggatctttgttc      D       Spre                    453
222    ggcctggacacaaaagc      A       129/fvb/f1              454
       ggcctggaaacaaaagc      B       b6                      455
       cccttttctagtattgt      C       29                      456
       cccttttccagtattgt      D       Cast-Spre               457
223    gaattggttttaggaat      A       129-F1-Cast-Spre        458
       gaattggtattaggaat      B       B6                      459
224    acccagctttccatggt      A       129/f1                  460
       acccagctctccatggt      B       b6/fvb/CAST             461
225    tcacgttcgggtacgtg      A       129/b6/f1               462
       tcacgttcaggtacgtg      B       fvb/f1                  463
       tgccttccggttggcaa      C       129-Cast                464
       tgccttccagttggcaa      D       Spre                    465
226    ttttatcatacaattgc      A       129-F1                  466
       ttttatcagacaattgc      B       B6-FVB-F1               467
227    atcttctcttctttgag      A       129/f1                  468
       atcttctcctctttgag      B       b6/fvb                  469
       cagtcctctgctttctc      C       129-Cast                470
       cagtcctcagctttctc      D       Spre                    471
228    ccaagatacgaaaccag      A       129/f1/spre             472
       ccaagatatgaaaccag      B       b6                      473
229    ggtattcaagggttact      A       129/cast/spre           474
       ggtattca-gggttactg     B       b6/fvb 1bp del          475
230    acctatggctcctcatc      A       129/b6/f1/cast          476
       acctatggttcctcatc      B       fvb                     477
231    ttttatcatacaattgc      A       129/f1                  478
       ttttatcagacaattgc      B       b6/fvb                  479
232    aaccagggcttaagtct      A       129                     480
       aaccagggattaagtct      B       b6/fvb/f1               481
       cagaaaaacagatatac      C       129-B6-FVB-F1           482
       cagaaaaagagatatac      D       Spre                    483
234    tctgagcgtgagtgctg      A       129/fvb                 484
       tctgagcgcgagtgctg      B       b6/f1/cast/spre         485
       acctcagaagcggaggt      C       129-B6-FVB-F1           486
       acctcggaaggggaggt      D       Spre                    487
       acctcggaagcggaggt      E       Cast                    488
235    taactcgatcgctatca      A       129-B6-F1               489
       taactcgcttgctatca      B       FVBN-Cast               490
       taactcgctcgctatca      C       Spre                    491
236    gaatttctcaacttctt      A       129/fvb/f1/spre         492
       gaatttctgaacttctt      B       b6/f1                   493
237    caggggtccccaatttg      A       129/f1/SPRE             494
       caggggtctccaatttg      B       b6/fvb                  495
238    ttttgctgtgc-aggcta     A       129-B6-F1               496
       ttttactgtgccaggct      B       FVB                     497
       gacagccctgtctcaaa      C       129/cast                498
       agagaaaccctgtctca      D       spre                    499
239    gcaccggtctgagcagt      A       129/f1                  500
       gcaccggtttgagcagt      B       b6/fvb/f1               501
       ccgtgcccctgaacaat      C       129-B6-FVB-F1-Cast      502
       ccgtgcccttgaacaat      D       Spre                    503
240    tcacgttcgggtacgtg      A       129/b6/f1               504
       tcacgttcaggtacgtg      B       fvb/f1                  505
       tgattcgctgggactct      C       129-Cast                506
       tgattcgccgggactct      D       Spre                    507
241    ttgatatccgaggcctt      A       129/b6/fvb/f1           508
       ttgatatctgaggcctt      B       f1/CAST/SPRE            509
242    tccctgggccaagcata      A       129/b6/fvb              510
       tccctgggtcaagcata      B       f1                      511
243    ttatggctgaggatcac      A       129-B6-F1-Cast          512
       ttatggctgcggatcat      B       FVB                     513
       ttatggcaggggatcac      C       Spre                    514
244    ctctctgcgctgaagca      A       129/b6                  515
       ctctctgctctgaagca      B       fvb/f1                  516
       agatacagagatgtgtt      C       129-B6-FVB-F1           517
       agatactgaggtgtgtt      D       Spre                    518
245    cgacatctggcagatgt      A       129/f1                  519
       cgacatctagcagatgt      B       b6/fvb                  520
       gtcacaaatagtatttc      C       129/cast                521
       gtcacaaagagtatttc      D       Spre                    522
246    aaggtgtgtgcgtgtgt      A       29/f1                   523
       aaggtgtgcgcgtgtgt      B       fvb                     524
247    agtcttttttttcctga      A       129-B6-FVB              525
       tagtc-tttttttt-cctgaa  B       F1                      526
248    caggctgtgggaggctt      A       129/b6/f1               527
       caggctgcggaaggctt      B       fvb                     528
       ctgtaagtcattcaata      C       129-B6-FVB-F1-Cast      529
       ctgtaagtaattcaata      D       Spre                    530
249    caggggtccccaatttg      A       129/f1                  531
       caggggtctccaatttg      B       b6/fvb                  532
250    gactcatggccgccttg      A       129                     533
       gactcattgccgcctgg      B       B6-FVB-F1               534
       gactcctggccgcctgg      C       F1                      535
       gactcctggctgcctgg      D       Spre                    536
       gactcctggccgcctgg      E       Cast                    537
251    acaggga-ggaaggaag      A       129                     538
       acaggggaaggaaggaa      B       b6/fvb/f1               539
252    ttgatatagattgattc      A       129/b6/f1               540
       ttgatatatattgattc      B       fvb/f1                  541
       atagaacagcaaagtaa      C       129-B6-FVB-F1-Cast      542
       atagaacaacaaagtaa      D       Spre                    543
253    aacaagcatctatggat      A       129/fvb/f1              544
       aacaagcacctatggat      B       b6                      545

DOP

300    gagcaggttaagcgatg      A       129/                    546
       gagcaggtgaagcgatg      B       B6                      547
301    ggcttccagcttgattc      A       129/                    548
       ggcttccaacttgattc      B       B6                      549
302    agatagggatgaatccc      A       129/                    550
       agataggggtgaatccc      B       B6                      551
303    tcattcaccgtttattg      A       129/                    552
       tcattcactgtttattg      B       B6                      553
304    ctgacatactgcttagg      A       129/                    554
       ctgacatattgcttagg      B       B6                      555
305    ctaggaaagcctaaatt      A       129/                    556
       ctaggaaaacctaaatt      B       B6                      557
306    atgtcaggattttaaga      A       129/                    558
       atgtcagggttttaaga      B       B6                      559
307    ggtttccaattggaaag      A       129/                    560
       ggtttccaguggaaag       B       B6                      561
308    cgaggagtgcaaagcga      A       129/                    562
       cgaggagtccaaagcga      B       B6                      563
309    tgtgtgtgtgtctgtct      A       129/                    564
       tgtgtgtgcgtctgtct      B       B6                      565
310    gcaagatgcagctgcat      A       129/                    566
       gcaagatgtagctgcat      B       B6                      567
311    gctggggctattctgta      A       129/                    568
       gctggggccattctgta      B       B6                      569
312    caataacggacctgcct      A       129/                    570
       caataacgaacctgcct      B       B6                      571
313    tagcctctctacatagg      A       129/                    572
       tagcctctgtacatagg      B       B6                      573

Genotypes are given in the order 12-01/104-01/884-01/1331-01 and are shown on the first row of each ASO pair.

ASO name  ASO sequence        Genotypes     SEQ ID NO
3A-G      CATCTATAGGTTCACTT   GT/TT/TT/TT   574
3A-T      CATCTATATGTTCACTT                 575
5A-C      GCCAACAACATTGAGAG   GG/CG/GG/GG   576
5A-G      GCCAACAAGATTGAGAG                 577
7A-C      GGGTCGTGCGTCCCCCT   TT/CT/TT/TT   578
7A-T      GGGTCGTGTGTCCCCCT                 579
9A-A      ATTGTCTCACATTTCTT   AA/GG/AA/AA   580
9A-G      CATTGTCTCGCATTTCTT                581
12A-C     DGGTGTGGTCGCAGAAGG  CC/CC/CT/CT   582
12A-T     AGGTGTGGTTGCAGAAGG                583
15A-A     TCATTGCCACACTTGAA   AA/GG/AA/GG   584
15A-G     ArCATTGCCGCACTTGAA                585
20A-A     ATCTGTCTACAATGATC   AG/GG/AA/AG   586
20A-G     ATCTGTCTGCAATGATC                 587
22A-A     BGGCTGGGCACAGTGGCT  AA/GG/AA/AA   588
22A-G     GGCTGGGCGCAGTGGCT                 589
34A-A     CAGCCTGGAGAACAAGT   CC/CC/CC/AC   590
34A-C     CAGCCTGGCGAACAAGT                 591
39A-C     TTTGACACCCGGAAGCT   CT/CC/CC/CC   592
39A-T     TTTGACACTCGGAAGCT                 593
40A-C     CTGCCTTTCATACTGCC   CT/TT/CT/TT   594
40A-T     CTGCCTTTTATACTGCC                 595
40B-C     ACAATAGACGTTCCCCG   TT/CT/TT/CT   596
40B-T     ACAATAGATGTTCCCCG                 597
41A-A     GGTGTTTGATTTGTACT   CC/AC/CC/CC   598
41A-C     GGTGTTTGCTTTGTACT                 599
42A-A     TCCAACTCAAAAAATGT   AT/AA/AT/AT   600
42A-T     TCCAACTCTAAAAATGT                 601
44A-C     GGGCCGCTCACAGTCCA   CC/CT/CC/CC   602
44A-T     GGGCCGCTTACAGTCCA                 603
44B-C     GCATGGCTCGTGGGTTT   CT/CT/TT/CT   604
44B-T     GCATGGCTTGTGGGTTT                 605
46A-G     GTTGGGAAGTGGAGCGG   GG/TT/GG/TT   606
46A-T     GTTGGGAATTGGAGCGG                 607
50A-A     AAGGGATGAGGATGTGA   AG/AA/AA/AG   608
50A-G     AAGGGATGGGGATGTGA                 609
50B-A     TCCTCGAGAGCTTTGCT   AG/AG/AA/AG   610
50B-G     TCCTCGAGGGCTTTGCT                 611
51A-C     TGACAATGCGTGCCCAA   CT/CC/CC/CC   612
51A-T     TGACAATGTGTGCCCAA                 613
53A-A     TCCATGTCATAGATTTC   AG/AA/AA/AA   614
53A-G     TCCATGTCGTAGATTTC                 615
66A-A     TGGAGGACAGTGGAGGG   TT/TT/TT/AT   616
66A-T     TGGAGGACTGTGGAGGG                 617
69A-C     ACCCATTTCCTGAAAAT   TT/CT/TT/TT   618
69A-T     ACCCATTTTCTGAAAAT                 619
71A-G     CTGAGTTCGGCACTGCT   TT/GG/GG/TT   620
71A-T     CTGAGTTCTGCACTGCT                 621
71B-G     ACCAGTTTGGCTCAAAG   GG/TT/TT/GG   622
71B-T     ACCAGTTTTGCTCAAAG                 623
72A-A     CCAATCAGAACGTGCAG   AA/GG/GG/AA   624
72A-G     CCAATCAGAGCGTGCAG                 625
73A-A     ACCCACACAGACACTGC   AA/AT/TT/AT   626
73A-T     ACCCACACTGACACTGC                 627
81A-C     GGACAAAGCGCTGGTGT   TT/CT/CC/CT   628
81A-T     GGACAAAGTGCTGGTGT                 629
81C-C     AGCTGGTCCCCCTMCCC   TT/CT/CC/CC   630
81C-T     AGCTGGTCTCCCTMCCC                 631
90A-A     GGTGTAGTAAGCACAGC   AA/AA/AC/AA   632
90A-C     GGTGTAGTCAGCACAGC                 633
91A-C     AGCGAACACGGGGGAAA   CC/CC/TT/CC   634
91A-T     AGCGAACATGGGGGAAA                 635
98D-A     GTGACAGCACCAAACTT   GG/AG/GG/GG   636
98D-G     GTGACAGCGCCAAACTT                 637
101A-C    GTCTGTTGCTGTTATTT   TT/TT/TT/CT   638
101A-T    GTCTGTTGTTGTTATTT                 639
111A-A    ACCAGCATAGCCCAGAG   GG/GG/GG/AG   640
111A-G    ACCAGCATGGCCCAGAG                 641
111B-A    CGTAGGAGACAAGACCT   GG/GG/GG/AG   642
111B-G    CGTAGGAGGCAAGACCT                 643
117A-A    CTCTGCTGAATCTCCCA   GG/GG/AG      644
117A-G    CTCTGCTGGATCTCCCA                 645
124A-A    AAGCAAAGACTGATTCA   TT/AT/TT/TT   646
124A-T    AAGCAAAGTCTGATTCA                 647
125A-A    AGGCAGCTAGAGGGAGA   CC/AA/AC/AA   648
125A-C    AGGCAGCTCGAGGGAGA                 649
130C-C    TTCCATTCCGTTCAATT   TT/TT/TT/CC   650
130C-T    TTCCATTCTGTTCAATT                 651
130D-C    TATTGTTACTGATTTTG   CT/CT/CT/TT   652
130D-T    TATTGTTATTGATTTTG                 653
136A-A    GAGCTTTCAGAGGCTGA   AA/AG/AG/AG   654
136A-G    GAGCTTTCGGAGGCTGA                 655
137A-A    GGGGGAAGATATGGAGT   GG/AG/AA/AG   656
137A-G    GGGGGAAGGTATGGAGT                 657
143A-C    CATGGCCTCGTGGGTTT   TC/TC/TT/TC   658
143A-T    CATGGCCTTGTGGGTTT                 659
147B-A    GGGKAGGGAGACCAGCT   AA/AG/GG/GG   660
147B-G    GGGKAGGGGGACCAGCT                 661
147C-A    GCAGTGTCAGTGTGGGT   TT/AT/AA/AT   662
147C-T    GCAGTGTCTGTGTGGGT                 663
147D-A    ACACCAGCACTTTGATC   AA/AG/GG/AG   664
147D-G    ACACCAGCGCTTTGATC                 665
151A-A    CCTTCTGCAACCACACC   GG/GG/AG/AG   666
151A-G    CCTTCTGCGACCACACC                 667
163A-A    AAATTCGCAGGAGCCGA   GG/AG/GG/GG   668
163A-G    AAATTCGCGGGAGCCGA                 669
164B-A    AGGTCTAGACGCTCACC   AG/GG/AG/GG   670
164B-G    AGGTCTAGGCGCTCACC                 671
164C-A    GGAGGAACACTTCAAAC   GG/AG/GG/GG   672
164C-G    GGAGGAACGCTTCAAAC                 673
170A-A    TTTGTGCTATACCTTGA   AA/AG/AG/AG   674
170A-G    TTTGTGCTGTACCTTGA                 675
179A-C    ATGATGCACACACCCTG   CT/CC/TT/CC   676
179A-T    ATGATGCATACACCCTG                 677
181B-C    TATTGCTCCGCCTCCTC   CT/TT/CC/TT   678
181B-T    TATTGCTCTGCCTCCTC                 679
181D-C    CTCAGAGACTGTGTGCC   CG/CC/CC/CC   680
181D-G    CTCAGAGAGTGTGTGCC                 681
187A-C    ATCTTCTGCGTCACTCA   CT/CT/CC/CC   682
187A-T    ATCTTCTGTGTCACTCA                 683
187B-A    CAGCATCTAGTAACCAC   AG/AA/GG/AG   684
187B-G    CAGCATCTGGTAACCAC                 685
190A-C    ATTAGTGCCAAATACAT   CC/CC/CT/CT   686
190A-T    ATTAGTGCTAAATACAT                 687
195B-A    TGCTCCACAGCAGCCGT   AT/TT/TT/TT   688
195B-T    TGCTCCACTGCAGCCGT                 689
196A-A    TAGGGGAGAATCTGTTT   CC/AC/AC/AA   690
196A-C    TAGGGGAGCATCTGTTT                 691

The invention also encompasses a composition comprising a plurality of RCGs immobilized on a surface, wherein the RCGs are composed of a plurality of DNA fragments, each DNA fragment including a (N)x-TARGET polynucleotide structure as described above, i.e., wherein the TARGET portion is identical in all of the DNA fragments of each RCG and includes at least 7 nucleotide residues, x is an integer from 0 to 9, and each N is any nucleotide residue. Preferably the TARGET portion includes at least 8 nucleotide residues.

In other aspects, the invention includes a method for performing DOP-PCR. The prior art DOP-PCR technique was originally developed to amplify the entire genome in cases where DNA was in short supply. It uses a primer set in which each primer has an arbitrarily selected six-nucleotide portion at its 3′ end. Because this portion is so short, it primes at a very large number of sites, so the complexity of the resulting product is extremely high and essentially the whole genome is amplified. Increasing the length of the arbitrarily selected portion of the DOP-PCR primer from 6 nucleotides to 7, and preferably 8 or more, nucleotide residues significantly reduces the complexity of the amplified genome.
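To make the complexity argument concrete, the sketch below estimates how many exact matches a single fixed TARGET sequence would be expected to have in a genome, under the simplifying assumption of random sequence with equal base frequencies. It ignores the degenerate N positions, mismatch tolerance at low annealing temperatures, and the need for two sites in opposite orientation within extension distance, so the numbers are illustrative only; the 3 × 10^9 bp genome size is an approximation.

```python
def expected_sites(genome_length: int, k: int, both_strands: bool = True) -> float:
    """Expected exact matches of one fixed k-mer in random DNA with equal base frequencies.

    Each of the (genome_length - k + 1) start positions matches with probability 4**-k;
    counting both strands doubles the expectation.
    """
    if k <= 0 or k > genome_length:
        raise ValueError("k must be between 1 and genome_length")
    strands = 2 if both_strands else 1
    return strands * (genome_length - k + 1) / 4 ** k


if __name__ == "__main__":
    haploid_human_bp = 3_000_000_000  # approximate genome size (assumption)
    for k in (6, 7, 8):
        sites = expected_sites(haploid_human_bp, k)
        print(f"TARGET length {k}: ~{sites:,.0f} expected priming sites")
```

Under this model each added base cuts the expected number of sites about fourfold, so lengthening the arbitrary portion from 6 to 8 residues reduces it roughly 16-fold.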

EXAMPLE

Example 1

Identification and Isolation of SNPs

High allele frequency SNPs are estimated to occur in the human genome once every kilobase or less (Cooper et al., 1985). A method for identifying these SNPs is illustrated in FIG. 1. As shown in FIG. 1, inter-Alu PCR was performed on genomes isolated from three unrelated individuals. The PCR products were cloned, and a mini library was made for each of the 3 individuals. The library clone inserts were PCR-amplified and spotted on nylon filters. Clones were matched by hybridization into sets containing two identical clones from each individual, for a total of 6 clones per matched clone set. These sets of clones were sequenced, and the sequences were compared in order to identify SNPs. This method of identifying SNPs has several advantages over prior art PCR amplification methods. First, cloned DNA yields higher quality sequence than cycle sequencing of PCR products. Second, every sequence represents a specific allele rather than potentially a heterozygote. Finally, sequencing ambiguities, Taq polymerase errors, and other sources of sequence error particular to one representation of the sequence are reduced by applying an algorithm which requires that the same variant sequence be present in at least 2 of the 6 clones sampled.
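A minimal sketch of the 2-of-6 filtering rule follows, under one literal reading of it: a position is reported only if at least two different bases are each seen in at least `min_copies` clones of the matched set. It assumes the six clone sequences have already been aligned to equal length without gaps; the function name and parameters are illustrative, not taken from the original work.

```python
from collections import Counter


def call_snps(clones, min_copies=2):
    """Return (position, alleles) for alignment columns where at least two
    distinct bases are each supported by >= min_copies clones.

    Assumes `clones` are pre-aligned, gap-free sequences of equal length.
    A variant seen in a single clone (e.g., a Taq or sequencing error) is ignored.
    """
    if not clones or len({len(c) for c in clones}) != 1:
        raise ValueError("clones must be non-empty and aligned to equal length")
    calls = []
    for pos, column in enumerate(zip(*clones)):
        counts = Counter(base.upper() for base in column if base.upper() in "ACGT")
        supported = sorted(base for base, n in counts.items() if n >= min_copies)
        if len(supported) >= 2:
            calls.append((pos, tuple(supported)))
    return calls


# Six toy clones of one matched set (two per individual).
matched_set = ["ACGTA", "ACGTA", "ACCTA", "ACCTA", "ACGTT", "ACGTA"]
print(call_snps(matched_set))  # [(2, ('C', 'G'))]; the single T at position 4 is filtered out
```

Setting `min_copies=1` reports every observed difference, which shows what the filter is protecting against.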

In general, the Alu PCR method for identifying SNPs can be performed using genomic DNA obtained from different individuals, whether unrelated or related. Briefly, Alu PCR is performed to yield a product having an estimated complexity of approximately 100 different single copy genomic DNA sequences and an average sequence length of between about 500 base pairs and 1 kilobase pair. The PCR products are cloned, and a mini library is made for each individual. Approximately 800 clones are selected from each library and transferred into 96-well dishes. Filter replicas of each plate are hybridized with PCR probes from individual clones selected from one of the libraries in order to create a matched clone set of 6 clones, 2 from each individual. Many sets of clones can be isolated from these libraries. The clones can be sequenced and compared to identify SNPs.
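As a rough plausibility check on picking about 800 clones per library from a product of roughly 100 sequences, the sketch below treats clone sampling as Poisson, with every product equally represented. That equal-representation assumption is ours, not the source's; real PCR products are amplified unevenly, so the estimate is optimistic.

```python
import math


def prob_at_least(k: int, mean: float) -> float:
    """P(X >= k) for X ~ Poisson(mean)."""
    return 1.0 - sum(math.exp(-mean) * mean ** i / math.factorial(i) for i in range(k))


clones_per_library = 800   # approximate number of clones picked per library (from the text)
distinct_products = 100    # approximate complexity of the Alu PCR product (from the text)
mean_clones = clones_per_library / distinct_products  # ~8 clones per product if sampled evenly

# A matched clone set needs >= 2 clones of the same product from each of 3 libraries.
p_one_library = prob_at_least(2, mean_clones)
p_all_three = p_one_library ** 3
print(f"P(>=2 clones in one library) = {p_one_library:.4f}")
print(f"P(>=2 clones in all three)   = {p_all_three:.4f}")
```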

Methods

An Alu primer designated primer 8C was designed to produce an Alu PCR product having a complexity of approximately 100 independent products. Primer 8C (having the nucleotide sequence CTT GCA GTG AGC CGA GATC; SEQ ID NO: 3) is complementary with base pairs 218-237 of the Alu consensus sequence (Britten et al., 1994). In order to reduce the complexity of the product, however, the last base pair of the primer was selected to correspond to base pair 237 of the consensus sequence, a nucleotide which has been shown to be highly variable among Alu sequences. Primer 8C therefore produces a product having complexity lower than that produced using Alu primers which match a segment of the Alu sequence in which there is little variation in nucleotide sequence among Alu family members.

Preliminary experiments were conducted to estimate the complexity of the product produced by Alu PCR with primer 8C on the CEPH mega-YACs. These experiments confirmed that primer 8C produced fewer Alu PCR products than other Alu PCR primers closely matching less variable sequences in the Alu consensus.

Three libraries of Alu PCR products were produced from inter-Alu PCR reactions involving genomic DNA derived from three unrelated CEPH individuals designated 201, 1701, and 2301. The reactions were performed at an annealing temperature of 58° C. for 32 cycles using the 8C Alu primer. Each set of PCR reaction products was purified by phenol:chloroform extraction followed by ethanol precipitation. The products were shotgun cloned into the T-vector pCR2.1 (Invitrogen), electroporated into E. coli strain DH10B Electromax cells, and plated on ampicillin-containing LB agar plates. 768 colonies were picked from each of the three libraries into eight 96-well format plates containing LB+ampicillin and grown overnight. The following day, an equal volume of glycerol was added and the plates were stored at −80° C. An initial survey of the picked clones indicated an average insert size of between 500 base pairs and 1 kilobase pair.

To identify matching clones in each library, 1 microliter of an overnight culture from each library plate well was subjected to PCR amplification using vector-derived primers. Amplified inserts were spotted onto Hybond™ N+ filters (Amersham) using a 96-pin replicating device such that each filter had 384 products present in duplicate. The DNA was subjected to alkali denaturation by standard methods and fixed by baking at 80° C. for 2 hours. Individual inserts derived from the library were radiolabeled by random hexamer priming and used as probes against the three libraries (6 filters per probe). Hybridization was carried out overnight at 42° C. in buffer containing 50% formamide as described in Sambrook et al. The following day, the filters were washed in 2× standard saline citrate (SSC), 0.1% SDS at room temperature for minutes, followed by 2 washes in 0.1× SSC, 0.1% SDS at 65° C. for 45 minutes each. The filters were then exposed to Kodak X-OMAT X-ray film overnight.

Results

FIG. 2 shows the data obtained for identification of SNPs. The results of gel electrophoresis of inter-Alu PCR genomic DNA products prepared using the 8C primer are shown in FIG. 2A. Mini libraries were prepared from the Alu PCR genomic DNA products. Colonies were picked from the libraries, and inserts were amplified. The inserts were separated by gel electrophoresis to demonstrate that each clone contained a single insert; the gel is shown in FIG. 2B. Once the individual amplified inserts were spotted on Hybond™ N+ filters, individual inserts were radiolabeled by random hexamer priming and used as probes against the entire contents of the three mini libraries. One of the filters, having 2 positive (matched) clones, is shown in FIG. 2C.

Screening 330 base pairs of genomic DNA by the matched clone method led to the identification of 6 SNPs: 4 in single copy DNA and 2 in the flanking Alu sequence. These observations were consistent with the projected rate of SNP occurrence of 1 high frequency SNP per 1,000 base pairs or less. The single copy SNPs identified are presented below in Table I.

TABLE 1

(SEQ ID NO. in parentheses)

CEPH individual   SNP 1              SNP 2              SNP 3               SNP 4
201               taagtGtacaa (5)    cccacGgagaa (7)    aattgCttccc (9)     aaattCaatgt (11)
                  taagtGtacaa (5)    cccacGgagaa (7)    aattgCttccc (9)     aaattCaatgt (11)
1701              taagtAtacaa (6)    cccacAgagaa (8)    aattgCttccc (9)     aaattCaatgt (11)
                  taagtGtacaa (5)    cccacGgagaa (7)    aattgTttccc (10)    aaattCaatgt (11)
2301              taagtGtacaa (5)    cccacAgagaa (8)    aattgCttccc (9)     aaattAaatgt (12)
                  taagtGtacaa (5)    cccacGgagaa (7)    aattgTttccc (10)    aaattCaatgt (11)

To verify the identities of the SNPs shown in Table I, specific primers were synthesized which permitted amplification of each single copy locus. Cycle sequencing was then performed on PCR products from each of the three unrelated individuals, and the site of the putative SNP was examined. In all cases, the genotype of the individual derived by cycle sequencing was consistent with the genotype observed in the matched clone set.

Example 2

Allele-specific Oligonucleotide Hybridization to Alu PCR SNPs

Methods

Inter-Alu PCR was performed using genomic DNA obtained from 136 members of 8 CEPH families (numbers 102, 884, 1331, 1332, 1347, 1362, 1413, and 1416) using the 8C Alu primer, as described above. The products from these reactions were denatured by alkali treatment (10-fold addition of 0.5 M NaOH, 2.0 M NaCl, 25 mM EDTA) and dot blotted onto multiple Hybond™ N+ filters (Amersham) using a 96-well dot blot apparatus (Schleicher and Schuell). For each SNP, a set of two allele-specific oligonucleotides, each 17 residues long and centered on the polymorphic nucleotide residue, was synthesized. Each filter was hybridized with 1 picomole of ³²P kinase-labeled allele-specific oligonucleotide and a 50-fold excess of non-labeled competitor oligonucleotide complementary to the opposite allele (Shuber et al., 1993). Hybridizations were carried out overnight at 52° C. in 10 mL TMAC buffer (3.0 M TMAC, 0.6% SDS, 1 mM EDTA, 10 mM NaPO4, pH 6.8, 5× Denhardt's solution, 40 micrograms/milliliter yeast RNA). Blots were washed for 20 minutes at room temperature in TMAC wash buffer (3 M TMAC, 0.6% SDS, 1 mM EDTA, 10 mM Na3PO4, pH 6.8) followed by minutes at 52° C. (52° C.-52° C. is optimal). The blots were then exposed to Kodak X-OMAT AR X-ray film for 8-24 hours, and genotypes were determined by the hybridization pattern.

Results

The results of the genotyping and mapping are shown in FIG. 3. In order to determine the map location of the SNP, the genotype data from CEPH families 884 and 1347 were compared to the CEPH genotype database version 8.1 (http://www.cephb.fr/cephdb/) by calculating a two-point lod score using the computer software program MultiMap version 2.0 running on a Sparc Ultra I computer. This analysis revealed linkage to marker D3S1292 with a lod score of 5.419 at a theta value of 0.0. To confirm this location, PCR amplification of the CCRSNP1 marker was performed on the Gene Bridge 4 radiation hybrid panel (Research Genetics). This analysis placed marker CCRSNP1 at 4.40 cR from D3S3445 with a lod score greater than 15.0. Integrated maps from the genetic location database (Collins et al., 1996) indicated that the locations of the markers identified by these two independent methods overlap. These results support the mapping of even low frequency polymorphisms by two-point linkage to markers previously established on CEPH families.
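For readers unfamiliar with the two-point lod score, the sketch below computes it for the simplest textbook case of fully informative, phase-known meioses: LOD(θ) = log10[θ^R (1 − θ)^NR / 0.5^(R+NR)], where R and NR are the recombinant and non-recombinant counts. Linkage programs evaluate likelihoods over whole pedigrees with unknown phase, so this is an illustration of the statistic, not a reimplementation of MultiMap; the function names are hypothetical.

```python
import math


def two_point_lod(theta: float, recombinants: int, nonrecombinants: int) -> float:
    """Two-point lod score for phase-known, fully informative meioses.

    LOD(theta) = log10( theta**R * (1 - theta)**NR / 0.5**(R + NR) ).
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if theta == 0.0 and recombinants > 0:
        return float("-inf")  # any recombinant excludes theta = 0
    log_likelihood = nonrecombinants * math.log10(1.0 - theta)
    if recombinants:
        log_likelihood += recombinants * math.log10(theta)
    return log_likelihood - (recombinants + nonrecombinants) * math.log10(0.5)


def max_lod(recombinants: int, nonrecombinants: int):
    """Return (theta_hat, lod) at the maximum-likelihood recombination fraction."""
    n = recombinants + nonrecombinants
    if n == 0:
        raise ValueError("need at least one informative meiosis")
    theta_hat = min(recombinants / n, 0.5)
    return theta_hat, two_point_lod(theta_hat, recombinants, nonrecombinants)


print(max_lod(0, 10))  # no recombinants: theta_hat = 0.0, LOD = 10 * log10(2)
print(max_lod(2, 18))
```

With no recombinants the maximum lies at θ = 0 and each informative meiosis adds log10 2 ≈ 0.301 to the score.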

Of the CEPH families dot blotted, two were informative at this SNP locus, namely families 884 and 1347. The dot blot is shown in FIG. 3A. Lines are drawn around signals representing CEPH family 884 on the dot blots shown in FIGS. 3A and 3B. Allele-specific oligonucleotide hybridizations were performed on the filters shown in FIGS. 3A and 3B under TMAC buffer conditions with the G allele-specific oligonucleotide (FIG. 3A) and the A allele-specific oligonucleotide (FIG. 3B). The pedigree of CEPH family 884, with genotypes as scored from the filters shown in FIGS. 3A and 3B, is shown in FIG. 3C. DNA was not available for one individual in this pedigree, and that square is left blank. Mapping of CCRSNP1 was performed by two independent methods. First, genotype data from informative CEPH families 884 and 1347 were compared to the CEPH genotype database version 8.1 by calculation of a two-point lod score. Second, PCR amplification of the CCRSNP1 marker was performed on the Gene Bridge 4 radiation hybrid panel. The markers with the highest lod scores in these analyses were D3S1292 and D3S3445, respectively, as shown in FIG. 3D.

The percentage of SNPs detected using the above-described methods is dependent on the number of chromosomes sampled, as well as the allele frequency.
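The dependence on sample size and allele frequency can be made explicit with a simple model: if a biallelic SNP has allele frequency p and n chromosomes are sampled independently, the site is seen as polymorphic unless all n carry the same allele, so P(detect) = 1 − p^n − (1 − p)^n. This sketch ignores the additional 2-of-6 clone requirement of Example 1 and assumes independent sampling of chromosomes; it is an illustration, not the analysis used in the source.

```python
def detection_probability(p: float, n_chromosomes: int) -> float:
    """Probability that both alleles of a biallelic SNP appear among n sampled chromosomes."""
    if not 0.0 <= p <= 1.0 or n_chromosomes < 1:
        raise ValueError("need 0 <= p <= 1 and at least one chromosome")
    return 1.0 - p ** n_chromosomes - (1.0 - p) ** n_chromosomes


if __name__ == "__main__":
    # Three unrelated individuals contribute six chromosomes, as in Example 1.
    for p in (0.05, 0.1, 0.2, 0.5):
        print(f"minor allele frequency {p:.2f}: {detection_probability(p, 6):.3f}")
```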

Example 3

Confirmation of SNP Identity

Allele-specific oligonucleotides are synthesized based on standard protocols (Shuber et al., 1997). Briefly, oligonucleotides of 17 bases centered on the polymorphic site are synthesized for each allele of a SNP. IRS-PCR or DOP-PCR products are dotted onto a membrane and hybridized to end-labeled allele-specific oligonucleotides under TMAC buffer conditions. These conditions are known to equalize the contribution of AT and GC base pairs to melting temperature, thereby providing a uniform hybridization temperature for allele-specific oligonucleotides independent of nucleotide composition.
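The 17-mer design rule (the variant base in the central, ninth position, flanked by eight bases on each side) can be sketched as below. The function name and arguments are hypothetical; the sketch copies the strand it is given, and practical design would also weigh the opposite strand and secondary structure, which are not addressed here. The usage line reproduces the 3A-G/3A-T pair listed in the ASO table, with arbitrary AAAA/TTTT padding standing in for real flanking sequence.

```python
def design_asos(sequence: str, snp_index: int, allele_1: str, allele_2: str, length: int = 17):
    """Return two allele-specific oligonucleotides centered on a SNP.

    `sequence` is the reference strand and `snp_index` the 0-based position of the
    polymorphic base. The ASOs read on the same strand as `sequence`.
    """
    if length % 2 == 0:
        raise ValueError("length must be odd so the SNP sits in the center")
    half = length // 2
    start, end = snp_index - half, snp_index + half + 1
    if start < 0 or end > len(sequence):
        raise ValueError("not enough flanking sequence around the SNP")
    left = sequence[start:snp_index].upper()
    right = sequence[snp_index + 1:end].upper()
    return left + allele_1 + right, left + allele_2 + right


# Flanks around locus 3A; the leading AAAA and trailing TTTT are arbitrary padding.
context = "AAAA" + "CATCTATA" + "G" + "GTTCACTT" + "TTTT"
print(design_asos(context, snp_index=12, allele_1="G", allele_2="T"))
# ('CATCTATAGGTTCACTT', 'CATCTATATGTTCACTT')
```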

Using this methodology, genotypes of CEPH progenitors and their offspring are determined. Mendelian segregation of each SNP marker confirms its identity as a SNP marker and provides an estimate of its relative allele frequency and, hence, its likely usefulness as a genetic marker. Markers that yield complex segregation patterns or show very low allele frequencies in CEPH progenitors are set aside for future analysis, and the remaining markers are further characterized.
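A marker that segregates in a Mendelian fashion should never give a child a genotype that cannot be assembled from one allele of each parent. The sketch below checks this for parent-child trios; the helper names are hypothetical, and the check does not catch every anomaly (for example, a null allele that masquerades as a homozygote).

```python
from itertools import product


def mendelian_consistent(father: str, mother: str, child: str) -> bool:
    """True if the child's genotype can be formed from one allele of each parent.

    Genotypes are unordered two-character strings such as "AG".
    """
    for g in (father, mother, child):
        if len(g) != 2:
            raise ValueError(f"expected a two-allele genotype, got {g!r}")
    possible = {tuple(sorted(pair)) for pair in product(father, mother)}
    return tuple(sorted(child)) in possible


def flag_inconsistent_trios(trios):
    """Return indices of (father, mother, child) trios that violate Mendelian inheritance."""
    return [i for i, (f, m, c) in enumerate(trios) if not mendelian_consistent(f, m, c)]


print(flag_inconsistent_trios([("AG", "GG", "AG"), ("GG", "GG", "AG"), ("AA", "GG", "AG")]))
# [1]
```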

Example 4

Development of Detailed Information on Map Position and Allele Frequency for Each SNP

Two complementary methods are used to establish genetic map position for each marker. Each marker is genotyped on a number of CEPH families. The result is compared, using MultiMap (Matise et al., 1993, as described above) or other appropriate software, against the CEPH database to determine by linkage the most likely position of the SNP marker.

Allele frequencies are determined by hybridization with the standard worldwide panel that the U.S. NIH is currently making available to researchers for standardization of allele frequency comparisons. The allele-specific oligonucleotide methodology used for genetic mapping is also used to determine allele frequency.
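Once each panel member has been scored, allele frequency estimation reduces to gene counting: each diploid genotype contributes two alleles. The sketch assumes clean two-allele calls with uncalled samples already removed; the function name is hypothetical.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Estimate allele frequencies by gene counting from diploid genotype calls.

    `genotypes` is an iterable of two-character strings such as "AG".
    """
    counts = Counter()
    for g in genotypes:
        if len(g) != 2:
            raise ValueError(f"expected a two-allele genotype, got {g!r}")
        counts.update(g.upper())
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no genotypes supplied")
    return {allele: n / total for allele, n in sorted(counts.items())}


print(allele_frequencies(["AA", "AG", "AG", "GG", "AA", "AG"]))
```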

Example 5

Development of a System for Scoring Genotype Using SNPs

After the identification of a set of SNPs, automated genotyping is performed. Genomic DNA of a well-characterized set of subjects, such as the CEPH families, is PCR-amplified using appropriate primers. These DNA samples serve as the substrate for system development. The DNA is spotted onto multiple glass slides for genotyping. This process can be carried out using a microarray spotting apparatus which can spot greater than 1,000 samples within a square centimeter area or more than 10,000 samples on a typical microscope slide. Each slide is hybridized with a fluorescently tagged allele-specific oligonucleotide under TMAC conditions analogous to those described above. The genotype of each individual is determined by the presence or absence of a signal for a selected set of allele-specific oligonucleotides. A schematic of the method is shown in FIG. 4.

PCR products can be attached to the slide using any method known in the art for attaching DNA to a surface. For instance, PCR products may be spotted onto poly-L-lysine-coated glass slides and crosslinked by UV irradiation prior to hybridization. A second, more preferred method, developed according to the invention, uses PCR primers bearing a 5′ amino group in each of the PCR reactions described above. The PCR products are spotted onto silane-coated slides in the presence of NaOH to covalently attach them to the slide. This method is advantageous because the covalent bond produces a stable attachment to the surface.

SNP-ASOs are hybridized under TMAC hybridization conditions with the RCGs covalently conjugated to the surface. The allele-specific oligonucleotides are labeled at their 5′ ends with a fluorescent dye (e.g., Cy3). After washing, the fluorescent oligonucleotides are detected in one of two ways. Fluorescent images can be captured using a fluorescence microscope equipped with a CCD camera and an automated stage. Alternatively, the data can be obtained using a microarray scanner (e.g., one made by Genetic Microsystems). A microarray scanner provides image analysis that can be converted to a digital (e.g., +/−) signal for each sample using any of several available software applications (e.g., NIH Image, ScanAnalyze, etc.). The high signal-to-noise ratio of this analysis makes data determination in this mode straightforward and amenable to automation. Once exported, these data can be converted to a format that can be analyzed by any of several human genetics applications, such as CRI-MAP and LINKAGE. Additionally, the methods may use two or more fluorescent dyes or other labels that can be spectrally differentiated, reducing the number of samples that need to be analyzed. For instance, if four spectrally distinct fluorescent dyes (e.g., ABI Prism dyes 6-FAM, HEX, NED, ROX) are used, four hybridization reactions can be performed in a single hybridization mixture.

Example 6

Reduction of Genome Complexity Using IRS-PCR or DOP-PCR

The initial step of both the SNP identification method and the genotyping approach described above is to reduce the complexity of genomic DNA in a reproducible manner. For genotyping, the purpose of this step is to allow multiple SNPs to be genotyped using the products of a single PCR reaction. Using the IRS-PCR approach, a PCR primer was synthesized which bears homology to a repetitive sequence present within the genome of the species to be analyzed (e.g., the Alu sequence in humans). When two repeat elements bearing the primer sequence are present in a head-to-head orientation within a limited distance (approximately 2 kilobase pairs), the inter-repeat sequence can be amplified. An advantage of the method is that the complexity of the resulting PCR product can be controlled by how closely the chosen primer sequence matches the consensus nucleotide sequence of the repeat element (that is, the closer the match to the repeat consensus, the more complex the PCR product).
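The head-to-head requirement can be illustrated in silico: with a single primer, a product forms between a primer site on one strand and a downstream site in the opposite orientation, provided the two lie within extension range (about 2 kb, per the text). The function names are hypothetical, only exact matches are considered, and the toy template is synthetic, so this models the principle rather than real Alu priming, where mismatches at the primer's variable 3′ base matter.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_all(text: str, pattern: str):
    """Start positions of every (possibly overlapping) exact occurrence of pattern."""
    hits, pos = [], text.find(pattern)
    while pos != -1:
        hits.append(pos)
        pos = text.find(pattern, pos + 1)
    return hits


def single_primer_amplicons(template: str, primer: str, max_len: int = 2000):
    """(start, end) spans amplifiable with one primer: a top-strand site followed,
    within max_len, by a site in the opposite orientation. Exact matches only.
    """
    template, primer = template.upper(), primer.upper()
    forward = find_all(template, primer)                      # primer extends rightward
    reverse = find_all(template, reverse_complement(primer))  # primer extends leftward
    spans = []
    for f in forward:
        for r in reverse:
            end = r + len(primer)
            if r >= f + len(primer) and end - f <= max_len:
                spans.append((f, end))
    return spans


primer_8c = "CTTGCAGTGAGCCGAGATC"  # primer 8C (SEQ ID NO: 3)
toy = "GG" + primer_8c + "A" * 100 + reverse_complement(primer_8c) + "TT"
print(single_primer_amplicons(toy, primer_8c))  # [(2, 140)]
```

Two sites in the same orientation produce nothing here, which is why only head-to-head repeat pairs contribute to the product.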

In detail, a 50 microliter reaction for each sample was set up as follows:

Distilled, deionized H2O (ddH2O): 30.75 μl
10× PCR buffer (500 mM KCl, 100 mM Tris-HCl pH 8.3, 15 mM MgCl2, 0.1% gelatin): 5 μl
1.25 mM dNTPs: 7.5 μl
20 μM primer 8C: 1.5 μl
Taq polymerase (1.25 units): 0.25 μl
Template (50 ng genomic DNA in ddH2O): 5.0 μl
Total: 50 μl

The PCR reaction was performed, for example, in a Perkin Elmer 9600 thermal cycler under the following conditions:

1 min. at 94° C.
32 cycles of:
    30 sec. at 94° C.
    45 sec. at 58° C.
    90 sec. at 72° C.
10 min. at 72° C.
Hold at 4° C.

An aliquot of the reaction mixture was separated on an agarose gel to confirm successful amplification.

RCGs were also prepared using DOP-PCR with the following primer: CTC GAG NNN NNN AAG CGA TG (SEQ ID NO: 4), wherein N is any nucleotide. DOP-PCR uses a single primer, which is typically composed of 3 parts, herein designated tag-(N)x-TARGET. The TARGET portion is a polynucleotide comprising at least 7, and preferably at least 8, arbitrarily selected nucleotide residues; x is an integer from 0 to 9; and N is any nucleotide residue. Tag is a polynucleotide as described above.
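The primer nomenclature can be illustrated by splitting SEQ ID NO: 4 into its parts: a 6-residue tag (CTCGAG), x = 6 degenerate positions, and an 8-residue TARGET (AAGCGATG). Because each N may be any of four bases, the primer is really a pool of 4^x sequences. The helper below is hypothetical; it assumes the tag and TARGET contain only A, C, G and T, and because the tag/TARGET boundary cannot be inferred from sequence alone when there are no N positions, it requires x of at least 1.

```python
import re

# tag, then a run of degenerate N positions, then the fixed TARGET (assumed layout).
_DOP_PATTERN = re.compile(r"([ACGT]*)(N+)([ACGT]+)")


def split_dop_primer(primer: str):
    """Split a tag-(N)x-TARGET primer into (tag, x, TARGET) and check the stated limits."""
    seq = primer.upper().replace(" ", "")
    match = _DOP_PATTERN.fullmatch(seq)
    if match is None:
        raise ValueError(f"not of the form tag-(N)x-TARGET with x >= 1: {primer!r}")
    tag, ns, target = match.groups()
    x = len(ns)
    if x > 9:
        raise ValueError(f"x must be at most 9, got {x}")
    if len(target) < 7:
        raise ValueError("TARGET must have at least 7 residues")
    return tag, x, target


tag, x, target = split_dop_primer("CTC GAG NNN NNN AAG CGA TG")  # SEQ ID NO: 4
print(tag, x, target)                                   # CTCGAG 6 AAGCGATG
print(f"{4 ** x} distinct primer sequences in the pool")  # 4096
```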

The initial rounds of DOP-PCR were performed at a low annealing temperature, because in these rounds the specificity of the reaction is determined primarily by the nucleotide sequence of the TARGET portion and the Nx residues. A slow ramp time during these cycles ensures that the primers do not detach from the template before chain extension. Subsequent amplification rounds were carried out at a higher annealing temperature because, once the primer sequence has been incorporated into the products, the 5′ end of the DOP-PCR primer can also contribute to primer annealing.

The DOP-PCR method was performed using a reaction mixture comprising the following ingredients:

Distilled deionized H2O: 24 μl
10× PCR buffer: 5 μl
1.25 mM dNTPs: 8 μl
20 μM primer DOP-BJ1 (SEQ ID NO: 4): 7.5 μl
Taq polymerase (1.25 units): 0.5 μl
Template (50 ng genomic DNA in distilled deionized H2O): 5 μl
Total: 50 μl
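The two reaction recipes in this example can be checked with the dilution relation C_final = C_stock × V_added / V_total. The sketch confirms that each recipe sums to 50 μl and gives 1× buffer, 0.1875 mM (IRS-PCR) or 0.2 mM (DOP-PCR) dNTPs, and 0.6 μM or 3 μM primer. It assumes the 1.25 mM dNTP stock refers to each nucleotide, which the text does not state; the data layout is hypothetical.

```python
# Each entry: (component, stock concentration, unit, volume added in microliters).
# A stock of None marks components whose final concentration is not computed here.
IRS_8C_MIX = [
    ("ddH2O", None, None, 30.75),
    ("PCR buffer", 10.0, "x", 5.0),
    ("dNTPs", 1.25, "mM", 7.5),
    ("primer 8C", 20.0, "uM", 1.5),
    ("Taq polymerase", None, None, 0.25),
    ("template DNA", None, None, 5.0),
]

DOP_BJ1_MIX = [
    ("ddH2O", None, None, 24.0),
    ("PCR buffer", 10.0, "x", 5.0),
    ("dNTPs", 1.25, "mM", 8.0),
    ("primer DOP-BJ1", 20.0, "uM", 7.5),
    ("Taq polymerase", None, None, 0.5),
    ("template DNA", None, None, 5.0),
]


def final_concentrations(mix, total_ul=50.0):
    """Check that volumes sum to total_ul, then apply C_final = C_stock * V_added / V_total."""
    volume = sum(v for _, _, _, v in mix)
    if abs(volume - total_ul) > 1e-9:
        raise ValueError(f"components sum to {volume} ul, expected {total_ul} ul")
    return {
        name: (stock * v / total_ul, unit)
        for name, stock, unit, v in mix
        if stock is not None
    }


for label, mix in (("IRS-PCR, primer 8C", IRS_8C_MIX), ("DOP-PCR, primer BJ1", DOP_BJ1_MIX)):
    print(label)
    for name, (conc, unit) in final_concentrations(mix).items():
        print(f"  {name}: {conc:g} {unit}")
```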

The PCR reaction was performed, for example, in a Perkin Elmer 9600 thermal cycler using the following reaction conditions:

1 min. at 94° C.
5 cycles of:
    1 min. at 94° C.
    1.5 min. at 45° C.
    2 min. ramp to 72° C.
    3 min. at 72° C.
35 cycles of:
    1 min. at 94° C.
    1.5 min. at 58° C.
    3 min. at 72° C.
10 min. at 72° C.
Hold at 4° C.
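The two cycling programs in this example can also be written down as data, which makes their structure explicit (a single 32-cycle stage for IRS-PCR versus a 5-cycle low-stringency stage followed by 35 higher-stringency cycles for DOP-PCR) and lets the programmed run time be totalled. This is an illustrative representation, not a thermal-cycler file format; it excludes the time the instrument spends moving between temperatures, so real run times will be longer.

```python
# Each stage: (number of cycles, [(temperature in deg C, seconds), ...]).
# The 2-minute ramp to 72 deg C in the DOP program is entered as a 120 s step.
IRS_PROGRAM = [
    (1, [(94, 60)]),
    (32, [(94, 30), (58, 45), (72, 90)]),
    (1, [(72, 600)]),
]

DOP_PROGRAM = [
    (1, [(94, 60)]),
    (5, [(94, 60), (45, 90), (72, 120), (72, 180)]),
    (35, [(94, 60), (58, 90), (72, 180)]),
    (1, [(72, 600)]),
]


def programmed_seconds(program):
    """Sum of programmed step times; instrument ramping and the final 4 deg C hold are excluded."""
    return sum(cycles * sum(seconds for _, seconds in steps) for cycles, steps in program)


for name, program in (("IRS-PCR", IRS_PROGRAM), ("DOP-PCR", DOP_PROGRAM)):
    total = programmed_seconds(program)
    print(f"{name}: {total} s (~{total / 60:.0f} min) of programmed steps")
```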

Example 7

Attachment of PCR Products to a Solid Support

Once the complexity of the genomic DNA from an individual has been reduced, it can be attached to a solid support in order to facilitate hybridization analysis. One method of attaching DNA to a solid support involves spotting PCR products onto a nylon membrane. This protocol was performed as follows:

Upon completion of the PCR reaction (typically in a 50 μl reaction mixture), a 10-fold volume of denaturing solution (500 mM NaOH, 2.0 M NaCl, 25 mM EDTA) and a small amount (5 μl) of India ink were added. Sixty microliters of product were applied to a pre-wetted Hybond™ N+ membrane (Amersham) using a Schleicher and Schuell 96-well dot blot apparatus. The membrane was immediately removed and placed DNA side up on top of Whatman 3MM paper saturated with 2× SSC for 2 minutes. The filters were air-dried, and the DNA was fixed to the membrane by baking in an 80° C. oven for 2 hours. The membranes were then used for hybridization.

Another method for attaching nucleic acids to a support involves the use of microarrays. This method attaches minute quantities of PCR product samples onto a glass slide. More than 1000 samples can be spotted per cm², and therefore over 10,000 samples can be analyzed simultaneously on a glass slide. To accomplish this, pre-cleaned glass slides were placed in a mixture of 80 ml dry xylene, 32 ml 96% 3-glycidoxypropyltrimethoxysilane, and 160 μl 99% N-ethyldiisopropylamine at 80° C. overnight. The slides were rinsed for 5 minutes in ethyl acetate and dried at 80° C. for 30 minutes. An equal volume of 0.8 M NaOH (0.6 M NaOH or 0.6-0.8 M KOH also works) was added directly to the PCR product (which contained a 5′ amino group incorporated into the PCR primer), and the components were mixed. The resulting solution was spotted onto a glass slide under humid conditions. As soon as possible after spotting, the slide was placed in a humid chamber overnight at 37° C. The next day, the slide was removed from the humid chamber and kept at 37° C. for an additional 1 hour. The slide was incubated in an 80° C. oven for 2.5 hours and then washed for 5 minutes in 0.1% SDS. The slide was washed for an additional 5 minutes in ddH2O and air-dried. Attachment to the slide was monitored by OliGreen staining (obtained from Molecular Probes), which specifically binds single-stranded DNA.
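The link between spot density and slide capacity is simple geometry. On a square grid, 1000 spots/cm² corresponds to a center-to-center pitch of about 316 μm. The printable region used below (20 mm × 55 mm) is an assumption for illustration, since usable slide area depends on the arrayer.

```python
import math


def pitch_um(spots_per_cm2: float) -> float:
    """Center-to-center spacing (um) on a square grid with the given density (1 cm = 10,000 um)."""
    return 10_000 / math.sqrt(spots_per_cm2)


def spots_in_area(width_mm: float, height_mm: float, pitch: float) -> int:
    """Whole grid positions that fit in a width x height rectangle at the given pitch (um)."""
    return math.floor(width_mm * 1000 / pitch) * math.floor(height_mm * 1000 / pitch)


p = pitch_um(1000)
print(f"pitch at 1000 spots/cm^2: {p:.0f} um")
# Assumed printable region of 20 mm x 55 mm on a standard slide.
print(spots_in_area(20, 55, p))
```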

Example 8

Hybridization Using Allele Specific Oligonucleotides for Each SNP

In order to determine the genotype of an individual at a selected SNP locus, we employed allele-specific oligonucleotide hybridization. Using this method, 2 hybridization reactions were performed at each locus. The first hybridization reaction involved a labeled (radioactive or fluorescent) SNP-ASO (typically 17 nucleotide residues) centered on and complementary to one allele of the SNP. To increase specificity, a 20- to 50-fold excess of non-labeled SNP-ASO complementary to the opposite allele of the SNP was included in the hybridization mixture. For the second hybridization, the allele specificity of the labeled and non-labeled SNP-ASOs was reversed. Hybridization was carried out in TMAC buffer, in which oligonucleotides of the same length have essentially the same melting temperature regardless of base composition.

Specifically, for analysis of each SNP, a pair of SNP allele-specific oligonucleotides (SNP-ASOs), consisting of two 17-mers centered on the polymorphic nucleotide, was synthesized. Each filter was hybridized with 20 pmol of ³³P kinase-labeled SNP-ASO (0.66 pmol/ml) and a 50-fold excess of non-labeled competitor oligonucleotide complementary to the other allele of the SNP. Hybridizations were performed overnight at 52° C. in 10 ml TMAC buffer (3.0 M TMAC, 0.6% SDS, 1 mM EDTA, 10 mM NaPO4, pH 6.8, 5× Denhardt's solution, 40 μg/ml yeast RNA). Blots were washed for 20 minutes at room temperature in TMAC Wash Buffer (3 M TMAC, 0.6% SDS, 1 mM EDTA, 10 mM Na3PO4, pH 6.8), followed by 20 minutes of washing at 52° C. The blots were exposed to Kodak X-OMAT AR X-ray film for 8-24 hours, and genotypes were determined by analyzing the hybridization pattern.

Example 9

Scoring the Hybridization Pattern for Each Sample to Determine Genotype

Hybridization of SNP-ASOs (2 for each locus) to IRS-PCR or DOP-PCR products of several individuals has been performed. The final step in this process is to determine whether a positive or negative signal exists for each hybridization for an individual and then, based on this information, to determine the genotype for that particular locus. Essentially all of the detection methods described herein can be reduced to a digital image file, for example using a microarray reader or a phosphorimager. Several software products are available that overlay a grid onto the image and determine the signal strength value at each element of the grid. These values are imported into a spreadsheet program, such as Microsoft Excel™, and a simple analysis is performed to assign each signal a + or − value. Once this is accomplished, an individual's genotype can be determined by its pattern of hybridization to the SNP alleles present at a given locus.
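The +/− scoring and genotype assignment described above can be sketched as follows. A single fixed intensity threshold is an assumption made for illustration (in practice a threshold would be chosen per slide or filter, for example relative to background), and the intensities in the usage loop are invented example values, not data from these examples.

```python
def score(signal: float, threshold: float) -> str:
    """Reduce a background-corrected spot intensity to '+' or '-'."""
    return "+" if signal >= threshold else "-"


def call_genotype(signal_1, signal_2, allele_1, allele_2, threshold):
    """Combine the two allele-specific hybridizations at one locus into a genotype.

    Returns e.g. 'AA', 'AG' or 'GG', or None if neither ASO gave a signal.
    """
    hit_1 = score(signal_1, threshold) == "+"
    hit_2 = score(signal_2, threshold) == "+"
    if hit_1 and hit_2:
        return allele_1 + allele_2
    if hit_1:
        return allele_1 * 2
    if hit_2:
        return allele_2 * 2
    return None  # no signal: failed sample or missing DNA


# Grid intensities for one locus (ASO for A, ASO for G) in four samples.
for a_signal, g_signal in [(1200, 80), (900, 850), (100, 700), (50, 60)]:
    print(call_genotype(a_signal, g_signal, "A", "G", threshold=300))
```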

Example 10

Genomic Analysis Using DOP-PCR

Genomic DNA isolated from approximately 40 individuals was subjected to DOP-PCR using primer BJ1 (CTC GAG NNN NNN AAG CGA TG) (SEQ ID NO: 4). One hundred microliters of each DOP-PCR mixture was precipitated by adding 10 microliters of 3 M sodium acetate (pH 5.2) and 110 microliters of isopropanol, and the samples were stored at −20° C. for at least 1 hour. The samples were spun down in a microcentrifuge for 30 minutes, and the supernatant was removed. The pellets were rinsed with 70% ethanol and spun again for 30 minutes. The supernatant was removed, and the pellets were air-dried overnight at room temperature.

The pellets were then resuspended in 12 microliters of distilled water and stored at −20° C. until denatured by the addition of 3 microliters of 2 N NaOH/50 mM EDTA, incubation at 37° C. for 20 minutes, and then incubation at room temperature for 15 minutes. The samples were then spotted onto nylon-coated glass slides using a Genetic Microsystems GMS417 microarrayer. Upon completion of spotting, the slides were placed in an 80° C. vacuum oven for 2 hours and then stored at room temperature. A pair of allele-specific SNP-ASOs, consisting of two 17-mers centered on a polymorphic nucleotide residue, was synthesized. Each slide was prehybridized for 1 hour in Hyb Buffer (3 M TMAC/0.5% SDS/1 mM EDTA/10 mM NaPO₄/5× Denhardt's solution/40 μg/ml yeast RNA), followed by hybridization in Hyb Buffer with 0.66 picomoles per milliliter of ³³P kinase-labeled SNP-ASO and a 50-fold excess of unlabeled (cold) competitor SNP-ASO corresponding to the opposite allele. Hybridizations were carried out overnight at 52° C. The slides were washed twice for 30 minutes at room temperature in TMAC Wash Buffer (3 M TMAC, 0.6% SDS, 1 mM EDTA, 10 mM NaPO₄, pH 6.8) and then for 20 minutes at 54° C. The slides were exposed to Kodak BioMax MR X-ray film. The results are shown in FIG. 8, in which the loci are indicated; the genotypes were determined from the hybridization patterns shown there.
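Each ASO pair is described as two 17-mers centered on the polymorphic residue, so the two alleles would be expected to differ only at position 9. The sketch below checks a pair for that property, using SEQ ID NOs: 17 and 18 from the sequence listing as input. Not every listed pair has this form (SEQ ID NOs: 19 and 20, for example, differ at three positions), so the check reports all mismatch positions rather than assuming a single one. The function names are hypothetical.

```python
from typing import List


def clean(seq: str) -> str:
    """Remove the spaces used in the sequence listing's 10-base blocks."""
    return "".join(seq.split()).upper()


def mismatch_positions(aso_a: str, aso_b: str) -> List[int]:
    """1-based positions at which two equal-length oligonucleotides differ."""
    if len(aso_a) != len(aso_b):
        raise ValueError("ASOs must be the same length")
    return [i + 1 for i, (x, y) in enumerate(zip(aso_a, aso_b)) if x != y]


def is_centered_pair(aso_a: str, aso_b: str) -> bool:
    """True if the pair differs at exactly one position, the central one."""
    n = len(aso_a)
    if n % 2 == 0:
        return False  # an even-length oligo has no single central base
    return mismatch_positions(aso_a, aso_b) == [n // 2 + 1]


seq17, seq18 = clean("ctgggctgta ttcattt"), clean("ctgggctgca ttcattt")
seq19, seq20 = clean("tctgcctcct gagtgct"), clean("tctacctccc aagtgct")
print(mismatch_positions(seq17, seq18), is_centered_pair(seq17, seq18))  # [9] True
print(mismatch_positions(seq19, seq20), is_centered_pair(seq19, seq20))  # [4, 10, 11] False
```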

The foregoing written specification is considered to be sufficient to enable one skilled in the art to practice the invention. The present invention is not limited in scope by the examples provided, since the examples are intended as illustrations of various aspects of the invention, and other functionally equivalent embodiments are within the scope of the invention. Various modifications of the invention in addition to those shown and described herein will become apparent to those skilled in the art from the foregoing description and fall within the scope of the appended claims. The advantages and objects of the invention are not necessarily encompassed by each embodiment of the invention.

All references, patents and patent publications that are recited in this application are incorporated in their entirety herein by reference.

SEQUENCE LISTING

Number of SEQ ID NOs: 691

1

9

DNA

Homo Sapiens

variation

(4)...(6)

N = A, C, G or T

1
cagnnnctg 9

2

13

DNA

Homo Sapiens

2
tttttttttt cag 13

3

19

DNA

Homo Sapiens

3
cttgcagtga gccgagatc 19

4

20

DNA

Homo Sapiens

variation

(7)...(12)

N = A, C, G or T

4
ctcgagnnnn nnaagcgatg 20

5

11

DNA

Homo Sapiens

5
taagtgtaca a 11

6

11

DNA

Homo Sapiens

6
taagtataca a 11

7

11

DNA

Homo Sapiens

7
cccacggaga a 11

8

11

DNA

Homo Sapiens

8
cccacagaga a 11

9

11

DNA

Homo Sapiens

9
aattgcttcc c 11

10

11

DNA

Homo Sapiens

10
aattgtttcc c 11

11

11

DNA

Homo Sapiens

11
aaattcaatg t 11

12

11

DNA

Homo Sapiens

12
aaattaaatg t 11

13

24

DNA

Homo Sapiens

13
attaaaggcg tgcgccacca tgcc 24

14

18

DNA

Homo Sapiens

14
tttatgaagg cataaaaa 18

15

18

DNA

Homo Sapiens

15
tttatggagg cataaaaa 18

16

18

DNA

Homo Sapiens

16
tttatgaagg tataaaaa 18

17

17

DNA

Homo Sapiens

17
ctgggctgta ttcattt 17

18

17

DNA

Homo Sapiens

18
ctgggctgca ttcattt 17

19

17

DNA

Homo Sapiens

19
tctgcctcct gagtgct 17

20

17

DNA

Homo Sapiens

20
tctacctccc aagtgct 17

21

17

DNA

Homo Sapiens

21
tagctagaat caagctt 17

22

17

DNA

Homo Sapiens

22
tagctagagt caagctt 17

23

17

DNA

Homo Sapiens

23
gctgtgcaac aaatcac 17

24

17

DNA

Homo Sapiens

24
cagctgtgca aatcacc 17

25

17

DNA

Homo Sapiens

25
tttcgtgatg tttctat 17

26

17

DNA

Homo Sapiens

26
tttcgtgaat gtttcta 17

27

17

DNA

Homo Sapiens

27
cactgtctac atcttta 17

28

17

DNA

Homo Sapiens

28
cactgtctcc atcttta 17

29

17

DNA

Homo Sapiens

29
taacattctt gaagcca 17

30

17

DNA

Homo Sapiens

30
taacattcct gaagcca 17

31

17

DNA

Homo Sapiens

31
gcttccattt cctaagg 17

32

17

DNA

Homo Sapiens

32
gcttccactt cctaagg 17

33

17

DNA

Homo Sapiens

33
aggaatggca ataatcc 17

34

17

DNA

Homo Sapiens

34
aggaatggcg ataatcc 17

35

17

DNA

Homo Sapiens

35
aggaatgaca ataatcc 17

36

17

DNA

Homo Sapiens

36
ttaaattcgt aaatgga 17

37

17

DNA

Homo Sapiens

37
ttaaattcat aaatgga 17

38

17

DNA

Homo Sapiens

38
taacattctt gaagcca 17

39

17

DNA

Homo Sapiens

39
taacattcct gaagcca 17

40

17

DNA

Homo Sapiens

40
ttctgtgact ccacttg 17

41

17

DNA

Homo Sapiens

41
ttctgtgact ccatttg 17

42

17

DNA

Homo Sapiens

42
ttccctgtct ccatttg 17

43

17

DNA

Homo Sapiens

43
gtagtttgcc aggaacc 17

44

17

DNA

Homo Sapiens

44
gtagtttgtc aggaacc 17

45

19

DNA

Homo Sapiens

45
tgctactcct ctctactcg 19

46

19

DNA

Homo Sapiens

46
tgctattcct ctctgctcg 19

47

19

DNA

Homo Sapiens

47
cttgatcacc ctctgatga 19

48

19

DNA

Homo Sapiens

48
cttggtcacc ctctaatga 19

49

17

DNA

Homo Sapiens

49
gaggtggtgc agagtga 17

50

17

DNA

Homo Sapiens

50
gaggtggcgc agagtga 17

51

17

DNA

Homo Sapiens

51
gaggtggccc agagtga 17

52

17

DNA

Homo Sapiens

52
cccactgaac cgcacag 17

53

17

DNA

Homo Sapiens

53
cccactgagc tgcacag 17

54

17

DNA

Homo Sapiens

54
cccactcagc cgcacag 17

55

17

DNA

Homo Sapiens

55
tgaagacaca gccagcc 17

56

17

DNA

Homo Sapiens

56
tgaagacgca gccagcc 17

57

17

DNA

Homo Sapiens

57
tgaagacgaa gccagcc 17

58

17

DNA

Homo Sapiens

58
agaagttggt accaggg 17

59

17

DNA

Homo Sapiens

59
agaagttgtt accaggg 17

60

17

DNA

Homo Sapiens

60
tatgattacg taatgtt 17

61

17

DNA

Homo Sapiens

61
tatgattatg taatgtt 17

62

17

DNA

Homo Sapiens

62
atgattccag tgagtta 17

63

17

DNA

Homo Sapiens

63
atgattcctg tgagtta 17

64

19

DNA

Homo Sapiens

64
catactatta acactggaa 19

65

19

DNA

Homo Sapiens

65
catattatta acacaggaa 19

66

17

DNA

Homo Sapiens

66
gtcaagaaca ggcaata 17

67

17

DNA

Homo Sapiens

67
gtcaagaata ggcaata 17

68

17

DNA

Homo Sapiens

68
cagactaggg aaccttc 17

69

17

DNA

Homo Sapiens

69
cagacgaggg aaccttc 17

70

17

DNA

Homo Sapiens

70
cagactaggg agccttc 17

71

17

DNA

Homo Sapiens

71
tgtccagttg tttgcat 17

72

17

DNA

Homo Sapiens

72
tgtccagtcg tttgcat 17

73

17

DNA

Homo Sapiens

73
ggggtagcca gtttggt 17

74

17

DNA

Homo Sapiens

74
ggggtagcaa gtttggt 17

75

17

DNA

Homo Sapiens

75
caggaagctg tagctcc 17

76

17

DNA

Homo Sapiens

76
caggaagccg tagctcc 17

77

17

DNA

Homo Sapiens

77
cctgagcctg tctacct 17

78

17

DNA

Homo Sapiens

78
cctgagcccg tctacct 17

79

17

DNA

Homo Sapiens

79
taacattctt gaagcca 17

80

17

DNA

Homo Sapiens

80
taacattcct gaagcca 17

81

17

DNA

Homo Sapiens

81
ccaactgaac cgcacag 17

82

17

DNA

Homo Sapiens

82
ccaactgagc tgcacag 17

83

19

DNA

Homo Sapiens

83
gagctagctc acacattct 19

84

19

DNA

Homo Sapiens

84
gagttagctc acacgttct 19

85

17

DNA

Homo Sapiens

85
acgggggggt ggcgtta 17

86

17

DNA

Homo Sapiens

86
acggggggtg gcgttaa 17

87

19

DNA

Homo Sapiens

87
tagacagcca gcgcgtcac 19

88

19

DNA

Homo Sapiens

88
tagatagcca gcgcatcac 19

89

18

DNA

Homo Sapiens

89
gcttttcttg agagtggc 18

90

18

DNA

Homo Sapiens

90
gcttttcttt agagtggc 18

91

18

DNA

Homo Sapiens

91
gcttttcgtg agagtggc 18

92

17

DNA

Homo Sapiens

92
ctacagataa agttata 17

93

17

DNA

Homo Sapiens

93
ctacagatga agttata 17

94

17

DNA

Homo Sapiens

94
tagacctgct gctatct 17

95

17

DNA

Homo Sapiens

95
tagacctgtt gctatct 17

96

17

DNA

Homo Sapiens

96
tgttgttctg gcctcca 17

97

17

DNA

Homo Sapiens

97
tgttgttttg gcctcca 17

98

17

DNA

Homo Sapiens

98
ttctgagaat ttgttag 17

99

17

DNA

Homo Sapiens

99
ttctgagagt ttgttag 17

100

17

DNA

Homo Sapiens

100
caggaagcag tagctcc 17

101

17

DNA

Homo Sapiens

101
caggaagccg tagctcc 17

102

17

DNA

Homo Sapiens

102
agagtcaggt aagttgc 17

103

17

DNA

Homo Sapiens

103
agagtcagat aagttgc 17

104

17

DNA

Homo Sapiens

104
agatttcaaa aagtttt 17

105

17

DNA

Homo Sapiens

105
agattccaaa aggtttt 17

106

17

DNA

Homo Sapiens

106
agatttcaaa aagtttt 17

107

17

DNA

Homo Sapiens

107
cctgagggga gcaatca 17

108

17

DNA

Homo Sapiens

108
cctgagggaa gcaatca 17

109

17

DNA

Homo Sapiens

109
aaggtaagat aactaag 17

110

17

DNA

Homo Sapiens

110
aaggtaaggt aactaag 17

111

17

DNA

Homo Sapiens

111
ggactacaca gagaaac 17

112

17

DNA

Homo Sapiens

112
ggactacata gagaaac 17

113

17

DNA

Homo Sapiens

113
cccaggctac acgaggg 17

114

17

DNA

Homo Sapiens

114
cccaggctac atgaggg 17

115

17

DNA

Homo Sapiens

115
cttaccagtt gtgagac 17

116

17

DNA

Homo Sapiens

116
cttaccactt gtgagac 17

117

17

DNA

Homo Sapiens

117
cttaccagtc gtgagac 17

118

17

DNA

Homo Sapiens

118
ctgccctcag gtcttta 17

119

17

DNA

Homo Sapiens

119
ctgccctccg gtcttta 17

120

17

DNA

Homo Sapiens

120
gcaataaaat tgtttta 17

121

17

DNA

Homo Sapiens

121
gcaatgagat cgtttta 17

122

17

DNA

Homo Sapiens

122
tgttctgtgg agacccc 17

123

17

DNA

Homo Sapiens

123
tgttctgtag agacccc 17

124

17

DNA

Homo Sapiens

124
cacattgaat caaagcc 17

125

17

DNA

Homo Sapiens

125
cacattgagt caaagcc 17

126

17

DNA

Homo Sapiens

126
ggactaccca cccgttc 17

127

17

DNA

Homo Sapiens

127
gcgactgcac ccattct 17

128

17

DNA

Homo Sapiens

128
gcgactgccc ccattct 17

129

17

DNA

Homo Sapiens

129
cctgggccag ccaggaa 17

130

17

DNA

Homo Sapiens

130
cctgggcctg ccaggaa 17

131

17

DNA

Homo Sapiens

131
ccccaggtaa ccatctt 17

132

17

DNA

Homo Sapiens

132
ccccaggtga ccatctt 17

133

17

DNA

Homo Sapiens

133
ttctgtatat tagctga 17

134

17

DNA

Homo Sapiens

134
tttctatatt aactgac 17

135

17

DNA

Homo Sapiens

135
ggacccggac ggtcttc 17

136

17

DNA

Homo Sapiens

136
ggacccggtc ggtcttc 17

137

17

DNA

Homo Sapiens

137
gtccctaatg ttagcat 17

138

17

DNA

Homo Sapiens

138
gtccccaatg tcagcat 17

139

17

DNA

Homo Sapiens

139
acgggggggt ggcgtta 17

140

17

DNA

Homo Sapiens

140
acggggggtg gcgttaa 17

141

19

DNA

Homo Sapiens

141
tagacagcca gcgcgtcac 19

142

19

DNA

Homo Sapiens

142
tagatagcca gcgcatcac 19

143

17

DNA

Homo Sapiens

143
gattcttcgt gttcctt 17

144

17

DNA

Homo Sapiens

144
gattcttcat gttcctt 17

145

17

DNA

Homo Sapiens

145
tgtaaaaact tagaata 17

146

17

DNA

Homo Sapiens

146
tgtaaaaatt tagaata 17

147

17

DNA

Homo Sapiens

147
tgtgaaagcg ctcccaa 17

148

17

DNA

Homo Sapiens

148
tgtgaaagtg ctcccaa 17

149

17

DNA

Homo Sapiens

149
caaaggctca gagaatc 17

150

17

DNA

Homo Sapiens

150
caaaggctta gagaatc 17

151

17

DNA

Homo Sapiens

151
ttaattctct ccaaaca 17

152

17

DNA

Homo Sapiens

152
ttaaggctct ccggaca 17

153

17

DNA

Homo Sapiens

153
ctgccaccgt gcacaca 17

154

17

DNA

Homo Sapiens

154
ctgccaccat gcacaca 17

155

17

DNA

Homo Sapiens

155
ccaaatattc tgattcc 17

156

17

DNA

Homo Sapiens

156
ccaaatattc ttttttt 17

157

17

DNA

Homo Sapiens

157
atgagctgac cctccct 17

158

17

DNA

Homo Sapiens

158
atgagctgcc cctccct 17

159

17

DNA

Homo Sapiens

159
acactaggta aaagctc 17

160

17

DNA

Homo Sapiens

160
acactaggca aaagctc 17

161

17

DNA

Homo Sapiens

161
agacaccacg accgagg 17

162

17

DNA

Homo Sapiens

162
agacaccaag accgagg 17

163

17

DNA

Homo Sapiens

163
gcagcgtccg gttaagt 17

164

17

DNA

Homo Sapiens

164
gcagcgtctg gttaagt 17

165

17

DNA

Homo Sapiens

165
cagatactac aaggatg 17

166

17

DNA

Homo Sapiens

166
tacagataca aggatgc 17

167

17

DNA

Homo Sapiens

167
tcagctagtg tatctgt 17

168

17

DNA

Homo Sapiens

168
tcacctagtg tatttgt 17

169

17

DNA

Homo Sapiens

169
ttttttattt ttggatt 17

170

17

DNA

Homo Sapiens

170
ttttaatttt tggattt 17

171

17

DNA

Homo Sapiens

171
gatattgttt tcattta 17

172

17

DNA

Homo Sapiens

172
gatattgtct tcattta 17

173

17

DNA

Homo Sapiens

173
agacccggtg ctggtgt 17

174

17

DNA

Homo Sapiens

174
agacccggcg ctggtgt 17

175

17

DNA

Homo Sapiens

175
cttctaagct ttgtctt 17

176

17

DNA

Homo Sapiens

176
cttctaagtt ttgtctt 17

177

17

DNA

Homo Sapiens

177
agttggcaac cagcatg 17

178

17

DNA

Homo Sapiens

178
agttggcatc cagcatg 17

179

17

DNA

Homo Sapiens

179
ggtgaaatgg taattac 17

180

17

DNA

Homo Sapiens

180
ggtgaaatag taattac 17

181

17

DNA

Homo Sapiens

181
acgggatata acgagtt 17

182

17

DNA

Homo Sapiens

182
acgggataca acgagtt 17

183

17

DNA

Homo Sapiens

183
gggatacaac gagtttc 17

184

17

DNA

Homo Sapiens

184
gggatacacc gagtttc 17

185

17

DNA

Homo Sapiens

185
gtatcttggg tgtcctg 17

186

17

DNA

Homo Sapiens

186
gtaacttggg tgttctg 17

187

17

DNA

Homo Sapiens

187
gggtgtcctg ccccatc 17

188

17

DNA

Homo Sapiens

188
gggtgttctg ttttatc 17

189

17

DNA

Homo Sapiens

189
tgtccagttg ttttgca 17

190

17

DNA

Homo Sapiens

190
tgtccagtcg ttttgca 17

191

17

DNA

Homo Sapiens

191
aagacagccg gaactct 17

192

17

DNA

Homo Sapiens

192
aagacagcag gaactct 17

193

17

DNA

Homo Sapiens

193
tgataggacc aaagaga 17

194

17

DNA

Homo Sapiens

194
cgataggact aaagaga 17

195

17

DNA

Homo Sapiens

195
tccaaagcca gggccca 17

196

17

DNA

Homo Sapiens

196
tccaaattca gggccca 17

197

17

DNA

Homo Sapiens

197
cctgggccag ccagaag 17

198

17

DNA

Homo Sapiens

198
cctgggcctg ccagaag 17

199

17

DNA

Homo Sapiens

199
gattctctga gcctttg 17

200

17

DNA

Homo Sapiens

200
gattctctaa gcctttg 17

201

17

DNA

Homo Sapiens

201
taccattttt tagatga 17

202

17

DNA

Homo Sapiens

202
taccatttct tagatga 17

203

17

DNA

Homo Sapiens

203
ctggaagggc agtgaat 17

204

17

DNA

Homo Sapiens

204
tctggacgag ggtgaat 17

205

17

DNA

Homo Sapiens

205
tagttgcagc acaaatg 17

206

17

DNA

Homo Sapiens

206
tagttgtagc acaaatg 17

207

17

DNA

Homo Sapiens

207
acactaccgc acagagc 17

208

17

DNA

Homo Sapiens

208
acactaccac acagagc 17

209

17

DNA

Homo Sapiens

209
aataataagt aaataag 17

210

17

DNA

Homo Sapiens

210
aataataaat aaataag 17

211

17

DNA

Homo Sapiens

211
tggcagtagt tgttcat 17

212

17

DNA

Homo Sapiens

212
tggcagtaat tgttcat 17

213

17

DNA

Homo Sapiens

213
aggtatgacg tcataag 17

214

17

DNA

Homo Sapiens

214
aggtatgatg tcataag 17

215

17

DNA

Homo Sapiens

215
gttgttgttg aagattt 17

216

17

DNA

Homo Sapiens

216
ttgttgttga agattta 17

217

19

DNA

Homo Sapiens

217
gatagtacag gtgttgtca 19

218

19

DNA

Homo Sapiens

218
gatggtacag gtgtcgtca 19

219

17

DNA

Homo Sapiens

219
aatataatgt aacagga 17

220

17

DNA

Homo Sapiens

220
aatataatat aacagga 17

221

17

DNA

Homo Sapiens

221
ttaaccattt atctgat 17

222

17

DNA

Homo Sapiens

222
ttaaccatat atctgat 17

223

17

DNA

Homo Sapiens

223
agagcccagc aaagttc 17

224

17

DNA

Homo Sapiens

224
agagcccaac aaagttc 17

225

19

DNA

Homo Sapiens

225
atcccgaacc ggggaaaat 19

226

19

DNA

Homo Sapiens

226
atcccaaacc gggggaaat 19

227

17

DNA

Homo Sapiens

227
atgacaccac cacaacc 17

228

17

DNA

Homo Sapiens

228
atgacaccgc cacaacc 17

229

17

DNA

Homo Sapiens

229
aggcaaacag atataac 17

230

17

DNA

Homo Sapiens

230
aggcaaacgg atataac 17

231

17

DNA

Homo Sapiens

231
tgtattcact aataaga 17

232

17

DNA

Homo Sapiens

232
tgtattcatt aataaga 17

233

17

DNA

Homo Sapiens

233
ttggcgtata cttcata 17

234

17

DNA

Homo Sapiens

234
ttggcgtaca cttcata 17

235

17

DNA

Homo Sapiens

235
ctcaccacgc tccatct 17

236

17

DNA

Homo Sapiens

236
ctcaccaccc tccatct 17

237

16

DNA

Homo Sapiens

237
atatctaaag gcacag 16

238

17

DNA

Homo Sapiens

238
tatctacata aaggcac 17

239

17

DNA

Homo Sapiens

239
gtgtctccta gtctccc 17

240

17

DNA

Homo Sapiens

240
gtgtctccca gtctccc 17

241

17

DNA

Homo Sapiens

241
atgagctgac cctccct 17

242

17

DNA

Homo Sapiens

242
atgagctgcc cctccct 17

243

17

DNA

Homo Sapiens

243
ggacaacatt taattgg 17

244

17

DNA

Homo Sapiens

244
ggacaacact taattgg 17

245

17

DNA

Homo Sapiens

245
gctttaaaat ttttatt 17

246

17

DNA

Homo Sapiens

246
gctttaaatt ttttatt 17

247

17

DNA

Homo Sapiens

247
aaatttgttc ctaaatg 17

248

17

DNA

Homo Sapiens

248
aaatttgtac ctaaatg 17

249

17

DNA

Homo Sapiens

249
gtgttgttct ggcctcc 17

250

17

DNA

Homo Sapiens

250
gtgttgtttt ggcctcc 17

251

17

DNA

Homo Sapiens

251
tgaatgacaa aaagaca 17

252

17

DNA

Homo Sapiens

252
tgaatgacga aaagaca 17

253

18

DNA

Homo Sapiens

253
actgagccat ctcwccag 18

254

17

DNA

Homo Sapiens

254
acttaactta agctggc 17

255

17

DNA

Homo Sapiens

255
gtacttaagc tggcctg 17

256

17

DNA

Homo Sapiens

256
actctaatat cccacag 17

257

17

DNA

Homo Sapiens

257
actctaatct cccacag 17

258

17

DNA

Homo Sapiens

258
cggatcggct ctagttc 17

259

17

DNA

Homo Sapiens

259
cggatcagct ctagttc 17

260

17

DNA

Homo Sapiens

260
tcaaaccaat aaggagg 17

261

17

DNA

Homo Sapiens

261
tcaaaccagt aaggagg 17

262

17

DNA

Homo Sapiens

262
gtgtgtgtgt ggggggg 17

263

17

DNA

Homo Sapiens

263
gtgtgtgtgg ggggggt 17

264

17

DNA

Homo Sapiens

264
cttaataata atttcat 17

265

17

DNA

Homo Sapiens

265
cttaataaca atttcat 17

266

17

DNA

Homo Sapiens

266
gtgtctccat atgtgtg 17

267

17

DNA

Homo Sapiens

267
gtgtctacac atgtgtg 17

268

17

DNA

Homo Sapiens

268
aactcatcat gatggtt 17

269

17

DNA

Homo Sapiens

269
aactcataat gatggtt 17

270

17

DNA

Homo Sapiens

270
aactcatcac gatggtt 17

271

17

DNA

Homo Sapiens

271
atcactcata gcccaga 17

272

17

DNA

Homo Sapiens

272
atcacttata gcccaga 17

273

17

DNA

Homo Sapiens

273
atcactcata tcccaga 17

274

17

DNA

Homo Sapiens

274
catcttacca gcattga 17

275

17

DNA

Homo Sapiens

275
catcttacta gcattga 17

276

17

DNA

Homo Sapiens

276
agtcagccgg ctctggc 17

277

17

DNA

Homo Sapiens

277
agtcagccag ctctggc 17

278

19

DNA

Homo Sapiens

278
gggtaggagt gggggtgag 19

279

19

DNA

Homo Sapiens

279
gggcaggagt gggggtgag 19

280

19

DNA

Homo Sapiens

280
gggtaggagt gggggtgag 19

281

17

DNA

Homo Sapiens

281
tcagtattgt tcttctc 17

282

17

DNA

Homo Sapiens

282
tcagtatttt tcttctc 17

283

17

DNA

Homo Sapiens

283
agcagagact gagctcg 17

284

17

DNA

Homo Sapiens

284
agcagagacc gagctcg 17

285

17

DNA

Homo Sapiens

285
acaggggtcg attcgtc 17

286

17

DNA

Homo Sapiens

286
acagggatcg attcgtc 17

287

17

DNA

Homo Sapiens

287
acaggggtcg tttcgtc 17

288

17

DNA

Homo Sapiens

288
tcccaaagca ttcaagg 17

289

17

DNA

Homo Sapiens

289
tcccaaagta ttcaagg 17

290

17

DNA

Homo Sapiens

290
gaccagggtt aatgact 17

291

17

DNA

Homo Sapiens

291
gaccagggct aatgact 17

292

17

DNA

Homo Sapiens

292
ctattaacag agtcgag 17

293

17

DNA

Homo Sapiens

293
ctattaacgg agtcgag 17

294

17

DNA

Homo Sapiens

294
gtgatactgg atgtctg 17

295

17

DNA

Homo Sapiens

295
gtgataccga tgtctgg 17

296

17

DNA

Homo Sapiens

296
ctctctcgat agtctaa 17

297

17

DNA

Homo Sapiens

297
ctctctcgct agtctaa 17

298

17

DNA

Homo Sapiens

298
tctctcgata gtctaat 17

299

17

DNA

Homo Sapiens

299
tctctcgctg gtctaat 17

300

17

DNA

Homo Sapiens

300
agatgcaaaa ttcttag 17

301

17

DNA

Homo Sapiens

301
agatgcacag ttcttag 17

302

17

DNA

Homo Sapiens

302
ggaaaatgct caggtag 17

303

17

DNA

Homo Sapiens

303
ggaaaatgtt caggtag 17

304

17

DNA

Homo Sapiens

304
tctgggcaga gtgcagg 17

305

17

DNA

Homo Sapiens

305
tctgggcagc gtgcagg 17

306

17

DNA

Homo Sapiens

306
tatggaacgg ttgcttc 17

307

17

DNA

Homo Sapiens

307
tatggaactg ttgcttc 17

308

17

DNA

Homo Sapiens

308
aagcctggta cccgctg 17

309

17

DNA

Homo Sapiens

309
aagcctggca cccgctg 17

310

17

DNA

Homo Sapiens

310
cattcttctt tttctga 17

311

17

DNA

Homo Sapiens

311
cattcttcgt tttctga 17

312

17

DNA

Homo Sapiens

312
ctgcaggctt gtctgtg 17

313

17

DNA

Homo Sapiens

313
ctgcaggttt gtctgtg 17

314

17

DNA

Homo Sapiens

314
tgccatttcc tataaca 17

315

17

DNA

Homo Sapiens

315
tgccatttgc tataaca 17

316

17

DNA

Homo Sapiens

316
ccgccacacc cgctcct 17

317

17

DNA

Homo Sapiens

317
ccgccacagc cgctcct 17

318

17

DNA

Homo Sapiens

318
caaataatgc tagttat 17

319

17

DNA

Homo Sapiens

319
caaataatgt tagttat 17

320

17

DNA

Homo Sapiens

320
ggatgttgac acgctac 17

321

17

DNA

Homo Sapiens

321
ggatgttgtc acgctac 17

322

17

DNA

Homo Sapiens

322
catgtgtcca acgccat 17

323

17

DNA

Homo Sapiens

323
catgtgtcac aacgcca 17

324

17

DNA

Homo Sapiens

324
aaaggggcct taaagga 17

325

17

DNA

Homo Sapiens

325
aaaggggctt taaagga 17

326

17

DNA

Homo Sapiens

326
tgaaaagttc ttttcat 17

327

17

DNA

Homo Sapiens

327
tgaaaagtac ttttcat 17

328

17

DNA

Homo Sapiens

328
cctctctatg tgtgagc 17

329

17

DNA

Homo Sapiens

329
cctctctacg tgtgagc 17

330

17

DNA

Homo Sapiens

330
gaagttttag gattctt 17

331

19

DNA

Homo Sapiens

331
gaagatttag gagagtctc 19

332

17

DNA

Homo Sapiens

332
agggatgtat tttgtta 17

333

17

DNA

Homo Sapiens

333
agggatgtgt tttgtta 17

334

17

DNA

Homo Sapiens

334
acaattcaaa tgtatat 17

335

17

DNA

Homo Sapiens

335
acaattcata tgtatat 17

336

17

DNA

Homo Sapiens

336
cttgcctaac ctgcaca 17

337

17

DNA

Homo Sapiens

337
cttgcctagc ctgcaca 17

338

17

DNA

Homo Sapiens

338
caacagcacc tcatatc 17

339

17

DNA

Homo Sapiens

339
acagcggtgc ctcgtat 17

340

17

DNA

Homo Sapiens

340
actcacagtg tcagggc 17

341

17

DNA

Homo Sapiens

341
actcacagcg tcagggc 17

342

17

DNA

Homo Sapiens

342
ggctgctcct gtgtctg 17

343

19

DNA

Homo Sapiens

343
ggctcttcct gtgtgtctg 19

344

19

DNA

Homo Sapiens

344
ggctgctcct gtgtttctg 19

345

17

DNA

Homo Sapiens

345
aatagatgcc cttctga 17

346

17

DNA

Homo Sapiens

346
aatagatgcc ctcttga 17

347

17

DNA

Homo Sapiens

347
aatcgatgcc cttctga 17

348

17

DNA

Homo Sapiens

348
ttggtctagc aggtagc 17

349

17

DNA

Homo Sapiens

349
ttggtctacc aggtagc 17

350

17

DNA

Homo Sapiens

350
agccttggct cttaaaa 17

351

17

DNA

Homo Sapiens

351
agccttggtt cttaaaa 17

352

17

DNA

Homo Sapiens

352
agtctctggc gcctttg 17

353

17

DNA

Homo Sapiens

353
agtctctgcc gcctttg 17

354

19

DNA

Homo Sapiens

354
tagcaggagg cacagctta 19

355

19

DNA

Homo Sapiens

355
aagcaggagg cacaactta 19

356

19

DNA

Homo Sapiens

356
aagcaggagg cacagctta 19

357

19

DNA

Homo Sapiens

357
tagcaggagg cacagcttg 19

358

17

DNA

Homo Sapiens

358
aggagagacc ggactcc 17

359

17

DNA

Homo Sapiens

359
aggagagagc ggactcc 17

360

17

DNA

Homo Sapiens

360
tacaagtcat ccttcct 17

361

17

DNA

Homo Sapiens

361
tacaagtcgt ccttcct 17

362

17

DNA

Homo Sapiens

362
atacctccct cagacaa 17

363

17

DNA

Homo Sapiens

363
atacctcctc agacaag 17

364

17

DNA

Homo Sapiens

364
aaacaaacaa acaaacc 17

365

17

DNA

Homo Sapiens

365
aaacaaacca acaaacc 17

366

17

DNA

Homo Sapiens

366
gtgcgccacc atgacca 17

367

17

DNA

Homo Sapiens

367
gtgcgccatc atgacca 17

368

17

DNA

Homo Sapiens

368
ggctttccca ttagtgg 17

369

17

DNA

Homo Sapiens

369
ggctttccta ttagtgg 17

370

17

DNA

Homo Sapiens

370
ccctcacctc tctctca 17

371

17

DNA

Homo Sapiens

371
ccctcacccc tctctca 17

372

17

DNA

Homo Sapiens

372
aatctctcgc gttcatt 17

373

17

DNA

Homo Sapiens

373
aatctctcac gttcatt 17

374

17

DNA

Homo Sapiens

374
aatgataccg atcctta 17

375

17

DNA

Homo Sapiens

375
aatgatacag atcctta 17

376

17

DNA

Homo Sapiens

376
ataaaactgc attcgtg 17

377

17

DNA

Homo Sapiens

377
ataaaactac attcgtg 17

378

18

DNA

Homo Sapiens

378
agttccagga cagccagg 18

379

17

DNA

Homo Sapiens

379
atatctccga ctttgaa 17

380

17

DNA

Homo Sapiens

380
atatctccaa ctttgaa 17

381

17

DNA

Homo Sapiens

381
tggccctgca gagtctg 17

382

17

DNA

Homo Sapiens

382
tggctctgca gagctgg 17

383

17

DNA

Homo Sapiens

383
caatggatca aagatgc 17

384

17

DNA

Homo Sapiens

384
atggatcaac aaagatg 17

385

17

DNA

Homo Sapiens

385
gctgcctcaa ggtataa 17

386

17

DNA

Homo Sapiens

386
ctgcctctta aggtata 17

387

17

DNA

Homo Sapiens

387
acctatggct cctcatc 17

388

17

DNA

Homo Sapiens

388
acctatggtt cctcatc 17

389

17

DNA

Homo Sapiens

389
tcttctcccc tgcttta 17

390

17

DNA

Homo Sapiens

390
tcttctcact gctttag 17

391

17

DNA

Homo Sapiens

391
ccgcataaaa agctgag 17

392

17

DNA

Homo Sapiens

392
ccgccataaa agctgag 17

393

17

DNA

Homo Sapiens

393
agaatatagg gtttttt 17

394

17

DNA

Homo Sapiens

394
tagaatacag ttttttt 17

395

17

DNA

Homo Sapiens

395
agagttgctg tgcaggg 17

396

17

DNA

Homo Sapiens

396
agagttgccg tgcaggg 17

397

17

DNA

Homo Sapiens

397
agagttgcag tgcaggg 17

398

17

DNA

Homo Sapiens

398
taagcagtgt tcttggc 17

399

17

DNA

Homo Sapiens

399
taagcagtat tcttggc 17

400

17

DNA

Homo Sapiens

400
tcttctcccc tgcttta 17

401

17

DNA

Homo Sapiens

401
tcttctcact gctttag 17

402

17

DNA

Homo Sapiens

402
ttttttttta ttattga 17

403

17

DNA

Homo Sapiens

403
ttttttttat tattgaa 17

404

17

DNA

Homo Sapiens

404
tgtggtacgc acatctg 17

405

17

DNA

Homo Sapiens

405
tgtggtacac acatctg 17

406

17

DNA

Homo Sapiens

406
agactcttag acttctg 17

407

17

DNA

Homo Sapiens

407
agactcttag gcttctg 17

408

17

DNA

Homo Sapiens

408
agactcataa gcttctg 17

409

17

DNA

Homo Sapiens

409
agactcttag gcttctg 17

410

17

DNA

Homo Sapiens

410
cacgtacccg aacgtga 17

411

17

DNA

Homo Sapiens

411
cacgtacctg aacgtga 17

412

17

DNA

Homo Sapiens

412
attacggttt gtcgtca 17

413

17

DNA

Homo Sapiens

413
attacggttg gtcgtca 17

414

17

DNA

Homo Sapiens

414
ccaagatacg aaaccag 17

415

17

DNA

Homo Sapiens

415
ccaagatatg aaaccag 17

416

17

DNA

Homo Sapiens

416
tgcaatgacc agcaacc 17

417

17

DNA

Homo Sapiens

417
tgcaacgacc agcaacc 17

418

17

DNA

Homo Sapiens

418
tgtaacgacc aacaact 17

419

17

DNA

Homo Sapiens

419
tctaaaggga aagatgg 17

420

17

DNA

Homo Sapiens

420
tctaaaggaa agatgga 17

421

17

DNA

Homo Sapiens

421
ctggactcat acataca 17

422

17

DNA

Homo Sapiens

422
ctggactcgt acataca 17

423

17

DNA

Homo Sapiens

423
agtttggtcc cctggac 17

424

17

DNA

Homo Sapiens

424
agtttggttt cctggac 17

425

17

DNA

Homo Sapiens

425
tatagcttca tgtaaaa 17

426

17

DNA

Homo Sapiens

426
tatagcttta tgtaaaa 17

427

17

DNA

Homo Sapiens

427
ttttttttat tattgaa 17

428

17

DNA

Homo Sapiens

428
ttttttttta ttattga 17

429

17

DNA

Homo Sapiens

429
actcattgcc aatttaa 17

430

17

DNA

Homo Sapiens

430
actcattcag aatttaa 17

431

17

DNA

Homo Sapiens

431
atgcgtaatg ggggcta 17

432

17

DNA

Homo Sapiens

432
atgcgtaacg ggggcta 17

433

17

DNA

Homo Sapiens

433
ataattgctc ttttaaa 17

434

17

DNA

Homo Sapiens

434
gtaattgctc ttttaaa 17

435

17

DNA

Homo Sapiens

435
tctgattagt gatggat 17

436

17

DNA

Homo Sapiens

436
tctgattatg atggatt 17

437

17

DNA

Homo Sapiens

437
agcagagtgt ctcgtaa 17

438

17

DNA

Homo Sapiens

438
agcagagtat ctcgtaa 17

439

17

DNA

Homo Sapiens

439
gctggcagat atcggta 17

440

17

DNA

Homo Sapiens

440
gctggcaggt atcggta 17

441

17

DNA

Homo Sapiens

441
aactgcaatg accagca 17

442

17

DNA

Homo Sapiens

442
aactgcaacg accagca 17

443

17

DNA

Homo Sapiens

443
gctggtcatt gcagttt 17

444

17

DNA

Homo Sapiens

444
gttggtcgtt acagttt 17

445

17

DNA

Homo Sapiens

445
gctggtcgtt gcagttt 17

446

17

DNA

Homo Sapiens

446
gctggcagat atcggta 17

447

17

DNA

Homo Sapiens

447
gctggcaggt atcggta 17

448

17

DNA

Homo Sapiens

448
atagaaagtc caccgtc 17

449

17

DNA

Homo Sapiens

449
atagaaagcc caccgtc 17

450

17

DNA

Homo Sapiens

450
ttagtgaccg tgtaaac 17

451

17

DNA

Homo Sapiens

451
ttagtgactg tgtaaac 17

452

17

DNA

Homo Sapiens

452
ggggaggagc tttgttc 17

453

17

DNA

Homo Sapiens

453
ggggaggatc tttgttc 17

454

17

DNA

Homo Sapiens

454
ggcctggaca caaaagc 17

455

17

DNA

Homo Sapiens

455
ggcctggaaa caaaagc 17

456

17

DNA

Homo Sapiens

456
cccttttcta gtattgt 17

457

17

DNA

Homo Sapiens

457
cccttttcca gtattgt 17

458

17

DNA

Homo Sapiens

458
gaattggttt taggaat 17

459

17

DNA

Homo Sapiens

459
gaattggtat taggaat 17

460

17

DNA

Homo Sapiens

460
acccagcttt ccatggt 17

461

17

DNA

Homo Sapiens

461
acccagctct ccatggt 17

462

17

DNA

Homo Sapiens

462
tcacgttcgg gtacgtg 17

463

17

DNA

Homo Sapiens

463
tcacgttcag gtacgtg 17

464

17

DNA

Homo Sapiens

464
tgccttccgg ttggcaa 17

465

17

DNA

Homo Sapiens

465
tgccttccag ttggcaa 17

466

17

DNA

Homo Sapiens

466
ttttatcata caattgc 17

467

17

DNA

Homo Sapiens

467
ttttatcaga caattgc 17

468

17

DNA

Homo Sapiens

468
atcttctctt ctttgag 17

469

17

DNA

Homo Sapiens

469
atcttctcct ctttgag 17

470

17

DNA

Homo Sapiens

470
cagtcctctg ctttctc 17

471

17

DNA

Homo Sapiens

471
cagtcctcag ctttctc 17

472

17

DNA

Homo Sapiens

472
ccaagatacg aaaccag 17

473

17

DNA

Homo Sapiens

473
ccaagatatg aaaccag 17

474

17

DNA

Homo Sapiens

474
ggtattcaag ggttact 17

475

17

DNA

Homo Sapiens

475
ggtattcagg gttactg 17

476

17

DNA

Homo Sapiens

476
acctatggct cctcatc 17

477

17

DNA

Homo Sapiens

477
acctatggtt cctcatc 17

478

17

DNA

Homo Sapiens

478
ttttatcata caattgc 17

479

17

DNA

Homo Sapiens

479
ttttatcaga caattgc 17

480

17

DNA

Homo Sapiens

480
aaccagggct taagtct 17

481

17

DNA

Homo Sapiens

481
aaccagggat taagtct 17

482

17

DNA

Homo Sapiens

482
cagaaaaaca gatatac 17

483

17

DNA

Homo Sapiens

483
cagaaaaaga gatatac 17

484

17

DNA

Homo Sapiens

484
tctgagcgtg agtgctg 17

485

17

DNA

Homo Sapiens

485
tctgagcgcg agtgctg 17

486

17

DNA

Homo Sapiens

486
acctcagaag cggaggt 17

487

17

DNA

Homo Sapiens

487
acctcggaag gggaggt 17

488

17

DNA

Homo Sapiens

488
acctcggaag cggaggt 17

489

17

DNA

Homo Sapiens

489
taactcgatc gctatca 17

490

17

DNA

Homo Sapiens

490
taactcgctt gctatca 17

491

17

DNA

Homo Sapiens

491
taactcgctc gctatca 17

492

17

DNA

Homo Sapiens

492
gaatttctca acttctt 17

493

17

DNA

Homo Sapiens

493
gaatttctga acttctt 17

494

17

DNA

Homo Sapiens

494
caggggtccc caatttg 17

495

17

DNA

Homo Sapiens

495
caggggtctc caatttg 17

496

17

DNA

Homo Sapiens

496
ttttgctgtg caggcta 17

497

17

DNA

Homo Sapiens

497
ttttactgtg ccaggct 17

498

17

DNA

Homo Sapiens

498
gacagccctg tctcaaa 17

499

17

DNA

Homo Sapiens

499
agagaaaccc tgtctca 17

500

17

DNA

Homo Sapiens

500
gcaccggtct gagcagt 17

501

17

DNA

Homo Sapiens

501
gcaccggttt gagcagt 17

502

17

DNA

Homo Sapiens

502
ccgtgcccct gaacaat 17

503

17

DNA

Homo Sapiens

503
ccgtgccctt gaacaat 17

504

17

DNA

Homo Sapiens

504
tcacgttcgg gtacgtg 17

505

17

DNA

Homo Sapiens

505
tcacgttcag gtacgtg 17

506

17

DNA

Homo Sapiens

506
tgattcgctg ggactct 17

507

17

DNA

Homo Sapiens

507
tgattcgccg ggactct 17

508

17

DNA

Homo Sapiens

508
ttgatatccg aggcctt 17

509

17

DNA

Homo Sapiens

509
ttgatatctg aggcctt 17

510

17

DNA

Homo Sapiens

510
tccctgggcc aagcata 17

511

17

DNA

Homo Sapiens

511
tccctgggtc aagcata 17

512

17

DNA

Homo Sapiens

512
ttatggctga ggatcac 17

513

17

DNA

Homo Sapiens

513
ttatggctgc ggatcat 17

514

17

DNA

Homo Sapiens

514
ttatggcagg ggatcac 17

515

17

DNA

Homo Sapiens

515
ctctctgcgc tgaagca 17

516

17

DNA

Homo Sapiens

516
ctctctgctc tgaagca 17

517

17

DNA

Homo Sapiens

517
agatacagag atgtgtt 17

518

17

DNA

Homo Sapiens

518
agatactgag gtgtgtt 17

519

17

DNA

Homo Sapiens

519
cgacatctgg cagatgt 17

520

17

DNA

Homo Sapiens

520
cgacatctag cagatgt 17

521

17

DNA

Homo Sapiens

521
gtcacaaata gtatttc 17

522

17

DNA

Homo Sapiens

522
gtcacaaaga gtatttc 17

523

17

DNA

Homo Sapiens

523
aaggtgtgtg cgtgtgt 17

524

17

DNA

Homo Sapiens

524
aaggtgtgcg cgtgtgt 17

525

17

DNA

Homo Sapiens

525
agtctttttt ttcctga 17

526

19

DNA

Homo Sapiens

526
tagtcttttt tttcctgaa 19

527

17

DNA

Homo Sapiens

527
caggctgtgg gaggctt 17

528

17

DNA

Homo Sapiens

528
caggctgcgg aaggctt 17

529

17

DNA

Homo Sapiens

529
ctgtaagtca ttcaata 17

530

17

DNA

Homo Sapiens

530
ctgtaagtaa ttcaata 17

531

17

DNA

Homo Sapiens

531
caggggtccc caatttg 17

532

17

DNA

Homo Sapiens

532
caggggtctc caatttg 17

533

17

DNA

Homo Sapiens

533
gactcatggc cgccttg 17

534

17

DNA

Homo Sapiens

534
gactcattgc cgcctgg 17

535

17

DNA

Homo Sapiens

535
gactcctggc cgcctgg 17

536

17

DNA

Homo Sapiens

536
gactcctggc tgcctgg 17

537

17

DNA

Homo Sapiens

537
gactcctggc cgcctgg 17

538

17

DNA

Homo Sapiens

538
acaggggagg aaggaag 17

539

17

DNA

Homo Sapiens

539
acaggggaag gaaggaa 17

540

17

DNA

Homo Sapiens

540
ttgatataga ttgattc 17

541

17

DNA

Homo Sapiens

541
ttgatatata ttgattc 17

542

17

DNA

Homo Sapiens

542
atagaacagc aaagtaa 17

543

17

DNA

Homo Sapiens

543
atagaacaac aaagtaa 17

544

17

DNA

Homo Sapiens

544
aacaagcatc tatggat 17

545

17

DNA

Homo Sapiens

545
aacaagcacc tatggat 17

546

17

DNA

Homo Sapiens

546
gagcaggtta agcgatg 17

547

17

DNA

Homo Sapiens

547
gagcaggtga agcgatg 17

548

17

DNA

Homo Sapiens

548
ggcttccagc ttgattc 17

549

17

DNA

Homo Sapiens

549
ggcttccaac ttgattc 17

550

17

DNA

Homo Sapiens

550
agatagggat gaatccc 17

551

17

DNA

Homo Sapiens

551
agataggggt gaatccc 17

552

17

DNA

Homo Sapiens

552
tcattcaccg tttattg 17

553

17

DNA

Homo Sapiens

553
tcattcactg tttattg 17

554

17

DNA

Homo Sapiens

554
ctgacatact gcttagg 17

555

17

DNA

Homo Sapiens

555
ctgacatatt gcttagg 17

556

17

DNA

Homo Sapiens

556
ctaggaaagc ctaaatt 17

557

17

DNA

Homo Sapiens

557
ctaggaaaac ctaaatt 17

558

17

DNA

Homo Sapiens

558
atgtcaggat tttaaga 17

559

17

DNA

Homo Sapiens

559
atgtcagggt tttaaga 17

560

17

DNA

Homo Sapiens

560
ggtttccaat tggaaag 17

561

17

DNA

Homo Sapiens

561
ggtttccagt tggaaag 17

562

17

DNA

Homo Sapiens

562
cgaggagtgc aaagcga 17

563

17

DNA

Homo Sapiens

563
cgaggagtcc aaagcga 17

564

17

DNA

Homo Sapiens

564
tgtgtgtgtg tctgtct 17

565

17

DNA

Homo Sapiens

565
tgtgtgtgcg tctgtct 17

566

17

DNA

Homo Sapiens

566
gcaagatgca gctgcat 17

567

17

DNA

Homo Sapiens

567
gcaagatgta gctgcat 17

568

17

DNA

Homo Sapiens

568
gctggggcta ttctgta 17

569

17

DNA

Homo Sapiens

569
gctggggcca ttctgta 17

570

17

DNA

Homo Sapiens

570
caataacgga cctgcct 17

571

17

DNA

Homo Sapiens

571
caataacgaa cctgcct 17

572

17

DNA

Homo Sapiens

572
tagcctctct acatagg 17

573

17

DNA

Homo Sapiens

573
tagcctctgt acatagg 17

574

17

DNA

Homo Sapiens

574
catctatagg ttcactt 17

575

17

DNA

Homo Sapiens

575
catctatatg ttcactt 17

576

17

DNA

Homo Sapiens

576
gccaacaaca ttgagag 17

577

17

DNA

Homo Sapiens

577
gccaacaaga ttgagag 17

578

17

DNA

Homo Sapiens

578
gggtcgtgcg tccccct 17

579

17

DNA

Homo Sapiens

579
gggtcgtgtg tccccct 17

580

17

DNA

Homo Sapiens

580
attgtctcac atttctt 17

581

17

DNA

Homo Sapiens

581
attgtctcgc atttctt 17

582

17

DNA

Homo Sapiens

582
ggtgtggtcg cagaagg 17

583

17

DNA

Homo Sapiens

583
ggtgtggttg cagaagg 17

584

17

DNA

Homo Sapiens

584
tcattgccac acttgaa 17

585

17

DNA

Homo Sapiens

585
tcattgccgc acttgaa 17

586

17

DNA

Homo Sapiens

586
atctgtctac aatgatc 17

587

17

DNA

Homo Sapiens

587
atctgtctgc aatgatc 17

588

17

DNA

Homo Sapiens

588
ggctgggcac agtggct 17

589

17

DNA

Homo Sapiens

589
ggctgggcgc agtggct 17

590

17

DNA

Homo Sapiens

590
cagcctggag aacaagt 17

591

17

DNA

Homo Sapiens

591
cagcctggcg aacaagt 17

592

17

DNA

Homo Sapiens

592
tttgacaccc ggaagct 17

593

17

DNA

Homo Sapiens

593
tttgacactc ggaagct 17

594

17

DNA

Homo Sapiens

594
ctgcctttca tactgcc 17

595

17

DNA

Homo Sapiens

595
ctgcctttta tactgcc 17

596

17

DNA

Homo Sapiens

596
acaatagacg ttccccg 17

597

17

DNA

Homo Sapiens

597
acaatagatg ttccccg 17

598

17

DNA

Homo Sapiens

598
ggtgtttgat ttgtact 17

599

17

DNA

Homo Sapiens

599
ggtgtttgct ttgtact 17

600

17

DNA

Homo Sapiens

600
tccaactcaa aaaatgt 17

601

17

DNA

Homo Sapiens

601
tccaactcta aaaatgt 17

602

17

DNA

Homo Sapiens

602
gggccgctca cagtcca 17

603

17

DNA

Homo Sapiens

603
gggccgctta cagtcca 17

604

17

DNA

Homo Sapiens

604
gcatggctcg tgggttt 17

605

17

DNA

Homo Sapiens

605
gcatggcttg tgggttt 17

606

17

DNA

Homo Sapiens

606
gttgggaagt ggagcgg 17

607

17

DNA

Homo Sapiens

607
gttgggaatt ggagcgg 17

608

17

DNA

Homo Sapiens

608
aagggatgag gatgtga 17

609

17

DNA

Homo Sapiens

609
aagggatggg gatgtga 17

610

17

DNA

Homo Sapiens

610
tcctcgagag ctttgct 17

611

17

DNA

Homo Sapiens

611
tcctcgaggg ctttgct 17

612

17

DNA

Homo Sapiens

612
tgacaatgcg tgcccaa 17

613

17

DNA

Homo Sapiens

613
tgacaatgtg tgcccaa 17

614

17

DNA

Homo Sapiens

614
tccatgtcat agatttc 17

615

17

DNA

Homo Sapiens

615
tccatgtcgt agatttc 17

616

17

DNA

Homo Sapiens

616
tggaggacag tggaggg 17

617

17

DNA

Homo Sapiens

617
tggaggactg tggaggg 17

618

17

DNA

Homo Sapiens

618
acccatttcc tgaaaat 17

619

17

DNA

Homo Sapiens

619
acccattttc tgaaaat 17

620

17

DNA

Homo Sapiens

620
ctgagttcgg cactgct 17

621

17

DNA

Homo Sapiens

621
ctgagttctg cactgct 17

622

17

DNA

Homo Sapiens

622
accagtttgg ctcaaag 17

623

17

DNA

Homo Sapiens

623
accagttttg ctcaaag 17

624

17

DNA

Homo Sapiens

624
ccaatcagaa cgtgcag 17

625

17

DNA

Homo Sapiens

625
ccaatcagag cgtgcag 17

626

17

DNA

Homo Sapiens

626
acccacacag acactgc 17

627

17

DNA

Homo Sapiens

627
acccacactg acactgc 17

628

17

DNA

Homo Sapiens

628
ggacaaagcg ctggtgt 17

629

17

DNA

Homo Sapiens

629
ggacaaagtg ctggtgt 17

630

17

DNA

Homo Sapiens

630
agctggtccc cctmccc 17

631

17

DNA

Homo Sapiens

631
agctggtctc cctmccc 17

632

17

DNA

Homo Sapiens

632
ggtgtagtaa gcacagc 17

633

17

DNA

Homo Sapiens

633
ggtgtagtca gcacagc 17

634

17

DNA

Homo Sapiens

634
agcgaacacg ggggaaa 17

635

17

DNA

Homo Sapiens

635
agcgaacatg ggggaaa 17

636

17

DNA

Homo Sapiens

636
gtgacagcac caaactt 17

637

17

DNA

Homo Sapiens

637
gtgacagcgc caaactt 17

638

17

DNA

Homo Sapiens

638
gtctgttgct gttattt 17

639

17

DNA

Homo Sapiens

639
gtctgttgtt gttattt 17

640

17

DNA

Homo Sapiens

640
accagcatag cccagag 17

641

17

DNA

Homo Sapiens

641
accagcatgg cccagag 17

642

17

DNA

Homo Sapiens

642
cgtaggagac aagacct 17

643

17

DNA

Homo Sapiens

643
cgtaggaggc aagacct 17

644

17

DNA

Homo Sapiens

644
ctctgctgaa tctccca 17

645

17

DNA

Homo Sapiens

645
ctctgctgga tctccca 17

646

17

DNA

Homo Sapiens

646
aagcaaagac tgattca 17

647

17

DNA

Homo Sapiens

647
aagcaaagtc tgattca 17

648

17

DNA

Homo Sapiens

648
aggcagctag agggaga 17

649

17

DNA

Homo Sapiens

649
aggcagctcg agggaga 17

650

17

DNA

Homo Sapiens

650
ttccattccg ttcaatt 17

651

17

DNA

Homo Sapiens

651
ttccattctg ttcaatt 17

652

17

DNA

Homo Sapiens

652
tattgttact gattttg 17

653

17

DNA

Homo Sapiens

653
tattgttatt gattttg 17

654

17

DNA

Homo Sapiens

654
gagctttcag aggctga 17

655

17

DNA

Homo Sapiens

655
gagctttcgg aggctga 17

656

17

DNA

Homo Sapiens

656
gggggaagat atggagt 17

657

17

DNA

Homo Sapiens

657
gggggaaggt atggagt 17

658

17

DNA

Homo Sapiens

658
catggcctcg tgggttt 17

659

17

DNA

Homo Sapiens

659
catggccttg tgggttt 17

660

17

DNA

Homo Sapiens

660
gggkagggag accagct 17

661

17

DNA

Homo Sapiens

661
gggkaggggg accagct 17

662

17

DNA

Homo Sapiens

662
gcagtgtcag tgtgggt 17

663

17

DNA

Homo Sapiens

663
gcagtgtctg tgtgggt 17

664

17

DNA

Homo Sapiens

664
acaccagcac tttgatc 17

665

17

DNA

Homo Sapiens

665
acaccagcgc tttgatc 17

666

17

DNA

Homo Sapiens

666
ccttctgcaa ccacacc 17

667

17

DNA

Homo Sapiens

667
ccttctgcga ccacacc 17

668

17

DNA

Homo Sapiens

668
aaattcgcag gagccga 17

669

17

DNA

Homo Sapiens

669
aaattcgcgg gagccga 17

670

17

DNA

Homo Sapiens

670
aggtctagac gctcacc 17

671

17

DNA

Homo Sapiens

671
aggtctaggc gctcacc 17

672

17

DNA

Homo Sapiens

672
ggaggaacac ttcaaac 17

673

17

DNA

Homo Sapiens

673
ggaggaacgc ttcaaac 17

674

17

DNA

Homo Sapiens

674
tttgtgctat accttga 17

675

17

DNA

Homo Sapiens

675
tttgtgctgt accttga 17

676

17

DNA

Homo Sapiens

676
atgatgcaca caccctg 17

677

17

DNA

Homo Sapiens

677
atgatgcata caccctg 17

678

17

DNA

Homo Sapiens

678
tattgctccg cctcctc 17

679

17

DNA

Homo Sapiens

679
tattgctctg cctcctc 17

680

17

DNA

Homo Sapiens

680
ctcagagact gtgtgcc 17

681

17

DNA

Homo Sapiens

681
ctcagagagt gtgtgcc 17

682

17

DNA

Homo Sapiens

682
atcttctgcg tcactca 17

683

17

DNA

Homo Sapiens

683
atcttctgtg tcactca 17

684

17

DNA

Homo Sapiens

684
cagcatctag taaccac 17

685

17

DNA

Homo Sapiens

685
cagcatctgg taaccac 17

686

17

DNA

Homo Sapiens

686
attagtgcca aatacat 17

687

17

DNA

Homo Sapiens

687
attagtgcta aatacat 17

688

17

DNA

Homo Sapiens

688
tgctccacag cagccgt 17

689

17

DNA

Homo Sapiens

689
tgctccactg cagccgt 17

690

17

DNA

Homo Sapiens

690
taggggagaa tctgttt 17

691

17

DNA

Homo Sapiens

691
taggggagca tctgttt 17

US Referenced Citations (86)

Number	Name	Date	Kind
4588682	Groet et al.	May 1986	A
4829098	Hoffman et al.	May 1989	A
4946980	Halm et al.	Aug 1990	A
4963663	White et al.	Oct 1990	A
5032502	Stodolsky	Jul 1991	A
5034428	Hoffman et al.	Jul 1991	A
5043272	Hartley	Aug 1991	A
5104792	Silver et al.	Apr 1992	A
5106727	Hartley et al.	Apr 1992	A
5126239	Livak et al.	Jun 1992	A
5220004	Saiki et al.	Jun 1993	A
5445934	Fodor et al.	Aug 1995	A
5468613	Erlich et al.	Nov 1995	A
5487985	McClelland et al.	Jan 1996	A
5510084	Cros et al.	Apr 1996	A
5518900	Nikiforov et al.	May 1996	A
5545527	Stevens et al.	Aug 1996	A
5565322	Heller	Oct 1996	A
5576180	Melanon et al.	Nov 1996	A
5578443	Santamaria et al.	Nov 1996	A
5578458	Caskey et al.	Nov 1996	A
5582989	Caskey et al.	Dec 1996	A
5589330	Shuber	Dec 1996	A
5597694	Munroe et al.	Jan 1997	A
5599674	Pena et al.	Feb 1997	A
5599921	Sorge et al.	Feb 1997	A
5604097	Brenner	Feb 1997	A
5604099	Erlich et al.	Feb 1997	A
5605662	Heller et al.	Feb 1997	A
5610287	Nikiforov et al.	Mar 1997	A
5612179	Simons	Mar 1997	A
5632957	Heller et al.	May 1997	A
5633134	Shuber	May 1997	A
5639611	Wallace et al.	Jun 1997	A
5663062	Sorge et al.	Sep 1997	A
5667972	Drmanac et al.	Sep 1997	A
5667976	Van Ness et al.	Sep 1997	A
5679524	Nikiforov et al.	Oct 1997	A
5683872	Rudert et al.	Nov 1997	A
5695933	Schalling et al.	Dec 1997	A
5702890	Housman	Dec 1997	A
5707806	Shuber	Jan 1998	A
5710000	Sapolsky et al.	Jan 1998	A
5721098	Pinkel et al.	Feb 1998	A
5728524	Sibson	Mar 1998	A
5728530	Rust et al.	Mar 1998	A
5731171	Bohlander	Mar 1998	A
5738993	Fugono et al.	Apr 1998	A
5741678	Ronai	Apr 1998	A
5744305	Fodor et al.	Apr 1998	A
5759821	Teasdale	Jun 1998	A
5760130	Johnston et al.	Jun 1998	A
5762876	Lincoln et al.	Jun 1998	A
5787032	Heller et al.	Jul 1998	A
5789168	Leushner et al.	Aug 1998	A
5795722	Lacroix et al.	Aug 1998	A
5811239	Frayne	Sep 1998	A
5814444	Rabinovitch	Sep 1998	A
5817007	Fodgaard et al.	Oct 1998	A
5834181	Shuber	Nov 1998	A
5834189	Stevens et al.	Nov 1998	A
5849483	Shuber	Dec 1998	A
5856104	Chee et al.	Jan 1999	A
5858659	Sapolsky et al.	Jan 1999	A
5861245	McClelland et al.	Jan 1999	A
5866337	Schon	Feb 1999	A
5869237	Ward et al.	Feb 1999	A
5885775	Haff et al.	Mar 1999	A
5888778	Shuber	Mar 1999	A
5908978	Amerson et al.	Jun 1999	A
5910576	Bertina et al.	Jun 1999	A
5919626	Shi et al.	Jul 1999	A
5942392	Amouyel et al.	Aug 1999	A
5945283	Kwok et al.	Aug 1999	A
5945675	Malins	Aug 1999	A
5946431	Fernandes	Aug 1999	A
5981176	Wallace	Nov 1999	A
5994056	Higuchi	Nov 1999	A
6013431	Söderlund et al.	Jan 2000	A
6015675	Caskey et al.	Jan 2000	A
6027889	Barany et al.	Feb 2000	A
6037124	Matson	Mar 2000	A
6048689	Murphy et al.	Apr 2000	A
6083763	Balch	Jul 2000	A
6100030	McCasky Feazel et al.	Aug 2000	A
6383742	Drmanac et al.	May 2002	B1

Foreign Referenced Citations (20)

Number	Date	Country
0 534 858	Mar 1993	EP
0950720	Oct 1999	EP
WO9512607	May 1995	WO
WO9617082	Jun 1996	WO
WO9617957	Jun 1996	WO
WO9638591	Dec 1996	WO
WO9712030	Apr 1997	WO
WO9729212	Aug 1997	WO
WO9731327	Aug 1997	WO
WO9739151	Oct 1997	WO
WO9743450	Nov 1997	WO
WO9812354	Mar 1998	WO
WO9818967	May 1998	WO
WO9820165	May 1998	WO
WO9824796	Jun 1998	WO
WO9830883	Jul 1998	WO
WO9831836	Jul 1998	WO
WO 9901576	Jan 1999	WO
WO 0024939	May 2000	WO
1001037	May 2000	WO

Non-Patent Literature Citations (37)

Himmelbauer et al., Mammalian Genome, vol. 9, pp. 611-616, 1998.*
Telenius et al., Genomics, vol. 13, pp. 718-725, 1992.*
Beltinger, C.P. et al., “Whole Genome Amplification of Single Cells From Clinical Peripheral Blood Smears,” J. Clin. Pathol.: Mol. Pathol. 50:272 (1997).
Cheung, V.G. et al., “Whole Genome Amplification Using a Degenerate Oligonucleotide Primer Allows Hundreds of Genotypes to be Performed on Less Than One Nanogram of Genomic DNA,” Proc. Natl. Acad. Sci. USA 93:14676 (1996).
Paunio, T. et al., “Preimplantation Diagnosis by Whole-Genome Amplification, PCR Amplification, and Solid-Phase Minisequencing of Blastomere DNA,” Clin. Chem. 42(9):1382 (1996).
Snabes, M.C. et al., “Preimplantation Single-Cell Analysis of Multiple Genetic Loci by Whole-Genome Amplification,” Proc. Natl. Acad. Sci. USA 91:6181 (1994).
Center for Medical Genetics: Marshfield Medical Research Foundation, “Genotyping Statistics”, (1998).
Cheung, et al., “Whole genome amplification using a degenerate oligonucleotide primer allows hundreds of genotypes to be performed on less than one nanogram of genomic DNA”, Proc. Natl. Acad. Sci. USA, vol. 93, pp. 14676-14679 (1996).
Delahunty, et al., “Testing the feasibility of DNA typing for human identification by PCR and an oligonucleotide ligation assay”, Am. J. Hum. Genet., 58, pp. 1239-1246 (1996).
Elango, et al., “Generation and mapping of Mus spretus strain-specific markers for rapid genomic scanning”, Mammalian Genome 7, pp. 340-343 (1996).
Gilles, et al., “Single nucleotide polymorphic discrimination by an electronic dot blot assay on semiconductor microchips”, Nature Biotechnology vol. 17, pp. 365-370 (1999).
Howell, et al., “Dynamic allele-specific hybridization: A new method for scoring single nucleotide polymorphisms”, Nature Biotechnology, vol. 17, pp. 87-88 (1999).
Ledbetter, et al., “Rapid isolation of DNA probes within specific chromosome regions by interspersed repetitive sequence polymerase chain reaction”, Genomics 6, pp. 475-481 (1990).
Hunter, et al., “Toward the construction of integrated physical and genetic maps of the mouse genome using interspersed repetitive sequence PCR (IRS-PCR) genomics”, Genome Research, 6, pp. 290-299 (1996).
McCarthy, et al., “Efficient high-resolution genetic mapping of mouse interspersed repetitive sequence PCR products, toward integrated genetic and physical mapping of the mouse genome”, Proc. Natl. Acad. Sci. USA, vol. 92, pp. 5302-5306 (1995).
Risch, et al., “The future of genetic studies of complex human diseases”, Science, vol. 273, pp. 1516-1517 (1996).
Sinnett, et al., “Alumorphs-Human DNA polymorphisms detected by polymerase chain reaction using Alu-specific primers”, Genomics 7, pp. 331-334 (1990).
Telenius, et al., “Degenerate oligonucleotide-primed PCR: General amplification of target DNA by a single degenerate primer”, Genomics 13, pp. 718-725 (1992).
Vos, et al., “AFLP: a new technique for DNA fingerprinting”, Nucleic Acids Research, vol. 23, No. 21 pp. 4407-4414 (1995).
Wang, et al., “Large-scale identification, mapping, and genotyping of single-nucleotide polymorphisms in the human genome”, Science, 280:1077-1082 (1998).
Welsh, et al., “Fingerprinting genomes using PCR with arbitrary primers”, Nucleic Acids Research, vol. 18, No. 24, pp. 7213-7218 (1990).
Zietkiewicz, et al., “Linkage mapping by simultaneous screening of multiple polymorphic loci using Alu oligonucleotide-directed PCR”, Proc. Natl. Acad. Sci. USA vol. 89, pp. 8448-8451 (1992).
Winzeler, et al., “Direct allelic variation scanning of the yeast genome”, Science vol. 281, pp. 1194-1197 (1998).
Armstrong et al., “Suspension Arrays for High Throughput, Multiplexed Single Nucleotide Polymorphism Genotyping”, Cytometry 40:102-108 (2000).
Cronin et al., “Applying rapid DNA microarray optimization capability to SNP screening and genotyping”, American Journal of Human Genetics, 65(4):pA224, Oct. 1999, No. 1238.
Griffin et al., “Direct genetic analysis by matrix-assisted laser desorption/ionization mass spectrometry”, Proc. Natl. Acad. Sci. USA, vol. 96, pp. 6301-6306, May 1999, Genetics.
Holloway et al., “Comparison of Three Methods for Single Nucleotide Polymorphism Typing for DNA Bank Studies: Sequence-Specific Oligonucleotide Probe Hybridisation, TaqMan Liquid Phase Hybridisation, and Microplate Array Diagonal Gel Electrophoresis (MADGE)”, Human Mutation, 14:340-347 (1999).
Iannone et al., “Multiplexed Single Nucleotide Polymorphism Genotyping by Oligonucleotide Ligation and Flow Cytometry”, Cytometry, 39:131-140, (2000).
Ruano et al., “Haplotype of multiple polymorphisms resolved by enzymatic amplification of single DNA molecules”, Proc. Natl. Acad. Sci. USA, vol. 87, pp. 6296-6300, Aug. 1990, Genetics.
Sauer et al., “A novel procedure for efficient genotyping of single nucleotide polymorphisms”, Nucleic Acids Research, 2000, vol. 28, No. 5, e13, Oxford University Press.
Toh et al., “Large-scale discovery and genotyping of single-nucleotide polymorphisms in the mouse”, Nature Genetics, vol. 24, pp. 381-386, Apr. 2000.
Broude, et al., “Differential Display of Genome Subsets Containing Specific Interspersed Repeats”, PNAS, 94: 4548-4553, (Apr. 1997).
Cheng et al., “Degenerate Oligonucleotide Primer-Polymerase Chain Reaction And Capillary Electrophoretic Analysis of Human DNA on Microchip-Based Devices”, Anal. Biochem., 257:101-106 (Mar. 1998).
Himmelbauer, et al., “Complex Probes for High-Throughput Parallel Genetic Mapping of Genomic Mouse BAC Clones”, Mammalian Genome, 9:611-616 (Aug. 1998).
Kruglyak, L., “The Use of a Genetic Map of Biallelic Markers in Linkage Studies”, Nature Genetics, 17(1):22-24 (Sep. 1, 1997).
Xiong M., et al., “Biallelic Markers In Genetics Studies of Human Diseases . . . ”, American Journal of Human Genetics, 61(4):1759, 1999.
Wang, D. et al., “Large-Scale Identification, Mapping, and Genotyping . . . ”, Science, US, Am. Assoc. for the Advancement of Science, 280:1077-1082 (May 1998).